Drastically simplify the default project template #2149

Closed
idanov opened this issue Dec 20, 2022 · 6 comments
Labels
Issue: Feature Request New feature or improvement to existing feature Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation Type: Parent Issue

Comments

@idanov
Member

idanov commented Dec 20, 2022

Description & Context

Our users have the perception that Kedro is complicated and needs a lot of boilerplate. This perception is to a large extent fuelled by the overly complete template we generate by default. While the complete template is still useful to some users, many find it fairly complicated.

We should look into drastically simplifying the current default template and move the current one to the list of officially supported starters under the name of legacy-default-starter, full-template, full-kedro-project or something like that. Having a smaller and simpler initial template will significantly reduce the cognitive load on new Kedro users and thus ease the onboarding experience.

Possible Implementation

Currently our template includes the following folders and files:

  • conf/
  • data/
  • docs/
  • logs/
  • notebooks/
  • src/
  • .gitignore
  • pyproject.toml
  • README.md
  • setup.cfg

Most of those folders are not needed and can be safely removed, namely:

  • all data/ subfolders (and even the data/ folder itself)
  • docs/
  • logs/
  • notebooks/

Moreover, we should move away from using setup.py and make all Kedro starters use only pyproject.toml and setup.cfg, since this is how the Python ecosystem is evolving.

We should also revise the README.md and make sure it has up-to-date quick-start instructions on how to use Kedro and where to write your project code, how to load data and how to deploy it.

Possible Alternatives

Create a minimal starter and always recommend it over the full template in all trainings.

@idanov idanov added the Issue: Feature Request New feature or improvement to existing feature label Dec 22, 2022
@antonymilne
Contributor

antonymilne commented Jan 13, 2023

Generally agree with this, but just a couple of notes on removing the logs folder. I would love to do this, but previous user research indicated that people used the file-based logs (or at least info.log): #1472. The easiest immediate way to simplify the template would be to log to the project root rather than the logs folder. Maybe in 0.19 we could disable file logging by default and thereby remove the project-side logging.yml altogether, which would be another nice simplification to the project template.
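A minimal sketch of the two options above, assuming a dictConfig-style setup like the one the project-side logging.yml expresses in YAML. The helper `build_logging_config` and its `file_logging`/`log_dir` parameters are hypothetical illustrations, not a Kedro API: console logging is always on, and handlers writing info.log/errors.log to the project root are only added when explicitly requested.

```python
from __future__ import annotations

import logging
import logging.config
import os
from typing import Any

_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def build_logging_config(file_logging: bool = False, log_dir: str = ".") -> dict[str, Any]:
    """Return a logging.config.dictConfig() dictionary (hypothetical helper).

    Console logging is always enabled. File logging is opt-in; when enabled,
    info.log and errors.log go to ``log_dir`` (the project root by default)
    rather than a dedicated logs/ folder.
    """
    # The default (console only) never touches the file system, so it cannot
    # fail on read-only file systems.
    handlers: dict[str, dict[str, Any]] = {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
            "stream": "ext://sys.stdout",
        }
    }
    if file_logging:
        # delay=True defers opening each file until the first record is emitted.
        for name, level, filename in (
            ("info_file_handler", "INFO", "info.log"),
            ("error_file_handler", "ERROR", "errors.log"),
        ):
            handlers[name] = {
                "class": "logging.FileHandler",
                "level": level,
                "formatter": "simple",
                "filename": os.path.join(log_dir, filename),
                "delay": True,
            }
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"simple": {"format": _FORMAT}},
        "handlers": handlers,
        "root": {"level": "INFO", "handlers": list(handlers)},
    }


if __name__ == "__main__":
    # Default: console only, nothing is written to disk.
    logging.config.dictConfig(build_logging_config())
    logging.getLogger(__name__).info("Console-only logging configured")
```

Under this sketch, users who rely on info.log would opt in with `file_logging=True`, while the default template needs neither a logs/ folder nor file handlers.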

For completeness, here are all my thoughts on it in one place. From #1461

I think the problems with file-based logging are:

  • practical: it crashes on read-only file systems like Databricks Repos. I don't think it's possible to just catch and ignore these errors (but we should check that) [this is now fine because it can be disabled in project-side logging.yml]
  • philosophical: a 12 factor app shouldn't have responsibility for logging (although in practice maybe it's best to make user lives easier by logging to a file)
    ...

The outcome of #1472, which showed that people do use the log files, unfortunately makes some of the simplification that was planned here difficult:

  1. We should maintain the current behaviour in which we log to files.
    ...
  2. Hence we should define the file-based handlers in the project-side logging.yml. The downside of this is that the default template cannot immediately be simplified to remove the project-side logging.yml or the logs directory. Users who need to disable file-based logging will need to modify their project-side logging.yml as in point 2.

One advantage of this approach is that it is quite minimal compared to how things currently stand. Hence future changes that could simplify the project template here are still possible, e.g. to direct logs to project root info.log/errors.log instead of the logs folder and remove the logs directory.
...

@jmholzer jmholzer added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Jan 30, 2023
@sbrugman
Contributor

Hope this is the right issue for this.

The current project configuration is needlessly complicating our monorepo setup.

Looking at the default template:
https://github.com/kedro-org/kedro/tree/main/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D

The setup.cfg, src/setup.py and src/requirements.txt should be replaced with pyproject.toml. pip/setuptools support pyproject.toml out of the box. Would love to have this by default in new projects.

@astrojuanlu
Member

xref #2152

@merelcht
Member

merelcht commented Mar 1, 2023

We discussed this issue in technical design on the 1st of March:

The action point that came out of the discussion:

Longer summary of what was said:
The proposal to remove all directories was met with various opinions. Some points discussed:

  • Can we even remove the logs/ folder? This was pending user research, but the verdict is that it can be removed, as confirmed by @AntonyMilneQB and @amandakys
  • If we remove all these directories, should we then remove them from the starters as well? No, the starters are a good way of showing how you could organise your data, notebooks, etc., so the starters should stay as they are.
  • We could potentially keep the data/ folder and just remove its sub-directories. Alternatively, each directory could include a README.md explaining that it can be removed, or how else it could be used.
  • Would removing these directories make future steps, e.g. packaging a project, harder? Users might add files in the wrong places, e.g. notebooks in src/, because there wasn't any other obvious place.

A large part of the discussion also revolved around the Kedro value proposition:

  • Kedro being opinionated is an important part of the value proposition, and some people have voiced confusion when commands that we deprecated based on a data-driven approach were actually removed
  • Removing parts of the template made several people wonder if that also reduces the value of Kedro as an opinionated framework that uses SWE best practices and guides users in doing the same
  • If we do reduce the template, we need to update our recommendations and messaging so they are consistent with this slimmed-down version of Kedro.
  • @amandakys rightfully said that this topic of opt-in vs opt-out comes up in different shapes and forms from time to time, and that we should take a step back and develop a more coherent vision to avoid repeating such discussions

@astrojuanlu
Member

Going back to a simplification that doesn't touch the essence of Kedro: we could merge pyproject.toml, setup.cfg, and setup.py into one file #2280 (comment)

@yetudada
Contributor

yetudada commented Sep 4, 2023

I'm going to close this issue because it's been replaced with tickets that we're working on 🎉

Projects
Archived in project
Development

No branches or pull requests

7 participants