Using Kedro with existing project? #410

Closed
ericmjl opened this issue Jun 11, 2020 · 10 comments
Labels:

  • Component: Documentation 📄 (Issue/PR for markdown and API documentation)
  • Stage: Technical Design 🎨 (Ticket needs to undergo technical design before implementation)
  • TD: implementation (Tech Design topic on implementation of the issue)
  • Type: Parent Issue

Comments

@ericmjl

ericmjl commented Jun 11, 2020

What are you trying to do?

I'm wondering if it's possible to use Kedro with an existing project that was not created with kedro new? For example, I already have a project that has a data/ directory, a src/ directory and more, and I'd like to start by only using the Pipelining capabilities.

I've been unsuccessful on my first two attempts; I installed Kedro into the project conda environment, but the only commands available to me are docs, info, and new.

@idanov
Member

idanov commented Jun 18, 2020

@ericmjl Most of Kedro's functionality will very likely be available to you even with a custom project folder structure, although you might need to work a bit harder to get there. Kedro works mainly through load_context, which makes a base set of assumptions about your folder structure; everything else is free to modify.

To learn more about Kedro's internal architecture, and to equip yourself with the skills needed to build your own framework on top of Kedro or simply use a new folder structure, see this page in the documentation, which gives a high-level overview and should clarify some bits.

From what you describe, your project may not have a .kedro.yml file, in which case the Kedro CLI cannot recognise it as a Kedro project. You can find how to structure your .kedro.yml file here.

@limdauto
Contributor

limdauto commented Jul 7, 2020

Hi @ericmjl I'm closing this issue now as it's been a while. But please feel free to reopen it if you still need help.

limdauto closed this as completed Jul 7, 2020
@fmfreeze

Hi to all on an old issue.
@idanov your links to further information no longer work.

Is there an example or documentation on how to integrate Kedro into an existing project that I failed to find?
Thanks for any responses.

@astrojuanlu
Member

@fmfreeze have a look at https://kedro.readthedocs.io/en/stable/faq/architecture_overview.html. The second link no longer applies: .kedro.yml was removed in d8e6ace.
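
If the CLI only offers docs, info, and new, a quick way to see which project marker a folder has is sketched below. It assumes that older releases looked for a .kedro.yml file at the project root and that newer releases look for a [tool.kedro] table in pyproject.toml; the helper name kedro_project_marker is made up for illustration and is not part of Kedro.

```python
from pathlib import Path
from typing import Optional


def kedro_project_marker(project_root: Path) -> Optional[str]:
    """Return which Kedro project marker is present in project_root, if any.

    Hypothetical helper for illustration only; it is not part of Kedro.
    """
    # Older releases: a .kedro.yml file at the project root.
    if (project_root / ".kedro.yml").is_file():
        return ".kedro.yml"

    # Newer releases (assumption): a [tool.kedro] table in pyproject.toml.
    # A plain text scan keeps the sketch free of TOML parser dependencies.
    pyproject = project_root / "pyproject.toml"
    if pyproject.is_file():
        for line in pyproject.read_text(encoding="utf-8").splitlines():
            if line.strip() == "[tool.kedro]":
                return "pyproject.toml [tool.kedro]"

    return None


if __name__ == "__main__":
    marker = kedro_project_marker(Path.cwd())
    print(marker or "no Kedro project marker found")
```

A result of None for your project root would be consistent with the CLI not recognising the folder as a Kedro project.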

You might want to explore how to use some of the Kedro components, for example Kedro as a data registry.

In any case, "how to use Kedro with an existing project" was the very first question I had when I started using the project and I'd like to see us documenting it better. @idanov @stichbury @yetudada what do you think about reopening this issue?

merelcht reopened this Feb 13, 2023
merelcht added the Component: Documentation 📄 (Issue/PR for markdown and API documentation) label Feb 13, 2023
@yetudada
Contributor

Hey everyone! We're going to start looking at this in two ways. How do I use Kedro when I have:

  • A project that already has a data/ directory, a src/ directory and more, e.g. what is the minimum I could add to my existing work to turn it into a Kedro project?
  • A Jupyter notebook that I want to turn into a Kedro project

@stichbury
Contributor

I'm looking at this as it's the oldest docs ticket we have outstanding.

It seems to be addressed by #2512, which suggests adding a kedro init command (plus documentation), but in the absence of that work there's still an opportunity to write about how to convert a basic project to Kedro. So I'm keeping this ticket alive and bumping its priority.

There's also #2461 to convert a notebook into a project. I didn't think the draft addressed the reader, so I suggested a new approach and ticket (kedro-org/kedro-devrel#80) and closed #2461; things have stalled since. It's still important but not part of this issue, so I'll follow that rabbit down a separate hole.

@astrojuanlu
Member

If we were to write these docs now, the story would be something like "do kedro new somewhere and copy-paste your files over". It's worth pondering if that's better than nothing.

I'm leaving a comment on gh-2512.

@astrojuanlu
Member

We have identified this issue as a high priority work stream.

At the moment, our advice for users who want to convert an existing project to Kedro is "run kedro new and copy-paste the files over". This is less than ideal, is perceived as too complex, and possibly constitutes a barrier to adoption.
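
To make the copy-paste step concrete, here is a rough sketch of how it could be scripted. It assumes an existing project with src/ and data/ directories (as in the original question) and a project already generated by kedro new, whose code lives under src/<package_name>/ and whose raw data lives under data/01_raw/. The copy_into_kedro_project helper and all paths are hypothetical; this is not a Kedro feature.

```python
import shutil
from pathlib import Path
from typing import List


def copy_into_kedro_project(
    existing: Path, kedro_project: Path, package_name: str
) -> List[Path]:
    """Copy src/ and data/ from an existing project into a Kedro project.

    Hypothetical helper: the target layout (src/<package_name>/ and
    data/01_raw/) is an assumption about the generated project template.
    Files that already exist in the target are left untouched.
    """
    targets = {
        existing / "src": kedro_project / "src" / package_name,
        existing / "data": kedro_project / "data" / "01_raw",
    }
    copied: List[Path] = []
    for source_dir, target_dir in targets.items():
        if not source_dir.is_dir():
            continue  # nothing to copy for this part of the project
        for source in sorted(source_dir.rglob("*")):
            if not source.is_file():
                continue
            target = target_dir / source.relative_to(source_dir)
            if target.exists():
                continue  # never overwrite files created by the template
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            copied.append(target)
    return copied
```

Copying files is only the mechanical part. The code still has to be wired into nodes and pipelines, and the datasets registered in the catalog, by hand.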

Today in a training, after I showed how to do it, the consultant leading the engagement said "I didn’t know it was that involved [...] slightly underestimated the effort associated with setting up a Kedro pipeline".

More evidence of users trying to convert an existing project to Kedro: https://www.linen.dev/s/kedro/t/13221641/hi-everyone-i-wanted-first-to-thank-all-the-participants-of-#29e7e696-5078-4ccc-a28f-907a3f132696

Related issues:

and possibly more.

Tangentially related: #2818

@astrojuanlu
Member

We now have

This is a work in progress that will require some API changes and end with documentation improvements. I will wait a bit to close this issue.

@stichbury
Contributor

+1 to closing this and thanks for the useful summary 🙌
