Add Option to Include Configuration (conf) in the Kedro Package for Source Code?
#4316
Comments
So we've resisted this for years since it violates the 12factor app the addition of I'd be fully in favour of adding an explicit flag to package conf... but throw a warning on why we believe it to be bad practice and not fit for production.
Hmm.. Yes, agreed. I think this might actually be the best solution to the problem? It would resolve the issues I am facing in a distributed environment as well. I am happy to open a PR for both, but I just wanted to confirm whether we want to add an option to include the configuration in the package?
@MinuraPunchihewa, Adding remote cloud options to Thanks 💯
Thanks, @ankatiyar. I was not aware that the Yes, sure. I will open a PR for passing in remote configurations. Got it. Let me know what you decide.
@MinuraPunchihewa a possible reference implementation can be found here, where we do something similar for micropackaging (even if that feature is currently deprecated!): kedro/kedro/framework/cli/micropkg.py, line 421 in 075d59b
A few thoughts aside from #3982:
(1) About bundling configuration with the source code, I know that some users do this already. Managing the two separately is cumbersome. For example, deploying on Airflow (and I bet in most other platforms) requires an extra step because of this: https://docs.kedro.org/en/stable/deployment/airflow.html
(2) Some users just don't see the point of
(3) We've long known that not all our configuration is created equal, see this discussion by @Galileo-Galilei #770. In particular, dataset types are intimately coupled with business logic (because they define the in-memory representation of the data) and arguably they are not "in the same league" as, say, model parameters.
(4) I think our interpretation of the 12factor app is quite idiosyncratic. From the text, it reads
But the subtitle of section III is
And further down it says
Kedro's reluctance towards environment variables is well known: by default we allow them almost exclusively for credentials, see #2623 (and that's not even enough for many use cases, see #1621 and the long discussion at #3811). This is just a quick summary, but long story short, I think our approach to packaging and bundling needs a refresh.
I'd add the word pragmatic somewhere in there, but I agree wholeheartedly
I will add more thoughts on what I think we should do regarding this topic in #770 as promised to @astrojuanlu. That said (and as @noklam said), notice that it is already very easy to implement in
# settings.py
from pathlib import Path

# Resolve the base environment relative to this file, so the configuration
# folder is found wherever the code is installed.
CONFIG_LOADER_ARGS = {
    "base_env": (Path(__file__).parents[1] / "conf_app").as_posix(),
    "default_run_env": "local",
}
Good news is, this does work even if you override partially You can get much more detail and explanations in this demo repository. PS: I haven't tried with
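To illustrate why an absolute base_env like the one above can point outside the usual conf folder, here is a minimal sketch of the path arithmetic involved. It assumes the config loader builds each environment directory by joining the configuration source with the environment name using pathlib; resolve_env_dir and the example paths are hypothetical names used only for this illustration, not part of Kedro.

```python
from pathlib import PurePosixPath


def resolve_env_dir(conf_source: str, env: str) -> PurePosixPath:
    """Join the configuration source with an environment name.

    Assumption: this mirrors how a config loader would locate an
    environment folder. It is an illustration, not Kedro's code.
    """
    # With pathlib, joining onto an absolute right-hand path discards
    # the left-hand side entirely.
    return PurePosixPath(conf_source) / env


# A relative environment name stays inside the configuration source.
print(resolve_env_dir("/project/conf", "base"))
# /project/conf/base

# An absolute environment path replaces the configuration source.
print(resolve_env_dir("/project/conf", "/site-packages/conf_app"))
# /site-packages/conf_app
```

Under that assumption, only the base environment is redirected to the bundled folder, while a relative run environment such as local would still be looked up under the regular configuration source.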
Description
At the moment, when packaging a Kedro project using kedro package, two artifacts are created: a .whl file containing the source code and a tar.gz file containing the configuration in the conf directory.
I understand that keeping them separate is (in general) a best practice. However, from a user's point of view, these are two artifacts to be maintained. I need to account for both in my CI/CD pipelines as well as in the mechanism for running the code, which makes the process a little more complicated.
Moreover, when executing the code in a distributed environment like Databricks, performing file system operations such as unzipping the tar.gz file is not exactly straightforward. For instance, the file paths that would be used differ slightly between an interactive cluster and a job cluster.
It would be great if it were possible to include the conf folder as part of the .whl file (at the user's request). This would make the maintenance, installation and usage of the artifacts easier.
Context
As I mentioned, this could go a long way in improving the usage of packaged Kedro projects, especially when running in distributed computing environments.
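For reference, the extra step that the description refers to (unpacking the configuration archive before a run) could look roughly like the sketch below. It uses only the Python standard library. The function name extract_conf, the archive name and the destination path are hypothetical, and the archive is assumed to contain a top-level conf folder.

```python
import tarfile
from pathlib import Path


def extract_conf(archive: Path, destination: Path) -> Path:
    """Unpack a configuration archive and return the extraction folder."""
    destination = destination.resolve()
    destination.mkdir(parents=True, exist_ok=True)

    # Use the safer "data" extraction filter where this Python version
    # provides it; older versions fall back to the default behaviour.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(destination, **kwargs)
    return destination


# Hypothetical usage on a cluster node:
# conf_root = extract_conf(Path("conf-my_project.tar.gz"), Path("/tmp/my_project"))
# The extracted folder can then be passed to the run, for example with
# the --conf-source option of kedro run.
```

The destination has to be a location that every process running the pipeline can read, which is where the differences between cluster types mentioned above come into play.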
Possible Implementation
I suggest adding a flag to the kedro package command to allow users to choose whether to include the conf folder in the .whl file. The default behaviour can be to leave it out.
Possible Alternatives
A possible alternative might be to improve the documentation to explain how Kedro projects can be packaged and run in different systems (with examples). However, this might not scale well given the large number of options available for running pipelines.
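As a rough illustration of the flag proposed under Possible Implementation, the sketch below copies the conf folder into the package directory before the wheel is built and emits a warning, as suggested in the comments. This is not Kedro's implementation: bundle_conf, its include_conf parameter and the choice to skip folders named local (where credentials usually live) are assumptions made for this example.

```python
import shutil
import warnings
from pathlib import Path
from typing import Optional


def bundle_conf(
    project_root: Path, package_dir: Path, include_conf: bool = False
) -> Optional[Path]:
    """Copy conf/ into the package directory when explicitly requested.

    Hypothetical helper: returns the copied folder, or None when the
    opt-in flag is not set (the default).
    """
    if not include_conf:
        return None

    warnings.warn(
        "Bundling conf/ with the source code ties configuration to the "
        "build and is not recommended for production deployments.",
        UserWarning,
        stacklevel=2,
    )

    target = package_dir / "conf"
    shutil.copytree(
        project_root / "conf",
        target,
        # Skip any entry named "local" so credentials are not shipped.
        ignore=shutil.ignore_patterns("local"),
        dirs_exist_ok=True,
    )
    return target
```

Copying the files is only part of the job. The build backend would still need to be told to ship non-Python files as package data, and the config loader would need to be pointed at the bundled folder at run time.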