-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make it easier to use the Config Loader #2819
Comments
The workflow that @astrojuanlu referenced that I did. What’s happening?If I wanted to use Kedro as a library - and specifically use our ConfigLoader and DataCatalog. The two features that users want the most, then this does not work: my-project
├── my-notebook.ipynb
├── Customer-Churn-Records.csv
├── parameters.yml
├── catalog.yml
└── requirements.txt What do you need to do?We have some assumptions built into the OmegaConfigLoader, it assumes that you have:
All of this assumes that you are familiar with Kedro’s project template - how else would you know about this folder structure? Your project has to look like this: my-project
|-- conf/
| |-- base/
| |-- catalog.yml
| |-- parameters.yml
| |-- local/
├── my-notebook.ipynb
├── Customer-Churn-Records.csv
└── requirements.txt And then you need to have the following:
from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog
# Load configuration for the catalog and parameters
# OmegaConfigLoader assumes you have directories for conf, conf/base and conf/local
# Users need to know that they need to put the catalog.yml in conf/base
conf_loader = OmegaConfigLoader(conf_source= "conf")
conf_catalog = conf_loader.get("catalog")
catalog = DataCatalog.from_config(conf_catalog)
customer_data = catalog.load("customers") What errors did I run into before I got this working?I first tried to put
|
The issue described here refers to the config loader and not the data catalog itself. The data catalog. You can create a catalog object directly if you have a catalog dict object (mapping a dataset name to its dataset class config) and a credentials config. This means that you don’t need a config loader and its complex file structure to work with a data catalog. What still bothers me though is the fact that you need that credentials config file. Ideally we can do something with that to abstract away the credentials part… |
Good point @MatthiasRoels
You can instantiate a |
I know you can use an empty credentials file, but is would be nicer to inject credentials in a different way though… |
Are you refering this specific case using standalone or in general? There are ways to inject credentials without |
I did the same exercise as @yetudada recently. In a notebook case, using I try to argue that #2861 is one way of making these components more useful in a standalone notebook. |
I was referring to this specific use case for a standalone catalog. I know there are other ways to inject credentials in a regular kedro project. But if we can make the UX nicer for a standalone, we might be able to leverage the same functionality in a kedro project too, making that part easier. I don’t have an immediate solution in mind for this, but more than happy to brainstorm about it! |
Currently if you try to create kedro/kedro/framework/context/context.py Lines 255 to 274 in 6773529
|
(comments for myself to refer in the future) config_loader = ConfigLoader("../conf")
conf_catalog = config_loader["catalog"]
catalog = DataCatalog.from_config(catalog_conf)
catalog.load("companies") # fail - No such file or directory: 'data/01_raw/companies.csv' This will fail - because you need to use from kedro.framework.context.context import _convert_paths_to_absolute_posix
config_loader = ConfigLoader("../conf")
conf_catalog = config_loader["catalog"]
conf_catalog = _convert_paths_to_absolute_posix(
project_path=Path("../").resolve(), conf_dictionary=conf_catalog
)
catalog = DataCatalog.from_config(conf_catalog)
catalog.load("companies") This will pass |
I see what you mean @noklam, but shouldn’t we address this to make the catalog more self-contained? This should be an easy change (moving that |
@MatthiasRoels Hey! I wrote this comments mainly for future reference. I am not working on this issue at the moment. Rest assure, we are not settled with this ugly workaround - I don't know yet what's the final solution for this yet. |
With all the info from this issue, do we feel we are already in a position to shape it further in another issue or even a PR? |
Tangentially related: a |
Reassigned this issue to the milestone on Using Kedro with existing projects, this is a solution to that. |
Tangentially related: #2512 |
In our internal discussions it was noted by @yetudada that, despite the changes proposed in #2971, it is still cumbersome to instantieate the catalog_conf = OmegaConfigLoader(conf_source=".")["catalog"] Compared to catalog_conf = OmegaConf.load("catalog.yml") These two examples are not 100 % equivalent, but that's the point - we still need a simpler way to onboard users to the assumptions we make in our config loader. It was proposed to add a convenience static method: catalog_conf = OmegaConfigLoader.load_file("catalog.yml") Which then nicely complements the use case explored in #2967: catalog = DataCatalog.from_config(OmegaConfigLoader.load_file("catalog.yml")) |
Still not very convinced by this, I think adding arbitrary helper methods to the API of classes is definitely a direction which will make things inconsistent and confusing over the long run. I highly doubt that changing the API to provide an odd catalog = DataCatalog.from_config(OmegaConfigLoader.load_file("catalog.yml")) vs catalog = DataCatalog.from_config(OmegaConfigLoader(conf_source=".").get("catalog")) Let's not forget that in the current Kedro design, we aim to make no distinction between file-based configurations and configuration taken from a remote system, e.g. a database or an API. What would the Could we make a step back here and explore alternative roads for solving this problem? One way to go forward is to provide a micro-framework approach as taken by https://github.com/ellwise/kedro-light. Here a full micro-framework approach is taken, where each of the workflow steps is considered and eventually re-designed to fit a more notebook friendly workflow, instead of relying on the session and the context. This requires no API changes to the individual components, but rather creating a micro-framework module with helper methods. |
Quick comment, OmegaConfigLoader does not follow ConfigLoader get() spec #3092 I'll address the rest later if I can (others feel free to chime in) |
@astrojuanlu |
I'm going to mention my thoughts here and why I raised more questions. So during our recent showcase, we showed that if I had the following structure:
Then I'd be able to load my local dataset,
And this prompted a few questions:
So then I looked at OmegaConf and Intake as sources of inspiration, I'm sure they're not the only places we can look and then had more questions:
|
tl;dr: No I don't think a
That's only half correct as far as I understand? Yes, kedro/kedro/config/omegaconf_config.py Lines 126 to 137 in bbed0f1
But it always relies on directory abstractions, whether local or remote, and definitely not "database or an API". Or at least I don't see how one is supposed to extract config in Kedro from a database or an API, and we have zero examples on our docs on how to do such a thing. I'd be happy to be proven wrong.
For continuity, allowing
First, "odd" is subjective (let's agree to disagree). Second, I think you are underestimating the cognitive load needed to process OmegaConfigLoader(
conf_source="." # *sigh* okay
).get("catalog") # what? for beginners that just learned about Kedro and just want to load The price to pay is maintaining this: diff --git a/kedro/config/omegaconf_config.py b/kedro/config/omegaconf_config.py
index 5d07cb64..11f8a4b0 100644
--- a/kedro/config/omegaconf_config.py
+++ b/kedro/config/omegaconf_config.py
@@ -73,6 +73,17 @@ class OmegaConfigLoader(AbstractConfigLoader):
"""
+ @classmethod
+ def load_file(cls, path: str):
+ path_obj = Path(path)
+ config_loader = cls(
+ conf_source=path_obj.parent,
+ config_patterns={path_obj.stem: [path_obj.name]},
+ base_env="",
+ default_run_env="",
+ )
+ return config_loader[path_obj.stem]
+
def __init__( # noqa: too-many-arguments
self,
conf_source: str, Sample code:
This sounds to me like saying "Kedro is too complicated, so I'm going to show you something else (that you will have to install separately) so that you don't have to deal with those complications". I think this is the wrong message to send, and seems to go against Kedro Principle 2 "2. Grow beginners into experts 🌱". If anything, we should make Kedro easier to use without resorting to extra layers that add more indirection. Just from a practical standpoint, if we decide to go down this path, we'd have to design it (months), build it (weeks), change our documentation (weeks), let alone maintain it (forever), and fix bugs for it (forever). I don't think we can afford the luxury of spending time and resources on this. This is a big discussion for a small issue that deserves a small solution. Footnotes
|
The parent issue is #3029 |
I totally agree with @idanov both affirmations here:
I don't feel the first one is significantly self explanatory to have user writing it straight out of the box without looking at the documentation and I feel it is quite bad to introduce inconsistency. That said, it is clear that the whole problem boils down to the user experience: it is not intuitive to know how to load a file from the config loader.
I think we can break this in several steps:
I remember someone made the suggestion (but I can't find the relevant issue) to :
# very dirty example
class OmegaConfigLoader():
...
def load(filename):
conf_key=filename.split(".yml")[0] # optional, we can also jsut alias get without enabling ``.yml`` extension to reduce confusion
return self.get(conf_key) hence this will work : I dont really like this solution however, because it goes against the zen of Python
|
Here's evidence of a user who just tried to do
Yes, that's being tracked in #2971 and there's overall agreement to move forward with the proposal, although only for
Thanks for sharing, good insight. I agree the I'm going back to square one: what other ideas do we have that (1) address the well documented fact that users want to load individual config files in an easy way (2) by making a small incremental addition to what we already have in a way that we can fit it for 0.19? |
Maybe I'm missing something, but In the external user session we briefly touched on differentiating between loading and resolving configuration, which I think is the path we should pursue. |
Easy, the interface has zero assumptions about folders, files or anything like that https://github.com/kedro-org/kedro/blob/main/kedro/config/abstract_config.py QED 😂 Jokes aside, implementing a
I didn't mean to install it separately, but rather include a module in Kedro which will give you a set of lightweight functions and use that plugin as an inspiration. It's just a wild idea, which is worth exploring imho.
I think this is coming in for 0.19, but it's not a "non-breaking" one, because any scripts relying on different defaults will break. But that's not a problem, because 0.19 is just around the corner. As for your point 2., there's a pretty extensive documentation when using the class and it should popup easily for you. Maybe we need to move it to the constructor (if not easy to see)? https://github.com/kedro-org/kedro/blob/main/kedro/config/omegaconf_config.py#L26-L77 The goal of this move was to make the rest of Kedro unattached to a particular implementation, because essentially we need a dictionary with different keys. This makes it super easy to plugin developers, because your only interface is a key in a dictionary, regardless where the content for that key comes from. If we go back to the original point for the long discussion here, when people only want to use one file, why don't they just use the following then? from omegaconf import OmegaConf
catalog = DataCatalog.from_config(OmegaConf.load('catalog.yaml')) The only point of config loaders in Kedro is to simplify situations where:
When you don't need any of that, you shouldn't use Kedro's config loader. |
Outcome of the conversation we had @merelcht , @idanov and @yetudada :
|
Decision turned into #3247, closing this ticket. |
We have evidence from users that want to use specific parts of Kedro without having to use the rest of the Framework.
For example, #2741
And #2409
On the other hand, @yetudada tried to use the Kedro Config Loader on an existing project with a flat structure and had a frustrating experience because of the amount of boilerplate needed (with #2593 being only the last straw). This is something I've experienced myself in my trainings:
It's a lot to type, and I sense that most users are probably impressed with my live demo skills but desperate at the number of concepts they have to learn to just use a Kedro component.
This is closely related to #2818, which is concerned with documentation. This issue is about exploring the necessary code changes for this to happen. The key frame of reference is:
In particular:
The text was updated successfully, but these errors were encountered: