Maybe add a DataCatalog.from_file method #2967
Comments
This is bloody brilliant and I'm annoyed we didn't think of it before |
I'd say it's fine for a "syntactic sugar" method to introduce some coupling - in this case with the
|
What about: |
Maybe even use simple |
I think a simple single file mode defeats the object of this |
I don't think it's a good idea to couple the catalog with the loader. The tradeoff is not great - do bad engineering only to save users from typing or reading 1 extra line of code (which usually occurs only once ever in your notebook).
https://en.wikipedia.org/wiki/Coupling_(computer_programming)#Disadvantages_of_tight_coupling |
As a user, I will say that being able to instantiate a catalog directly from a single config file, skipping the environment replacement and so on, would be a very good way to lower the entry barrier to using the catalog, as defining datasets in a single YAML file is a much lower-friction action than having to set up a
It would then be trivial to add a Jupyter line magic that instantiates a catalog in the notebook if there's a "catalog.yml" file in the same directory, or something along those lines.
You do run into the credentials issue though, as those wouldn't be injected. For a one-off analysis, though, I'll often just hardcode whatever credentials I need at the top (we don't usually use tokens when coding interactively, as all our data is hosted on Azure, which allows for interactive AAD authentication) |
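The "catalog.yml in the same directory" lookup mentioned above could be sketched roughly as follows. This covers only the file discovery step; the function name `find_catalog_file` is hypothetical, and registering an actual Jupyter line magic and building the catalog from the file are deliberately left out.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_catalog_file(directory=".", filename="catalog.yml") -> Path | None:
    """Return the path to the catalog file in `directory`, or None if absent.

    Hypothetical helper for illustration; not part of Kedro.
    """
    candidate = Path(directory) / filename
    return candidate if candidate.is_file() else None


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    missing = find_catalog_file(tmp)           # no file yet
    (Path(tmp) / "catalog.yml").write_text("reviews: {}\n", encoding="utf-8")
    found = find_catalog_file(tmp)             # file now exists
    found_name = found.name if found else None

print(missing, found_name)
```

A notebook helper could call this first and only attempt to build a catalog when a path comes back.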
Using Arguably, this |
I contend that this could be useful enough for simple cases, and we could tell our users "if you want to use
As an alternative syntactic sugar that (1) doesn't introduce coupling between DataCatalog and OmegaConfigLoader and (2) retains functionality beyond plain YAML:
    from kedro.util import catalog_from_file
    catalog = catalog_from_file("catalog.yml")
(exact location and name of this function to be defined)
Slightly annoying that this requires an extra import and that the |
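To make the decoupling argument concrete, here is a minimal sketch of the free-function approach. It does not use Kedro: `MiniCatalog` is a hypothetical stand-in for a catalog class, `load_json` is a hypothetical loader, and JSON is used only so the sketch runs with the standard library. The catalog class only ever sees a plain dictionary, and the helper is the only place that knows about files and loaders.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Callable


class MiniCatalog:
    """Hypothetical stand-in for a catalog class; this is NOT Kedro's DataCatalog."""

    def __init__(self, datasets: dict[str, dict[str, Any]]) -> None:
        self._datasets = dict(datasets)

    @classmethod
    def from_config(cls, config: dict[str, dict[str, Any]]) -> MiniCatalog:
        # The catalog only understands dictionaries; it never touches files.
        for name, entry in config.items():
            if "type" not in entry:
                raise ValueError(f"Dataset '{name}' is missing a 'type' key")
        return cls(config)

    def names(self) -> list[str]:
        return sorted(self._datasets)


def load_json(path: Path) -> dict:
    """Hypothetical loader; a YAML-based loader could be swapped in."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def catalog_from_file(path, loader: Callable[[Path], dict] = load_json) -> MiniCatalog:
    """Free function: the only place where the loader and the catalog meet."""
    return MiniCatalog.from_config(loader(Path(path)))


# Demonstration with a temporary config file.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "catalog.json"
    config = {"reviews": {"type": "pandas.CSVDataset",
                          "filepath": "data/01_raw/reviews.csv"}}
    config_path.write_text(json.dumps(config), encoding="utf-8")
    catalog = catalog_from_file(config_path)

print(catalog.names())  # ['reviews']
```

Because the loader is a parameter, the catalog class never imports it. The cost, as noted in the comment, is one extra import for the user.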
Spotted today in the Intake docs:
    from intake import open_catalog
    cat = open_catalog('catalog.yaml')
(granted, they don't support advanced features) |
Internal users seemed to love this API, but from our internal discussions, it has become apparent that we cannot do this without either (a) introducing coupling between the
For cases in which someone wants to teach or explain Kedro concepts (and this will be ourselves when we have better "how to use Kedro from Python" docs, see #2855 and #410), it was noted that one can instantiate the
    from kedro.io import DataCatalog

    # Build the catalog from an in-memory dictionary, no config loader involved
    catalog = DataCatalog.from_config({
        "reviews": {
            "type": "pandas.CSVDataset",
            "filepath": "data/01_raw/reviews.csv",
        }
    })
Let's close this issue. |
Description
Add a convenience method that creates a catalog from a single file
From
to
Context
Originally posted by @astrojuanlu in #2819 (comment)