Running a pipeline mutates the original catalog object #4340
Comments
@ElenaKhaustova would this still be the case in the new KedroDataCatalog object?
Yes, this will be the case for the new catalog as well. Related to #4235. There was a suggestion to remove patterns after the run (5813d01), but we might need to think of something better to address this.
Hi @FlorianGD, thanks for flagging this. This looks like a regression, because we added exactly that functionality earlier in the year with #3475. I'll add this to the backlog to be fixed.
Apologies! I forgot to try this with the latest Kedro.
This issue has been closed due to lack of information. Feel free to re-open this issue if you're facing a similar problem. Please provide as much information as possible so we can help resolve your issue.
According to #3995 (comment), this will remain in the new catalog, for now. Re-labeling it as a bug.
Description
When a pipeline is run, a `shallow_copy` of the catalog is made with the `extra_dataset_patterns` of the runner, and the original object is mutated: an `extra_dataset_patterns` entry that is a "catch all" is added to it. This prevents, for example, running a pipeline twice with the same catalog, as the second run will not have any free outputs.
Context
I have a pipeline that exists with 2 namespaces. I wanted to test it for both, so I parametrized the tests. The tests used the same catalog as a fixture. This fixture had the `module` scope so as not to recreate it every time. After updating to kedro 0.19, the test fails for the second parameter, when my test pipeline does not have any free output. After investigating, I found that the pipeline does not have any free output because the second time the catalog has `catalog._extra_dataset_patterns={'{default}': {'type': 'MemoryDataset'}}`, which matches all the datasets.
Steps to Reproduce
To reproduce the symptoms (running the pipeline twice does not have any free output the second time):
I think the problem comes from this behavior:
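A minimal, Kedro-free sketch of how a shallow copy that shares mutable pattern state could produce this symptom. `ToyCatalog` and `toy_run` are hypothetical names made up for illustration; this is an assumed model of the behavior described in this issue, not Kedro's actual implementation or API.

```python
class ToyCatalog:
    """Hypothetical stand-in for a data catalog; not Kedro's implementation."""

    def __init__(self, datasets=None, patterns=None):
        self.datasets = dict(datasets or {})
        # Patterns decide whether an unregistered name still counts as "in the catalog".
        self.patterns = patterns if patterns is not None else {}

    def shallow_copy(self, extra_patterns=None):
        # Assumed bug: the extra patterns are written into the dict that the
        # original catalog also holds, so the original is mutated as well.
        if extra_patterns:
            self.patterns.update(extra_patterns)
        return ToyCatalog(self.datasets, self.patterns)

    def __contains__(self, name):
        # A "{default}" pattern is a catch-all: it matches every dataset name.
        return name in self.datasets or "{default}" in self.patterns


def toy_run(outputs, catalog):
    """Return the free outputs: those not present in the catalog as passed in."""
    free = {name: f"data for {name}" for name in outputs if name not in catalog}
    # The runner then works on a copy that carries its catch-all pattern.
    catalog.shallow_copy({"{default}": {"type": "MemoryDataset"}})
    return free


catalog = ToyCatalog()
first = toy_run(["bar"], catalog)
second = toy_run(["bar"], catalog)
print(first)             # {'bar': 'data for bar'}
print(second)            # {}
print(catalog.patterns)  # {'{default}': {'type': 'MemoryDataset'}}
```

In this toy model the second call finds `bar` already matched by the leaked catch-all pattern, so nothing is reported as a free output.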
Expected Result
I want `bar` to be kept as a free output when the pipeline runs the second time.
Actual Result
The second time the pipeline runs, the output is an empty dictionary.
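For contrast with the empty dictionary, here is a rough sketch of the expected behavior, assuming the copy merges the runner's patterns into a new dict instead of touching the original. `IsolatedCatalog` and `isolated_run` are hypothetical illustration names, not Kedro classes, and this is not a proposed patch.

```python
class IsolatedCatalog:
    """Hypothetical catalog whose copies never share pattern state."""

    def __init__(self, datasets=None, patterns=None):
        self.datasets = dict(datasets or {})
        self.patterns = dict(patterns or {})

    def shallow_copy(self, extra_patterns=None):
        # Merge into a fresh dict so the original's patterns stay untouched.
        merged = {**self.patterns, **(extra_patterns or {})}
        return IsolatedCatalog(self.datasets, merged)

    def __contains__(self, name):
        # A "{default}" pattern matches every dataset name.
        return name in self.datasets or "{default}" in self.patterns


def isolated_run(outputs, catalog):
    """Return free outputs; the catch-all pattern lives only on the run's copy."""
    free = [name for name in outputs if name not in catalog]
    run_catalog = catalog.shallow_copy({"{default}": {"type": "MemoryDataset"}})
    # Inside the run, every output resolves through the copy's catch-all.
    assert all(name in run_catalog for name in outputs)
    return {name: f"data for {name}" for name in free}


shared_catalog = IsolatedCatalog()
run_one = isolated_run(["bar"], shared_catalog)
run_two = isolated_run(["bar"], shared_catalog)
print(run_one)                  # {'bar': 'data for bar'}
print(run_two)                  # {'bar': 'data for bar'}
print(shared_catalog.patterns)  # {}
```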
Your Environment
Kedro version (`pip show kedro` or `kedro -V`): v0.19.9
Python version (`python -V`): Python 3.11.8