Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable adding new attributes to datasets #2440

Closed
merelcht opened this issue Mar 20, 2023 · 4 comments · Fixed by kedro-org/kedro-plugins#189
Closed

Enable adding new attributes to datasets #2440

merelcht opened this issue Mar 20, 2023 · 4 comments · Fixed by kedro-org/kedro-plugins#189
Assignees
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@merelcht
Copy link
Member

merelcht commented Mar 20, 2023

Description

Implement the feature to allow users to add new attributes to datasets. Use the syntax decided in #2439

Context

Sub-task of #1076

To be done after: Decide on syntax to allow adding new attributes#2439

Open question

Where should the metadata (or other name) attribute go?

  • On the individual datasets
  • On the AbstractDataSet
  • On the DataCatalog like layers?
@merelcht merelcht added the Issue: Feature Request New feature or improvement to existing feature label Mar 20, 2023
@merelcht merelcht moved this to To Do in Kedro Framework Mar 20, 2023
@merelcht
Copy link
Member Author

I think the handling of the metadata attribute should happen on the individual datasets. Any processing of the info in the metadata will be done by external plugins and not the Kedro framework itself.

The only complication here is that layers are processed inside the DataCatalog. It used to be on the individual datasets as well, but was moved to the catalog later on: https://github.com/quantumblacklabs/private-kedro/pull/548/files

If layers needs to go under metadata like this:

my_dataset:
       ....
       metadata:
            kedro-viz:
                 layer: raw

We'll need to ensure they're still processed correctly and also make sure the implementation is backwards compatible with having layers outside the metadata key.

@AhdraMeraliQB AhdraMeraliQB self-assigned this Mar 28, 2023
@AhdraMeraliQB AhdraMeraliQB moved this from To Do to In Progress in Kedro Framework Mar 30, 2023
@AhdraMeraliQB
Copy link
Contributor

After discussion with @merelcht and @AntonyMilneQB the implementation should be as follows:

  • metadata, being a top-level key, will be introduced as an attribute on the individual datasets
  • The layer attribute will move to within metadata -> kedro-viz and will no longer be handled by Kedro's DataCatalog - reversing the decision made as part of https://github.com/quantumblacklabs/private-kedro/pull/548. As a result the logic for resolving this will need to move to kedro-viz (as currently they are the consumers of layer, not Kedro itself)
  • Dropping layer as a top-level key will be a breaking change forcing the implementation to happen in two stages: the first handling the nested layer on Viz, adding deprecation warnings to the top-level layer (non-breaking), and the second being the removal of the top-level layer entirely

These require further subtasks that I will define in separate issues, but I would like to get @idanov's and @rashidakanchwala's opinions here first.

The order of tasks would look like this:

  1. Add metadata attribute to datasets
  2. Add logic to Viz to check for layer from metadata->kedro-viz->layer before catalog.layer & move over this check
  3. Add deprecation warnings to the top-level layer attribute on Kedro framework
  • should the deprecation warnings be triggered just whenever a top-level layer is defined, or upon both definition and access?
  1. Remove the top-level layer attribute (breaking) and the logic introduced to Viz in step 2

Questions to consider:

  1. Why was layer moved to the DataCatlog originally (in the ticket above)? Is there something that hasn't been addressed?
  2. Are there any oversights on moving the handling of the layer attribute to Viz?
  3. Are there any other users of the top-levellayer attribute?

An alternative approach:

Alternatively, we could just keep layer where it is and not process it if defined within the metadata. This doesn't address the DataCatalog processing an attribute that isn't used within Kedro, but allows for the introduction of the metadata attribute to be made without any additional implementation on the Kedro framework.

@rashidakanchwala
Copy link
Contributor

rashidakanchwala commented Apr 17, 2023

Questions to consider:
- Why was layer moved to the DataCatlog originally (in the ticket above)? Is there something that hasn't been addressed?
This is just my assumption but layers have been on Kedro-viz since 2020 and prior to this there was no other extra information that was passed in catalog from kedro project to kedro-viz, that's why probably there was no need to have it nested? - not a great idea for sure.

Are there any oversights on moving the handling of the layer attribute to Viz?
Let me revert on this.

Are there any other users of the top-level layer attribute?
I doubt, it seems very specific to Kedro-viz.

@rashidakanchwala
Copy link
Contributor

rashidakanchwala commented Apr 17, 2023

Currently the layer logic resides in Kedro Framework in DataCatalog (https://github.com/kedro-org/kedro/blob/main/kedro/io/data_catalog.py#L270-L275). Kedro basically sends to kedro-viz - all the layers as a dict in the format below

dict_items([('raw', {'companies', 'shuttles', 'reviews'}), ('intermediate', {'ingestion.int_typed_companies', 'ingestion.int_typed_shuttles@pandas2', 'ingestion.int_typed_reviews', 'ingestion.int_typed_shuttles@pandas1'}), ('primary', {'prm_spine_table', 'prm_shuttle_company_reviews'}) .... )])

On kedro-viz we simply read the above, and match the dataset name to the key it belongs to. (https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/data_access/repositories/catalog.py#L40-L47)

I suppose now this will change so Kedro will only send a dict called metadata to Kedro-viz. And Kedro-viz will extract all the layer information and map it correctly.

This is fine but I am not sure how to make it backward compatible in a clean way @merelcht @AntonyMilneQB ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants