Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hook example to access metadata #2998

Merged
merged 8 commits into from
Sep 5, 2023
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions docs/source/hooks/common_use_cases.md
Original file line number Diff line number Diff line change
Expand Up @@ -200,3 +200,22 @@ HOOKS = (AzureSecretsHook(),)
```{note}
Note: `DefaultAzureCredential()` is Azure's recommended approach to authorise access to data in your storage accounts. For more information, consult the [documentation about how to authenticate to Azure and authorize access to blob data](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python).
```

## Use Hook to read `metadata` from `DataCatalog`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Use Hook to read `metadata` from `DataCatalog`
## Use a Hook to read `metadata` from `DataCatalog`

We recommend to use the `after_catalog_created` hook if you need to access `metadata` to extend Kedro.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We recommend to use the `after_catalog_created` hook if you need to access `metadata` to extend Kedro.
Use the `after_catalog_created` Hook to access `metadata` to extend Kedro.


```python
class MetadataHook:
@hook_impl
def after_catalog_created(
self,
catalog: DataCatalog,
conf_catalog: Dict[str, Any],
conf_creds: Dict[str, Any],
feed_dict: Dict[str, Any],
save_version: str,
load_versions: Dict[str, str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these parameters are not used right? Maybe we can remove them from the function. Pluggy allows this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. We do this inconsistently, sometimes we do include signature that wasn't used (I was asked to add that before, can't find it exactly which PR it is).

For example, after_node_run has a redundant node argument.
https://docs.kedro.org/en/stable/hooks/examples.html#add-observability-to-your-pipeline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stichbury do you think we should keep all our example minimals? We can do a clean up if this is agreed.

In this case

   def after_catalog_created(
        self,
        catalog: DataCatalog,
        conf_catalog: Dict[str, Any],
        conf_creds: Dict[str, Any],
        feed_dict: Dict[str, Any],
        save_version: str,
        load_versions: Dict[str, str])

or

   def after_catalog_created(
        self,
        catalog: DataCatalog)

will both work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to remove them too and keep it minimal.

):
for dataset_name, dataset in catalog.datasets.__dict__.items():
print(f"{dataset_name} metadata: \n {str(dataset.metadata)}")
```
2 changes: 2 additions & 0 deletions kedro/framework/hooks/manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ def _register_hooks(hook_manager: PluginManager, hooks: Iterable[Any]) -> None:
if not hook_manager.is_registered(hooks_collection):
hook_manager.register(hooks_collection)

hook_manager.check_pending() # Validate hook_impl respect hook_spec


def _register_hooks_setuptools(
hook_manager: PluginManager, disabled_plugins: Iterable[str]
Expand Down