Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dict unpacking with "**params:foo.bar" in node(inputs=) #2749

Open
m-gris opened this issue Jun 30, 2023 · 17 comments
Open

dict unpacking with "**params:foo.bar" in node(inputs=) #2749

m-gris opened this issue Jun 30, 2023 · 17 comments
Labels
Help Wanted 🙏 Contribution task, outside help would be appreciated! Issue: Feature Request New feature or improvement to existing feature

Comments

@m-gris
Copy link

m-gris commented Jun 30, 2023

When defining a node, it would be really nice and convenient if the params dictionary could be unpacked with something like:

node(foo, inputs=['dataset', '**params:bar'] ...

This would not only make the node definition more concise than

inputs=['dataset', "params:bar.a", "params:bar.b", "params:bar.c" ... 

But it would also result in a more dynamic and flexible node definition, "immune" to changes in the function's signature.
(in which case all that would require an update would be the params file, as opposed to both the node def. and the params file in the other case)

I hope that this request / suggestion does not go against Kedro philosophy :)

Thx & regards
M.

@m-gris m-gris added the Issue: Feature Request New feature or improvement to existing feature label Jun 30, 2023
@astrojuanlu
Copy link
Member

Thanks @m-gris for opening this feature request! We'll discuss about it soon.

@astrojuanlu astrojuanlu added the Community Issue/PR opened by the open-source community label Jul 3, 2023
@m-gris
Copy link
Author

m-gris commented Jul 3, 2023

Thanks @m-gris for opening this feature request! We'll discuss about it soon.

Thanks for considering it ! :)

@noklam
Copy link
Contributor

noklam commented Aug 9, 2023

I have evidence this could be useful too. But it may requires something more than just unpacking

# model_predict/functions.py
def make_prediction(model, a, b, c, d, e):
    ...


# model_predict_pipeline/nodes.py
def make_prediction_node(model, params: Dict):
    make_prediction(
        model,
        params.get("something", None),
        params.get("something", None),
        params.get("something", []),
        params.get("something", []),
    )

Noted that here two things are done:

  1. Dict unpacking
  2. Assign Default

Or maybe 2. is bad and we shouldn't support it because passing a list into function argument is anti-pattern and None should be default in make_prediction already. Is there use case where 2. is also valid?

edit(deepyaman): format code example to make it easier to read

@deepyaman
Copy link
Member

In short, I think it makes sense to pass a parameters dictionary to a node (i.e. the current supported behavior).

While I agree that dict unpacking sounds nice, I wonder how often people would mess up because configuration keys don't match parameter names. Also, how often would the value of supporting this syntax be realized? I feel like it's most beneficial when you have a node (say, def make_prediction(model, a, b, c, d, e): ... from the example posted by @noklam) that you want to call both by passing a parameter dictionary and ALSO in cases by explicitly passing different parameters to the various arguments. Does this happen often in a codebase? One could argue that Python supports the unpacking syntax, but I don't think it's fair to think Kedro nodes should be near as flexible.

On the last example of assigning a default, I tend to agree that you could just use keyword arguments with default.

@noklam
Copy link
Contributor

noklam commented Aug 9, 2023

I tend to think we shouldn't add arbitrary constrains unless there are good reason. And often users will try to find their way to do it anyway (e.g. Using environment variable for everything).

@nok I agree about using keyword arguments, and think you could then simplify it to:

some_package/model_predict_pipeline/nodes.py

def make_prediction_node(model, params: Dict):
make_prediction(model, **params)

I responded more fully on the linked issue.

I think this will be more common if you need to reuse the function, which is something Kedro values. This will work, but as a package it hides all the function signature and it becomes less clear what parameters are available.

On the note that you mentioned people may mess up. In that case, I expect it is the same when you use a python function that takes keyword arguments but you passed a wrong keyword. You should get an explicit error and I don't see any problem with that.

@m-gris since you created this issue, what's your thoughts on this?

@datajoely
Copy link
Contributor

In general I believe we need this so that modular pipeline reuse can be dynamic

@noklam
Copy link
Contributor

noklam commented Aug 10, 2023

@datajoely do you have an example?

@datajoely
Copy link
Contributor

See this thread
#1228

@noklam
Copy link
Contributor

noklam commented Aug 11, 2023

Adding more evidence that this would be useful - https://linen-slack.kedro.org/t/13902185/if-kedro-supports-unpacking-parameters-dictionary-would-it-c#fe7ba273-2708-4764-8473-920b0636e78f

Markus Sagen and marcccin is in favor of this too

@marrrcin
Copy link
Contributor

marrrcin commented Aug 11, 2023

Imho, unpacking at the pipeline compilation level would be also beneficial from the perspective of pipeline validation. If I understand correctly -let's say we have parameters.yml:

some_params:
   a: 1
   b: 2
def node(data_input, a, b, c):
    # use c
    pass

Using the "unpacking" inputs=["data", "**params:some_params"] will actually validate the function signature vs the params (count and names) - raising immediately that c is missing,

whereas for node

def node(data_input, params: dict):
    # use params["c"]
    pass

with inputs=["data", "params:some_params"] it can result in a runtime error if c is missing. 🤔

@noklam
Copy link
Contributor

noklam commented Aug 11, 2023

good point!

@m-gris
Copy link
Author

m-gris commented Aug 11, 2023

Hi
Great to see that this is sparking some interest / debate :)
@nok: Aside from what I said in my original message I do not have much to add... But I must say that marrrcin point is indeed a good / important one !

@noklam
Copy link
Contributor

noklam commented Aug 14, 2023

@m-gris Would you be able to open a PR on this? We have a brief discussion today and we think it would be a useful feature to add. It's not our priority immediately, but we are happy to accept PR as this is something useful to the community.

@noklam noklam added the Help Wanted 🙏 Contribution task, outside help would be appreciated! label Aug 14, 2023
@m-gris
Copy link
Author

m-gris commented Aug 15, 2023

@noklam
I wouldn't be able to work on it immediately.
But happy to try as soon as my workload will allow.
Hopefully Soon :)

@noklam
Copy link
Contributor

noklam commented Aug 15, 2023

@m-gris Amazing! That's totally fine and looking forward to the PR 🚀

@inigohidalgo
Copy link
Contributor

inigohidalgo commented Aug 25, 2023

Just adding additional info where this could be useful. This isn't on the **kwargs unpacking but the *args unpacking:

https://linen-slack.kedro.org/t/9703505/hey-all-simple-question-is-it-possible-to-pass-both-position
#2408 (comment)

@noklam
Copy link
Contributor

noklam commented Aug 25, 2023

@inigohidalgo good point, I have raised the same issue in #2924 (comment)

More broadly I think we should look at this as how different a node is versus a Python function. I also not sure how hard is it to implement a non breaking change.

@merelcht merelcht removed the Community Issue/PR opened by the open-source community label Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Help Wanted 🙏 Contribution task, outside help would be appreciated! Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

No branches or pull requests

8 participants