-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Var spec obj #161
Var spec obj #161
Conversation
Currently a new object, MetaVarSpec has been added. It can be used as such: Previously: ```python from metasynth.distribution import FakerDistribution, DiscreteUniformDistribution, RegexDistribution from metasynth import MetaVarSpec demo_file_path = demo_file() demo_types={ "Sex": pl.Categorical, "Embarked": pl.Categorical } df = pl.read_csv(demo_file_path, try_parse_dates=True, dtypes=demo_types) var_spec = { "PassengerId": {"unique": True}, "Name": {"distribution": FakerDistribution("name")}, "Fare": {"distribution": "ExponentialDistribution"}, "Age": { "distribution": DiscreteUniformDistribution(20, 40), }, "Cabin": {"distribution": RegexDistribution(r"[ABCDEF]\d{2,3}")} } mf = MetaFrame.fit_dataframe(df, spec=var_spec) print(mf.synthesize(10)) ``` New approach: ```python new_spec = MetaVarSpec(df) new_spec.PassengerId.unique = True new_spec.Name.distribution = FakerDistribution("name") new_spec.Fare.distribution = "ExponentialDistribution" new_spec.Age.distribution = DiscreteUniformDistribution(20, 40) new_spec.Cabin.distribution = RegexDistribution(r"[ABCDEF]\d{2,3}") mf2 = MetaFrame.fit_dataframe(df, spec=new_spec.to_dict()) print(mf2.synthesize(10)) ``` Far from perfect, but a good start.
Last approach was having a lot of bugs, this new approach should be simpler to use AND understand
|
||
|
||
def type_check(key, value, type_dict=AVAILABLE_TYPES): | ||
if not any(isinstance(value, allowed_type) for allowed_type in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to use isinstance(value, type_dict[key])
to check for multiple types at once.
It looks really good! One thing that might be good to discuss is whether to do type checking here. I think there is a good reason to do it here, but generally it's not considered "pythonic" (whatever it means) as far as I understand. Let's discuss it later today. |
@Samuwhale What are the plans with this PR? I would like to merge this PR so that we can start with renaming the package. Is it working to the point where it can be optimized/improved later, or does it still need a lot of work you think? |
let's maybe hold off on implementing this one until the rename is done. I'd like to have a bit more conversation around the interface before we finalize this one. |
7d00997
to
212bb76
Compare
I just pushed a commit that fixed all workflow warnings. The code is fully functional without changing any of the current code-base (it only adds to it), so in theory, it can be merged. That said, I think it probably is indeed nicer to work on this a bit further before implementing it! |
EDIT: Due to rebasing issues I have moved to a new branch, see #194 for the up to date PR.
Since last discussing this with @qubixes I have completely rewritten the code, as it was not behaving as it should. Now, however, it should work pretty well, it's also pretty easy to read/understand since I have added docstrings. It's still VERY basic, but it's a good first step.
Here's an example of how to use it: