-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synthesize binary and categorical features as strings or seeds #21
Comments
I expanded this issue to include other binary and categorical features, which should be synthesized either as seeds or as strings to avoid decimals. Here's my proposal for features with cardinality < 10, also captured in the pufvars Google sheet:
We'll test out different specifications of seed vs. classification so that's less important right now. Does this sound right, in that all regression features will be rounded? I think capturing the ordinal nature of low-cardinality features like |
synpuf5 and 6 use all the classification/seed variables in the above table as seed variables, as I need to revise the rf_synth function to support classification. Other variables are rounded. These datasets also fix #17 and use 50 instead of 20 trees. |
The synthesized MARS values are non-integer when they should be integer, and occasionally they fall far from the nearest integer. I round the values to the nearest integer.
The text was updated successfully, but these errors were encountered: