Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use factor levels of data not RF #30

Merged
merged 6 commits into from
Jun 12, 2024
Merged

Use factor levels of data not RF #30

merged 6 commits into from
Jun 12, 2024

Conversation

mnwright
Copy link
Member

@mnwright mnwright commented Jun 7, 2024

If ranger is called with respect.unordered.factors = TRUE (as we do), it is reordering the factor levels. We shouldn't use those factors for the synthetic data, because it also reorders those factor levels.

@mnwright mnwright requested a review from jkapar June 7, 2024 06:34
Copy link
Collaborator

@jkapar jkapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically regarding your question in Slack: In forge, this is not affected:

arf/R/forge.R

Line 262 in c88612c

setorder(NA_share[,variable := factor(variable, levels = params$meta[,variable])], variable, idx)

However, I think that we might have problems now here:
https://github.com/bips-hb/arf/pull/30/files#diff-801e0397141f9e243eabf2215df8271056fb53a3aedc149a7f8ec83d11b9af1aL322

https://github.com/bips-hb/arf/pull/30/files#diff-801e0397141f9e243eabf2215df8271056fb53a3aedc149a7f8ec83d11b9af1aL344-R372

We use the bounds from the rf in both cases:
https://github.com/bips-hb/arf/pull/30/files#diff-801e0397141f9e243eabf2215df8271056fb53a3aedc149a7f8ec83d11b9af1aL198

Maybe it's easier to use the old definition here and use the new one only to save it in the output?

@jkapar jkapar self-requested a review June 11, 2024 15:23
@jkapar jkapar self-requested a review June 11, 2024 15:29
@mnwright mnwright merged commit 6b8aafd into main Jun 12, 2024
6 checks passed
@mnwright mnwright deleted the fix_levels branch June 12, 2024 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants