Feature request: integer data #13
You can't just insert a rounding step during training, as the sampling step must be differentiable. With the current package, I'd recommend treating integers as continuous variables and rounding ex post. For any ordered categorical, I'd recommend mapping it to integers and doing the same. We could automate this by modifying the

In your experience, is there anything wrong with data generated that way? I wouldn't expect so: if the GAN is flexible enough, it should spit out numbers that are already very close to integers anyway. If you observe specific failure cases with that approach, we can think about modified sampling procedures for integers during training. But every approach that immediately comes to mind ends up either really close to the categorical case or to the continuous case with ex-post rounding, which is why I didn't consider it worth implementing so far.
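The recommendation above can be sketched as follows. This is a generic illustration with hypothetical column names and a plain NumPy array standing in for whatever the generator returns; it is not this package's actual post-processing code: integer columns are trained as continuous and rounded ex post, and ordered categoricals are trained as integer codes and mapped back to their labels afterwards.

```python
import numpy as np

def postprocess(samples, int_cols, ordered_maps):
    """Round integer columns ex post and decode ordered-categorical
    columns that were trained as integer codes.

    samples      : 2D float array returned by the generator
    int_cols     : indices of columns that should be integers
    ordered_maps : {col_index: [label0, label1, ...]} in category order
    """
    out = samples.copy()
    for c in int_cols:
        out[:, c] = np.round(out[:, c])
    decoded = out.astype(object)
    for c, labels in ordered_maps.items():
        # clip so out-of-range generator output still maps to a valid label
        codes = np.clip(np.round(out[:, c]), 0, len(labels) - 1).astype(int)
        decoded[:, c] = [labels[k] for k in codes]
    return decoded

# e.g. column 0 = age (integer), column 1 = education (ordered categorical)
raw = np.array([[24.7, 1.8], [31.2, 0.2]])
clean = postprocess(raw, int_cols=[0],
                    ordered_maps={1: ["low", "mid", "high"]})
```

The only modelling choice here is the category order used for the integer mapping; anything monotone works, since rounding and decoding invert it exactly.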
I was hoping that one could do something like a

Doing the rounding in the

I have a data set with some integer, some binary, and some continuous variables. Loosely speaking, when looking at conditional means of one variable given another, the curves look more different when the integer variable is involved. Of course there may be many other reasons for that, so I'll need to do more testing to see what's really going on. My expectation is that some "internal" rounding is likely to yield a better result, though I don't know whether that improvement is worth the effort of implementing it.
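For reference, one concrete soft-round function along these lines comes from the neural compression literature; it is not necessarily the exact function proposed above (which did not survive in this thread), but it shows the idea: a sharpness parameter alpha interpolates between the identity and hard rounding.

```python
import math

def soft_round(x, alpha):
    """Differentiable approximation of round(x).

    As alpha -> 0 this approaches the identity; for large alpha it
    approaches hard rounding, at the cost of vanishing gradients
    between integers.
    """
    m = math.floor(x) + 0.5          # midpoint of the unit cell
    r = x - m                        # offset within the cell, in (-0.5, 0.5]
    return m + 0.5 * math.tanh(alpha * r) / math.tanh(alpha / 2.0)

soft_round(1.2, alpha=20.0)   # close to 1.0 (nearly hard rounding)
soft_round(1.2, alpha=1e-3)   # close to 1.2 (nearly the identity)
```

Integers are fixed points for every alpha, so the function never moves a value that is already integral.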
Yeah, I'd say using the soft-round function would be pretty close to the continuous case with ex-post rounding. After all, you do generate a continuous variable with real-valued support, and if you want an actual integer you have to round ex post. On the one hand,

Another option would be the following function with biased gradients:
which is equal to

But yeah, I'm not surprised that training is a bit harder for integers. Does the slightly worse fit you observe for integers go away if you just train longer and increase the GAN size? Otherwise we could use your experiment as a testing ground to see whether a
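The biased-gradient idea can be sketched as a straight-through estimator. This is an assumption about the kind of function meant above, since the original formula was not preserved here: round in the forward pass, but pretend the derivative is 1 in the backward pass, so gradients flow through the non-differentiable step.

```python
def ste_round_forward(x):
    """Forward pass: emit an actual integer."""
    return float(round(x))

def ste_round_grad(x):
    """Backward pass: the biased, straight-through gradient.
    round() has zero derivative almost everywhere, so we pretend the
    derivative is 1 and pass upstream gradients through unchanged."""
    return 1.0

# Toy use: push x toward a target of 3 through the rounding step
# by minimizing (y - 3)^2 with plain gradient descent.
x, lr = 1.2, 0.5
for _ in range(20):
    y = ste_round_forward(x)                # integer output seen by the loss
    dloss_dy = 2.0 * (y - 3.0)              # d/dy of (y - 3)^2
    x -= lr * dloss_dy * ste_round_grad(x)  # biased chain rule
```

With an exact zero gradient, x would never move; the biased identity gradient is what lets the update reach the target integer.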
That's a good explanation, thank you!
It would be great if there were a way to simulate integer / ordered categorical data, say age in years. Treating it as a categorical variable seems to yield data sets where other variables are less smooth in age than desired, and it probably also increases the complexity of the training task (by turning each value into a dummy?). Treating it as a continuous variable requires rounding ex post, but ideally the rounding would happen already during training?
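To make the complexity point concrete, here is a generic illustration (not this package's actual encoding): treating age in years as an unordered categorical one-hot encodes every possible value, so a single variable becomes on the order of a hundred dummy columns, while the continuous treatment keeps one column.

```python
import numpy as np

ages = np.array([23, 47, 23, 65, 31])

# Categorical treatment: one dummy per possible value, e.g. ages 0-100.
support = np.arange(0, 101)
one_hot = (ages[:, None] == support[None, :]).astype(float)

# Continuous treatment: a single (here standardized) column.
continuous = (ages - ages.mean()) / ages.std()

print(one_hot.shape)     # (5, 101): 101 columns for one variable
print(continuous.shape)  # (5,): one column
```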