Remove assumption that all epochs have same number of batches #411
Conversation
Mixins and experiments that vary the batch size (e.g. the VaryBatchSize mixin) or the samplers (e.g. ContinualLearningExperiment) will now be compatible with code that relies on step counts (e.g. OneCycleLR).
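For context, a minimal sketch (not code from this PR; the batch counts below are made up) of why step-count-based schedulers are affected: PyTorch's OneCycleLR needs the total number of optimizer steps up front, so with variable-length epochs that total has to be summed per epoch rather than computed as num_epochs * len(loader).

```python
# Hedged sketch: OneCycleLR needs total_steps up front. With epochs of
# different lengths, sum the per-epoch batch counts instead of assuming
# num_epochs * len(loader). Values below are hypothetical.
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batches_per_epoch = [100, 80, 120]  # e.g. VaryBatchSize or changing samplers
total_steps = sum(batches_per_epoch)

scheduler = OneCycleLR(optimizer, max_lr=1.0, total_steps=total_steps)

for num_batches in batches_per_epoch:
    for _ in range(num_batches):
        optimizer.step()     # training step elided in this sketch
        scheduler.step()     # exactly total_steps calls over the run
```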
FYI I'm currently trying to make this more functional. The PR as-is mutates a central dataloader. It would be easier to understand and more robust if the dataloader were created in a functional way rather than being reused, though its internals, like the dataset and task indices, should be reused. (That was my original plan, and now I'm trying it again.)
This PR depends on #414 (to avoid breaking the meta_cl_experiment). The regression tests pass, and there's no measurable slowdown in CPU profiles of the imagenet experiment (pre_epoch is negligible). Pasting the second commit message: Functional programming approach to dataloaders
Rather than mutating a central dataloader over time, create new dataloaders on-demand (while storing internal immutable state like datasets). This reduces potential for bugs, and it also makes the create_train_dataloader method fully capable even when called as a classmethod.
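A rough, hypothetical sketch of that pattern (the class, dataset attribute, and sampler here are stand-ins, not this repo's actual API): immutable state such as the dataset is stored once, and create_train_dataloader builds a fresh loader each time, even when called as a classmethod.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset


class ExperimentSketch:
    """Illustrative stand-in: keep immutable state (the dataset) and build
    a new DataLoader on demand instead of mutating a shared one."""

    # Immutable internal state, created once and reused across epochs.
    dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

    @classmethod
    def create_train_dataloader(cls, batch_size):
        # A new sampler and loader per call, so per-epoch variations
        # (batch size, active tasks, ...) can't leak state between epochs.
        sampler = RandomSampler(cls.dataset)
        return DataLoader(cls.dataset, batch_size=batch_size, sampler=sampler)


# Usage: each epoch gets its own loader, possibly with a different batch size.
for epoch, batch_size in enumerate([32, 64, 128]):
    loader = ExperimentSketch.create_train_dataloader(batch_size)
```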
What are the instances where we vary the dataloader per epoch? Of course, VaryBatchSize is one of them, but are there others? I'm not sure how I feel about such a fundamental change for only one class of experiments. I'm sure you thought extensively about what features you're trying to support, so I'm curious what you have in mind with these changes.
I also was a fan of calling …
Any ContinualLearningExperiment may also vary the number of batches per epoch, even if unintentionally. In general, this opens us up to running different tasks or groups of tasks in different epochs without breaking built-in assumptions (like …). (Plus, VaryBatchSize isn't that obscure a case. We happen to implement it as a mixin, but we could just as easily have built it in.)
Is it any less explicit here? We're still calling set_active_tasks. When we create the sampler, we create it for a specific epoch number. It's different, but still explicit. And now we don't have to store a bunch of state in our heads to understand and debug the code.
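To make that concrete, here is a simplified, hypothetical sampler (not the repo's set_active_tasks implementation): the active tasks are fixed at construction time, so each epoch's sampler, and therefore its batch count, is explicit.

```python
import torch
from torch.utils.data import Sampler


class TaskSamplerSketch(Sampler):
    """Hypothetical stand-in: indices are grouped by task, and the active
    tasks are chosen when the sampler is constructed for a given epoch."""

    def __init__(self, task_indices, active_tasks):
        self.indices = [i for task in active_tasks for i in task_indices[task]]

    def __iter__(self):
        order = torch.randperm(len(self.indices)).tolist()
        return iter(self.indices[i] for i in order)

    def __len__(self):
        return len(self.indices)


# Each epoch's sampler is built for that epoch's tasks, so the number of
# batches can differ across epochs without any hidden mutation.
task_indices = {0: list(range(0, 500)), 1: list(range(500, 800))}
epoch_to_tasks = {0: [0], 1: [0, 1]}
samplers = {epoch: TaskSamplerSketch(task_indices, tasks)
            for epoch, tasks in epoch_to_tasks.items()}
```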
I'm curious if there are other ways of dealing with VaryBatchSize and OneCycleLR. I'm wary of creating functionality in anticipation of a need that hasn't fully materialized just yet. As of now, we haven't had a desire to train with OneCycleLR and VaryBatchSize with samplers that change every epoch. In the past, I've often fallen victim to overgeneralizing. The code can become harder to understand and harder to maintain at the expense of a feature that may not be used as much as originally thought. I just want to be sure there's a strong need given the changes. Here are the downsides that I see:
To avoid these issues, I'd like to minimize the changes needed to accommodate the varying steps per epoch. Here's a general route we may take: we could keep the samplers as they were and have one … Here's a rough draft of what that may look like:
It may also be possible to eliminate the need for the …
I'll spend some more time thinking about that, but here are a couple initial thoughts:
Gotcha. I'm glad we'll adjust the MetaCL experiment for consistency. And I can appreciate not wanting to have experiment-specific code in VaryBatchSize. When I proposed the opposite, I thought your option 5 from here would be well suited for that, maybe for later. For this PR right now, I would still like to keep things as simple as possible. I believe something similar to what I wrote above would work, but instead of having …
We'll keep this idea in our back pocket.