Questions about the loss used for optimizing the proxy model #25

Open
clarkkent0618 opened this issue Jan 9, 2024 · 3 comments

Comments

@clarkkent0618

@sangmichaelxie It seems that the loss used for optimizing the proxy model in the code is different from the one described in the paper.

loss = (pertoken_loss * curr_domain_weights.detach()).sum() / normalizer

The code optimizes the proxy model directly on its own loss. In the paper, however, the objective appears to be the minimax loss, which uses the excess loss. Which one should I follow? Or is there something wrong with my understanding? Thanks.

@yuzc19

yuzc19 commented Jan 16, 2024

I have another question. When training the main model, what is the difference between resampling the data from the new distribution and using the new weights to reweight the loss? Is there a significant performance gap between the two?

@sangmichaelxie
Owner

> The code optimizes the proxy model directly on its own loss. In the paper, however, the objective appears to be the minimax loss, which uses the excess loss. Which one should I follow? Or is there something wrong with my understanding? Thanks.

The reference model loss is a constant with respect to the proxy model's parameters, so it doesn't affect the proxy model update and we omit it. The reference model loss does affect the domain weight update.
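
To make the point above concrete, here is a minimal sketch on a toy problem. It is not the repository's code: the scalar parameter `theta`, the quadratic per-domain losses, and all numeric values are assumptions chosen only for illustration. It checks that subtracting a reference loss that does not depend on `theta` shifts the objective by a constant and leaves the gradient with respect to `theta` unchanged, with the domain weights held fixed (as the `.detach()` in the quoted line does).

```python
# Toy setup (all values hypothetical): one scalar proxy parameter `theta`
# and a per-domain loss l_i(theta) = (theta - center_i)^2.
centers = [0.0, 1.0, 3.0]
ref_loss = [0.5, 0.2, 1.0]        # reference-model losses: constants w.r.t. theta
domain_weights = [0.5, 0.3, 0.2]  # treated as fixed during the proxy update


def proxy_loss(theta):
    """Per-domain losses of the toy proxy model."""
    return [(theta - c) ** 2 for c in centers]


def plain_objective(theta):
    """Weighted sum of the proxy model's own per-domain losses."""
    return sum(a * l for a, l in zip(domain_weights, proxy_loss(theta)))


def excess_objective(theta):
    """Weighted sum of excess losses (proxy loss minus reference loss)."""
    return sum(
        a * (l - r)
        for a, l, r in zip(domain_weights, proxy_loss(theta), ref_loss)
    )


def grad(f, theta, eps=1e-6):
    """Central finite-difference estimate of df/dtheta."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


def excess_losses(theta):
    """Per-domain excess losses; these do depend on the reference losses."""
    return [l - r for l, r in zip(proxy_loss(theta), ref_loss)]


if __name__ == "__main__":
    theta = 1.5
    print("grad of plain objective: ", grad(plain_objective, theta))
    print("grad of excess objective:", grad(excess_objective, theta))
    print("per-domain excess losses:", excess_losses(theta))
```

The two gradients agree, while the per-domain excess losses change whenever `ref_loss` changes, which is why the reference loss still matters for any update that is driven by the excess loss.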

> When training the main model, what is the difference between resampling the data from the new distribution and using the new weights to reweight the loss? Is there a significant performance gap between the two?

See the main paragraph on page 9 and Table 3b in the paper.
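
As a rough illustration of what the two options in the question mean, the sketch below compares them on made-up numbers. It does not reproduce the paper's comparison: the per-domain losses, the uniform base distribution `base_probs`, and the weights `new_weights` are all hypothetical, and reweighting is modeled here as importance weighting of examples drawn from the base distribution, which is an assumption.

```python
import random

# Hypothetical per-domain average losses and sampling distributions.
domain_loss = [2.0, 3.0, 4.0]
base_probs = [1 / 3, 1 / 3, 1 / 3]  # distribution the data is originally drawn from
new_weights = [0.6, 0.3, 0.1]       # optimized domain weights (made-up values)


def expected_resampled_loss(losses, weights):
    """Expected loss when domains are sampled according to `weights`."""
    return sum(w * l for w, l in zip(weights, losses))


def expected_reweighted_loss(losses, base, weights):
    """Expected loss when sampling from `base` and scaling each loss by w / p."""
    return sum(p * (w / p) * l for p, w, l in zip(base, weights, losses))


def simulate(n, seed=0):
    """Monte Carlo estimates of both objectives from n sampled domains each."""
    rng = random.Random(seed)
    domains = list(range(len(domain_loss)))

    # Option 1: resample domains from the new distribution, unweighted mean.
    resampled = rng.choices(domains, weights=new_weights, k=n)
    resampled_est = sum(domain_loss[d] for d in resampled) / n

    # Option 2: keep the base distribution, reweight each example's loss.
    original = rng.choices(domains, weights=base_probs, k=n)
    reweighted_est = sum(
        new_weights[d] / base_probs[d] * domain_loss[d] for d in original
    ) / n
    return resampled_est, reweighted_est


if __name__ == "__main__":
    print(expected_resampled_loss(domain_loss, new_weights))
    print(expected_reweighted_loss(domain_loss, base_probs, new_weights))
    print(simulate(20000))
```

Under these assumptions both options target the same expected loss. Whether they train equally well in practice is an empirical question, which is what the passage and table referenced above address.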

@yuzc19

yuzc19 commented Jan 17, 2024

I checked it, and it makes sense. Thank you!
