In the code, you directly optimize the proxy model's own loss here. But in the paper, the loss seems to be the minimax loss, which uses the excess loss. So which one should I follow? Or is there something wrong with my understanding? Thanks.
I have another question. When training the main model, what is the difference between resampling the data from the new distribution and using the new weights to re-weight the loss? Will these two have a significant performance gap?
> In the code, you directly optimize the proxy model's own loss here. But in the paper, the loss seems to be the minimax loss, which uses the excess loss. So which one should I follow? Or is there something wrong with my understanding? Thanks.
The reference model's loss is a constant with respect to the proxy model's parameters, so it doesn't affect the proxy model update and we omit it there. It does, however, affect the domain weight update.
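To make that concrete, here is a minimal sketch of the idea (not the repo's actual code): the domain-weight update uses the per-domain excess loss, while the proxy update only needs the weighted proxy loss, since subtracting the reference losses would only shift the objective by a constant. The helper `per_domain_loss`, the function `doremi_step`, and the hyperparameters `eta` and `eps` are hypothetical names used only for illustration.

```python
import torch
import torch.nn.functional as F

def per_domain_loss(model, input_ids, labels, domain_ids, num_domains):
    """Mean token loss per domain; assumes model(input_ids) returns [B, T, V] logits."""
    logits = model(input_ids)
    per_tok = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")  # [B, T]
    per_ex = per_tok.mean(dim=1)                                                 # [B]
    losses = []
    for d in range(num_domains):
        mask = domain_ids == d
        losses.append(per_ex[mask].mean() if mask.any() else per_ex.sum() * 0.0)
    return torch.stack(losses)                                                   # [num_domains]

def doremi_step(proxy_model, ref_model, input_ids, labels, domain_ids,
                domain_weights, optimizer, num_domains, eta=1.0, eps=1e-2):
    proxy_losses = per_domain_loss(proxy_model, input_ids, labels, domain_ids, num_domains)
    with torch.no_grad():
        ref_losses = per_domain_loss(ref_model, input_ids, labels, domain_ids, num_domains)

    # Domain-weight update: exponentiated-gradient step on the excess loss
    # (clipped at zero here), then renormalize and smooth toward uniform.
    excess = torch.clamp(proxy_losses.detach() - ref_losses, min=0.0)
    domain_weights = domain_weights * torch.exp(eta * excess)
    domain_weights = (1 - eps) * domain_weights / domain_weights.sum() + eps / num_domains

    # Proxy-model update: weighted proxy loss only. Subtracting ref_losses here
    # would shift the objective by a constant and leave the gradient unchanged.
    (domain_weights * proxy_losses).sum().backward()
    optimizer.step()
    optimizer.zero_grad()
    return domain_weights
```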
> When training the main model, what is the difference between resampling the data from the new distribution and using the new weights to re-weight the loss?
Check out the main paragraph on page 9 and Table 3b in the paper.
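For anyone comparing the two options, here is a hedged sketch (not from this repo) of what each looks like in code; `domain_weights`, `original_props`, and `per_example_loss` are hypothetical placeholders. The empirical comparison between the two is the one reported in Table 3b of the paper.

```python
import torch

def resampled_domains(domain_weights, batch_size):
    # (a) Resampling: draw each example's domain id from the new distribution,
    # then pull an example from the corresponding domain's dataset.
    return torch.multinomial(domain_weights, batch_size, replacement=True)

def reweighted_loss(per_example_loss, domain_ids, domain_weights, original_props):
    # (b) Loss re-weighting: sample data from the original mixture, but scale
    # each example's loss by the ratio new_weight / original_proportion.
    ratio = domain_weights[domain_ids] / original_props[domain_ids]
    return (ratio * per_example_loss).mean()
```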
@sangmichaelxie It seems that the loss used for optimizing the proxy model in the code is different from the one described in the paper.