Potential bugs #5
Comments
Interesting that our loss is actually the mean of the two losses. I wonder if we should modify and then re-run? What exactly does "version" refer to, and where is it referenced/used? I can look through the code later to find the answer to this one. |
Maybe we should re-run, but perhaps first verify that this problem exists in the PyTorch version we've been using? On the "version" - I'm not sure. We can ask Rabeeh (https://github.com/rabeehk). |
I don't think it's the PyTorch version, because there is no PyTorch version 2. I'm also OK with keeping the PyTorch version as is, since we do release the conda env file we used, which explicitly relies on a specific PyTorch version. But yup, asking would be good. |
Hi Adam, |
What exactly do you mean by “current pytorch implementation for
cross-entropy loss”? Is this pytorch v1.2 and is this the implementation in
pytorch v0.4 as well?
Thanks so much for helping us with this!
Adam
…On Tue, Feb 4, 2020 at 7:08 PM Rabeeh Karimi Mahabadi < ***@***.***> wrote:
Hi Adam,
Thanks for looking into this. On line 381 there is "loss_fn_nli.size_average = False"; in the original InferSent code this was done to compute the loss as a sum over the batch, and line 205, "p.grad.data.div_(k)", then divided the gradients by the batch size. If you look at the current PyTorch implementation of cross-entropy loss, they have since introduced a parameter, "reduction", which decides whether the loss is summed or averaged over the batch. The "size_average" parameter only has an effect if it is set when the cross-entropy loss is defined, not when it is assigned later, after the loss has been created on line 380. So line 381 has no effect here, meaning the loss is computed as an average over the batch (the default case), and line 205 now needs to be removed. In the original InferSent code this was not an issue, since at that point the cross-entropy implementation was different.
|
Hi Adam, anyway, line 381 still has no impact, and I still think this was a bug in the original code, since they perhaps wanted a sum of the loss over the batch; when I removed line 205, my results in other projects improved. |
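Below is a minimal sketch of the behaviour described above, assuming a recent PyTorch release with the `reduction` API (the line numbers 380/381/205 refer to the InferSent-based training script and are not reproduced here):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 3)               # toy batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])

# Default reduction is 'mean', so the loss is averaged over the batch.
loss_fn = nn.CrossEntropyLoss()

# Mirrors line 381: in recent PyTorch this attribute is never read after
# construction, so the assignment does not change the reduction.
loss_fn.size_average = False
mean_loss = loss_fn(logits, targets)

# To actually sum over the batch, 'reduction' must be set at construction time.
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, targets)

# The post-hoc flag changed nothing: the first loss is still the batch mean.
print(torch.isclose(mean_loss * logits.size(0), sum_loss))   # tensor(True)
```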
The following messages from Rabeeh Karimi may indicate potential bugs in our training code, which is based on InferSent.
First, in the loss definition: in the current version of PyTorch, `size_average` only takes effect if it is passed as an argument to the initialization method, so setting `loss_fn.size_average = False` after the loss is created has no effect; the loss is therefore computed as a "mean", not a "sum". Since the loss is now computed as a mean, the line `p.grad.data.div_(k)` needs to be removed.
Second, there was a bug in padding, and one needs to set `version=2` to have the correct implementation.