
Potential bugs #5

Open
boknilev opened this issue Feb 4, 2020 · 6 comments

@boknilev
Collaborator

boknilev commented Feb 4, 2020

The following messages from Rabeeh Karimi may indicate potential bugs in our training code, which is based on InferSent.

First, in the loss definition:

loss_fn = nn.CrossEntropyLoss(weight=weight)
loss_fn.size_average = False

In current versions of PyTorch, size_average only takes effect if it is passed as an argument to the constructor, so setting loss_fn.size_average = False after the loss is created has no effect; the loss is therefore computed with the default "mean" reduction rather than "sum".

for p in nli_net.parameters():
    if p.requires_grad:
        p.grad.data.div_(k)  # divide by the actual batch size -- this line needs to be removed
        total_norm += p.grad.data.norm() ** 2

Since the loss is now computed as a mean, the line p.grad.data.div_(k) needs to be removed.
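
Either fix would make the two pieces consistent (a minimal sketch against the existing training code; weight, nli_net, k, and total_norm are the names it already uses):

# Option 1: keep the default mean reduction and drop the division by k
loss_fn = nn.CrossEntropyLoss(weight=weight)  # reduction='mean' by default
for p in nli_net.parameters():
    if p.requires_grad:
        total_norm += p.grad.data.norm() ** 2  # no div_(k) needed

# Option 2: sum over the batch explicitly, as InferSent intended, and keep div_(k)
loss_fn = nn.CrossEntropyLoss(weight=weight, reduction='sum')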

Second, there was a bug in padding, and one needs to set version=2 to have the correct implementation.

@azpoliak
Owner

azpoliak commented Feb 4, 2020

Interesting that our loss is actually the mean of the losses rather than the sum. I wonder if we should modify it and then re-run?

What exactly does "version" refer to, and where is it referenced/used? I can look through the code later to find the answer to this one.

@boknilev
Collaborator Author

boknilev commented Feb 4, 2020

Maybe we should re-run, but perhaps first verify that this problem exists in the PyTorch version we've been using?
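
One quick way to check in whatever PyTorch version the env pins (a sketch, assuming a version recent enough to expose the reduction attribute):

import torch
import torch.nn as nn

print(torch.__version__)  # compare against the version pinned in our conda env file

loss_fn = nn.CrossEntropyLoss()
loss_fn.size_average = False
# forward() dispatches on self.reduction, so the assignment above is ignored
print(loss_fn.reduction)  # 'mean'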

On the "version" - I'm not sure. We can ask Rabeeh (https://github.com/rabeehk).

@azpoliak
Owner

azpoliak commented Feb 4, 2020

I don't think it's a PyTorch version, because there is no PyTorch version 2. I'm also OK with keeping the PyTorch version as is, since we do release the conda env file we used, which explicitly pins a PyTorch version. But yup, asking would be good.

@rabeehk

rabeehk commented Feb 5, 2020

Hi Adam,
Thanks for looking into this. On line 381 there is loss_fn_nli.size_average = False. In the original InferSent code this was done to compute the loss as a sum over the batch, and line 205, p.grad.data.div_(k), then divided the gradients by the actual batch size. In the current PyTorch implementation of cross-entropy loss, however, there is now a reduction parameter that controls whether to sum or average over the batch, and size_average only has an effect if it is set when the cross-entropy loss is defined, not when the attribute is assigned afterwards, as on line 381 (right after the loss is defined on line 380). So line 381 has no effect, the loss is computed as an average over the batch (the default case), and line 205 now needs to be removed. The original InferSent code did not have this issue, since at that point the cross-entropy implementation was different.
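
A quick self-contained check of this (a sketch; the tensors here are made up for illustration):

import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # toy batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])

loss_fn = nn.CrossEntropyLoss()       # reduction='mean' by default
loss_fn.size_average = False          # the line-381 pattern: ignored by forward()

mean_loss = loss_fn(logits, targets)
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, targets)

# assigning size_average after construction changed nothing: the first loss
# is still the batch mean, i.e. the sum divided by the batch size
print(torch.isclose(mean_loss * 4, sum_loss))  # tensor(True)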

@azpoliak
Owner

azpoliak commented Feb 5, 2020 via email

@rabeehk

rabeehk commented Feb 5, 2020

Hi Adam,
I went and checked older versions of PyTorch, and the cross-entropy loss is the same in all the ones I checked. Sorry, I initially thought the implementation of this loss must have changed. Please accept my apologies.

Anyway, line 381 still has no effect, and I still think this was a bug in the original code, since they presumably wanted the loss summed over the batch; when I removed line 205, my results in other projects improved.
