
Potential bugs #5

Open
boknilev opened this issue Feb 4, 2020 · 6 comments

@boknilev
Collaborator

boknilev commented Feb 4, 2020

The following messages from Rabeeh Karimi may indicate potential bugs in our training code, which is based on InferSent.

First, in the loss definition:

loss_fn = nn.CrossEntropyLoss(weight=weight)
loss_fn.size_average = False

In current versions of PyTorch, size_average only takes effect if it is passed as an argument to the constructor, so setting loss_fn.size_average = False after the loss is created has no effect; the loss is therefore computed with the default "mean" reduction rather than "sum".

for p in nli_net.parameters():
    if p.requires_grad:
        p.grad.data.div_(k)  # divide by the actual batch size -- this line needs to be removed
        total_norm += p.grad.data.norm() ** 2

Since the loss is now computed as a mean, the line p.grad.data.div_(k) needs to be removed.
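
Either fix would make the two pieces consistent (a minimal sketch against the existing training code; weight, nli_net, k, and total_norm are the names it already uses):

# Option 1: keep the default mean reduction and drop the division by k
loss_fn = nn.CrossEntropyLoss(weight=weight)  # reduction='mean' by default
for p in nli_net.parameters():
    if p.requires_grad:
        total_norm += p.grad.data.norm() ** 2  # no div_(k) needed

# Option 2: sum over the batch explicitly, as InferSent intended, and keep div_(k)
loss_fn = nn.CrossEntropyLoss(weight=weight, reduction='sum')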

Second, there was a bug in padding, and one needs to set version=2 to have the correct implementation.

@azpoliak
Owner

azpoliak commented Feb 4, 2020

Interesting that our loss is actually the mean of the losses rather than the sum. I wonder if we should modify it and then re-run?

What exactly does "version" refer to, and where is it referenced/used? I can look through the code later to find the answer to this one.

@boknilev
Collaborator Author

boknilev commented Feb 4, 2020

Maybe we should re-run, but perhaps first verify that this problem exists in the PyTorch version we've been using?
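
One quick way to check in whatever PyTorch version the env pins (a sketch, assuming a version recent enough to expose the reduction attribute):

import torch
import torch.nn as nn

print(torch.__version__)  # compare against the version pinned in our conda env file

loss_fn = nn.CrossEntropyLoss()
loss_fn.size_average = False
# forward() dispatches on self.reduction, so the assignment above is ignored
print(loss_fn.reduction)  # 'mean'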

On the "version" - I'm not sure. We can ask Rabeeh (https://github.com/rabeehk).

@azpoliak
Owner

azpoliak commented Feb 4, 2020

I don't think it's a PyTorch version, because there is no PyTorch version 2. I'm also OK with keeping the PyTorch version as is, since we do release the conda env file we used, which explicitly pins a PyTorch version. But yup, asking would be good.

@rabeehk

rabeehk commented Feb 5, 2020

Hi Adam,
Thanks for looking into this. On line 381 there is loss_fn_nli.size_average = False. In the original InferSent code this was done to compute the loss as a sum over the batch, and line 205, p.grad.data.div_(k), then divided the gradients by the actual batch size. In the current PyTorch implementation of cross-entropy loss, however, there is now a reduction parameter that controls whether to sum or average over the batch, and size_average only has an effect if it is set when the cross-entropy loss is defined, not when the attribute is assigned afterwards, as on line 381 (right after the loss is defined on line 380). So line 381 has no effect, the loss is computed as an average over the batch (the default case), and line 205 now needs to be removed. The original InferSent code did not have this issue, since at that point the cross-entropy implementation was different.
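
A quick self-contained check of this (a sketch; the tensors here are made up for illustration):

import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # toy batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])

loss_fn = nn.CrossEntropyLoss()       # reduction='mean' by default
loss_fn.size_average = False          # the line-381 pattern: ignored by forward()

mean_loss = loss_fn(logits, targets)
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, targets)

# assigning size_average after construction changed nothing: the first loss
# is still the batch mean, i.e. the sum divided by the batch size
print(torch.isclose(mean_loss * 4, sum_loss))  # tensor(True)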

@azpoliak
Owner

azpoliak commented Feb 5, 2020 via email

@rabeehk

rabeehk commented Feb 5, 2020

Hi Adam,
I went and checked older versions of PyTorch, and the cross-entropy loss is the same in all the ones I checked. Sorry, I initially thought the implementation of this loss must have changed. Please accept my apologies.

Anyway, line 381 still has no effect, and I still think this was a bug in the original code, since they presumably wanted the loss summed over the batch; when I removed line 205, my results in other projects improved.
