Incorrect Masked Huber Loss calculation #14
Comments
No, the code is correct; there are two variables.
I got a question about this too, so to clarify: why is the gradient not divided by the number of possible hands?
@dmorrill10 I have exactly the same question as you; did you figure it out?
@JaysenStark no, not yet.
@dmorrill10 @lifrordi I think the correct loss should be loss = avg( sum(|pred_i - actual_i|) / mask_i ), i.e. each sample should compute its own average loss over its valid entries, and the batch loss should be the average of those per-sample losses. The implementation is confusing: mask_multiplier is actually used to reweight the batch loss, and under stochastic gradient descent this multiplier changes the scale of the derivative, which makes it hard for an optimizer such as Adam to pick the learning rate for the next iteration.
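For concreteness, here is a minimal sketch of the per-sample normalization described above (illustrative names, not the repository's code), assuming pred and target are batch_size x feature_size tensors and mask is a 0/1 tensor of the same shape and type, with 1 for possible entries:

```lua
require 'torch'

-- Per-sample masked L1 loss: sum the absolute error over each sample's valid
-- entries, divide by that sample's own count of valid entries, then average
-- the per-sample losses over the batch (assumes every sample has at least
-- one valid entry).
local function per_sample_masked_l1(pred, target, mask)
  local err = torch.abs(pred - target):cmul(mask)  -- zero out impossible entries
  local per_sample_sum = err:sum(2)                -- batch_size x 1
  local valid_counts = mask:sum(2)                 -- valid entries per sample
  return per_sample_sum:cdiv(valid_counts):mean()  -- average over the batch
end
```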
The gradient doesn't need to be divided by the number of possible hands, because the loss is already normalized by this number. The gradient is just the derivative of that already-normalized loss, so the division is carried through automatically.
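To illustrate that point, here is a hedged sketch (not the repository's code; it uses the quadratic region of the Huber loss for brevity): when the forward pass divides by the number of valid entries, the backward pass inherits the same factor, so no extra division is needed.

```lua
require 'torch'

-- Masked squared-error loss normalized by the number of valid entries,
-- together with its gradient w.r.t. pred. The 1/norm factor appears in the
-- gradient automatically because it is part of the loss being differentiated.
local function masked_loss_and_grad(pred, target, mask)
  local diff = (pred - target):cmul(mask)          -- zero out impossible entries
  local norm = mask:sum()                          -- number of possible entries
  local loss = 0.5 * torch.cmul(diff, diff):sum() / norm
  local grad = diff:div(norm)                      -- d(loss)/d(pred) already carries 1/norm
  return loss, grad
end
```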
In line 57 of masked_huber_loss.lua, the comment says 1 is for impossible features; it is actually 0 for impossible features. So line 65 should actually be
(batch_size * feature_size) / self.mask_sum:sum()
Lines 58-60 should also be changed.
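To make the suggested expression concrete, here is a hedged sketch (illustrative names, not the repository's exact code): with a mask that is 1 for possible entries and 0 for impossible ones, this factor converts a mean taken over all batch_size * feature_size entries into a mean over the possible entries only.

```lua
require 'torch'

-- Rescaling factor for a loss that was averaged over every entry of a
-- batch_size x feature_size tensor: multiplying by this factor makes it an
-- average over the possible (mask == 1) entries only.
local function loss_rescale_factor(mask)
  local batch_size, feature_size = mask:size(1), mask:size(2)
  local valid_entries = mask:sum()   -- total count of possible entries
  return (batch_size * feature_size) / valid_entries
end
```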