Incorrect Masked Huber Loss calculation #14

Open
bobbiesbob opened this issue Feb 20, 2018 · 6 comments

Comments

@bobbiesbob

In line 57 of masked_huber_loss.lua, the comment says 1 is for impossible features.

It is actually 0 for impossible features.

So line 65 should actually be (batch_size * feature_size) / self.mask_sum:sum()

Lines 58-60 should also be changed.

@lifrordi
Owner
lifrordi commented May 11, 2018

No, the code is correct: there are two variables, mask and mask_multiplier, with different semantics for 0 and 1.
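
For readers less familiar with the Torch code, a minimal sketch of the two conventions, with hypothetical names and values (not the actual contents of masked_huber_loss.lua):

```lua
-- Illustrative sketch only: variable names and values are hypothetical,
-- not copied from masked_huber_loss.lua.
require 'torch'

local feature_size = 6

-- mask as supplied to the criterion: 1 = possible hand, 0 = impossible hand
local mask = torch.Tensor({{1, 1, 0, 0, 0, 0},
                           {1, 1, 1, 1, 1, 0}})

-- an inverted copy, in which 1 marks an impossible hand (the convention the
-- comment on line 57 describes)
local inverted_mask = mask:clone():mul(-1):add(1)

print(mask:sum())           -- 7: possible hands in the batch
print(inverted_mask:sum())  -- 5: impossible hands in the batch
```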

@dmorrill10

I got a question about this too, so to clarify:

  • mask_sum has the number of possible hands, since it's the sum over columns of mask.
  • mask_multiplier = (feature_size - mask_sum) / feature_size is then the number of impossible hands divided by the number of total hands.
  • The loss gradients, dloss_doutput, are divided by mask_multiplier, so the adjusted gradient is the original gradient multiplied by the number of total hands, divided by the number of impossible hands.

Why is the gradient not divided by the number of possible hands?
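
To make the question concrete, here is a small numeric sketch of the scaling described in the bullets above (the values are made up; names follow the thread):

```lua
-- Numeric sketch of the quantities in the bullets above (illustrative values only).
require 'torch'

local batch_size, feature_size = 2, 6
local mask_sum = torch.Tensor({{2}, {5}})  -- possible hands per example, as described above

-- mask_multiplier = (feature_size - mask_sum) / feature_size
local mask_multiplier = mask_sum:clone():mul(-1):add(feature_size):div(feature_size)
-- mask_multiplier is now {4/6, 1/6}: the fraction of impossible hands per example

-- dividing dloss_doutput by mask_multiplier scales each row by
-- feature_size / (#impossible hands), not feature_size / (#possible hands)
local dloss_doutput = torch.ones(batch_size, feature_size)
local rescaled = torch.cdiv(dloss_doutput, mask_multiplier:expandAs(dloss_doutput))
print(rescaled)  -- rows scaled by 1.5 and 6 respectively
```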

@JaysenStark

@dmorrill10 I have exactly the same question as you; did you figure it out?

@dmorrill10

@JaysenStark no, not yet.

@KK666-AI
KK666-AI commented May 9, 2019

@dmorrill10 @lifrordi I think the correct loss should be loss = avg( sum( |pred_i - actual_i| ) / mask_i ), where the sum runs over one sample's hands and mask_i is that sample's number of possible hands. That is, each sample should first get its own average loss, and the batch loss should then be the average of those per-sample losses.

The implementation is confusing because mask_multiplier is effectively used to reweight the batch loss. With stochastic gradient descent, this mask_multiplier changes the scale of the derivative, which makes it hard for an optimizer such as Adam to choose an appropriate learning rate for the next iteration.
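
A minimal sketch of the proposed per-sample normalization, assuming a mask with 1 for possible hands; the helper name is hypothetical, and plain absolute error stands in for the Huber term for brevity:

```lua
-- Sketch of the proposed loss: average each sample's masked error over its own
-- number of possible hands, then average over the batch.
require 'torch'

local function per_sample_masked_loss(pred, target, mask)
  local err = torch.abs(pred - target)             -- |pred_i - actual_i|, elementwise
  err:cmul(mask)                                   -- zero out impossible hands
  local per_sample = err:sum(2):cdiv(mask:sum(2))  -- divide by each sample's possible-hand count
  return per_sample:mean()                         -- average over the batch
end

local pred   = torch.Tensor({{0.2, 0.4, 0.0}, {0.1, 0.3, 0.5}})
local target = torch.Tensor({{0.0, 0.5, 0.0}, {0.0, 0.3, 0.4}})
local mask   = torch.Tensor({{1, 1, 0},       {1, 1, 1}})
print(per_sample_masked_loss(pred, target, mask))
```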

@KK666-AI
KK666-AI commented May 9, 2019

I got a question about this too, so to clarify:

  • mask_sum has the number of possible hands, since it's the sum over columns of mask.
  • mask_multiplier = (feature_size - mask_sum) / feature_size is then the number of impossible hands divided by the number of total hands.
  • The loss gradients, dloss_doutput, are divided by mask_multiplier, so the adjusted gradient is the original gradient multiplied by the number of total hands, divided by the number of impossible hands.

Why is the gradient not divided by the number of possible hands?

The gradient doesn't need to be divided by the number of possible hands because the loss is already normalized by this number; since the gradient is the derivative of that normalized loss, the normalization carries through automatically.
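
In symbols: if one sample's loss is already averaged over its k possible hands, the factor 1/k carries straight through to the gradient (a worked statement of the point above, with ℓ standing for the per-hand Huber term):

```latex
L = \frac{1}{k} \sum_{i \in \text{possible}} \ell(y_i - t_i)
\qquad\Longrightarrow\qquad
\frac{\partial L}{\partial y_i} = \frac{1}{k}\, \ell'(y_i - t_i)
```

so the backward pass does not need a second division by the number of possible hands.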
