Description
I'm a little rusty at this stuff but I believe there might be a bug in the implementation of the position bias EM algorithm here for p_r1 = reshaped_train_labels + (1 - reshaped_train_labels) * p_e0_r1_c0
. In particular, I think p_r1
for unclicked documents, (the 1-reshaped_train_labels
case), is incorrect.
Using notation from the paper, page 614, we want to compute
The implementation just has
That is, it's missing the
Am I missing something or is this a bug? Unfortunately, the paper just says "From this, we can compute the marginals P(E = 1|c, q, d, k) and P(R = 1|c, q, d, k)" without additional details and I can't find any other implementations of this paper to see how they're working. Like I said, I'm rusty at this so I may be missing something obvious.
Note, I think the same issue comes up here with p_r1
, but is computed as reshaped_train_labels + (1 - reshaped_train_labels) * p_e1_r0_c0
if I'm reading things correctly.
I think it should instead be
Of course, these depend on each other, but with these two equations we can solve for gamma
and self.propensity
.