8000 Bug in regression_EM.py? · Issue #10 · ULTR-Community/ULTRA_pytorch · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
Bug in regression_EM.py? #10
Open
Open
@lendle

Description

@lendle

I'm a little rusty at this stuff but I believe there might be a bug in the implementation of the position bias EM algorithm here for p_r1 = reshaped_train_labels + (1 - reshaped_train_labels) * p_e0_r1_c0. In particular, I think p_r1 for unclicked documents, (the 1-reshaped_train_labels case), is incorrect.

Using notation from the paper, page 614, we want to compute $P(R=1|C=0, q, k, d)$ by marginalizing $P(R=1, E|C=0, q, k, d)$ over $E$:

$$ \begin{aligned} P(R=1|C=0, q, k, d) =& \underbrace{P(R=1, E=1|C=0, q, k, d)}_{=0} P(E=1|C=0, q, k, d) \\ & + P(R=1, E=0|C=0, q, k, d) P(E=0|C=0, q, k, d) \\ =& P(R=1, E=0|C=0, q, k, d) (1- P(E=1|C=0, q, k, d)) \end{aligned} $$

The implementation just has $P(R=1|C=0, q, k, d) = P(R=1, E=0|C=0, q, k, d)$.
That is, it's missing the $(1- P(E=1|C=0, q, k, d))$ term.

Am I missing something or is this a bug? Unfortunately, the paper just says "From this, we can compute the marginals P(E = 1|c, q, d, k) and P(R = 1|c, q, d, k)" without additional details and I can't find any other implementations of this paper to see how they're working. Like I said, I'm rusty at this so I may be missing something obvious.


Note, I think the same issue comes up here with $P(E=1|C=0, q,k, d)$, which is not explicitly named in the code like p_r1, but is computed as reshaped_train_labels + (1 - reshaped_train_labels) * p_e1_r0_c0 if I'm reading things correctly.

I think it should instead be

$$ P(E=1|C=0, q, k, d) = P(R=0, E=1|C=0, q, k, d) (1- P(R=1|C=0, q, k, d)) $$

Of course, these depend on each other, but with these two equations we can solve for $P(E=1|C=0, q, k, d)$ and $P(R=1|C=0, q, k, d)$ in terms of $P(R, E|C=0, q, k, d)$ which we can compute from gamma and self.propensity.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0