[RLlib] Categorical action dist incorrectly uses tf.random.categorical · Issue #24055 · ray-project/ray · GitHub
[RLlib] Categorical action dist incorrectly uses tf.random.categorical #24055
Open
@HJasperson

Description

What happened + What you expected to happen

The problem is here:

def _build_sample_op(self) -> TensorType:
    return tf.squeeze(tf.random.categorical(self.inputs, 1), axis=1)

tf.random.categorical interprets its input as (unnormalized) log probabilities, even though the argument is named 'logits'. Here, self.inputs are passed in as-is, so masked actions (where logits[a]=0) get weight exp(0)=1 and are considered valid samples by tf. This also impacts MultiCategorical, since it ultimately calls this same method.
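
To make the effect concrete without needing TensorFlow, here is a minimal pure-Python sketch of the probabilities a categorical sampler draws from when it treats its input as unnormalized log probabilities (i.e. softmax of the input). The helper names `implied_probs` and `safe_log` are hypothetical and only for illustration; they are not part of RLlib or TensorFlow.

```python
import math


def implied_probs(logits):
    """Softmax: the sampling probabilities implied by unnormalized log probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]  # math.exp(-inf) == 0.0
    total = sum(weights)
    return [w / total for w in weights]


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising ValueError."""
    return math.log(p) if p > 0 else float("-inf")


# Same values as in the issue: the last 3 actions are meant to be masked.
z = [0.5, 0.5, 0.5, 0.0, 0.0, 0.0]

# Passing z directly: a masked entry of 0 still carries weight exp(0) = 1.
probs_raw = implied_probs(z)

# Passing log(z): masked entries become -inf and get probability exactly 0.
probs_logged = implied_probs([safe_log(p) for p in z])

print("raw:   ", [round(p, 4) for p in probs_raw])
print("logged:", [round(p, 4) for p in probs_logged])
```

Under this sketch, the three masked actions together keep roughly 38% of the probability mass when z is passed directly, and exactly zero when log(z) is passed instead.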

Versions / Dependencies

ray 1.11.0 (still present in 1.12, though)
python 3.9
tf 2.7
rhel 7.9

Reproduction script

import tensorflow as tf

# mask last 3 actions
z = tf.constant([[0.5, 0.5, 0.5, 0, 0, 0]])

# current - will sample masked actions
tf.random.categorical(z, 10)

# corrected - won't sample masked actions (log(0) = -inf)
tf.random.categorical(tf.math.log(z), 10)

Issue Severity

High: It blocks me from completing my task.

    Labels

    P2: Important issue, but not time-critical
    bug: Something that is supposed to be working; but isn't
    pending-cleanup: This issue is pending cleanup. It will be removed in 2 weeks after being assigned.
    rllib: RLlib related issues
    rllib-models: An issue related to RLlib (default or custom) Models.
