HumanCompatibleAI/imitation
Bug in assert statement in RewardNet that prevents using dict observations even if properly linearized by a feature extractor. #868
Open
@WilliamDormer


Bug description

In RewardNet.predict_th, there is an assert statement that reads:

assert rew_th.shape == state.shape[:1]

This assertion fails for dictionary-style observations even when self.preprocess has already converted state into a valid, properly batched tensor state_th.
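
For context, the method looks roughly like this (paraphrased from imitation.rewards.reward_nets.RewardNet.predict_th, docstring omitted; treat it as a sketch rather than a verbatim quote):

```python
def predict_th(self, state, action, next_state, done) -> th.Tensor:
    with networks.evaluating(self):
        # preprocess() may change the layout of the inputs, e.g. linearizing a
        # dict observation into a (batch, features) tensor state_th.
        state_th, action_th, next_state_th, done_th = self.preprocess(
            state, action, next_state, done,
        )
        with th.no_grad():
            rew_th = self(state_th, action_th, next_state_th, done_th)

        # The check below compares against the *raw* state, not state_th.
        assert rew_th.shape == state.shape[:1]
        return rew_th
```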

Steps to reproduce

Call predict_th on a reward network whose preprocess method linearizes dictionary-style observations. Even when preprocess returns a valid state_th tensor, the assertion compares rew_th.shape against the original (dictionary) state, which is incorrect.
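
Here is a minimal sketch of the failure, assuming a recent imitation version built on gymnasium. The subclass, the toy spaces, and the flattening logic below are hypothetical and only meant to illustrate the problem; the only imitation APIs used are RewardNet, its preprocess hook, and predict_th:

```python
import gymnasium as gym
import numpy as np
import torch as th

from imitation.rewards.reward_nets import RewardNet


class FlattenDictRewardNet(RewardNet):
    """Hypothetical reward net whose preprocess() linearizes Dict observations."""

    def __init__(self, observation_space, action_space):
        super().__init__(observation_space, action_space)
        in_dim = gym.spaces.flatdim(observation_space) + gym.spaces.flatdim(action_space)
        self.mlp = th.nn.Sequential(
            th.nn.Linear(in_dim, 32), th.nn.ReLU(), th.nn.Linear(32, 1)
        )

    def preprocess(self, state, action, next_state, done):
        # Linearize each dict observation into a flat float32 vector, so
        # state_th is a well-formed (batch, features) tensor.
        def flatten_obs(batch):
            flat = [gym.spaces.flatten(self.observation_space, ob) for ob in batch]
            return th.as_tensor(np.stack(flat), dtype=th.float32)

        state_th = flatten_obs(state)
        next_state_th = flatten_obs(next_state)
        action_th = th.as_tensor(np.asarray(action), dtype=th.float32)
        done_th = th.as_tensor(np.asarray(done), dtype=th.float32)
        return state_th, action_th, next_state_th, done_th

    def forward(self, state, action, next_state, done):
        # One scalar reward per transition, shape (batch,); next_state and
        # done are ignored here for brevity.
        return self.mlp(th.cat([state, action], dim=1)).squeeze(1)


obs_space = gym.spaces.Dict(
    {"pos": gym.spaces.Box(-1, 1, (3,)), "vel": gym.spaces.Box(-1, 1, (3,))}
)
act_space = gym.spaces.Box(-1, 1, (2,))
net = FlattenDictRewardNet(obs_space, act_space)

batch = 4
state = [obs_space.sample() for _ in range(batch)]
next_state = [obs_space.sample() for _ in range(batch)]
action = np.stack([act_space.sample() for _ in range(batch)])
done = np.zeros(batch, dtype=bool)

# preprocess() returns a valid state_th of shape (batch, 6) and the network
# returns rew_th of shape (batch,), but predict_th still evaluates
# rew_th.shape == state.shape[:1] against the raw dict-style state, so the
# check fails here (exactly how depends on how the dict batch is stored:
# an AttributeError on .shape, or an AssertionError).
net.predict_th(state, action, next_state, done)
```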

Instead I believe it should be:
assert rew_th.shape == state_th.shape[:1]

I believe this was a typo in the implementation.
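
In patch form, the suggested change in predict_th would be this one-liner (surrounding indentation approximated from the current source):

```diff
-        assert rew_th.shape == state.shape[:1]
+        assert rew_th.shape == state_th.shape[:1]
```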

Environment

  • OS: Linux 24.04
  • Python version: 3.10.17

Labels: bug