[rllib] action from policy with Tuple action space has wrong shape · Issue #10516 · ray-project/ray
Open

@rusu24edward

Description

What is the problem?

I have an environment with a tuple action space. During training, there is no problem. However, when I attempt to demonstrate the learned policy, I get actions of the wrong shape. Like in #3048, I get actions that are 2d arrays instead of the expected 1d arrays.
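For concreteness, the mismatch looks like this (a hedged sketch with made-up values; the Discrete component is fine, but the Box component of the Tuple comes back with an extra leading batch dimension):

```python
import numpy as np

# What env.step() expects for a Tuple((Discrete(2), Box(..., shape=(2,)))) action:
expected = (1, np.array([0, 1]))    # Box component has shape (2,)

# What compute_action() actually returns: the Box component keeps
# a batch dimension of size 1.
observed = (1, np.array([[0, 1]]))  # Box component has shape (1, 2)

print(expected[1].shape)  # (2,)
print(observed[1].shape)  # (1, 2)
```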

Ray version and other system information (Python version, TensorFlow version, OS):
Ray: 0.8.5
Python: 3.7.7
TF: 2.3.0
OS: Mac 10.14

Reproduction (REQUIRED)

# Dummy test case

import gym
from gym.spaces import Tuple, Box, Discrete
import numpy as np

class TupleCorridorEnv(gym.Env):
    def __init__(self, config=None):
        self.size = 5

        # Note: np.int is deprecated (removed in NumPy >= 1.24); use np.int64
        self.observation_space = Box(low=0, high=self.size - 1, shape=(2,), dtype=np.int64)
        self.action_space = Tuple((Discrete(2), Box(low=0, high=1, shape=(2,), dtype=np.int64)))

    def reset(self):
        self.num_steps = 0
        self.pos = np.array([0, 0])
        return self.pos
    
    def step(self, action):
        self.num_steps += 1
        movement = action[1] # The second array in the tuple
        self.pos[0] += movement[0]
        self.pos[1] += movement[1]
        if self.pos[0] == self.size-1 and self.pos[1] == self.size-1:
            return self.pos, 1, True, {}
        else:
            if self.num_steps >= 10:
                return self.pos, -1, True, {}
            else:
                return self.pos, 0, False, {}

ray_tune = {
    'run_or_experiment': 'PG',
    'checkpoint_at_end': True,
    'stop': {
        'episodes_total': 2,
    },
    'config': {
        'env': TupleCorridorEnv,
        'env_config': {},
    }
}

import ray
from ray import tune
ray.init()
tune.run(**ray_tune)

from ray.rllib.agents.registry import get_agent_class  # import needed; ray.rllib is not loaded by `import ray`
alg = get_agent_class('PG')
agent = alg(
    env=TupleCorridorEnv,
    config={},
)
env = TupleCorridorEnv()

# agent.restore(...), not needed to reproduce the error
obs = env.reset()
while True:
    action = agent.compute_action(obs) # Get the action

    print('\nAction is: ')
    print(action)
    print('\n')

    obs, reward, done, info = env.step(action)
    if done:
        break

ray.shutdown()

Notice that the printed action is a tuple where the second element is a 2d array instead of a 1d array, just like #3048. This only appears to happen via agent.compute_action(obs) and not during training.
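Until the underlying bug is fixed, one possible workaround (my own sketch, not an official RLlib API) is to squeeze the stray batch dimension out of each array component of the Tuple action before passing it to env.step:

```python
import numpy as np

def unbatch_tuple_action(action):
    """Remove a leading batch dimension of size 1 from each ndarray
    component of a Tuple action; scalar components pass through unchanged.
    This is a workaround helper, not part of RLlib."""
    return tuple(
        np.squeeze(a, axis=0)
        if isinstance(a, np.ndarray) and a.ndim > 1 and a.shape[0] == 1
        else a
        for a in action
    )

# Example: the batched action shape reported in this issue
action = (1, np.array([[0, 1]]))        # Box component has shape (1, 2)
fixed = unbatch_tuple_action(action)
print(fixed[1].shape)  # (2,)
```

Calling `env.step(unbatch_tuple_action(action))` in the rollout loop above should then receive the 1d array the environment expects.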

If we cannot run your script, we cannot fix your issue.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Metadata


    Labels

    P2: Important issue, but not time-critical
    bug: Something that is supposed to be working, but isn't
    pending-cleanup: This issue is pending cleanup. It will be removed in 2 weeks after being assigned.
    rllib: RLlib related issues
