[rllib] action from policy with Tuple action space has wrong shape · Issue #10516 · ray-project/ray
Open

@rusu24edward

Description

What is the problem?

I have an environment with a tuple action space. During training, there is no problem. However, when I attempt to demonstrate the learned policy, I get actions of the wrong shape. Like in #3048, I get actions that are 2d arrays instead of the expected 1d arrays.
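For concreteness, the mismatch looks like this (a hedged sketch with made-up values; the Discrete component is fine, but the Box component of the Tuple comes back with an extra leading batch dimension):

```python
import numpy as np

# What env.step() expects for a Tuple((Discrete(2), Box(..., shape=(2,)))) action:
expected = (1, np.array([0, 1]))    # Box component has shape (2,)

# What compute_action() actually returns: the Box component keeps
# a batch dimension of size 1.
observed = (1, np.array([[0, 1]]))  # Box component has shape (1, 2)

print(expected[1].shape)  # (2,)
print(observed[1].shape)  # (1, 2)
```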

Ray version and other system information (Python version, TensorFlow version, OS):
Ray: 0.8.5
Python: 3.7.7
TF: 2.3.0
OS: Mac 10.14

Reproduction (REQUIRED)

# Dummy test case

import gym
from gym.spaces import Tuple, Box, Discrete
import numpy as np

class TupleCorridorEnv(gym.Env):
    def __init__(self, config=None):
        self.size = 5

        # Note: np.int is deprecated (removed in NumPy >= 1.24); use np.int64
        self.observation_space = Box(low=0, high=self.size - 1, shape=(2,), dtype=np.int64)
        self.action_space = Tuple((Discrete(2), Box(low=0, high=1, shape=(2,), dtype=np.int64)))

    def reset(self):
        self.num_steps = 0
        self.pos = np.array([0, 0])
        return self.pos
    
    def step(self, action):
        self.num_steps += 1
        movement = action[1] # The second array in the tuple
        self.pos[0] += movement[0]
        self.pos[1] += movement[1]
        if self.pos[0] == self.size-1 and self.pos[1] == self.size-1:
            return self.pos, 1, True, {}
        else:
            if self.num_steps >= 10:
                return self.pos, -1, True, {}
            else:
                return self.pos, 0, False, {}

ray_tune = {
    'run_or_experiment': 'PG',
    'checkpoint_at_end': True,
    'stop': {
        'episodes_total': 2,
    },
    'config': {
        'env': TupleCorridorEnv,
        'env_config': {},
    }
}

import ray
from ray import tune
ray.init()
tune.run(**ray_tune)

from ray.rllib.agents.registry import get_agent_class  # import needed; ray.rllib is not loaded by `import ray`
alg = get_agent_class('PG')
agent = alg(
    env=TupleCorridorEnv,
    config={},
)
env = TupleCorridorEnv()

# agent.restore(...), not needed to reproduce the error
obs = env.reset()
while True:
    action = agent.compute_action(obs) # Get the action

    print('\nAction is: ')
    print(action)
    print('\n')

    obs, reward, done, info = env.step(action)
    if done:
        break

ray.shutdown()

Notice that the printed action is a tuple where the second element is a 2d array instead of a 1d array, just like #3048. This only appears to happen via agent.compute_action(obs) and not during training.
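Until the underlying bug is fixed, one possible workaround (my own sketch, not an official RLlib API) is to squeeze the stray batch dimension out of each array component of the Tuple action before passing it to env.step:

```python
import numpy as np

def unbatch_tuple_action(action):
    """Remove a leading batch dimension of size 1 from each ndarray
    component of a Tuple action; scalar components pass through unchanged.
    This is a workaround helper, not part of RLlib."""
    return tuple(
        np.squeeze(a, axis=0)
        if isinstance(a, np.ndarray) and a.ndim > 1 and a.shape[0] == 1
        else a
        for a in action
    )

# Example: the batched action shape reported in this issue
action = (1, np.array([[0, 1]]))        # Box component has shape (1, 2)
fixed = unbatch_tuple_action(action)
print(fixed[1].shape)  # (2,)
```

Calling `env.step(unbatch_tuple_action(action))` in the rollout loop above should then receive the 1d array the environment expects.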

If we cannot run your script, we cannot fix your issue.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Metadata


    Labels

    P2: Important issue, but not time-critical
    bug: Something that is supposed to be working, but isn't
    pending-cleanup: This issue is pending cleanup. It will be removed in 2 weeks after being assigned.
    rllib: RLlib related issues
