### What is the problem?
I have an environment with a tuple action space. During training, there is no problem. However, when I attempt to demonstrate the learned policy, I get actions of the wrong shape. Like in #3048, I get actions that are 2d arrays instead of the expected 1d arrays.
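Concretely, the shape mismatch looks like this (the values are illustrative; the shapes are what I observe):

```python
import numpy as np

# Shape I expect for the Box component of the tuple action:
expected = np.array([0, 1])    # shape (2,)
# Shape actually returned by compute_action (extra batch dimension):
observed = np.array([[0, 1]])  # shape (1, 2)

print(expected.shape, observed.shape)
```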
### Ray version and other system information (Python version, TensorFlow version, OS)
- Ray: 0.8.5
- Python: 3.7.7
- TF: 2.3.0
- OS: macOS 10.14
### Reproduction (REQUIRED)
```python
# Dummy test case
import gym
from gym.spaces import Tuple, Box, Discrete
import numpy as np


class TupleCorridorEnv(gym.Env):
    def __init__(self, config=None):
        self.size = 5
        self.observation_space = Box(low=0, high=self.size - 1, shape=(2,), dtype=np.int)
        self.action_space = Tuple((Discrete(2), Box(low=0, high=1, shape=(2,), dtype=np.int)))

    def reset(self):
        self.num_steps = 0
        self.pos = np.array([0, 0])
        return self.pos

    def step(self, action):
        self.num_steps += 1
        movement = action[1]  # The second array in the tuple
        self.pos[0] += movement[0]
        self.pos[1] += movement[1]
        if self.pos[0] == self.size - 1 and self.pos[1] == self.size - 1:
            return self.pos, 1, True, {}
        elif self.num_steps >= 10:
            return self.pos, -1, True, {}
        else:
            return self.pos, 0, False, {}


ray_tune = {
    'run_or_experiment': 'PG',
    'checkpoint_at_end': True,
    'stop': {
        'episodes_total': 2,
    },
    'config': {
        'env': TupleCorridorEnv,
        'env_config': {},
    },
}

import ray
from ray import tune
from ray.rllib.agents.registry import get_agent_class

ray.init()
tune.run(**ray_tune)

alg = get_agent_class('PG')
agent = alg(
    env=TupleCorridorEnv,
    config={},
)

env = TupleCorridorEnv()
# agent.restore(...), not needed to reproduce the error
obs = env.reset()
while True:
    action = agent.compute_action(obs)  # Get the action
    print('\nAction is: ')
    print(action)
    print('\n')
    obs, reward, done, info = env.step(action)
    if done:
        break
ray.shutdown()
```
Notice that the printed action is a tuple whose second element is a 2d array instead of a 1d array, just like in #3048. This only appears to happen via `agent.compute_action(obs)`, not during training.
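As a temporary workaround (my own assumption, not a confirmed fix), the stray batch dimension can be stripped from each component before the action is passed to the environment. `unbatch_action` here is a hypothetical helper, not part of the RLlib API:

```python
import numpy as np


def unbatch_action(action):
    """Strip a stray leading batch dimension from each array component
    of a tuple action (hypothetical helper, not part of RLlib)."""
    return tuple(
        np.squeeze(a, axis=0) if isinstance(a, np.ndarray) and a.ndim > 1 else a
        for a in action
    )


# A tuple action as returned by compute_action in this report:
# (discrete choice, array of shape (1, 2) instead of (2,))
bad_action = (1, np.array([[0, 1]]))
fixed = unbatch_action(bad_action)
print(fixed[1].shape)  # (2,)
```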
If we cannot run your script, we cannot fix your issue.
- [x] I have verified my script runs in a clean environment and reproduces the issue.
- [x] I have verified the issue also occurs with the latest wheels.
- [ ] I have not upgraded to Ray 0.8.7 because it introduces #10100 ([rllib] function `unbatch` in `ray/rllib/utils/spaces/space_utils.py` does not work as intended).
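For reference, this is my understanding of what `unbatch` in `space_utils.py` is supposed to do, sketched in simplified form (this is not RLlib's actual implementation): turn a tuple of batched per-component arrays into a list of per-sample tuple actions, so a batch of one yields a single action with 1d components.

```python
import numpy as np


def unbatch_sketch(batched):
    """Convert a tuple of batched component arrays into a list of
    per-sample tuple actions (simplified sketch, not RLlib's code)."""
    batch_size = len(batched[0])
    return [tuple(comp[i] for comp in batched) for i in range(batch_size)]


# One batched sample: Discrete part of shape (1,), Box part of shape (1, 2).
batched = (np.array([1]), np.array([[0, 1]]))
samples = unbatch_sketch(batched)
# samples[0] is a single tuple action whose Box component is now 1d.
```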