New robotics environments by matthiasplappert · Pull Request #912 · openai/gym

New robotics environments #912


Merged
124 commits merged on Feb 26, 2018
Changes from all commits
124 commits
cfbace5
all envs run offscreen
ethanabrooks Nov 8, 2017
1730538
render works
ethanabrooks Nov 8, 2017
5947270
changed mujoco-py version
ethanabrooks Nov 13, 2017
3fef1a1
Add fetch assets
Jan 17, 2018
cddf822
Merge branch 'mujoco-15' into fetch
Jan 17, 2018
e9d2ca3
Initial version of all envs with correct start position
Jan 17, 2018
17a6c0e
Actuation of robot
Jan 17, 2018
b4a4f51
Setup mostly done
Jan 17, 2018
3ab8617
Get observations
Jan 17, 2018
dfe540f
Observation space is specified by default
Jan 17, 2018
fffbd43
Correctly include box information
Jan 17, 2018
cdcbc7d
Properly reset
Jan 17, 2018
4db5cd9
No need to overwrite spaces
Jan 17, 2018
67d9e52
Remove more unnecessary overrides
Jan 17, 2018
1c3176c
Properly reset goals
Jan 17, 2018
bf45f8b
Simplify to case where we only ever have 1 box
Jan 18, 2018
86b8070
Add reward computation
Jan 18, 2018
b78c690
Fix target visualization
Jan 19, 2018
8e86931
Fix state bug
Jan 19, 2018
f5efcf8
Simplify assets
Jan 19, 2018
63cf479
Properly lock gripper in place
Jan 19, 2018
e545e20
Clean up Gym
Jan 23, 2018
649b5ab
Update Gym API
Jan 23, 2018
2f1f302
Introduce special goal-based space since we rely on this structure fo…
Jan 23, 2018
e565ca0
Remove unused assets
Jan 23, 2018
fefafa6
Bump versions
Jan 24, 2018
e54fc68
Address code review feedback
Jan 24, 2018
08c27dc
Pass in initial_qpos on init
Jan 24, 2018
cc9201e
Import hand model
Jan 24, 2018
7257175
First rough draft of a working hand reaching environment without any …
Jan 24, 2018
420f684
Initial qpos
Jan 24, 2018
90e7b3f
Fully functional reach environment
Jan 24, 2018
889cbe1
Use robot_get_obs
Jan 24, 2018
9d64da3
Merge remote-tracking branch 'origin/master' into fetch
Jan 24, 2018
2a5a257
Fully working hand with sensible values
Jan 25, 2018
a10af8d
Merge branch 'fetch' into hand
Jan 25, 2018
4c43cc4
Running version of block with Z and XYZ rotations
Jan 25, 2018
f0b297f
Fully working block envs
Jan 25, 2018
9d1fcd6
Use new fancy dog texture
Jan 25, 2018
977daef
Split into multiple files
Jan 25, 2018
ee1c83a
Implement more variants of the hand
Jan 25, 2018
32e500f
Implement parallel rotations
Jan 25, 2018
1a69ae0
Improve hand setup
Jan 25, 2018
bd63272
Implement pen rotation task
Jan 25, 2018
9e75b9f
Add ellipsoid version
Jan 25, 2018
14a0d8e
Rename environments
Jan 25, 2018
ab121af
Re-randomize if an initial configuration turns out to be invalid
Jan 25, 2018
bc7bc40
Do not directly set goal
Jan 25, 2018
93b22f9
Simplify environment setup
Jan 26, 2018
2665802
Make the block smaller
Jan 26, 2018
b801625
Update Fetch to new goal-based API
Jan 29, 2018
d7c9fa9
Add wrapper to flatten dict observation into an array
Jan 29, 2018
912a0ca
Merge remote-tracking branch 'origin/master' into hand
Jan 29, 2018
f00632b
Fix environment to work with changed Gym
Jan 29, 2018
b2dc092
Group similar things together
Jan 29, 2018
282a855
Move things around
Jan 29, 2018
e17a24f
Refactor into separate classes
Jan 29, 2018
6e3da41
Clean up manipulation code
Jan 29, 2018
a3f532f
Minor clean up
Jan 29, 2018
bfc6c6d
More clean up
Jan 29, 2018
f407dc7
Use seeded random numbers
Jan 29, 2018
d205492
Add extras
Jan 29, 2018
7faf9f7
Document remaining methods
Jan 29, 2018
5fb2b3b
Fix wrapper
Jan 29, 2018
87d10ed
Fix bug
Jan 30, 2018
92083c4
Add documentation
Jan 30, 2018
45e3486
Minor fixes
Jan 30, 2018
25ee65e
Bake action range into environment
Jan 30, 2018
557e713
Forward new method in wrapper
Jan 30, 2018
e0f0abe
Good default camera angle
Jan 30, 2018
c94ad2b
Better looking fetch
Jan 30, 2018
97369c3
Add acknowledgments for models
Jan 30, 2018
d697641
Rename to LICENSE
Jan 30, 2018
1b5c1c3
Add support for dense vs. sparse version
Jan 30, 2018
e573a3e
Increase damping coefficient
Jan 30, 2018
acb9db2
Minor fixes
Jan 31, 2018
a416f61
Include box position
Jan 31, 2018
52ae18e
A simple script to render an environment for quick inspection
Jan 31, 2018
e44c76c
Make sure that we don't change the action outside of the current scope
Jan 31, 2018
39d3f0b
Update fetch environments to have sufficient light
Jan 31, 2018
2d788a7
Ignore MuJoCo warnings in step
Feb 1, 2018
89137d9
Rename method
Feb 1, 2018
4ba6a6d
Address most feedback from code review
Feb 1, 2018
ef79380
Add additional documentation
Feb 1, 2018
d612429
Clarify role of desired goal and achieved goal
Feb 1, 2018
6552b44
Enforce action range to avoid simulation instabilities
Feb 1, 2018
5f01546
Fix pen manipulation
Feb 1, 2018
f70cd5d
Change objects
Feb 2, 2018
fda768d
Add easter egg
Feb 2, 2018
c34acbf
Update pen texture
Feb 2, 2018
70e74c4
Fix target computation
Feb 2, 2018
d7f6532
Fix pen environment
Feb 2, 2018
c73fc7c
Fix
Feb 2, 2018
d399de5
Implement easter egg mode
Feb 5, 2018
f986a04
Move floor down a bit to avoid that the hand can cheat
Feb 5, 2018
64fec19
Do not catch exceptions by default
Feb 5, 2018
144d464
Use free joint for Fetch tasks
Feb 7, 2018
0f8b08e
Use quaternions for rotations because setting Euler angles can end up…
Feb 7, 2018
6726d45
Decrease tolerance for manipulation tasks. Now 1cm for position and ~…
Feb 8, 2018
bdb42a4
Fix linter issue
Feb 8, 2018
5ddbb49
Fix reaching for the ring finger
Feb 8, 2018
4e44c09
Require a bit more precision for hand reaching
Feb 8, 2018
f67a311
Merge remote-tracking branch 'origin/master' into robotics-envs
Feb 8, 2018
da9264d
Make pen spinning a bit simpler
Feb 9, 2018
a97dc6a
Weigh position offset accordingly
Feb 9, 2018
34aaef4
Remove GoalDict
Feb 12, 2018
db3b688
Fix hand reach
Feb 12, 2018
135a1b8
Ring finger behaves strangely
Feb 13, 2018
be817ed
Fix hand reach
Feb 13, 2018
3145acf
Forward step for visualization
Feb 13, 2018
1e27b3d
Minor fixes
Feb 13, 2018
6ac8cf9
Include final texture for hidden block
Feb 13, 2018
b98e5b0
Update Fetch environment skins
Feb 15, 2018
c00af8a
Apply final look to hand environments
Feb 15, 2018
7498c5f
Skin pen
Feb 15, 2018
4927ef5
Remove unused texture
Feb 15, 2018
913532a
Merge branch 'master' into robotics-envs
Feb 15, 2018
b9d8ac5
Add alias for all hand manipulation envs
Feb 15, 2018
a7d3eeb
Add markers so that top and bottom of pen can be differentiated
Feb 15, 2018
aa5839d
Change wrapper name
Feb 20, 2018
e291341
Fix rendering to image bug
Feb 26, 2018
4480f9a
Bump version
Feb 26, 2018
d0ddc35
Include robotics in README
Feb 26, 2018
a5b7e81
Fix visualizer for pendulum
Feb 26, 2018
94 changes: 56 additions & 38 deletions README.rst
@@ -15,12 +15,12 @@ If you're not sure where to start, we recommend beginning with the

A whitepaper for OpenAI Gym is available at http://arxiv.org/abs/1606.01540, and here's a BibTeX entry that you can use to cite it in a publication::

    @misc{1606.01540,
      Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
      Title = {OpenAI Gym},
      Year = {2016},
      Eprint = {arXiv:1606.01540},
    }

.. contents:: **Contents of this document**
:depth: 2
@@ -50,15 +50,15 @@ You can perform a minimal install of ``gym`` with:

.. code:: shell

    git clone https://github.com/openai/gym.git
    cd gym
    pip install -e .

If you prefer, you can do a minimal install of the packaged version directly from PyPI:

.. code:: shell

    pip install gym

You'll be able to run a few environments right away:

@@ -80,13 +80,13 @@ On OSX:

.. code:: shell

    brew install cmake boost boost-python sdl2 swig wget

On Ubuntu 14.04:

.. code:: shell

    apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig

MuJoCo has a proprietary dependency we can't set up for you. Follow
the
@@ -102,7 +102,7 @@ We currently support Linux and OS X running Python 2.7 or 3.5. Some users on OSX

.. code:: shell

    brew install boost-python --with-python3

If you want to access Gym from languages other than python, we have limited support for non-python
frameworks, such as lua/Torch, using the OpenAI Gym `HTTP API <https://github.com/openai/gym-http-api>`_.
@@ -154,10 +154,10 @@ sequence.

.. code:: python

    import gym
    env = gym.make('Copy-v0')
    env.reset()
    env.render()

Atari
-----
@@ -166,10 +166,10 @@ The Atari environments are a variety of Atari video games. If you didn't do the

.. code:: python

    import gym
    env = gym.make('SpaceInvaders-v0')
    env.reset()
    env.render()

This will install ``atari-py``, which automatically compiles the `Arcade Learning Environment <http://www.arcadelearningenvironment.org/>`_. This can take quite a while (a few minutes on a decent laptop), so just be prepared.

@@ -180,10 +180,10 @@ Box2d is a 2D physics engine. You can install it via ``pip install -e '.[box2d]

.. code:: python

    import gym
    env = gym.make('LunarLander-v2')
    env.reset()
    env.render()

Classic control
---------------
@@ -192,10 +192,10 @@ These are a variety of classic control tasks, which would appear in a typical re

.. code:: python

    import gym
    env = gym.make('CartPole-v0')
    env.reset()
    env.render()

MuJoCo
------
@@ -208,10 +208,27 @@ to set it up. You'll also have to run ``pip install -e '.[mujoco]'`` if you didn't do the full install.

.. code:: python

    import gym
-   env = gym.make('Humanoid-v1')
+   env = gym.make('Humanoid-v2')
    env.reset()
    env.render()


Robotics
--------

`MuJoCo <http://www.mujoco.org/>`_ is a physics engine which can do
very detailed, efficient simulations with contacts, and we use it for all robotics environments. It's not
open-source, so you'll have to follow the instructions in `mujoco-py
<https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>`_
to set it up. You'll also have to run ``pip install -e '.[mujoco]'`` if you didn't do the full install.

.. code:: python

    import gym
    env = gym.make('HandManipulateBlock-v0')
    env.reset()
    env.render()
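
These environments return dictionary observations with ``observation``, ``achieved_goal``, and ``desired_goal`` keys, so a rollout loop reads slightly differently. A minimal sketch (the ``assert`` relies on the ``compute_reward`` contract documented in ``gym/core.py`` below):

.. code:: python

    import gym

    env = gym.make('HandManipulateBlock-v0')
    obs = env.reset()
    for _ in range(100):
        # A real agent would condition on obs['observation'] and
        # obs['desired_goal']; here we simply sample random actions.
        obs, reward, done, info = env.step(env.action_space.sample())
        # The reward can always be recomputed from the achieved and desired goals.
        assert reward == env.compute_reward(obs['achieved_goal'], obs['desired_goal'], info)
        if done:
            obs = env.reset()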

Toy text
--------
@@ -220,10 +237,10 @@ Toy environments which are text-based. There's no extra dependency to install, s

.. code:: python

    import gym
    env = gym.make('FrozenLake-v0')
    env.reset()
    env.render()

Examples
========
@@ -241,14 +258,15 @@ We are using `pytest <http://doc.pytest.org>`_ for tests. You can run them via:

.. code:: shell

    pytest


.. _See What's New section below:

What's new
==========

- 2018-02-28: Release of a set of new robotics environments.
- 2018-01-25: Made some aesthetic improvements and removed unmaintained parts of gym. This may seem like a downgrade in functionality, but it is actually a long-needed cleanup in preparation for some great new things that will be released in the next month.

+ Now your `Env` and `Wrapper` subclasses should define `step`, `reset`, `render`, `close`, `seed` rather than underscored method names.
21 changes: 21 additions & 0 deletions bin/render.py
@@ -0,0 +1,21 @@
#!/usr/bin/env python3
import argparse
import gym


parser = argparse.ArgumentParser(description='Renders a Gym environment for quick inspection.')
parser.add_argument('env_id', type=str, help='the ID of the environment to be rendered (e.g. HalfCheetah-v1)')
parser.add_argument('--step', type=int, default=1)
args = parser.parse_args()

env = gym.make(args.env_id)
env.reset()

step = 0
while True:
    if args.step:
        env.step(env.action_space.sample())
    env.render()
    if step % 10 == 0:
        env.reset()
    step += 1
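
Typical usage would be, e.g., ``python bin/render.py HandManipulateBlock-v0`` with any registered environment id; per the code above, passing ``--step 0`` renders the initial state without stepping the simulation.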
2 changes: 1 addition & 1 deletion gym/__init__.py
@@ -7,7 +7,7 @@
from gym.utils import reraise
from gym.version import VERSION as __version__

-from gym.core import Env, Space, Wrapper, ObservationWrapper, ActionWrapper, RewardWrapper
+from gym.core import Env, GoalEnv, Space, Wrapper, ObservationWrapper, ActionWrapper, RewardWrapper
from gym.envs import make, spec
from gym import wrappers, spaces, logger

44 changes: 44 additions & 0 deletions gym/core.py
@@ -1,6 +1,7 @@
from gym import logger
import numpy as np

import gym
from gym import error
from gym.utils import closer

@@ -150,6 +151,46 @@ def __str__(self):
        else:
            return '<{}<{}>>'.format(type(self).__name__, self.spec.id)


class GoalEnv(Env):
    """A goal-based environment. It functions just like any regular OpenAI Gym environment, but it
    imposes a required structure on the observation_space. More concretely, the observation
    space is required to contain at least three elements, namely `observation`, `desired_goal`, and
    `achieved_goal`. Here, `desired_goal` specifies the goal that the agent should attempt to achieve.
    `achieved_goal` is the goal that the agent has currently achieved. `observation` contains the
    actual observations of the environment as per usual.
    """

    def reset(self):
        # Enforce that each GoalEnv uses a Goal-compatible observation space.
        if not isinstance(self.observation_space, gym.spaces.Dict):
            raise error.Error('GoalEnv requires an observation space of type gym.spaces.Dict')
        result = super(GoalEnv, self).reset()
        for key in ['observation', 'achieved_goal', 'desired_goal']:
            if key not in result:
                raise error.Error('GoalEnv requires the "{}" key to be part of the observation dictionary.'.format(key))
        return result

    def compute_reward(self, achieved_goal, desired_goal, info):
        """Compute the step reward. This externalizes the reward function and makes
        it dependent on a desired goal and the one that was achieved. If you wish to include
        additional rewards that are independent of the goal, you can include the necessary values
        to derive it in info and compute it accordingly.

        Args:
            achieved_goal (object): the goal that was achieved during execution
            desired_goal (object): the desired goal that we asked the agent to attempt to achieve
            info (dict): an info dictionary with additional information

        Returns:
            float: The reward that corresponds to the provided achieved goal w.r.t. the desired
                goal. Note that the following should always hold true:

                    ob, reward, done, info = env.step(action)
                    assert reward == env.compute_reward(ob['achieved_goal'], ob['desired_goal'], info)
        """
        raise NotImplementedError()

# Space-related abstractions

class Space(object):
@@ -249,6 +290,9 @@ def close(self):

    def seed(self, seed=None):
        return self.env.seed(seed)

    def compute_reward(self, achieved_goal, desired_goal, info):
        return self.env.compute_reward(achieved_goal, desired_goal, info)

    def __str__(self):
        return '<{}{}>'.format(type(self).__name__, self.env)
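
To illustrate the structure that ``GoalEnv`` enforces, here is a minimal hypothetical subclass — a toy bit-flipping task, not taken from this diff — that satisfies the required observation keys and the ``compute_reward`` contract:

.. code:: python

    import numpy as np

    import gym
    from gym import spaces


    class BitFlipEnv(gym.GoalEnv):
        """Hypothetical toy task: flip bits until the state matches the goal."""

        def __init__(self, n_bits=8):
            self.n_bits = n_bits
            self.action_space = spaces.Discrete(n_bits)
            self.observation_space = spaces.Dict({
                'observation': spaces.MultiBinary(n_bits),
                'achieved_goal': spaces.MultiBinary(n_bits),
                'desired_goal': spaces.MultiBinary(n_bits),
            })

        def reset(self):
            self.state = np.random.randint(2, size=self.n_bits)
            self.goal = np.random.randint(2, size=self.n_bits)
            return self._get_obs()

        def step(self, action):
            self.state[action] = 1 - self.state[action]  # flip the selected bit
            obs = self._get_obs()
            info = {}
            reward = self.compute_reward(obs['achieved_goal'], obs['desired_goal'], info)
            return obs, reward, reward == 0.0, info

        def compute_reward(self, achieved_goal, desired_goal, info):
            # Sparse reward: 0 once the goal is reached, -1 otherwise.
            return 0.0 if np.array_equal(achieved_goal, desired_goal) else -1.0

        def _get_obs(self):
            return {
                'observation': self.state.copy(),
                'achieved_goal': self.state.copy(),
                'desired_goal': self.goal.copy(),
            }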
