TheMTank · beduffy · Sep 29, 2018 · Sep 30, 2018 · Sep 30, 2018 · Sep 30, 2018
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,4 @@
-
 \.idea/
 
 *.pyc
+/experiments/*
diff --git a/README.md b/README.md
@@ -18,24 +18,30 @@ More detailed information on ai2thor environment can be found on their
 
 <div align="center">
   <img src="docs/bowls_fp_404_compressed_gif.gif" width="294px" />
-  <p>A3C agent learning during training on NaturalLanguagePickUpMultipleObjectTask in one of our customized scenes and tasks with the target object being CUPS!</p>
+  <p>A3C agent training on NaturalLanguagePickUpMultipleObjectTask in one of our customized scenes and tasks with the target object being CUPS!</p>
 </div>
 
-## Overview
+## Running algorithms on ai2thor
 
 This project will include implementations and adaptations of the following papers as a benchmark of 
 the current state of the art approaches to the problem:
 
-- [Ikostrikov's A3C](https://github.com/ikostrikov/pytorch-a3c)
+- [A3C](https://arxiv.org/abs/1602.01783) ([code from Ikostrikov](https://github.com/ikostrikov/pytorch-a3c))
 - [Gated-Attention Architectures for Task-Oriented Language Grounding](https://arxiv.org/abs/1706.07230) 
--- *Original code available on [DeepRL-Grounding](https://github.com/devendrachaplot/DeepRL-Grounding)* 
-also based on Ikostrikov's A3C
+-- A3C with gated attention (A3C_LSTM_GA). *Original code available on [DeepRL-Grounding](https://github.com/devendrachaplot/DeepRL-Grounding)*, 
+also based on Ikostrikov's A3C.
 
 Implementations of these can be found in the algorithms folder and a3c can be run on AI2ThorEnv with:  
-`python algorithms/a3c/main.py`
+- `python algorithms/a3c/main.py`
+- To run the A3C_LSTM_GA model with a config file set to the BowlsVsCups variant of the NaturalLanguagePickUpObjectTask in tasks.py:  
+`python algorithms/a3c/main.py --config-file-name NL_pickup_bowls_vs_cups_fp1_config.json --verbose-num-steps True --num-random-actions-at-init 4`
+- To run [ViZDoom](https://github.com/mwydmuch/ViZDoom) synchronously with 1 process (you will need to install ViZDoom):  
+`python algorithms/a3c/main.py --verbose-num-steps True --sync --vizdoom -v 1`
+- To run Atari with 8 processes:  
+`python algorithms/a3c/main.py --atari --num-processes 8`
 
-Check the argparse help for more details and variations of running the algorithm with different 
-hyperparams and on the atari environment as well.
+A3C's `-eid` param sets the experiment name, which is used to create folders for checkpoints and hyperparameters; if omitted, the experiment name is the current date concatenated with a random GUID. Check the argparse help for more details and variations of running the algorithm with different 
+hyperparams.
 
 ## Installation
 
@@ -74,9 +80,11 @@ for episode in range(N_EPISODES):
 
 ### Environment and Task configurations
 
+##### JSON config files and config_dict
+
 The environment is typically defined by a JSON configuration file located on the `gym_ai2thor/config_files` 
-folder. You can find an example `config_example.json` to see how to customize it. Here there is one
-as well:
+folder. See `default_config.json` for a full example of how to customize it. Here is 
+another example:
 
 ```
 # gym_ai2thor/config_files/myconfig.json
@@ -86,6 +94,8 @@ as well:
  'acceptable_receptacles': ['CounterTop', 'TableTop', 'Sink'],
  'openable_objects': ['Microwave'],
  'scene_id': 'FloorPlan28',
+ 'gridSize': 0.1,
+ 'continuous_movement': True,
  'grayscale': True,
  'resolution': (300, 300),
  'task': {'task_name': 'PickUp',
@@ -95,7 +105,11 @@ as well:
 For experimentation it is important to be able to make slight modifications of the environment 
  without having to create a new config file each time. The class `AI2ThorEnv` includes the keyword 
  argument `config_dict`, that allows to input a python dictionary **in addition to** the config file 
- that overrides the parameters described in the config.
+ that overrides the parameters described in the config. In summary, the full interface to the constructor is:  
+
+ `env = AI2ThorEnv(config_file=config_file_name, config_dict=config_dict)` 
+
+##### Tasks and TaskFactory
 
 The tasks are defined in `envs/tasks.py` and allow for particular configurations regarding the 
 rewards given and termination conditions for an episode. You can use the tasks that we defined
@@ -128,11 +142,19 @@ class MoveAheadTask(BaseTask):
 
     def reset(self):
         self.step_num = 0
-``` 
+```
+
+Some tasks allow you to return extra state by implementing the `get_extra_state()` function (e.g. to return a natural language instruction within the state). Again, check 
+`tasks.py` for more details.
+
+##### Examples and Task variants
 
 We encourage you to explore the scripts on the `examples` folder to guide you on the wrapper
  functionalities and explore how to create more customized versions of ai2thor environments and 
  tasks. 
+
+ Most importantly, config files and tasks can be combined to form **Task variants**, e.g. NaturalLanguagePickUpObjectTask restricted so that only 
+ cups and bowls can be picked up, as in `gym_ai2thor/config_files/NL_pickup_bowls_vs_cups_fp1_config.json`.
 
 Here is the desired result of an example task in which the goal of the agent is to place a cup in the 
 sink.
@@ -145,7 +167,7 @@ sink.
 
 ## The Team
 
-[The M Tank](http://www.themtank.org/) is a non-partisan organisation that works solely to recognise the multifaceted 
+[MTank](http://www.themtank.org/) is a non-partisan organisation that works solely to recognise the multifaceted 
 nature of Artificial Intelligence research and to highlight key developments within all sectors affected by these 
 advancements. Through the creation of unique resources, the combination of ideas and their provision to the public, 
 this project hopes to encourage the dialogue which is beginning to take place globally. 

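The README changes above say that `config_dict` is applied **in addition to** the config file and overrides its parameters. The sketch below illustrates that behaviour in isolation. It is not the project's code: `load_config` and `demo` are hypothetical helpers, and the shallow (top-level only) merge is an assumption, since the diff does not show how `AI2ThorEnv` combines the two sources.

```python
import json
import os
import tempfile


def load_config(config_file, config_dict=None):
    """Hypothetical helper: read a JSON config, then apply dict overrides.

    Assumption: overrides replace top-level keys only (shallow merge).
    The real AI2ThorEnv may merge nested keys differently.
    """
    with open(config_file) as f:
        config = json.load(f)
    if config_dict:
        # Keys in config_dict win over keys read from the file.
        config.update(config_dict)
    return config


def demo():
    """Write a small config to a temp file and override one parameter."""
    base = {"scene_id": "FloorPlan28", "grayscale": True, "gridSize": 0.1}
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(base, f)
        path = f.name
    try:
        # Only scene_id is overridden; the other keys come from the file.
        return load_config(path, {"scene_id": "FloorPlan1"})
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

With a shallow merge, overriding a nested entry such as `task` would replace the whole nested dictionary rather than a single key inside it, which is worth checking against the actual constructor before relying on it.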
diff --git a/algorithms/a3c/envs.py → algorithms/a3c/env_atari.py b/algorithms/a3c/envs.py → algorithms/a3c/env_atari.py
@@ -4,11 +4,17 @@
 This contains auxiliary wrappers for the atari openAI gym environment e.g. proper resizing of the
 input frame and a running average normalisation of said frame after resizing
 """
+
+from __future__ import print_function
+
 import cv2
 import gym
 import numpy as np
-from gym.spaces.box import Box
+from gym import spaces
 
+# -----------------
+# Atari preprocessing and wrappers below
+# -----------------
 
 # Taken from https://github.com/openai/universe-starter-agent
 def create_atari_env(env_id):
@@ -18,27 +24,27 @@ def create_atari_env(env_id):
     return env
 
 
-def _process_frame42(frame):
-    frame = frame[34:34 + 160, :160]
-    # Resize by half, then down to 42x42 (essentially mipmapping). If
-    # we resize directly we lose pixels that, when mapped to 42x42,
-    # aren't close enough to the pixel boundary.
-    frame = cv2.resize(frame, (80, 80))
-    frame = cv2.resize(frame, (42, 42))
-    frame = frame.mean(2, keepdims=True)
-    frame = frame.astype(np.float32)
-    frame *= (1.0 / 255.0)
-    frame = np.moveaxis(frame, -1, 0)
-    return frame
-
-
 class AtariRescale42x42(gym.ObservationWrapper):
     def __init__(self, env=None):
         super(AtariRescale42x42, self).__init__(env)
-        self.observation_space = Box(0.0, 1.0, [1, 42, 42])
+        self.observation_space = spaces.Box(0.0, 1.0, [1, 42, 42])
+
+    @staticmethod
+    def _process_frame42(frame):
+        frame = frame[34:34 + 160, :160]
+        # Resize by half, then down to 42x42 (essentially mipmapping). If
+        # we resize directly we lose pixels that, when mapped to 42x42,
+        # aren't close enough to the pixel boundary.
+        frame = cv2.resize(frame, (80, 80))
+        frame = cv2.resize(frame, (42, 42))
+        frame = frame.mean(2, keepdims=True)
+        frame = frame.astype(np.float32)
+        frame *= (1.0 / 255.0)
+        frame = np.moveaxis(frame, -1, 0)
+        return frame
 
     def _observation(self, observation):
-        return _process_frame42(observation)
+        return self._process_frame42(observation)
 
 
 class NormalizedEnv(gym.ObservationWrapper):