This is the code for the MSc Project "Learning to Explore Via Meta Reinforcement Learning", submitted by the student Pietro Mazzaglia for the degree of Master of Science in ACS: Artificial Intelligence to the University of Manchester.
The repo mainly contains the implementation of MIME, a gradient-based meta-RL model augmented with strong exploration capabilities.
- Python 3.6 or above (e.g. 3.6.10)
- PyTorch 1.3.1
- Gym 0.17.2
To avoid any conflict with your existing Python setup, I suggest using pyenv with virtualenv.
Create a virtual environment, activate it, and install the requirements listed in `requirements.txt` with the command:
pip install -r requirements.txt
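For example, a possible setup using pyenv with the pyenv-virtualenv plugin could look as follows (the Python version and environment name are only placeholders):

pyenv install 3.6.10

pyenv virtualenv 3.6.10 mime

pyenv activate mime

pip install -r requirements.txt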
The environments used in the MSc Project Results chapter are available through the config files under the folder `configs/msc_project`, and they are:
- 2D Navigation
  - Wide-Ring Goals
  - Hard-Exploration Goals
  - Close-Ring Goals
  - Dog-Feeding Robot
- Swing-Up Pendulum
  - Dense
  - Sparse
- Meta-World
  - ML10
You can use the `train.py` script to train the model:
python train.py --use-vime --adapt-eta --config configs/msc_project/dog-feeding-robot-navigation.yaml --output-folder ../mime-experiments/metaworld_ml10/mime/0 --seed 0 --num-workers 6
The relevant flags to switch between models are:
- `--add-noise`: adds noise to the rewards
- `--use-vime`: activates the exploration module
- `--adapt-eta`: activates the meta-learning of the eta parameter
- `--e-maml`: trains using the E-MAML meta-objective reformulation
To clarify (example commands for these variants are shown below):
- MAML (paper): does not use any of the above flags
- MAML+noise: needs `--add-noise`
- MAML+expl: needs `--use-vime`
- MIME: needs both `--use-vime` and `--adapt-eta`
- E-MAML (paper): needs `--e-maml`
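For instance, reusing the Dog-Feeding Robot config from above (the output folders and seeds here are only examples), the variants could be launched as follows:

MAML (no extra flags):

python train.py --config configs/msc_project/dog-feeding-robot-navigation.yaml --output-folder ../mime-experiments/dog-feeding-robot/maml/0 --seed 0 --num-workers 6

MAML+noise:

python train.py --add-noise --config configs/msc_project/dog-feeding-robot-navigation.yaml --output-folder ../mime-experiments/dog-feeding-robot/maml_noise/0 --seed 0 --num-workers 6

E-MAML:

python train.py --e-maml --config configs/msc_project/dog-feeding-robot-navigation.yaml --output-folder ../mime-experiments/dog-feeding-robot/e-maml/0 --seed 0 --num-workers 6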
Other useful flags are:
- `--seed`: to set the training seed
- `--output-folder`: where to save the output files
Once you have meta-trained the policy, you can test it on the same environment using `test.py`:
python test.py --output-folder ../mime-experiments/metaworld_ml10/mime/4
Both the training and testing results are saved in the indicated output folder, providing:
- `config.json`: the parameters the model was trained with
- `policy.th`: the saved policy model
- `dynamics.th`: the saved dynamics model (only if `--use-vime` was used)
- `train_log.txt`: command-line logs for training
- `train_result.csv`: training data (returns, dynamics loss, eta, ...)
- `test_log.txt`: command-line logs for testing
- `test_result.csv`: testing data (returns, dynamics loss, eta, ...)
- `test`: folder containing the testing Tensorboard files
- `train_trajectories`: folder containing the Tensorboard files to visualize the trajectories (2D Navigation environments only)
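As a minimal sketch of how these artifacts could be inspected programmatically (the exact CSV column names and the saved-model format depend on the training code, so everything below other than the file names is an assumption):

```python
import csv
import torch

output_folder = "../mime-experiments/metaworld_ml10/mime/0"  # example path

# Inspect the training results; the exact column names
# (returns, dynamics loss, eta, ...) depend on the training code.
with open(f"{output_folder}/train_result.csv") as f:
    rows = list(csv.DictReader(f))
print(rows[0].keys())                 # available columns
print(len(rows), "training records")

# Load the saved policy; depending on how it was saved, this may be
# a full module or a state_dict to load into the repo's policy class.
policy = torch.load(f"{output_folder}/policy.th", map_location="cpu")
print(type(policy))
```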
To visualize Tensorboard results you can use:
tensorboard --logdir ../mime-experiments/ --samples_per_plugin images=100
The environments and the config files for PEARL are available under the `pearl` folder.
The original PEARL implementation can be found here.
The initial MAML implementation, which has been reworked and expanded here in many ways, was originally developed by Tristan Deleu as a PyTorch re-implementation of MAML. I would like to sincerely thank him for his clean and well-organised code. His work is available at this repository.
I would also like to thank the researchers who created the Meta-World (paper) benchmark.