
Locally Enforced Optimal Controller (LEOC)

This repo contains an implementation of the paper LEOC: A Principled Method in Integrating Reinforcement Learning and Classical Control Theory submitted to the 3rd Annual Learning for Dynamics & Control Conference (L4DC 2021).

The implementation is based on the PILCO and DDPG frameworks and written in Python 3. The TensorFlow v2, TensorFlow Agents and GPflow v2 packages are used for optimisation and learning.

The rest of this document walks the reader through setting up the implementation and reproducing some of the results in the paper.

Installation

  1. Set up and activate a virtual environment
virtualenv -p python3 venv
source venv/bin/activate
  2. Install the requirements and the package
pip install -r requirements.txt
python setup.py develop

File directory

Before we move on, we take a tour of the file directory. The main training loop is invoked in run.py, configured by a .gin file in data and defined in dao/trainer.py. In this loop, objects associated with the policies are imported from DDPG and pilco. The training loop also saves the trained models in controllers and the training rewards as byte streams in pickle. Scripts for miscellaneous experiments on the trained policies are kept in plotting. A minimal sketch of this flow follows the directory tree below.

LEOC
│   README.md
│   LICENSE
│   requirements.txt
│   setup.py
│   run.py
│
└───DDPG  # files for implementing the DDPG network
│   └───...
│   
└───pilco  # files for implementing the PILCO framework
│   └───...
│   
└───dao  # util files for dependency injection
│   │   trainer.py
│   │   ...
│   │
│   └───envs  # environment files
│       │   cartpole_env.py
│       └───...
│       
└───data  # .gin configuration files
│   │   Cartpole_DDPG_Baseline.gin
│   └───...
│   
└───controllers  # saved trained controllers
│   └───...
│   
└───pickle  # saved training rewards
│   └───...
│   
└───plotting  # scripts for plotting graphs in the paper
│   │   plotter.py
│   └───...
│
└───resources  # resources for README.md
│   └───...
│
└───...
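The snippet below is a minimal, hypothetical sketch of how run.py ties these pieces together. The Trainer class, its train() and save() methods, and the output filenames are illustrative assumptions standing in for the gin-configured objects actually defined in dao/trainer.py, not the repository's exact API.

import argparse
import pickle

import gin


@gin.configurable
class Trainer:  # placeholder standing in for the trainer defined in dao/trainer.py
    def train(self):
        # The real loop imports policy objects from DDPG/pilco and runs training episodes.
        return [0.0]  # dummy reward history

    def save(self, path):
        pass  # the real trainer serialises the learnt controller under controllers/


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-file", type=str, help="path to a .gin configuration file")
    args = parser.parse_args()

    gin.parse_config_file(args.file)   # bind environment/policy hyperparameters from data/

    trainer = Trainer()
    rewards = trainer.train()          # main training loop
    trainer.save("controllers/")       # trained model -> controllers/
    with open("pickle/rewards.p", "wb") as f:
        pickle.dump(rewards, f)        # training rewards -> pickle/


if __name__ == "__main__":
    main()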

Example of usage

Once the dependencies have been installed, one can run the code to train the policies and reproduce the results in the paper.

Training of policies

The experimental Sections 5 & 6 of the paper require the trained baseline PILCO and DDPG controllers, as well as their hybrid counterparts, for subsequent analysis.

There are three environments; in each, a baseline and a hybrid policy are trained under both the PILCO and DDPG frameworks, alongside a linear controller, giving 3 x (2 x 2 + 1) = 15 policies altogether. Each of these policies is configured in a .gin file.

Briefly, the linear controller is an engineered controller designed around the operating point. It can be viewed as a non-updating network with only one layer.

[Figure: the linear controller viewed as a single-layer, non-updating network]
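As a rough illustration (not the repository's implementation), a linear state-feedback law designed around the operating point can be written as a single fixed-weight layer. The gain matrix K, the operating point x_op, and the example numbers below are assumed placeholders.

import numpy as np


class LinearController:
    """Illustrative linear state-feedback law u = -K (x - x_op); not the repo's class."""

    def __init__(self, K, x_op):
        self.K = np.asarray(K)        # engineered gain matrix (placeholder values below)
        self.x_op = np.asarray(x_op)  # operating point the controller is designed around

    def __call__(self, x):
        # Equivalent to one dense layer with frozen weights and no bias or activation.
        return -self.K @ (np.asarray(x) - self.x_op)


# Example: a hypothetical 1-D action from a 4-D CartPole-like state.
controller = LinearController(K=[[1.0, 0.5, 10.0, 2.0]], x_op=[0.0, 0.0, 0.0, 0.0])
print(controller([0.1, 0.0, 0.05, 0.0]))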

The PILCO and DDPG baseline policies are learnt, multi-layer networks.

[Figure: the PILCO and DDPG baseline policies as multi-layer networks]

Finally, our LEOC hybrid policies integrate the linear and non-linear controllers.

[Figure: the LEOC hybrid architecture integrating the linear and non-linear controllers]
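Conceptually, and only as a sketch under assumptions rather than the exact formulation of Section 4, the hybrid policy blends the linear controller with the learnt policy using a state-dependent weight that favours the linear controller near the operating point. The Gaussian-shaped weight and the width parameter below are illustrative choices.

import numpy as np


def hybrid_action(x, linear_ctrl, learnt_ctrl, x_op, width=1.0):
    """Conceptual LEOC-style blend; the actual weighting scheme is defined in the paper."""
    x = np.asarray(x)
    # Assumed mixing weight: close to 1 near the operating point, decaying away from it.
    lam = np.exp(-np.sum((x - np.asarray(x_op)) ** 2) / (2.0 * width ** 2))
    return lam * linear_ctrl(x) + (1.0 - lam) * learnt_ctrl(x)


# Usage with placeholder controllers (e.g. the LinearController above and any learnt policy).
u = hybrid_action(
    x=[0.1, 0.0, 0.05, 0.0],
    linear_ctrl=lambda x: -np.array([0.5]),
    learnt_ctrl=lambda x: np.array([0.2]),
    x_op=[0.0, 0.0, 0.0, 0.0],
)
print(u)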

For more technical details, please refer to Section 4 of the paper.

Training a policy is easy. Simply run run.py in the root directory with the appropriate .gin config file. For instance, to obtain a trained DDPG baseline policy for CartPole, one could run

python3 run.py -file data/Cartpole_DDPG_Baseline.gin

Analogous commands with the respective .gin configuration files can be run to train the other controllers.

Demos

The trained stable policies behave as follows in each of the experimental environments:

[Animations: trained policies in the three experimental environments]

Rewards curve: Figure 6

During training, the training rewards of each policy are dumped into the pickle folder. To visualise these rewards, run

python3 plotting/plot_learning_curve.py
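If you want to inspect a rewards file directly, the sketch below shows one way to load and plot it. The filename and the assumption that each file holds a plain sequence of per-episode rewards are illustrative; the exact pickle layout is determined by the training code.

import pickle

import matplotlib.pyplot as plt

# Hypothetical filename; check the pickle/ folder for the actual names written during training.
with open("pickle/Cartpole_DDPG_Baseline.p", "rb") as f:
    rewards = pickle.load(f)  # assumed: a sequence of per-episode rewards

plt.plot(rewards)
plt.xlabel("episode")
plt.ylabel("reward")
plt.title("Training rewards")
plt.show()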

Transient responses: Figure 7 & Table 1

One can obtain the impulse and step responses of the trained controllers. Note that the current setup requires a trained controller in each of the environments.

python3 plotting/plot_response.py

To output the metrics presented in Table 1, run

python3 plotting/compute_metrics.py
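As background for what such transient-response metrics typically look like (the exact quantities reported in Table 1 are defined in the paper, so the definitions below are generic illustrations rather than the script's implementation), one can compute, for example, overshoot and settling time from a recorded step response:

import numpy as np


def overshoot(response, target):
    """Percentage overshoot of a step response relative to its target value."""
    return 100.0 * max(0.0, np.max(response) - target) / abs(target)


def settling_time(response, t, target, tol=0.02):
    """Time after which the response stays within a +/- tol band around the target."""
    band = tol * abs(target)
    outside = np.where(np.abs(np.asarray(response) - target) > band)[0]
    if len(outside) == 0:
        return t[0]              # already within the band from the start
    if outside[-1] == len(t) - 1:
        return float("nan")      # never settles within the recorded horizon
    return t[outside[-1] + 1]


t = np.linspace(0.0, 10.0, 500)
resp = 1.0 - np.exp(-t) * np.cos(3.0 * t)  # toy second-order-like step response
print(overshoot(resp, 1.0), settling_time(resp, t, 1.0))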

Robustness analysis: Figure 8

Finally, with multiple trained policies in each configuration, we can test their robustness.

python3 plotting/plot_robustness.py

Credits

The pilco folder of this implementation is forked from an existing PILCO repo. Similarly, the DDPG implementation relies heavily on TensorFlow Agents. Credits also go to OpenAI and its gym environments for making testing possible. Neural network/architecture plots have been made with the amazing tools built at PlotNeuralNet.
