Implemented for TensorFlow 2.0+
- All agents write TensorBoard logs during training!
- SAC
- DDPG OU Noise
- SAC Discrete
- Install the dependencies imported by each file (my TF2 conda env is included as a reference)
- Each file contains example code that runs training on the CartPole env
- Training:
python3 TF2_DDPG_LSTM.py
- TensorBoard:
tensorboard --logdir=DDPG/logs
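Under the hood, TF2 training logs like these are written with `tf.summary` (a minimal sketch; the log directory and tag names below are illustrative, not necessarily the exact ones used in the repo):

```python
import tensorflow as tf

# Hypothetical log directory; the agents here write under e.g. DDPG/logs
writer = tf.summary.create_file_writer("logs/demo_run")

with writer.as_default():
    for episode in range(3):
        episode_reward = 100.0 + episode  # placeholder metric
        # Each scalar shows up as a curve in TensorBoard, indexed by step
        tf.summary.scalar("episode_reward", episode_reward, step=episode)
    writer.flush()
```

Point `tensorboard --logdir` at the parent directory to browse the curves.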
- Install hyperopt https://github.com/hyperopt/hyperopt
- Optional: switch the agent used and configure the parameter space in
  hyperparam_tune.py
- Run:
python3 hyperparam_tune.py
All agents were tested using the CartPole env
Name | On/off policy | Model | Action space support | Exploration method |
---|---|---|---|---|
DQN | off-policy | Dense, LSTM | discrete | e-greedy |
DDPG | off-policy | Dense, LSTM | discrete, continuous | OU or Gaussian noise |
AE-DDPG | off-policy | Dense | discrete, continuous | Random walk noise |
SAC | off-policy | Dense | continuous | Maximum entropy |
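The Ornstein-Uhlenbeck (OU) noise listed for DDPG is temporally correlated noise that reverts toward a mean, added to the actor's output for exploration. A minimal NumPy sketch (the parameter defaults are common values from the DDPG literature, not necessarily the repo's):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process for DDPG-style action exploration."""

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu = mu * np.ones(action_dim)
        self.theta = theta  # mean-reversion rate
        self.sigma = sigma  # noise scale
        self.dt = dt
        self.reset()

    def reset(self):
        # Restart the process at the mean (call at the start of each episode)
        self.state = np.copy(self.mu)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.state.shape))
        self.state = self.state + dx
        return self.state
```

Typical use: `action = np.clip(actor_output + noise.sample(), -1, 1)`, with `noise.reset()` at each episode start.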
Models used to generate the demos are included in the repo; you can also find Q-value and reward graphs.
Demos:
- DQN Basic, time step = 4, 500 reward
- DQN LSTM, time step = 4, 500 reward
- DDPG Basic, 222 reward
- DDPG LSTM, time step = 5, 500 reward
- AE-DDPG Basic, 500 reward