The official implementation of <Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning> (APPO).
Install packages with the environment.yml file:
conda env create -f environment.yml
pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld
To install packages manually:
conda create -n appo python=3.8
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install tensorboard ipykernel matplotlib seaborn
pip install "gym[mujoco_py,classic_control]==0.23.0"
pip install pyrallis tqdm
pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld
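A quick way to verify the installation (a minimal sketch, not part of this repository; it only checks that the packages import and that the dial-turn-v2 task is registered):

# sanity_check.py -- a minimal sketch to confirm the installation (not part of this repo).
import gym
import torch
import metaworld

print("gym version:", gym.__version__)        # expect 0.23.0
print("CUDA available:", torch.cuda.is_available())
print("dial-turn-v2 registered:", "dial-turn-v2" in metaworld.ML1.ENV_NAMES)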
The Meta-world medium-replay dataset is available in the official repository of LiRE. The Meta-world medium-expert dataset was collected with the code provided in the official repository of IPL.
The parameters are specified in the configuration files under configs/. Set learning rates, network architectures, batch sizes, and other algorithmic hyperparameters by modifying the config files.
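For illustration, a config can be inspected programmatically before launching a run (a minimal sketch; the actual field names are whatever the YAML files under configs/ define, and the commented-out key below is hypothetical). PyYAML should already be available as a dependency of pyrallis.

# inspect_config.py -- a minimal sketch for viewing a config (not part of this repo).
import yaml

with open("configs/dial-turn-v2/appo.yaml") as f:
    cfg = yaml.safe_load(f)

# Print every hyperparameter defined in the file.
for key, value in cfg.items():
    print(f"{key}: {value}")

# To change a hyperparameter, edit the YAML file directly, e.g.
# cfg["batch_size"] = 512   # hypothetical key name; check the actual file
# with open("configs/dial-turn-v2/appo_custom.yaml", "w") as f:
#     yaml.safe_dump(cfg, f)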
To train the reward model on the dial-turn task:
python reward_learning/learn_reward.py --config=configs/dial-turn-v2/reward.yaml
To train APPO on the dial-turn task:
python appo.py --config=configs/dial-turn-v2/appo.yaml
To train MR on the dial-turn task:
python mr.py --config=configs/dial-turn-v2/mr.yaml
The training results are stored in log/.
All experiments were run with 5 random seeds, and learning curves are smoothed by exponential averaging with a factor of 0.5.
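For reference, we read "exponential averaging with factor 0.5" as the rule s_t = 0.5 * s_{t-1} + 0.5 * x_t; a minimal sketch of that smoothing is below (plotter.ipynb is the authoritative implementation):

# ema_smooth.py -- a minimal sketch of the exponential smoothing used for learning curves.
import numpy as np

def exponential_smooth(values, weight=0.5):
    # s_t = weight * s_{t-1} + (1 - weight) * x_t, initialized at the first value.
    smoothed = np.empty(len(values), dtype=float)
    running = float(values[0])
    for i, x in enumerate(values):
        running = weight * running + (1.0 - weight) * x
        smoothed[i] = running
    return smoothed

# Example: smooth a noisy curve of 100 evaluation points.
curve = np.random.rand(100)
print(exponential_smooth(curve, weight=0.5)[:5])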
Plots are created with plotter.ipynb.
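A rough equivalent of the plotting step is sketched below, assuming the runs under log/ are TensorBoard event files and that a scalar tag such as "eval/success_rate" exists (both are assumptions; check the tags actually written by the training scripts):

# plot_sketch.py -- a rough sketch of the plotting step (plotter.ipynb is the authoritative version).
import glob
import matplotlib.pyplot as plt
import numpy as np
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

TAG = "eval/success_rate"   # hypothetical tag name; inspect the logs for the real one

for run_dir in sorted(glob.glob("log/*")):
    acc = EventAccumulator(run_dir)
    acc.Reload()
    if TAG not in acc.Tags().get("scalars", []):
        continue
    events = acc.Scalars(TAG)
    steps = np.array([e.step for e in events])
    values = np.array([e.value for e in events])
    plt.plot(steps, values, label=run_dir)

plt.xlabel("environment steps")
plt.ylabel(TAG)
plt.legend()
plt.show()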
Our code is based on the official implementation of <Listwise Reward Estimation for Offline Preference-based Reinforcement Learning> (Choi et al., 2024): https://github.com/chwoong/LiRE