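Imitation Learning Baseline Implementations

This project aims to provide clean implementations of imitation and reward learning algorithms. The algorithms below are currently implemented. "Discrete" and "Continuous" indicate whether an algorithm supports discrete or continuous action/state spaces, respectively.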

Algorithm (+ link to paper)	API docs	Discrete	Continuous
Behavioral Cloning	`algorithms.bc`	✅	✅
DAgger	`algorithms.dagger`	✅	✅
Density-Based Reward Modeling	`algorithms.density`	✅	✅
Maximum Causal Entropy Inverse Reinforcement Learning (MCE IRL)	`algorithms.mce_irl`	✅	❌
Adversarial Inverse Reinforcement Learning (AIRL)	`algorithms.airl`	✅	✅
Generative Adversarial Imitation Learning (GAIL)	`algorithms.gail`	✅	✅
Deep RL from Human Preferences	`algorithms.preference_comparisons`	✅	✅
Soft Q Imitation Learning (SQIL)	`algorithms.sqil`	✅	❌
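
As a rough illustration of the simplest entry in the table, behavioral cloning on a discrete state and action space reduces to supervised learning on expert (state, action) pairs. The sketch below is not the library's `algorithms.bc` implementation; `fit_tabular_bc` and the toy demonstrations are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def fit_tabular_bc(demonstrations):
    """Return a state -> action policy that copies the expert's most frequent action.

    `demonstrations` is an iterable of (state, action) pairs with hashable,
    discrete states and actions. (Hypothetical helper, not part of imitation.)
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # For each visited state, pick the action the expert chose most often.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Toy expert data: in state 0 the expert mostly moves "right", in state 1 "left".
demos = [(0, "right"), (0, "right"), (0, "left"), (1, "left"), (1, "left")]
policy = fit_tabular_bc(demos)
print(policy)
```

A table lookup like this only works when states are discrete and revisited; continuous spaces need a function approximator instead, which is the distinction the "Discrete" and "Continuous" columns capture.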

The project documentation is available online.

The latest benchmark results can be viewed online.

Installation

Prerequisites

Python 3.8+
(Optional) OpenGL (to render Gymnasium environments)
(Optional) FFmpeg (to encode videos of renderings)
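
The prerequisites above can be checked before installing. This is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper, and OpenGL is left out because there is no simple portable way to detect it.

```python
import shutil
import sys


def check_requirements():
    """Report which of the listed prerequisites appear to be satisfied."""
    return {
        # Required: Python 3.8 or newer.
        "python>=3.8": sys.version_info >= (3, 8),
        # Optional: FFmpeg is only needed to encode rendered videos.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


status = check_requirements()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```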

Note: imitation supports only the newer gymnasium environment API; the older gym API is not supported.
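
The two APIs differ mainly in what `reset` and `step` return. In Gymnasium, `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`, whereas the older Gym convention returned a bare observation from `reset` and a 4-tuple with a single `done` flag from `step`. The sketch below shows the difference with dependency-free toy classes; `OldStyleCounterEnv` and `NewStyleAdapter` are hypothetical names and are not part of imitation or Gymnasium.

```python
class OldStyleCounterEnv:
    """Toy environment using the old Gym convention (hypothetical example)."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # old API: observation only

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}  # old API: 4-tuple


class NewStyleAdapter:
    """Expose an old-style environment through Gymnasium-style signatures."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed=None, options=None):
        # The toy environment is deterministic, so `seed` and `options` are ignored.
        return self.env.reset(), {}  # new API: (observation, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Old-style time limits were commonly flagged through this info key.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info  # new API: 5-tuple


env = NewStyleAdapter(OldStyleCounterEnv(horizon=2))
obs, info = env.reset(seed=0)
transitions = [env.step(1), env.step(1)]
print(obs, transitions)
```

In practice it is usually simpler to start from an environment that implements the Gymnasium API natively than to wrap an old one by hand.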

Installing the PyPI release

Installing the PyPI release is the standard way to use imitation and is recommended for most users.

pip install imitation

Installing from source

To install from source, clone the repository. For a development setup, install in editable mode with the `dev` extra:

git clone http://github.com/HumanCompatibleAI/imitation && cd imitation
pip install -e ".[dev]"

For regular use, run this instead:

pip install .

The additional extras `tests`, `docs`, `parallel`, and `atari` are also available.

macOS users should also install the following:

brew install coreutils gnu-getopt parallel

CLI Quickstart

Several CLI scripts are provided. For example:

# Train a PPO agent on pendulum and collect expert demonstrations
python -m imitation.scripts.train_rl with pendulum environment.fast policy_evaluation.fast rl.fast fast logging.log_dir=quickstart/rl/

# Train GAIL from the demonstrations
python -m imitation.scripts.train_adversarial gail with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart/rl/rollouts/final.npz demonstrations.source=local

# Train AIRL from the demonstrations
python -m imitation.scripts.train_adversarial airl with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart/rl/rollouts/final.npz demonstrations.source=local
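
The GAIL and AIRL commands above differ only in the algorithm name, so they can be assembled programmatically when launching several runs. The helper below is a hypothetical sketch that only builds the argument list from the options shown above; it does not run anything and does not validate the options against the installed scripts.

```python
import shlex


def build_train_adversarial_cmd(algo, rollout_path, named_configs=("pendulum",), fast=True):
    """Assemble a `train_adversarial` command line like the ones shown above."""
    if algo not in ("gail", "airl"):
        raise ValueError(f"unsupported algorithm: {algo!r}")
    cmd = ["python", "-m", "imitation.scripts.train_adversarial", algo, "with"]
    cmd += list(named_configs)
    if fast:
        # The same "fast" options used in the quickstart commands.
        cmd += ["environment.fast", "demonstrations.fast",
                "policy_evaluation.fast", "rl.fast", "fast"]
    cmd += [f"demonstrations.path={rollout_path}", "demonstrations.source=local"]
    return cmd


cmd = build_train_adversarial_cmd("gail", "quickstart/rl/rollouts/final.npz")
print(shlex.join(cmd))
```

To execute the result, the list can be passed to `subprocess.run(cmd, check=True)` in an environment where imitation is installed and the rollout file already exists.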

Python Interface Quickstart

See `examples/quickstart.py` for an example script.

Density Reward Baseline

An example notebook demonstrates the density-based reward baseline.
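
A density-based reward gives higher reward to states that look more like the expert demonstrations, typically by fitting a density model to the demonstrations and using its log-density as the reward. The following is a minimal one-dimensional sketch with a Gaussian kernel density estimate. It is not the code in `algorithms.density`; `make_density_reward`, the bandwidth, and the toy expert states are assumptions chosen for illustration.

```python
import math


def make_density_reward(expert_states, bandwidth=0.5):
    """Return reward(s) = log of a Gaussian kernel density estimate at s."""
    if not expert_states:
        raise ValueError("need at least one expert state")
    norm = len(expert_states) * bandwidth * math.sqrt(2.0 * math.pi)

    def reward(state):
        total = sum(
            math.exp(-0.5 * ((state - s) / bandwidth) ** 2) for s in expert_states
        )
        # A tiny floor keeps the log finite far away from every demonstration.
        return math.log(max(total / norm, 1e-300))

    return reward


# Toy expert states clustered around 1.0.
reward_fn = make_density_reward([0.9, 1.0, 1.1])
print(reward_fn(1.0), reward_fn(3.0))
```

States near the expert cluster receive a higher reward than distant ones, so a reinforcement learning agent trained on this signal is pushed toward the expert's state distribution.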

Citations (BibTeX)

@misc{gleave2022imitation,
  author = {Gleave, Adam and Taufeeque, Mohammad and Rocamonde, Juan and Jenner, Erik and Wang, Steven H. and Toyer, Sam and Ernestus, Maximilian and Belrose, Nora and Emmons, Scott and Russell, Stuart},
  title = {imitation: Clean Imitation Learning Implementations},
  year = {2022},
  howPublished = {arXiv:2211.11972v1 [cs.LG]},
  archivePrefix = {arXiv},
  eprint = {2211.11972},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2211.11972},
}

Contributing

For details, see the contributing guide.
