A PyTorch implementation for the experiments in "Rejecting Hallucinated State Targets during Planning" -- an ICML 2025 conference paper by Mingde "Harry" Zhao, Tristan Sylvain, Romain Laroche, Doina Precup, and Yoshua Bengio.
BibTeX:
@inproceedings{
zhao2025reject,
title={Rejecting Hallucinated State Targets during Planning},
author={Mingde Zhao and Tristan Sylvain and Romain Laroche and Doina Precup and Yoshua Bengio},
booktitle={Forty-Second International Conference on Machine Learning (ICML)},
year={2025},
url={https://openreview.net/forum?id=40gBawg6LX},
note={\url{https://arxiv.org/abs/2410.07096}},
}
This repo was implemented by Harry Zhao (@PwnerHarry), mostly adapted from Skipper and DreamerV2-pytorch.
This work was initiated during Harry's Mitacs Internship at RBC Borealis (formerly Borealis AI), under the mentorship of Tristan Sylvain (@TiSU32).
- Create a virtual environment with `conda` or `venv`. We used Python 3.10 for the `minigrid` experiments. Note that for the Dreamer experiments, Python <= 3.10 is needed for compatibility with `pip install ale-py==0.7.5` (from `pip install -r experiments/Dreamer/requirements.txt`).
- Install PyTorch according to the official guidelines, and make sure it recognizes your GPUs!
- Run `pip install -r requirements.txt` to install the dependencies. For Skipper, LEAP, and Dyna on `RandDistShift` and `SwordShieldMonster`, one shared virtual environment is sufficient; the Dreamer experiments need a separate one for their distinctive requirements.
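
For illustration, a typical setup with `conda` might look like the sketch below (environment names are placeholders, and the PyTorch install line should be replaced by the command from the official guide for your CUDA version):

```bash
# shared environment for Skipper / LEAP / Dyna on RandDistShift and SwordShieldMonster
conda create -n rejection python=3.10 -y
conda activate rejection
pip install torch  # replace with the install command from the official PyTorch guide
pip install -r requirements.txt

# separate environment for the Dreamer experiments
conda create -n rejection-dreamer python=3.10 -y
conda activate rejection-dreamer
pip install -r experiments/Dreamer/requirements.txt
```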
- `experiments/Skipper/run_minigrid_mp.py`: a multi-processed experiment initializer for Skipper variants in the minigrid experiments
- `experiments/{Skipper,Dyna}/run_minigrid.py`: a single-processed experiment initializer for Skipper or Dyna minigrid experiments
- `experiments/LEAP/run_leap_pretrain_vae.py`: a single-processed experiment initializer for pretraining the generator of the LEAP agent
- `experiments/LEAP/run_leap_pretrain_rl.py`: a single-processed experiment initializer for training the distance estimator (policy) of the LEAP agent; provide the existing seed obtained from `run_leap_pretrain_vae.py`
- `experiments/Dreamer/dreamerv2/train.py`: for running the PyTorch DreamerV2; use `--evaluator True` and `--evaluator_reject True` to enable the training of and rejection by the evaluator, respectively
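
As a rough sketch, invocations of these scripts could look like the following; apart from `--evaluator` and `--evaluator_reject`, the flag names shown (e.g. `--seed`) are placeholders, so consult each script's argument definitions for the exact names:

```bash
# Skipper variants on minigrid (multi-processed)
python experiments/Skipper/run_minigrid_mp.py

# LEAP: pretrain the generator first, then train the distance estimator
# (--seed is a placeholder; reuse the seed produced by the VAE pretraining run)
python experiments/LEAP/run_leap_pretrain_vae.py --seed 42
python experiments/LEAP/run_leap_pretrain_rl.py --seed 42

# DreamerV2 with evaluator training and target rejection enabled
python experiments/Dreamer/dreamerv2/train.py --evaluator True --evaluator_reject True
```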
For `minigrid` experiments, please read the argument definitions in `runtime.py` carefully and pass the desired arguments.
Use `--hindsight_strategy` to specify the hindsight relabeling strategy. The options are:
- `future`: same as the "future" variant in the paper
- `episode`: same as the "episode" variant in the paper
- `pertask`: same as the "pertask" variant in the paper
- `future+episode`: corresponds to the "E" variant in the paper
- `future+pertask`: corresponds to the "P" variant in the paper
- `future+episode@0.5`: corresponds to the "(E+P)" variant in the paper, where `0.5` controls the mixture ratio of `pertask`
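
For example, a Skipper run using the "(E+P)" strategy might be launched as follows (assuming the multi-processed initializer and leaving the other arguments at their defaults):

```bash
python experiments/Skipper/run_minigrid_mp.py --hindsight_strategy future+episode@0.5
```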
To use the "generate" strategy for estimator training, use `--prob_relabel_generateJIT` to specify the probability of replacing the relabeled target:
- `--hindsight_strategy future+episode --prob_relabel_generateJIT 1.0`: corresponds to the "G" variant in the paper
- `--hindsight_strategy future+episode --prob_relabel_generateJIT 0.5`: corresponds to the "(E+G)" variant in the paper
- `--hindsight_strategy future+episode@0.333 --prob_relabel_generateJIT 0.25`: corresponds to the "(E+P+G)" variant in the paper
`--game SwordShieldMonster --size_world 12 --num_envs_train 50`: `game` can be switched to `RandDistShift` (RDS), and `size_world` should be >= 8.
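
Putting these together, a full command for the "G" variant on `SwordShieldMonster` could look like the sketch below (again assuming the multi-processed Skipper initializer accepts all of these flags):

```bash
python experiments/Skipper/run_minigrid_mp.py \
    --game SwordShieldMonster --size_world 12 --num_envs_train 50 \
    --hindsight_strategy future+episode --prob_relabel_generateJIT 1.0
```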
- There is a potential `CUDA_INDEX_ASSERTION` error that could cause hanging at the beginning of the *Skipper* runs. We don't yet know how to fix it.
- The Dynamic Programming (DP) solutions for `minigrid` ground truth are only compatible with deterministic experiments.