Gymnasium migration #177

Rohan138 · 2023-02-01T03:07:55Z

Types of changes

Docs change / refactoring / dependency upgrade
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context / Related issue

Migrating mbrl-lib from gym=0.17 all the way to a combination of gym=0.26.2 and gymnasium=0.26.3.

How Has This Been Tested (if it applies)

Will need thorough testing to ensure I didn't break anything, and also to make sure that performance is retained after refactoring the step API to use terminated and truncated.

Checklist

The documentation is up-to-date with the changes I made.
I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
All tests passed, and additional code has been covered with new tests.

Rohan138 · 2023-02-13T16:52:45Z

Update: Most files have been changed and fixed, and tested with tests/core, tests/pybullet, tests/mujoco, tests/dmcontrol.
Still fixing tests/algorithms.

Rohan138 · 2023-02-19T18:51:10Z

@luisenp @natolambert this should be ready for review.

Note: I decided to add the --follow-imports=skip flag to mypy for now. Typing the whole codebase correctly with all the gymnasium and gym wrappers around the dmcontrol and pybullet envs is too large a refactor. Hope that's okay?
@raghavauppuluri13 is currently running learning experiments with this and the current main branch to verify that the migration does not affect performance; we'll put up results in a day or two.

natolambert

Did a quick review, interesting to me. Left some feedback, which maybe can be addressed in comments.

natolambert · 2023-02-20T00:30:50Z

mbrl/env/ant_truncated_obs.py

        return (
            ob,
            reward,
-            done,
+            terminated,
+            False,


Do we need to add logic for environments to count number of steps for done, or is that handled elsewhere?

There used to be TimeLimit wrapper to take care of that. I imagine something similar is still being done here.

This is handled by the gymnasium.wrappers.TimeLimit wrapper we wrap all our environments with whenever they're instantiated.
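To illustrate the division of labour described here, the following is a minimal pure-Python sketch, not the library's code. `ToyEnv`, `StepLimit`, and `run_episode` are hypothetical names made up for this example. `StepLimit` only imitates the idea behind a time-limit wrapper: the environment itself always returns `False` for `truncated`, and the wrapper flips it once the step budget is used up.

```python
class ToyEnv:
    """Hypothetical environment that never terminates on its own."""

    def reset(self):
        # New-style reset returns (obs, info).
        return 0.0, {}

    def step(self, action):
        # New-style step returns (obs, reward, terminated, truncated, info).
        # Like the envs in this PR, it always reports truncated=False.
        return 0.0, 1.0, False, False, {}


class StepLimit:
    """Simplified imitation of a time-limit wrapper (an assumption, not library code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed += 1
        # The wrapper, not the env, decides when the step budget is exhausted.
        if self._elapsed >= self.max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


def run_episode(env):
    """Step until the episode ends; return (steps, terminated, truncated)."""
    env.reset()
    steps = 0
    terminated = truncated = False
    while not (terminated or truncated):
        _, _, terminated, truncated, _ = env.step(None)
        steps += 1
    return steps, terminated, truncated
```

With a budget of five steps, `run_episode(StepLimit(ToyEnv(), 5))` ends through truncation rather than termination, which is why the environments themselves can return a constant `False` in that slot.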

I just dug into this a little deeper, though, and noticed a minor bug:
https://github.com/facebookresearch/mbrl-lib/blob/main/mbrl/util/mujoco.py#L100
In this one place, max_episode_steps=1000 is hardcoded, whereas everywhere else it's max_episode_steps=cfg.overrides.get("trial_length", 1000), e.g. https://github.com/facebookresearch/mbrl-lib/blob/main/mbrl/util/env.py#L94
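The difference between the two call sites can be sketched as follows. This is an illustration under assumptions: a plain dict stands in for `cfg.overrides`, and `max_episode_steps_from` is a hypothetical helper, not a function in mbrl-lib.

```python
DEFAULT_TRIAL_LENGTH = 1000


def max_episode_steps_from(overrides):
    """Read the step limit from a config mapping, falling back to the default.

    `overrides` is a plain dict standing in for cfg.overrides (an assumption).
    """
    return overrides.get("trial_length", DEFAULT_TRIAL_LENGTH)


# A config that sets trial_length is respected...
configured = max_episode_steps_from({"trial_length": 200})
# ...while a config without it falls back to 1000. A hardcoded value
# would return 1000 in both cases and silently ignore the override.
fallback = max_episode_steps_from({})
```

A hardcoded `max_episode_steps=1000` behaves like the fallback branch in every case, so a user-supplied `trial_length` would have no effect at that one call site.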

I'll file this as a separate issue and fix it in another branch, since it's an existing bug not directly related to the Gymnasium migration. #180

Good catch, filing issue sounds good. Thanks!

natolambert · 2023-02-20T00:33:13Z

mbrl/env/humanoid_truncated_obs.py

        return (
            self._get_obs(),
            reward,
-            done,
+            terminated,
+            False,


same as above.

^ also resolved with above

mbrl/third_party/pytorch_sac_pranz24/sac.py

tests/mujoco/test_util.py

Fixed errors in notebooks after Gymnasium migration

raghavauppuluri13 · 2023-02-28T17:43:51Z

Benchmarking the PR against main. Here are the reward curves for some select envs. ~~The planet algorithms are still training, but you can see they are generally similar.~~ - EDIT: All are finished training

natolambert · 2023-02-28T18:15:50Z

@raghavauppuluri13 lmk if you're compute limited, shouldn't be too hard for me to run some too.

raghavauppuluri13 · 2023-02-28T18:32:00Z

> @raghavauppuluri13 lmk if you're compute limited, shouldn't be too hard for me to run some too.

Do we want to benchmark all the env/algo combos in the examples? Or is this good enough? If so, that would be great. I've got a helper script.

#!/bin/bash

# Override names per run: main_envs for main, diff_envs for this branch (assumed from the names).
main_envs=("pets_reacher" "pets_pusher" "planet_cartpole_swingup" "planet_finger_spin" "mbpo_halfcheetah" "mbpo_inv_pendulum")
diff_envs=("pets_reacher" "pets_pusher" "planet_cartpole_swingup" "planet_finger_spin" "mbpo_half_cheetah_v4" "mbpo_inv_pendulum_v4")
algos=("pets" "pets" "planet" "planet" "mbpo" "mbpo")
device=("cuda:0" "cuda:0" "cuda:0" "cuda:0" "cuda:0" "cuda:0")
# Copy the whole array; a plain envs=$main_envs would keep only the first element.
envs=("${main_envs[@]}")
# Allow task spooler to run up to 6 jobs at once.
tsp -S 6

for i in "${!envs[@]}"; do
    if [[ "${algos[$i]}" == "planet" ]]; then
        # PlaNet also needs its own dynamics model config.
        tsp python -m mbrl.examples.main algorithm="${algos[$i]}" dynamics_model="${algos[$i]}" overrides="${envs[$i]}" device="${device[$i]}"
    else
        tsp python -m mbrl.examples.main algorithm="${algos[$i]}" overrides="${envs[$i]}" device="${device[$i]}"
    fi
done

raghavauppuluri13 · 2023-03-02T13:37:04Z

@natolambert @luisenp Any other validation steps before the PR is ready to merge? (I updated the benchmarks with the finished training curves.)

luisenp

Let's also bump the version number and update the CHANGELOG. After that (pending my other question) we should be good to merge.

luisenp

Left some final comments. We can merge after addressing these. Thanks!

CHANGELOG.md

mbrl/__init__.py

mbrl/util/replay_buffer.py

natolambert · 2023-03-09T22:19:34Z

Awesome work all, have been following along happily.

Rohan138 · 2023-03-27T03:03:28Z

@luisenp Could I get another review? Should be good to merge now; let me know if there's anything else I can change.

* Migrated from gym to Gymnasium
* gym==0.26.3 is still required for the dm_control and pybullet-gym environments
* Transition and TransitionBatch now support the terminated and truncated booleans instead of the single done boolean previously used by gym
* Migrated calls to env.reset(), which now returns a tuple of obs, info instead of just obs
* Migrated calls to env.step(), which now returns observation, reward, terminated, truncated, info
* Migrated to the Gymnasium render API; environments are instantiated with render_mode=None by default
* DMC and PyBullet envs use the original gym wrappers to turn them into gym environments, then are wrapped by gymnasium.envs.GymV20Environment
* All Mujoco envs use the DeepMind Mujoco bindings; mujoco-py is deprecated as a dependency
* Custom Mujoco envs, e.g. AntTruncatedObsEnv, inherit from gymnasium.envs.mujoco_env.MujocoEnv and access data through self.data instead of self.sim.data
* Mujoco environment versions have been updated from v2 to v4, e.g. Hopper-v4
* Fixed PlaNet to save model to a directory instead of a file name
* Added follow-imports=skip to mypy CI test to allow for gymnasium/gym wrapper compatibility
* Bumped black to version 0.23.1 in CI
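The reset/step changes listed above can be sketched with a toy rollout loop. Everything here is illustrative: `CountdownEnv`, `ToyTransition`, and `collect_episode` are hypothetical names, and `ToyTransition` is only a stand-in for the idea of storing both flags; it is not mbrl-lib's `Transition` type.

```python
from typing import Any, Dict, List, NamedTuple, Tuple


class ToyTransition(NamedTuple):
    """Hypothetical record that keeps terminated and truncated separate."""

    obs: float
    act: float
    next_obs: float
    reward: float
    terminated: bool
    truncated: bool


class CountdownEnv:
    """Toy env with new-style signatures; terminates when the counter hits 0."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.state = start

    def reset(self) -> Tuple[float, Dict[str, Any]]:
        self.state = self.start
        return float(self.state), {}

    def step(self, action: float) -> Tuple[float, float, bool, bool, Dict[str, Any]]:
        self.state -= 1
        terminated = self.state <= 0
        # truncated stays False here; a time-limit wrapper would set it.
        return float(self.state), 1.0, terminated, False, {}


def collect_episode(env: CountdownEnv) -> List[ToyTransition]:
    obs, _info = env.reset()  # old API: obs = env.reset()
    transitions: List[ToyTransition] = []
    terminated = truncated = False
    while not (terminated or truncated):  # old API: while not done
        action = 0.0
        next_obs, reward, terminated, truncated, _info = env.step(action)
        transitions.append(
            ToyTransition(obs, action, next_obs, reward, terminated, truncated)
        )
        obs = next_obs
    return transitions


episode = collect_episode(CountdownEnv(start=3))
```

Keeping the two flags apart in the stored record lets downstream code tell a genuine terminal state from an episode that was merely cut off by a step limit.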

Rohan138 added 10 commits January 29, 2023 18:49

wip

281c00a

wip

0077789

wip

00eb286

wip

ecd1597

wip

2734ef6

wip

3a0811e

drop python 3.11 for now

ec638f5

wip

8665ded

wip

61e17f1

wip

1a491b2

facebook-github-bot added the CLA Signed label Feb 1, 2023

Rohan138 added 6 commits February 9, 2023 10:04

Merge branch 'main' into gymnasium

9c68439

wip

206dfd7

wip

46fa18a

wip

f1c8244

wip

a2a9d1d

wip

61d6c60

Rohan138 and others added 11 commits February 13, 2023 12:08

wip

09caf18

wip - passes tests/algorithms

669af65

wip

dca254f

wip

0d0d203

wip

88a8040

wip

3bfa309

wip

dec83fa

Merge branch 'facebookresearch:main' into gymnasium

edaa68f

wip

83a817e

Merge branch 'gymnasium' of https://github.com/Rohan138/mbrl-lib into gymnasium

611e383

wip

d777e5a

Rohan138 marked this pull request as ready for review February 19, 2023 18:34

Rohan138 added 2 commits February 19, 2023 13:38

wip

6f01689

wip

4082783

natolambert reviewed Feb 20, 2023

luisenp and others added 2 commits February 20, 2023 10:46

Fixed errors in notebooks after Gymnasium migration.

35f85f8

wip

9e182d4

Rohan138 mentioned this pull request Feb 20, 2023

[Bug] max_episode_steps=1000 is hardcoded in mujoco.py #180

Open

Rohan138 and others added 6 commits February 20, 2023 14:14

wip

e28cafc

wip

9afe539

Merge pull request #2 from facebookresearch/lep.fix_pets_notebook

b63c1ed

Fixed errors in notebooks after Gymnasium migration

wip

7859750

wip

d18dde3

updated cfgs to support new wrappers

ac9abf1

raghavauppuluri13 mentioned this pull request Mar 1, 2023

[Bug Report] Bug when "freezing" select mujoco envs Farama-Foundation/Gymnasium#352

Closed

luisenp suggested changes Mar 2, 2023

Rohan138 added 2 commits March 3, 2023 17:34

wip

845ba05

wip

ac58d46

luisenp suggested changes Mar 9, 2023

Update changelog

aa76f04

luisenp merged commit 811c234 into facebookresearch:main Mar 29, 2023
