8000 Gymnasium migration by Rohan138 · Pull Request #177 · facebookresearch/mbrl-lib · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
This repository was archived by the owner on Sep 1, 2024. It is now read-only.

Gymnasium migration #177

Merged
merged 40 commits into from
Mar 29, 2023
Merged

Gymnasium migration #177

merged 40 commits into from
Mar 29, 2023

Conversation

Rohan138
Copy link
Contributor
@Rohan138 Rohan138 commented Feb 1, 2023

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context / Related issue

Migrating mbrl-lib from gym=0.17 all the way to a combination of gym=0.26.2 and gymnasium=0.26.3.

How Has This Been Tested (if it applies)

Will need thorough testing to ensure I didn't break anything, and also to make sure that performance is retained after refactoring the step API to use terminated and truncated.

Checklist

  • The documentation is up-to-date with the changes I made.
  • I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
  • All tests passed, and additional code has been covered with new tests.

8000
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 1, 2023
@Rohan138
Copy link
Contributor Author
Rohan138 commented Feb 13, 2023

Update: Most files have been changed and fixed, tests with tests/core, tests/pybullet, tests/mujoco, tests/dmcontrol.
Still fixing tests/algorithms.

@Rohan138 Rohan138 marked this pull request as ready for review February 19, 2023 18:34
@Rohan138
Copy link
Contributor Author

@luisenp @natolambert this should be ready for review.

  • Note: I decided to add the --follow-imports=skip flag to mypy for now, typing the whole codebase correctly with all the gymnasium and gym wrappers around dmcontrol and pybullet envs is too large a refactor, hope that's okay?
  • @raghavauppuluri13 is currently running learning experiments with this and the current main branch to verify that the migration does not affect performance; we'll put up results in a day or two.

Copy link
Contributor
@natolambert natolambert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick review, interesting to me. Left some feedback, which maybe can be addressed in comments.

return (
ob,
reward,
done,
terminated,
False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add logic for environments to count number of steps for done, or is that handled elsewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There used to be TimeLimit wrapper to take care of that. I imagine something similar is still being done here.

Copy link
Contributor Author
@Rohan138 Rohan138 Feb 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is handled by the gymnasium.wrappers.TimeLimit wrapper we wrap all our environments with whenever they're instantiatied.

I just dug into this a little deeper though, noticed a minor bug:
https://github.com/facebookresearch/mbrl-lib/blob/main/mbrl/util/mujoco.py#L100
In this one place, max_episode_steps=1000 is hardcoded here, whereas everywhere else, it's max_episode_steps=cfg.overrides.get("trial_length", 1000) e.g. https://github.com/facebookresearch/mbrl-lib/blob/main/mbrl/util/env.py#L94

I'll file this one as an issue separately, fix it in another branch since it's an existing bug not directly related to the Gymnasium migration. #180

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, filing issue sounds good. Thanks!

return (
self._get_obs(),
reward,
done,
terminated,
False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ also resolved with above

@raghavauppuluri13
Copy link
Contributor
raghavauppuluri13 commented Feb 28, 2023

Benchmarking the PR against main. Here are the reward curves for some select envs. The planet algorithms are still training, but you can see they are generally similar. - EDIT: All are finished training
image

@natolambert
Copy link
Contributor

@raghavauppuluri13 lmk if you're compute limited, shouldn't be too hard for me to run some too.

@raghavauppuluri13
Copy link
Contributor
raghavauppuluri13 commented Feb 28, 2023

@raghavauppuluri13 lmk if you're compute limited, shouldn't be too hard for me to run some too.

Do we want to benchmark all the env/algo combos in the examples? Or is this good enough? If so, that would be great. I've got a helper script.

#!/bin/bash

main_envs=("pets_reacher" "pets_pusher" "planet_cartpole_swingup" "planet_finger_spin" "mbpo_halfcheetah" "mbpo_inv_pendulum")
diff_envs=("pets_reacher" "pets_pusher" "planet_cartpole_swingup" "planet_finger_spin" "mbpo_half_cheetah_v4" "mbpo_inv_pendulum_v4")
algos=("pets" "pets" "planet" "planet" "mbpo" "mbpo")
device=("cuda:0" "cuda:0" "cuda:0" "cuda:0" "cuda:0" "cuda:0")
envs=$main_envs
tsp -S 6

for i in ${!envs[@]}; do
    if [[ "${algos[$i]}" == "planet" ]]; then
        tsp python -m mbrl.examples.main algorithm=${algos[$i]} dynamics_model=${algos[$i]} overrides=${envs[$i]} device=${device[$i]}
    else
        tsp python -m mbrl.examples.main algorithm=${algos[$i]} overrides=${envs[$i]} device=${device[$i]}
    fi
done

@raghavauppuluri13
Copy link
Contributor
raghavauppuluri13 commented Mar 2, 2023

@natolambert @luisenp Any other validation steps before the pr is ready to merge? (I updated the benchmarks with the finished training curves)

Copy link
Contributor
@luisenp luisenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also bump the version number and update the CHANGELOG. After that (pending my other question) we should be good to merge.

Copy link
Contributor
@luisenp luisenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some final comments. We can merge after addressing these. Thanks!

@natolambert
Copy link
Contributor

Awesome work all, have been following along happily.

@Rohan138
Copy link
Contributor Author

@luisenp Could I get another review? Should be good to merge now, let me know if there's anything else I can change

@luisenp luisenp merged commit 811c234 into facebookresearch:main Mar 29, 2023
luisenp pushed a commit that referenced this pull request Mar 29, 2023
  * Migrated from gym to Gymnasium
  * gym==0.26.3 is still required for the dm_control and pybullet-gym environments
  * Transition and TranistionBatch now support the terminated and truncated booleans instead of the single done * boolean previously used by gym
  * Migrated calls to env.reset() which now returns a tuple of obs, info instead of just obs
  * Migrated calls to env.step() which now returns a observation, reward, terminated, truncated, info
  * Migrated to Gymnasium render API, environments are instantiated with render_mode=None by default
  * DMC and PyBullet envs use the original gym wrappers to turn them into gym environments, then are wrapper by gymnasium.envs.GymV20Environment
  * All Mujoco envs use the DeepMind Mujoco bindings, mujoco-py is deprecated as a dependency
  * Custom Mujoco envs e.g. AntTruncatedObsEnv inherit from gymnasium.envs.mujoco_env.MujocoEnv, and access data through self.data instead of self.sim.data
  * Mujoco environment versions have been updated to v4 fromv2 e.g. Hopper-v4
  * Fixed PlaNet to save model to a directory instead of a file name
  * Added follow-imports=skip to mypy CI test to allow for gymnasium/gym wrapper compatibility
  * Bumped black to version 0.23.1 in CI
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0