-
Notifications
You must be signed in to change notification settings - Fork 165
Conversation
Update: Most files have been changed and fixed, tests with |
@luisenp @natolambert this should be ready for review.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick review, interesting to me. Left some feedback, which maybe can be addressed in comments.
return ( | ||
ob, | ||
reward, | ||
done, | ||
terminated, | ||
False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add logic for environments to count number of steps for done
, or is that handled elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There used to be TimeLimit
wrapper to take care of that. I imagine something similar is still being done here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is handled by the gymnasium.wrappers.TimeLimit
wrapper we wrap all our environments with whenever they're instantiatied.
I just dug into this a little deeper though, noticed a minor bug:
https://github.com/facebookresearch/mbrl-lib/blob/main/mbrl/util/mujoco.py#L100
In this one place, max_episode_steps=1000
is hardcoded here, whereas everywhere else, it's max_episode_steps=cfg.overrides.get("trial_length", 1000)
e.g. https://github.com/facebookresearch/mbrl-lib/blob/main/mbrl/util/env.py#L94
I'll file this one as an issue separately, fix it in another branch since it's an existing bug not directly related to the Gymnasium migration. #180
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, filing issue sounds good. Thanks!
return ( | ||
self._get_obs(), | ||
reward, | ||
done, | ||
terminated, | ||
False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ also resolved with above
Fixed errors in notebooks after Gymnasium migration
@raghavauppuluri13 lmk if you're compute limited, shouldn't be too hard for me to run some too. |
Do we want to benchmark all the env/algo combos in the examples? Or is this good enough? If so, that would be great. I've got a helper script.
|
@natolambert @luisenp Any other validation steps before the pr is ready to merge? (I updated the benchmarks with the finished training curves) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also bump the version number and update the CHANGELOG. After that (pending my other question) we should be good to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some final comments. We can merge after addressing these. Thanks!
Awesome work all, have been following along happily. |
@luisenp Could I get another review? Should be good to merge now, let me know if there's anything else I can change |
* Migrated from gym to Gymnasium * gym==0.26.3 is still required for the dm_control and pybullet-gym environments * Transition and TranistionBatch now support the terminated and truncated booleans instead of the single done * boolean previously used by gym * Migrated calls to env.reset() which now returns a tuple of obs, info instead of just obs * Migrated calls to env.step() which now returns a observation, reward, terminated, truncated, info * Migrated to Gymnasium render API, environments are instantiated with render_mode=None by default * DMC and PyBullet envs use the original gym wrappers to turn them into gym environments, then are wrapper by gymnasium.envs.GymV20Environment * All Mujoco envs use the DeepMind Mujoco bindings, mujoco-py is deprecated as a dependency * Custom Mujoco envs e.g. AntTruncatedObsEnv inherit from gymnasium.envs.mujoco_env.MujocoEnv, and access data through self.data instead of self.sim.data * Mujoco environment versions have been updated to v4 fromv2 e.g. Hopper-v4 * Fixed PlaNet to save model to a directory instead of a file name * Added follow-imports=skip to mypy CI test to allow for gymnasium/gym wrapper compatibility * Bumped black to version 0.23.1 in CI
Types of changes
Motivation and Context / Related issue
Migrating mbrl-lib from
gym=0.17
all the way to a combination ofgym=0.26.2
andgymnasium=0.26.3
.How Has This Been Tested (if it applies)
Will need thorough testing to ensure I didn't break anything, and also to make sure that performance is retained after refactoring the
step
API to useterminated
andtruncated
.Checklist