Description
Hello,
I was reviewing your implementation of reference trajectories, and I have a few questions.
In this issue, you mentioned that training for 80k iterations results in a performant policy. I'm curious about how the data is managed during those iterations. Specifically:
- How many reference trajectories are tracked simultaneously during training? Is it 32?
- Have you done any internal testing to determine the optimal number of trajectories to track at the same time?
- Have you considered a curriculum approach, starting with easier trajectories and gradually increasing difficulty? (A sketch of what I mean is below.)
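To make the curriculum question concrete, here is a minimal sketch of the kind of scheme I have in mind. The class, the difficulty proxy (clip length), and the growth schedule are all my own assumptions, not taken from your codebase:

```python
import numpy as np

class TrajectoryCurriculum:
    """Sample reference motions from a pool that grows with training progress."""

    def __init__(self, motion_lengths, start_fraction=0.25, full_pool_at_iter=40_000):
        # Rank motions by a proxy for difficulty (here: clip length, shortest first).
        self.order = np.argsort(motion_lengths)
        self.start_fraction = start_fraction
        self.full_pool_at_iter = full_pool_at_iter

    def active_motion_ids(self, iteration):
        # Linearly widen the sampling pool from start_fraction to 1.0.
        progress = min(1.0, iteration / self.full_pool_at_iter)
        frac = self.start_fraction + (1.0 - self.start_fraction) * progress
        n_active = max(1, int(frac * len(self.order)))
        return self.order[:n_active]

    def sample(self, num_envs, iteration, rng=None):
        # Draw one motion id per environment from the currently active subset.
        rng = rng or np.random.default_rng()
        return rng.choice(self.active_motion_ids(iteration), size=num_envs)


# Example: 200 motion clips, check how the pool grows over 80k iterations.
curriculum = TrajectoryCurriculum(motion_lengths=np.random.uniform(2.0, 15.0, size=200))
for it in (0, 20_000, 40_000, 80_000):
    print(it, len(curriculum.active_motion_ids(it)))
```

Is something along these lines worth trying, or did you find that uniform sampling over the whole motion library works just as well?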
Also, I noticed in the code that reference trajectories are resampled every 1000 seconds (resample_motions_for_envs_interval_s). Does this mean each environment keeps the same trajectory for 1000 seconds of simulated training time before a new one is sampled? How would performance change with 100 seconds instead of 1000? I am concerned that a policy could get stuck with an impossible trajectory for 1000 seconds.
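For context on why I am asking, this is the rough arithmetic behind my concern. Only the parameter name comes from your config; the 50 Hz control rate is an assumption on my part:

```python
# How long an environment is committed to one trajectory, in control steps.
control_dt = 0.02  # assumed policy step (50 Hz)

for interval_s in (1000.0, 100.0):
    steps = int(interval_s / control_dt)
    print(f"resample_motions_for_envs_interval_s = {interval_s}: "
          f"each env keeps its trajectory for ~{steps} control steps")
```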
Thanks for your help.