Description
Hello,
I was reviewing your implementation of reference trajectories, and I have a few questions.
In this issue, you mentioned that training for 80k iterations results in a performant policy. I'm curious about how the data is managed during those iterations. Specifically:
- How many reference trajectories are tracked simultaneously during training? Is it 32?
- Have you done any internal testing to determine the optimal number of trajectories to track at the same time?
- Have you considered a curriculum approach, starting with easier trajectories and gradually increasing difficulty? (A sketch of what I mean is below.)
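To make the curriculum question concrete, here is a minimal sketch of the kind of scheme I have in mind. The class, the difficulty proxy (clip length), and the growth schedule are all my own assumptions, not taken from your codebase:

```python
import numpy as np

class TrajectoryCurriculum:
    """Sample reference motions from a pool that grows with training progress."""

    def __init__(self, motion_lengths, start_fraction=0.25, full_pool_at_iter=40_000):
        # Rank motions by a proxy for difficulty (here: clip length, shortest first).
        self.order = np.argsort(motion_lengths)
        self.start_fraction = start_fraction
        self.full_pool_at_iter = full_pool_at_iter

    def active_motion_ids(self, iteration):
        # Linearly widen the sampling pool from start_fraction to 1.0.
        progress = min(1.0, iteration / self.full_pool_at_iter)
        frac = self.start_fraction + (1.0 - self.start_fraction) * progress
        n_active = max(1, int(frac * len(self.order)))
        return self.order[:n_active]

    def sample(self, num_envs, iteration, rng=None):
        # Draw one motion id per environment from the currently active subset.
        rng = rng or np.random.default_rng()
        return rng.choice(self.active_motion_ids(iteration), size=num_envs)


# Example: 200 motion clips, check how the pool grows over 80k iterations.
curriculum = TrajectoryCurriculum(motion_lengths=np.random.uniform(2.0, 15.0, size=200))
for it in (0, 20_000, 40_000, 80_000):
    print(it, len(curriculum.active_motion_ids(it)))
```

Is something along these lines worth trying, or did you find that uniform sampling over the whole motion library works just as well?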
Also, I noticed in the code that reference trajectories are resampled every 1000 seconds (resample_motions_for_envs_interval_s). Does this mean each environment keeps the same trajectory for 1000 seconds of simulated training time before a new one is sampled? How would performance change with 100 seconds instead of 1000? I am concerned that a policy could get stuck with an impossible trajectory for 1000 seconds.
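For context on why I am asking, this is the rough arithmetic behind my concern. Only the parameter name comes from your config; the 50 Hz control rate is an assumption on my part:

```python
# How long an environment is committed to one trajectory, in control steps.
control_dt = 0.02  # assumed policy step (50 Hz)

for interval_s in (1000.0, 100.0):
    steps = int(interval_s / control_dt)
    print(f"resample_motions_for_envs_interval_s = {interval_s}: "
          f"each env keeps its trajectory for ~{steps} control steps")
```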
Thanks for your help.