Simulate latency v2 and use a policy wrapper to handle latency #272


Draft · alexander-soare wants to merge 22 commits into main
Conversation

@alexander-soare (Contributor) commented Jun 14, 2024:

What this does

At its core, this PR adds PolicyRolloutWrapper: a way of interfacing between a policy, which has a notion of time steps but no real notion of time, and an environment, which does care about time. Here are the major changes that go along with it:

  • eval.py::rollout now interacts with PolicyRolloutWrapper.
  • eval.py::rollout has a flag to simulate latency. This is a matter of providing a timeout option to the PolicyRolloutWrapper instance, and managing a next_action across loop iterations.
  • I removed select_actions from the policy protocol. The policy now just accepts observation sequences and returns action sequences. The wrapper handles caching.

Caveat: We assume that all policies return action sequences starting from the current time. I believe the design is such that this assumption can be relaxed without too much work.
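For orientation, here is a minimal sketch of the wrapper's interface, not the actual implementation: it assumes the provide_observation_get_actions method, timeout parameter, and n_action_buffer attribute discussed later in this conversation, and policy.run_inference is a hypothetical stand-in for however the policy maps an observation to an action sequence.

```python
from __future__ import annotations

import concurrent.futures
from concurrent.futures import ThreadPoolExecutor


class PolicyRolloutWrapper:
    """Sketch: mediates between a step-based policy and a wall-clock environment.

    Actions are cached by timestep; inference re-runs whenever the number of
    cached future actions drops to `n_action_buffer` or below.
    """

    def __init__(self, policy, n_action_buffer: int = 0):
        self.policy = policy
        self.n_action_buffer = n_action_buffer
        self.action_cache: dict[int, object] = {}  # timestep -> action
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._future: concurrent.futures.Future | None = None

    def provide_observation_get_actions(self, observation, timestep: int, timeout: float | None = None):
        """Return the action for `timestep`, running inference when the cache runs low.

        With timeout=None the call blocks until inference finishes (needed for the
        very first call, when the cache is empty). Otherwise we wait at most
        `timeout` seconds for a fresher action sequence, then fall back to the cache.
        """
        n_remaining = sum(1 for t in self.action_cache if t >= timestep)
        if n_remaining <= self.n_action_buffer and (self._future is None or self._future.done()):
            self._future = self._executor.submit(self._run_inference, observation, timestep)
        if timeout is None:
            self._future.result()  # synchronous: block until the cache is populated
        elif not self._future.done():
            try:
                self._future.result(timeout=timeout)
            except concurrent.futures.TimeoutError:
                pass  # inference still running: serve the previously cached actions
        return self.action_cache[timestep]

    def _run_inference(self, observation, timestep: int):
        # Per the caveat above, the policy's action sequence starts from the
        # current time, so cache entries are keyed from `timestep` onwards.
        for i, action in enumerate(self.policy.run_inference(observation)):
            self.action_cache[timestep + i] = action
```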

TODO:

  • Follow through with ACT/TD-MPC.
  • Update examples.
  • Add an example?

How it was tested

TODO:

  • Thorough CI tests to ensure that PolicyRolloutWrapper behaves as expected.
  • Test that I still get the same eval results as before with our hub policies.

How to checkout & try? (for the reviewer)

Run this to get a regular rollout with no latency simulation. It should behave as before.

python lerobot/scripts/eval.py -p lerobot/diffusion_pusht eval.n_episodes=2 eval.batch_size=2 env.episode_length=300 policy.n_action_steps=15 policy.num_inference_steps=10 +policy.noise_scheduler_type=DDIM +eval.do_simulate_latency=false +eval.n_action_buffer=0
[video: eval_episode_0.mp4]

Run this to get a simulation with latency being simulated, and pre-emptive inference (notice how the trajectories are updated before they are depleted):

python lerobot/scripts/eval.py -p lerobot/diffusion_pusht eval.n_episodes=2 eval.batch_size=2 env.episode_length=300 policy.n_action_steps=15 policy.num_inference_steps=10 +policy.noise_scheduler_type=DDIM +eval.do_simulate_latency=true +eval.n_action_buffer=4
[video: eval_episode_1.mp4]

Play around with policy.num_inference_steps and eval.n_action_buffer to appreciate the interplay between inference time and the need for pre-emptive inference: the longer inference takes, the larger the buffer you need. For example, try policy.num_inference_steps=100 with +eval.n_action_buffer=2. You will find that things go bad fast:

[video: eval_episode_0.mp4]


alexander-soare marked this pull request as draft June 14, 2024 14:37
- The diffusion model generates `horizon` steps worth of actions.
- `n_action_steps` worth of actions are actually kept for execution, starting from the current step.
Schematically this looks like:
----------------------------------------------------------------------------------------------
@alexander-soare (Contributor, Author) commented on the diff:

TODO: Put part of this documentation back in somehow as it's helpful

@@ -84,6 +84,8 @@ eval:
batch_size: 1
# `use_async_envs` specifies whether to use asynchronous environments (multiprocessing).
use_async_envs: false
n_action_buffer: 0
@alexander-soare (Contributor, Author) commented on the diff:

TODO: Point to documentation

alexander-soare requested a review from Cadene June 14, 2024 14:54
timeout_ = timeout - elapsed - buffer
if timeout_ > 0:
self._future.result(timeout=timeout_)
except TimeoutError:
@alexander-soare (Contributor, Author) commented Jun 17, 2024:

TODO: Put this try/except just outside the self._future.result(timeout=timeout_).
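For illustration, here is a sketch of that restructure; it reuses the fragment above verbatim (the surrounding method and the fallback behaviour in the except body are assumed, not shown in the diff).

```python
import concurrent.futures

timeout_ = timeout - elapsed - buffer
if timeout_ > 0:
    try:
        # Narrow the try/except so that only the blocking wait can raise
        # TimeoutError; exceptions from the surrounding logic are no longer caught.
        self._future.result(timeout=timeout_)
    except concurrent.futures.TimeoutError:
        pass  # assumed fallback: keep serving actions from the cache
```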

3. Returns a sequence of actions starting from the requested timestamp (from a cache which is
populated by inference outputs).

If `timeout` is not provided, inference is run synchronously. If `timeout` is provided, inference runs
@ruijie-he (Contributor):

I think this timeout param is confusing. If I understand it correctly, the whole point of the PolicyRolloutWrapper is that inference is likely to take longer than 1/fps (e.g. 0.1s). As such, when this method decides to run inference, there should be no expectation that inference completes within 0.1s and returns an action sequence.

I would therefore propose that when inference is called, there should be no "attempt to wait for inference to complete, within the bounds of the timeout parameter". It should just run inference asynchronously, and continue to return the existing action_sequence. However, the timeout parameter should only check that the asynchronous inference process runs within the specified timeout period (e.g. 0.5s), and throw an error / warning if inference duration exceeds that timeout value. This check is merely to ensure that the action buffer doesn't get drawn down to 0 before a new set of actions is added to the buffer.

@alexander-soare (Contributor, Author) commented Aug 12, 2024:

Hi @ruijie-he thanks for commenting. Let me try to unpack/address your comment.

> there should be no expectation that inference completes within 0.1s and returns an action sequence

I agree.

> I would therefore propose that when inference is called, there should be no "attempt to wait for inference to complete, within the bounds of the timeout parameter"

Why not? If we are running with 1/fps=0.1 we can give ourselves up to that amount of time to get an updated action sequence. For example, say a previously invoked inference is still running at the moment I call provide_observation_get_actions. It takes a trivial amount of time to look up an action from the cache. By your reasoning, we should return it immediately. But it's not going to be used for another ~0.0999 s. Now, what if the previously invoked inference was just about to finish (say, in 0.05 s)? Wouldn't we rather use the fresher action trajectory? And what does it cost us to wait? Nothing really, right?

@ruijie-he (Contributor):

Thanks. I think it depends on the expectation of when an action is executed after an observation is provided. My expectation is that we use an action as soon as it gets returned by this method, whereas it sounds like your expectation is that it is only executed at the end of that time window.

Because we use it as soon as we get an action, the varying time taken for inference matters, since it is important that actions are triggered at a regular time interval (i.e. 0.1s), rather than with variations across intervals (e.g. 0.06s, 0.14s, 0.08s, 0.12s, etc.).

@alexander-soare (Contributor, Author):

> My expectation is that we use an action as soon as it gets returned by this method

So you don't have a control cycle you need to abide by?

> it sounds like your expectation is that it is only executed at the end of that time window.

I think my expectation is that we execute actions at a fixed periodic interval.

> since it is important that actions are triggered at a regular time interval

Yes this is exactly what I mean.

Let me try to illustrate what I mean with a diagram. Thanks for helping me get to the bottom of it.

[video: video.mp4]

@ruijie-he (Contributor):

Thanks for the video explanation. I agree that we can just have an external control cycle so that we can execute the latest available action at a fixed interval.

I think that my broader point is that it's more important to have a timeout parameter that is longer than the action / observation time interval of 0.1s.
This timeout parameter (e.g. 0.5s) is needed to check that the asynchronous inference process runs within the specified timeout period, and to throw an error / warning if inference duration exceeds that timeout value. This check is necessary to ensure that the action buffer doesn't get drawn down to 0 before a new set of actions is added to the buffer. If inference returns in less than 0.1s, great, then we'd use the new action, but that's less important than ensuring that inference always completes in time to refill the action buffer.

@alexander-soare (Contributor, Author) commented Aug 14, 2024:

So it seems your setup is: you have a 10 Hz control loop, but inference takes 0.5 s. Let's say you are producing action horizons of 15, so you probably want inference to run whenever you're down to 6 actions left in the cache. So here's what happens:

0.00  - The first time, you call the method with no timeout.
0.50  - You receive an action and execute it immediately. You also call the method with another
        observation, a timeout of 0.09, and a request for the action that you'd want to execute
        at 0.6.
0.59  - You receive an action (it's the 2nd action of the 1st inference).
0.60  - You execute the action, get an observation, and call the method with a timeout of 0.09,
        and requesting the action to execute at 0.7.
0.69  - Receive the 3rd action of the 1st inference.
0.70  - Execute action, get obs, call method with timeout=0.09, requesting action for 0.8.

...

1.29  - Receive the 9th action of the 1st inference.
        (only 6 actions left in the cache! 2nd inference starts)
...

1.70 - Execute action, get obs, call method with timeout=0.09, requesting action for 1.8.
1.78 - 2nd inference completes before your timeout! All remaining actions from 1st
       inference are overwritten in the cache.
1.79 - Receive the 7th action of the 2nd inference. Yes, the 7th. That's because the 2nd
       inference is based on timestamp 1.2. We skipped actions for timestamps
       1.2, 1.3, 1.4, 1.5, 1.6, 1.7.
1.80 - Execute action, get obs, call method with timeout=0.09, requesting action for 1.9.

None of the story above required you to ever tell the method that inference takes 0.5 seconds. BUT, you did have to tell the method (via an instance variable n_action_buffer) that you want inference whenever you have 6 or fewer future actions left in the cache.

Correct me if I'm wrong, but it seems that whatever it is you want to happen by providing 0.5 seconds as a parameter is covered by the n_action_buffer parameter, which is also technically in units of time (the units are effectively 1/fps). In fact, I could even change n_action_buffer to s_action_buffer (how many seconds of buffer we want for the action cache). You'd then set it to 0.6 (0.5 + 0.1 to be safe).

Right?
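To make the buffer arithmetic concrete, here is a small sketch using the numbers from the example above (seconds_of_buffer is a hypothetical helper, not part of the PR):

```python
def seconds_of_buffer(n_action_buffer: int, fps: float) -> float:
    """Convert a buffer measured in actions into seconds of buffered actions."""
    return n_action_buffer / fps


fps = 10.0               # 10 Hz control loop
inference_time_s = 0.5   # assumed inference duration
# Inference must start while at least inference_time_s * fps = 5 actions remain;
# n_action_buffer = 6 leaves a one-step safety margin.
n_action_buffer = 6
assert seconds_of_buffer(n_action_buffer, fps) >= inference_time_s  # 0.6 s >= 0.5 s
```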

@ruijie-he (Contributor):

Sorry, I'm now confused about what your timeout parameter should be in my example. In your latest comment, you are suggesting it should be 0.09, but in your previous comment, when you linked to the timeout warning, it suggests that we need to set it to 0.5s; otherwise I'll start getting a warning when inference takes more than 0.09s but less than 0.5s. If so, we don't need this warning, since I expect inference to take 0.5s.

Regarding n_action_buffer, that only indicates when you want to start running another inference (e.g. when there are 6 actions remaining in the buffer), but doesn't provide a warning if inference actually ended up taking longer than the expected inference time (i.e. 0.5s), and is starting to fall behind what I had anticipated when I set n_action_buffer.

Anyway, I think that we've gone back and forth long enough, and greatly appreciate you taking the time to understand this. Hopefully the main issues are clear at this point, and maybe you could just clarify in your code comments which timeout parameter we're referring to here?

@alexander-soare (Contributor, Author) commented Aug 14, 2024:

> but in your previous comment, when you linked to the timeout warning, it suggests that we need to set it to 0.5s

Sorry, but that's not what I was suggesting. Can you clarify why you think that? I'll clarify under what conditions the error is raised: it is raised if inference is required (according to the n_action_buffer criterion) but a previous inference is still running. This makes it so that if inference requests are piling up, it fails.

> but doesn't provide a warning if inference actually ended up taking longer than the expected

You are right that this warning is not provided! I'll consider adding it. Thank you.

I'm thinking:

  • I clarify in the code that the timeout is for how long we're willing to wait for provide_observation_get_actions to give an answer, and that it is not necessarily related to inference time.
  • I set max_expected_inference_time = n_action_buffer / fps. Then if this is violated in the inference thread, I log a warning like "Inference time exceeded n_action_buffer / fps. This may cause inference to fall behind!" (sketched below). I like this way because it avoids the user having to provide yet another timeout parameter, which could be confusing.
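A minimal sketch of that warning check (timed_inference and run_inference are illustrative names; the real change would live inside the wrapper's inference thread):

```python
import logging
import time


def timed_inference(run_inference, observation, n_action_buffer: int, fps: float):
    """Run inference and warn if it overran the buffered seconds of actions."""
    max_expected_inference_time = n_action_buffer / fps  # seconds of actions in the buffer
    start = time.perf_counter()
    actions = run_inference(observation)
    duration = time.perf_counter() - start
    if duration > max_expected_inference_time:
        logging.warning(
            "Inference time (%.3f s) exceeded n_action_buffer / fps (%.3f s). "
            "This may cause inference to fall behind!",
            duration,
            max_expected_inference_time,
        )
    return actions
```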

Thanks for your help, and I'm always happy to discuss.

@ruijie-he (Contributor):

Yes, that all sounds good, thanks! Let me know when you'd like me to try out your updates.

Thanks for all the work that you and the team are doing here!

@alexander-soare (Contributor, Author):

FYI @ruijie-he, I added this warning in. Indeed I found it useful.

imstevenpmwork added the enhancement and policies labels Apr 17, 2025