Simulate latency v2 and use a policy wrapper to handle latency #272
Conversation
- The diffusion model generates `horizon` steps worth of actions.
- `n_action_steps` worth of actions are actually kept for execution, starting from the current step.

Schematically this looks like:
----------------------------------------------------------------------------------------------
TODO: Put part of this documentation back in somehow as it's helpful
@@ -84,6 +84,8 @@ eval:
  batch_size: 1
  # `use_async_envs` specifies whether to use asynchronous environments (multiprocessing).
  use_async_envs: false
  n_action_buffer: 0
TODO: Point to documentation
timeout_ = timeout - elapsed - buffer
if timeout_ > 0:
    self._future.result(timeout=timeout_)
except TimeoutError:
TODO: Put this try/except just outside the self._future.result(timeout=timeout_)
3. Returns a sequence of actions starting from the requested timestamp (from a cache which is
   populated by inference outputs).

If `timeout` is not provided, inference is run synchronously. If `timeout` is provided, inference runs
asynchronously and we wait up to `timeout` seconds for it to complete.
I think this timeout param is confusing. If I understand it correctly, the whole point of the PolicyRollout is because inference is likely to take longer than 1/fps (e.g. 0.1s). As such, when this method decides to run inference, there should be no expectation that inference completes within 0.1s and returns an action sequence.
I would therefore propose that when inference is called, there should be no "attempt to wait for inference to complete, within the bounds of the timeout parameter." It should just run inference asynchronously, and continue to return the existing action_sequence. However, the timeout parameter should only check that the asynchronous inference process runs within the specified timeout period (e.g. 0.5s), and throws an error / warning if inference duration exceeds that timeout value. This check is merely to ensure that the action buffer doesn't get drawn down to 0 before a new set of actions are added to the buffer.
Hi @ruijie-he, thanks for commenting. Let me try to unpack and address your comment.
> there should be no expectation that inference completes within 0.1s and returns an action sequence
I agree.
> I would therefore propose that when inference is called, there should be no "attempt to wait for inference to complete, within the bounds of the timeout parameter"
Why not? If we are running with 1/fps = 0.1 s, we can give ourselves up to that amount of time to get an updated action sequence. For example, say a previously invoked inference is still running at the moment I call `provide_observation_get_actions`. It takes a trivial amount of time to look up an action from the cache. By your reasoning, we should return it immediately. But it's not going to be used for another ~0.0999 s. Now, what if the previously invoked inference was just about to finish (say, in 0.05 s)? Wouldn't we rather use the fresher action trajectory? And what does it cost us to wait? Nothing really, right?
Thanks. I think it depends on the expectation of when an action is executed after an observation is provided. My expectation is that we use an action as soon as it gets returned by this method, whereas it sounds like your expectation is that it is only executed at the end of that time window.
Because we use it as soon as we get an action, the varying time taken for inference matters, since it is important that actions are triggered at a regular time interval (i.e. 0.1s), rather than with variations across intervals (e.g. 0.06s, 0.14s, 0.08s, 0.12s, etc.).
> My expectation is that we use an action as soon as it gets returned by this method
So you don't have a control cycle you need to abide by?
> it sounds like your expectation is that it is only executed at the end of that time window.
I think my expectation is that we execute actions at a fixed periodic interval.
> since it is important that actions are triggered at a regular time interval
Yes this is exactly what I mean.
Let me try to illustrate what I mean with a diagram. Thanks for helping me get to the bottom of it.
(video attachment: video.mp4)
Thanks for the video explanation. I agree that we can just have an external control cycle so that we can execute the latest available action at a fixed interval.
I think that my broader point is that it's more important to have a timeout parameter that is longer than the action / observation time interval of 0.1s.
This timeout parameter (e.g. 0.5s) is needed to check that the asynchronous inference process runs within the specified timeout period, and throws an error / warning if inference duration exceeds that timeout value. This check is necessary to ensure that the action buffer doesn't get drawn down to 0 before a new set of actions is added to the buffer. If inference returns in less than 0.1s, great, then we'd use the new action, but that's less important than ensuring that inference always completes in time to refill the action buffer.
So it seems your setup is you have a 10Hz control loop but inference takes 0.5 s. Let's say you are producing action horizons of 15, so you probably want inference to run whenever you're down to 6 actions left in the cache. So here's what happens:
0.00 - The first time, you call the method with no timeout.
0.50 - You receive an action and execute it immediately. You also call the method with another
       observation, a timeout of 0.09, and a request for the action that you'd want to execute
       at 0.6.
0.59 - You receive an action (it's the 2nd action of the 1st inference).
0.60 - You execute the action, get an observation, and call the method with a timeout of 0.09,
and requesting the action to execute at 0.7.
0.69 - Receive the 3rd action of the 1st inference.
0.70 - Execute action, get obs, call method with timeout=0.09, requesting action for 0.8.
...
1.29 - Receive the 9th action of the 1st inference.
(only 6 actions left in the cache! 2nd inference starts)
...
1.70 - Execute action, get obs, call method with timeout=0.09, requesting action for 1.8.
1.78 - 2nd inference completes before your timeout! All remaining actions from 1st
inference are overwritten in the cache.
1.79 - Receive the 7th action of the 2nd inference. Yes, the 7th. That's because the 2nd
inference is based on timestamp 1.2. We skipped actions for timestamps
1.2, 1.3, 1.4, 1.5, 1.6, 1.7.
1.80 - Execute action, get obs, call method with timeout=0.09, requesting action for 1.9.
None of the story above required you to ever tell the method that inference takes 0.5 seconds. BUT, you did have to tell the method (via an instance variable `n_action_buffer`) that you want inference whenever you have 6 or fewer future actions left in the cache.
Correct me if I'm wrong, but it seems that whatever it is you want to happen by providing 0.5 seconds as a parameter is covered by the `n_action_buffer` parameter, which is also technically in units of time (the units are effectively 1/fps). In fact, I could even change `n_action_buffer` to `s_action_buffer` (how many seconds of buffer we want for the action cache). You'd then set it to 0.6 (0.5 + 0.1 to be safe).
Right?
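The arithmetic in the timeline above can be sanity-checked with a few lines of back-of-envelope Python (numbers taken from the example; this is not the PR's actual scheduling code):

```python
# Numbers from the timeline above: 15-action horizon, 10 Hz control loop,
# 0.5 s inference, and a new inference triggered when 6 actions remain.
horizon = 15
fps = 10
inference_time = 0.5
n_action_buffer = 6

# The 1st inference starts at t=0, so its 1st action executes at 0.5 s and
# the 9th action is received around 0.5 + (9 - 1) / fps = 1.3 s. At that
# point only 6 future actions remain, so the 2nd inference starts.
t_trigger = inference_time + (horizon - n_action_buffer - 1) / fps

# Those 6 actions cover 0.6 s (until ~1.9 s), while the 2nd inference
# completes around 1.3 + 0.5 = 1.8 s: the cache never runs dry.
t_second_done = t_trigger + inference_time
t_cache_empty = t_trigger + n_action_buffer / fps
assert t_second_done < t_cache_empty
```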
Sorry, I'm now confused about what your timeout parameter should be in my example. In your latest comment you suggest it should be 0.09, but in your previous comment, when you linked to the timeout warning, it suggests that we need to set it to 0.5 s; otherwise I'll start getting a warning when inference takes more than 0.09 s but less than 0.5 s. If so, we don't need this warning, since I expect inference to take 0.5 s.
Regarding `n_action_buffer`, that only indicates when you want to start running another inference (e.g. when there are 6 actions remaining in the buffer), but it doesn't provide a warning if inference actually ended up taking longer than the expected inference time (i.e. 0.5 s) and is starting to fall behind what I had anticipated when I set `n_action_buffer`.
Anyway, I think we've gone back and forth long enough, and I greatly appreciate you taking the time to understand this. Hopefully the main issues are clear at this point, and maybe you could just clarify in your code comments which timeout parameter we're referring to here?
> but in your previous comment, when you linked to the timeout warning, it suggests that we need to set it to 0.5s
Sorry, but that's not what I was suggesting. Can you clarify why you think that? I'll clarify under what conditions the error is raised: it is raised if inference is required (according to the `n_action_buffer` criterion) but a previous inference is still running. This makes it so that if inference requests are piling up, it fails.
> but doesn't provide a warning if inference actually ended up taking longer than the expected
You are right that this warning is not provided! I'll consider adding it. Thank you.
I'm thinking:
- I clarify in the code that the timeout is for how long we're willing to wait for `provide_observation_get_actions` to give an answer, and it is not necessarily related to inference time.
- I set `max_expected_inference_time = n_action_buffer / fps`. Then, if this is violated in the inference thread, I log a warning like `Inference time exceeded n_action_buffer / fps. This may cause inference to fall behind!`

I like this approach because it avoids the user having to provide yet another timeout parameter, which could be confusing.
Thanks for your help, and I'm always happy to discuss.
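The proposed warning could be sketched roughly like this. Everything here is an illustrative assumption rather than the PR's actual API: the function name, the `infer_fn` callable, and the logger are made up. Note that the budget works out to `n_action_buffer / fps` seconds, since each buffered action covers `1/fps` seconds.

```python
import logging
import time

logger = logging.getLogger("policy_rollout")


def run_inference_with_check(infer_fn, observation, n_action_buffer, fps):
    """Run inference and warn if it outlasts what the action buffer covers.

    Hypothetical sketch of the proposal above; infer_fn and the function
    name are illustrative assumptions, not the PR's actual API.
    """
    # Each buffered action covers 1/fps seconds, so n_action_buffer actions
    # give us n_action_buffer / fps seconds of slack.
    max_expected_inference_time = n_action_buffer / fps
    start = time.perf_counter()
    actions = infer_fn(observation)
    duration = time.perf_counter() - start
    if duration > max_expected_inference_time:
        logger.warning(
            "Inference time (%.3f s) exceeded n_action_buffer / fps (%.3f s). "
            "This may cause inference to fall behind!",
            duration,
            max_expected_inference_time,
        )
    return actions
```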
Yes, that all sounds good, thanks! Let me know when you'd like me to try out your updates.
Thanks for all the work that you and the team are doing here!
FYI @ruijie-he, I added this warning in. Indeed I found it useful.
What this does
At its core this PR adds `PolicyRolloutWrapper`: a way of interfacing between a policy, which has a notion of time steps but not really of time, and an environment, which does care about time. Here are the major changes to go along with that:
- `eval.py::rollout` now interacts with `PolicyRolloutWrapper`.
- `eval.py::rollout` has a flag to simulate latency. This is a matter of providing a `timeout` option to the `PolicyRolloutWrapper` instance, and managing a `next_action` across loop iterations.
- Removed `select_actions` from the policy protocol. The policy now just accepts observation sequences and returns action sequences. The wrapper handles caching.

Caveat: We assume that all policies return action sequences starting from the current time. I believe the design is such that this assumption can be relaxed without too much work.
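To make the wrapper's role concrete, here is a minimal sketch of the caching idea described above. `ActionCacheSketch` and its methods are illustrative assumptions, not the actual `PolicyRolloutWrapper` interface: inference outputs are whole action sequences, fresher sequences overwrite older entries, and a buffer threshold decides when to run inference again.

```python
class ActionCacheSketch:
    """Minimal sketch (an assumption, not the actual PolicyRolloutWrapper
    interface) of the caching behavior described above."""

    def __init__(self, n_action_buffer):
        self.n_action_buffer = n_action_buffer
        self.cache = {}  # maps integer timestep -> action

    def add_sequence(self, start_step, actions):
        # Fresher actions overwrite any older entries at the same timesteps.
        for i, action in enumerate(actions):
            self.cache[start_step + i] = action

    def needs_inference(self, current_step):
        # Trigger inference once n_action_buffer or fewer future actions remain.
        remaining = sum(1 for step in self.cache if step >= current_step)
        return remaining <= self.n_action_buffer

    def get(self, step):
        return self.cache[step]
```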
TODO:

How it was tested
TODO:
- `PolicyRolloutWrapper` behaves as expected.

How to checkout & try? (for the reviewer)
Run this to get a regular simulation with no latency simulation. It should behave as before.

(video attachment: eval_episode_0.mp4)

Run this to get a simulation with latency being simulated, and pre-emptive inference (notice how the trajectories are updated before they are depleted):

(video attachment: eval_episode_1.mp4)

Play around with `policy.num_inference_steps` and `eval.n_action_buffer` to appreciate the interplay between inference time and the need to do pre-emptive inference. The higher the inference time, the bigger a buffer you need. For example, try `policy.num_inference_steps=100` with `+eval.n_action_buffer=2`. You will find that things get bad fast:

(video attachment: eval_episode_0.mp4)