8000 improve flexiblity of retry policy by rcoh · Pull Request #584 · tower-rs/tower · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

improve flexiblity of retry policy #584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 1, 2022

Conversation

rcoh
Copy link
Contributor
@rcoh rcoh commented Apr 30, 2021

retry::Policy is an effective way to expressing retries, however, there two use cases that
as it stands today, cannot be expressed:

  • Altering the final response (eg. to record the fact that you ran out of retries)
  • Altering the request (eg. to set a header to the server indicating that this request is a retry)

(Technically the second is possible with clone_request, but it's a little unclear which request would actually get sent).

This change implements what I think is pretty close to the minimal update to make this possible, namely, req
and Res both become mutable references. This enables policies to mutate them during execution & enables both of the
use cases above without complicating the "simple path" callers who don't need this behavior.

This is a breaking change. However, the fixes are only a couple of &mut and potentially a call to as_ref().

I've also considered (and would be totally happy) with an API that "mapped" over req & res, taking and returning ownership. This may actually even be slightly more flexible. However, I didn't want to leak this detail onto all users.

retry::Policy is an effective way to expressing retries, however, there two use cases that
as it stands today, cannot be expressed:
- Altering the final response (eg. to record the fact that you ran out of retries)
- Altering the request (eg. to set a header to the server indicating that this request is a retry)

(Technically the second is possible with `clone_request`, but it's a little unclear _which_ request would actually get sent).

This change implements what I think is pretty close to the minimal update to make this possible, namely, `req`
and `Res` both become mutable references. This enables policies to mutate them during execution & enables both of the
use cases above without complicating the "simple path" callers who don't need this behavior.

**This is a breaking change.** However, the fixes are only a couple of `&mut` and potentially a call to `as_ref()`.
@davidpdrsn davidpdrsn added the A-retry Area: The tower "retry" middleware label May 1, 2021
Copy link
Member
@davidpdrsn davidpdrsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never used this middleware in practice but to me the change looks sensible. Though I'm wondering if someone else with more experience using Retry has run into other problems that could be solved, should we decide to make a breaking change. cc @hawkw @olix0r @LucioFranco @seanmonstar

Had one small suggestion/question.

I've also considered (and would be totally happy) with an API that "mapped" over req & res, taking and returning ownership. This may actually even be slightly more flexible. However, I didn't want to leak this detail onto all users.

To return a new request/response the future returned by Policy::retry would have to change as well which might be too much of a change.

@@ -58,7 +58,7 @@ pub trait Policy<Req, Res, E>: Sized {
///
/// [`Service::Response`]: crate::Service::Response
/// [`Service::Error`]: crate::Service::Error
fn retry(&self, req: &Req, result: Result<&Res, &E>) -> Option<Self::Future>;
fn retry(&self, req: &mut Req, result: &mut Result<Res, E>) -> Option<Self::Future>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess getting a &mut Result<Res, E> allows users to change Err(_) into Ok(_) which might be kinda odd. Not sure. Feels like getting Result<&mut Res, &mut E> would fit the use-cases as well but not allow changing the Result variant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that is a really good point david, what do you think about that @rcoh ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment wasn't threaded because of email:

The entire point is to change the variant (at least for me). Because there
are "success" cases the actually need to be retries and when I run out of
retries I need to change it to failure

@rcoh
Copy link
Contributor Author
rcoh commented May 2, 2021 via email

@davidpdrsn
Copy link
Member

Alright cool. That does make sense.

I kinda feel like if we are going to make a breaking change we should take care to design something that we are happy with. That might be this or it might be bigger changes. I haven't used Retry personally so I don't know which it is. There was some discussion about other potential changes to Retry here.

@LucioFranco LucioFranco requested review from seanmonstar and hawkw May 7, 2021 14:15
@LucioFranco
Copy link
Member

@rcoh maybe I am forgetting how this works but can't you return an error in the returned service future?

@rcoh
Copy link
Contributor Author
rcoh commented May 7, 2021

@LucioFranco you have a mut ref to the result, so you can *res = Err(...) (there's a UT in the PR that does this)

@hawkw hawkw added this to the 0.5 milestone May 7, 2021
@hawkw
Copy link
Member
hawkw commented May 19, 2021

i actually just ran into a use-case for a mutable request in clone_request. i was trying to implement a retry policy where an HTTP request body is buffered lazily as the first request's body is sent, rather than aggregating the whole body prior to sending the initial request.

the approach i was going to take was storing the buffered body on the initial request, and then sending the buffer on a oneshot channel when the first request is dropped so that the potential retry can take ownership of it. however, there isn't currently any way to get the receiver side of the oneshot out of the initial request, without introducing a mutex or something for this purpose. making the request mutable in clone_request would very make this kind of thing much easier, so i think we should definitely make this change!

@rcoh
Copy link
Contributor Author

@hawkw just to clarify, in this change the request becomes mutable in the retry method; I thought it was a little unclear from a user perspective which request was which if you were to mutate it in clone_request() (basically, is the cloned request sent first or is the original sent first?)

Which is why I made it mutatable in retry because then it seemed more obvious that the request you just mutated would get sent next. I think that should be equivalent with your try_clone use case, but we could also make it mutable in both cases?

@hawkw
Copy link
Member
hawkw commented May 19, 2021

@hawkw just to clarify, in this change the request becomes mutable in the retry method; I thought it was a little unclear from a user perspective which request was which if you were to mutate it in clone_request() (basically, is the cloned request sent first or is the original sent first?)

Which is why I made it mutatable in retry because then it seemed more obvious that the request you just mutated would get sent next. I think that should be equivalent with your try_clone use case, but we could also make it mutable in both cases?

oh, i misread the PR description, whoops! i'm going to just go ahead and try to implement a lazily buffered request body against this branch and report back? i think it may still be possible without also touching clone_request, but we'll see...

another option would be to make the Policy itself mutable in both clone_request() and retry(), which would also give you an avenue for sharing state between the first and second request...

@hawkw
Copy link
Member
hawkw commented Mar 31, 2022

It would be really nice to get this change into 0.5 (which we're currently working on). @rcoh, are you interested in updating this branch to build with the latest master, or should I?

@rcoh
Copy link
Contributor Author
rcoh commented Mar 31, 2022

I can do the merge tomorrow assuming there's nothing too problematic

@rcoh
Copy link
Contributor Author
rcoh commented Apr 1, 2022

merge is done. I also added another section of docs about the mutable fields—I assume you'll squash merge, but if not let me know and I can cleanup history

@LucioFranco LucioFranco requested a review from davidpdrsn April 4, 2022 19:34
Copy link
Member
@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor documentation suggestions, hope they're helpful!

///
/// The policy MAY chose to mutate the result. This can enable the retry policy to convert a failure
/// into a success and vice versa. For example, if the policy is used to poll while waiting for a state
/// change, you can switch the result to failure such that running out of retries returns a specific failure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in general, we prefer to avoid second-person ('you') language in docs:

Suggested change
/// change, you can switch the result to failure such that running out of retries returns a specific failure.
/// change, the policy can switch the result to failure such that running out of retries returns a specific failure.

@hawkw
Copy link
Member
hawkw commented Apr 8, 2022

I assume you'll squash merge, but if not let me know and I can cleanup history

Yeah, that's right, we generally merge PRs by squashing, so you don't have to worry about it! :)

rcoh and others added 3 commits April 8, 2022 16:42
- Wrap docs to 80 characters
- Small doc tweaks / rewrites to clarify & remove `you`
@LucioFranco LucioFranco merged commit 19c1a1d into tower-rs:master Aug 1, 2022
cratelyn added a commit to linkerd/linkerd2-proxy that referenced this pull request Mar 14, 2025
this commit updates our tower dependency from 0.4 to 0.5.

note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.

the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:

- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
  increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([#637])
- **util**: **Breaking Change** `Either` no longer implemenmts `Future`. ([#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([#654])

see:

* <tower-rs/tower#584>
* <tower-rs/tower#681>
* <tower-rs/tower#703>
* <tower-rs/tower#637>
* <tower-rs/tower#654>

the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.

updating to box the inner `A` and `B` services' errors, so we satiate
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.

* <tower-rs/tower@v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>

this work is based upon #3504. for more information, see:

* linkerd/linkerd2#8733
* #3504

Signed-off-by: katelyn martin <kate@buoyant.io>
X-Ref: tower-rs/tower#815
X-Ref: tower-rs/tower#817
X-Ref: tower-rs/tower#818
X-Ref: tower-rs/tower#819
cratelyn added a commit to linkerd/linkerd2-proxy that referenced this pull request Mar 14, 2025
this commit updates our tower dependency from 0.4 to 0.5.

note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.

the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:

- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
  increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([tower-rs/tower#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([tower-rs/tower#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([tower-rs/tower#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` no longer implemenmts `Future`. ([tower-rs/tower#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([tower-rs/tower#654])

see:

* <tower-rs/tower#584>
* <tower-rs/tower#681>
* <tower-rs/tower#703>
* <tower-rs/tower#637>
* <tower-rs/tower#654>

the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.

updating to box the inner `A` and `B` services' errors, so we satiate
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.

* <tower-rs/tower@v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>

this work is based upon #3504. for more information, see:

* linkerd/linkerd2#8733
* #3504

Signed-off-by: katelyn martin <kate@buoyant.io>
X-Ref: tower-rs/tower#815
X-Ref: tower-rs/tower#817
X-Ref: tower-rs/tower#818
X-Ref: tower-rs/tower#819
cratelyn added a commit to linkerd/linkerd2-proxy that referenced this pull request Mar 18, 2025
this commit updates our tower dependency from 0.4 to 0.5.

note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.

the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:

- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
  increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([tower-rs/tower#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([tower-rs/tower#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([tower-rs/tower#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` no longer implemenmts `Future`. ([tower-rs/tower#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([tower-rs/tower#654])

see:

* <tower-rs/tower#584>
* <tower-rs/tower#681>
* <tower-rs/tower#703>
* <tower-rs/tower#637>
* <tower-rs/tower#654>

the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.

updating to box the inner `A` and `B` services' errors, so we satiate
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.

* <tower-rs/tower@v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>

this work is based upon #3504. for more information, see:

* linkerd/linkerd2#8733
* #3504

Signed-off-by: katelyn martin <kate@buoyant.io>
X-Ref: tower-rs/tower#815
X-Ref: tower-rs/tower#817
X-Ref: tower-rs/tower#818
X-Ref: tower-rs/tower#819
cratelyn added a commit to linkerd/linkerd2-proxy that referenced this pull request Mar 21, 2025
this commit updates our tower dependency from 0.4 to 0.5.

note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.

the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:

- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
  increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([tower-rs/tower#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([tower-rs/tower#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([tower-rs/tower#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` no longer implemenmts `Future`. ([tower-rs/tower#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([tower-rs/tower#654])

see:

* <tower-rs/tower#584>
* <tower-rs/tower#681>
* <tower-rs/tower#703>
* <tower-rs/tower#637>
* <tower-rs/tower#654>

the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.

updating to box the inner `A` and `B` services' errors, so we satiate
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.

* <tower-rs/tower@v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>

this work is based upon #3504. for more information, see:

* linkerd/linkerd2#8733
* #3504

Signed-off-by: katelyn martin <kate@buoyant.io>
X-Ref: tower-rs/tower#815
X-Ref: tower-rs/tower#817
X-Ref: tower-rs/tower#818
X-Ref: tower-rs/tower#819
cratelyn added a commit to linkerd/linkerd2-proxy that referenced this pull request Mar 21, 2025
* chore(deps)!: upgrade to tower 0.5

this commit updates our tower dependency from 0.4 to 0.5.

note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.

the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:

- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
  increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([tower-rs/tower#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([tower-rs/tower#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([tower-rs/tower#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` no longer implemenmts `Future`. ([tower-rs/tower#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([tower-rs/tower#654])

see:

* <tower-rs/tower#584>
* <tower-rs/tower#681>
* <tower-rs/tower#703>
* <tower-rs/tower#637>
* <tower-rs/tower#654>

the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.

updating to box the inner `A` and `B` services' errors, so we satiate
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.

* <tower-rs/tower@v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>

this work is based upon #3504. for more information, see:

* linkerd/linkerd2#8733
* #3504

Signed-off-by: katelyn martin <kate@buoyant.io>
X-Ref: tower-rs/tower#815
X-Ref: tower-rs/tower#817
X-Ref: tower-rs/tower#818
X-Ref: tower-rs/tower#819

* fix(stack/loadshed): update test affected by tower-rs/tower#635

this commit updates a test that was affected by breaking changes in
tower's `Buffer` middleware. see this excerpt from the description of
that change:

> I had to change some of the integration tests slightly as part of this
> change. This is because the buffer implementation using semaphore
> permits is _very subtly_ different from one using a bounded channel. In
> the `Semaphore`-based implementation, a semaphore permit is stored in
> the `Message` struct sent over the channel. This is so that the capacity
> is used as long as the message is in flight. However, when the worker
> task is processing a message that's been recieved from the channel,
> the permit is still not dropped. Essentially, the one message actively
> held by the worker task _also_ occupies one "slot" of capacity, so the
> actual channel capacity is one less than the value passed to the
> constructor, _once the first request has been sent to the worker_. The
> bounded MPSC changed this behavior so that capacity is only occupied
> while a request is actually in the channel, which broke some tests
> that relied on the old (and technically wrong) behavior.

bear particular attention to this:

> The bounded MPSC changed this behavior so that capacity is only
> occupied while a request is actually in the channel, which broke some
> tests that relied on the old (and technically wrong) behavior.

that pr adds an additional message to the channel in tests exercising
the laod-shedding behavior, on account of the previous (incorrect)
behavior.

https://github.com/tower-rs/tower/pull/635/files#r797108274

this commit performs the same change for our corresponding test, adding
an additional `ready()` call before we hit the buffer's limit.

Signed-off-by: katelyn martin <kate@buoyant.io>

* review: use vendored `Either` for consistency

#3744 (comment)

Signed-off-by: katelyn martin <kate@buoyant.io>

---------

Signed-off-by: katelyn martin <kate@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-retry Area: The tower "retry" middleware
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0