Adjust Worker::poll logic to fix pending wake failure in balance service. by c98 · Pull Request #825 · tower-rs/tower

Adjust Worker::poll logic to fix pending wake failure in balance service. #825


Open
c98 wants to merge 2 commits into master

Conversation

@c98 c98 commented May 13, 2025

#824
This commit aims to solve the problem that Worker::poll_next_msg may block service.poll_ready. In particular, when the service is a Balance service and multiple pending endpoints become ready, the task stays blocked on Worker::poll_next_msg if no other message arrives, which causes these endpoints to be disconnected.
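
For context, here is a simplified sketch of the worker loop before this change (not the literal tower source; the `Msg` type and the `poll_next_msg` closure are illustrative stand-ins for the buffer's internals):

use std::task::{ready, Context, Poll};
use tower::Service;

// Illustrative stand-in for the buffer's internal message type.
struct Msg<Req, Fut> {
    request: Req,
    tx: tokio::sync::oneshot::Sender<Fut>,
}

// Old order: the worker parks on `poll_next_msg` and only drives the inner
// service's `poll_ready` after a message has arrived.
fn poll_worker_old<S, Req>(
    service: &mut S,
    mut poll_next_msg: impl FnMut(&mut Context<'_>) -> Poll<Option<Msg<Req, S::Future>>>,
    cx: &mut Context<'_>,
) -> Poll<()>
where
    S: Service<Req>,
{
    loop {
        // 1. Wait for the next buffered request. With an empty channel this
        //    returns Pending and the worker goes to sleep right here.
        let msg = match ready!(poll_next_msg(cx)) {
            Some(msg) => msg,
            None => return Poll::Ready(()), // all request senders dropped
        };
        // 2. Only now is the inner service polled. If it is a Balance whose
        //    pending endpoints become ready while no message is queued,
        //    nothing reaches this point to drive them, and those endpoints
        //    can end up disconnected.
        //    (The real worker stashes `msg` and retries if this is Pending.)
        match ready!(service.poll_ready(cx)) {
            Ok(()) => {
                let fut = service.call(msg.request);
                let _ = msg.tx.send(fut);
            }
            Err(_e) => return Poll::Ready(()), // the real worker records the error
        }
    }
}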

@seanmonstar
Collaborator

Thanks! Looks like CI failed...

Also, is there a way to update the tests to check this change?

adjust the buffer tests.

remove the `propagates_trace_spans` test because the worker's service
`poll_ready` now runs outside the request processing.
@c98
Author
c98 commented May 14, 2025

Thanks for your attention, I have updated the tests to check this change.

@seanmonstar
Collaborator

@cratelyn do yall have a more extensive suite that includes balancing that can test this change out?

@cratelyn
Contributor

> @cratelyn do yall have a more extensive suite that includes balancing that can test this change out?

we do! i'll try running the linkerd2-proxy tests with this patch applied, pardon the wait.

@cratelyn cratelyn (Contributor) left a comment

i left some notes below. my main questions and concerns are about how many of the test suite's assertions (and a test case) this change removes.

if changes to the test suite are required here, is this a breaking change to this middleware's behavior?

Comment on lines -378 to -395
#[tokio::test(flavor = "current_thread")]
async fn propagates_trace_spans() {
    use tower::util::ServiceExt;
    use tracing::Instrument;

    let _t = support::trace_init();

    let span = tracing::info_span!("my_span");

    let service = support::AssertSpanSvc::new(span.clone());
    let (service, worker) = Buffer::pair(service, 5);
    let worker = tokio::spawn(worker);

    let result = tokio::spawn(service.oneshot(()).instrument(span));

    result.await.expect("service panicked").expect("failed");
    worker.await.expect("worker panicked");
}
Contributor

may i ask why this test was deleted?

Author

this commit changes the worker's execution order to:

  1. service.poll_ready
  2. poll_next_msg
  3. service.call

after step 2 (poll_next_msg), the span carried in the msg can be retrieved, so service.call now runs inside the span scope, but service.poll_ready does not. as you can see, AssertSpanSvc no longer fits this case, so i deleted the test.
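
roughly, the new loop shape is something like this (a sketch with illustrative stand-ins; the `Msg` type, its `span` field, and the `poll_next_msg` closure are assumptions, not the literal diff):

use std::task::{ready, Context, Poll};
use tower::Service;

// Illustrative stand-in for the buffer's internal message; `span` carries the
// request's trace context to the worker.
struct Msg<Req, Fut> {
    request: Req,
    span: tracing::Span,
    tx: tokio::sync::oneshot::Sender<Fut>,
}

fn poll_worker_new<S, Req>(
    service: &mut S,
    mut poll_next_msg: impl FnMut(&mut Context<'_>) -> Poll<Option<Msg<Req, S::Future>>>,
    cx: &mut Context<'_>,
) -> Poll<()>
where
    S: Service<Req>,
{
    loop {
        // 1. service.poll_ready runs first, outside any request span, and is
        //    driven even when no message is queued (this is what keeps a
        //    Balance service's pending endpoints polled).
        if ready!(service.poll_ready(cx)).is_err() {
            return Poll::Ready(()); // the real worker records the error
        }
        // 2. poll_next_msg: the message, and therefore its span, is only
        //    available from here on.
        let msg = match ready!(poll_next_msg(cx)) {
            Some(msg) => msg,
            None => return Poll::Ready(()),
        };
        // 3. service.call runs inside the request's span; poll_ready above
        //    does not, which is why AssertSpanSvc no longer fits.
        let _enter = msg.span.enter();
        let fut = service.call(msg.request);
        let _ = msg.tx.send(fut);
    }
}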

Comment on lines -348 to -353
let err = assert_ready_err!(response2.poll());
assert!(
    err.is::<error::ServiceError>(),
    "response should fail with a ServiceError, got: {:?}",
    err
);
Contributor

why is this no longer the case?

Author

as mentioned above, the old execution order was

  1. poll_next_msg
  2. service.poll_ready
  3. service.call

and the new execution order is

  1. service.poll_ready
  2. poll_next_msg
  3. service.call

there is no longer a 'preload' message, so i deleted this assertion.

Comment on lines -220 to -228
assert_pending!(worker.poll());
handle
    .next_request()
    .await
    .unwrap()
    .1
    .send_response("world");
assert_pending!(worker.poll());
assert_ready_ok!(response4.poll());
Contributor

why are these assertions no longer needed?

Author

because no other request will be issued in this case, there is no need for this assertion.

@cratelyn
Contributor

if this change is focused on solving how this middleware interacts with a Balance service, i might suggest adding a test case that exercises the buffer with a Balance service.
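
for instance, a skeleton along these lines could be a starting point (the setup here is illustrative, and the endpoints are always ready, so it does not yet exercise the pending-wake scenario; that would need endpoints whose readiness flips from pending to ready):

use std::convert::Infallible;
use tower::balance::p2c::Balance;
use tower::buffer::Buffer;
use tower::discover::ServiceList;
use tower::load::Constant;
use tower::{service_fn, ServiceExt};

#[tokio::test(flavor = "current_thread")]
async fn buffer_drives_balance() {
    // two trivial endpoints; `Constant` supplies the `Load` impl that
    // p2c balancing requires.
    let endpoints: Vec<_> = ["a", "b"]
        .into_iter()
        .map(|rsp| {
            Constant::new(
                service_fn(move |_: ()| async move { Ok::<_, Infallible>(rsp) }),
                1,
            )
        })
        .collect();
    let balance: Balance<_, ()> = Balance::new(ServiceList::new(endpoints));

    // drive the balancer through a Buffer, as in the reported scenario.
    let (buffer, worker) = Buffer::pair(balance, 5);
    tokio::spawn(worker);

    let rsp = buffer.oneshot(()).await.expect("request should succeed");
    assert!(rsp == "a" || rsp == "b");
}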

@cratelyn
Contributor

> @cratelyn do yall have a more extensive suite that includes balancing that can test this change out?

> we do! i'll try running the linkerd2-proxy tests with this patch applied, pardon the wait.

@seanmonstar i wasn't able to get a patched version of the linkerd2-proxy building with this patch. we're on tonic v0.12 at the moment, which depends on tower 0.4, and thus builds fail if this branch is used.

i did leave some questions above, however. i'm concerned that trace contexts no longer seem to be propagated, and that this change seems to change the behavior of Buffer. having debugged changes in behavior introduced by other changes to Buffer, e.g. #635, i'm very hesitant about this pr.
