-
Notifications
You must be signed in to change notification settings - Fork 304
Adjust Worker::poll logic to fix pending wake failure in balance service. #825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up f 8000 or GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ice. This commit aims to solve the problem that Worker::poll_next_msg may block service.poll_ready. Especially when the service is a balance service, while multiple pending endpoints become ready, the task is blocked on Worker::poll_next_msg if there is no other message come which would cause these endpoints to be disconnected.
Thanks! Looks like CI failed... Also, is there a way to update the tests to check this change? |
adjust the buffer tests. remove `propagates_trace_spans` test due to the worker's service `poll_ready` is outside the request processing.
Thanks for your attention, I have updated the tests to check this change. |
@cratelyn do yall have a more extensive suite that includes balancing that can test this change out? |
we do! i'll try running the linkerd2-proxy tests with this patch applied, pardon the wait. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i left some notes below. my main questions and concerns about this change are about how many of the test suite's assertions (and a test case) are removed in this change.
if changes to the test suite are required here, is this a breaking change to this middleware's behavior?
#[tokio::test(flavor = "current_thread")] | ||
async fn propagates_trace_spans() { | ||
use tower::util::ServiceExt; | ||
use tracing::Instrument; | ||
|
||
let _t = support::trace_init(); | ||
|
||
let span = tracing::info_span!("my_span"); | ||
|
||
let service = support::AssertSpanSvc::new(span.clone()); | ||
let (service, worker) = Buffer::pair(service, 5); | ||
let worker = tokio::spawn(worker); | ||
|
||
let result = tokio::spawn(service.oneshot(()).instrument(span)); | ||
|
||
result.await.expect("service panicked").expect("failed"); | ||
worker.await.expect("worker panicked"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may i ask why this test was deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this commit change the worker exec order:
- service.poll_ready
- poll_next_msg
- service.call
after 2 poll_next_msg, the span in the msg can be retrieved, the service.call is now in the span scope, but not the servce.poll_ready, as you see, AssertSpanSvc
does not fit the case, so i delete this test.
let err = assert_ready_err!(response2.poll()); | ||
assert!( | ||
err.is::<error::ServiceError>(), | ||
"response should fail with a ServiceError, got: {:?}", | ||
err | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this no longer the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned above, the old exec order is
- poll_next_msg
- service.poll_ready
- service.call
the new exec order is
- service.poll_ready
- poll_next_msg
- service.call
there is no 'preload' message, so i delete it.
assert_pending!(worker.poll()); | ||
handle | ||
.next_request() | ||
.await | ||
.unwrap() | ||
.1 | ||
.send_response("world"); | ||
assert_pending!(worker.poll()); | ||
assert_ready_ok!(response4.poll()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these assertions no longer needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because in this case there is no other request will be issued, so it is no need to do this assertion.
if this change is focused on solving how this middleware interacts with a |
@seanmonstar i wasn't able to get a patched version of the linkerd2-proxy building with this patch. we're on tonic v0.12 at the moment, which depends on tower 0.4, and thus builds fail if this branch is used. i did leave some questions above, however. i'm concerned that trace contexts no longer seem to be propagated, and that this change seems to change the behavior of |
#824
This commit aims to solve the problem that
Worker::poll_next_msg
may blockservice.poll_ready
. Especially when the service is aBalance
service, while multiple pending endpoints become ready, the task is blocked onWorker::poll_next_msg
if there is no other message come which would cause these endpoints to be disconnected.