-
Notifications
You must be signed in to change notification settings - Fork 464
Adding circuit breaker middleware #4336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: circuit_breaker_library
Are you sure you want to change the base?
Conversation
|
"host": host, | ||
}) | ||
|
||
metrics := &circuitBreakerMetrics{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make these metrics resemble the circuitbreakerfx metrics please. ideally we can use their dashboard as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me check. I was referring to this.
https://sg.uberinternal.com/code.uber.internal/uber-code/go-code/-/blob/src/code.uber.internal/rpc/circuitbreakerfx/middleware/observer.go?L110
To integrate with their dashboard, we might need to use the same component name as yarpc_circuit_breaker.
https://ugrafana.uberinternal.com/d/AYkjuu6Gk/yarpc-circuit-breaker-dashboard?orgId=1&refresh=30s
// WriteBatchRaw is a method that writes a batch of raw data. | ||
func (c *circuitBreakerClient) WriteBatchRaw(ctx thrift.Context, req *rpc.WriteBatchRawRequest) error { | ||
return withBreaker[*rpc.WriteBatchRawRequest](c, ctx, func() error { | ||
return c.next.WriteBatchRaw(ctx, req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to induce an alloc per fn call cause the lambda is changing (please verify with a benchmark) - can you avoid that if you restructure the code a bit?
i.e. change the signature of withBreaker
to
func withBreaker[T any](
c *circuitBreakerClient,
ctx tchannel.ContextWithHeaders,
req T,
call func(tchannel.ContextWithHeaders, T) error,
) error {
so that here, you can just do return withBreaker(c, ctx, req, c.next.WriteBatchRaw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the withBreaker function signature. It seems we might need an additional variant for cases where a result needs to be returned, something like:
func withBreakerWithResult[T any, R any](
c *client,
ctx thrift.Context,
req T,
call func(thrift.Context, T) (R, error),
) (R, error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apart from nits above, can you add tests please
return call(ctx, req) | ||
} | ||
|
||
if c.circuit == nil || !c.circuit.IsRequestAllowed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more. F438
why would the circuit be nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this will not be nil, added this check as a precaution to avoid nil pointer errors.
|
||
func newMetrics(scope tally.Scope, host string) *circuitBreakerMetrics { | ||
return &circuitBreakerMetrics{ | ||
successes: scope.Tagged(map[string]string{"host": host}).Counter("circuit_breaker_successes"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this src host or dest host? are you concerned about the cardinality of this at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dest host. It is primarily used to identify which node is experiencing issues in the event of a request failure. In cases where a node is slow, this helps pinpoint the problematic node so the server can be restarted accordingly.
// Execute the request and update circuit breaker state | ||
err := call(ctx, req) | ||
if err == nil { | ||
c.circuit.ReportRequestStatus(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this wrong? i.e. you shouldn't be returning true when in shadow mode and were rejected, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have corrected it. Thanks for pointing it out.
What this PR does / why we need it:
This PR adds circuit breaker middleware, that wraps TChannel clients with a circuit breaker to monitor and control request flow per host. It tracks request success/failure metrics and blocks calls when failure rates are too high.
Circuit breakers are initialized once per host using sync.Once and stored in concurrent maps. Each call is either executed or rejected based on the circuit state, improving system resilience.
Circuit breaker metrics:

Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: