8000 Adding circuit breaker middleware by ggaurav08 · Pull Request #4336 · m3db/m3 · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Adding circuit breaker middleware #4336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: circuit_breaker_library
Choose a base branch
from

Conversation

ggaurav08
Copy link
@ggaurav08 ggaurav08 commented Apr 25, 2025

What this PR does / why we need it:
This PR adds circuit breaker middleware, that wraps TChannel clients with a circuit breaker to monitor and control request flow per host. It tracks request success/failure metrics and blocks calls when failure rates are too high.
Circuit breakers are initialized once per host using sync.Once and stored in concurrent maps. Each call is either executed or rejected based on the circuit state, improving system resilience.

Circuit breaker metrics:
Screenshot 2025-05-01 at 2 41 13 PM

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

"host": host,
})

metrics := &circuitBreakerMetrics{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make these metrics resemble the circuitbreakerfx metrics please. ideally we can use their dashboard as is

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// WriteBatchRaw is a method that writes a batch of raw data.
func (c *circuitBreakerClient) WriteBatchRaw(ctx thrift.Context, req *rpc.WriteBatchRawRequest) error {
return withBreaker[*rpc.WriteBatchRawRequest](c, ctx, func() error {
return c.next.WriteBatchRaw(ctx, req)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to induce an alloc per fn call cause the lambda is changing (please verify with a benchmark) - can you avoid that if you restructure the code a bit?

i.e. change the signature of withBreaker to

func withBreaker[T any](
  c *circuitBreakerClient, 
  ctx tchannel.ContextWithHeaders, 
  req T, 
  call func(tchannel.ContextWithHeaders, T) error,
) error {

so that here, you can just do return withBreaker(c, ctx, req, c.next.WriteBatchRaw)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the withBreaker function signature. It seems we might need an additional variant for cases where a result needs to be returned, something like:
func withBreakerWithResult[T any, R any](
c *client,
ctx thrift.Context,
req T,
call func(thrift.Context, T) (R, error),
) (R, error)

Copy link
Collaborator
@prateek prateek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apart from nits above, can you add tests please

return call(ctx, req)
}

if c.circuit == nil || !c.circuit.IsRequestAllowed() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more. F438

why would the circuit be nil?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this will not be nil, added this check as a precaution to avoid nil pointer errors.

@ggaurav08 ggaurav08 marked this pull request as ready for review May 20, 2025 03:25

func newMetrics(scope tally.Scope, host string) *circuitBreakerMetrics {
return &circuitBreakerMetrics{
successes: scope.Tagged(map[string]string{"host": host}).Counter("circuit_breaker_successes"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this src host or dest host? are you concerned about the cardinality of this at all?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dest host. It is primarily used to identify which node is experiencing issues in the event of a request failure. In cases where a node is slow, this helps pinpoint the problematic node so the server can be restarted accordingly.

// Execute the request and update circuit breaker state
err := call(ctx, req)
if err == nil {
c.circuit.ReportRequestStatus(true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this wrong? i.e. you shouldn't be returning true when in shadow mode and were rejected, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have corrected it. Thanks for pointing it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0