tower-balance: Observations from testing · Issue #286 · tower-rs/tower · GitHub
tower-balance: Observations from testing #286
Open
@olix0r

Description

We've recently been running tests on the Linkerd proxy that exercise the load
balancer in larger clusters (of 30+ endpoints). At the same time, I've been
exploring the existing endpoint-weighting scheme.

In doing so, I've realized that the balancer is currently O(n), though it is
intended to be effectively O(1). Furthermore, the existing weighting scheme
is complex to instrument in practice, and is of questionable value in its
current form.

All of this leads me to believe that we should drastically simplify the
balancer:

  1. Do not attempt to unify (Weighted) P2C and Round-Robin under one
    implementation. Each strategy benefits from being able to use its own data
    structure. For now, I propose that we simply drop the Round-Robin logic. It
    can easily be added later if it's desirable.
  2. The balancer cannot be responsible for driving the readiness of all of its
    constituent services. The P2C balancer is intended to sample two
    endpoints. In the current implementation, we always poll all unready
    inner services, which leads to poor behavior as the balancer scales.
    Something like #283 ("Spawn ready: drives a service's readiness on an executor")
    is necessary to relax the balancer's readiness-polling guarantees.
  3. We initially decided that all endpoint-service errors should be treated as fatal
    to the balancer. It now seems more appropriate to let the balancer handle these
    failures by dropping the failed service from the balancer.
  4. The Load and Instrument traits are not balancer-specific and should
    become more generally useful abstractions in a dedicated crate.
  5. The balancer should expose a Layer that layers over inner layers that
    produce Discover-typed results.
  6. The Pool implementation is factored inconveniently, especially in light of how
    tower-layer has evolved: it requires direct access to a balancer implementation,
    accessing its discover field directly. I think that it should probably
    be implemented as a Discover proxy-type that is constructed with a
    Watch<Load>. The pool doesn't rely on any specific balancer behavior, so it
    shouldn't dictate use of a specific balancer implementation.
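
To make points 2 and 3 concrete, here is a minimal sketch of the selection step, assuming a plain `Vec` of endpoints with a numeric load. `Endpoint`, `XorShift`, `p2c`, and `evict` are hypothetical names for illustration, not tower-balance's actual types. The point is that each selection touches only two endpoints regardless of cluster size, and that a failed endpoint can be removed without failing the balancer as a whole.

```rust
/// Hypothetical endpoint: a name plus a load metric (e.g. pending requests).
#[derive(Debug, Clone, PartialEq)]
struct Endpoint {
    name: &'static str,
    load: u32,
}

/// Tiny xorshift PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    /// Returns a pseudo-random index in `0..n` (n must be > 0).
    fn next_below(&mut self, n: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % n as u64) as usize
    }
}

/// Power of two choices: sample two distinct endpoints and return the
/// index of the less loaded one. Only two endpoints are inspected per
/// call, independent of how many endpoints exist.
fn p2c(endpoints: &[Endpoint], rng: &mut XorShift) -> Option<usize> {
    match endpoints.len() {
        0 => None,
        1 => Some(0),
        n => {
            let a = rng.next_below(n);
            // Draw b from the remaining n - 1 slots so that a != b.
            let mut b = rng.next_below(n - 1);
            if b >= a {
                b += 1;
            }
            Some(if endpoints[a].load <= endpoints[b].load { a } else { b })
        }
    }
}

/// Drop a failed endpoint rather than treating its error as fatal to
/// the balancer. `swap_remove` is O(1) but does not preserve order,
/// which is fine for random sampling.
fn evict(endpoints: &mut Vec<Endpoint>, idx: usize) -> Endpoint {
    endpoints.swap_remove(idx)
}
```

With three endpoints of distinct load, `p2c` can never return the most loaded one, since it always loses the comparison against whichever other endpoint was sampled. This sketch ignores readiness entirely; in the real balancer the sampled endpoints would also have to be ready, which is where something like #283 comes in.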

I have a series of changes that I would like to pursue to this end:

  • #283 ("Spawn ready: drives a service's readiness on an executor") enables endpoint stacks to be driven to readiness without being actively polled by the balancer.
  • #285 ("Extract tower-load from tower-balance") extracts the Load and Instrument traits into a dedicated crate and removes the current Weight implementation (which is not really what we want).
  • I have another (followup) branch that removes the choose trait/module, leaving only a P2CBalance implementation.
  • I could use some help figuring out the path forward for Pool.
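
As a rough illustration of why Load does not need to live in the balancer crate, here is a sketch of a balancer-independent load abstraction. The trait shape is an assumption and is not necessarily what #285 ends up with; `PendingRequests`, `Handle`, `track`, and `less_loaded` are hypothetical names used only for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Assumed shape of a general-purpose load trait: anything that can
/// report a comparable metric.
trait Load {
    type Metric: PartialOrd;
    fn load(&self) -> Self::Metric;
}

/// Hypothetical wrapper that measures load as the number of
/// in-flight requests on the inner service.
struct PendingRequests<S> {
    inner: S,
    pending: Arc<AtomicUsize>,
}

/// Hypothetical guard representing one in-flight request; dropping
/// it decrements the pending count.
struct Handle(Arc<AtomicUsize>);

impl Drop for Handle {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}

impl<S> PendingRequests<S> {
    fn new(inner: S) -> Self {
        PendingRequests { inner, pending: Arc::new(AtomicUsize::new(0)) }
    }

    fn get_ref(&self) -> &S {
        &self.inner
    }

    /// Record the start of a request; the returned handle should be
    /// held until the response completes.
    fn track(&self) -> Handle {
        self.pending.fetch_add(1, Ordering::SeqCst);
        Handle(self.pending.clone())
    }
}

impl<S> Load for PendingRequests<S> {
    type Metric = usize;

    fn load(&self) -> usize {
        self.pending.load(Ordering::SeqCst)
    }
}

/// Any consumer (a balancer, a pool, a metrics exporter) can compare
/// loads through the trait alone.
fn less_loaded<'a, L: Load>(a: &'a L, b: &'a L) -> &'a L {
    if a.load() <= b.load() { a } else { b }
}
```

Nothing here refers to a balancer: `less_loaded` depends only on the trait, which is the property that makes a dedicated crate reasonable.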

Labels

  • A-balance (Area: The tower "balance" middleware)
  • A-ready-cache (Area: The tower "ready_cache" middleware)
  • C-musing (Category: musings about a better world)
  • S-waiting-on-author (Status: awaiting some action (such as code changes) from the PR or issue author.)
  • T-middleware (Topic: middleware)
