Description
/area API
/kind bug
/kind cleanup
/kind good-first-issue
Generally, nop/steady-state reconciliations (e.g. resyncs that do nothing) should be incredibly cheap. In an ideally structured reconciler, the entire process is driven by informer caches, and not a single API call is made for any purpose if nothing needs to change.
One of the reasons for this is to avoid having global resyncs (which are ideally nops) become a thundering herd that DoS's the API Server or, with client-side rate limiting, exhausts our API quota and starts to trigger throttling. Another place this comes up is failing over to a standby replica, which resyncs the "bucket" of keys (possibly a global resync).
While recently chasing an issue where the failovers triggered by the chaos duck resulted in quota exhaustion, we started to take a hard look at places where we might be making API calls during nop reconciles. One source I wasn't expecting was our event emission, and in particular some bad guidance we give in our stub:
We should NOT be emitting an event on every successful reconciliation, because this means that we are making API calls on nop reconciles, including resyncs. That alone would be enough to exhaust our API quota on resyncs of more than a handful of resources.
I believe the serving reconcilers have largely avoided this pattern, but we should:
- Update the stubs to stop recommending this.
- Update any eventing controllers that may have adopted this pattern.
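
To make the intended pattern concrete, here is a rough sketch of a reconciler that only emits an event when it actually changed something. Everything in it is hypothetical and for illustration only: `EventRecorder` is a simplified stand-in for a real recorder (not the actual client-go interface), and `Resource`, `countingRecorder`, and `Reconcile` are made-up names, not our stub code. Each `Eventf` call is assumed to cost one API call.

```go
package main

// EventRecorder is a hypothetical, simplified stand-in for an event
// recorder. Assume every Eventf call results in one API call.
type EventRecorder interface {
	Eventf(eventType, reason, messageFmt string, args ...interface{})
}

// countingRecorder is a hypothetical recorder that only counts how many
// events (i.e. API calls) were emitted.
type countingRecorder struct {
	calls int
}

func (c *countingRecorder) Eventf(eventType, reason, messageFmt string, args ...interface{}) {
	c.calls++
}

// Resource is a hypothetical object with a desired and an observed state.
type Resource struct {
	Name             string
	DesiredReplicas  int
	ObservedReplicas int
}

// Reconcile drives the observed state toward the desired state and reports
// whether anything changed. An event is emitted only on an actual change,
// so a nop reconcile (e.g. a resync) never touches the recorder.
func Reconcile(r *Resource, rec EventRecorder) bool {
	if r.ObservedReplicas == r.DesiredReplicas {
		// Steady state: nothing to do, so no event and no API call.
		return false
	}
	r.ObservedReplicas = r.DesiredReplicas
	rec.Eventf("Normal", "Reconciled", "%q updated to %d replicas", r.Name, r.DesiredReplicas)
	return true
}
```

With this shape, the first `Reconcile` of a resource that needs a change emits one event, and any number of subsequent resyncs of the unchanged resource emit none.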
cc @n3wscott