-
Notifications
You must be signed in to change notification settings - Fork 102
Defer first full snapshot on etcd startup #154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer first full snapshot on etcd startup #154
Conversation
6169ad8
to
0307725
Compare
@swapnilgm I have updated the PR as per the discussion we had. I am still doubtful about the case where it has been more than 24 hours since previous full snapshot, because then the full snapshot will take time to upload and in the meantime, readiness probe will fail as it /healthz endpoint is still not active. Is it feasible to set the readiness probe to active already before taking full snapshot in the case of previous snapshot being older than 24 hours? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we handle stopCh
during a full, delta or first-full/delta snapshot?
ssr.logger.Infof("Applied watch on etcd from revision: %d", ssr.prevSnapshot.LastRevision+1) | ||
|
||
for { | ||
wr, ok := <-ssr.watchCh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do a select
with stopCh
and/or watchCtx.Done()
here?
Do we have the same problem in regular full and delta snapshots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I've addressed this now.
} | ||
} | ||
|
||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We assuming that CollectEventsSincePrevSnapshot
executed before ssr.snapshotEventHandler
is ever executed in after a restart. Can we document this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
0307725
to
b6e40f3
Compare
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
b6e40f3
to
6f2d514
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Signed-off-by: Shreyas Rao shreyas.sriganesh.rao@sap.com
What this PR does / why we need it:
This PR defers the first full snapshot on etcd startup, by instead taking a delta snapshot initially and then taking a full snapshot. This helps speeding up setting the readiness probe of etcd to active. If no previous full snapshot exists in the backup or it has been more than 24 hours since the last full snapshot, a full snapshot will be taken on startup, followed by delta snapshots, as was the case earlier.
Which issue(s) this PR fixes:
Fixes #150
Special notes for your reviewer:
Snapshotter. Also, for the moment, I have hard-codedRun()
has been modified with newdelaySeconds
parameter to allow setting deferred snapshot timers24 hours
for checking if previous snapshot is too old, because to make it generic it requires larger effort (to convert any given cron schedule into a uniform time interval). Do let me know if there is an easy/quick way to solve it, or else I can work on it in a separate PR later.I have tested the PR to the best of my ability. I will also run memory profiling. Please run intensive memory and CPU profiling from your side as well. Thanks.
Release note: