-
Notifications
You must be signed in to change notification settings - Fork 102
Expose triggering on-demand full snapshot via HTTP endpoint #143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose triggering on-demand full snapshot via HTTP endpoint #143
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the comments. NewServerCommand
code is not that simple. It involves complex coordination logic. So, please handle it carefully.
4a65144
to
5075f72
Compare
@swapnilgm I have made the suggested changes and also refactored code to wait for etcd probe to succeed before setting http status to OK, even in the case of etcd-events. |
5075f72
to
b540b30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it simplify the code if we split the snapshotting and non-snapshotting cases as separate funcs? Channels and goroutines embedded in conditions in a long func makes me nervous :-)
@amshuman-kr I did consider this, but given it would create a lot of code duplication, I stuck to the current approach. |
@shreyas-s-rao how about smaller funcs for the reusable code? |
Makes sense @amshuman-kr . I'll make the necessary changes. |
b540b30
to
c409c22
Compare
@amshuman-kr @swapnilgm I have addressed the suggestions. Also fixed a couple of things wrt no-snapstore-provider case, namely probe loop and periodic defrag (for etcd-events case). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM apart from one comment about checking the snapshotterEnabled
flag twice.
The comments about wait period between probing etcd
can be discussed/addressed later.
insecureSkipVerify, | ||
etcdEndpoints) | ||
|
||
if snapshotterEnabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this block be merged with this to avoid checking the same flag twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
if err != nil { | ||
logger.Errorf("Failed to probe etcd: %v", err) | ||
handler.Status = http.StatusServiceUnavailable | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this comes from existing code. So, not necessarily in this PR, but shouldn't we think about a wait period before the next probe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes Amshu. It would be better to quickly merge this PR and take this up in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call timeout for etcd probe is set to default 30 sec i.e. each every probe call blocks for 30 sec if etcd is unavailable. I think its not necessary to wait in between as it dosen't introduce much network traffic as well.
if err != nil { | ||
logger.Errorf("Failed to probe etcd: %v", err) | ||
handler.Status = http.StatusServiceUnavailable | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
c409c22
to
c965a25
Compare
@amshuman-kr @swapnilgm I've addressed the latest suggestions. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
c965a25
to
4469567
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Refactoring looks good as well. Just one correction mentioned in comment. Please address it.
// This is needed to stop the currently running snapshotter. | ||
atomic.StoreUint32(&h.AckState, HandlerAckWaiting) | ||
h.Logger.Info("Changed handler state.") | ||
h.ReqCh <- emptyStruct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have two requests to stop snapshotter? Why do we have previous one https://github.com/gardener/etcd-backup-restore/pull/143/files#diff-912f19dfd8da2a79f988fd5045cd7dd0R146
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. Thanks for the catch!
4469567
to
8fe06ed
Compare
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
8fe06ed
to
8ef9e6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Now that we have snapshotter in as a part of handler. Probably we could get rid of reqCH and ackCh on handler. And simple make use of ssrStateMutex. But we will have that in separate PR.
What this PR does / why we need it:
This PR adds functionality to trigger on-demand full snapshots via the HTTP endpoint
/snapshot/full
.This PR also fixes a few unhandled scenarios for the case when snapstore provider is not configured, such as etcd readiness probe handling and defragmentation. It also refactors the server command to make it more modular.
Which issue(s) this PR fixes:
Fixes #113
Special notes for your reviewer:
Also refactored the
TriggerFullSnapshot
andDefragDataPeriodically
methods, as well as the order of object creation inNewServerCommand
in order to pass theSnapshotter
to theHTTPHandler
.Will add unit tests shortly.
Release note: