8000 Expose triggering on-demand full snapshot via HTTP endpoint by shreyas-s-rao · Pull Request #143 · gardener/etcd-backup-restore · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Expose triggering on-demand full snapshot via HTTP endpoint #143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 4, 2019

Conversation

shreyas-s-rao
Copy link
Collaborator
@shreyas-s-rao shreyas-s-rao commented Mar 27, 2019

What this PR does / why we need it:
This PR adds functionality to trigger on-demand full snapshots via the HTTP endpoint /snapshot/full.
This PR also fixes a few unhandled scenarios for the case when snapstore provider is not configured, such as etcd readiness probe handling and defragmentation. It also refactors the server command to make it more modular.

Which issue(s) this PR fixes:
Fixes #113

Special notes for your reviewer:
Also refactored the TriggerFullSnapshot and DefragDataPeriodically methods, as well as the order of object creation in NewServerCommand in order to pass the Snapshotter to the HTTPHandler.
Will add unit tests shortly.

Release note:

Added functionality to trigger on-demand full snapshots via the HTTP endpoint `/snapshot/full`.

@shreyas-s-rao shreyas-s-rao added kind/enhancement Enhancement, improvement, extension status/in-progress Issue is in progress/work needs/review Needs review component/etcd-backup-restore ETCD Backup & Restore platform/all area/robustness Robustness, reliability, resilience related needs/tests Needs (more) tests needs/lgtm Needs approval for merging area/backup Backup related needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 27, 2019
@shreyas-s-rao shreyas-s-rao added this to the 0.7.0 milestone Mar 27, 2019
Copy link
Contributor
@swapnilgm swapnilgm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comments. NewServerCommand code is not that simple. It involves complex coordination logic. So, please handle it carefully.

@shreyas-s-rao shreyas-s-rao force-pushed the full-snapshot-on-demand branch 2 times, most recently from 4a65144 to 5075f72 Compare May 13, 2019 09:34
@shreyas-s-rao
Copy link
Collaborator Author

@swapnilgm I have made the suggested changes and also refactored code to wait for etcd probe to succeed before setting http status to OK, even in the case of etcd-events.

@shreyas-s-rao shreyas-s-rao force-pushed the full-snapshot-on-demand branch from 5075f72 to b540b30 Compare May 14, 2019 08:47
Copy link
Collaborator
@amshuman-kr amshuman-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it simplify the code if we split the snapshotting and non-snapshotting cases as separate funcs? Channels and goroutines embedded in conditions in a long func makes me nervous :-)

@shreyas-s-rao
Copy link
Collaborator Author

@amshuman-kr I did consider this, but given it would create a lot of code duplication, I stuck to the current approach.

@amshuman-kr
Copy link
Collaborator
amshuman-kr commented May 14, 2019

it would create a lot of code duplication

@shreyas-s-rao how about smaller funcs for the reusable code?

@shreyas-s-rao
Copy link
Collaborator Author
shreyas-s-rao commented May 14, 2019

@shreyas-s-rao how about smaller funcs for the reusable code?

Makes sense @amshuman-kr . I'll make the necessary changes.

@shreyas-s-rao shreyas-s-rao force-pushed the full-snapshot-on-demand branch from b540b30 to c409c22 Compare June 25, 2019 09:45
@shreyas-s-rao
Copy link
Collaborator Author

@amshuman-kr @swapnilgm I have addressed the suggestions. Also fixed a couple of things wrt no-snapstore-provider case, namely probe loop and periodic defrag (for etcd-events case).

Copy link
Collaborator
@amshuman-kr amshuman-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from one comment about checking the snapshotterEnabled flag twice.

The comments about wait period between probing etcd can be discussed/addressed later.

insecureSkipVerify,
etcdEndpoints)

if snapshotterEnabled {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this block be merged with this to avoid checking the same flag twice?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

if err != nil {
logger.Errorf("Failed to probe etcd: %v", err)
handler.Status = http.StatusServiceUnavailable
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this comes from existing code. So, not necessarily in this PR, but shouldn't we think about a wait period before the next probe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Amshu. It would be better to quickly merge this PR and take this up in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call timeout for etcd probe is set to default 30 sec i.e. each every probe call blocks for 30 sec if etcd is unavailable. I think its not necessary to wait in between as it dosen't introduce much network traffic as well.

if err != nil {
logger.Errorf("Failed to probe etcd: %v", err)
handler.Status = http.StatusServiceUnavailable
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@shreyas-s-rao shreyas-s-rao force-pushed the full-snapshot-on-demand branch from c409c22 to c965a25 Compare June 26, 2019 08:43
@shreyas-s-rao
Copy link
Collaborator Author

@amshuman-kr @swapnilgm I've addressed the latest suggestions. PTAL.

Copy link
Collaborator
@amshuman-kr amshuman-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shreyas-s-rao shreyas-s-rao force-pushed the full-snapshot-on-demand branch from c965a25 to 4469567 Compare June 26, 2019 09:36
Copy link
Contributor
@swapnilgm swapnilgm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Refactoring looks good as well. Just one correction mentioned in comment. Please address it.

// This is needed to stop the currently running snapshotter.
atomic.StoreUint32(&h.AckState, HandlerAckWaiting)
h.Logger.Info("Changed handler state.")
h.ReqCh <- emptyStruct
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have two requests to stop snapshotter? Why do we have previous one https://github.com/gardener/etcd-backup-restore/pull/143/files#diff-912f19dfd8da2a79f988fd5045cd7dd0R146

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Thanks for the catch!

@shreyas-s-rao shreyas-s-rao force-pushed the full-snapshot-on-demand branch from 4469567 to 8fe06ed Compare July 4, 2019 06:35
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
@shreyas-s-rao shreyas-s-rao force-pushed the full-snapshot-on-demand branch from 8fe06ed to 8ef9e6f Compare July 4, 2019 06:37
@swapnilgm swapnilgm added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 4, 2019
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 4, 2019
Copy link
Contributor
@swapnilgm swapnilgm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Now that we have snapshotter in as a part of handler. Probably we could get rid of reqCH and ackCh on handler. And simple make use of ssrStateMutex. But we will have that in separate PR.

@swapnilgm swapnilgm merged commit 8c8d781 into gardener:master Jul 4, 2019
@shreyas-s-rao shreyas-s-rao deleted the full-snapshot-on-demand branch July 5, 2019 02:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backup Backup related area/robustness Robustness, reliability, resilience related component/etcd-backup-restore ETCD Backup & Restore kind/enhancement Enhancement, improvement, extension needs/lgtm Needs approval for merging needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/review Needs review needs/tests Needs (more) tests platform/all status/in-progress Issue is in progress/work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose API to trigger full snapshot
5 participants
0