8000 Add new labels to sts pods, for backward compatibility with v0.23.0, which includes #777 by shreyas-s-rao · Pull Request #804 · gardener/etcd-druid · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add new labels to sts pods, for backward compatibility with v0.23.0, which includes #777 #804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 8, 2024

Conversation

shreyas-s-rao
Copy link
Contributor

How to categorize this PR?

/area quality
/kind task

What this PR does / why we need it:
This PR adds new k8s-recommended labels to the etcd pods - app.kubernetes.io/*, as defined here. This is required for backward compatibility when upgrading to etcd-druid:v0.23.0, which includes #777 , which switches all component labels and selectors to the new k8s-recommended labels, such that a downgrade from v0.23.0+ to v0.22.1 will be possible.

Additionally, this PR also ensures that any new label selector set on the statefulset is replaced with old (expected) label selector, ie {instance: <etcd-name>, name: etcd}, to be consistent with v0.22.0.

All of this is achieved by adding support for a preReconcileEtcd() step for the etcd reconciler, which calls statefulset component's PreDeploy method, which in turn does the following:

  1. If sts pod template labels don't contain old+new labels, patch the sts
  2. Wait for the sts pods to be updated (rolled out) and ready
  3. Delete the sts with orphan cascade option, so that the subsequent sts Deploy flow re-creates the sts with the expected (old) label selector

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:

Add new k8s-recommended labels to etcd pods, to support backward compatibility for etcd-druid:v0.23.0.

@shreyas-s-rao shreyas-s-rao added this to the v0.23.0 milestone Jun 4, 2024
@shreyas-s-rao shreyas-s-rao requested a review from a team as a code owner June 4, 2024 15:44
@gardener-robot gardener-robot added needs/review Needs review area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/task General task labels Jun 4, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 4, 2024
@gardener-robot gardener-robot added size/l Size of pull request is large (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else labels Jun 4, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jun 4, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 5, 2024
@shreyas-s-rao shreyas-s-rao self-assigned this Jun 5, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 5, 2024
@shreyas-s-rao
Copy link
Contributor Author

/retest

Copy link
Contributor
@anveshreddy18 anveshreddy18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @shreyas-s-rao. I have few comments and/or questions. PTAL. Thanks

@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 12, 2024
@shreyas-s-rao
Copy link
Contributor Author

@anveshreddy18 I have addressed your comments. PTAL

@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 12, 2024
@shreyas-s-rao
Copy link
Contributor Author

/retest

2 similar comments
@shreyas-s-rao
Copy link
Contributor Author

/retest

@shreyas-s-rao
Copy link
Contributor Author

/retest

Copy link
Member
@renormalize renormalize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @shreyas-s-rao!

Just one question: when downgrading etcd-druid from the changes proposed for v0.23.0 in 777 to this branch, I observe that the pods are rolled twice.

  • Once when the statefulset is patched with the older labels for the pod template before the statefulset deletion.
  • Once after the statefulset is deleted, and is recreated with the correct selectors.

Is this double rolling expected? It was my understanding that once the statefulset is recreated, it simply adopts the pods with the necessary labels.

Have a couple nits with comments for the sake of documenting in the code as well, but you can take a call and skip them, since the kind tests passing right now is a game of Russian roulette.

Thanks!

@renormalize
Copy link
Member

@shreyas-s-rao I've tested this PR with the latest changes pulled in from the refactor PR which were added after my review of this PR.

My observations while performing the test on the latest changes in 777:

  1. Deploy etcd-druid on current master HEAD. Create an Etcd.

  2. Upgrade to the branch of this PR.

    • Reconcile the Etcd.
    • The pods are rolled.
    • The statefulset's generation changes to 2. It is not deleted+recreated, and the behavior is as expected.
  3. Upgrade to the branch on the refactor PR.

    • Reconcile the Etcd.
    • Current statefulset is deleted.
    • New statefulset is created.
    • The pods are rolled.
  4. Downgrade to the branch on this PR.

    • Reconcile the Etcd.
    • Current statefulset is deleted.
    • New statefulset is created.
    • etcd-test-2 is deleted.
    • New statefulset is stuck in Ready 0/0 condition.
    • New pod etcd-test-2 is not being created.

I've tried this multiple times and I observe the same behavior. Can @anveshreddy18 take a look and replicate the tests I've performed with the latest changes pulled in from the refactor PR?

@renormalize
Copy link
Member

For some reason, I am unable to replicate the supposed issue I saw in my previous comment with
the latest changes, in the now merged #777.
Will investigate further and update the thread here.

@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 3, 2024
@shreyas-s-rao
Copy link
Contributor Author

@renormalize thanks for the review.

Is this double rolling expected?

As we tested in-person, we see that the second rolling is due to the change in sts label selector, which reverts to the old labels. This changes the sts controller revision hash label on the pods, which causes the second rolling.

Whereas, during upgrade from druid:v0.22.0 to this PR version, there is only one rolling of the pods, because the label selector on the sts remains the same. This PR simply adds new labels during the upgrade, and essentially adds old labels and updates label selector to old labels during downgrade.

I have now tested this behavior, and pushed the changes. PTAL.

@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jul 3, 2024
@shreyas-s-rao
Copy link
Contributor Author

@renormalize I have also added the comments as requested in your review.

@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jul 3, 2024
@shreyas-s-rao
Copy link
Contributor Author

/retest

@renormalize renormalize self-assigned this Jul 4, 2024
@seshachalam-yv
Copy link
Contributor

/assign

Copy link
Contributor
@seshachalam-yv seshachalam-yv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shreyas-s-rao for supporting backward compatibility with v0.23.0

@ashwani2k
Copy link
Collaborator

@shreyas-s-rao & @renormalize: I've tested all the scenarios from mentioned in #804 (comment).

  1. Deploy etcd-druid on hotfix-v0.22. Create an Etcd.

    • etcd-test pods are created.
  2. Upgrade to the branch of this PR Add new labels to sts pods, for backward compatibility with v0.23.0, which includes #777 #804

    • Reconcile the Etcd.
    • The pods are rolled.
    • The statefulset's generation changes to 2. It is not deleted+recreated, and the behavior is as expected.
  3. Upgrade to the master which currently has Druid Refactor to Address Multiple Controller Conflicts #777 changes.

    • Reconcile the Etcd.
    • Current statefulset is deleted.
    • New statefulset is created.
    • The pods are rolled.
  4. Downgrade to the branch on this PR Add new labels to sts pods, for backward compatibility with v0.23.0, which includes #777 #804

    • Reconcile the Etcd.
    • it rolls the current STS and then-
    • deletes the statefulset.
    • New statefulset is created.
    • Rolls the etcd pods for the new sts.

I'm not sure but going back to v0.22 of druid will lead to rollin and then recreation of ETCD.
Let me know if you see it otherwise.

@renormalize
Copy link
Member

@ashwani2k Thanks for your testing.

What you saw is as expected from this PR.

Just to correct your wording slightly - the Etcd is not recreated.
The statefulset is recreated while downgrading from master to this PR's branch, and the pods are rolled twice in this - once before the sts recreation and once after.

Downgrading from this branch to hotfix-v0.22 would cause the pods to roll once. Neither the sts nor the Etcd are recreated.

Copy link
Member
@renormalize renormalize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

@renormalize renormalize merged commit 798a9b1 into gardener:hotfix-v0.22 Jul 8, 2024
11 checks passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Jul 8, 2024
@shreyas-s-rao shreyas-s-rao deleted the enh/prep-for-pr777 branch July 9, 2024 09:42
@renormalize renormalize removed their assignment Mar 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/task General task needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/review Needs review needs/second-opinion Needs second review by someone else size/l Size of pull request is large (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0