-
Notifications
You must be signed in to change notification settings - Fork 7.4k
making the update reason and recreate annotations stable #38844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The test failure in the first run is due to a weakness in our testing assumptions - and similarly causes some of the other tests be flakey as well. Here we're checking for Ready=False with a particular reason, but that is an ephemeral state. Even with using informers to pick up all status changes does not guarentee that we'll see that state - for two reasons:
|
I would rather stick to integration test in this case, feels too "artificial" to mock statefulset controller. |
There are of course tradeoffs. The reason the mocking here is so complete is to ensure that we generate intermediate states for the statefulset to match our test expectations looking for things like Ready=False - as explained above those are actually a little problematic in any case. We could simplify the assertions and the mock logic instead of simulating up / down handling could just make the new statefulset seem immediately available. In the end it's about the amount of coverage for how long the test takes. As written, the api server version of the test runs in less than 20 seconds. The integration test currently takes 1-2 minutes. I can tweak the integration test to instead disable probes (we don't really need the server to come up for this test), but it will still take nearly twice as long as the apiserver test - without any additional increase in coverage from the perspective of the operator. Another benefit of the apiserver testing locally is that it's faster to restart testing - there's no namespace to cleanup / setup, and no database needed. The last thought is that these testing approaches don't have to be mutually exclusive. The test logic here is the same whether it's being run against the apiserver only or a full kubernetes - if we're sufficiently clever, then we could have a single definition of a test, but three testing modes local, remote, and apiserver. |
// version 22 changed the match labels, account for older versions | ||
if (!existingDeployment.isMarkedForDeletion() && !hasExpectedMatchLabels(existingDeployment, primary)) { | ||
context.getClient().resource(existingDeployment).lockResourceVersion().delete(); | ||
Log.info("Existing Deployment found with old label selector, it will be recreated"); | ||
} | ||
|
||
baseDeployment.getMetadata().getAnnotations().put(Constants.KEYCLOAK_UPDATE_REASON_ANNOTATION, ContextUtils.getUpdateReason(context)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is updating the metadata safe without using the "builder" classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is. We don't need another copy because the desired is a working copy.
We also want to avoid using the actual state as a whole:
Line 568 in 0443cc5
var builder = actual.toBuilder(); |
The operator sdk was allowing this to happen - but should not have been (we already removed similar logic from fabric8). When using server side apply there could other stuff contributed by other controllers. Applying all of the actual state will make our operator assume ownership of everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation.
var updateCondition = assertUnknownUpdateTypeStatus(kc); | ||
|
||
// fake statefulset controller | ||
k8sclient.apps().statefulSets().withName(KeycloakDeploymentDependentResource.getName(kc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close()
is never invoked on the informer. Does it matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written no, because we just have a single test. To be more correct, it should be moved to before all.
if (!Objects.equals(revision, obj.getStatus().getUpdateRevision())) { | ||
actualReplicas = 0; | ||
} else if (replicas == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this branch be removed?
If the replicas < status.replicas then you decrement the number of replicas, otherwise you increment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the revision doesn't match the updateRevision, then we are rolling in another spec change. Setting the replicas to 0 represents taking everything offline, then we let subsequent events handle the roll up. Hybrid of serial and parallel handling.
You don't have to have a replica change with a spec change, so the other handling with incrementing / decrementing won't do anything if the replicas are left the same in the CR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! forgot about the rolling update 🤦
This is what I didn't figure out when I tried to fix the annotations here and it caused the test to fail more often 🤣 #38588 (comment)
It makes it easy to debug, which is a problem I had when trying to debug things. I suggest having the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
closes: keycloak#38487 Signed-off-by: Steve Hawkins <shawkins@redhat.com>
Signed-off-by: Steve Hawkins <shawkins@redhat.com>
To keep this pr moving quickly, I've pulled out the apiserver test approach - I'll refine that and bring it back in another pr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closes: #38487
The second commit shows what this test looks like against just the api server - that can be left off of this commit if desired as it has the obvious drawback of needing to emulate a statefulset controller. @pruivo @vmuzikar @mabartos what do you think?
The core fix is that we want to preserve the new annotations when possible from the existing deployment - they are still relevant on the current status even when the rolling or recreate has completed.