fix: excluding update commands from reaug #38899

shawkins · 2025-04-13T11:46:16Z

@ahus1 @pruivo @vmuzikar to align better with some of what we talked about in the meeting, the refactoring I want to do with #38514, and the thoughts on #38894, I would like to skip reaug, and also not use the persisted properties when --optimized is not specified.

While this introduces yet another command behavior that is distinct from our other behaviors, it is more in line with our intuition about how the command should function.

pruivo · 2025-04-14T12:49:11Z

@shawkins let me refresh my memory for a second by asking

If --optimized is set and the build time configuration is different, does the update-compatibility fail?
If --optimized is not set, it ignores the persisted configuration. It is similar to start behaviour, which discards the persisted configuration to rebuild, right?

shawkins · 2025-04-14T12:53:14Z

If --optimized is set and the build time configuration is different, does the update-compatibility fail?

Yes, it will fail.

If --optimized is not set, it ignores the persisted configuration. It is similar to start behaviour, which discards the persisted configuration to rebuild, right?

Like start, but we don't alter the persisted properties / augmented state.
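For readers following along, the behavior proposed at this point in the thread can be sketched as a small decision function. This is a simplified illustration, not Keycloak code: the class `UpdateCommandSketch`, the `Outcome` values, and the map-based comparison of build-time options are all hypothetical, and the real check lives in the CLI handling.

```java
import java.util.Map;

/** Hypothetical model of the proposed update-compatibility behavior; not Keycloak code. */
final class UpdateCommandSketch {

    enum Outcome { RUN_WITH_PERSISTED, FAIL_BUILD_TIME_MISMATCH, RUN_IGNORING_PERSISTED }

    /**
     * @param optimized whether --optimized was passed
     * @param persisted build-time options stored by the last build (assumed representation)
     * @param requested build-time options supplied to the update-compatibility command
     */
    static Outcome decide(boolean optimized,
                          Map<String, String> persisted,
                          Map<String, String> requested) {
        if (!optimized) {
            // Like start without --optimized, except that nothing is
            // re-augmented and the persisted properties are left untouched.
            return Outcome.RUN_IGNORING_PERSISTED;
        }
        // With --optimized, any requested build-time option that differs
        // from the persisted value makes the command fail.
        boolean matches = requested.entrySet().stream()
                .allMatch(e -> e.getValue().equals(persisted.get(e.getKey())));
        return matches ? Outcome.RUN_WITH_PERSISTED : Outcome.FAIL_BUILD_TIME_MISMATCH;
    }

    public static void main(String[] args) {
        Map<String, String> persisted = Map.of("db", "postgres");
        System.out.println(decide(true, persisted, Map.of("db", "postgres")));
        System.out.println(decide(true, persisted, Map.of("db", "mariadb")));
        System.out.println(decide(false, persisted, Map.of("db", "mariadb")));
    }
}
```

The third call is the case that differs from `start`: the persisted configuration is ignored, but no new augmentation is written.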

pruivo

Thanks @shawkins. It sounds good to me 👍

ahus1

I tested this locally, and to me the new behavior as observed on the command line is how I expect it to work. Thank you very much, I like it!

I have a question about the note you added, please see below.

ahus1 · 2025-04-17T08:17:12Z

docs/guides/server/update-compatibility.adoc

+NOTE: make sure the build-time state of your `update-compatibility` command matches the build-time state of the corresponding `build` or `start` command you are checking the compatibility of.
+


We already have this paragraph:

keycloak/docs/guides/server/update-compatibility.adoc

Lines 67 to 72 in 2091e05

[WARNING]
====
Ensure that all configuration options, whether set via environment variables or CLI arguments, are included when running the above command.

Omitting any configuration options results in incomplete metadata, and could lead to a wrong reported result in the next step.
====

Adding it here seems a bit out of context to me. Is there something missing below that you want to add?

BTW, "build-time state" might not be something people have a shared understanding of, or know how to ensure. If the statement below needs extending, can it be reworded to explain how to ensure the build-time state matches?

We can just omit the new note. I had added something based upon the reaugmentation possibility, then refined it for the new logic - but should have removed it completely.

I've updated the pr.

ahus1

Thank you @shawkins - ready to be merged from my side.

I see that @vmuzikar added himself, so not merging it yet.

vmuzikar

@shawkins Sorry for the late reply and thank you for the PR!

I was playing with the scenario @pruivo mentioned:

If --optimized is set and the build time configuration is different, does the update-compatibility fail?

I noticed that it does fail; however, due to --optimized it fails with a rather odd message, e.g.:

Build time option: '--db' not usable with metadata

But we can address that as a follow-up.

Another thing I was wondering is whether skipping the reaug has any side effects, given that we still basically run the server for update-compatibility. What I have in mind is what if someone stops the server, changes some custom providers and runs update-compatibility. Then we would run it in an incorrect (outdated) state, no?

ahus1 · 2025-04-23T16:11:55Z

What I have in mind is what if someone stops the server, changes some custom providers and runs update-compatibility. Then we would run it in an incorrect (outdated) state, no?

@vmuzikar - I think this is outside of the documented use, and therefore not supported. So I'm ok to merge it as is.

shawkins · 2025-04-23T16:12:22Z

Build time option: '--db' not usable with metadata

metadata is just the command name, but I agree that starts to lose context for our sub-commands. It would also be clearer if it referenced --optimized.

As you say this could be done in a different issue, because this is the message we currently display for any command, not just the update ones.

Another thing I was wondering is whether skipping the reaug has any side effects, given that we still basically run the server for update-compatibility.

We don't run the server for update compatibility - it's all handled on the picocli side of things and exits before there is a quarkus launch.

What I have in mind is what if someone stops the server, changes some custom providers and runs update-compatibility. Then we would run it in an incorrect (outdated) state, no?

@ahus1 @vmuzikar the concern here is that the user has effectively put the classloading index into a bad state wrt CompatibilityMetadataProvider classes. There are three cases:

If you have changed a CompatibilityMetadataProvider class, things may still work as expected.
If you have removed a CompatibilityMetadataProvider class, then you'll see a class loading error.
If you have added a CompatibilityMetadataProvider class, then that won't contribute to the update compatibility check.

So if CompatibilityMetadataProvider classes are user contributable, then this is a valid concern - albeit one that only happens in a narrow circumstance. If we were to run the update commands in dev mode with the QuarkusClassLoader instead of the RunnerClassLoader, I think we could address this - but that would require some handling in the kc.sh/bat scripts.
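The three cases can be modelled with a toy lookup that only trusts the class names recorded at the last augmentation. This is a sketch under stated assumptions, not how the RunnerClassLoader is implemented: `StaleIndexSketch`, the `Result` values, and the provider names are hypothetical, and plain string sets stand in for the index and for the contents of the providers directory.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of resolving providers through an index that is not refreshed; not Keycloak code. */
final class StaleIndexSketch {

    enum Result { RESOLVED_FROM_INDEX, CLASS_LOADING_ERROR, NOT_DISCOVERED }

    /**
     * @param indexed class names recorded at the last augmentation
     * @param present class names actually available in the providers directory now
     */
    static Map<String, Result> resolve(Set<String> indexed, Set<String> present) {
        Set<String> all = new TreeSet<>(indexed);
        all.addAll(present);
        Map<String, Result> out = new TreeMap<>();
        for (String name : all) {
            if (!indexed.contains(name)) {
                // Added after the last augmentation: the index never mentions it.
                out.put(name, Result.NOT_DISCOVERED);
            } else if (present.contains(name)) {
                // Still present (possibly changed): the index entry still resolves.
                out.put(name, Result.RESOLVED_FROM_INDEX);
            } else {
                // Removed: the index points at a class that no longer exists.
                out.put(name, Result.CLASS_LOADING_ERROR);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> indexed = Set.of("ChangedProvider", "RemovedProvider");
        Set<String> present = Set.of("ChangedProvider", "AddedProvider");
        resolve(indexed, present).forEach((name, result) ->
                System.out.println(name + " -> " + result));
    }
}
```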

For the operator the only circumstance this could happen relies upon the unsupported pod template:

user uses the unsupported PodTemplate to turn the providers directory into a mount such that provider jars can be changed
user marks the cr as optimized=false

ahus1 · 2025-04-23T16:17:21Z

For the operator the only circumstance this could happen relies upon the unsupported pod template:

user creates an optimized image

user uses the unsupported PodTemplate to turn the providers directory into a mount such that provider jars can be changed

user marks the cr as optimized=false

If the user marks the CR as non-optimized, the update-compatibility command would also run as non-optimized, as would the start command. How would that lead to results where the update-compatibility command sees a different configuration from the start command?

shawkins · 2025-04-23T16:24:37Z

How would that lead to results where the update-compatibility command sees a different configuration from the start command?

The update command without re-augmentation can only see the classes in the provider directory up to the last time an augmentation was run. If you use a default image and mount in some provider jars, for example, the start command will pick those up, but the update command will not.

pruivo · 2025-04-23T16:31:02Z

The update command without re-augmentation can only see the classes in the provider directory up to the last time an augmentation was run

@shawkins are you limiting/configuring the classloader during reaug? If the jar is in the classpath, it should be visible without reaug.

shawkins · 2025-04-23T16:34:37Z

@shawkins are you limiting/configuring the classloader during reaug? If the jar is in the classpath, it should be visible without reaug.

The quarkus RunnerClassLoader supersedes normal classpath handling - https://quarkus.io/guides/maven-tooling#quarkus-core_quarkus-package-jar-user-providers-directory

EDIT: looking at the RunnerClassLoader impl it does eventually delegate to the parent classloader. So if we added the provider jars to the classpath built by kc.sh / kc.bat the behavior could be:

If you have changed a CompatibilityMetadataProvider class, things may still work as expected.
If you have removed a CompatibilityMetadataProvider class, then you'll see a class loading error - because the entry still exists in the index.
If you have added a CompatibilityMetadataProvider class, then that should get picked up (still need to confirm that).

Not sure about the memory / lookup cost of adding additional classpath for the RunnerClassLoader to fall back to.

ahus1 · 2025-04-23T16:55:14Z

(invalid)

Some new questions arose when testing

shawkins · 2025-04-23T17:06:43Z

I gave it a try with this PR branch, and the following would detect the changed provider and would update the reaug after a message "Changes detected in configuration. Updating the server image." and will recognize the new provider:

@ahus1 That doesn't appear to have been using this branch. The Picocli.shouldSkipRebuild check will return true, and we'll simply exit rather than checking / performing a reaugmentation.

The behavior you are describing is consistent with the current state of main - that the update command will reaugment when needed.

shawkins · 2025-04-23T17:27:36Z

Since the classloading workarounds are not fully effective:

dev mode knows about changes to provider jars, but it does not currently detect new provider jars
adding classpath entries may introduce some overhead generally and still doesn't help with the case where a CompatibilityMetadataProvider is removed - but we used to have, and I believe still do, places where we effectively catch class-not-found exceptions in situations like this. I'm not sure how this would look with a ServiceLoader.

I think we can just go back to the previous conclusion and let the update commands reaugment, and just add a small tweak to the docs telling users to be careful when not using --optimized, like with the import / export commands.

ahus1 · 2025-04-23T17:27:53Z

The behavior you are describing is consistent with the current state of main - that the update command will reaugment when needed.

OK, I stand corrected, apparently I fumbled with the checkout of the correct branch.

I now see that the following does not pick up the provider, as you described:

bin/kc.sh build
cp compat-provider-1.0-SNAPSHOT.jar providers/
bin/kc.sh update-compatibility metadata --file=/tmp/file.json

And while I think that this wouldn't happen in an Operator as long as you don't use a podTemplate, this might happen in other situations where you have a custom image build pipeline where you at some point optimize the image, and then later run it in a non-optimized way because you just added a provider or DB driver.

There is another case where you copy providers to an image that was never optimized, where the command would never pick up the provider. And such an image would be perfectly fine to be used as a non-optimized image with the Operator:

# freshly unzipped Keycloak
cp compat-provider-1.0-SNAPSHOT.jar providers/
bin/kc.sh update-compatibility metadata --file=/tmp/file.json

Given that, I would rather stick with the existing behavior, than opening this corner case.

Sorry for changing my mind again, and possibly adding more work. If you decide to abandon or park this effort, I would understand that. Let me know how you want to proceed.

ahus1 · 2025-04-23T17:30:39Z

I think we can just go back to the previous conclusion and let the update commands reaugment, and just add a small tweak to the docs telling users to be careful when not using --optimized, like with the import / export commands.

+1 for that

vmuzikar · 2025-04-24T08:09:01Z

Ouch, sorry for sparking another round of discussions around the approach here. :D

I think we can just go back to the previous conclusion and let the update commands reaugment, and just add a small tweak to the docs telling users to be careful when not using --optimized, like with the import / export commands.

+1

Just to add, it's not just about CompatibilityMetadataProvider as it could use some other custom providers as well, so it's an even more generic problem.

shawkins · 2025-04-24T10:52:57Z

Just to add, it's not just about CompatibilityMetadataProvider as it could use some other custom providers as well, so it's an even more generic problem.

For the update commands it is specifically about CompatibilityMetadataProvider. Providers in general are not initialized.

shawkins · 2025-04-24T10:59:36Z

@ahus1 @pruivo (and @vmuzikar) looking more at CompatibilityMetadataProvider - it is in the private spi currently, are there plans to make it public?

If so, it does lack an init method to provide it with a stateful way to get the configuration - it would have to rely on static methods on classes like Config, Profile, and Version, which are not well designed for this purpose.

ahus1 · 2025-04-24T16:36:23Z

@shawkins - yes, eventually we want to make it public, and before we do so we can change the API as needed.

We need to solve the following problem here: We don't want to initialize the regular providers via the Keycloak session factory, as doing so would establish connections to the database, Infinispan and maybe even migrate the database.

So this SPI is independent from the KC session factory. This also prevents us from injecting a provider specific scoped configuration to those CompatibilityMetadataProviders. This is why they don't have an init() method.

As a fallback and good-enough-for-now, we're using Config, Profile, and Version to directly access the configuration. This way, an Infinispan related CompatibilityMetadataProvider might look at the configurations of other Infinispan providers and make sense out of that configuration. Depending on how much configuration we then end up using, we might inject an unscoped configuration.

At the moment we think we're only looking into very few profiles and versions, and the number of configurations is minimal. Let's see how it evolves, and we'll keep an eye on it when we implement #38862 - cc @ryanemerson

shawkins · 2025-04-24T18:30:18Z

So this SPI is independent from the KC session factory. This also prevents us from injecting a provider specific scoped configuration to those CompatibilityMetadataProviders. This is why they don't have an init() method.

I'm not suggesting that it should be scoped in the same sense as providers, just that it should be stateful.

We're discussing here what access to the "root" or non-scoped configuration would look like from the existing Scope interface. The current thinking is that the root would still just be a Scope (although in that draft pr it's actually specific to kc. properties). If you have any additional thoughts about how this may align to the update metadata needs (such as access to quarkus or even smallrye specific properties) please add a comment there.

ahus1 · 2025-04-24T19:00:34Z

I'm not suggesting that it should be scoped in the same sense as providers, just that it should be stateful.

IMHO it would make this simpler for testing, though I didn't see the reasoning for the expected benefits in this or the other PR. While we seem to agree, it would be good to write them down in an issue eventually.
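The testing argument can be illustrated with a small contrast between a provider that reads static configuration and one that receives its configuration through an init method. This is only a sketch: `MetadataProviderSketch`, `StaticConfig`, both interfaces, and the `cache` key with its values are hypothetical stand-ins, and none of them mirror the actual CompatibilityMetadataProvider SPI or the Config, Profile, and Version classes.

```java
import java.util.Map;
import java.util.function.Function;

/** Contrast between static and injected configuration access; not Keycloak code. */
final class MetadataProviderSketch {

    /** Stand-in for process-wide static configuration holders. */
    static final class StaticConfig {
        static Map<String, String> values = Map.of();

        static String get(String key) {
            return values.get(key);
        }
    }

    /** Shape without init(): the provider must reach for static state. */
    interface StatelessProvider {
        Map<String, String> metadata();
    }

    /** Stateful shape: the caller hands an unscoped configuration lookup to the provider. */
    interface StatefulProvider {
        void init(Function<String, String> config);

        Map<String, String> metadata();
    }

    static final class StaticLookupProvider implements StatelessProvider {
        @Override
        public Map<String, String> metadata() {
            // Depends on global state, so a test has to mutate StaticConfig.
            return Map.of("cache", String.valueOf(StaticConfig.get("cache")));
        }
    }

    static final class InjectedProvider implements StatefulProvider {
        private Function<String, String> config = key -> null;

        @Override
        public void init(Function<String, String> config) {
            this.config = config;
        }

        @Override
        public Map<String, String> metadata() {
            // Depends only on what was injected, so a test can pass any lookup.
            return Map.of("cache", String.valueOf(config.apply("cache")));
        }
    }

    public static void main(String[] args) {
        StaticConfig.values = Map.of("cache", "ispn");
        System.out.println(new StaticLookupProvider().metadata());

        InjectedProvider injected = new InjectedProvider();
        injected.init(Map.of("cache", "local")::get);
        System.out.println(injected.metadata());
    }
}
```

With the injected variant, two providers can be exercised with different configurations in the same test run without touching shared state.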

shawkins · 2025-04-28T13:43:10Z

Updated to just be a docs change. The other thought is that when --optimized is specified, then we should not use the current augmentation if the persisted properties were auto-created. I'm not sure if that would be seen as a breaking change, but it would remove the need for this warning.

cc @ahus1 @pruivo @vmuzikar

vmuzikar · 2025-04-29T13:57:20Z

docs/guides/server/update-compatibility.adoc

@@ -86,6 +86,8 @@ If you are upgrading to a new {project_name} version, this command must be execu
 Failure to meet these requirements results in an incorrect outcome.
 ====

+NOTE: If you do not use `--optimized` keep in mind that an `update` command may implicitly create or update an optimized build for you - if you are running the command from the same machine as a server instance, this may impact the next start of your server.


Isn't this note also relevant to the metadata command?

It's applicable to both update metadata and update check.

Then I'd move this note a bit higher, below the first warning box in Determining the update strategy for an updated configuration chapter. The way it currently is it implies the note is relevant only to the check command.

@vmuzikar just to double check, you are in favor of proceeding with these warnings instead of changing the --optimized logic to not use auto builds, correct?

Yes, having it just as a docs change is fine from my perspective. My suggestion in this thread is just moving the note/warning to a different place in the guide so it's clear it applies both to metadata and check commands.

Ok, moved where the warning is at.

also correcting the language across the warnings closes: keycloak#38662 Signed-off-by: Steve Hawkins <shawkins@redhat.com>

vmuzikar

@shawkins LGTM, thank you.

Considering @ahus1 and @pruivo approved previously and the gist of it remains the same, I'm merging the PR.

shawkins requested review from a team as code owners April 13, 2025 11:46

keycloak-github-bot bot added team/cloud-native labels Apr 13, 2025

pruivo approved these changes Apr 14, 2025

vmuzikar self-requested a review April 14, 2025 13:17

ahus1 reviewed Apr 17, 2025

shawkins force-pushed the iss38662 branch from 2091e05 to e04855c Compare April 17, 2025 12:03

ahus1 previously approved these changes Apr 17, 2025

vmuzikar reviewed Apr 23, 2025

shawkins marked this pull request as draft April 24, 2025 13:38

shawkins mentioned this pull request Apr 26, 2025

Optimized startup fails from kc.spi-connections-http-client-default-expect-continue-enabled passed at runtime #39063

Closed

2 tasks

shawkins force-pushed the iss38662 branch from e04855c to 1bb3332 Compare April 28, 2025 13:40

shawkins marked this pull request as ready for review April 28, 2025 13:40

shawkins requested review from vmuzikar, ahus1 and pruivo April 28, 2025 13:43

ahus1 previously approved these changes Apr 28, 2025

ahus1 self-assigned this Apr 28, 2025

vmuzikar reviewed Apr 29, 2025

shawkins requested a review from vmuzikar April 29, 2025 17:49

shawkins dismissed ahus1’s stale review via 148bc36 April 30, 2025 20:19

shawkins force-pushed the iss38662 branch from 1bb3332 to 148bc36 Compare April 30, 2025 20:19

pruivo approved these changes Apr 30, 2025

fix: adds a warning about auto-build behavior

5a8b9a7

also correcting the language across the warnings closes: keycloak#38662 Signed-off-by: Steve Hawkins <shawkins@redhat.com>

shawkins force-pushed the iss38662 branch from 148bc36 to 5a8b9a7 Compare April 30, 2025 21:14

vmuzikar approved these changes May 5, 2025

vmuzikar merged commit 3e05f67 into keycloak:main May 5, 2025
54 checks passed

InJoDave pushed a commit to InJoDave/keycloak that referenced this pull request May 6, 2025

fix: adds a warning about auto-build behavior (keycloak#38899)

236c360

also correcting the language across the warnings closes: keycloak#38662 Signed-off-by: Steve Hawkins <shawkins@redhat.com>

shawkins added a commit to shawkins/keycloak that referenced this pull request May 7, 2025

fix: adds a warning about auto-build behavior (keycloak#38899)

29a71d8

also correcting the language across the warnings closes: keycloak#38662 Signed-off-by: Steve Hawkins <shawkins@redhat.com>
