Multi-stage docker builds fail --optimized validation · Issue #38893 · keycloak/keycloak · GitHub
Closed
1 of 2 tasks
MarcusDunn opened this issue Apr 12, 2025 · 54 comments · Fixed by #39215 or #39340 · May be fixed by bitnami/charts#33492
Labels
area/dist/quarkus kind/bug Categorizes a PR related to a bug missing/docs Documentation is missing priority/important Must be worked on very soon release/26.2.3 release/26.3.0 team/cloud-native

Comments

@MarcusDunn
MarcusDunn commented Apr 12, 2025

Before reporting an issue

  • I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

Area

dist/quarkus

Describe the bug

A Dockerfile that runs fine with --optimized on earlier versions fails on newer ones. I believe the responsible commit is 332bf12.

The error is: A provider JAR was updated since the last build, please rebuild for this to be fully utilized.

This may be similar to #37770, but they seemed to have resolved the issue.

Version

26.2

Regression

  • The issue is a regression

Expected behavior

This docker image can be run fine with --optimized

Actual behavior

It fails with ERROR: A provider JAR was updated since the last build, please rebuild for this to be fully utilized.

How to Reproduce?

I have the following dockerfile leveraging keycloakify:

FROM node:22@sha256:0e910f435308c36ea60b4cfd7b80208044d77a074d16b768a81901ce938a62dc AS keycloakify_jar_builder

RUN wget -O - https://apt.corretto.aws/corretto.key | gpg --dearmor -o /usr/share/keyrings/corretto-keyring.gpg && \
    echo "deb [signed-by=/usr/share/keyrings/corretto-keyring.gpg] https://apt.corretto.aws stable main" | tee /etc/apt/sources.list.d/corretto.list && \
    apt-get update && \
    apt-get install -y java-21-amazon-corretto-jdk && \
    apt-get install -y maven;

COPY package.json package-lock.json /opt/app/

WORKDIR /opt/app

RUN npm ci

COPY . /opt/app/

RUN npm run build-keycloak-theme

FROM quay.io/keycloak/keycloak:26.2@sha256:87758ff2293c78c942c7a1f0df2bc13e0f943fcf0c0d027c12fdfac54a35d93b

WORKDIR /opt/keycloak
COPY --from=keycloakify_jar_builder /opt/app/dist_keycloak/keycloak-theme-for-kc-all-other-versions.jar /opt/keycloak/providers/
RUN /opt/keycloak/bin/kc.sh build --db=postgres --health-enabled=true --metrics-enabled=true --tracing-enabled=true --features=opentelemetry:v1,multi-site:v1
ENTRYPOINT ["/opt/keycloak/bin/kc.sh"]

After building, running docker run 24f554b83ada start --optimized yields:

ERROR: A provider JAR was updated since the last build, please rebuild for this to be fully utilized.

I am not sure what causes this. I think it may have to do with docker COPY messing with timestamps, but don't know enough to be sure.

For now I can run unoptimized, but this is not ideal.

Anything else?

Running docker run <image hash> show-config yields:

Current Mode: production
Current Configuration:
        kc.health-enabled =  true (Persisted)
        kc.provider.file.keycloak-theme-for-kc-all-other-versions.jar.last-modified =  1744417140193 (Persisted)
        kc.log-level-org.jboss.resteasy.resteasy_jaxrs.i18n =  WARN (classpath application.properties)
        kc.log-level-io.quarkus.arc.processor.BeanArchives =  off (classpath application.properties)
        kc.log-level-io.quarkus.deployment.steps.ReflectiveHierarchyStep =  error (classpath application.properties)
        kc.tracing-enabled =  true (Persisted)
        kc.log-level-org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup =  WARN (classpath application.properties)
        kc.tracing-jdbc-enabled =  true (Persisted)
        kc.log-level-io.quarkus.config =  off (classpath application.properties)
        kc.log-console-output =  default (classpath application.properties)
        kc.metrics-enabled =  true (Persisted)
        kc.log-level-io.quarkus.arc.processor.IndexClassLookupUtils =  off (classpath application.properties)
        kc.db =  postgres (Persisted)
        kc.log-level-io.quarkus.hibernate.orm.deployment.HibernateOrmProcessor =  warn (classpath application.properties)
        kc.optimized =  true (Persisted)
        kc.version =  26.2.0 (SysPropConfigSource)
        kc.features =  opentelemetry:v1,multi-site:v1 (Persisted)
        kc.run-in-container =  true (ENV)
@shawkins
Contributor

> I am not sure what causes this. I think it may have to do with docker COPY messing with timestamps, but don't know enough to be sure.

I don't think we've seen this behavior before. But our example shows using ADD, instead of COPY, so that the ownership of the provider jar can also be set - maybe there's a difference between ADD and COPY coming into play here https://www.keycloak.org/server/containers#_writing_your_optimized_keycloak_containerfile

> kc.provider.file.keycloak-theme-for-kc-all-other-versions.jar.last-modified = 1744417140193 (Persisted)

Out of curiosity could you also determine what the actual last-modified time of the provider jar is at runtime?
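
One way to answer this from a shell in the running container is sketched below. It assumes GNU coreutils (`date -r` and the `%3N` format) is available there, which may not hold for a minimal image; `mtime_ms` and `check_mtime` are hypothetical helper names, not Keycloak tooling.

```shell
# Print a file's modification time in milliseconds since the Unix epoch,
# the same unit as the persisted kc.provider.file.<name>.last-modified value.
# Assumes GNU date: -r reads the file's mtime, %s is seconds, %3N milliseconds.
mtime_ms() {
  date -r "$1" +%s%3N
}

# Compare a persisted value (first argument) with a file's current mtime.
check_mtime() {
  persisted_ms=$1
  actual_ms=$(mtime_ms "$2")
  if [ "$persisted_ms" = "$actual_ms" ]; then
    echo "match"
  else
    echo "mismatch: persisted=$persisted_ms actual=$actual_ms"
  fi
}
```

For example, `check_mtime 1744417140193 /opt/keycloak/providers/keycloak-theme-for-kc-all-other-versions.jar` would print `match` only if the file's mtime still carries the same millisecond value that show-config reports as persisted.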

@bahaa
bahaa commented Apr 12, 2025

I'm not sure this is the cause or not. But I face the same error when I try to build the image on a machine that has Docker version 24.0.2, build cb74dfc, but when I tried to build the same image on a machine with Docker version 27.5.1, build 9f9e405 or Docker version 28.0.4, build b8034c0ed7 it worked just fine.

@MarcusDunn
Author
MarcusDunn commented Apr 12, 2025

> I'm not sure this is the cause or not. But I face the same error when I try to build the image on a machine that has Docker version 24.0.2, build cb74dfc, but when I tried to build the same image on a machine with Docker version 27.5.1, build 9f9e405 or Docker version 28.0.4, build b8034c0ed7 it worked just fine.

I'm running one of your "working" versions, which is super weird.

docker -v
Docker version 27.5.1, build 9f9e405

> Out of curiosity could you also determine what the actual last-modified time of the provider jar is at runtime?

I'm not sure of an easy way to inspect the modified time at runtime, but these are the stats. (Access and Modify match the Unix time in the config; Change and Birth are later.)

stat keycloak-theme-for-kc-all-other-versions.jar 
  File: keycloak-theme-for-kc-all-other-versions.jar
  Size: 1763566         Blocks: 3448       IO Block: 4096   regular file
Device: 7ch/124d        Inode: 33870099    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-04-12 00:19:00.000000000 +0000
Modify: 2025-04-12 00:19:00.000000000 +0000
Change: 2025-04-12 00:35:45.753114012 +0000
Birth: 2025-04-12 00:35:45.747114012 +0000

> I don't think we've seen this behavior before. But our example shows using ADD, instead of COPY, so that the ownership of the provider jar can also be set - maybe there's a difference between ADD and COPY coming into play here https://www.keycloak.org/server/containers#_writing_your_optimized_keycloak_containerfile

I think I have to COPY, ADD will not work with multi-stage builds.
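
The persisted value and the stat output can be put side by side by converting the epoch-millisecond value into a UTC date. A small sketch, assuming GNU `date` (for `-d @<seconds>`); `ms_to_utc` is a hypothetical helper name:

```shell
# Convert an epoch-millisecond timestamp (as printed by show-config) into a
# human-readable UTC date, keeping the millisecond part visible.
ms_to_utc() {
  ms=$1
  sec=$(( ms / 1000 ))    # whole seconds since the epoch
  frac=$(( ms % 1000 ))   # leftover milliseconds
  printf '%s.%03d UTC\n' "$(date -u -d "@$sec" '+%Y-%m-%d %H:%M:%S')" "$frac"
}

ms_to_utc 1744417140193
# prints: 2025-04-12 00:19:00.193 UTC
```

The Modify time reported by stat in the image is 00:19:00.000000000, so the two agree to the second, but the persisted value carries a .193 fractional part that the file in the image does not.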

@MarcusDunn
Author
MarcusDunn commented Apr 12, 2025

found a jankier but functional workaround:

# NEW
RUN touch -m --date=@1744481906 /opt/keycloak/providers/keycloak-theme-for-kc-all-other-versions.jar
RUN /opt/keycloak/bin/kc.sh build --db=postgres --health-enabled=true --metrics-enabled=true --tracing-enabled=true --features=opentelemetry:v1,multi-site:v1
ENTRYPOINT ["/opt/keycloak/bin/kc.sh"]

I don't see many drawbacks of this.

@bahaa
bahaa commented Apr 12, 2025

@shawkins changing COPY to ADD --chown=keycloak:keycloak --chmod=644 ... didn't fix the issue for me.

@dejwsz
dejwsz commented Apr 15, 2025

For us this helped in our multi-stage Dockerfile:

  1. Copy everything like provider, themes with chown: COPY --chown=keycloak:keycloak ...
  2. Touch the provider jar and fix its timestamp back in time before optimized build like:
RUN touch -m -t "$(date -u -d '1 hour ago' +%Y%m%d%H%M.%S)" /opt/keycloak/providers/my-test-spi-jar-with-dependencies.jar && /opt/keycloak/bin/kc.sh build

Chown is required to be able to modify the jar timestamp, otherwise you get permission denied error.

@bdovaz
Contributor
bdovaz commented Apr 15, 2025

> For us this helped in our multi-stage Dockerfile:
>
>   1. Copy everything like provider, themes with chown: COPY --chown=keycloak:keycloak ...
>   2. Touch the provider jar and fix its timestamp back in time before optimized build like:
> RUN touch -m -t "$(date -u -d '1 hour ago' +%Y%m%d%H%M.%S)" /opt/keycloak/providers/my-test-spi-jar-with-dependencies.jar && /opt/keycloak/bin/kc.sh build
>
> Chown is required to be able to modify the jar timestamp, otherwise you get permission denied error.

It has worked for us but this indicates that somehow it has to be a regression created in 26.2.0.... It doesn't make sense that this has to be done when it was not necessary until now.

@shawkins
Contributor

> it has to be a regression created in 26.2.0

I wouldn't call this a regression as it's working as intended #34665

It seems more like an issue with specific versions of Docker not maintaining file timestamps.

@dejwsz
dejwsz commented Apr 15, 2025

I did not analyze it too deeply, but it looks somehow related to the new version of Quarkus used in this release, 3.17.x (#36501). The re-augmentation process (https://quarkus.io/guides/reaugmentation) depends on the timestamps of the files involved, and I can see in the Keycloak sources that the flag is set with builder.setProperty("quarkus.launch.rebuild", "true");, so should any rebuild happen during that process? Maybe in a multi-stage Dockerfile setup this "manual" copy of a provider jar messes with the file timestamp too much and breaks the process?

System.setProperty("quarkus.launch.rebuild", "true");

To be clear, the image built fine for us; only at Keycloak startup did we get the message, which was raised as an error, so the server did not start at all.

@shawkins
Contributor

@dejwsz the checking of provider jar timestamps is purely on the Keycloak side of things #34665

@dejwsz
dejwsz commented Apr 15, 2025

> > it has to be a regression created in 26.2.0
>
> I wouldn't call this a regression as it's working as intended #34665
>
> It seems more like an issue with specific versions of Docker not maintaining file timestamps.

Interestingly, with 26.1.5 we did not have the issue at all; our Dockerfile was exactly the same except for the Keycloak base image version.

@shawkins
Contributor

> Interestingly, with 26.1.5 we did not have the issue at all; our Dockerfile was exactly the same except for the Keycloak base image version.

#34665 was added in 26.2.

@dejwsz
dejwsz commented Apr 15, 2025

So to summarize (if I understood it well):

  1. If a provider jar is built in a different stage and we copy it, we just need to modify its timestamp to fool the Keycloak / Quarkus build process, and there is no other way to go here?
  2. Build the provider jar and the Keycloak image in one common stage, and it should then work just fine?

@dejwsz
dejwsz commented Apr 15, 2025

> > it has to be a regression created in 26.2.0
>
> I wouldn't call this a regression as it's working as intended #34665
>
> It seems more like an issue with specific versions of Docker not maintaining file timestamps.

I guess you may be right. We made those builds on GitHub runners with Docker Engine 26.x, which has been reported to have this kind of problem. They plan to upgrade it soon (actions/runner-images#11766), which should solve this problem, I guess. We did not have the issue on our local system using one of the latest 28.x Docker Engine versions.

@bahaa
bahaa commented Apr 15, 2025

BTW, upgrading Docker Engine to 28.x didn't fix the issue. Here's the output of docker version on the machine that has the issue:

Client: Docker Engine - Community
Version:           28.0.4
API version:       1.48
Go version:        go1.23.7
Git commit:        b8034c0
Built:             Tue Mar 25 15:07:11 2025
OS/Arch:           linux/amd64
Context:           default
Server: Docker Engine - Community
Engine:
  Version:          28.0.4
  API version:      1.48 (minimum version 1.24)
  Go version:       go1.23.7
  Git commit:       6430e49
  Built:            Tue Mar 25 15:07:11 2025
  OS/Arch:          linux/amd64
  Experimental:     false
containerd:
  Version:          1.7.27
  GitCommit:        05044ec0a9a75232cad458027ca83437aae3f4da
runc:
  Version:          1.2.5
  GitCommit:        v1.2.5-0-g59923ef
docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

@dejwsz
dejwsz commented Apr 15, 2025

So my assumption about the Docker version was wrong in that case. Anyway, the solution we used should not break anything, so we will live with it for the moment.

@dejwsz
dejwsz commented Apr 16, 2025

To add more, we tried to build our provider jar and keycloak in the same stage and it also did not help, still the same error. It looks like a docker engine version also has nothing to do with it. So still a kind of puzzle. Anyway timestamp modification for the provider's jar helps.

@ibauersachs
Contributor

We also ran into this issue and are currently working around it by modifying the file timestamp.

In any case, IMO file timestamps shouldn't be used for this kind of verification. A hash of the jar would be much better suited, or since it's not security relevant here, even a CRC would be enough and more reliable.
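
A content-based comparison could look roughly like the following. This is only an illustration of the idea using the POSIX `cksum` CRC, not how Keycloak implements or plans to implement the check; `jar_fingerprint` and `unchanged_since_build` are hypothetical helper names.

```shell
# Print a content fingerprint ("<CRC> <size in bytes>") for a file.
# Unlike the mtime, it does not change when the file is copied or touched.
jar_fingerprint() {
  cksum < "$1"
}

# Succeed (exit status 0) when the fingerprint recorded at build time
# (first argument) still matches the file's current content.
unchanged_since_build() {
  [ "$1" = "$(jar_fingerprint "$2")" ]
}
```

A build step would record `jar_fingerprint` for each provider jar, and startup would call `unchanged_since_build` with the recorded value. The trade-off is that every jar has to be read in full at startup.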

@dejwsz
dejwsz commented Apr 16, 2025

> @dejwsz the checking of provider jar timestamps is purely on the Keycloak side of things #34665

So maybe relying on timestamps alone is not the right way to solve it, then.

@shawkins
Contributor

Looking into this more, there are some references indicating that a plain copy or add operation does indeed have counterintuitive effects on the mtime of files: it will be one value at build time, then in the image it will be stored as the image creation time.

There are a couple of suggestions for making things consistent. From https://docs.docker.com/build/ci/github-actions/reproducible-builds/ you can use SOURCE_DATE_EPOCH; it also appears that COPY --link preserves the original file attributes.

Seems like we do need to update our example.

> In any case, IMO file timestamps shouldn't be used for this kind of verification. A hash of the jar would be much better suited, or since it's not security relevant here, even a CRC would be enough and more reliable.

I am not sure what motivated the original decision to use timestamps. I believe that it was considered good enough for most circumstances without any performance implications. A CRC32 or other fast hash could prevent false positives from changing timestamps, but we'd need to vet the performance. I agree that there aren't security concerns here with a non-cryptographic hash, given that the primary issue would be having something sensitive in the provider jars.

However this situation does seem to be more of a nuance of Docker that can be addressed in other ways.

@fruwe
fruwe commented Apr 16, 2025

I'm experiencing the same issue using custom providers in a Docker-based Keycloak deployment.

When I run kc.sh show after a kc.sh build inside a Dockerfile, I get a config line like this:

kc.provider.file.sentry-8.8.0.jar.last-modified =  1744801494000 (Persisted)

However, when I later run kc.sh show-config inside a container started from that image (via docker compose), I see:

kc.provider.file.sentry-8.8.0.jar.last-modified =  1744801494737 (Persisted)

This results in a non-reproducible config and causes Keycloak to detect changes and re-trigger updates at runtime, even though no actual change occurred in the JAR.

After some investigation, it seems the root cause might be Docker’s image layering or file copy mechanism truncating or altering file timestamps slightly during docker build. The same file appears with a slightly different modification time inside the container at runtime.

Would it be possible for Keycloak to:

  • Normalize timestamps (e.g. truncate to seconds) during the build phase, or
  • Optionally skip persisting .last-modified metadata, especially if no changes are detected?

This would improve reproducibility in CI/CD pipelines and prevent unnecessary reconfiguration or rebuilds when using Docker.
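
The first suggestion amounts to comparing timestamps at whole-second granularity. A minimal sketch of that arithmetic, using the two values above; the function names are hypothetical and this is not Keycloak's actual implementation:

```shell
# Drop the millisecond part of an epoch-millisecond timestamp.
truncate_to_second() {
  echo $(( $1 / 1000 * 1000 ))
}

# Succeed (exit status 0) when both timestamps fall within the same second.
same_second() {
  [ "$(truncate_to_second "$1")" = "$(truncate_to_second "$2")" ]
}

if same_second 1744801494000 1744801494737; then
  echo "unchanged (same second)"
else
  echo "changed"
fi
# prints: unchanged (same second)
```

With this comparison the two values reported above would count as the same file state, while a jar that was genuinely replaced later would still be detected as long as its mtime differs by at least a second.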

@fruwe
fruwe commented Apr 16, 2025

PS: I used this as a workaround:

RUN find /opt/keycloak/providers -name '*.jar' -exec bash -c ' \
  for f; do \
  epoch_sec=$(stat -c %Y "$f"); \
  touch -d @"$epoch_sec" "$f"; \
  done' bash {} +

@MarcusDunn
Author

from a keycloak perspective, would it be possible to use a checksum of the jar instead of a timestamp? This seems to capture the intent of the check better, but comes with some overhead.

@shawkins
Contributor

> from a keycloak perspective, would it be possible to use a checksum of the jar instead of a timestamp? This seems to capture the intent of the check better, but comes with some overhead.

Beyond the workarounds that can be done on the docker side of things, I see a couple of options for Keycloak:

  1. Use a checksum instead. My concern there is that it isn't generally needed and adds a startup cost that isn't warranted here either.
  2. A system property or option to indicate that the provider directory is read-only to skip this check.
  3. Change the check from an error to a warning. The worst case is that the augmentation state is out of date, so people who put in new jars and miss this warning could initially be confused when they see their stuff either not used or failing with class loading errors.

@mabartos mabartos linked a pull request Apr 28, 2025 that will close this issue
shawkins added a commit to shawkins/keycloak that referenced this issue Apr 29, 2025
closes: keycloak#38893

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
vmuzikar pushed a commit that referenced this issue Apr 30, 2025
closes: #38893

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
vmuzikar pushed a commit that referenced this issue Apr 30, 2025
* fix: documenting known issues with docker

closes: #38801 #38893

* Update docs/guides/server/containers.adoc

---------

(cherry picked from commit 68096ee)

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
Signed-off-by: Steven Hawkins <shawkins@redhat.com>
Co-authored-by: Martin Bartoš <mabartos@redhat.com>
shawkins added a commit to shawkins/keycloak that referenced this issue Apr 30, 2025
…39340)

closes: keycloak#38893

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
(cherry picked from commit 0ff4cce)
ahus1 pushed a commit that referenced this issue Apr 30, 2025
closes: #38893

(cherry picked from commit 0ff4cce)

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
@pskopek pskopek added the kind/bug Categorizes a PR related to a bug label May 5, 2025
msvechla added a commit to msvechla/charts-1 that referenced this issue May 7, 2025
With the latest Keycloak version, the timestamps of providers is compared to the timestamp of the latest provider build.

When starting keycloak with the `--optimized` flag, keycloak will throw an error when the timestamp of the providers is more recent than it was during the build.

> A provider JAR was updated since the last build, please rebuild for this to be fully utilized.

As we do not preserve the timestamp in the init script, this error is always thrown when starting keycloak as optimized.

This is fixed by preserving the timestamp in the initContainer `cp` commands.

see: keycloak/keycloak#38893

Signed-off-by: msvechla <m.svechla@gmail.com>
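
The chart change itself is not shown here, but the general idea of preserving timestamps on copy can be sketched with POSIX `cp -p`. The `copy_provider` name and any paths passed to it are hypothetical:

```shell
# Copy a provider JAR while keeping its modification time.
# cp -p also keeps the file mode and, where permitted, ownership.
copy_provider() {
  cp -p -- "$1" "$2"
}
```

A plain `cp` gives the destination a fresh mtime (the time of the copy), which is newer than the value recorded at build time and therefore trips the check described above.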
shawkins added a commit to shawkins/keycloak that referenced this issue May 7, 2025
…39340)

closes: keycloak#38893

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
@brutaldev

We are still getting this with 26.2.4-0

The reason being that the build happens on one server and the provider files get copied to a mount which changes their timestamps. When Keycloak tries to start the timestamps will be different despite the provider files being exactly the same. We cannot "touch" the files at the mount point because it's in cloud storage without that functionality.

Is it possible to disable the flakey timestamp check?

@shawkins
Contributor
shawkins commented May 8, 2025

> Is it possible to disable the flakey timestamp check?

Please start a new issue. As you can see on this one we ultimately decided to document how to touch the files and relaxed the check to be tolerant to ms truncation.

msvechla added a commit to msvechla/charts-1 that referenced this issue May 9, 2025
With the latest Keycloak version, the timestamps of providers is compared to the timestamp of the latest provider build.

When starting keycloak with the `--optimized` flag, keycloak will throw an error when the timestamp of the providers is more recent than it was during the build.

> A provider JAR was updated since the last build, please rebuild for this to be fully utilized.

As we do not preserve the timestamp in the init script, this error is always thrown when starting keycloak as optimized.

This is fixed by preserving the timestamp in the initContainer `cp` commands.

see: keycloak/keycloak#38893

Signed-off-by: msvechla <m.svechla@gmail.com>
Signed-off-by: Marius Svechla <m.svechla@gmail.com>