Keycloak 26.2.0 UI Performance Degradation · Issue #39023 · keycloak/keycloak · GitHub

Keycloak 26.2.0 UI Performance Degradation #39023


Closed
1 of 2 tasks
VonNao opened this issue Apr 16, 2025 · 27 comments · Fixed by #39536
Assignees
Labels
area/admin/ui kind/bug Categorizes a PR related to a bug kind/regression priority/blocker Highest Priority. Has a deadline and it blocks other tasks release/26.0.12 release/26.2.4 release/26.3.0 team/sre
Milestone

Comments

@VonNao
VonNao commented Apr 16, 2025

Before reporting an issue

  • I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

Area

admin/ui

Describe the bug

We have deployed Keycloak via the Keycloak Operator on our cluster. After updating from Keycloak 26.1.2 to 26.2.0, Keycloak seems to be noticeably slower. UI operations as well as authentication are slower.

For our setup:

We have around 150 LDAP federations against our Active Directory.
We deployed it on k8s via the operator, with CloudNativePG (cnpg) as the PostgreSQL cluster. Metrics show that none of the systems is anywhere near full utilization.

Version

26.2.0

Regression

  • The issue is a regression

Expected behavior

Responsiveness as in 26.1.x

Actual behavior

Degradation in responsiveness of UI operations.

How to Reproduce?

Install Keycloak 26.2.0 and manage groups etc. via the web interface.

Anything else?

No response

@VonNao VonNao added kind/bug Categorizes a PR related to a bug status/triage labels Apr 16, 2025
@VonNao VonNao changed the title Keycloak 26.2.0 Keycloak 26.2.0 UI Performance degredation Apr 16, 2025
@VonNao VonNao changed the title Keycloak 26.2.0 UI Performance degredation Keycloak 26.2.0 UI Performance Degradation Apr 16, 2025
@keycloak-github-bot

Thanks for reporting this issue, but there is insufficient information or lack of steps to reproduce.

Please provide additional details, otherwise this issue will be automatically closed within 14 days.

@shawkins
Contributor

Can you provide a reproducer? If not, can you provide timings for specific UI screens / actions to highlight the level of degradation?

@ahus1
Contributor
ahus1 commented Apr 16, 2025

Could you try to enable tracing as described in https://www.keycloak.org/observability/tracing and provide a trace? If you are using Jaeger, you could either provide a screenshot, or export the trace as a JSON.

As an alternative, you could provide a thread dump of a Keycloak node under load, though that is usually less helpful.
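
For an operator-managed deployment, a minimal sketch of enabling tracing through the Keycloak CR could look like the following (option names as in the tracing guide linked above; the OTLP endpoint is a placeholder for your cluster, and depending on the Keycloak version the `opentelemetry` feature may also need to be enabled):

```yaml
# Sketch: enable OpenTelemetry tracing on an operator-managed Keycloak (endpoint is an example).
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak
spec:
  additionalOptions:
    - name: tracing-enabled
      value: "true"
    - name: tracing-endpoint
      value: http://jaeger-collector:4317
```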

@VonNao
Author
VonNao commented Apr 23, 2025

Sorry for the late response. I could narrow the "problem" down: the sluggish UI is only noticeable in our Keycloak Operator deployments. Single-node dev instances work as fast as ever.

The infrastructure around the k8s operator deployment did not change from 26.1 to 26.2. Is it possible that session affinity could be a factor?

Our ingress controller is Nginx-Ingress with the following annotations:

nginx.ingress.kubernetes.io/backend-protocol: "https"
cert-manager.io/cluster-issuer: "letsencrypt-production"
cert-manager.io/private-key-rotation-policy: Always
cert-manager.io/private-key-algorithm: "ECDSA"
cert-manager.io/private-key-size: "384"
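
If affinity does turn out to matter, a minimal sketch of enabling cookie-based sticky sessions on ingress-nginx would be annotations like these (annotation names from the ingress-nginx docs; the cookie name and max-age are placeholders):

```yaml
# Sketch: cookie-based session affinity for the Keycloak Ingress (values are examples).
nginx.ingress.kubernetes.io/affinity: "cookie"
nginx.ingress.kubernetes.io/session-cookie-name: "KC_ROUTE"
nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
```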

@VonNao
Author
VonNao commented Apr 23, 2025

Could you try to enable tracing as described in https://www.keycloak.org/observability/tracing and provide a trace? If you are using Jaeger, you could either provide a screenshot, or export the trace as a JSON.

As an alternative, you could provide a thread dump of a Keycloak node under load, though that is usually less helpful.

Sorry, forgot to answer. Right now, sadly, we have no Jaeger deployed. Our go-live is around 1 month from now, so we have no real load on the systems. They are mostly idle.

@ahus1
Contributor
ahus1 commented Apr 23, 2025

@VonNao - As we describe in our docs, Jaeger could be run as a Pod on Kubernetes like any other, similar to how you deploy Keycloak today. While in a production environment you would want all applications to send their traces to Jaeger, for a test environment it might be enough for just Keycloak to send its traces to this test instance of Jaeger.
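
For a test environment, a single all-in-one pod is usually enough; a rough sketch (image tag, names, and ports are assumptions, not sized for production) could be:

```yaml
# Sketch: single-pod Jaeger for a test cluster (not suitable for production).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:1.57
          env:
            - name: COLLECTOR_OTLP_ENABLED  # accept OTLP traces, e.g. from Keycloak
              value: "true"
          ports:
            - containerPort: 4317   # OTLP gRPC receiver
            - containerPort: 16686  # Jaeger UI
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-collector
spec:
  selector:
    app: jaeger
  ports:
    - name: otlp-grpc
      port: 4317
    - name: ui
      port: 16686
```

Keycloak would then send its traces to `http://jaeger-collector:4317` (adjust the name to your namespace).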

@ssilvert
Contributor
ssilvert commented May 2, 2025

@VonNao @ahus1 Can this one be closed or moved to a different team? Or do we think there is a UI bug?

@ssilvert
Contributor
ssilvert commented May 6, 2025

Closing due to lack of recent interest. We can reopen if needed.

@VonNao
Author
VonNao commented May 6, 2025

@ssilvert Sorry for the late response. Jaeger is up and running. Below are some screenshots from the traces, tested with normal user actions from the perspective of an administrator. Some events just take a really long time (4s+) to finish. As mentioned in another issue, when scaling down to one instance the performance is like 26.1 and earlier.

[Screenshots: Jaeger traces of admin UI actions]

This last screenshot is from a login test of a user:

[Screenshot: Jaeger trace of a user login]

I also added our external monitoring as a reference. We scaled down to 1 replica at around 10:00am; after that, latency went back to normal.

[Screenshot: external monitoring latency graph]

Since there is no problem with one instance, I would guess that it has something to do with the Infinispan cluster?

If you need more information hook me up.

@keycloak-github-bot keycloak-github-bot bot added this to the 26.2.0 milestone May 7, 2025
@ahus1 ahus1 marked this as a duplicate of #39304 May 7, 2025
@ahus1
Contributor
ahus1 commented May 7, 2025

Preliminary analysis of how this is caused:

  • KC Operator now enables a network policy by default
  • The kubernetes distributed cache stack in this version probes not only port 7800, but also ports 7801-7810 (see #39454)

This leads to the following symptoms:

  • JGroups message bundler will issue connects, and they will time out
  • The connect is blocking, and will delay any other requests in the queue
  • Due to that, you might see delays from 1-7 seconds.
  • This happens only when JGroups is reevaluating all members of the cluster, which seems to be every ~20 seconds

Possible remedies (to be verified):

  • Switch to a different bundler, so the bundler is not blocked (-Djgroups.bundler.type=per-destination)
    Once this change is in, we see connect exceptions that were probably swallowed before.
  • Instead of kubernetes (default for the Operator), use jdbc-ping as it won't probe the other ports. When using a Keycloak CR, this would be
    additionalOptions:
       - name: cache-stack
         value: jdbc-ping 
    

@VonNao
Author
VonNao commented May 7, 2025

Preliminary analysis of how this is caused:

* KC Operator now enables a network policy by default

* The kubernetes distributed cache stack in this version probes not only port 7800, but also ports 7801-7810 (see [JGroups errors when running a containerized Keycloak in Strict FIPS mode and with Istio #39454](https://github.com/keycloak/keycloak/issues/39454))

This leads to the following symptoms:

* JGroups message bundler will issue connects, and they will time out

* The connect is blocking, and will delay any other requests in the queue

* Due to that, you might see delays from 1-7 seconds.

* This happens only when JGroups is reevaluating all members of the cluster, which seems to be every ~20 seconds

Possible remedies (to be verified):

* Switch to a different bundler, so the bundler is not blocked (`-Djgroups.bundler.type=per-destination`)
  Once this change is in, we see connect exceptions that were probably swallowed before.

* Instead of `kubernetes` (default for the Operator), use `jdbc-ping` as it won't probe the other ports. When using a Keycloak CR, this would be
  ```
  additionalOptions:
     - name: cache-stack
       value: jdbc-ping 
  ```

Will try jdbc-ping in our test cluster.
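
For reference, applied to an operator-managed instance the full CR would look roughly like this (a minimal sketch; only the `additionalOptions` entry comes from the comment above, the remaining fields and values are placeholders for a typical Keycloak CR):

```yaml
# Sketch of a Keycloak CR with the jdbc-ping workaround (names/values are placeholders).
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak
spec:
  instances: 3
  hostname:
    hostname: keycloak.example.com
  db:
    vendor: postgres
    host: keycloak-db
    usernameSecret:
      name: keycloak-db-secret
      key: username
    passwordSecret:
      name: keycloak-db-secret
      key: password
  additionalOptions:
    - name: cache-stack
      value: jdbc-ping
```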

@wkloucek
wkloucek commented May 7, 2025
  • Instead of kubernetes (default for the Operator), use jdbc-ping as it won't probe the other ports. When using a Keycloak CR, this would be
    additionalOptions:
       - name: cache-stack
         value: jdbc-ping 
    

I can confirm the fix (workaround?). Thanks a lot for the pointer!

Before setting it, we saw gaps on the request timeline during load testing, because the requests were just hanging:

[Screenshot: request timeline with gaps during the load test]

Now we have a constant request rate driven by our load test and answered in time by Keycloak:

[Screenshot: constant request rate during the load test]

@VonNao
Author
VonNao commented May 7, 2025

Can also confirm, it worked perfectly. Is there any advantage to the kubernetes cache-stack vs jdbc-ping?

@ahus1
Contributor
ahus1 commented May 7, 2025

The kubernetes stack is more battle tested. See #39454 (comment) for a longer description.

A lot of people are using the kubernetes stack without many problems, so we are keeping it for now; eventually we might go for jdbc-ping, but not in a patch release.

pruivo added a commit to pruivo/keycloak that referenced this issue May 7, 2025
Fixes keycloak#39023

Fixes keycloak#39454

Signed-off-by: Pedro Ruivo <pruivo@redhat.com>
pruivo added a commit to pruivo/keycloak that referenced this issue May 7, 2025
Fixes keycloak#39023

Fixes keycloak#39454

Signed-off-by: Pedro Ruivo <pruivo@redhat.com>
pruivo added a commit to pruivo/keycloak that referenced this issue May 7, 2025
Fixes keycloak#39023

Fixes keycloak#39454

Signed-off-by: Pedro Ruivo <pruivo@redhat.com>
ahus1 pushed a commit that referenced this issue May 7, 2025
Fixes #39023

Fixes #39454

Signed-off-by: Pedro Ruivo <pruivo@redhat.com>
ahus1 pushed a commit that referenced this issue May 7, 2025
Fixes #39023

Fixes #39454

Signed-off-by: Pedro Ruivo <pruivo@redhat.com>
@ahus1
Contributor
ahus1 commented May 7, 2025

Added a follow-up issue for the per-destination bundler for 26.3: #39545
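
For anyone who wants to experiment with the per-destination bundler in the meantime: it is a JGroups system property, so one way to pass it (an assumption about your deployment, not a documented operator option) is the `JAVA_OPTS_APPEND` environment variable of the Keycloak container, for example:

```yaml
# Sketch: pass the JGroups bundler type as a JVM system property (placeholder container env snippet).
env:
  - name: JAVA_OPTS_APPEND
    value: "-Djgroups.bundler.type=per-destination"
```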

@ahus1
Contributor
ahus1 commented May 8, 2025

KC 26.2.4 was released today, and it includes the fix.

pruivo added a commit to pruivo/keycloak that referenced this issue May 9, 2025
Fixes keycloak#39023

Fixes keycloak#39454

Signed-off-by: Pedro Ruivo <pruivo@redhat.com>
ahus1 pushed a commit that referenced this issue May 9, 2025
Fixes #39023

Fixes #39454

Signed-off-by: Pedro Ruivo <pruivo@redhat.com>
@ahus1
Contributor
ahus1 commented May 12, 2025

The versions affected by this: ISPN 15.0.14.Final and ISPN 15.0.13.Final

Due to backports, 26.0.11 was affected as well.
