8000 Support asynchronous logging by mabartos · Pull Request #38094 · keycloak/keycloak · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Support asynchronous logging #38094

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 16, 2025
Merged

Support asynchronous logging #38094

merged 1 commit into from
Apr 16, 2025

Conversation

mabartos
Copy link
Contributor
@mabartos mabartos commented Mar 13, 2025

8000
@keycloak-github-bot
Copy link

Unreported flaky test detected

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.authz.EntitlementAPITest#testTokenExpirationRenewalWhenIssuingTokens

Keycloak CI - Base IT (3)

java.lang.AssertionError: expected:<301> but was:<300>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
...

Report flaky test

Copy link
@keycloak-github-bot keycloak-github-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreported flaky test detected, please review

@mabartos mabartos marked this pull request as ready for review April 8, 2025 15:11
@mabartos mabartos requested review from a team as code owners April 8, 2025 15:11
@mabartos mabartos requested review from shawkins, Pepo48 and vmuzikar April 8, 2025 15:12
Copy link
Contributor
@shawkins shawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Would you also consider adding log-async and similar to be defaults across all the handlers?

@mabartos
Copy link
Contributor Author
mabartos commented Apr 9, 2025

@shawkins Thanks for the review!

Would you also consider adding log-async and similar to be defaults across all the handlers?

You mean having the async logging enabled by default for all handlers? If we include it to Keycloak 26.2, IMHO, we might not do it by default for now. We can discuss more for the next releases. WDYT?

@mabartos
Copy link
Contributor Author
mabartos commented Apr 9, 2025

@shawkins Ready for another review once you have time.

@shawkins
Copy link
Contributor
shawkins commented Apr 9, 2025

@shawkins Thanks for the review!

Would you also consider adding log-async and similar to be defaults across all the handlers?

You mean having the async logging enabled by default for all handlers? If we include it to Keycloak 26.2, IMHO, we might not do it by default for now. We can discuss more for the next releases. WDYT?

Not on by default, just a way to set async behavior across all handlers similar to how we have both log-level and log-handler-level.

10000
@mabartos
Copy link
Contributor Author
mabartos commented Apr 9, 2025

Not on by default, just a way to set async behavior across all handlers similar to how we have both log-level and log-handler-level.

@shawkins Ahh, got it. Yes, IMO that might be a good enhancement, I'll include it there. Thanks!

@mabartos mabartos force-pushed the asyncLogging branch 2 times, most recently from b8acfcc to 779c254 Compare April 10, 2025 14:46
@mabartos
Copy link
Contributor Author

@keycloak/cloud-native Ready for a review. Thanks!

Copy link
Contributor
@vmuzikar vmuzikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mabartos That's a nice work, thank you! :)


These properties are available only when asynchronous logging is enabled for these specific log handlers.

==== Change overflow strategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Keycloak being a security product and log messages containing potentially crucial info, IMHO this should not be configurable – discarding messages should not be an option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vmuzikar Thanks for this consideration.

I do not think we should force it - I could imagine a situation that users/customers leverage a specific log handler for this kind of source of truth (like syslog and file) and have the other log handlers (like console) as nice-to-have, but rather preferring performance.

I'd vote to have it here as the default strategy is BLOCK anyway, which means that no log record is lost/discarded. And users/customers can always change the configuration to comply with their needs.

Is it ok for you? Or I can even add a small comment on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just want to keep things opinionated, then only supporting BLOCK is fine - it's no worse than our current behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok guys, if you prefer not to include it, we don't have to - at least one option less to manage and support :)

If users/customers want to do some additional performance improvements, they are on their own with these strategies. Moreover, they can basically increase the queue size if necessary (which is supported) in normal cases.

Ok, let's remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mabartos If I understand correctly, with async there's a dedicated thread for logging. Is that correct? While with sync, each worker thread that handles the incoming HTTP request can send stuff to the log. Does it mean that with sync there could be potentially more threads logging at the same time? That means logging doesn't scale with worker threads?

Copy link
Contributor Author
@mabartos mabartos Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, with async there's a dedicated thread for logging. Is that correct?

@vmuzikar Yes, there's a dedicated thread for every log handler (enabled + async) - so for our case, it might be 3 more threads - for every log handler.

Does it mean that with sync there could be potentially more threads logging at the same time? That means logging doesn't scale with worker threads?

(AFAIK) Yes, in sync mode, every worker thread handles the writing to the log handler on its own, so multiple threads access the same shared resource. In that case, writing to the char streams (as might be console, file, syslog, sockets, ...) as a blocking op needs to take care of synchronization, thus using locks. When there are multiple worker threads and the logging is frequent (like trace), there might be a big latency issue as these worker threads are waiting for the resource to be unlocked.

In that case, the async mode works better as it does not block these worker threads and keeps a queue for every log handler.

cc: @dmlloyd At least that's my understanding. Could you verify these facts above? Thanks!

@vmuzikar Do you want me to add more details to the docs? I'm not sure if it's worth including some more impl details to the docs, though.

Async logging might even mitigate/solve the issue with blocked threads when using virtual threads described in #37266.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mabartos Thanks for the clarification.

As I'm thinking more and more about that, I wonder what particular use case are we trying to solve by the async logging. With blocking strategy enforced, we're basically creating a "buffer" for log messages (the queue). That solves peaks for logging system – short term situation where many log messages need to be processed. But it doesn't solve the situation when there's constantly high load on the logging system.

Now when would we expect the peak to happen in Keycloak's case? Logging rush hours? :) DoS attack? Still, the buffer/queue would help only short term.

Sorry for the push back. :) Just, I want to be sure before we add additional options, there's actually a real use case that would significantly benefit from them.

Do you want me to add more details to the docs?

It would be good to document the risk of messages being lost on unexpected exit. And potential situations when this could be useful as I mentioned above in this comment.

Copy link
Contributor
@shawkins shawkins Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm thinking more and more about that, I wonder what particular use case are we trying to solve by the async logging.

The potential benefits:

  • For anyone who is exhausting their worker threads, or potentially want to maintain a smaller pool, this can help (if logging is contributing significantly enough).
  • lower latency response times
  • Removes the risk of starvation from pinning virtual threads (what instigated this issue / pr)

Now when would we expect the peak to happen in Keycloak's case? Logging rush hours? :) DoS attack? Still, the buffer/queue would help only short term.

When blocking that is correct - there is an eventual horizon problem. Ideally that amount of flooding should only occur with non-production like settings. Is some kind of log or metric that indicates log queues are full / blocking? That would signal to users that their current logging settings are detremental to performance - which I don't believe they would currently infer without profiling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawkins Thanks for more context.

It's a good idea that we could reflect the queue in the metrics. But that's a follow-up.

Ok, let's go ahead with the PR.


If the queue is already full, it blocks the main thread and waits for free space in the queue.

NOTE: Be aware that enabling asynchronous logging might bring some **additional memory overhead** due to the additional separate thread and the inner queue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to highlight here what was mentioned in the other thread that it'd be good to mention the risk of log messages loss in case of unexpected shutdown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Closes keycloak#38578

Closes keycloak#28851

Signed-off-by: Martin Bartoš <mabartos@redhat.com>
@mabartos
Copy link
Contributor Author

@shawkins @vmuzikar Ready for review. Thanks!

Copy link
Contributor
@shawkins shawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work @mabartos

Copy link
Contributor
@vmuzikar vmuzikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mabartos LGTM, thank you. Nice job. :)

@vmuzikar vmuzikar enabled auto-merge (squash) April 16, 2025 14:37
@vmuzikar vmuzikar merged commit 60fb7a5 into keycloak:main Apr 16, 2025
81 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Asynchronous logging Support Syslog async properties
4 participants
0