Support asynchronous logging #38094
Conversation
Unreported flaky test detected. If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.
org.keycloak.testsuite.authz.EntitlementAPITest#testTokenExpirationRenewalWhenIssuingTokens
Looks good. Would you also consider adding log-async and similar to be defaults across all the handlers?
@shawkins Thanks for the review!
You mean having async logging enabled by default for all handlers? If we include it in Keycloak 26.2, IMHO we should not enable it by default for now. We can discuss it more for the next releases. WDYT?
@shawkins Ready for another review once you have time.
Not on by default, just a way to set async behavior across all handlers similar to how we have both log-level and log-handler-level.
@shawkins Ahh, got it. Yes, IMO that might be a good enhancement, I'll include it there. Thanks!
Force-pushed from b8acfcc to 779c254
@keycloak/cloud-native Ready for a review. Thanks!
@mabartos That's nice work, thank you! :)
docs/guides/server/logging.adoc (Outdated)
These properties are available only when asynchronous logging is enabled for these specific log handlers.
==== Change overflow strategy
With Keycloak being a security product and log messages containing potentially crucial info, IMHO this should not be configurable – discarding messages should not be an option.
@vmuzikar Thanks for this consideration.
I do not think we should force it - I could imagine a situation where users/customers leverage a specific log handler (like syslog or file) as this kind of source of truth and treat the other log handlers (like console) as nice-to-have, preferring performance instead.
I'd vote to keep it here, as the default strategy is BLOCK anyway, which means that no log record is lost/discarded. And users/customers can always change the configuration to comply with their needs.
Is that ok for you? Or I could even add a small comment on that.
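The BLOCK vs. DISCARD trade-off being discussed can be sketched with a plain bounded queue. This is a hypothetical illustration using only the JDK's `BlockingQueue`, not Keycloak's or JBoss LogManager's actual implementation; the class and method names are invented for the example:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the two overflow strategies: BLOCK waits for free
// space (no record is ever lost), while DISCARD silently drops the record.
public class OverflowStrategyDemo {
    enum OverflowStrategy { BLOCK, DISCARD }

    static final int QUEUE_SIZE = 2;
    final BlockingQueue<String> queue = new ArrayBlockingQueue<>(QUEUE_SIZE);

    /** Returns true if the record was enqueued, false if it was discarded. */
    boolean enqueue(String record, OverflowStrategy strategy) throws InterruptedException {
        switch (strategy) {
            case BLOCK:
                queue.put(record);          // blocks the caller until space is free
                return true;
            case DISCARD:
                return queue.offer(record); // returns false when the queue is full
            default:
                throw new IllegalStateException();
        }
    }
}
```

With BLOCK (the default discussed above), a full queue slows producers down but never loses a record; with DISCARD, the third record into a full two-slot queue would simply vanish.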
If we just want to keep things opinionated, then only supporting BLOCK is fine - it's no worse than our current behavior.
Ok guys, if you prefer not to include it, we don't have to - at least it's one option less to manage and support :)
If users/customers want to do additional performance tuning, they are on their own with these strategies. Moreover, in normal cases they can simply increase the queue size if necessary (which is supported).
Ok, let's remove it.
Updated
@mabartos If I understand correctly, with async there's a dedicated thread for logging. Is that correct? While with sync, each worker thread that handles the incoming HTTP request can send stuff to the log. Does it mean that with sync there could be potentially more threads logging at the same time? That means logging doesn't scale with worker threads?
If I understand correctly, with async there's a dedicated thread for logging. Is that correct?
@vmuzikar Yes, there's a dedicated thread for every log handler that is enabled and has async turned on - so in our case it might be 3 more threads, one per log handler.
Does it mean that with sync there could be potentially more threads logging at the same time? That means logging doesn't scale with worker threads?
(AFAIK) Yes, in sync mode every worker thread handles the writing to the log handler on its own, so multiple threads access the same shared resource. Writing to the char streams (console, file, syslog, sockets, ...) is a blocking op that needs to take care of synchronization, thus using locks. When there are multiple worker threads and logging is frequent (like trace), there can be a big latency issue as these worker threads wait for the resource to be unlocked.
In that case, the async mode works better as it does not block these worker threads and keeps a queue for every log handler.
cc: @dmlloyd At least that's my understanding. Could you verify these facts above? Thanks!
@vmuzikar Do you want me to add more details to the docs? I'm not sure if it's worth including some more impl details to the docs, though.
Async logging might even mitigate/solve the issue with blocked threads when using virtual threads described in #37266.
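The mechanics described above - one bounded queue plus one dedicated writer thread per handler, with producers paying only the enqueue cost - can be sketched in a few lines of plain Java. This is a minimal hypothetical model for the discussion, not Keycloak's or JBoss LogManager's actual code; the class name and the `List`-backed sink standing in for console/file/syslog are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: an async handler keeps a bounded queue and a single
// dedicated writer thread, so worker threads only pay for the enqueue
// instead of the (locking, blocking) I/O write itself.
public class AsyncHandlerSketch {
    private final BlockingQueue<String> queue;
    private final List<String> sink = new ArrayList<>(); // stands in for console/file/syslog
    private final Thread writer;

    AsyncHandlerSketch(int queueLength) {
        this.queue = new ArrayBlockingQueue<>(queueLength);
        this.writer = new Thread(() -> {
            try {
                while (true) {
                    sink.add(queue.take()); // the only thread doing the "slow" write
                }
            } catch (InterruptedException e) {
                queue.drainTo(sink); // drain the remainder on shutdown
            }
        }, "async-log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Called by worker threads; blocks only when the queue is full (BLOCK strategy). */
    void publish(String record) throws InterruptedException {
        queue.put(record);
    }

    void shutdown() throws InterruptedException {
        writer.interrupt();
        writer.join();
    }

    List<String> written() { return sink; }
}
```

Note that this sketch only drains the queue on an orderly shutdown; on an unexpected exit (kill -9, OOM) whatever is still queued is lost, which is the documentation point raised later in this thread.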
@mabartos Thanks for the clarification.
As I'm thinking more and more about it, I wonder what particular use case we are trying to solve with async logging. With the blocking strategy enforced, we're basically creating a "buffer" for log messages (the queue). That handles peaks in the logging system - a short-term situation where many log messages need to be processed. But it doesn't solve a constantly high load on the logging system.
Now, when would we expect the peak to happen in Keycloak's case? Logging rush hours? :) A DoS attack? Still, the buffer/queue would help only short term.
Sorry for the pushback. :) I just want to be sure, before we add additional options, that there's actually a real use case that would significantly benefit from them.
Do you want me to add more details to the docs?
It would be good to document the risk of messages being lost on unexpected exit, and the potential situations where this could be useful, as I mentioned above in this comment.
As I'm thinking more and more about that, I wonder what particular use case are we trying to solve by the async logging.
The potential benefits:
- For anyone who is exhausting their worker threads, or wants to maintain a smaller pool, this can help (if logging contributes significantly enough).
- Lower-latency response times.
- Removes the risk of starvation from pinning virtual threads (what instigated this issue / PR).
Now when would we expect the peak to happen in Keycloak's case? Logging rush hours? :) DoS attack? Still, the buffer/queue would help only short term.
When blocking, that is correct - there is an eventual horizon problem. Ideally, that amount of flooding should only occur with non-production-like settings. Is there some kind of log or metric that indicates log queues are full / blocking? That would signal to users that their current logging settings are detrimental to performance - which I don't believe they would currently infer without profiling.
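The metric being asked for here - something that tells users their log queue is saturating - could in principle be as simple as counting how often a producer finds the queue full before it blocks. The following is a hypothetical sketch of that idea (invented names, not part of this PR or of any Keycloak API), suitable for later exposure as a gauge/counter:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: count queue-full events so saturation could later be
// surfaced as a metric. Under the BLOCK strategy, each such event corresponds
// to a producer (worker thread) that had to wait on the logging queue.
public class QueueSaturationMeter {
    private final BlockingQueue<String> queue;
    private final AtomicLong queueFullEvents = new AtomicLong();

    QueueSaturationMeter(int queueLength) {
        this.queue = new ArrayBlockingQueue<>(queueLength);
    }

    void enqueue(String record) throws InterruptedException {
        // Fast path: succeeds immediately while the queue has room.
        if (!queue.offer(record)) {
            // Slow path: queue is full, record the event, then block (BLOCK strategy).
            queueFullEvents.incrementAndGet();
            queue.put(record);
        }
    }

    String poll() { return queue.poll(); }

    long queueFullEvents() { return queueFullEvents.get(); }

    int remainingCapacity() { return queue.remainingCapacity(); }
}
```

A non-zero, growing `queueFullEvents` (or a persistently zero `remainingCapacity`) would be exactly the profiling-free signal mentioned above that logging settings are hurting throughput.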
@shawkins Thanks for more context.
It's a good idea that we could reflect the queue in the metrics. But that's a follow-up.
Ok, let's go ahead with the PR.
Force-pushed from 68e9aa4 to da384a8
docs/guides/server/logging.adoc (Outdated)
If the queue is already full, it blocks the main thread and waits for free space in the queue.
NOTE: Be aware that enabling asynchronous logging might bring some **additional memory overhead** due to the additional separate thread and the inner queue.
Just wanted to highlight here what was mentioned in the other thread: it would be good to mention the risk of log message loss in case of an unexpected shutdown.
Updated
Closes keycloak#38578
Closes keycloak#28851
Signed-off-by: Martin Bartoš <mabartos@redhat.com>
LGTM, nice work @mabartos
@mabartos LGTM, thank you. Nice job. :)
@vmuzikar @shawkins @Pepo48 Could you please check this? Thanks!