-
Notifications
You must be signed in to change notification settings - Fork 2.3k
FileHandle Logging #16758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
FileHandle Logging #16758
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
a22cfac
to
bcdaaf8
Compare
MacOS
reviewed
Mar 26, 2025
Thanks! |
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 18, 2025
FileHandle Logging (duckdb/duckdb#16758)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 19, 2025
FileHandle Logging (duckdb/duckdb#16758)
krlmlr
added a commit
to duckdb/duckdb-r
that referenced
this pull request
May 19, 2025
FileHandle Logging (duckdb/duckdb#16758)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR follows up on #15119 with:
My apologies for the slightly chaotic PR here. It took a few tries to understand what makes sense here and I still feel like we are not 100% there yet. However, this should be a decent step in the right direction.
Logging refactor: structured logging
To neatly support writing structured log messages, this PR makes a few conceptual overhauls to the logging.
First of all, we remove the log types as manually passed string. In the initial logging implementation, you would log like:
However these manually provided strings are not really practical and likely to result in utter chaos.
We change this now to:
The
DUCKDB_LOG
macro is defined asDUCKDB_LOG_INTERNAL(SOURCE, LOG_TYPE_CLASS::NAME, LOG_TYPE_CLASS::LEVEL, LOG_TYPE_CLASS::ConstructLogMessage(__VA_ARGS__))
Which will ensure that the logs types can only be created using the corresponding log message construction methods. The
LogType
class will then also contain the logic to deserialize the log message string into a predefined datatype for easy parsing.What this allows us to do is to easily enable a specific log type, and let DuckDB automatically deserialize and unnest the resulting data:
which yields:
Note that
duckdb_logs_parsed
is simply a table macro for:FileHandle logging
This PR adds logging infra for filehandles. Basically what I wanted is to create a unified way to log of the FileSystem APIs. I think this can be very useful, for example in testing/optimizing Iceberg/Delta workloads, but also to easily analyze IO patterns of our various readers like the parquet reader, json reader, or avro readers. Since the log messages will be in JSON format, they are easy to parse and do analysis on.
Heres a demo:
Implementation
While conceptually simple, the complexity lies in the fact that we want to generally log to the ClientContext to ensure the log entries have connection level context info. This is done by changing what was previously a
unique_ptr<Logger>
to ashared_ptr<Logger>
in the Client context and copying the pointer to the logger into the filehandles on creation. What this means is that a filehandle will log using the logging configuration of the client context that created it. because for now filehandles will not outlive the clientcontext that created it, this is fine, but in theory this can create some confusion if we were to share filehandles between connections or even between queries. I think we probably just want to ensure we don't keep filehandles open between queries.I've created some macros for standardization of the common filehandle operations that we want to log:
Then in the code, for example in the
LocalFileSystem::Read()
we can easily call the logger in an efficient way with the centrally defined log type and level for the FileHandle logging:DUCKDB_LOG_FILE_SYSTEM_READ(handle, bytes_to_read, location);
which will ensure that extension that implement their own filesystems will be able to easily adhere to the logging convention.
HTTP Logging
The old http logging mechanism would print http request information straight to stdout or to a file. In this PR we add a new structured logging type
HTTP
:TODOs
Some general follow ups:
A more complex follow up is to think about how to make this performant. Currently, enabling
PRAGMA enable_logging('FileSystem');
coulld become quite expensive be cause it needs to string compare the the logging type for every singleTRACE
level call. For now we actually get away with it because we only add the logger to the filehandle if that the time of filehandle creation we are logging forFileSystem
. However, as soon as we were to add more Trace level calls that are hit a lot during execution, things will likely slow down fast.