-
Notifications
You must be signed in to change notification settings - Fork 2.4k
[Dev] Use an unordered_map instead of a vector for 'constant_map' in MultiFileReaderData
#16555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Dev] Use an unordered_map instead of a vector for 'constant_map' in MultiFileReaderData
#16555
Conversation
extension/parquet/parquet_reader.cpp
Outdated
// this is a constant vector, look for the constant | ||
auto &constant = reader_data.constant_map[filter_entry.index].value; | ||
D_ASSERT(reader_data.constant_map.count(scan_filter.filter_idx)); | ||
auto &constant = reader_data.constant_map[scan_filter.filter_idx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place that could be negatively impacted by the change from vector to unordered_map I feel, as we previously directly indexed into an array, and now we do a hash and lookup in the unordered map
Follow up PR is introducing (as typedefs):
To differentiate between the various This branch is ready, just waiting on this branch to get merged Next up is changing these mappings for BoundExpressions |
Thanks for the PR! I think these were specifically designed to avoid having to constantly do look-ups in an unordered_map - since these are done for every column in every chunk. Maybe that was a premature optimization but I don't really see that much of a benefit in this change. Instead of changing this to be an This also seems to break the delta extension currently:
|
Sure, we can use |
Superseded by #16596 |
I was working on the MultiFileReader and noticed that
constant_map
was iterated over looking for a specific column_id in a couple places, so I figured the code complexity would be reduced by changing this fromvector< MultiFileConstantEntry>
to anunordered_map<idx_t, Value>
, removingMultiFileConstantEntry
in the process.I even think
MultiFileFilterEntry
can be removed entirely as well, usingindex == DConstants::INVALID_INDEX
to signal that the filter is targeting the constant map, but I've left this out of this PR for now.