8000 [Dev] Use an unordered_map instead of a vector for 'constant_map' in `MultiFileReaderData` by Tishj · Pull Request #16555 · duckdb/duckdb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[Dev] Use an unordered_map instead of a vector for 'constant_map' in MultiFileReaderData #16555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

Tishj
Copy link
Contributor
@Tishj Tishj commented Mar 7, 2025

I was working on the MultiFileReader and noticed that constant_map was iterated over looking for a specific column_id in a couple places, so I figured the code complexity would be reduced by changing this from vector< MultiFileConstantEntry> to an unordered_map<idx_t, Value>, removing MultiFileConstantEntry in the process.

I even think MultiFileFilterEntry can be removed entirely as well, using index == DConstants::INVALID_INDEX to signal that the filter is targeting the constant map, but I've left this out of this PR for now.

@Tishj Tishj requested a review from samansmink March 7, 2025 10:10
@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 7, 2025 11:31
// this is a constant vector, look for the constant
auto &constant = reader_data.constant_map[filter_entry.index].value;
D_ASSERT(reader_data.constant_map.count(scan_filter.filter_idx));
auto &constant = reader_data.constant_map[scan_filter.filter_idx];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place that could be negatively impacted by the change from vector to unordered_map I feel, as we previously directly indexed into an array, and now we do a hash and lookup in the unordered map

@Tishj Tishj marked this pull request as ready for review March 7, 2025 11:33
@Tishj
Copy link
Contributor Author
Tishj commented Mar 7, 2025

Follow up PR is introducing (as typedefs):

  • global_idx_t
  • local_idx_t
  • global_column_id_t
  • local_column_id_t

To differentiate between the various idx_t's used in the MultiFileReaderData

This branch is ready, just waiting on this branch to get merged

Next up is changing these mappings for BoundExpressions

@Tishj Tishj requested a review from Mytherin March 10, 2025 17:02
@Mytherin
Copy link
Collaborator
Mytherin commented Mar 11, 2025

Thanks for the PR!

I think these were specifically designed to avoid having to constantly do look-ups in an unordered_map - since these are done for every column in every chunk. Maybe that was a premature optimization but I don't really see that much of a benefit in this change.

Instead of changing this to be an unordered_map we can perhaps add some helper methods to make accessing these more simple?

This also seems to break the delta extension currently:

[D:\a\duckdb\duckdb\build\release\extension\delta\delta_extension.vcxproj]
2025-03-07T13:13:06.9895626Z D:\a\duckdb\duckdb\build\release\_deps\delta_extension_fc-src\src\functions\delta_scan\delta_multi_file_reader.cpp(322,14): error C2039: 'column_id': is not a member of 'std::pair<const duckdb::idx_t,duckdb::Value>' [D:\a\duckdb\duckdb\build\release\extension\delta\delta_extension.vcxproj]
2025-03-07T13:13:06.9900704Z C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\unordered_map(66): message : see declaration of 'std::pair<const duckdb::idx_t,duckdb::Value>' [D:\a\duckdb\duckdb\build\release\extension\delta\delta_extension.vcxproj]
2025-03-07T13:13:06.9909956Z D:\a\duckdb\duckdb\build\release\_deps\delta_extension_fc-src\src\functions\delta_scan\delta_multi_file_reader.cpp(350,29): error C2039: 'push_back': is not a member of 'std::unordered_map<duckdb::idx_t,duckdb::Value,std::hash<uint64_t>,std::equal_to<uintptr_t>,std::allocator<std::pair<const duckdb::idx_t,duckdb::Value>>>' [D:\a\duckdb\duckdb\build\release\extension\delta\delta_extension.vcxproj]

@Tishj
Copy link
Contributor Author
Tishj commented Mar 11, 2025

Instead of changing this to be an unordered_map we can perhaps add some helper methods to make accessing these more simple?

Sure, we can use std::find_if instead of doing this manual loop to check if it's constant, I'll revert the changes then

@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 11, 2025 11:26
@Tishj Tishj closed this Mar 11, 2025
@Tishj
Copy link
Contributor Author
Tishj commented Mar 11, 2025

Superseded by #16596

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0