Parquet: Add dedicated Select method that can be used to push selection vectors into the read #16174

Merged: 1 commit, Feb 11, 2025

Mytherin
Collaborator

This effectively restores a previous optimization in which we skip reading elements that were already filtered out. For now we only enable this for strings, which benefit by far the most, since we can skip UTF-8 validation for any strings that we don't need to read.
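
To illustrate the idea, here is a minimal sketch, not DuckDB's actual reader code. It assumes values are stored as Parquet PLAIN-encoded byte arrays (a 4-byte little-endian length followed by the bytes) and that the selection vector is a sorted list of row indices. The names `SelectStrings`, `AppendString` and `IsValidUtf8` are hypothetical, and the UTF-8 check is deliberately simplified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified UTF-8 check (hypothetical helper): verifies lead and continuation
// byte structure only; it does not reject overlong encodings or surrogates.
bool IsValidUtf8(const char *data, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = static_cast<unsigned char>(data[i]);
        size_t extra;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;
        if (i + extra >= len) return false; // sequence runs past the end
        for (size_t k = 1; k <= extra; k++) {
            if ((static_cast<unsigned char>(data[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Test helper: appends one length-prefixed string to the buffer.
void AppendString(std::vector<uint8_t> &buf, const std::string &s) {
    uint32_t len = static_cast<uint32_t>(s.size());
    for (int shift = 0; shift < 32; shift += 8) {
        buf.push_back(static_cast<uint8_t>(len >> shift));
    }
    buf.insert(buf.end(), s.begin(), s.end());
}

// Reads up to `count` strings from `buf`, but validates and copies only the
// rows listed in `sel` (ascending row indices). Unselected rows are skipped
// by advancing the offset, so their bytes are never inspected.
std::vector<std::string> SelectStrings(const std::vector<uint8_t> &buf, size_t count,
                                       const std::vector<uint32_t> &sel) {
    assert(std::is_sorted(sel.begin(), sel.end()));
    std::vector<std::string> result;
    result.reserve(sel.size());
    size_t offset = 0;
    size_t sel_idx = 0;
    // Stop early once every selected row has been produced.
    for (size_t row = 0; row < count && sel_idx < sel.size(); row++) {
        if (buf.size() - offset < 4) throw std::runtime_error("truncated length prefix");
        uint32_t len = 0;
        for (int i = 0; i < 4; i++) {
            len |= static_cast<uint32_t>(buf[offset + i]) << (8 * i);
        }
        offset += 4;
        if (len > buf.size() - offset) throw std::runtime_error("truncated string data");
        if (sel[sel_idx] == row) {
            const char *str = reinterpret_cast<const char *>(buf.data() + offset);
            if (!IsValidUtf8(str, len)) throw std::runtime_error("invalid UTF-8");
            result.emplace_back(str, len);
            sel_idx++;
        }
        offset += len; // skip the payload whether or not it was selected
    }
    return result;
}
```

In this sketch a filtered-out row costs only a length read and an offset bump, which is where the saving for strings would come from. One consequence worth noting is that invalid UTF-8 in an unselected row goes undetected.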

For simple types like integers this optimization is less obviously useful, as we effectively replace a memcpy with a branchy lookup. I haven't run any benchmarks on this yet, but I suspect that its usefulness depends on selectivity, i.e. it might perform better when the selectivity is <10% (or some other threshold, to be determined). I will leave that for a future PR.
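
The trade-off for fixed-width types can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the PR: the names `CopyAll`, `CopySelected` and `ReadInts` are hypothetical, and the 10% cutoff is only the unbenchmarked guess mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical cutoff taken from the guess in the description; not measured.
constexpr double kSelectivityThreshold = 0.10;

// Bulk path: one memcpy over the whole run of values.
void CopyAll(const int32_t *src, size_t count, int32_t *out) {
    std::memcpy(out, src, count * sizeof(int32_t));
}

// Selective path: one indexed load and store per selected row.
void CopySelected(const int32_t *src, size_t count,
                  const std::vector<uint32_t> &sel, int32_t *out) {
    for (uint32_t row : sel) {
        assert(row < count);
        out[row] = src[row];
    }
}

// Fills `out` (same length as `src`) and returns true if the selective path
// was taken. Unselected slots are zeroed here only to keep the sketch
// deterministic; a real reader could leave them unwritten.
bool ReadInts(const std::vector<int32_t> &src, const std::vector<uint32_t> &sel,
              std::vector<int32_t> &out) {
    out.assign(src.size(), 0);
    if (src.empty()) return false;
    double selectivity = static_cast<double>(sel.size()) / src.size();
    if (selectivity < kSelectivityThreshold) {
        CopySelected(src.data(), src.size(), sel, out.data());
        return true;
    }
    CopyAll(src.data(), src.size(), out.data());
    return false;
}
```

Whether the selective path ever beats the bulk copy, and at what selectivity, would need to be measured.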

@Mytherin changed the title from "Add dedicated Select method that can be used to push selection vectors into the read" to "Parquet: Add dedicated Select method that can be used to push selection vectors into the read" on Feb 11, 2025
@Mytherin Mytherin merged commit 4c77e9c into duckdb:main Feb 11, 2025
47 checks passed
Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Feb 27, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
@Mytherin Mytherin deleted the parquetselect branch April 2, 2025 09:25
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)