[Nested] Optimize List Type in `list_value` by maiadegraaf · Pull Request #17063 · duckdb/duckdb · GitHub

[Nested] Optimize List Type in list_value #17063

Merged: 6 commits, Apr 12, 2025

@maiadegraaf (Contributor) commented Apr 10, 2025

In #12468, `list_value` was optimized for primitive types. This PR builds on that work and optimizes `list_value` for list types.

For example:

Large Tables
CREATE TABLE large_list_table AS SELECT [i, i, i] AS a, [i + 1, i + 1] AS b, [i + 2] AS c FROM range(100000000) tbl(i);

SELECT LIST_VALUE(a, b, c) FROM large_list_table;

| 1.2.2 | New |
|-------|-----|
| 28.55s | 8.68s |

Large Lists
CREATE TABLE large_list AS SELECT list(i) AS a FROM range(1000000) t(i);

SELECT list_value(a, a, a, a, a) FROM large_list;

| 1.2.2 | New |
|-------|-----|
| 0.487s | 0.0234s |

Nested Lists
CREATE TABLE nested_lists AS SELECT [[i], [i + 1]] AS a, [[i, i], [i + 1, i + 1]] as b FROM range(10000) t(i);

SELECT list_value(a, b, a, b, a, b, a, b, a, b, a, b, a, b) FROM nested_lists;

| 1.2.2 | New |
|-------|-----|
| 0.128s | 0.0075s |

While these results show improvements, the timings are still slower than desired. Profiling suggests that most of the time is spent in VectorOperations::Copy. Any feedback or suggestions on how to further improve performance would be greatly appreciated!

Some additional tests and benchmarks have also been included.

@Mytherin (Collaborator) left a comment

Thanks for the PR! LGTM - we can't really get around doing a copy here since we have to merge the data of multiple lists into one list.
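
To illustrate why the copy is hard to avoid, here is a minimal sketch in Python rather than DuckDB's C++. It assumes a simplified list layout in which each column stores one `(offset, length)` entry per row plus a single flat child buffer. The `ListColumn` class and `list_value_sketch` function are hypothetical names invented for this illustration and are not DuckDB APIs. Because every input column owns a separate child buffer while the result needs a single one, each row's elements have to be copied across, interleaved row by row.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ListColumn:
    """Hypothetical, simplified list column: one (offset, length) entry
    per row, all elements stored back to back in one flat child buffer."""
    entries: List[Tuple[int, int]] = field(default_factory=list)
    child: List[int] = field(default_factory=list)

    @classmethod
    def from_rows(cls, rows):
        col = cls()
        for row in rows:
            col.entries.append((len(col.child), len(row)))
            col.child.extend(row)
        return col

    def row(self, i):
        offset, length = self.entries[i]
        return self.child[offset:offset + length]


def list_value_sketch(*columns):
    """Combine N list columns into one list-of-lists column.

    Returns (outer_entries, inner): outer_entries has one (offset, length)
    entry per row pointing into inner.entries, and inner holds every inner
    list in a single shared child buffer.
    """
    row_count = len(columns[0].entries)
    outer_entries = []
    inner = ListColumn()
    for r in range(row_count):
        # Each output row is a list with one inner list per argument.
        outer_entries.append((len(inner.entries), len(columns)))
        for col in columns:
            offset, length = col.entries[r]
            inner.entries.append((len(inner.child), length))
            # The copy: elements move from the input column's own buffer
            # into the single buffer shared by the result.
            inner.child.extend(col.child[offset:offset + length])
    return outer_entries, inner


# Two rows shaped like the "Large Tables" benchmark: [i, i, i], [i + 1, i + 1], [i + 2]
a = ListColumn.from_rows([[0, 0, 0], [1, 1, 1]])
b = ListColumn.from_rows([[1, 1], [2, 2]])
c = ListColumn.from_rows([[2], [3]])
outer, inner = list_value_sketch(a, b, c)
```

In this sketch the elements of row 0 from `a`, `b`, and `c` end up adjacent in `inner.child`, followed by the elements of row 1. No input buffer already has that ordering, which is why the data is rewritten rather than referenced in place.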

@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 11, 2025 08:08
@maiadegraaf maiadegraaf marked this pull request as ready for review April 11, 2025 08:08
@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 11, 2025 11:05
@maiadegraaf maiadegraaf marked this pull request as ready for review April 11, 2025 11:05

@taniabogatsch (Contributor) left a comment

Hi, looks good! Just left two nits. :)

@maiadegraaf (Contributor, Author) commented

Thanks for your feedback! I've implemented your suggestions, so this should be good to go now :)

@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 11, 2025 14:56
@maiadegraaf maiadegraaf marked this pull request as ready for review April 11, 2025 14:59
@Mytherin Mytherin merged commit 94d529e into duckdb:main Apr 12, 2025
52 checks passed

@Mytherin (Collaborator) commented

Thanks!

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063)
Re-enable Avro on core (duckdb/duckdb#17072)
Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 19, 2025
[Nested] Optimize List Type in `list_value` (duckdb/duckdb#17063)
Re-enable Avro on core (duckdb/duckdb#17072)
Fix httpfs patches: avoid `git log` since might contain unsanitised `error` word (duckdb/duckdb#17075)
@maiadegraaf maiadegraaf deleted the list_value_optimize_nested branch May 28, 2025 07:33
3 participants