V1.2 histrionicus by Mytherin · Pull Request #16191 · duckdb/duckdb · GitHub


V1.2 histrionicus #16191

Merged: 22 commits merged into main on Feb 12, 2025

Conversation

Mytherin
Collaborator

No description provided.

Joseph Hwang and others added 21 commits January 24, 2025 11:06
This PR fixes #16094

Previously this code used `global_columns`, the list of columns that the
Reader (in this case the Parquet reader) is aware of.
This list is influenced by the `schema` parameter.

`global_column_ids` comes from the `TableFunctionInitInput` instead, and
also contains artificial/generated columns like "filename"
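The distinction can be sketched with a small toy model (hypothetical names, not DuckDB's actual API): the global column ids can point past the reader's own column list, at generated columns such as "filename".

```python
# Toy sketch: projection must be driven by the global column ids, since
# they may reference generated columns the reader knows nothing about.

# Columns the Parquet reader itself is aware of (shaped by the `schema` parameter).
global_columns = ["a", "b"]
reader_values = {"a": 1, "b": 2}   # values produced by the reader

# Column ids handed to the table function via TableFunctionInitInput;
# id 2 refers to the artificial/generated "filename" column.
global_column_ids = [0, 1, 2]

def fetch(idx, filename="data/part-0.parquet"):
    """Resolve a global column id, falling back to generated columns."""
    if idx < len(global_columns):
        return reader_values[global_columns[idx]]
    return filename                # generated column: the file being scanned

row = [fetch(i) for i in global_column_ids]
print(row)
```

Using `global_columns` alone here would make id 2 unresolvable, which is the shape of the bug described above.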
ValidEnd was still using the old valid_start inter-chunk state variable
instead of reading the correct value from the already computed ValidBegin vector.
This would in turn generate incorrect bounds for RANGE FOLLOWING searches,
leading to erratic frame bounds.

ValidEnd was also incorrectly setting up the prev values to optimise the search,
instead of having the frame functions set them up for each chunk.

fixes: #16098
fixes: duckdblabs/duckdb-internal#4170
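The fix can be illustrated with a simplified sketch (hypothetical names, much simpler than DuckDB's actual window code): the end of a RANGE frame is searched starting from the ValidBegin value computed for the same row, never from a scalar left over from a previous chunk.

```python
# Toy sketch of per-row RANGE frame bounds over sorted values.

def valid_begin(values, row, bound):
    """First index whose value is within `bound` PRECEDING of values[row]."""
    return next(i for i, v in enumerate(values) if v >= values[row] - bound)

def valid_end(values, row, bound, begin):
    """One past the last index within `bound` FOLLOWING, searched from `begin`
    (the value just computed for this row, not stale inter-chunk state)."""
    end = begin
    while end < len(values) and values[end] <= values[row] + bound:
        end += 1
    return end

values = [1, 2, 4, 7, 11]
frames = []
for row in range(len(values)):
    b = valid_begin(values, row, bound=2)          # RANGE 2 PRECEDING
    e = valid_end(values, row, bound=2, begin=b)   # RANGE 2 FOLLOWING
    frames.append((b, e))
print(frames)
```

Seeding the search with a `valid_start` remembered from an earlier chunk could start past (or before) the true frame, producing the erratic bounds the commit describes.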
I was performing some storage fault injection tests, where storage IO
for one database repeatedly errors.

I was running into a case where I was injecting a failure on the
optimistic write fsync at
https://github.com/motherduckdb/duckdb/blob/dad112b203212a590cb764695abf911e93d6ceee/src/transaction/duck_transaction.cpp#L207

The problem is that if we then see a storage error on the WAL truncate at
the following line, the program will terminate, because a second
exception is thrown while the first is still being handled inside the
try/catch.

https://github.com/motherduckdb/duckdb/blob/dad112b203212a590cb764695abf911e93d6ceee/src/transaction/duck_transaction.cpp#L211

I was curious about your thoughts on refactoring the logic a bit to
perform the rollback outside of the original try/catch.

Some questions:
* Is it ok to propagate the first exception if rollback fails? 
* Will the db be invalidated if a RevertCommit fails?
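A minimal sketch of the proposed control flow (in Python rather than DuckDB's C++; all names here are hypothetical): remember the first storage error, perform the rollback outside the original try/except, and let the first error propagate even if the rollback itself fails.

```python
# Sketch: rollback moved outside the original try/except so a failing
# rollback cannot terminate the process mid-exception.

class StorageError(Exception):
    pass

def commit(write_wal, revert_commit):
    commit_error = None
    try:
        write_wal()              # WAL truncate / optimistic-write fsync
    except StorageError as exc:
        commit_error = exc       # remember the first error; don't roll back here
    if commit_error is not None:
        try:
            revert_commit()      # rollback outside the original try/except
        except StorageError:
            pass                 # open question from the PR: is swallowing
                                 # this and propagating the first error ok?
        raise commit_error

def failing_wal():
    raise StorageError("injected fsync failure")

def failing_rollback():
    raise StorageError("rollback also failed")

try:
    commit(failing_wal, failing_rollback)
except StorageError as exc:
    print(exc)                   # the *first* error is what propagates
```

In the C++ original the second throw escapes during handling of the first, which is what triggers termination; this restructuring avoids ever throwing while an exception is in flight.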
…nrecognized options (#15919)

The current implementation of AddExtensionOption immediately sets the
default value in the current config, which makes sense. However, if
the extension in question is loaded as part of the main database type,
and the value is already set by the caller, the default value
shouldn't override it. On top of that, the current value may be sitting
in `unrecognized_options`, where config options for unknown settings end
up. Now that the extension is loaded and the option is registered, it is
nice to pick the value up from there automatically as well.

PS: I don't know how to test this other than by hand with our extension.
Guidance welcome.
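The intended precedence can be sketched with a toy model (hypothetical names; DuckDB's real config is a C++ structure): a value parked in `unrecognized_options` wins over the default, an already-set value is never overridden, and the default applies only when nothing else is set.

```python
# Toy sketch of the described AddExtensionOption behavior.

def add_extension_option(config, unrecognized, name, default):
    if name in unrecognized:
        # The caller set this option before the extension was loaded:
        # adopt it and remove it from the unrecognized bucket.
        config.setdefault(name, unrecognized.pop(name))
    elif name not in config:
        config[name] = default   # default must not override a set value

config = {"already_set": 1}
unrecognized = {"ext_opt": 42}

add_extension_option(config, unrecognized, "already_set", 99)
add_extension_option(config, unrecognized, "ext_opt", 0)
add_extension_option(config, unrecognized, "fresh_opt", 7)
print(config)        # {'already_set': 1, 'ext_opt': 42, 'fresh_opt': 7}
print(unrecognized)  # {}
```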
…e catalog search path of the current binder
This got lost during the column writer unification. This fixes a
regression in the Parquet writer where, when writing large strings, the
page size would be greatly underestimated. The page-size estimate is
used upstream in the writer to limit pages to 100MB. Since the size was
underestimated, we would write much larger pages than that when writing
large strings, leading to potential memory issues. In addition, since
Parquet pages are limited to 2GB (on account of the page size being
stored as an `int32`), I think this could also lead to pages being
written that are too large to be correctly represented in the Parquet
spec in certain edge cases.
…e catalog search path of the current binder (#16181)

Fixes #16122
@Mytherin
Collaborator Author

This can be merged after #16194 is merged

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 11, 2025 23:07
Mytherin added a commit that referenced this pull request Feb 12, 2025
#16194)

#16161 added the ability for stats to be cast from `CastColumnReaders`.
While this works, we can no longer do bloom filter look-ups through
these casts (at least not without additional code to deal with the cast
at this layer).

Fixes the issue uncovered in #16191
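The shape of that fix can be sketched with a toy model (hypothetical names, far simpler than the Parquet reader): a bloom-filter probe works against the physical column values, so when a cast reader sits in between, the probe is skipped rather than applied through the cast.

```python
# Toy sketch: no bloom-filter pruning through a casting column reader.

class ColumnReader:
    def __init__(self, bloom=None):
        self.bloom = bloom   # a plain set standing in for a bloom filter

    def can_prune(self, value):
        # A bloom filter can prove absence: prune when the value is missing.
        return self.bloom is not None and value not in self.bloom

class CastColumnReader:
    """Wraps a child reader and casts its values to another type."""
    def __init__(self, child):
        self.child = child

    def can_prune(self, value):
        # The filter value is in the *cast* type; probing the child's
        # bloom filter with it could give wrong answers, so never prune.
        return False

raw = ColumnReader(bloom={1, 2, 3})
cast = CastColumnReader(raw)
print(raw.can_prune(99))    # True: this row group can be skipped
print(cast.can_prune(99))   # False: no bloom look-up through a cast
```

Conservatively returning False through the cast trades a pruning opportunity for correctness, matching the commit's "at least not without additional code" caveat.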
@Mytherin Mytherin marked this pull request as ready for review February 12, 2025 07:22
@Mytherin Mytherin merged commit aa49b41 into main Feb 12, 2025
21 checks passed
Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Feb 27, 2025
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025