V1.2 histrionicus by Mytherin · Pull Request #16191 · duckdb/duckdb · GitHub


V1.2 histrionicus #16191

Merged: 22 commits merged into main on Feb 12, 2025

Conversation

Mytherin
Collaborator

No description provided.

Joseph Hwang and others added 21 commits January 24, 2025 11:06
This PR fixes #16094

Previously this code used `global_columns`, the list of columns that the
Reader (in this case the Parquet reader) is aware of.
This list is influenced by the `schema` parameter.

`global_column_ids` comes from the `TableFunctionInitInput` instead, and
also contains artificial/generated columns like "filename"
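The distinction can be sketched with a small toy model (hypothetical names, not DuckDB's actual API): the global column ids can point past the reader's own column list, at generated columns such as "filename".

```python
# Toy sketch: projection must be driven by the global column ids, since
# they may reference generated columns the reader knows nothing about.

# Columns the Parquet reader itself is aware of (shaped by the `schema` parameter).
global_columns = ["a", "b"]
reader_values = {"a": 1, "b": 2}   # values produced by the reader

# Column ids handed to the table function via TableFunctionInitInput;
# id 2 refers to the artificial/generated "filename" column.
global_column_ids = [0, 1, 2]

def fetch(idx, filename="data/part-0.parquet"):
    """Resolve a global column id, falling back to generated columns."""
    if idx < len(global_columns):
        return reader_values[global_columns[idx]]
    return filename                # generated column: the file being scanned

row = [fetch(i) for i in global_column_ids]
print(row)
```

Using `global_columns` alone here would make id 2 unresolvable, which is the shape of the bug described above.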
ValidEnd was still using the old valid_start inter-chunk state variable
instead of reading the correct value from the already computed ValidBegin vector.
This would in turn generate incorrect bounds for RANGE FOLLOWING searches,
leading to erratic frame bounds.

ValidEnd was also incorrectly setting up the prev values to optimise the search,
instead of having the frame functions set them up for each chunk.

fixes: #16098
fixes: duckdblabs/duckdb-internal#4170
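The fix can be illustrated with a simplified sketch (hypothetical names, much simpler than DuckDB's actual window code): the end of a RANGE frame is searched starting from the ValidBegin value computed for the same row, never from a scalar left over from a previous chunk.

```python
# Toy sketch of per-row RANGE frame bounds over sorted values.

def valid_begin(values, row, bound):
    """First index whose value is within `bound` PRECEDING of values[row]."""
    return next(i for i, v in enumerate(values) if v >= values[row] - bound)

def valid_end(values, row, bound, begin):
    """One past the last index within `bound` FOLLOWING, searched from `begin`
    (the value just computed for this row, not stale inter-chunk state)."""
    end = begin
    while end < len(values) and values[end] <= values[row] + bound:
        end += 1
    return end

values = [1, 2, 4, 7, 11]
frames = []
for row in range(len(values)):
    b = valid_begin(values, row, bound=2)          # RANGE 2 PRECEDING
    e = valid_end(values, row, bound=2, begin=b)   # RANGE 2 FOLLOWING
    frames.append((b, e))
print(frames)
```

Seeding the search with a `valid_start` remembered from an earlier chunk could start past (or before) the true frame, producing the erratic bounds the commit describes.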
I was performing some storage fault injection tests, where storage IO
for one database repeatedly errors.

I was running into a case where I was injecting a failure on the
optimistic write fsync at
https://github.com/motherduckdb/duckdb/blob/dad112b203212a590cb764695abf911e93d6ceee/src/transaction/duck_transaction.cpp#L207

The problem is that if we then see a storage error on the WAL truncate at
the following line, the program will terminate, because a second
exception is thrown while the first is still being handled inside the
try/catch.

https://github.com/motherduckdb/duckdb/blob/dad112b203212a590cb764695abf911e93d6ceee/src/transaction/duck_transaction.cpp#L211

I was curious about your thoughts on refactoring the logic a bit to
perform the rollback outside of the original try/catch.

Some questions:
* Is it ok to propagate the first exception if rollback fails? 
* Will the db be invalidated if a RevertCommit fails?
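A minimal sketch of the proposed control flow (in Python rather than DuckDB's C++; all names here are hypothetical): remember the first storage error, perform the rollback outside the original try/except, and let the first error propagate even if the rollback itself fails.

```python
# Sketch: rollback moved outside the original try/except so a failing
# rollback cannot terminate the process mid-exception.

class StorageError(Exception):
    pass

def commit(write_wal, revert_commit):
    commit_error = None
    try:
        write_wal()              # WAL truncate / optimistic-write fsync
    except StorageError as exc:
        commit_error = exc       # remember the first error; don't roll back here
    if commit_error is not None:
        try:
            revert_commit()      # rollback outside the original try/except
        except StorageError:
            pass                 # open question from the PR: is swallowing
                                 # this and propagating the first error ok?
        raise commit_error

def failing_wal():
    raise StorageError("injected fsync failure")

def failing_rollback():
    raise StorageError("rollback also failed")

try:
    commit(failing_wal, failing_rollback)
except StorageError as exc:
    print(exc)                   # the *first* error is what propagates
```

In the C++ original the second throw escapes during handling of the first, which is what triggers termination; this restructuring avoids ever throwing while an exception is in flight.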
…nrecognized options (#15919)

The current implementation of AddExtensionOption immediately sets the
default value in the current config, which makes sense. However, if
the extension in question is loaded as part of the main database type,
and the value is already set by the caller, the default value
shouldn't override it. On top of that, the current value may be sitting
in `unrecognized_options`, where config options for unknown settings end
up. Now that the extension is loaded and the option is registered, it is
nice to pick the value up from there automatically as well.

PS: I don't know how to test this other than by hand with our extension.
Guidance welcome.
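The intended precedence can be sketched with a toy model (hypothetical names; DuckDB's real config is a C++ structure): a value parked in `unrecognized_options` wins over the default, an already-set value is never overridden, and the default applies only when nothing else is set.

```python
# Toy sketch of the described AddExtensionOption behavior.

def add_extension_option(config, unrecognized, name, default):
    if name in unrecognized:
        # The caller set this option before the extension was loaded:
        # adopt it and remove it from the unrecognized bucket.
        config.setdefault(name, unrecognized.pop(name))
    elif name not in config:
        config[name] = default   # default must not override a set value

config = {"already_set": 1}
unrecognized = {"ext_opt": 42}

add_extension_option(config, unrecognized, "already_set", 99)
add_extension_option(config, unrecognized, "ext_opt", 0)
add_extension_option(config, unrecognized, "fresh_opt", 7)
print(config)        # {'already_set': 1, 'ext_opt': 42, 'fresh_opt': 7}
print(unrecognized)  # {}
```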
…e catalog search path of the current binder
This got lost during the column writer unification. This fixes a
regression in the Parquet writer where, when writing large strings, the
page size would be greatly underestimated. The page-size estimate is
used upstream in the writer to limit pages to 100MB. Since the size was
underestimated, we would write much larger pages than that when writing
large strings, leading to potential memory issues. In addition, since
Parquet pages are limited to 2GB (on account of the page size being
stored as an `int32`), I think this could also lead to pages being
written that are too large to be correctly represented in the Parquet
spec in certain edge cases.
…e catalog search path of the current binder (#16181)

Fixes #16122
@Mytherin
Collaborator Author

This can be merged after #16194 is merged

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 11, 2025 23:07
Mytherin added a commit that referenced this pull request Feb 12, 2025
#16194)

#16161 added the ability for stats to be cast from `CastColumnReaders`.
While this works, we can no longer do bloom filter look-ups through
these casts (at least not without additional code to deal with the cast
at this layer).

Fixes the issue uncovered in #16191
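The shape of that fix can be sketched with a toy model (hypothetical names, far simpler than the Parquet reader): a bloom-filter probe works against the physical column values, so when a cast reader sits in between, the probe is skipped rather than applied through the cast.

```python
# Toy sketch: no bloom-filter pruning through a casting column reader.

class ColumnReader:
    def __init__(self, bloom=None):
        self.bloom = bloom   # a plain set standing in for a bloom filter

    def can_prune(self, value):
        # A bloom filter can prove absence: prune when the value is missing.
        return self.bloom is not None and value not in self.bloom

class CastColumnReader:
    """Wraps a child reader and casts its values to another type."""
    def __init__(self, child):
        self.child = child

    def can_prune(self, value):
        # The filter value is in the *cast* type; probing the child's
        # bloom filter with it could give wrong answers, so never prune.
        return False

raw = ColumnReader(bloom={1, 2, 3})
cast = CastColumnReader(raw)
print(raw.can_prune(99))    # True: this row group can be skipped
print(cast.can_prune(99))   # False: no bloom look-up through a cast
```

Conservatively returning False through the cast trades a pruning opportunity for correctness, matching the commit's "at least not without additional code" caveat.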
@Mytherin Mytherin marked this pull request as ready for review February 12, 2025 07:22
@Mytherin Mytherin merged commit aa49b41 into main Feb 12, 2025
21 checks passed
Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Feb 27, 2025
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025