This release of DuckDB is named "Ossivalis" after Bucephala ossivalis, an ancestor of the Goldeneye duck that lived millions of years ago.
Please also refer to the announcement blog post: https://duckdb.org/2025/05/21/announcing-duckdb-130
What's Changed
- V1.2 histrionicus by @Mytherin in #16070
- V1.2 histrionicus by @Mytherin in #16072
- unittests: clear test directory after every test by @Mytherin in #16053
- Benchmark runner: catch and log errors + add support for `retry load N` syntax by @Mytherin in #16054
- Throw an error when unsupported commands are used in concurrentloop by @Mytherin in #16009
- Remove extension definitions to prevent re-compilation of the entire system on commit by @Mytherin in #15955
- Display schema information of currently selected database only by @ashwaniYDV in #15815
- Issue #14366: Average Intervals by @hawkfish in #15864
- Internal #2176: Temporal AVG by @hawkfish in #15661
- discussions #15981: remove confusing comment in "duckdb/tools/shell/shell.cpp" by @komainu8 in #15984
- Fix #15466 Transform LIMIT or OFFSET first based on order specified in prepared statement by @ashwaniYDV in #15484
- Bitpacking mode info by @arjenpdevries in #15623
- Sniff Timestamp_TZ from CSV Files by @pdet in #15730
- [no-op] Add documentation for filesystem read behavior by @dentiny in #15937
- Accept "Auto" as date/timestamp format by @pdet in #15808
- Parquet Reader Cleanup: Move ColumnReaders to separate files by @Mytherin in #16092
- Parquet Reader: Move decoding logic into separate Decoder classes by @Mytherin in #16100
- BundleStaticLibs to be also triggered by InvokeCI by @carlopi in #16107
- Parquet Reader: Split DeltaLengthByteArray decoder from DeltaByteArray, and read the strings in a streaming manner by @Mytherin in #16105
- Parquet Dictionary reader: set NULL values as the last value in the dictionary by @Mytherin in #16106
- Parquet Reader: Share ResizeableBuffers across decoders, and unify Plain/PlainReference by @Mytherin in #16113
- Using GitHub ARM runners for Linux CLI builds by @hannes in #16119
- Parquet Reader: Implement dedicated Skip method by @Mytherin in #16117
- Use ColumnSegment::FilterSelection and SelectionVector for filtering in Parquet scans by @Mytherin in #16126
- [Dev] Fix output (long lines > 333 characters) getting truncated in shell by @Tishj in #16128
- Adaptive table filter: initialize filter order based on heuristics by @Mytherin in #16127
- Feature #16044: TimeZone Offset Seconds by @hawkfish in #16048
- ATTACH OR REPLACE database to allow swapping of new data. by @xevix in #15355
- [Dev] Remove `upsert_conflict_in_different_chunk.test` by @Tishj in #15980
- [Dev] Fix issue related to unpacked columns and the NOT operator by @Tishj in #15534
- [Julia] Add support for named params in prepared statements by @tqml in #15621
- Use Adaptive Filters in the Parquet reader by @Mytherin in #16133
- Parquet reader: push table filters directly into dictionaries by @Mytherin in #16136
- Parquet reader: Plain templates - make CHECKED a template parameter, and use memcpy/bulk skip when reading/skipping without defines by @Mytherin in #16141
- Parquet reader: only set invalid entry in the dictionary when the column has defines by @Mytherin in #16144
- Add uniq_ptr_cast for interpreted benchmark. by @Tmonster in #16151
- Hopefully fixing ci runs by @hannes in #16150
- Removed the last CI job that used the Ubuntu 18 setup by @hannes in #16155
- Parquet Reader: Split `CreateReader` into two separate stages, `ParseSchema` and `CreateReader` by @Mytherin in #16161
- Have CSV Parallel tests on CI again by @pdet in #16164
- [Python][Dev] Bump the minimum pybind11 version from `2.6` to `2.9` by @Tishj in #16159
- Add StackTraces to FatalExceptions by @NiclasHaderer in #16158
- Rework invoke by @carlopi in #16108
- Adds pre-optimization hooks for DuckDB by @NiclasHaderer in #16115
- Unify behavior of `range`/`generate_series` with PostgreSQL by @kryonix in #15935
- [CI] Prevent Linux CLI jobs from failing fast by @carlopi in #16173
- Parquet: Add dedicated Select method that can be used to push selection vectors into the read by @Mytherin in #16174
- Unvendor ICU by @m-kuhn in #16176
- Parquet reader: batch check if buffer is available in RLEBpDecoder by @Mytherin in #16185
- Parquet Reader: for DeltaLengthByteArray encoding, directly refer to strings from the block without copying by @Mytherin in #16186
- unified names for duckdb-extensions by @hmeriann in #16179
- Only delete test directory when `--test-temp-dir` is not specified by @Mytherin in #16192
- Fix #16163: COLUMNS should not treat identifiers as strings by @Mytherin in #16193
- Parquet reader: Avoid applying bloom filters if we are casting columns by @Mytherin in #16194
- Pretty print sniffer values by @pdet in #16182
- V1.2 histrionicus by @Mytherin in #16191
- Bump Julia by @Mytherin in #16199
- Ensure that dependent targets are present after find_package. by @BillyONeal in #16197
- Concurrency groups for R and Wasm by @hmeriann in #16201
- Parquet Writer Cleanup: Move ColumnWriters to separate files by @Mytherin in #16202
- [fix] Use bigobj when building with MSVC by @m-kuhn in #16200
- Improve performance of `UNNEST`/`UNPIVOT` by using selection vectors to unnest multiple lists at once by @Mytherin in #16210
- Add the `TRY` expression by @Tishj in #15939
- [Python][Dev] Replace the default connection when it's closed by @Tishj in #16160
- Use steady clock for profiler by @dentiny in #16198
- Add parallel memset when building hash join table by @hehezhou in #16172
- Avoid unnecessarily projecting the input columns of the UNPIVOT operator in the UNNEST by @Mytherin in #16221
- Left join push down optimization by @Damon07 in #15881
- Do In-Filter pushdown in PyArrow by @pdet in #16224
- Use _win32 with MSVC by @cfis in #16235
- Fix Python 3 executable name on Windows by @cfis in #16236
- Fix -std=c++11 by @cfis in #16237
- Issue #8265: AsOf Nested Loop by @hawkfish in #16218
- Include extension_util.hpp in libduckdb by @mlafeldt in #16255
- Generalize `rowid` into the concept of virtual columns, and make `filename` a virtual column in the Parquet/CSV/JSON readers by @Mytherin in #16248
- Modify histogram test to more fuzzily check boundaries since the test can be inconsistent on different platforms by @Mytherin in #16261
- [Dev] Fix issue in `TRY` expression with `dictionary_expression` Vector verification by @Tishj in #16262
- [Python Dev] Add the correct variant of the `-std=c++11` flag based on the compiler (MSVC or not) by @Tishj in #16267
- Fix extension install mode null by @samansmink in #16268
- A little cleanup. by @JasonPunyon in #16028
- Improve Parquet writer performance by @lnkuiper in #16243
- Merge v1.2-histrionicus into main by @Mytherin in #16284
- Many reclaim storage fixes by @taniabogatsch in #15825
- Arena allocator for `MinMaxN` and skip `NULL`s when creating enum by @lnkuiper in #16246
- Add pragma to truncate duckdb log storage by @samansmink in #16274
- Some more Parquet writer performance improvements by @lnkuiper in #16287
- Do duckdb_extract_statements to be able to execute pivot in ADBC by @pdet in #16162
- [Dev] Improve/Add handling of escapes in VARCHAR -> list/struct/map and align behavior by @Tishj in #15944
- make ValidityMask::RowIsValidUnsafe really unsafe by @xuke-hat in #16302
- Multi File Reader Rework: Add `MultiFileReaderFunction` that is used to wrap a single-file reader, and use it for the Parquet reader by @Mytherin in #16299
- [Python Dev] Add support for fully qualified references in `.table(...)` method by @Tishj in #16291
- [Dev] MultiFileReader - Add to the `column_indexes` for `file_row_number` by @Tishj in #16311
- Parquet reader performance by @lnkuiper in #16315
- Bump Julia FixedPointDecimals dependency version by @mbarbar in #16323
- Merge V1.2 histrionicus into main by @Mytherin in #16324
- Add new recursive semantics (`USING KEY`) by @cryoEncryp in #12430
- fix: add StringStats::SetMaxStringLength by @rustyconover in #16326
- Fix decorrelation of WITH USING KEY by @kryonix in #16330
- Issue #16250: Window Range Performance by @hawkfish in #16320
- Verify UTF-8 in `DeltaLengthByteArrayDecoder` and speed it up by @lnkuiper in #16328
- Add missing include by @Mytherin in #16342
- [chore] No ccache for OSX Python by @carlopi in #16348
- Linux CLI: override platform for ARM manylinux by @carlopi in #16347
- docs: tweak explanation of median for even cardinality inputs by @NickCrews in #13726
- [parquet] Fix implicit idx_t to int64_t conversion flagged by clang-tidy by @carlopi in #16368
- Improve error message on failure to install local extension by @carlopi in #16371
- MAIN_BRANCH_VERSIONING: main branch to get descriptors like v1.3.0-dev1234 instead of v1.2.1-dev1234 by @carlopi in #16366
- Parallel HT Zeroing: Set entries_per_task so that there are 4x more tasks than threads by @gropaul in #16301
- Internal #2176: SUMMARIZE Temporal Types by @hawkfish in #16095
- [SwiftRelease CI] fetch tags before checking there is already a tag with the same name by @hmeriann in #16376
- Push Select into ArrayColumnData to avoid scanning arrays that are not required for the query by @Mytherin in #16356
- Revert "Linux CLI: override platform for ARM manylinux" by @carlopi in #16374
- Rework CSV Reader: use the new MultiFileReaderFunction interface by @Mytherin in #16349
- Add connection and transaction identifiers by @samansmink in #16296
- Add parquet 'unknown' logical type by @hannes in #16378
- Internal #4287: INTERVAL Times DOUBLE by @hawkfish in #16386
- pb/compressed vector serialization by @peterboncz in #16066
- Fix issue #16377 by @kryonix in #16391
- Read support for Parquet Float16 by @hannes in #16395
- MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation by @carlopi in #16400
- Fuzzer Fix: Fix Avg for NULL cast to TIMESTAMP by @Tmonster in #16394
- [FriendlySQL] Expand functionality of the Unpacked COLUMNS expression by @Tishj in #16290
- Python Client: Faster Python Object Conversion by @Mytherin in #16431
- Fixup #16400 by correctly passing down SETUPTOOLS_SCM_PRETEND_VERSION by @carlopi in #16435
- Issue #16250: Window Range Performance by @hawkfish in #16438
- Merge v1.2-histrionicus into main by @Mytherin in #16439
- MAIN_BRANCH_VERSIONING: Add also prefix_version by @carlopi in #16441
- [no-op] Remove unused function `GetValueRefUnsafe` by @dentiny in #16440
- Filter Combiner Clean-up: move filter pushdown to separate functions, remove old commented out code by @Mytherin in #16443
- [Python] Add the SQLExpression method to the Expression API by @Tishj in #16424
- [Dev] Mention the problematic type in UNNEST BinderException by @Tishj in #16429
- Merge v1.2 into main again by @Mytherin in #16447
- Filter Combiner: Allow rowid pushdown for IN/OR filters and pushdown for temporal types by @Mytherin in #16450
- Parquet: always launch max threads if we are scanning multiple files by @Mytherin in #16457
- fix documents of C functions by @yiyuanliu in #16357
- Add a TableFilterState for execution of table filters by @Mytherin in #16461
- Mirror discussions to the internal repository by @szarnyasg in #16464
- Rework JSON Reader: use the new MultiFileReaderFunction interface by @Mytherin in #16477
- Speed-up contains by using `memchr` on every iteration by @Mytherin in #16484
- Fix error cases by @Y-- in #16494
- Prevent external joins (if possible) by @lnkuiper in #16430
- Merge v1.2 into main by @Mytherin in #16517
- Optimize FSST decoding by @lnkuiper in #16508
- Extract subsystem by name by @dentiny in #16226
- Avoid throwing an exception (that is then swallowed) when computing compressed materialization over stats that are not set by @Mytherin in #16532
- Checksum backward compatibility by @lnkuiper in #16505
- Prefetch Parquet page header by @lnkuiper in #16507
- Let GitHub render *.test files as SQL by @mlafeldt in #16534
- Fix ADBC to properly quote table and schema names by @CurtHagenlocher in #16526
- Pass `ClientContext` to catalog initialize, and postpone index binding when replaying the WAL by @Mytherin in #16536
- Allow UNITTEST_ROOT_DIRECTORY to be configured through CMake by @Mytherin in #16540
- Internal #4347: ISO Year Week by @hawkfish in #16567
- throw() -> noexcept in skiplist by @r-barnes in #16548
- Fix `test/sql/aggregate/aggregates/histogram_table_function.test` to pass the Linux CLI (arm64) CI by @hmeriann in #16538
- feat: move GRANT from reserved to unreserved keyword by @stephaniewang526 in #16546
- Python test runner: Avoid enabling profiling when executing restart command by @Flogex in #16547
- Add `duckdb_prepared_statements` by @Tishj in #16541
- [minor] Keep bit type sanity check consistent by @dentiny in #16575
- Support CREATE TABLE AS ... WITH NO DATA by @hannes in #16586
- Parquet FLOAT16 - fix cast by @hannes in #16580
- remove invalid tokens from nanosecond example by @hamilton in #16577
- CrossVersion.yml: Add v1.2.1, v1.2-histrionicus and main by @carlopi in #16576
- Fix #16524: DEPENDENT_JOIN may not flatten by @flashmouse in #16537
- [Julia] Add support for appending duckdb List types by @era127 in #16512
- [PySpark] - Add `expr` function by @mariotaddeucci in #16468
- regex_replace no longer swallows regex errors by @hannes in #16380
- Parquet Writer Clean-up: Split `CreateWriterRecursive` into two methods, and use `ParquetColumnData` for writer as well by @Mytherin in #16592
- Bump Julia to 1.2.1 by @Mytherin in #16593
- Improved appender error message by @NiclasHaderer in #16599
- Change static variables to be on the stack instead by @Y-- in #16597
- Add support for `RETURN_STATS` to `COPY` by @Mytherin in #16595
- Better error messages for the CSV Scanner by @pdet in #16585
- Support Enum types in read_csv - Python by @pdet in #15710
- Fix CI Tidy by @pdet in #16610
- Add some minor helper functions (`QueryResultIterator::IsNull` and casts to MultiFileList/Reader) by @Mytherin in #16611
- Add support for `ALTER TABLE tbl SET PARTITIONED BY (key1, key2, ...)` in the grammar by @Mytherin in #16612
- Issue template: direct UI issues to the UI repository by @szarnyasg in #16619
- [Dev] Make the various mappings in `MultiFileReaderData` typesafe by @Tishj in #16596
- Bump mbedtls to 3.6.2 and re-apply patches by @hannes in #16485
- Read and Write Complex Json from Arrow Types by @pdet in #16385
- Add Docker support for RISC-V CI with appropriate build commands by @mocusez in #16549
- Fix missing **kwargs in adbc_driver_duckdb.dbapi.connect() by @davlee1972 in #16637
- [Dev] Clean up and fix the CGroup memory/cpu limit discovery logic by @Tishj in #16608
- Expose `Value::ToSQLString()` in C API by @mt-caret in #16471
- Add the missing binding for json_serialize_sql by @liznear in #16666
- Do not create validity mask for non-null const vector by @xuke-hat in #16669
- Fix #16665: fix parquet multi_reader bloom_probe logic error by @flashmouse in #16677
- Add alias to catalog by @c-herrewijn in #16600
- Decouple physical operator ownership from operators by @taniabogatsch in #16545
- cmake: fix external icu by @autoantwort in #16676
- Character length and date functions by @hannes in #16653
- [Dev] Don't try to include `third_party/mbedtls/VERSION` with `package_build.py` by @Tishj in #16683
- Add `-ui` to CLI help text by @akx in #16626
- Fix alias of column reference lost in ReplaceProjectionBindings by @Damon07 in #16686
- Merge v1.2-histrionicus into main by @Mytherin in #16687
- Fix for GCC-4.8 by @Mytherin in #16690
- JSON Reader: make read_position atomic so this can be read by the progress bar while processing the JSON file by @Mytherin in #16692
- [Julia] support binding for vectors by @slwu89 in #16701
- Make CSV Parser strict_mode=True fail on a mix of new line delimiters. by @pdet in #15959
- [pypi] Fix cleanup logic for multiple branches by @hmeriann in #16634
- Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar by @Mytherin in #16714
- RETURN_STATS: remove footer_offset, and emit written partition keys by @Mytherin in #16715
- In case all rows of a CSV batch are errors, we continue processing by @pdet in #16713
- add workaround for patching httpfs ext by @samansmink in #16722
- Implement UUID v7 by @dentiny in #15819
- Fix roundtripping of stringified nested types by @Tishj in #16304
- Add Notify External Repositories Workflow by @maiadegraaf in #16730
- Expose a selection vector and the Slice method to the C API by @joseph-isaacs in #16696
- Add support for tracking `column_size_bytes` and `contains_nan` in RETURN_STATS by @Mytherin in #16731
- Add support for the `WRITE_EMPTY_FILE` option to `COPY`, which allows skipping the writing of empty files by @Mytherin in #16737
- Parquet Writer: Truncate string stats for large strings, instead of bailing on writing stats by @Mytherin in #16736
- RLE compression - memset alignment bytes to zero when aligning the counts by @Mytherin in #16735
- Write UUID stats to Parquet files and support reading uuid stats by @Mytherin in #16744
- Add an initial value to `list_reduce` by @maiadegraaf in #16602
- shell: make -bail work for more errors by @mlafeldt in #16594
- Call Notify External Repositories from Invoke CI by @maiadegraaf in #16747
- JSON bugfixes by @lnkuiper in #16729
- Add support for dynamically providing extra info post-execution in table functions, and use this to emit the total number of files read by the MultiFileReader by @Mytherin in #16749
- [Python Dev] Fix the versioning of the nightly python builds by @Tishj in #16739
- shell: fix sometimes-uninitialized error by @mlafeldt in #16761
- Issue #16250: Window Range Performance by @hawkfish in #16765
- Avoid building Python 3.7 wheels also for Linux by @carlopi in #16769
- Pyodide 0.27.2: conditionally skip tests by @carlopi in #16772
- Push catalog lookups through an extensible EntryLookupInfo struct by @Mytherin in #16764
- Fix two minor problems with NotifyExternalRepositories / odbc by @carlopi in #16776
- update expected results to reflect the changes introduced by the "Fix roundtripping of stringified nested types" PR by @hmeriann in #16775
- Merge V1.2 -> Main by @pdet in #16751
- Add support for time travel syntax in the `FROM` clause by @Mytherin in #16774
- Python docs: List all join types by @szarnyasg in #16789
- [chore] NotifyExternalRepositories.yml: Fix endpoint to be pinged by @carlopi in #16793
- Remove delta from extensions built on a nightly basis (vs main branch) by @carlopi in #16795
- OSX.yml & Windows.yml: remove repository_dispatch, already handled by InvokeCI by @carlopi in #16796
- Make extensions be linked privately into duckdb by @JAicewizard in #16726
- Add additional iterations to avoid assertion failure in `TemporaryMemoryManager` by @lnkuiper in #16801
- Change the STANDARD_MASK_SIZE calculation to use size of template type. by @sebastiaan-dev in #16807
- Fix nightly table sample error by @Tmonster in #16811
- Fix tidy by @pdet in #16805
- support 'categories' label in function catalog by @c-herrewijn in #15654
- regenerate function headers by @c-herrewijn in #16822
- Internal #4490: Window Jump Reset by @hawkfish in #16816
- Regression.yml: Actually checkout proper base.sha commit by @carlopi in #16824
- fix: drop useless python import by @yihong0618 in #16808
- NightlyTests.yml: Inline env variables into build command by @carlopi in #16817
- Benchmark runner summary by @hmeriann in #16759
- Add storage_version 66 for version 1.3.0 by @carlopi in #16800
- Revert "fix: drop useless python import" by @Mytherin in #16834
- [MultiFileReader] Rework `MultiFileReader::FinalizeChunk` to use Expressions by @Tishj in #16630
- Merge v1.2 into main by @Mytherin in #16832
- Fix NULL key handling in mark join by @xuke-hat in #16825
- compressed vector serialization fixes by @peterboncz in #16648
- really sorry about this by @peterboncz in #16840
- Fix Python docstrings for unique by @szarnyasg in #16845
- [MultiFileReader] Create "local" filters to hand to underlying readers by @Tishj in #16838
- Revert "Regression.yml: Actually checkout proper base.sha commit" by @Mytherin in #16860
- [ART] Immediately erase empty fixed-size buffers by @taniabogatsch in #16727
- Resolve defaults and column index map by pushing a Projection (instead of executing in the insert itself) by @Mytherin in #16867
- Fix issue with sorting dev versions in pypi_cleanup.py script to keep the most recent dev versions on PyPI by @hmeriann in #16873
- Allow filters to be pushed through joins if there are projection maps by @lnkuiper in #16871
- Expressions in create secret by @samansmink in #15801
- Python - Arrow IPC support in from_arrow by @pdet in #16821
- [ART] Introduce a new ARTScanner and make InitMerge and Vacuum iterative by @taniabogatsch in #16861
- Do not push down filters whose bindings only match the right side of the left join by @Damon07 in #16880
- MultiFileReader Rework (part 17): remove `MultiFileReaderData` and move as much as possible out of the file readers by @Mytherin in #16882
- ICU: Unify TimeZone accessing code by @Mytherin in #16887
- Rework ICU age computation to convert to a timestamp and use the regular interval age computation by @Mytherin in #16889
- Reduce allocations during aggregations by @lnkuiper in #16849
- CI: Prevent marking issues as 'stale' if they have the 'no stale' label by @szarnyasg in #16903
- Add field name to log line which fails Parquet spec by @jsbali in #16862
- Internal #4490: Window Threading Cleanup by @hawkfish in #16879
- Adding gzip version of shell for linux/osx install script by @hannes in #16116
- Fix USING KEY reference error by @kryonix in #16906
- [Nested] Enable Varargs in `LIST_CONCAT` by @maiadegraaf in #16870
- Fix several issues with vsize=2, and move vsize=2 tests to `Main.yml` by @Mytherin in #16918
- C API comments: Fix a/an typos by @szarnyasg in #16925
- Reduce locking with `FILE_SIZE_BYTES`/`ROW_GROUPS_PER_FILE` in Parquet writer by @lnkuiper in #16928
- [Python] Fix annotation of `condition` argument in `join` so it accepts `Expression` by @MarcoGorelli in #16933
- Fix GCC 4.8 and add it back to `Main` workflow by @Mytherin in #16937
- Merge v1.2 into main again by @Mytherin in #16939
- MultiFileReader - Perform nested remapping of field indexes instead of relying on casts by @Mytherin in #16941
- Internal #4552: Short Circuit CSE by @hawkfish in #16931
- Add back manylinux extensions by @carlopi in #16944
- Run CI on merge group by @Mytherin in #16945
- Internal #4516: Interval BIGINT Variants by @hawkfish in #16904
- Split query string for multi-statement queries by @Mytherin in #16955
- Vector Verification: Rework to run based on env variable `DUCKDB_DEBUG_VERIFY_VECTOR` and move to `Main.yml` by @Mytherin in #16957
- Move the no string inline/alternative verify workflow to Main.yml by @Mytherin in #16958
- [Python] Tighten type annotations on `shape` and `columns` by @MarcoGorelli in #16948
- Pass down CMAKE_POLICY_VERSION_MINIMUM and fix for local development by @carlopi in #16953
- [ART] Use the ARTScanner for VerifyAllocations (make it iterative) by @taniabogatsch in #16946
- Move ThreadSanitizer test from nightly test to Main, and fix locking issue by @Mytherin in #16960
- Re-enable workflows to run on PRs by @Mytherin in #16961
- Fix for selecting NaN values from Parquet files by @Mytherin in #16962
- Move LatestStorage tests to NightlyRelease - and fix issue with overflow string blocks not being cleaned up correctly by @Mytherin in #16972
- Arena-allocate physical operators by @taniabogatsch in #16911
- Make `file_row_number` a virtual column, and support per-file virtual columns in the MultiFileReader by @Mytherin in #16979
- Add a setting `scheduler_process_partial` that allows partial scheduling of tasks in the background threads by @Mytherin in #16973
- Clean up format script, gather all files then run concurrently instead of running concurrently per directory by @Mytherin in #16988
- Add support for altering struct columns (adding fields, dropping fields, renaming fields) by @Mytherin in #17003
- Fix CSV fuzzer tests by @pdet in #16994
- [Fix] Keep original expression for macro + lambdas with subqueries by @taniabogatsch in #17020
- Detect when tables have been dropped or altered, and prevent deletes in this scenario by @Mytherin in #17018
- Update links pointing to duckdb.org by @szarnyasg in #16999
- Fix for joining on floating columns #16901 by @nickzoic in #16965
- fix: remove unused stream struct member from ArrowScanLocalState by @rustyconover in #17023
- [Dev] Use `UnifiedVectorFormat` instead of a flattened `Vector` in `UpdateSegment::Update` by @Tishj in #16974
- Remove Arrow Extension from core extensions by @pdet in #17027
- Correctly propagate ClientContext to TaskExecutor by @ywelsch in #17026
- Issue #17001: AsOf memory Management by @hawkfish in #17028
- [MultiFileReader] Make it possible for the multi file reader to add a `DeleteFilter` to the `BaseFileReader` by @Tishj in #17032
- Add optional `OVERRIDE_NEW_DELETE` build parameter by @lnkuiper in #17035
- Clean-up virtual columns and make MultiFileReader::InitializeReader virtual by @Mytherin in #17038
- Allow a table to define its own row-id columns for delete/update, instead of assuming it is always COLUMN_IDENTIFIER_ROW_ID by @Mytherin in #17039
- Handle Parquet with compressed empty DataPage v2 by @EnricoMi in #17031
- Combine small row groups in Parquet writer by @lnkuiper in #17036
- Merge v1.2.2 into main by @carlopi in #17037
- implement function so I can send a patch to httpfs by @lnkuiper in #17048
- FORCE_ASYNC_SINK_SOURCE: pass also to unittester by @carlopi in #17053
- If a Max Line Size Error happens on all CSV dialect candidates, throw a max line size error. by @pdet in #16935
- Expose BindExtraColumns as a public function by @Mytherin in #17060
- trigger .github/workflows/NightlyBuildsCheck.yml from external repo by @hmeriann in #16949
- Minor parquet crypto clean-up: allow footer key to be passed in directly, and avoid constantly re-reading the key from the config by @Mytherin in #17070
- update julia to v1.2.2 by @Maxxen in #17074
- MultiFileReader Rework (part 18): Replace file path with `OpenFileInfo` struct by @Mytherin in #17071
- Fix httpfs patches: avoid `git log` since it might contain the unsanitised word `error` by @carlopi in #17075
- Re-enable Avro on core by @Tishj in #17072
- [Nested] Optimize List Type in `list_value` by @maiadegraaf in #17063
- Grow string dictionary dynamically in Parquet writer by @lnkuiper in #17061
- Add extended file info to OpenFileInfo, and use this to pass encryption keys and footer size to Parquet reader by @Mytherin in #17085
- [Dev] Automatically re-execute when calling `__arrow_c_stream__` on an already-consumed result by @Tishj in #17087
- fsst: Avoid propagating alignment information in FSST_UNALIGNED_STORE by @carlopi in #17094
- Fix sqlite3 api wrapper link + remove R-CMD-check + add more nightly tests by @carlopi in #17095
- support large dictionary value and constant vector creation in the C API by @joseph-isaacs in #17064
- Add missing lock to UpdateSegment::FetchRow, and cleanup API to require the lock by @Mytherin in #17100
- Valgrind requires tpch by @carlopi in #17101
- Switch to manylinux_2_28 by @hannes in #16956
- Changing mbedtls encryption API by @ccfelius in #16196
- Pull OpenFileExtended through the opener and virtual file system layers by @Mytherin in #17102
- Fix an issue in upserts where the local append state was not correctly flushed by @Mytherin in #17109
- Always parallelize `read_json` schema detection by @lnkuiper in #17106
- Move transaction cleanup outside of the transaction lock by @taniabogatsch in #17034
- Remove R_CMD_CHECK.yml, now handled by duckdb/duckdb-r repo by @carlopi in #17127
- JSON Bugfixes by @lnkuiper in #17119
- Refactor relassert runs, adding some variations in compiler / statically linked extensions by @carlopi in #17104
- extension-upload-from-nightly.sh: Add --region by @carlopi in #17120
- MultiFileReader: several fixes for virtual column handling and make virtual column handling extensible by @Mytherin in #17123
- Remove misleading lock comment in data table by @taniabogatsch in #17125
- [Dev] Add "registries" to `vcpkg.json`, add script to list the packages of the registry by @Tishj in #17124
- External File Cache by @lnkuiper in #16463
- Notify nightly build status by @hmeriann in #17108
- Strict UUID cast by @lnkuiper in #17138
- Copy To File: avoid calling Combine for threads that have not written any rows by @Mytherin in #17142
- Add file_index virtual column to the multi file reader that returns the file index of the read file by @Mytherin in #17144
- MultiFileReader: simplify constant handling, and allow virtual columns returned by the multi file reader to be constant by @Mytherin in #17149
- Changes to encodings to make them more flexible to replacement maps. by @pdet in #17146
- Optimize large Top N queries by @lnkuiper in #17141
- Only trigger TopN rewrite for limits that are relatively small compared to the table size by @Tmonster in #17140
- platform.hpp: Propagate DUCKDB_EXPLICIT_PLATFORM, avoid early return by @carlopi in #17137
- Keeping the filters which do not remove NULL values by @Damon07 in #17045
- Improve `FileSync` call on unix platform by @dentiny in #16893
- README: Fix to building link by @szarnyasg in #17161
- [InvokeCI] Add missing pipe to run instruction by @hmeriann in #17163
- Internal #4667: 2025b TimeZone Data by @hawkfish in #17160
- Unify function list by @c-herrewijn in #17168
- [Dev] Generate the `EXTENSION_SECRET_TYPES` instead of hardcoding them by @Tishj in #17183
- Fix grouping feature with interval type by @handstuyennn in #17181
- Add filename to GZIP stream error by @marcoslot in #17166
- Issue #17115: TimeTZ Approximate Quantile by @hawkfish in #17162
- Issue #17046: AsOf Left Predicates by @hawkfish in #17159
- [Fix] Pass delete indexes when committing updates by @taniabogatsch in #17176
- Python.yml: Add back logic to perform fast-fail on Python 3.10 by @carlopi in #17107
- Notify JDBC repo to run Vendor.yml workflow by @staticlibs in #17099
- Issue #17049: ICU Date Cast by @hawkfish in #17067
- Add bind_operator callback to TableFunction - allowing table functions to directly emit a LogicalOperator by @Mytherin in #17196
- [ENCRYPTION] Make block header size adaptive by @ccfelius in #17118
- Issue #16839: Disable TIMESTAMP Casts by @hawkfish in #16899
- Add support for an explicit PRESERVE_ORDER flag for copy to file by @Mytherin in #17199
- Add `SYSTEM_PEAK_BUFFER_MANAGER_MEMORY` and `SYSTEM_PEAK_TEMP_DIRECTORY_SIZE` to profiler by @lnkuiper in #17164
- Fix [InvokeCI / NotifyExternalRepository] Unexpected value 'true' by @hmeriann in #17212
- Add support for the cast_to_type function, that allows generating a cast from an expression to the type of another column by @Mytherin in #17209
- Better cardinality estimates for inequality joins/grouped aggregations by @lnkuiper in #17139
- Add `ExternalFileCache` validation as option for `ExtendedOpenFileInfo` by @lnkuiper in #17205
- Explicitly flush the thread-local optimistic writer in `PhysicalBatchInsert` when finalizing by @Mytherin in #17214
- Pushdown arbitrary expressions into scans by @Mytherin in #17213
- Fix #17170: sort selection result in OR expression by @flashmouse in #17180
- [Dev] Re-enable Iceberg, Bump Avro, fix `generate_extension_functions.py` for dependencies between extensions by @Tishj in #17204
- Change Invalid Unicode Error to Invalid Encoding by @pdet in #17208
- Direct IO for temp files by @lnkuiper in #17219
- Fix [InvokeCI / NotifyExternalRepository] GitHub Actions has encountered an internal error when running your job. by @hmeriann in #17218
- Add "thousands" option to CSV Reader by @pdet in #17220
- add capi functions to create map and union values by @jraymakers in #17227
- Only notify JDBC when all runs are successful by @staticlibs in #17233
- Update Friendlier SQL link.md by @hfrifkin in #17248
- Implement reading concatenated GZIP members by @lnkuiper in #17255
- Return invalid `BufferHandle` upon loading a destroyed `BlockHandle` by @lnkuiper in #17249
- Internal #4772: Timestamp Error Parameter by @hawkfish in #17283
- BUGFIX: do not perform unused columns optimization in presence of multiple grouping sets by @Tmonster in #17259
- Internal #4532: 13 Month Intervals by @hawkfish in #17303
- Don't try to load extension if storage type is already registered by @Maxxen in #17241
- Adapt size of hash table during aggregation using HyperLogLog by @lnkuiper in #17236
- Switch to always using list identifier instead of array by @J-Meyers in #17242
- Add root's query_location also to TransformInterval by @carlopi in #17271
- Histogram table function test by @hmeriann in #17276
- Guess Parquet footer size by @lnkuiper in #17300
- Issue #16563: FLOAT to DECIMAL by @hawkfish in #17302
- Feature #15873: Windowed ORDER BYs by @hawkfish in #17304
- Switch from Bottom-Up to Top-Down Decorrelation Strategy by @kryonix in #17294
- Generating random data for mbedtls without key by @ccfelius in #17309
- Fix CI by @Mytherin in #17319
- [Arrow] Implement support for consuming and producing Decimal 32 and 64. by @pdet in #17314
- take the column ids from the logical get, don't require a LogicalGet … by @Tishj in #17315
- Allow installing extensions with external access allowlist by @samansmink in #17316
- Implement ARTMerger replacing the recursive ART merge algorithm by @taniabogatsch in #17243
- Share null mask with constant null arg vector by @iceTTTT in #17234
- Fix #17311: correctly check for presence of recursive keys in transformer by @Mytherin in #17320
- [CSV Reader] Simplify Quote/Escape detection code, make it more robust and decouple comment and skip_rows option. by @pdet in #17284
- Fix `try_cast` from NaN `double` to `decimal` by @lnkuiper in #17322
- Add serialization for new TableColumn type by @Mytherin in #17321
- Extract expressions from nested conjunction AND for index scan by @lnkuiper in #17297
- Support late materialization in the Parquet reader, and handle `COUNT(*)` directly in the multi file reader by @Mytherin in #17325
- Implement ARTOperator replacing Lookup and the recursive Insert by @taniabogatsch in #17327
- Internal #4723: Inequality Condition Pushdown by @hawkfish in #17317
- Properly format strings when throwing JSON errors by @lnkuiper in #17331
- Fix potential vulnerable cloned function by @npt-1707 in #17340
- Fix potential vulnerable cloned function by @npt-1707 in #17339
- Revert "Skip MinGW, currently failing on main" by @carlopi in #17342
- Unify Parquet Metadata cache invalidation logic with Cached File System cache invalidation by @Mytherin in #17334
- Fix issue with empty ranges by @kryonix in #17332
- Internal #4797: Timestamp Range Cardinality by @hawkfish in #17330
- Some nitpicking fixes by @szarnyasg in #17337
- Issue #17299: Integer Rounding by @hawkfish in #17328
- Parquet Reader: emit partition stats for any files that have cached metadata, and implement `ListFilesExtended` that adds extra info to files globbed by @Mytherin in #17344
- Add support for UUID v7 to Filename Pattern - and clean it up so that it correctly supports composite patterns by @Mytherin in #17345
- Add support for the HIVE_FILE_PATTERN option - that allows partitioned files to be written without writing them to a hive-style directory structure by @Mytherin in #17346
- Add an OnDetach callback to the catalog that is triggered when the user detaches a catalog by @Mytherin in #17347
- Pass commit ID to NotifyExternalRepositories.yml by @staticlibs in #17333
- Add support for BENCHMARK_ROOT_DIRECTORY cmake option to change benchmark runner root directory, and add support for cache_file and reload options to enable better caching for non-DuckDB databases by @Mytherin in #17355
- Support --directories option in format.py by @Mytherin in #17354
- Handle both ENCRYPTION_KEY and STORAGE_VERSION passed as options by @carlopi in #17357
- Fix internal exception from assigning invalid index to `optional_idx query_id;` by @Tishj in #17359
- Fixup amalgamation: reqlen is only used with assert enabled by @carlopi in #17361
- md5_number: return UHUGEINT by @szarnyasg in #17336
- Skip emitting partition stats if "has_deletes" is set in the file info by @Mytherin in #17365
- Benchmark runner: add `argument`, `include` and `load_only` options - and make ClickBench run the original benchmark instead of a subset by @Mytherin in #17367
- Fix two off-by-one errors in row estimate of range and generate_series by @JelteF in #17373
- [Nested] Fix: 16489 - Find `NULL`s in lists using `list_position` by @maiadegraaf in #17080
- fix #17258: Allow to open database in readonly mode within cli by @jjballano in #17375
- Join Hash Table Probing Optimization: Optional Probing Selection Vector by @gropaul in #17062
- Remove bundled TPCH & TPCDS in Python wheels by @carlopi in #15923
- [Compression] Introduce `DICT_FSST` compression method by @Tishj in #15637
- Deprecate lambda arrow (->) and replace it with LAMBDA x : x + 1 by @taniabogatsch in #17235
- fix not setting nested validity when map_extract returns null by @Maxxen in #17379
- Function chaining: report missing column instead of missing function if function exists by @Mytherin in #17383
- Improve error messages in UPDATE ... SET by @Mytherin in #17384
- Add candidates suggestion when COLUMNS regex does not match any columns by @Mytherin in #17385
- add step to clean up the disc space to fix `No space left on device` by @hmeriann in #17390
- Fix issue in string -> hugeint conversion with decimals and exponents by @Mytherin in #17388
- Improve error message reporting for cast failures by @Mytherin in #17382
- Fix Python CI: pin virtualenv to previous version by @Mytherin in #17386
- Improve error reporting for missing qualified columns by @Mytherin in #17397
- Issue #17266: Lead Lag Nulls by @hawkfish in #17391
- Fix #17266: the result of lag/lead when the offset is null by @ditdb in #17268
- VirtualFileSystem to take an input, allowing to customize behaviour by @carlopi in #17393
- [Dev] Add `QualifiedName::ParseComponents`, add input to the error messages by @Tishj in #17403
- Provide suggestions and a link to the documentation for OOM errors by @Mytherin in #17402
- [Dev] Flatten any deeper children vectors, when the top level is a FLAT vector by @Tishj in #17387
- Minor fixes for the CLI by @Mytherin in #17405
- Add support for CREATE OR REPLACE TYPE, CREATE TYPE IF NOT EXISTS and CREATE TEMPORARY TYPE by @Mytherin in #17404
- Use an insertion order preserving map in Value::MAP by @taniabogatsch in #17389
- Implement `json_each`/`json_tree` by @lnkuiper in #17406
- Fix #16552: adjust join condition sequence by @flashmouse in #16943
- WAL replay index fixes by @taniabogatsch in #17409
- ZSTD: use a high penalty when min size is exceeded instead of disabling compression to allow force compression to work by @Mytherin in #17412
- Internal #4723: PWMJ Inequality Pushdown by @hawkfish in #17400
- Move all httplib code to HTTPUtil class by @Mytherin in #17420
- Avoid generating default views and macros in the temporary catalog by @Mytherin in #17408
- unittest: improve detection of whether or not we can run `--force-restart` tests by @Mytherin in #17419
- Give tasks a `TaskType` with a name by @Mytherin in #17421
- Use argparse in scripts/format.py by @adsharma in #17360
- Add missing commas by @szarnyasg in #17424
- Internal #4830: IEJoin Inequality Pushdown by @hawkfish in #17422
- Add conn.query_progress() method by @nickzoic in #16927
- Fixes filter pruning use the statistics updated by the same filter by @Damon07 in #17425
- Fix JSON extension compilation on Ubuntu 22.04 by @staticlibs in #17434
- Use pytest in SQLLogic Python test runner by @Flogex in #16685
- On COPY TO/FROM check the format during binding. by @pdet in #17381
- BUGFIX: DELIM_JOINS should reflect functionality of NULL filtering conditions in joins with DELIM_GETS by @Tmonster in #16910
- Allow directly attaching of Parquet/CSV/JSON files by @Mytherin in #17415
- Force errors when trying lines as early as possible by @pdet in #17427
- Enable `SYSTEM_PEAK_BUFFER_MEMORY` and `SYSTEM_PEAK_TEMP_DIR_SIZE` profiling by default by @lnkuiper in #17407
- [C API] Expose the client context, connection id and scalar function bind data by @taniabogatsch in #17449
- [CSV Sniffer] Proper type replacement in header only files by @pdet in #17447
- Recurse into `MAP` and `LIST` with the `remap_struct` and the MFR ColumnMapper by @Tishj in #17448
- Fix: pyproject.toml does not contain a tool.setuptools_scm section by @YUKI2eN3e in #17443
- [Fix] Macro binding with unknown parameters in list_has_all and some other code tidying by @taniabogatsch in #17450
- Generalize HTTP interface and use the new HTTP interface in `httpfs` by @Mytherin in #17464
- [Fix] Switch between constant and flat vector in C API by @taniabogatsch in #17465
- Fix TIMETZ cast in example by @szarnyasg in #17468
- Remove duplicated arrow fetch test by @emmanuel-ferdman in #17476
- Multi File Reader Rework (Part 19): Make `MultiFileReaderInterface` virtual, and move reading methods to the `BaseFileReader` by @Mytherin in #17475
- [Serializer] Lambda Compatibility Fix by @maiadegraaf in #17428
- fix parsing bool values in JSON by @ccfelius in #17460
- Emit dictionary vectors with unaligned start index by @OmidAfroozeh in #17471
- Add release version by @hannes in #17479
- Expose qualified table names in GetTableNames and add duckdb_get_table_names to C API by @taniabogatsch in #17472
- Bump avro, httpfs, mysql, postgres and sqlite by @Mytherin in #17482
- Fix GeoParquet ExpressionColumnReader schema by @Maxxen in #17481
- add regression_threshold_seconds argument to `regression/test_runner.py` by @hmeriann in #17485
- DROP of missing entry should fail in binding by @jeewonhh in #17474
- HTTPFS Parameters fix by @Mytherin in #17486
- HTTPUtil Fix: correctly pass in on_retry by @Mytherin in #17494
- Bump spatial & vss by @Maxxen in #17492
- Add support for altering structs (drop, add, rename field) inside `LIST` and `MAP` columns. by @Tishj in #17462
- [Python Dev] Guard against python exceptions when interacting with the `currentframe` object by @Tishj in #17490
- If distinct count from stats is 0, do not use it in Join Order Optimizer by @Tmonster in #17466
- Make the encodings extension a core extension, and make it auto-loadable. by @pdet in #17206
- Allow passing down rc-style version also via OVERRIDE_GIT_DESCRIBE by @carlopi in #17501
- Allow DUCKDB_EXPLICIT_VERSION to be propagated by @carlopi in #17498
- Minor nightly fixes by @Mytherin in #17500
- Add FileSystem::TryRemoveFile - that only removes a file if it exists by @Mytherin in #17502
- Add OperatorFinalize callback to operators - which is called after a pipeline is finished by @Mytherin in #17503
- Apply dynamic filter pushdown of TopN optimizer also to existing TopN nodes by @Mytherin in #17504
- Fix: Optional Probe Selection by @gropaul in #17505
- FileHandle Logging by @samansmink in #16758
- Fix typos by @szarnyasg in #17478
- Remove spatial from OSX Relassert by @carlopi in #17509
- Update more extensions by @Maxxen in #17510
- Bump HTTPFS again by @Mytherin in #17511
- feat: include catalog and schema names in function serialization by @rustyconover in #17512
- Fix encodings by @carlopi in #17514
- Fix python nightly build by @Tishj in #17515
- Use Catalog::TryAutoLoad for encodings extension by @pdet in #17520
- [Python Dev] Using `reinterpret_steal` breaks the refcount of the passed-in object by @Tishj in #17525
- Fix update extensions by @carlopi in #17527
- Minor fixes to exception error messages by @carlopi in #17528
- [Python Dev] Fix failing tests for the Python SQLLogicTester by @Tishj in #17529
- Resolve GitHub workflow `set-output` deprecation warnings by @kurtmckee in #17516
- [CSV Reader] Detect SQLNULL types for schema merging, use schema merging in csv relations, add files_to_sniff option. by @pdet in #17467
- Fix extension test by @carlopi in #17536
- [Dev] Fix crash when describing a table with a virtual column by @Tishj in #17544
- [HTTPUtil] Let requests made through the `HTTPUtil` interface accept URIs without a scheme. by @Tishj in #17545
- Attach after setting database type by @Mytherin in #17546
- Pass MultiFileGlobalState to InitializeReader, and pass file list to CreateMapping instead of eagerly getting the first file by @Mytherin in #17553
- [Dev] Fix `allowed_directories` crash by @Tishj in #17548
- [Fix] duplicate filters during index scans by @taniabogatsch in #17547
- Generate data for tpch sf100 in steps by @Tmonster in #17539
- Issue #17537: Fractional Second Padding by @hawkfish in #17556
- Make MultiFileList::Copy a virtual method by @Mytherin in #17566
- [Dev] Can't use `USING COMPRESSION` with a deprecated compression type by @Tishj in #17542
- Add (de)serialization for ExtraOperatorInfo by @NiclasHaderer in #17563
- Fix issue with `ExternalFileCache` when data is evicted by @lnkuiper in #17567
- Remote Reads: allocate correct buffer size for prefetch by @Mytherin in #17557
- Remove patch and bump httpfs by @carlopi in #17558
- [Dev] Fix Arrow fixed size binary reading by @Tishj in #17573
- Fix setup.py to correctly handle OVERRIDE_GIT_DESCRIBE by @carlopi in #17580
Full Changelog: v1.2.2...v1.3.0
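#17373 fixes two off-by-one errors in the row estimate of `range` and `generate_series`. In DuckDB, `range(start, stop, step)` stops before `stop`, while `generate_series` includes it, so exact counts for the same arguments differ by one whenever the step lands on `stop`, and the rounding for steps that do not divide the span evenly is easy to get wrong. The PR title does not say what the errors were; the following is only a minimal Python sketch of exact counts under those semantics, assuming integer arguments and a nonzero step. The function names are hypothetical and this is not DuckDB's estimator.

```python
# Hypothetical helpers illustrating exact row counts; not DuckDB's code.


def range_row_count(start: int, stop: int, step: int = 1) -> int:
    """Rows in an exclusive series: start, start+step, ... stopping before stop."""
    if step == 0:
        raise ValueError("step must be nonzero")
    span = stop - start
    # Empty when start == stop or when the step points away from stop.
    if span == 0 or (span > 0) != (step > 0):
        return 0
    # Ceiling division: a partial final step still yields one more row.
    return (abs(span) + abs(step) - 1) // abs(step)


def generate_series_row_count(start: int, stop: int, step: int = 1) -> int:
    """Rows in an inclusive series: start, start+step, ... up to and including stop."""
    if step == 0:
        raise ValueError("step must be nonzero")
    span = stop - start
    # Empty only when the step points away from stop; start == stop gives one row.
    if span != 0 and (span > 0) != (step > 0):
        return 0
    # Floor division plus one, because start itself is always emitted.
    return abs(span) // abs(step) + 1


print(range_row_count(0, 10, 2))            # 5  (0, 2, 4, 6, 8)
print(generate_series_row_count(0, 10, 2))  # 6  (0, 2, 4, 6, 8, 10)
```

Python's own `range` has the same exclusive semantics as DuckDB's `range`, which makes it a convenient oracle for checking the sketch.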
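#17255 implements reading concatenated GZIP members, i.e. files built by appending several independently gzipped streams (as `cat a.gz b.gz > c.gz` does). A reader that stops after the first member silently drops the rest of the data. A minimal Python sketch of the general technique using the standard `zlib` module: decode one member, then start a fresh decoder on whatever bytes are left over in `unused_data`. The helper name is hypothetical and this only illustrates the idea, not DuckDB's C++ implementation.

```python
import gzip
import zlib


def decompress_all_members(data: bytes) -> bytes:
    """Decompress every gzip member in data, not just the first one (hypothetical helper)."""
    out = []
    while data:
        # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(wbits=31)
        out.append(decoder.decompress(data))
        out.append(decoder.flush())
        if not decoder.eof:
            raise ValueError("truncated gzip member")
        # Bytes after the end of this member belong to the next member.
        data = decoder.unused_data
    return b"".join(out)


# Two independently compressed members, concatenated back to back.
blob = gzip.compress(b"hello, ") + gzip.compress(b"world")
print(decompress_all_members(blob))  # b'hello, world'

# A single decoder stops at the end of the first member.
print(zlib.decompressobj(wbits=31).decompress(blob))  # b'hello, '
```

Restarting the decoder per member keeps each member's CRC and length check intact, which a naive "skip the header and keep inflating" approach would lose.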
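#17300 makes the Parquet reader guess the footer size. Per the Parquet format specification, a file ends with the Thrift-encoded footer metadata, a 4-byte little-endian footer length, and the magic bytes `PAR1`. A naive reader therefore needs two reads: one for the last 8 bytes and one for the footer. Fetching a larger, guessed tail up front saves the second round trip whenever the footer fits, which matters for remote files. The Python sketch below illustrates that idea only; the `read_range(offset, length)` callback and the 64 KiB default guess are assumptions, not DuckDB's API or its actual heuristic.

```python
import struct
from typing import Callable

MAGIC = b"PAR1"


def read_footer(file_size: int,
                read_range: Callable[[int, int], bytes],
                guess: int = 64 * 1024) -> bytes:
    """Return the raw (still Thrift-encoded) footer bytes of a Parquet file.

    read_range(offset, length) is a hypothetical callback that fetches bytes,
    e.g. via an HTTP range request. The default guess is an assumption.
    """
    if file_size < 12:  # leading "PAR1" + 4-byte length + trailing "PAR1"
        raise ValueError("file too small to be Parquet")
    # First read: a guessed-size tail instead of only the last 8 bytes.
    tail_len = min(max(guess, 8), file_size)
    tail = read_range(file_size - tail_len, tail_len)
    if tail[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    if footer_len + 12 > file_size:
        raise ValueError("footer length exceeds file size")
    if footer_len + 8 <= tail_len:
        # Guess was big enough: slice the footer out of bytes already fetched.
        return tail[-8 - footer_len:-8]
    # Guess was too small: one extra read for exactly the footer bytes.
    return read_range(file_size - 8 - footer_len, footer_len)


# Usage with an in-memory fake file that records each read.
footer = b"F" * 200
data = MAGIC + b"d" * 1000 + footer + struct.pack("<I", len(footer)) + MAGIC
reads = []


def fake_read(offset: int, length: int) -> bytes:
    reads.append((offset, length))
    return data[offset:offset + length]


assert read_footer(len(data), fake_read, guess=512) == footer
print(len(reads))  # 1: the footer fit inside the guessed tail
reads.clear()
assert read_footer(len(data), fake_read, guess=64) == footer
print(len(reads))  # 2: the guess was too small
```

The trade-off is a slightly larger first read against one fewer request; a guess that is usually big enough turns the common case into a single round trip.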
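#17345 adds UUID v7 support to the filename pattern used when writing files. A version-7 UUID (RFC 9562) stores a 48-bit Unix timestamp in milliseconds in its most significant bits, followed by the 4-bit version, 12 random bits, the 2-bit variant and 62 more random bits, so names generated in later milliseconds sort after earlier ones. The sketch below shows that bit layout using only the standard `uuid`, `os` and `time` modules; `uuid7` is a hypothetical helper, not DuckDB's code, and the example filename is made up.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build an RFC 9562 version-7 UUID (hypothetical helper, not DuckDB's code)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= rand_a << 64                      # bits 75..64: rand_a
    value |= 0b10 << 62                        # bits 63..62: RFC variant
    value |= rand_b                            # bits 61..0: rand_b
    return uuid.UUID(int=value)


# Hypothetical output filename built from a v7 UUID.
print(f"data_{uuid7()}.parquet")

# Later timestamps sort after earlier ones, even as strings.
a = uuid7(1_700_000_000_000)
b = uuid7(1_700_000_000_001)
print(str(a) < str(b))  # True
```

Within a single millisecond the order falls back to the random bits, so the sort is only by creation time at millisecond granularity.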