Closed
Description
What happens?
After upgrading to duckdb 1.2.0 I have noticed certain queries with complicated window functions or multiple window clauses are much slower to execute.
To Reproduce
Here is an example:
select
count(distinct uid) over preceding_year as uid_count_rolling,
count(distinct uid) over cumulative as uid_count_cumulative,
count(distinct (f1, f2)) over preceding_year as f_rolling,
count(distinct (f1, f2)) over cumulative as f_cumulative,
from t1
window
preceding_year as (
partition by g1, g2, g3, g4
order by d asc
range between interval 11 months preceding and current row
),
cumulative as (
partition by g1, g2, g3, g4
order by d asc
rows between unbounded preceding and current row
)
On my machine on a table with ~16M rows this took 1m44s on v1.2.0 but only 26s for v1.1.3, so ~4x, and I have seen more slowdown for more complicated queries. I can provide the table if needed (~200 MB).
OS:
macOS arm / apple silicon
DuckDB Version:
1.2.0
DuckDB Client:
CLI
Hardware:
repro with 8 core/ 8GB mem and with 16 core / 128GB
Full Name:
Matt Hanlon
Affiliation:
PwC
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
No - I cannot easily share my data sets due to their large size
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have