8000 `LATERAL` set-returning function produces a suboptimal plan · Issue #21779 · risingwavelabs/risingwave · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

LATERAL set-returning function produces a suboptimal plan #21779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
BugenZhao opened this issue May 8, 2025 · 3 comments
Open

LATERAL set-returning function produces a suboptimal plan #21779

BugenZhao opened this issue May 8, 2025 · 3 comments
Labels
A-optimizer Area: SQL optimizer.
Milestone

Comments

@BugenZhao
Copy link
Member
BugenZhao commented May 8, 2025

When creating a materialized view that uses a CROSS JOIN LATERAL generate_series(1, v), RisingWave’s planner currently decomposes it into a HashAgg + HashJoin workflow, resulting in multiple exchanges, an aggregation, and a join:

EXPLAIN CREATE MATERIALIZED VIEW mv AS
  SELECT v, i
  FROM t
  , generate_series(1, v) AS i;

Produces:

 StreamMaterialize { columns: [v, i, t._row_id(hidden), t.v(hidden), projected_row_id(hidden)], stream_key: [t._row_id, projected_row_id, v], pk_columns: [t._row_id, projected_row_id, v], pk_conflict: NoCheck }
 └─StreamExchange { dist: HashShard(t.v, t._row_id, projected_row_id) }
   └─StreamHashJoin { type: Inner, predicate: t.v IS NOT DISTINCT FROM t.v }
     ├─StreamExchange { dist: HashShard(t.v) }
     │ └─StreamTableScan { table: t, columns: [v, _row_id] }
     └─StreamProjectSet { select_list: [$0, GenerateSeries(1:Int32, $0)] }
       └─StreamProject { exprs: [t.v] }
         └─StreamHashAgg { group_key: [t.v], aggs: [count] }
           └─StreamExchange { dist: HashShard(t.v) }
             └─StreamTableScan { table: t, columns: [v, _row_id] }
(10 rows)

However, the equivalent form:

EXPLAIN CREATE MATERIALIZED VIEW mv AS
  SELECT v, generate_series(1, v) AS i
  FROM t;

Is planned as a single StreamProjectSet directly on the scan:

 StreamMaterialize { columns: [projected_row_id(hidden), v, i, t._row_id(hidden)], stream_key: [t._row_id, projected_row_id], pk_columns: [t._row_id, projected_row_id], pk_conflict: NoCheck }
 └─StreamProjectSet { select_list: [$0, GenerateSeries(1:Int32, $0), $1] }
   └─StreamTableScan { table: t, columns: [v, _row_id] }

We expect the planner to recognize CROSS JOIN LATERAL generate_series(…) as a row‐by‐row expansion (ProjectSet), avoiding the extra deduplication and join.

@BugenZhao BugenZhao added the A-optimizer Area: SQL optimizer. label May 8, 2025
@github-actions github-actions bot added this to the release-2.4 milestone May 8, 2025
@BugenZhao
Copy link
Member Author
BugenZhao commented May 8, 2025

What's interesting is that I find Postgres recommends the LATERAL way instead of putting it into the SELECT list, which was even marked "will be deprecated," although this note is no longer found.

https://www.postgresql.org/docs/9.6/xfunc-sql.html
https://www.postgresql.org/docs/current/xfunc-sql.html

@BugenZhao
Copy link
Member Author

cc @chenzl25 @xiangjinwu

@BugenZhao BugenZhao modified the milestones: release-2.4, release-2.5 May 8, 2025
@chenzl25
Copy link
Contributor
chenzl25 commented May 8, 2025

I think we can write a rule to match this pattern (transformation after TableFunctionToProjectSetRule)

 └─LogicalApply 
   ├─LogicalScan 
   └─LogicalProject 
     └─LogicalProjectSet 
       └─LogicalValues 

And convert it to

 ─LogicalProject 
   └─LogicalProjectSet
     └─LogicalLogicalScan

explain trace

 EXPLAIN (trace) CREATE MATERIALIZED VIEW mv AS
  SELECT v, i
  FROM t
  , generate_series(1, v) AS i;
 Begin:

 LogicalProject { exprs: [t.v, generate_series] }
 └─LogicalApply { type: Inner, on: true, correlated_id: 1 }
   ├─LogicalScan { table: t, columns: [v, _row_id, _rw_timestamp] }
   └─LogicalTableFunction { table_function: GenerateSeries(1:Int32, CorrelatedInputRef { index: 0, correlated_id: 1 }) }

 Table Function Convert:

 apply TableFunctionToProjectSetRule 1 time(s)

 LogicalProject { exprs: [t.v, GenerateSeries(1:Int32, CorrelatedInputRef { index: 0, correlated_id: 1 })] }
 └─LogicalApply { type: Inner, on: true, correlated_id: 1 }
   ├─LogicalScan { table: t, columns: [v, _row_id, _rw_timestamp] }
   └─LogicalProject { exprs: [GenerateSeries(1:Int32, CorrelatedInputRef { index: 0, correlated_id: 1 })] }
     └─LogicalProjectSet { select_list: [GenerateSeries(1:Int32, CorrelatedInputRef { index: 0, correlated_id: 1 })] }
       └─LogicalValues { rows: [[]], schema: Schema { fields: [] } }

TODO: apply the new rule here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-optimizer Area: SQL optimizer.
Projects
None yet
Development

No branches or pull requests

2 participants
0