-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Add support for materialized CTEs #7126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Edit: This is fixed.
Edit: This is fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good - very cool feature! I have left some comments inline. One note is that we have been working to remove C-style pointers (see #7080) - this might cause a few merge conflicts on your end.
SELECT x FROM (WITH t(y, z) AS MATERIALIZED (SELECT 1, 2) SELECT * FROM t) AS _(x, y);
Error: INTERNAL Error: Failed to bind column reference "x" [0.0] (bindings: [9.0])
This is the column binding resolver that complains. Essentially "x" is bound as the first column to table binding 0
, but table binding 0
is not accessible from that part of the plan.
Second, pipeline construction of two (or more) materialized recursive CTEs fails:
I'm not sure about what causes this exactly - I would guess that something goes wrong in the order in which BuildPipelines
is executed so that the layered situation is not resolved correctly, or perhaps the incorrect pipeline is passed on to the child.
Perhaps it would also be interesting to add a benchmark where the benefits of the materialized CTEs are visible?
The binder issue seems to be resolved now. Thank you for the comments. I will address them.
I think I will just rebase onto master and do the rewritings before I take this pr out of its draft status. Should be a clean solution. |
I think all remaining issues are now resolved. |
NVM I guess. |
Some test cases can (and should!) be re-enabled when decorrelation of materialized CTEs is implemented.
If you find the time, @Mytherin, I think this pr is ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes! Looks great to me now. One minor comment otherwise this is good to go. Could you also merge with feature
so the CI can rerun? I believe the CI failures should be fixed then.
TransformCTE(*PGPointerCast<duckdb_libpgquery::PGWithClause>(stmt.withClause), result->cte_map, | ||
materialized_ctes); | ||
if (!materialized_ctes.empty()) { | ||
throw NotImplementedException("Materialized CTEs are not implemented for delete."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be update
- also could you perhaps expand on the reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, the message is wrong. I will fix that.
There is an AST 'mismatch' currently. Materialized CTEs are represented as a QueryNode
, whereasINSERT/UPDATE/DELETE
are SQLStatements
. This means, that it is easy to represent the materialized CTEs as a bunch of stacked QueryNodes
(this happens in Transformer::TransformMaterializedCTE
). It is not that easy for INSERT/UPDATE/DELETE
, however. While there should not be a principle problem implementing that, my current plan is to tackle that in a separate pr.
statement ok | ||
PRAGMA enable_verification | ||
|
||
require noalternativeverify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative verify would lead to non-MATERIALIZED
CTEs here. Therefore, the tests would be unexpectedly successful.
Thanks! |
This pr adds support for materialized CTE expressions as an alternative to inlining, as I briefly mentioned in #404.
Inlining CTEs can be great, but it can also multiply the work. Take this query for example:
Inlining duplicates the definition of
t
for each reference.This results in the following query:
Now, if
t
is an expensive query, there may be a problem.With this pr, CTEs can be annotated as being
MATERIALIZED
.Such CTEs first fully materialize their result, which can then be read:
Instead of copying
t
for each reference,t
is calculated exactly once. Since the result is cached, no work is done multiple times.