8000 feat: delay temporal filter && optimize dyn filter with always relax condition by st1page · Pull Request #13985 · risingwavelabs/risingwave · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat: delay temporal filter && optimize dyn filter with always relax condition #13985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Dec 18, 2023

Conversation

st1page
Copy link
Contributor
@st1page st1page commented Dec 13, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

#13916

  • allow now in upper bound condition
  • add condition_always_relax flag means the right side's change always makes the condition more relaxed, only store records which does not satisfy the condition in the table in this case.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

Please ping me and I will write the doc for the usage of the delay temporal filter, thanks

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@github-actions github-actions bot added the type/feature Type: New feature. label Dec 13, 2023
Copy link
codecov bot commented Dec 14, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (d8dd8f4) 68.04% compared to head (12982b9) 68.06%.
Report is 20 commits behind head on main.

Files Patch % Lines
src/stream/src/from_proto/dynamic_filter.rs 0.00% 8 Missing ⚠️
src/common/src/util/stream_graph_visitor.rs 0.00% 5 Missing ⚠️
src/stream/src/executor/dynamic_filter.rs 97.72% 3 Missing ⚠️
...d/src/optimizer/plan_node/stream_dynamic_filter.rs 96.15% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13985      +/-   ##
==========================================
+ Coverage   68.04%   68.06%   +0.02%     
==========================================
  Files        1536     1536              
  Lines      265364   265521     +157     
==========================================
+ Hits       180557   180727     +170     
+ Misses      84807    84794      -13     
Flag Coverage Δ
rust 68.06% <90.17%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@st1page st1page changed the title feat: delay temporal filter feat: delay temporal filter && optimize dyn filter with alway relax condition Dec 14, 2023
Comment on lines 46 to 58
// TODO(st1page): the condition is wrong, introduce monotonically increasing property of the node
// TODO(st1page): https://github.com/risingwavelabs/risingwave/pull/13984
let right_monotonically_increasing = {
if let Some(e) = core.right().as_stream_exchange() && *e.distribution() == Distribution::Broadcast {
if let Some(proj) = e.input().as_stream_project() {
proj.input().as_stream_now().is_some()
} else {
false
}
} else {
false
}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hack for the temporal filter, but will fix it with monotonic property later. The #13984 is too big and I'd like to merge this PR at first because we only use now executor as the temporal filter and the hack is correct.

Comment on lines 427 to 430
// TODO(st1page): https://github.com/risingwavelabs/risingwave/issues/13998
if self.condition_always_relax && !self.cleaned_by_watermark {
to_delete_rows.push(row.clone());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@st1page st1page requested review from chenzl25, kwannoel, BugenZhao, hzxa21 and fuyufjh and removed request for chenzl25, kwannoel, BugenZhao and hzxa21 December 14, 2023 15:52
@@ -88,7 +88,12 @@ fn visit_stream_node_tables_inner<F>(
always!(node.right_degree_table, "HashJoinDegreeRight");
}
NodeBody::DynamicFilter(node) => {
always!(node.left_table, "DynamicFilterLeft");
if node.condition_always_relax {
always!(node.left_table, "DynamicFilterLeftNotSatisfy");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we test this new branch somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current e2e test includes the branch.

@@ -60,6 +60,11 @@ pub struct DynamicFilterExecutor<S: StateStore, const USE_WATERMARK_CACHE: bool>
metrics: Arc<StreamingMetrics>,
/// The maximum size of the chunk produced by executor at a time.
chunk_size: usize,
/// If the right side's change always make the condition more relaxed.
/// In other words, make more record in the left side satisfy the condition.
/// In that case, there are only records which does not satisfy the condition in the table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this means we store records which do not satisfy the condition,

so if the condition is something like:

LHS < NOW()

We only store

LHS >= NOW()

But why do we store records which do not satisfy the condition in the table?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I got it.
Consider the following case with the following records in upstream:

1
2
3
4

And the current filter (RHS < NOW()) is:

3

We store internal state, which only stores values which are excluded:

3
4

When RHS increases to 5,

3
4

We can now emit 3, 4 to downstream.

@@ -429,11 +429,9 @@ message DynamicFilterNode {
catalog.Table left_table = 3;
// Right table stores single value from RHS of predicate.
catalog.Table right_table = 4;
// It is true when the right side of the inequality predicate is monotonically:
// - decreasing for <, <=, increasing for >, >=
// bool is_monotonic = 10;
Copy link
Contributor
@kwannoel kwannoel Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about case where we have non-monotonic input. 🤔

For now I suppose it's not supported. But I guess that's the intention of having the is_monotonic flag previously.

Perhaps we should just keep that field commented out instead of removing it.

I don't think condition_always_relax can handle it.
Because when condition_always_relax is false, I think the current intention is for it to mean that right hand's side change always more the condition more constrained. But this only holds for monotonic RHS.

With no monotonicity property, the opposite of it actually implies that right side's change sometimes makes the condition more relaxed, sometimes more constrained.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we need it, we can add it back. I think the field should be on-demand by the executor with its specific optimization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think we should still add some comment, because DynamicFilter's RHS may not always be monotonic.

So for the condition_always_relax we should add a comment which says that we assume the input to always be monotonically changing.

Copy link
Contributor
@kwannoel kwannoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor comments.

Co-authored-by: Noel Kwan <47273164+kwannoel@users.noreply.github.com>
st1page and others added 3 commits December 18, 2023 14:05
Co-authored-by: Noel Kwan <47273164+kwannoel@users.noreply.github.com>
Co-authored-by: Eric Fu <eric@singularity-data.com>
@st1page st1page added this pull request to the merge queue Dec 18, 2023
Merged via the queue into main with commit 78b4dbe Dec 18, 2023
@st1page st1page deleted the sts/delay_temporal_filter branch December 18, 2023 07:44
github-actions bot pushed a commit that referenced this pull request Dec 18, 2023
…condition (#13985)

Co-authored-by: Noel Kwan <47273164+kwannoel@users.noreply.github.com>
Co-authored-by: Eric Fu <eric@singularity-data.com>
st1page added a commit that referenced this pull request Dec 18, 2023
…condition (#13985) (#14038)

Co-authored-by: stonepage <40830455+st1page@users.noreply.github.com>
Co-authored-by: Noel Kwan <47273164+kwannoel@users.noreply.github.com>
Co-authored-by: Eric Fu <eric@singularity-data.com>
Copy link
Member
@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature Type: New feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0