8000 refactor(meta): avoid using frontend generated `ColumnIndexMapping` when constructing new dispatcher for replacement by BugenZhao · Pull Request #21499 · risingwavelabs/risingwave · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

refactor(meta): avoid using frontend generated ColumnIndexMapping when constructing new dispatcher for replacement #21499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 13, 2025

Conversation

BugenZhao
Copy link
Member
@BugenZhao BugenZhao commented Apr 22, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Previously, when performing schema change, the frontend will generate a ColumnIndexMapping between the old and the new schema. It will be used by the meta node to rewrite the output_indices field of the dispatchers, to maintain the same schema from the view of existing downstream jobs.

Things start to get complicated since we support altering the column type (#19755): instead of using a simple output_indices to transform the data chunks in the dispatcher, we need to introduce some sort of expression evaluation to transform between between two different but compatible composite types.

Note that altering type can be called multiple times on a column. Instead of rewriting the transforming expression with an enhanced FieldIndexMapping, it's obvious that generating the transformation between the latest upstream & downstream schema each time can be much easier to implement. This is the motivation behind this PR.

However, we still cannot deprecate the frontend generated ColumnIndexMapping now, because it's also used to rewrite the index expressions. Considering that the persisted information is all about column index instead of ID, I still haven't come up with a solution to easily update the functional indexes after altering column type. UPDATE: it will be removed in an upcoming PR after we eliminate the need for using it to rewrite index expressions:

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

@BugenZhao BugenZhao marked this pull request as ready for review April 22, 2025 08:35
@BugenZhao BugenZhao changed the title refactor: avoid using frontend generated ColumnIndexMapping when constructing new dispatcher for replacement refactor(meta): [WIP] avoid using frontend generated ColumnIndexMapping when constructing new dispatcher for replacement Apr 22, 2025
@BugenZhao BugenZhao changed the title refactor(meta): [WIP] avoid using frontend generated ColumnIndexMapping when constructing new dispatcher for replacement refactor(meta): avoid using frontend generated ColumnIndexMapping when constructing new dispatcher for replacement Apr 23, 2025
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the dispatcher construction for replacement by removing reliance on the frontend‑generated ColumnIndexMapping for rewriting output indices and by introducing a new helper function for generating output indices. Key changes include:

  • Adding a new method (column_ids) for SourceBackfillNode in the prost module.
  • Replacing inline column index extraction with the helper function gen_output_indices in the meta module.
  • Renaming and converting types from DispatchStrategy to DispatcherType across several meta and controller files to support the refactoring.

Reviewed Changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/prost/src/lib.rs New method column_ids added to SourceBackfillNode.
src/meta/src/stream/stream_graph/fragment.rs Updated logic for generating output_indices and converted variable names to enhance clarity.
src/meta/src/rpc/ddl_controller.rs Removal of the col_index_mapping parameter and related dispatcher rewriting.
src/meta/src/manager/metadata.rs Updated type usage from DispatchStrategy to PbDispatcherType.
src/meta/src/controller/fragment.rs Refactored downstream fragment processing and dispatcher type conversion.
Files not reviewed (2)
  • e2e_test/ddl/alter_table_column_type.slt: Language not supported
  • e2e_test/source_legacy/cdc_inline/alter/cdc_table_alter.slt: Language not supported
Comments suppressed due to low confidence (1)

src/meta/src/stream/stream_graph/fragment.rs:1147

  • [nitpick] Consider updating the parameter type of required_columns from &Vec to &[i32] for a more idiomatic and flexible API.
fn gen_output_indices(required_columns: &Vec<i32>, upstream_columns: Vec<i32>) -> Option<Vec<u32>> {

Copy link
Member
@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we want to switch from calculating in FE to calculating in meta?

What was the original reason to calculate in FE? 🤔

@BugenZhao
Copy link
Member Author

Do you mean we want to switch from calculating in FE to calculating in meta?

Exactly

What was the original reason to calculate in FE? 🤔

Cannot remember clearly.

For example, there's an upstream table on its schema version $i$, and we create a downstream job from it. Then we now have a column mapping $m(up_i \rightarrow down)$ in the dispatcher. When we are going to alter the upstream table to schema version $i+1$

  • (current approach) the frontend generates $m(up_{i+1} \rightarrow up_i)$, the meta node applies it to the existing mapping $m(up_i \rightarrow down)$ in the dispatcher, to get

    $$m(up_{i+1} \rightarrow down) = m(up_{i+1} \rightarrow up_i) \cdot m(up_i \rightarrow down)$$

  • (approach after this PR) the meta node directly resolve the schema of $up_{i+1}$ and compare it against $down$, to generate $m(up_{i+1} \rightarrow down)$.

The previous approach eliminated the need for the meta node to resolve table schema information, which might have been considered simpler to implement when table schema changes were first introduced.

However, the key point is that index mapping can be flattened regardless of how many mappings are applied. This does hold any more if we are going to work on complicated mappings involving expressions (for altering column type).

@BugenZhao BugenZhao force-pushed the bz/remove-alter-frontend-gen-col-idx-mapping branch from b7650c8 to aab2115 Compare April 29, 2025 09:36
@BugenZhao BugenZhao force-pushed the bz/remove-alter-frontend-gen-col-idx-mapping branch 2 times, most recently from 9a1e7d9 to 05dc197 Compare May 1, 2025 07:05
@BugenZhao BugenZhao changed the base branch from main to graphite-base/21499 May 1, 2025 08:54
@BugenZhao BugenZhao force-pushed the bz/remove-alter-frontend-gen-col-idx-mapping branch from 05dc197 to 8cee6c9 Compare May 1, 2025 08:54
@BugenZhao BugenZhao changed the base branch from graphite-base/21499 to bz/dashboard-show-index May 1, 2025 08:54
Base automatically changed from bz/dashboard-show-index to main May 6, 2025 06:04
@BugenZhao BugenZhao requested review from stdrc and xxchan May 6, 2025 06:41
@BugenZhao BugenZhao force-pushed the bz/remove-alter-frontend-gen-col-idx-mapping branch from 8cee6c9 to 088350c Compare May 6, 2025 06:44
BugenZhao added 8 commits May 13, 2025 15:08
…ering a table

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao BugenZhao force-pushed the bz/remove-alter-frontend-gen-col-idx-mapping branch from 088350c to 8de600f Compare May 13, 2025 07:13
Copy link
Contributor
@shanicky shanicky left a comment

LGTM!

@BugenZhao BugenZhao added this pull request to the merge queue May 13, 2025
Merged via the queue into main with commit 148b494 May 13, 2025
33 checks passed
@BugenZhao BugenZhao deleted the bz/remove-alter-frontend-gen-col-idx-mapping branch May 13, 2025 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0