8000 feat(frontend): make KILL process_id work across all worker nodes by zwang28 · Pull Request #21998 · risingwavelabs/risingwave · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat(frontend): make KILL process_id work across all worker nodes #21998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 27, 2025

Conversation

zwang28
Copy link
Contributor
@zwang28 zwang28 commented May 26, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Resolve #21953

Example

dev=> show processlist;
 Worker Id | Id  | User |      Host       | Database |  Time  |         Info          
-----------+-----+------+-----------------+----------+--------+-----------------------
 2         | 2:0 | root | 127.0.0.1:50447 | dev      | 4114ms | SELECT pg_sleep(3600)
 3         | 3:1 | root | 127.0.0.1:50457 | dev      | 6ms    | SHOW PROCESSLIST
 3         | 3:0 | root | 127.0.0.1:50453 | dev      | 2844ms | SELECT pg_sleep(3600)
(3 rows)

dev=> KILL '2:0';
KILL
dev=> KILL '3:0';
KILL
dev=> show processlist;
 Worker Id | Id  | User |      Host       | Database | Time |       Info       
-----------+-----+------+-----------------+----------+------+------------------
 2         | 2:0 | root | 127.0.0.1:50447 | dev      |      | 
 3         | 3:1 | root | 127.0.0.1:50457 | dev      | 2ms  | SHOW PROCESSLIST
 3         | 3:0 | root | 127.0.0.1:50453 | dev      |      | 
(3 rows)


Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note
10000

@zwang28 zwang28 requested a review from chenzl25 May 26, 2025 10:20
@graphite-app graphite-app bot requested a review from a team May 26, 2025 10:20
@github-actions github-actions bot added the type/feature Type: New feature. label May 26, 2025
Copy link
Contributor
@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@xxchan xxchan requested a review from Copilot May 26, 2025 12:38
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the implementation of the KILL command to support cancelling queries across all worker nodes by using a string representation of the worker process ID. Key changes include:

  • Updating the SQL parser and AST definition to accept a worker process ID string instead of a numeric process ID.
  • Introducing a new WorkerProcessId struct with proper string conversion and display implementations.
  • Extending RPC, session, and worker node management logic to support distributed cancellation via the updated KILL command.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/sqlparser/src/parser.rs Updated KILL command parsing to handle a literal string.
src/sqlparser/src/ast/mod.rs Changed the KILL variant to accept a String in the AST.
src/rpc_client/src/frontend_client.rs Added a new cancel_running_sql RPC call.
src/frontend/src/test_utils.rs Added a stub worker_id method for tests.
src/frontend/src/session.rs Introduced WorkerProcessId and refactored cancellation helpers.
src/frontend/src/rpc/mod.rs Added cancellation RPC endpoint implementation.
src/frontend/src/meta_client.rs Extended the meta client interface with a worker_id method.
src/frontend/src/handler/show.rs Updated process id formatting to use WorkerProcessId.
src/frontend/src/handler/mod.rs Modified KILL statement handling to pass WorkerProcessId.
src/frontend/src/handler/kill_process.rs Updated kill processing to parse and delegate based on worker process ID.
src/frontend/src/error.rs Adjusted error handling to include relevant conversions.
src/batch/src/worker_manager/worker_node_manager.rs Refactored worker node management using a HashMap instead of Vec.
proto/frontend_service.proto Added CancelRunningSql RPC and message definitions.
Comments suppressed due to low confidence (1)

src/frontend/src/handler/kill_process.rs:25

  • [nitpick] The parameter 's' is ambiguous; consider renaming it to 'worker_process_id_str' to improve code clarity.
pub(super) async fn handle_kill(handler_args: HandlerArgs, s: String) -> Result<RwPgResponse> {

@zwang28 zwang28 enabled auto-merge May 26, 2025 14:07
@zwang28
Copy link
Contributor Author
zwang28 commented May 27, 2025

Another frontend stack overflow:

frame_size.txt

The total stack size is 2093680 bytes after this PR, which slightly exceeds the 2MB limit.

I'll increase the RUST_MIN_STACK to 4MB for tests using the debug build.


By the way, I attempted to modify these in recursive.rs, but it still resulted in a stack overflow.

const RED_ZONE: usize = 1 * 1024; // 1KiB
const STACK_SIZE: usize = 4096 * RED_ZONE; // 4MiB

@zwang28 zwang28 added this pull request to the merge queue May 27, 2025
Merged via the queue into main with commit 6a2e153 May 27, 2025
31 of 32 checks passed
@zwang28 zwang28 deleted the wangzheng/global_k branch May 27, 2025 06:26
}
}

impl TryFrom<String> for WorkerProcessId {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may implement FromStr instead.

@@ -5011,8 +5011,8 @@ impl Parser<'_> {
}

pub fn parse_kill_process(&mut self) -> ModalResult<Statement> {
let process_id = self.parse_literal_uint()? as i32;
Ok(Statement::Kill(process_id))
let worker_process_id = self.parse_literal_string()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this break Postgres compatibility?


pub(super) async fn handle_kill(
handler_args: HandlerArgs,
pub(super) async fn handle_kill(handler_args: HandlerArgs, s: String) -> Result<RwPgResponse> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we handle permission for KILL?

Comment on lines +3 to +5
if [[ -z "${RUST_MIN_STACK}" ]]; then
export RUST_MIN_STACK=4194304
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd better find out why the stack usage increases, other than relaxing the constraints. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature Type: New feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve KILL process_id
3 participants
0