[Enhancement] prefer two-phase plan for distinct aggregation in single node env by murphyatwork · Pull Request #60029 · StarRocks/starrocks · GitHub

[Enhancement] prefer two-phase plan for distinct aggregation in single node env #60029


Open: wants to merge 2 commits into base: main

Contributor
@murphyatwork murphyatwork commented Jun 18, 2025

Why I'm doing:

SELECT
    get_json_string(data, 'commit.collection') AS event,
    count() AS count,
    count(DISTINCT get_json_string(data, 'did')) AS users
FROM bluesky
WHERE (get_json_string(data, 'kind') = 'commit')
  AND (get_json_string(data, 'commit.operation') = 'create') 
GROUP BY event
ORDER BY count DESC;

For this kind of query:

  • low-cardinality GROUP_BY
  • high-cardinality COUNT_DISTINCT

Different query plans and performance in a single-node env:

  1. 1-phase: 9.4s
  2. 2-phase: 2.8s (best)
  3. 3-phase: 8.3s (default, without stats on GROUP_BY)
  4. 4-phase: 4.3s (with stats on GROUP_BY)

The 2-phase plan is clearly the fastest option here, but it is not the default. The default is the 3-phase plan, which suits multi-node clusters better because it handles data skew and scales better. In a single-node cluster, however, the plans with more phases offer no such advantage and only add overhead.

Therefore, we aim to select the 2-phase plan for single-node environments.
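
The intended rule can be sketched roughly as follows. This is an illustrative sketch only: `DistinctAggPlanChooser`, `Plan`, and `choose` are hypothetical names, not StarRocks classes, and the existing cost-based choice is modeled as a plain parameter.

```java
// Hypothetical sketch; none of these names come from the StarRocks code base.
final class DistinctAggPlanChooser {
    enum Plan { ONE_PHASE, TWO_PHASE, THREE_PHASE, FOUR_PHASE }

    // singleNode: true when the query runs on exactly one node (assumed input).
    // costBasedDefault: whatever the optimizer would otherwise pick.
    static Plan choose(boolean singleNode, Plan costBasedDefault) {
        if (singleNode) {
            // On one node the extra phases only add overhead, so prefer 2-phase.
            return Plan.TWO_PHASE;
        }
        // Multi-node clusters keep the existing choice.
        return costBasedDefault;
    }
}
```

With this rule, a single-node run of the query above would get `TWO_PHASE` instead of the 3-phase default, while multi-node clusters are unaffected.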

What I'm doing:

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.5
    • 3.4
    • 3.3

Signed-off-by: Murphy <mofei@starrocks.com>
@murphyatwork murphyatwork requested a review from a team as a code owner June 18, 2025 09:04
    if (isSingleNodeExecution(ConnectContext.get())) {
        return true;
    }

    return CollectionUtils.isNotEmpty(operator.getGroupingKeys())
            && aggMode == AUTO.ordinal()
            && isTwoStageMoreEfficient(input, distinctColumns);

The riskiest bug in this code is:
The ConnectContext.get() call assumes that a valid context instance is always returned, which might not be the case. If it returns null, this could lead to a NullPointerException.

You can modify the code like this:

+        ConnectContext context = ConnectContext.get();
+        if (context != null && isSingleNodeExecution(context)) {
+            return true;
+        }
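
The same guard can be shown in a self-contained form. This is a minimal sketch under the assumption that the context is stored per thread and may be unset; `SessionContext`, `SingleNodeCheck`, and `aliveBackends` are hypothetical stand-ins, not the real `ConnectContext` API.

```java
// Hypothetical stand-in for a per-thread session context.
final class SessionContext {
    private static final ThreadLocal<SessionContext> CURRENT = new ThreadLocal<>();
    private final int aliveBackends;

    SessionContext(int aliveBackends) {
        this.aliveBackends = aliveBackends;
    }

    // Returns null when no context has been set on the calling thread.
    static SessionContext get() {
        return CURRENT.get();
    }

    static void set(SessionContext ctx) {
        CURRENT.set(ctx);
    }

    static void clear() {
        CURRENT.remove();
    }

    int aliveBackends() {
        return aliveBackends;
    }
}

final class SingleNodeCheck {
    // Assumed definition of "single node": exactly one alive backend.
    static boolean isSingleNodeExecution(SessionContext ctx) {
        return ctx.aliveBackends() == 1;
    }

    // Null-safe guard: with no context, fall through to the regular checks.
    static boolean preferTwoPhase() {
        SessionContext ctx = SessionContext.get();
        return ctx != null && isSingleNodeExecution(ctx);
    }
}
```

When no context is set, `preferTwoPhase()` returns false rather than throwing, so the caller continues to the existing cost-based condition.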

Signed-off-by: Murphy <mofei@starrocks.com>

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[BE Incremental Coverage Report]

pass : 0 / 0 (0%)
