10000 RFC: `approx_count_distinct` state table encoding · Issue #3414 · risingwavelabs/risingwave · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

RFC: approx_count_distinct state table encoding #3414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jon-chuang opened this issue Jun 23, 2022 · 12 comments
Open

RFC: approx_count_distinct state table encoding #3414

jon-chuang opened this issue Jun 23, 2022 · 12 comments
Assignees
Labels
A-agg Area: Aggregate. no-issue-activity type/feature Type: New feature.

Comments

@jon-chuang
Copy link
Contributor
jon-chuang commented Jun 23, 2022

RFC Template

Background

We need to consider approx_count_distinct state table encoding to facilitate:

  1. Recovery from checkpoint upon node failure
  2. Recovery from checkpoint upon config change actor placement

Design

Encoding of approx_count_distinct

key: table_id | bucket_id (16 bits) | count_index (8 bits) (count_index ranges from 0-48 and represents trailing zeroes)
value: count (u64)

bucket_id (16 bits) | count_index (8 bits) forms the PK for the item.

Seems like vnode does not factor into the encoding since we are assuming that approx_count_distinct is a simple agg singleton operator.

Future Optimizations

We could consider approx_count_distinct with group_by. Because of the large default in-memory state size of approx_count_distinct, we may only be able to support a small number of group_by keys.

Discussions

Q&A

@jon-chuang jon-chuang added the type/feature Type: New feature. label Jun 23, 2022
@fuyufjh
Copy link
Collaborator
fuyufjh commented Jun 23, 2022

An alternative design is to store all counts of a bucket in one KV-entry, the size would be 92 bytes and seems okay to me.

@fuyufjh
Copy link
Collaborator
fuyufjh commented Jun 23, 2022

I am also thinking about whether my design is good, because the memory usage it quite high (~1.5MB). This would be a problem for both 2-phase agg and HashAgg. Perhaps we should only provides an append-only approx_count_distinct? 🤔

@jon-chuang
Copy link
Contributor Author
jon-chuang commented Jun 23, 2022

Hmm, indeed.

One way is to use the bucket_id as distribution key to calculate vnode for 2-phase agg. So the state size is split n-way amongst n actors.

The partial agg can compute partial results of the harmonic mean.


As for hash agg, we may not need to support it. Or, we could throw a runtime error if the number of group by keys grows too large?

@fuyufjh
Copy link
Collaborator
fuyufjh commented Jun 24, 2022

One way is to use the bucket_id as distribution key to calculate vnode for 2-phase agg. So the state size is split n-way amongst n actors.

That's a very interesting insight! But it seems hard to be used along with other agg functions.

@fuyufjh
Copy link
Collaborator
fuyufjh commented Jun 24, 2022

As for hash agg, we may not need to support it. Or, we could throw a runtime error if the number of group by keys grows too large?

Yes, this is a feasible solution. I am just not sure whether this is what our users want. Anyway, the approx_count_distinct is expected to be a lightweight (but approximate) alternative for count(distinct), but it seems not lightweight and even cause more limitations.

@fuyufjh
Copy link
Collaborator
fuyufjh commented Jun 24, 2022

We may hold this RFC for a while, I think

@jon-chuang
Copy link
Contributor Author

but it seems not lightweight and even cause more limitations.

Yes, indeed. It is not lightweight when the distinct elements are small in number and group keys are large in number. I think we may need the sparse-dense transition to handle this.

@fuyufjh
Copy link
Collaborator
fuyufjh commented Sep 28, 2022

I have added back this issue to the project board, to remind that we have to need to make a decision on whether we decide to support or disable approx_count_distinct, especially the non-append-only one.

@fuyufjh
Copy link
Collaborator
fuyufjh commented Sep 28, 2022

The difficulty is: the state, as described in the RFC, is very special compared with these aggregate functions that we have supported (e.g. min/max or sum/count)

@jon-chuang
Copy link
Contributor Author
jon-chuang commented Oct 17, 2022

Note that one has to persist a different state per vnode to enable consistent scaling.

@xxchan
Copy link
Member
xxchan commented Mar 29, 2023

What's the status of this RFC?

@stdrc stdrc self-assigned this Apr 24, 2025
@stdrc
Copy link
Member
stdrc commented Apr 24, 2025

What's the status of this RFC?

Current status of approx_count_distinct is:

  • It can work on append-only stream, for both hash agg and simple agg. The state is encoded into a condensed int array as one datum in the "intermediate state table". The encoding/decoding process is not that efficient. I haven't compare the performance with normal count(distinct x) though.
  • It has an implementation for non-append-only stream, can count correctly if no recovery is involved. By recovery, I mean, actor rebuilding for simple agg, and, group cache eviction or actor rebuilding for hash agg. It encode its state by simply store the result value in the "intermediate state table", and when recovering, it simply use that result as "initial count" which is summed with the HLL result for new inputs.

As said in previous discussion, we may consider not to allow this function in hash agg, because the state can explode ridiculously. When there's no many rows within each group, with a correctly implemented non-append-only state, there're fixed 65536 rows for each group, and the value column is of type int64[]. At the same time, a normal count(distinct x) requires only M rows in the state where M is the number of distinct values of x, with value column to be a simple int64 as the duplicate count of each distinct value. It seems not reasonable to use approx_count_distinct anyway when doing a grouped aggregation.

If we do decide to only support this agg for simple agg, we can fearlessly have a state table storing each bucket as one row, then we can have a correct non-append-only implementation.

@stdrc stdrc added the A-agg Area: Aggregate. label May 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-agg Area: Aggregate. no-issue-activity type/feature Type: New feature.
Projects
None yet
Development

No branches or pull requests

5 participants
0