docs: add more docs on different keys in RisingWave #21749
base: main
## Order Key

The order key is specified by the user in SQL. For example:

```sql
create materialized view m1 as select * from t1 order by i, id;
```

Here the order key is `i, id`: the columns of the `order by` clause, in the order they appear.

It is used to ensure locality of records in storage and locality of record updates in streaming.
To ensure storage locality, we use the order key as a prefix when deriving the storage key.
To ensure streaming locality, we use the order key as a prefix of the stream key.

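As a minimal sketch of the storage-key derivation described above, the Rust function below puts the order key first and then appends any stream-key columns not already present. Columns are represented by their index in the output schema. `derive_storage_key` is a hypothetical helper, not RisingWave's actual API, and appending the remaining stream-key columns (so the key stays unique) is an assumption of this sketch.

```rust
/// Index of a column in a streaming job's output schema.
type ColumnIndex = usize;

/// Hypothetical helper (not RisingWave's API): derive a storage key that starts
/// with the user-specified order key, for storage locality, and then appends the
/// stream-key columns not already included, assuming this keeps the key unique.
fn derive_storage_key(order_key: &[ColumnIndex], stream_key: &[ColumnIndex]) -> Vec<ColumnIndex> {
    // The order key is the prefix, so rows sorted by it are stored adjacently.
    let mut storage_key = order_key.to_vec();
    for &col in stream_key {
        // Skip columns the order key already covers (e.g. `id` in `order by i, id`).
        if !storage_key.contains(&col) {
            storage_key.push(col);
        }
    }
    storage_key
}
```

For `m1`, assuming `t1` has columns `(id, i)` with `id` at index 0 serving as the stream key, `derive_storage_key(&[1, 0], &[0])` yields `[1, 0]` (that is, `i, id`); with `order by i` alone it would also yield `[1, 0]`.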
## Group Key

Like the order key, the group key is specified by the user in SQL (in the `group by` clause).
We apply the same derivation to the group key, using it as a prefix of the stream key, to ensure storage and streaming locality.
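To illustrate why a group-key prefix gives storage locality, here is a minimal sketch that models an ordered key-value store as a Rust `BTreeMap` with `(group_key, other_key)` tuple keys. It assumes a single `i64` group column and a single `i64` remaining key column; this is a simplification, not how RisingWave encodes its keys, and `rows_in_group` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Hypothetical ordered store: keys are (group key, remaining key column).
/// Tuples compare lexicographically, so all entries for one group are adjacent.
type Store = BTreeMap<(i64, i64), String>;

/// Read every row of one group with a single contiguous range scan.
fn rows_in_group(store: &Store, group: i64) -> Vec<String> {
    store
        .range((group, i64::MIN)..=(group, i64::MAX))
        .map(|(_, row)| row.clone())
        .collect()
}
```

Because the group key is the key prefix, reading or updating one group touches only a contiguous range of the store rather than rows scattered across the whole key space.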
Review comment:
I think the order key and group key are not really the same level of concept as the following keys.
They are user-facing concepts, while the following keys describe how things work internally.
Review comment:
Maybe I'll include a section for them inside stream_key and storage_pk, since they are used in the derivation step?
Review comment:
Indeed, a section about their derivation (maybe with examples) would be nice to have, and the user-facing part should also go there.
### User Specified Primary Key

When the user specifies a primary key in a `create table` statement,
it is used to ensure that the records in the table are unique.
We always use the user-specified primary key as both the stream key and the storage primary key.

For other streaming jobs, such as sinks, materialized views, and indexes,
we use the user-specified primary key when deriving the stream key and the storage primary key.
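A minimal sketch of how keying storage by the user-specified primary key makes uniqueness checkable: the hypothetical `PkTable` below stores each row under its primary-key values, so the same key serves as the storage key. Rejecting a duplicate key is an assumption made only for illustration; how RisingWave actually handles a conflicting insert is not covered here.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory table whose storage key is the user-specified primary key.
struct PkTable {
    pk_cols: Vec<usize>,
    rows: BTreeMap<Vec<i64>, Vec<i64>>,
}

impl PkTable {
    fn new(pk_cols: Vec<usize>) -> Self {
        PkTable { pk_cols, rows: BTreeMap::new() }
    }

    /// Project the primary-key columns out of a row.
    fn pk_of(&self, row: &[i64]) -> Vec<i64> {
        self.pk_cols.iter().map(|&c| row[c]).collect()
    }

    /// Insert a row keyed by its primary key. Returns false and leaves the table
    /// unchanged if that key already exists (assumed behavior, for illustration).
    fn insert(&mut self, row: Vec<i64>) -> bool {
        let key = self.pk_of(&row);
        if self.rows.contains_key(&key) {
            return false;
        }
        self.rows.insert(key, row);
        true
    }
}
```

Since rows are stored under their primary-key values, detecting a duplicate is a single point lookup on the storage key.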
Review comment:
ditto
Review comment:
I think that to understand the storage primary key, one first needs to understand what "Table" means (quite messy):
A table = TableCatalog = a hummock table = internal state table / MV (i.e., the state table for MaterializeExec) / ...
whereas the user-specified primary key is a concept far removed from this.
We want data to be distributed in a way that minimizes data skew and maximizes data locality,
for more efficient stateful processing.

To ensure data consistency of updates (U-, U+),
the distribution key must always be a prefix of the stream key.

Review comment:
It doesn't have to be U-/U+. The order of updates in any form for records with the same stream key should be preserved.

Review comment:
I think there's no need for it to be a prefix? A subset is okay.
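The reasoning behind the distribution-key requirement can be sketched in Rust: if every distribution-key column is also a stream-key column, two records with the same stream key have the same distribution-key values, hash to the same shard, and are processed in order by one worker. A prefix is one such case, and a subset (as the reviewer suggests) gives the same guarantee. The helpers `is_prefix`, `is_subset`, and `shard_of`, and the use of `DefaultHasher` modulo a shard count, are hypothetical stand-ins for RisingWave's actual hashing and routing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// True if the distribution key is a prefix of the stream key (the rule as written).
fn is_prefix(dist_key: &[usize], stream_key: &[usize]) -> bool {
    stream_key.starts_with(dist_key)
}

/// True if every distribution-key column appears in the stream key (the weaker rule).
fn is_subset(dist_key: &[usize], stream_key: &[usize]) -> bool {
    dist_key.iter().all(|col| stream_key.contains(col))
}

/// Route a row to one of `num_shards` workers by hashing its distribution-key values.
/// Rows that agree on the distribution key always land on the same shard.
fn shard_of(row: &[i64], dist_key: &[usize], num_shards: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    for &col in dist_key {
        row[col].hash(&mut hasher);
    }
    hasher.finish() % num_shards
}
```

With `dist_key = [1]` and `stream_key = [0, 1]`, the prefix check fails but the subset check passes, and two versions of a row that share the stream key still land on the same shard, which is the property the ordering guarantee relies on.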
Review comment (on "The order key is `i, id`, which is the order of the columns in the `order by` clause."):
To be more precise, the `order_key` also specifies an `OrderType` for each column, including direction (`ASC`/`DESC`) and nulls ordering.
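To make the reviewer's point concrete, here is a minimal sketch of an order key whose columns each carry a direction and a nulls ordering. `ColumnOrder` and `compare_rows` are hypothetical names for illustration, not RisingWave's `OrderType` API; rows are simplified to `Option<i64>` values, with `None` standing for SQL NULL.

```rust
use std::cmp::Ordering;

/// Hypothetical per-column ordering: which column, its direction, and where nulls go.
#[derive(Clone, Copy, Debug)]
struct ColumnOrder {
    col: usize,
    descending: bool,
    nulls_first: bool,
}

/// Compare two rows column by column according to the order key.
fn compare_rows(a: &[Option<i64>], b: &[Option<i64>], order_key: &[ColumnOrder]) -> Ordering {
    for o in order_key {
        let ord = match (a[o.col], b[o.col]) {
            (None, None) => Ordering::Equal,
            // Nulls placement is independent of direction, as with NULLS FIRST / NULLS LAST.
            (None, Some(_)) => if o.nulls_first { Ordering::Less } else { Ordering::Greater },
            (Some(_), None) => if o.nulls_first { Ordering::Greater } else { Ordering::Less },
            (Some(x), Some(y)) => {
                let c = x.cmp(&y);
                if o.descending { c.reverse() } else { c }
            }
        };
        // The first column that differs decides the order.
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}
```

Because direction only applies when both values are non-null, a `nulls_first` column keeps nulls at the front whether it is sorted ascending or descending.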
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Closes #9942