docs: add more docs on different keys in RisingWave by kwannoel · Pull Request #21749 · risingwavelabs/risingwave · GitHub

docs: add more docs on different keys in RisingWave #21749


Open · wants to merge 2 commits into main

kwannoel (Contributor) commented May 7, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Add doc on fundep

Closes #9942

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

kwannoel requested review from BugenZhao and xxchan on May 7, 2025 06:34
github-actions bot added the Invalid PR Title and A-doc (Area: Documentation) labels on May 7, 2025
Comment on lines +5 to +22
## Order Key

The order key is specified by the user in SQL. For example:

```sql
create materialized view m1 as select * from t1 order by i, id;
```

Here the order key is `i, id`: the columns of the `order by` clause, in the order they are listed.

It is used to ensure locality of records in storage and locality of record updates in streaming.
To ensure storage locality, the order key is used as a prefix when deriving the storage key.
To ensure streaming locality, the order key is used as a prefix of the stream key.
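To make the storage-locality argument concrete, here is a minimal Python sketch, not RisingWave's implementation. It assumes a hypothetical table `t1(id, i)` whose stream key is `id`, builds each storage key as the order-key columns followed by any remaining stream-key columns (`storage_key` and `prefix_scan` are hypothetical helpers), and shows that once keys are sorted, rows sharing an order-key prefix sit next to each other, so a prefix scan reads one contiguous run.

```python
import bisect


def storage_key(row, order_key, stream_key):
    """Hypothetical helper: order-key columns first, then the remaining stream-key columns."""
    cols = list(order_key) + [c for c in stream_key if c not in order_key]
    return tuple(row[c] for c in cols)


def prefix_scan(sorted_keys, prefix):
    """Return the contiguous run of keys that start with `prefix`.

    Tuples compare lexicographically, so a shorter tuple sorts before every
    longer tuple that extends it; bisect_left therefore lands on the first match.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end][: len(prefix)] == prefix:
        end += 1
    return sorted_keys[start:end]


# Rows of the hypothetical t1(id, i), stored for `order by i, id`.
rows = [{"id": 0, "i": 2}, {"id": 1, "i": 1}, {"id": 2, "i": 2}, {"id": 3, "i": 1}]
keys = sorted(storage_key(r, ["i", "id"], ["id"]) for r in rows)
print(prefix_scan(keys, (2,)))  # [(2, 0), (2, 2)]: all rows with i = 2 are adjacent
```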

## Group Key

Like the order key, the group key is specified by the user in SQL, as the columns of a `group by` clause.
The same derivation applies: the group key is used as a prefix of the stream key, to ensure storage and streaming locality.
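A toy incremental aggregation illustrates why the group key ends up as the prefix of the stream key. The sketch below (`streaming_count` is a hypothetical name, not RisingWave's aggregation executor) emits each output row starting with its group-key values, so the retraction and the new value for one group (U-, U+) always carry the same key.

```python
from collections import defaultdict


def streaming_count(changes, group_key):
    """Toy incremental `select <group_key>, count(*) ... group by <group_key>`.

    `changes` is a list of (op, row) pairs with op "+" (insert) or "-" (delete).
    Every output row starts with the group-key values, so the group key is the
    prefix of the output's stream key and all updates for one group share a key.
    """
    counts = defaultdict(int)
    out = []
    for op, row in changes:
        group = tuple(row[c] for c in group_key)
        old = counts[group]
        new = old + (1 if op == "+" else -1)
        counts[group] = new
        if old > 0:
            # Retract the previous result: U- if the group survives, a plain delete otherwise.
            out.append(("U-" if new > 0 else "-", group + (old,)))
        if new > 0:
            # Emit the new result: U+ if it replaces an old one, a plain insert otherwise.
            out.append(("U+" if old > 0 else "+", group + (new,)))
    return out


changes = [
    ("+", {"k": 1, "v": "a"}),
    ("+", {"k": 2, "v": "b"}),
    ("+", {"k": 1, "v": "c"}),
    ("-", {"k": 2, "v": "b"}),
]
print(streaming_count(changes, ["k"]))
```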
Member

I think order key and group key are not really concepts at the same level as the following keys.

They are user-facing concepts, whereas the following describe how things work internally.

Contributor Author

Maybe I'll include a section on them inside the stream_key and storage_pk sections,
since they are used in the derivation step?

Member

Indeed, a section about their derivation (maybe with examples) would be nice to have, and the user-facing part should also go there.

Comment on lines +60 to +67
### User Specified Primary Key

When the user specifies a primary key in their `create table` statement,
it is used to ensure that records in the table are unique.
We always use the user-specified primary key as both the stream key and the storage primary key.

For other streaming jobs, such as sinks, materialized views, and indexes,
we use the user-specified primary key when deriving the stream key and storage primary key.
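A sketch of the two cases above, under assumptions that are ours rather than the planner's: for a table, both keys are simply the user-specified primary key; for a materialized view over that table, upstream primary-key columns that the query does not select are assumed to be kept as hidden columns, and the storage primary key is assumed to be the order key followed by the remaining upstream key columns. `table_keys` and `mv_keys` are hypothetical helpers.

```python
def table_keys(user_pk):
    """Table with a user-specified primary key: both keys are the pk itself."""
    return {"stream_key": list(user_pk), "storage_pk": list(user_pk)}


def mv_keys(select_cols, upstream_pk, order_key=()):
    """Hypothetical derivation for `create materialized view ... as select <select_cols> from t`.

    Assumptions (illustrative, not taken from RisingWave's planner):
      * upstream pk columns the query does not select are kept as hidden output
        columns, so every output row is still uniquely identified;
      * the storage pk is the order key followed by the remaining upstream pk columns.
    """
    hidden = [c for c in upstream_pk if c not in select_cols]
    storage_pk = list(order_key) + [c for c in upstream_pk if c not in order_key]
    return {
        "output_cols": list(select_cols) + hidden,
        "hidden_cols": hidden,
        "storage_pk": storage_pk,
    }


# t1 has the user-specified primary key `id`; the MV selects only `i` and orders by it.
print(table_keys(["id"]))
print(mv_keys(["i"], ["id"], order_key=["i"]))
```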
Member

ditto

Member
xxchan commented May 8, 2025

I think that to understand the storage primary key, one first needs to understand what "Table" means (it's quite messy).

A table = TableCatalog = a hummock table = internal state table / MV (i.e., the state table for MaterializeExec) / ...

The user-specified primary key, by contrast, is a concept far removed from this.

We want data to be distributed in a way that minimizes data skew
and maximizes data locality, for more efficient stateful processing.

To ensure data consistency of updates (U-, U+),
Member

It doesn't have to be U-/U+. The order of updates of any form to records with the same stream key should be preserved.

To ensure data consistency of updates (U-, U+),
the distribution key must always be a prefix of the stream key.
Member

I think it doesn't need to be a prefix? A subset is okay.
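The reviewer's point can be checked with a small routing sketch. It hash-partitions changes by the distribution key only; crc32 and `NUM_SHARDS = 4` are arbitrary stand-ins, and `shard_of`, `route`, and `is_safe_dist_key` are hypothetical names. With stream key `(a, b)` and distribution key `(b)`, a subset that is not a prefix, every change for one stream key lands in the same shard queue, so its order is preserved.

```python
import zlib

NUM_SHARDS = 4  # hypothetical parallelism


def shard_of(row, dist_key):
    """Route a row by hashing only its distribution-key values (crc32 is a stand-in hash)."""
    data = repr(tuple(row[c] for c in dist_key)).encode()
    return zlib.crc32(data) % NUM_SHARDS


def is_safe_dist_key(dist_key, stream_key):
    """Every distribution-key column is a stream-key column (a subset, not necessarily a prefix)."""
    return set(dist_key) <= set(stream_key)


def route(changes, dist_key):
    """Append each change to its shard's queue; each queue keeps arrival order."""
    shards = {s: [] for s in range(NUM_SHARDS)}
    for op, row in changes:
        shards[shard_of(row, dist_key)].append((op, row))
    return shards


# Stream key is (a, b); column v is not part of it.
changes = [
    ("+", {"a": 1, "b": 2, "v": 1}),
    ("+", {"a": 5, "b": 7, "v": 0}),
    ("U-", {"a": 1, "b": 2, "v": 1}),
    ("U+", {"a": 1, "b": 2, "v": 2}),
    ("-", {"a": 1, "b": 2, "v": 2}),
]
shards = route(changes, ["b"])
```

If the distribution key instead contained a non-key column such as `v`, the U- (with `v = 1`) and U+ (with `v = 2`) of the same record could hash to different shards, and their relative order would no longer be guaranteed.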

```sql
create materialized view m1 as select * from t1 order by i, id;
```

The order key is `i, id`, which is the order of the columns in the `order by` clause.
Member

To be more precise, the order_key also specifies an OrderType for each column, including direction (ASC/DESC) and nulls ordering.
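To illustrate what a per-column order type adds, here is a hypothetical Python mirror of it (the `OrderType` class below and `compare_rows` are ours, not RisingWave's `OrderType` API): each column carries a direction and a NULL placement, and rows are compared column by column.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class OrderType:
    """Hypothetical Python mirror of a per-column order type: direction plus NULL placement."""
    descending: bool
    nulls_first: bool


def compare_rows(a, b, order_key):
    """Compare two rows column by column; `order_key` is a list of (column, OrderType).

    None stands for SQL NULL. NULL placement is decided by nulls_first alone,
    independently of the sort direction.
    """
    for col, ot in order_key:
        x, y = a[col], b[col]
        if x is None and y is None:
            continue
        if x is None:
            return -1 if ot.nulls_first else 1
        if y is None:
            return 1 if ot.nulls_first else -1
        if x != y:
            c = -1 if x < y else 1
            return -c if ot.descending else c
    return 0


def sort_rows(rows, order_key):
    return sorted(rows, key=cmp_to_key(lambda a, b: compare_rows(a, b, order_key)))


# Roughly `order by i desc nulls first, id asc nulls last`
rows = [{"i": 1, "id": 2}, {"i": None, "id": 1}, {"i": 3, "id": 0}, {"i": 1, "id": 5}]
order_key = [
    ("i", OrderType(descending=True, nulls_first=True)),
    ("id", OrderType(descending=False, nulls_first=False)),
]
print([r["id"] for r in sort_rows(rows, order_key)])  # [1, 0, 2, 5]
```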

Development

Successfully merging this pull request may close these issues.

Write a doc on different types of keys in our database
3 participants