RFC: a detailed design for create connection #35
@@ -0,0 +1,147 @@
---
feature: Create Connection
authors:
- "Bohan Zhang"
start_date: "2023/01/15"
---

# RFC: Create Connection

## Motivation

A connection describes how to connect to an external system that users want to read data from or write data to. Once created, a connection is reusable across `CREATE SOURCE` and `CREATE SINK` statements. This RFC proposes a new `CREATE CONNECTION` statement to create a connection.

## Design

### Syntax

```sql
-- create connection
CREATE CONNECTION [ IF NOT EXISTS ] {{ connection_name }}
with (
    connection_type = '{{ connection_type }}',
    {{ field }} = {{ value }}, ...);

-- create source with connection
CREATE SOURCE {{ source_name }} ( {{ field }} = {{ value }}, ... )
with (
    connection = '{{ connection_name }}',
    {{ field }} = {{ value }}, ...);

-- create sink with connection
CREATE SINK {{ sink_name }} ( {{ field }} = {{ value }}, ... )
FROM {{ source_name }}
with (
    connection = '{{ connection_name }}',
    {{ field }} = {{ value }}, ...);
```

If a field is specified both in the `with` clause of a `CREATE SOURCE` or `CREATE SINK` statement and in the `CREATE CONNECTION` statement, the value in the `with` clause is used.
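
The override rule above can be sketched as a plain dictionary merge. This is only an illustration in Python, not the proposed implementation: the function name `resolve_properties` and the `topic` field are hypothetical and do not come from the RFC.

```python
def resolve_properties(connection_fields, with_fields):
    """Merge connection-level fields with the statement's `with` fields.

    On conflict the statement-level value wins, as the RFC text describes.
    (Hypothetical helper, for illustration only.)
    """
    merged = dict(connection_fields)   # start from the CREATE CONNECTION fields
    merged.update(with_fields)         # `with` clause values override them
    return merged


# Fields stored by a hypothetical CREATE CONNECTION statement.
connection = {
    "connection_type": "kafka",
    "properties.bootstrap.server": "broker1:9092",
}
# Fields given in the `with` clause of a hypothetical CREATE SOURCE statement.
source_with = {
    "properties.bootstrap.server": "broker9:9092",
    "topic": "demo_topic",  # hypothetical field name
}

resolved = resolve_properties(connection, source_with)
```

Here `resolved` keeps `connection_type` from the connection, while `properties.bootstrap.server` takes the value from the `with` clause.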

#### AWS Private Link

An AWS Private Link establishes a private connection between a VPC and a service hosted on AWS. It is a secure and scalable way to connect to AWS services from within a VPC. The following fields are required to create an AWS Private Link connection:

Review comment: IIUC, Private Link is orthogonal to the service being connected to, and it only makes sense when used together with the definition of a source/sink, right? PS. I guess you borrowed the idea from https://materialize.com/docs/sql/create-connection/

Review comment: Yes, Private Link provides a way to allow the service in our VPC to access resources in another VPC, but it cannot be reversed.

```sql
CREATE CONNECTION demo_connection
with (
    connection_type = 'aws_private_link',
    service_name = 'com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx',
    availability_zones = 'us-east-1a,us-east-1b,us-east-1c');
```

|Field|Value|Required|Description|
|--|--|--|--|
|`service_name`| varchar | yes | The name of the AWS PrivateLink service. |
|`availability_zones`| varchar | yes | The AWS availability zones in which the service is accessible. List the zones in a single string, separated by commas. |

> **Notice**:
>
> * After creating a connection, users must manually approve the connection request from the RisingWave VPC in the AWS console.
> * When creating a private link to AWS MSK, users must list all availability zones of the MSK cluster.

#### Kafka

A Kafka connection establishes a link to a Kafka cluster. Users can use Kafka connections to create Kafka sources and sinks. The following fields are required to create a Kafka connection:

```sql
CREATE CONNECTION demo_connection_1
with (
    connection_type = 'kafka',
    PROPERTIES.BOOTSTRAP.SERVER = 'broker1:9092,broker2:9092');

CREATE CONNECTION demo_connection_2
with (
    connection_type = 'kafka',
    PROPERTIES.BOOTSTRAP.SERVER.PRIVATE_LINK = [
        'broker1:9092' USING AWS_PRIVATE_LINK {{ connection_name }} (PORT 9092),
        'broker2:9092' USING AWS_PRIVATE_LINK {{ connection_name }} (PORT 9092)]);
```

For general usage, the following fields are allowed:

|Field|Value|Required|Description|
|--|--|--|--|
|`PROPERTIES.BOOTSTRAP.SERVER`| varchar | Conditional | The Kafka bootstrap server address. |
|`PROPERTIES.BOOTSTRAP.SERVER.PRIVATE_LINK`| varchar[] | Conditional | The Kafka bootstrap server address with AWS Private Link settings. This field cannot be set together with `PROPERTIES.BOOTSTRAP.SERVER`. |
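
The mutual exclusion between the two bootstrap fields could be checked along the following lines. This is a minimal Python sketch under two assumptions that the RFC does not state: that "Conditional" means exactly one of the two fields must be present, and that field names are matched case-insensitively. The function name `validate_kafka_bootstrap` is hypothetical.

```python
PLAIN = "properties.bootstrap.server"
PRIVATE_LINK = "properties.bootstrap.server.private_link"


def validate_kafka_bootstrap(fields):
    """Return the bootstrap field in use, or raise ValueError.

    Assumption (not stated in the RFC): exactly one of the two
    bootstrap fields must be set on a Kafka connection.
    """
    keys = {name.lower() for name in fields}  # assumed case-insensitive names
    has_plain = PLAIN in keys
    has_private_link = PRIVATE_LINK in keys
    if has_plain and has_private_link:
        raise ValueError(f"{PLAIN} and {PRIVATE_LINK} cannot both be set")
    if not has_plain and not has_private_link:
        raise ValueError(f"one of {PLAIN} or {PRIVATE_LINK} must be set")
    return PRIVATE_LINK if has_private_link else PLAIN


chosen = validate_kafka_bootstrap(
    {"PROPERTIES.BOOTSTRAP.SERVER": "broker1:9092,broker2:9092"}
)
```

Rejecting the statement at creation time keeps the conflict from surfacing later, when a source or sink first uses the connection.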

For security-related fields (SSL/SASL), the fields are identical to those in [this doc](https://materialize.com/docs/sql/create-connection/#kafka-ssl).

**Notice**: RisingWave does not check whether the connection to Kafka is valid.

> Some notes for implementation
>
> 1. Implement librdkafka's DNS resolve callback
> 2. Store the DNS mapping from the user's broker addresses to the private link endpoint IPs
> 3. Provide an interface for creating and modifying DNS mappings
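
The mapping described in notes 2 and 3 can be pictured as a small lookup table that a resolve callback consults. The sketch below is in Python purely for illustration; the class `BrokerDnsMapping` and its methods are hypothetical names, and how the table is wired into the Kafka client's resolve callback is left out.

```python
import ipaddress


class BrokerDnsMapping:
    """Maps user-facing broker hosts to private link endpoint IPs.

    Hypothetical structure, for illustration only.
    """

    def __init__(self):
        self._endpoints = {}

    def upsert(self, broker_host, endpoint_ip):
        """Create or modify the mapping for one broker host."""
        ipaddress.ip_address(endpoint_ip)  # raises ValueError on a malformed IP
        self._endpoints[broker_host] = endpoint_ip

    def remove(self, broker_host):
        """Drop a mapping; unknown hosts are ignored."""
        self._endpoints.pop(broker_host, None)

    def resolve(self, host):
        """Return the endpoint IP for a mapped host, else the host unchanged."""
        return self._endpoints.get(host, host)


mapping = BrokerDnsMapping()
mapping.upsert("broker1", "10.0.0.12")
```

Returning unmapped hosts unchanged lets ordinary DNS resolution handle brokers that are not behind a private link.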

#### Kinesis

A Kinesis connection establishes a link to a Kinesis stream. Users can use Kinesis connections to create Kinesis sources and sinks. The following fields are required to create a Kinesis connection:

```sql
CREATE CONNECTION demo_connection
with (
    connection_type = 'kinesis',
    aws.region='cn-north-1',
    endpoint='172.10.1.1:9090,172.10.1.2:9090',
    aws.credentials.role.arn='arn:aws-cn:iam::602389639824:role/demo_role',
    aws.credentials.role.external_id='demo_external_id',
    ...
);
```

The accepted fields, listed below, are a subset of those in [the doc](https://www.risingwave.dev/docs/current/create-source-kinesis/#parameters).

|Field|Required|Description|
|--|--|--|
|`aws.region`| Required | AWS service region, for example `us-east-1` (US East (N. Virginia)). |
|`endpoint`| Optional | URL of the entry point for the AWS Kinesis service. |
|`aws.credentials.access_key_id`| Conditional | The access key ID of AWS. It must be specified together with `aws.credentials.secret_access_key`. |
|`aws.credentials.secret_access_key`| Conditional | The secret access key of AWS. It must be specified together with `aws.credentials.access_key_id`. |
|`aws.credentials.session_token`| Optional | The session token associated with the temporary security credentials. |
|`aws.credentials.role.arn`| Optional | The Amazon Resource Name (ARN) of the role to assume. |
|`aws.credentials.role.external_id`| Optional | The [external id](https://aws.amazon.com/blogs/security/how-to-use-external-id-when-granting-access-to-your-aws-resources/) used to authorize access to third-party resources. |
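
The pairing rule for the two access-key fields amounts to an "both or neither" check. A minimal Python sketch, assuming the check happens when the connection is created; the function name `validate_kinesis_fields` is hypothetical.

```python
ACCESS_KEY_ID = "aws.credentials.access_key_id"
SECRET_ACCESS_KEY = "aws.credentials.secret_access_key"


def validate_kinesis_fields(fields):
    """Raise ValueError if the Kinesis connection fields break the table's rules.

    Hypothetical helper: checks the required region and the key pairing only.
    """
    if "aws.region" not in fields:
        raise ValueError("aws.region is required")
    # The two key fields must be given together or not at all.
    if (ACCESS_KEY_ID in fields) != (SECRET_ACCESS_KEY in fields):
        raise ValueError(
            f"{ACCESS_KEY_ID} and {SECRET_ACCESS_KEY} must be specified together"
        )


# Passes: region present, no static keys (role-based credentials instead).
validate_kinesis_fields({
    "aws.region": "us-east-1",
    "aws.credentials.role.arn": "arn:aws:iam::123456789012:role/demo_role",
})
```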

**Notice**: RisingWave does not check whether the connection to Kinesis is valid.

Review comment: Reason?

Review comment: in

Review comment: Adding a default topic to do the validation is acceptable, but it raises concerns about misuse.

#### Pulsar

```sql
CREATE CONNECTION demo_connection
with (
    connection_type = 'pulsar',
    service.url = 'pulsar://localhost:6650/',
    admin.url = 'http://localhost:8080');
```

The accepted fields, listed below, are a subset of those in [the doc](https://www.risingwave.dev/docs/current/create-source-pulsar/#parameters).

|Field|Required|Description|
|--|--|--|
|`service.url`| Required | Address of the Pulsar service. |
|`admin.url`| Required | Address of the Pulsar admin. |

**Notice**: RisingWave does not check whether the connection to Pulsar is valid.

## Future possibilities

* Store sensitive information, e.g. passwords and SSL keys, in encrypted form (using `CREATE SECRET`).

Review comment: Question: In Flink, if you connect to a catalog, e.g. a JDBC database, you also automatically get the schema of the tables in that database. Will we support this? Does Materialize support this?

Review comment: e.g.

```sql
CREATE CATALOG my_catalog WITH(
    'type' = 'jdbc',
    'default-database' = '...',
    'username' = '...',
    'password' = '...',
    'base-url' = '...'
);
USE CATALOG my_catalog;
SHOW TABLES;
-- Will print a list of tables
-- OR: SHOW TABLES FROM my_catalog
-- which does not require `USE CATALOG` first.
DESCRIBE TABLE foo;
-- Will print the schema of that table.
```

Review comment: I don't think it is related to

Review comment: Any stronger motivation? I mean, besides the convenience it brings, is there anything that was impossible but becomes doable with it?

Review comment: There is no proper abstraction other than `connection` to connect private link with the existing reader 🤣 @wyhyhyhyh asked for this feature last quarter.

Review comment: Emmm, they ("private link" and "connection") look independent of each other. "Connection" here provides not only the network access to the VPC but also the very credentials it needs to connect to a specific data source/sink.

I mean, since you are proposing to allow users to write AWS VPC credentials along with the Kafka/Kinesis properties e.g.

Then, why not just write AWS VPC credentials within `CREATE SOURCE`:

Disclaimer: I am not arguing this syntax is better. I am saying the proposed solution does not seem to solve the problem stated in the motivation.