This project sets up a ClickHouse Cluster with one shard and two replicas using ClickHouse Keeper. Each component is containerized using Docker Compose.
- 1 Shard, 2 Replicas: Provides data replication and failover capabilities.
- ClickHouse Keeper: Manages cluster state and leader election.
- Separation of Components: Each ClickHouse and Keeper instance has its own Docker Compose file, allowing for flexible deployment across different servers.

The setup consists of the following containers:

- `clickhouse-01`: First replica of the ClickHouse database.
- `clickhouse-02`: Second replica of the ClickHouse database.
- `clickhouse-keeper-01`, `clickhouse-keeper-02`, `clickhouse-keeper-03`: ClickHouse Keeper instances.
- `ch-proxy`: Proxy to route requests to the ClickHouse cluster.

To set up the cluster:

- Clone the repository:

  ```bash
  git clone https://github.com/saratqpy/clickhouse-cluster.git
  ```

- Start the containers using Docker Compose:

  ```bash
  docker-compose up -d
  ```

- Verify the setup by connecting to the cluster:

  ```bash
  docker exec -it clickhouse-01 clickhouse-client
  ```

- Open DBeaver or any SQL client and create a new SQL script:

  ```sql
  CREATE DATABASE test ON CLUSTER cluster_1S_2R;
  ```

  The cluster name `cluster_1S_2R` is defined in `config.xml` under the `remote_servers` section.
- Create a replicated table across the cluster:

  ```sql
  CREATE TABLE test.my_table ON CLUSTER cluster_1S_2R
  (
      id UInt32,
      name String
  )
  ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/my_table', '{replica}')
  ORDER BY id;
  ```

  This uses the `ReplicatedMergeTree` engine to maintain a copy of the data across replicas. The table is named `my_table`.
- Create a distributed table:

  ```sql
  CREATE TABLE test.my_table_distributed ON CLUSTER cluster_1S_2R
  AS test.my_table
  ENGINE = Distributed(cluster_1S_2R, test, my_table, rand());
  ```

  The `Distributed` table routes data across shards; within a shard, the underlying replicated tables hold the same data, so with one shard and two replicas both nodes keep identical copies. (A quick verification sketch follows these steps.)

- View cluster information:

  ```sql
  SELECT * FROM system.clusters;
  ```
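
To confirm that replication works end to end, a minimal check is to insert a row through one replica and read it back from the other. This is only a sketch; it assumes the `test.my_table` and `test.my_table_distributed` tables created above and the container names `clickhouse-01` / `clickhouse-02`:

```sql
-- Run inside clickhouse-01 (docker exec -it clickhouse-01 clickhouse-client):
INSERT INTO test.my_table VALUES (42, 'replication-check');

-- Run inside clickhouse-02: the row should appear once replication has caught up.
SELECT * FROM test.my_table WHERE id = 42;

-- Querying the distributed table from either node should return the same row.
SELECT * FROM test.my_table_distributed WHERE id = 42;
```
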
There are two ways to insert data into the tables:

- Manual SQL execution:

  ```sql
  INSERT INTO test.my_table_distributed VALUES (1, 'abc');
  ```

  Note: Direct inserts will throw an error if one of the nodes is down (a query for checking replica health is shown after this list).

- Using CH Proxy: CH Proxy detects nodes that are down and avoids routing requests to them. Here's an example of inserting data through the proxy:

  ```bash
  curl -u admin_1S_2R:1 -X POST http://127.0.0.1:80/ -d "INSERT INTO test.my_table VALUES (1, 'abc')"
  ```

  - `admin_1S_2R`: Username set in the `config.yml` file for CH Proxy.
  - `1`: Password defined in the same file.
  - Port `80`: The configured port for CH Proxy.
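
Since direct inserts fail when a replica is unavailable, it can help to check replica health before (or after) loading data. A minimal sketch, assuming the `test.my_table` table from the steps above:

```sql
-- One row per local replicated table; if active_replicas < total_replicas,
-- at least one replica is down or has lost its Keeper session.
SELECT
    database,
    table,
    is_readonly,
    total_replicas,
    active_replicas
FROM system.replicas
WHERE database = 'test' AND table = 'my_table';
```
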

To query data and see which node executed the query:

```bash
curl -u admin_1S_2R:1 -X POST http://127.0.0.1:80/ -d "SELECT hostName(), * FROM test.my_table LIMIT 1"
```

The `hostName()` function returns the name of the node that executed the query.
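
To check which data each replica actually stores (as opposed to which node answered a single query), one option is the `clusterAllReplicas` table function, which runs the query on every replica in the cluster. A sketch, assuming the cluster and table names used above:

```sql
-- Row count per replica; the counts should match once replication has caught up.
SELECT hostName() AS replica, count() AS rows
FROM clusterAllReplicas('cluster_1S_2R', test.my_table)
GROUP BY replica
ORDER BY replica;
```
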
This project is licensed under the MIT License. See the LICENSE file for more details.
Feel free to open issues or submit pull requests for any improvements or new features.
For any questions or feedback, please reach out via email.