Cannot create new cluster with 25.3.x.x · Issue #1707 · Altinity/clickhouse-operator · GitHub

Cannot create new cluster with 25.3.x.x #1707

Closed
marcio-absmartly opened this issue May 13, 2025 · 9 comments

@marcio-absmartly

When creating a new cluster with CH 25.3.x.x, the first host won't boot up because the setting clickhouse.remote_servers is not set up properly.

My suspicion is that the operator populates this setting after each host is running, but newer ClickHouse versions validate the configuration on startup and expect the cluster to be defined beforehand.
This is verified to work on CH 24.8.x.x.

Operator version is 0.24.5
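
For context, a rough sketch of the shape ClickHouse expects under remote_servers is shown below; the operator writes this section to /etc/clickhouse-server/config.d/chop-generated-remote_servers.xml. Host names are illustrative, following the operator's chi-<chi>-<cluster>-<shard>-<replica> naming for a 1-shard, 3-replica layout. The failure described here corresponds to this section being present but containing no shard/replica entries.

<clickhouse>
    <remote_servers>
        <!-- cluster name comes from spec.configuration.clusters[].name -->
        <clickhouse>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-clickhouse-clickhouse-0-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-clickhouse-clickhouse-0-1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-clickhouse-clickhouse-0-2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </clickhouse>
    </remote_servers>
</clickhouse>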

@Slach
Collaborator
Slach commented May 13, 2025

Could you provide the full error message from the clickhouse-server container, including the stack trace?

@bmsilva
bmsilva commented May 14, 2025

The logs:

Poco::Exception. Code: 1000, e.code() = 0, Not found: logger.log (version 25.3.3.42 (official build))
Poco::Exception. Code: 1000, e.code() = 0, Not found: logger.errorlog (version 25.3.3.42 (official build))
/entrypoint.sh: explicitly skip changing user 'default'
ClickHouse Database directory appears to contain a database; Skipping initialization
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/conf.d/chop-generated-hostname-ports.xml'.
Merging configuration file '/etc/clickhouse-server/conf.d/chop-generated-macros.xml'.
Merging configuration file '/etc/clickhouse-server/conf.d/chop-generated-zookeeper.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/01-listen.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/02-logger.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/03-log-tables.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/04-compression.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/05-storage.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/06-memory.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/chop-generated-remote_servers.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/chop-generated-settings.xml'.
2025.05.14 11:04:02.091668 [ 1 ] {} <Information> SentryWriter: Sending crash reports is disabled
2025.05.14 11:04:02.153853 [ 1 ] {} <Information> Application: Starting ClickHouse 25.3.3.42 (revision: 54498, git hash: c4bfe68b052e4a15f731077d86d83b9bc2e5b71f, build id: 9973936C3E9C99EB047A24A5D0962B5E00E1A7E3), PID 1
2025.05.14 11:04:02.154055 [ 1 ] {} <Information> Application: starting up
2025.05.14 11:04:02.154124 [ 1 ] {} <Information> Application: OS name: Linux, version: 6.1.134+, architecture: x86_64
2025.05.14 11:04:02.154346 [ 1 ] {} <Information> Jemalloc: Value for background_thread set to true (from true)
2025.05.14 11:04:02.159030 [ 1 ] {} <Information> Application: Available RAM: 7.76 GiB; logical cores: 2; used cores: 2.
2025.05.14 11:04:02.159086 [ 1 ] {} <Information> Application: Available CPU instruction sets: SSE, SSE2, SSE3, SSSE3, SSE41, SSE42, F16C, POPCNT, BMI1, BMI2, PCLMUL, AES, AVX, FMA, AVX2, AVX512F, AVX512DQ, AVX512CD, AVX512BW, AVX512VL, ADX, RDRAND, RDSEED, RDTSCP, CLFLUSHOPT, CLWB, XSAVE, OSXSAVE
2025.05.14 11:04:02.159907 [ 1 ] {} <Information> CgroupsReader: Will create cgroup reader from '/sys/fs/cgroup/' (cgroups version: v2)
2025.05.14 11:04:02.340853 [ 1 ] {} <Information> Application: Integrity check of the executable successfully passed (checksum: 4A692E92B3118E45A8AAFDD0CC48C1C5)
2025.05.14 11:04:02.340937 [ 1 ] {} <Information> Application: It looks like the process has no CAP_IPC_LOCK capability, binary mlock will be disabled. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_ipc_lock=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.
2025.05.14 11:04:02.340985 [ 1 ] {} <Information> MemoryWorker: Starting background memory thread with period of 50ms, using Cgroups as source
2025.05.14 11:04:02.341209 [ 1 ] {} <Information> BackgroundSchedulePool/BgSchPool: Create BackgroundSchedulePool with 512 threads
2025.05.14 11:04:02.397455 [ 76 ] {} <Information> MemoryTracker: Correcting the value of global memory tracker from 2.89 MiB to 95.34 MiB
2025.05.14 11:04:02.416345 [ 1 ] {} <Information> Application: Lowered uncompressed cache size to 3.88 GiB because the system has limited RAM
2025.05.14 11:04:02.416813 [ 1 ] {} <Information> Application: Lowered mark cache size to 3.88 GiB because the system has limited RAM
2025.05.14 11:04:02.416830 [ 1 ] {} <Information> Application: Lowered primary index cache size to 3.88 GiB because the system has limited RAM
2025.05.14 11:04:02.417025 [ 1 ] {} <Information> Application: Lowered index mark cache size to 3.88 GiB because the system has limited RAM
2025.05.14 11:04:02.431662 [ 1 ] {} <Information> Application: Changed setting 'max_server_memory_usage' to 6.99 GiB (7.76 GiB available memory * 0.90 max_server_memory_usage_to_ram_ratio)
2025.05.14 11:04:02.431732 [ 1 ] {} <Information> Application: Setting merges_mutations_memory_usage_soft_limit was set to 3.88 GiB (7.76 GiB available * 0.50 merges_mutations_memory_usage_to_ram_ratio)
2025.05.14 11:04:02.431763 [ 1 ] {} <Information> Application: Merges and mutations memory limit is set to 3.88 GiB
2025.05.14 11:04:02.433339 [ 1 ] {} <Information> Application: Shutting down storages.
2025.05.14 11:04:02.437037 [ 1 ] {} <Information> Application: Waiting for background threads
2025.05.14 11:04:02.457147 [ 1 ] {} <Information> Application: Background threads finished in 20 ms
2025.05.14 11:04:02.457616 [ 1 ] {} <Error> Application: Code: 347. DB::Exception: Code: 297. DB::Exception: No cluster elements (shard, node) specified in config at path remote_servers.clickhouse. (SHARD_HAS_NO_CONNECTIONS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000f4b18fb
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009f7ce6c
2. DB::Exception::Exception<String&>(int, FormatStringHelperImpl<std::type_identity<String&>::type>, String&) @ 0x0000000009f92f0b
3. DB::Cluster::Cluster(Poco::Util::AbstractConfiguration const&, DB::Settings const&, String const&, String const&) @ 0x000000001309ab68
4. DB::Clusters::updateClusters(Poco::Util::AbstractConfiguration const&, DB::Settings const&, String const&, Poco::Util::AbstractConfiguration*) @ 0x00000000130968c0
5. DB::Clusters::Clusters(Poco::Util::AbstractConfiguration const&, DB::Settings const&, std::shared_ptr<DB::Macros const>, String const&) @ 0x0000000013095e60
6. DB::Context::setClustersConfig(Poco::AutoPtr<Poco::Util::AbstractConfiguration> const&, bool, String const&) @ 0x00000000130d05d7
7. DB::Server::main(std::vector<String, std::allocator<String>> const&)::$_0::operator()(Poco::AutoPtr<Poco::Util::AbstractConfiguration>, bool) const @ 0x000000000f808d23
8. void std::__function::__policy_invoker<void (Poco::AutoPtr<Poco::Util::AbstractConfiguration>, bool)>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::Server::main(std::vector<String, std::allocator<String>> const&)::$_0, void (Poco::AutoPtr<Poco::Util::AbstractConfiguration>, bool)>>(std::__function::__policy_storage const*, Poco::AutoPtr<Poco::Util::AbstractConfiguration>&&, bool) @ 0x000000000f807247
9. DB::ConfigReloader::reloadIfNewer(bool, bool, bool, bool) @ 0x000000001549eb89
10. DB::ConfigReloader::ConfigReloader(std::basic_string_view<char, std::char_traits<char>>, std::vector<String, std::allocator<String>> const&, String const&, zkutil::ZooKeeperNodeCache&&, std::shared_ptr<Poco::Event> const&, std::function<void (Poco::AutoPtr<Poco::Util::AbstractConfiguration>, bool)>&&) @ 0x000000001549d36d
11. DB::Server::main(std::vector<String, std::allocator<String>> const&) @ 0x000000000f7e6a59
12. Poco::Util::Application::run() @ 0x00000000183fe126
13. DB::Server::run() @ 0x000000000f7d5dd0
14. mainEntryClickHouseServer(int, char**) @ 0x000000000f7d2f13
15. main @ 0x0000000009f78881
16. ? @ 0x000079dd53bf8d90
17. ? @ 0x000079dd53bf8e40
18. _start @ 0x0000000006b4502e
 (version 25.3.3.42 (official build)). (CANNOT_LOAD_CONFIG), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000f4b18fb
1. DB::Exception::Exception(PreformattedMessage const&, int) @ 0x000000001129c0ec
2. std::__unique_if<DB::Exception>::__unique_single std::make_unique[abi:ne190107]<DB::Exception, PreformattedMessage const&, int const&>(PreformattedMessage const&, int const&) @ 0x00000000154a1904
3. DB::ConfigReloader::reloadIfNewer(bool, bool, bool, bool) @ 0x000000001549f790
4. DB::ConfigReloader::ConfigReloader(std::basic_string_view<char, std::char_traits<char>>, std::vector<String, std::allocator<String>> const&, String const&, zkutil::ZooKeeperNodeCache&&, std::shared_ptr<Poco::Event> const&, std::function<void (Poco::AutoPtr<Poco::Util::AbstractConfiguration>, bool)>&&) @ 0x000000001549d36d
5. DB::Server::main(std::vector<String, std::allocator<String>> const&) @ 0x000000000f7e6a59
6. Poco::Util::Application::run() @ 0x00000000183fe126
7. DB::Server::run() @ 0x000000000f7d5dd0
8. mainEntryClickHouseServer(int, char**) @ 0x000000000f7d2f13
9. main @ 0x0000000009f78881
10. ? @ 0x000079dd53bf8d90
11. ? @ 0x000079dd53bf8e40
12. _start @ 0x0000000006b4502e
 (version 25.3.3.42 (official build))
2025.05.14 11:04:02.457649 [ 1 ] {} <Information> Application: shutting down
2025.05.14 11:04:02.457747 [ 71 ] {} <Information> BaseDaemon: Stop SignalListener thread
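
The SHARD_HAS_NO_CONNECTIONS error above means the merged configuration contains a remote_servers.clickhouse element with no shard or node children. To confirm what was actually rendered, the generated file can be inspected directly; pod and namespace names below are placeholders, and if the clickhouse-server container is crash-looping, the same content is also available in the ConfigMap the operator mounts into the pod.

kubectl exec -n <namespace> <pod-name> -c clickhouse-server -- \
  cat /etc/clickhouse-server/config.d/chop-generated-remote_servers.xml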

@Slach
Collaborator
Slach commented May 14, 2025

Could you share your kind: ClickHouseInstallation manifest, without sensitive credentials?

kubectl get chi -n <namespace> <chi-name> -o yaml?

@bmsilva
bmsilva commented May 14, 2025

The YAML:

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration:__REMOVED__
  creationTimestamp: "2025-05-14T11:02:10Z"
  finalizers:
  - finalizer.clickhouseinstallation.altinity.com
  generation: 1
  labels:
    name: clickhouse
  name: clickhouse
  namespace: clickhouse
  resourceVersion: "68277"
  uid: 069026e5-2727-4222-8817-2ca83e474974
spec:
  configuration:
    clusters:
    - layout:
        replicasCount: 3
        shardsCount: 1
      name: clickhouse
    profiles:
      clickhouse_operator/http_connection_timeout: 10
      clickhouse_operator/log_queries: 0
      clickhouse_operator/max_concurrent_queries_for_all_users: 0
      clickhouse_operator/os_thread_priority: 0
      clickhouse_operator/skip_unavailable_shards: 1
      default/allow_experimental_analyzer: 1
      default/allow_experimental_bigint_types: 1
      default/allow_experimental_database_replicated: 1
      default/allow_experimental_projection_optimization: 1
      default/compile_aggregate_expressions: 1
      default/connect_timeout_with_failover_ms: 2000
      default/distributed_aggregation_memory_efficient: 1
      default/insert_quorum: 2
      default/join_algorithm: parallel_hash
      default/join_use_nulls: 1
      default/log_queries: 1
      default/log_query_threads: 0
      default/optimize_arithmetic_operations_in_aggregate_functions: 1
      default/parallel_view_processing: 1
      default/short_circuit_function_evaluation: force_enable
      readonly/allow_experimental_analyzer: 1
      readonly/allow_experimental_bigint_types: 1
      readonly/allow_experimental_database_replicated: 1
      readonly/allow_experimental_projection_optimization: 1
      readonly/compile_aggregate_expressions: 1
      readonly/connect_timeout_with_failover_ms: 2000
      readonly/distributed_aggregation_memory_efficient: 1
      readonly/insert_quorum: 2
      readonly/join_algorithm: parallel_hash
      readonly/join_use_nulls: 1
      readonly/log_queries: 1
      readonly/log_query_threads: 0
      readonly/optimize_arithmetic_operations_in_aggregate_functions: 1
      readonly/parallel_view_processing: 1
      readonly/readonly: 2
      readonly/short_circuit_function_evaluation: force_enable
    settings:
      default_session_timeout: 1
      remote_servers/clickhouse/secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      shutdown_wait_unfinished_queries: 1
    users:
      default/password_sha256_hex: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      readonly/networks/ip:
      - ::/0
      - 0.0.0.0/0
      readonly/password_sha256_hex: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      readonly/profile: readonly
      readonly/quota: default
      root/access_management: 1
      root/networks/ip:
      - ::/0
      - 0.0.0.0/0
      root/password_sha256_hex: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      root/profile: default
      root/quota: default
    zookeeper:
      nodes:
      - host: zk-0.zk.clickhouse.svc.cluster.local
        port: 2181
      - host: zk-1.zk.clickhouse.svc.cluster.local
        port: 2181
      - host: zk-2.zk.clickhouse.svc.cluster.local
        port: 2181
  defaults:
    storageManagement:
      provisioner: Operator
    templates:
      dataVolumeClaimTemplate: clickhouse-data-pvc
      podTemplate: clickhouse-pod
      serviceTemplate: clickhouse-svc
  reconciling:
    policy: wait
  templates:
    podTemplates:
    - name: clickhouse-pod
      podDistribution:
      - topologyKey: kubernetes.io/hostname
        type: ClickHouseAntiAffinity
      - topologyKey: failure-domain.beta.kubernetes.io/zone
        type: ReplicaAntiAffinity
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: optimized-for-clickhouse
                  operator: In
                  values:
                  - "true"
              topologyKey: kubernetes.io/hostname
        containers:
        - image: clickhouse/clickhouse-server:25.3.3.42
          name: clickhouse-server
          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-data-pvc
          - mountPath: /var/lib/clickhouse-cold
            name: clickhouse-cold-data-pvc
        - command:
          - /bin/clickhouse-backup
          - server
          env:
          - name: LOG_LEVEL
            valueFrom:
              configMapKeyRef:
                key: log_level
                name: clickhousebackup-config
          - name: CLICKHOUSE_HOST
            valueFrom:
              configMapKeyRef:
                key: clickhouse_host
                name: clickhousebackup-config
          - name: CLICKHOUSE_PORT
            valueFrom:
              configMapKeyRef:
                key: clickhouse_port
                name: clickhousebackup-config
          - name: CLICKHOUSE_USERNAME
            valueFrom:
              configMapKeyRef:
                key: clickhouse_username
                name: clickhousebackup-config
          - name: CLICKHOUSE_PASSWORD
            valueFrom:
              secretKeyRef:
                key: clickhouse_password
                name: clickhousebackup-secret
          - name: CLICKHOUSE_USE_EMBEDDED_BACKUP_RESTORE
            valueFrom:
              configMapKeyRef:
                key: use_embedded_backup_restore
                name: clickhousebackup-config
          - name: ALLOW_EMPTY_BACKUPS
            valueFrom:
              configMapKeyRef:
                key: allow_empty_backups
                name: clickhousebackup-config
          - name: API_LISTEN
            valueFrom:
              configMapKeyRef:
                key: api_listen
                name: clickhousebackup-config
          - name: API_CREATE_INTEGRATION_TABLES
            valueFrom:
              configMapKeyRef:
                key: api_create_integration_tables
                name: clickhousebackup-config
          - name: BACKUPS_TO_KEEP_REMOTE
            valueFrom:
              configMapKeyRef:
                key: backups_to_keep_remote
                name: clickhousebackup-config
          - name: REMOTE_STORAGE
            valueFrom:
              configMapKeyRef:
                key: remote_storage
                name: clickhousebackup-config
          - name: GCS_EMBEDDED_ACCESS_KEY
            valueFrom:
              configMapKeyRef:
                key: gcs_embedded_access_key
                name: clickhousebackup-config
          - name: GCS_EMBEDDED_SECRET_KEY
            valueFrom:
              secretKeyRef:
                key: gcs_embedded_secret_key
                name: clickhousebackup-secret
          - name: GCS_BUCKET
            valueFrom:
              configMapKeyRef:
                key: gcs_bucket
                name: clickhousebackup-config
          - name: UPLOAD_CONCURRENCY
            valueFrom:
              configMapKeyRef:
                key: upload_concurrency
                name: clickhousebackup-config
          - name: CLICKHOUSE_TIMEOUT
            valueFrom:
              configMapKeyRef:
                key: timeout
                name: clickhousebackup-config
          - name: CLICKHOUSE_SKIP_TABLES
            valueFrom:
              configMapKeyRef:
                key: skip_tables
                name: clickhousebackup-config
          - name: S3_MAX_PARTS_COUNT
            value: "32"
          image: altinity/clickhouse-backup:2.6.15
          imagePullPolicy: Always
          name: clickhouse-backup
          ports:
          - containerPort: 7171
            name: backup-rest
          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-data-pvc
          - mountPath: /var/lib/clickhouse-cold
            name: clickhouse-cold-data-pvc
        tolerations:
        - effect: NoSchedule
          key: clickhouse
          operator: Exists
        - effect: NoSchedule
          key: kubernetes.io/arch
          operator: Equal
          value: arm64
        - effect: NoSchedule
          key: node.kubernetes.io/memory-pressure
          operator: Exists
    serviceTemplates:
    - generateName: '{chi}'
      name: clickhouse-svc
      spec:
        ClusterIP: ""
        ports:
        - name: http
          port: 8123
        - name: client
          port: 9000
        type: ClusterIP
    volumeClaimTemplates:
    - name: clickhouse-data-pvc
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: clickhouse-ext4fs
    - name: clickhouse-cold-data-pvc
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 130Gi
        storageClassName: clickhouse-cold-ext4fs
status:
  chop-commit: 2a4b0f2
  chop-date: 2025-03-14T11:41:00
  chop-ip: 10.96.2.10
  chop-version: 0.24.5
  clusters: 1
  endpoint: clickhouse.clickhouse.svc.cluster.local
  fqdns:
  - chi-clickhouse-clickhouse-0-0.clickhouse.svc.cluster.local
  - chi-clickhouse-clickhouse-0-1.clickhouse.svc.cluster.local
  - chi-clickhouse-clickhouse-0-2.clickhouse.svc.cluster.local
  hosts: 3
  pods:
  - chi-clickhouse-clickhouse-0-0-0
  - chi-clickhouse-clickhouse-0-1-0
  - chi-clickhouse-clickhouse-0-2-0
  shards: 1
  status: Aborted
  taskID: a209fbe8-716b-48f8-afe1-9672c6919845
  taskIDsCompleted:
  - a209fbe8-716b-48f8-afe1-9672c6919845
  taskIDsStarted:
  - a209fbe8-716b-48f8-afe1-9672c6919845

@alex-zaitsev
Member
alex-zaitsev commented May 15, 2025

@marcio-absmartly, thank you for the heads up. Operator 0.25.0 has not been released yet. We will re-check it against CH 25.3 before the release.

Meanwhile, we are running a number of 25.3+ clusters with operator 0.24.5 with no issues.

@alex-zaitsev
Member

Update: we could not reproduce it in tests. However, the code does allow an empty remote_servers section to appear, so it could be a race condition. We will make sure an empty remote_servers section is not created at all.
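
For illustration, the empty state being described would render roughly as follows, which is what ClickHouse 25.3 rejects at startup with SHARD_HAS_NO_CONNECTIONS in the log above:

<clickhouse>
    <remote_servers>
        <clickhouse>
            <!-- no <shard>/<replica> entries yet -->
        </clickhouse>
    </remote_servers>
</clickhouse>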

@marcio-absmartly
Author

It works fine with already existing clusters; it's when creating a new one that this happens. For us it was happening consistently, and the only way to get past it was to create the cluster with 24.8 and then upgrade to 25.3.
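
A sketch of that workaround, assuming the server image is set in the pod template as in the manifest above: create the CHI with a 24.8 image first, then bump the tag and re-apply so the operator rolls the cluster to 25.3.

# initial creation
containers:
- image: clickhouse/clickhouse-server:24.8
  name: clickhouse-server

# once the cluster is up, change the tag and apply the manifest again
containers:
- image: clickhouse/clickhouse-server:25.3.3.42
  name: clickhouse-server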

@alex-zaitsev
Member

@marcio-absmartly, could you check it with operator version 0.25.0?
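
A minimal way to try it, assuming the standard bundle install (the URL below tracks master; pinning to the 0.25.0 release tag in the repo is preferable once it is published):

kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml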

@alex-zaitsev
Member

Should be fixed in 0.25.0
