Issue with anti affinity rules #1036

Open

vigodeltoro opened this issue Oct 31, 2022 · 7 comments

@vigodeltoro

Hi there,

I have a problem with the anti-affinity rules; maybe there is somebody out there who can help me out.
I have a three-node Kubernetes setup with a 3-shard cluster with one replica each.

So there are 6 pods in the cluster. I'm trying to use anti-affinity rules to distribute the pods across the 3 nodes. My goal is to have 2 pods per node, but not the same shard or the same replica on one node. Something like the example below:

chi-protobuf-example-dev-0-0-0 node1
chi-protobuf-example-dev-0-1-0 node2
chi-protobuf-example-dev-1-0-0 node3
chi-protobuf-example-dev-1-1-0 node1
chi-protobuf-example-dev-2-0-0 node2
chi-protobuf-example-dev-2-1-0 node3

The anti-affinity rules I'm using are like the example in the docs/chi-examples dir (https://github.com/Altinity/clickhouse-operator/blob/master/docs/chi-examples/99-clickhouseinstallation-max.yaml):

podTemplates:
  - name: pod-template-with-init-container
    podDistribution:
      - type: ShardAntiAffinity
      - type: MaxNumberPerNode
        number: 2
        topologyKey: "kubernetes.io/hostname"
      - type: ReplicaAntiAffinity
      - type: MaxNumberPerNode
        number: 2
        topologyKey: "kubernetes.io/hostname"

But what's happening every time I deploy is:

chi-protobuf-example-dev-0-0-0 node1
chi-protobuf-example-dev-0-1-0 node2
chi-protobuf-example-dev-1-0-0 node1
chi-protobuf-example-dev-1-1-0 node2
chi-protobuf-example-dev-2-0-0 node3
chi-protobuf-example-dev-2-1-0 Pending because no free node is available

That's really problematic because I can't use my resources properly.

Does anybody have an idea?

Thanks a lot and best regards

@alex-zaitsev
Member

You only need ReplicaAntiAffinity and that's it.

    - name: pod-template-with-init-container
      podDistribution:
      - scope: ClickHouseInstallation
        type: ReplicaAntiAffinity
        topologyKey: "kubernetes.io/hostname"
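
For completeness, here is a minimal sketch of where such a pod template plugs into a CHI manifest. The installation name protobuf-example and cluster name dev are guesses inferred from the pod names above, and the template is referenced via spec.defaults.templates.podTemplate as in the operator's chi-examples; treat it as an illustration, not a verified manifest.

    apiVersion: "clickhouse.altinity.com/v1"
    kind: "ClickHouseInstallation"
    metadata:
      name: protobuf-example          # assumed, inferred from pod names chi-protobuf-example-dev-*
    spec:
      defaults:
        templates:
          podTemplate: pod-template-with-init-container   # apply the template to all hosts
      configuration:
        clusters:
          - name: dev
            layout:
              shardsCount: 3
              replicasCount: 2
      templates:
        podTemplates:
          - name: pod-template-with-init-container
            podDistribution:
              - scope: ClickHouseInstallation
                type: ReplicaAntiAffinity
                topologyKey: "kubernetes.io/hostname"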

@vigodeltoro
Author

Hi Alex,

Okay, thanks a lot for that hint. I tried it out and got the following distribution:

chi-protobuf-example-dev-0-0-0 node1
chi-protobuf-example-dev-0-1-0 node1
chi-protobuf-example-dev-1-0-0 node2
chi-protobuf-example-dev-1-1-0 node2
chi-protobuf-example-dev-2-0-0 node3
chi-protobuf-example-dev-2-1-0 node3

With that I got a distribution over all three nodes, but each shard and its replica land on the same node, which means that if I lose one node I lose both copies of one shard (about a third of my database), so redundancy is gone.

If I try it with:

  - name: pod-template-with-init-container
    podDistribution:
      - scope: ClickHouseInstallation
        type: ShardAntiAffinity
        topologyKey: "kubernetes.io/hostname"

I got only a 2-node distribution:

chi-protobuf-example-dev-0-0-0 node1
chi-protobuf-example-dev-0-1-0 node2
chi-protobuf-example-dev-1-0-0 node1
chi-protobuf-example-dev-1-1-0 node2
chi-protobuf-example-dev-2-0-0 node1
chi-protobuf-example-dev-2-1-0 node2

Do you have any other suggestions?

Thanks a lot
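
For reference, the original goal (at most two pods per node, with replicas of the same shard kept apart) corresponds to combining two of the distribution types already used in the first post. This is an untested sketch of that combination, not a configuration verified on the cluster discussed here:

    podTemplates:
      - name: pod-template-with-init-container
        podDistribution:
          # keep both replicas of a shard on different nodes
          - scope: ClickHouseInstallation
            type: ShardAntiAffinity
            topologyKey: "kubernetes.io/hostname"
          # cap the number of ClickHouse pods per node at 2
          - type: MaxNumberPerNode
            number: 2
            topologyKey: "kubernetes.io/hostname"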

@prashant-shahi

@alex-zaitsev There seems to be a lack of proper docs on podDistribution, with a list of all possible values for each key and their significance.

@vigodeltoro
Author

@prashant-shahi
Indeed, and in my eyes there is a bug in circular replication.

I was able to work around that problem with "hardcoded" pod templates:


podTemplates:
  - name: sh0-rep0-template
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - "zone-1"

  - name: sh0-rep1-template
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - "zone-2"

  - name: sh1-rep0-template
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - "zone-2"

  - name: sh1-rep1-template
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - "zone-3"

  - name: sh2-rep0-template
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - "zone-3"

  - name: sh2-rep1-template
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - "zone-1"

But with that I'm facing issues with the podDisruptionBudgets (#1081).

So a fix would be really helpful.

@karthik-thiyagarajan

In a setup of 6 shards and 3 replicas with two different clusters (cluster-01 and cluster-02), I tried MaxNumberPerNode set to 2 and used ReplicaAntiAffinity, but I still see 6 pods getting scheduled (3 from cluster-01 and 3 from cluster-02), which is unexpected. I used "kubernetes.io/hostname" as the topology key. Can someone help?

@karthik-thiyagarajan

@alex-zaitsev What do you mean by the statement "You only need ReplicaAntiAffinity"? I thought we would only need ShardAntiAffinity, which means replicas of the same shard repel each other. If we use ReplicaAntiAffinity, there is still a risk that different replicas of the same shard sit on the same node. Is that not true?

@aep
aep commented May 21, 2025

According to https://github.com/Altinity/clickhouse-operator/blob/master/docs/chi-examples/99-clickhouseinstallation-max.yaml#L506, you are correct: it should be ShardAntiAffinity, not ReplicaAntiAffinity.

I tested ShardAntiAffinity and it appears to do the correct thing.
The distribution still ends up being terrible with multiple shards AND replicas, so doing it by hand seems like the way to go.
