Support custom node taint prefix to support scaling to zero nodes in GKE · Issue #19241 · cilium/cilium · GitHub
Support custom node taint prefix to support scaling to zero nodes in GKE #19241
Closed
@thejosephstevens

Proposal / RFE

Is your feature request related to a problem?
We run Cilium with NoSchedule Cilium taints in GKE to ensure pods start only after Cilium has bootstrapped the node. We run multi-zone node pools, which GKE implements as a union of multiple Cluster Autoscaler (CA) pools under the hood, and we regularly scale a multi-zone pool down to 1 node, so several of the underlying CA pools scale down to zero. Autoscaling works fine with the Cilium node taints while a pool still has nodes: the CA inspects a running node, which no longer carries the Cilium taint and is eligible for app pod scheduling, so when resources are insufficient it can easily determine that a new node would relieve the pressure and let pending pods schedule. When a pool scales down to zero, the CA's behavior changes.

The problem shows up once there are no running nodes: the CA inspects the node group template instead, which includes the Cilium taint, and concludes that adding a node would not allow pending pods to run because they do not tolerate the Cilium taint (even though it would be removed during bootstrap by a daemonset).
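To make the failure mode concrete, here is a minimal sketch of the toleration check a scale-up simulation performs against a node's taints. It is a simplified model of Kubernetes taint/toleration matching, not the Cluster Autoscaler's actual code. The taint key `node.cilium.io/agent-not-ready` and the names `Taint`, `Toleration`, `tolerates`, and `would_schedule` are assumptions chosen for illustration; the issue itself does not name the taint key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + "Exists" tolerates every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes toleration matching."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def would_schedule(tolerations, node_taints) -> bool:
    """A pod fits only if every hard taint on the node is tolerated."""
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Assumed key for illustration; the issue does not state the exact key.
cilium_taint = Taint(key="node.cilium.io/agent-not-ready",
                     effect="NoSchedule", value="true")

running_node_taints = []               # taint already removed after bootstrap
template_node_taints = [cilium_taint]  # node group template still carries it

app_pod_tolerations = []                               # ordinary app pod
cilium_pod_tolerations = [Toleration(operator="Exists")]  # tolerates everything

fits_running = would_schedule(app_pod_tolerations, running_node_taints)
fits_template = would_schedule(app_pod_tolerations, template_node_taints)
agent_fits_template = would_schedule(cilium_pod_tolerations, template_node_taints)

print(fits_running, fits_template, agent_fits_template)
```

With a running node as the reference, the app pod fits and a scale-up looks useful. With only the template as the reference, the same pod appears unschedulable, so the simulation concludes a new node would not help.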

The Cluster Autoscaler can be told to ignore taints through the --ignore-taint command-line argument, but GKE does not expose that setting (all it exposes is "PROFILE", which is just "balanced" or "optimize-utilization").

There is another option in the CA, though, shown here: if you prefix a taint key with ignore-taint.cluster-autoscaler.kubernetes.io/, the CA ignores that taint when simulating scheduling, which allows you to scale up from zero.
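The effect of the prefix can be sketched as a filter applied to the template node's taints before the scheduling simulation runs. This is an illustrative model under stated assumptions, not the CA's implementation: the function name `taints_for_simulation` and both example taint keys are hypothetical, and only the prefix string comes from the text above.

```python
IGNORE_TAINT_PREFIX = "ignore-taint.cluster-autoscaler.kubernetes.io/"


def taints_for_simulation(template_taints):
    """Drop taints whose key carries the ignore prefix (simplified model)."""
    return [t for t in template_taints
            if not t["key"].startswith(IGNORE_TAINT_PREFIX)]


# Hypothetical keys: one without the prefix, one with it.
default_taint = {"key": "node.cilium.io/agent-not-ready",
                 "value": "true", "effect": "NoSchedule"}
prefixed_taint = {"key": IGNORE_TAINT_PREFIX + "cilium-agent-not-ready",
                  "value": "true", "effect": "NoSchedule"}

seen_with_default = taints_for_simulation([default_taint])
seen_with_prefix = taints_for_simulation([prefixed_taint])

print(len(seen_with_default), len(seen_with_prefix))
```

The real node object would still carry the prefixed taint, so the Kubernetes scheduler keeps app pods off the node until the taint is removed; only the autoscaler's simulation treats the template as untainted.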

Describe the solution you'd like

I would like an option to override the taint (or taint prefix) used by Cilium to manage agent readiness taints so that the CA ignores that taint when making scaling decisions, but it is still respected by the k8s scheduler when placing pods.

Labels: kind/feature (This introduces new functionality.)
