Description
Hello MinIO team,
I've noticed abnormal behavior on my cluster: one of the minio-operator pods is stuck and cannot stop itself. If I understand the logs correctly, this is probably due to API server downtime. I've also observed that the pod consumes an entire CPU core in this state (which is how I noticed it).
kubectl top pods -n minio
NAME                             CPU(cores)   MEMORY(bytes)
minio-operator-9d9788f54-6vz29   1005m        56Mi            # This one is stuck
minio-operator-9d9788f54-zk6l6   1m           32Mi            # This one works fine
Expected Behavior
The pod should stop itself and restart normally. A liveness probe could be a potential solution.
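For illustration, here is a rough sketch of what such a probe could look like when patched onto the operator Deployment. The container name, path, and port below are assumptions (placeholders), not the operator's documented health endpoint, and would need to be adjusted to whatever the operator actually serves:

# Hypothetical sketch: let the kubelet restart a wedged operator pod.
# Container name, path, and port are placeholders, not confirmed values.
kubectl -n minio patch deployment minio-operator --patch '
spec:
  template:
    spec:
      containers:
      - name: minio-operator        # adjust to the real container name
        livenessProbe:
          httpGet:
            path: /healthz          # placeholder path
            port: 4221              # placeholder port
          initialDelaySeconds: 10
          periodSeconds: 20
          failureThreshold: 3
'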
Current Behavior
The pod cannot stop itself and eats 1000m of CPU.
I0402 13:30:27.416836 1 status.go:89] Hit conflict issue, getting latest version of tenant
I0402 13:30:29.713082 1 event.go:377] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio", Name:"lucca", UID:"6501a523-f3db-49fe-a80c-3d10341e66c6"
I0402 13:30:34.746203 1 event.go:377] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio", Name:"lucca", UID:"6501a523-f3db-49fe-a80c-3d10341e66c6"
I0402 13:30:40.095878 1 status.go:89] Hit conflict issue, getting latest version of tenant
I0402 13:57:30.403343 1 event.go:377] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio", Name:"lucca", UID:"6501a523-f3db-49fe-a80c-3d10341e66c6"
I0402 13:57:35.439961 1 event.go:377] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio", Name:"lucca", UID:"6501a523-f3db-49fe-a80c-3d10341e66c6"
I0410 05:54:03.233143 1 monitoring.go:123] 'minio/lucca' Failed to get cluster health: Get "http://minio.minio.svc.cluster.local/minio/health/cluster": d
E0430 20:05:28.927784 1 leaderelection.go:429] Failed to update lock optimitically: Put "https://10.3.0.1:443/apis/coordination.k8s.io/v1/namespaces/mini
E0430 20:05:28.927923 1 leaderelection.go:436] error retrieving resource lock minio/minio-operator-lock: client rate limiter Wait returned an error: cont
I0430 20:05:28.927941 1 leaderelection.go:297] failed to renew lease minio/minio-operator-lock: timed out waiting for the condition
E0430 20:05:43.928907 1 leaderelection.go:322] Failed to release lock: Put "https://10.3.0.1:443/apis/coordination.k8s.io/v1/namespaces/minio/leases/mini
I0430 20:05:43.928954 1 main-controller.go:559] leader lost, removing any leader labels that I 'minio-operator-9d9788f54-6vz29' might have
I0430 20:05:53.924466 1 main-controller.go:617] Stopping the minio controller webservers
W0430 20:05:53.924504 1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.Secret ended with: an error on the server (
W0430 20:05:53.924538 1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1beta1.PolicyBinding ended with: an error on
W0430 20:05:53.924587 1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.Deployment ended with: an error on the serv
W0430 20:05:53.924585 1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.Service ended with: an error on the server
W0430 20:05:53.924584 1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v2.Tenant ended with: an error on the server (
W0430 20:05:53.924614 1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.Pod ended with: an error on the server ("un
W0430 20:05:53.924601 1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.StatefulSet ended with: an error on the ser
I0430 20:05:53.924652 1 main-controller.go:624] Stopping the minio controller
Steps to Reproduce (for bugs)
- Deploy a cluster with a single control plane node
- Install MinIO Operator (ensure that neither operator pod is scheduled on the control plane)
- Restart the control plane to create API server downtime (see the command sketch below)
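A rough sketch of the reproduction commands on a Talos-based cluster (<cp-node-ip> is a placeholder for the control-plane address):

# Check that both operator pods are scheduled on worker nodes
kubectl get pods -n minio -o wide

# Reboot the single control-plane node to create API server downtime
talosctl reboot --nodes <cp-node-ip>

# Once the API server is back, watch the operator pods' CPU usage;
# one of them may stay pegged at ~1000m
kubectl top pods -n minio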
Your Environment
- Version used (minio-operator): quay.io/minio/operator:v7.0.0
- Environment name and version (e.g. kubernetes v1.17.2): v1.32.2
- Server type and version: Talos 1.9.3 (VM)
- Link to your deployment file: https://gist.github.com/qjoly/b96a1509d130d3902ef4957e8dba8d85
Thank you for your help.