Minio operator can't stop due to api-server downtime · Issue #2450 · minio/operator
Closed

Description

@qjoly

Hello Minio team,

I've noticed abnormal behavior on my cluster: one of the minio-operator pods is stuck and can't stop. If I understand the logs correctly, this is probably due to API server downtime. I've also observed that the pod consumes an entire CPU core while in this state (that's how I noticed it).

 kubectl top pods -n minio
NAME                                    CPU(cores)   MEMORY(bytes)
minio-operator-9d9788f54-6vz29          1005m        56Mi # This one is stuck
minio-operator-9d9788f54-zk6l6          1m           32Mi # This one works fine

Expected Behavior

The pod should stop itself normally and restart. A liveness probe might be one solution.
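
In case it helps, here is a minimal sketch of what wiring up such a probe could look like, assuming the operator exposed an HTTP health endpoint. The /healthz path and port 4221 are my assumptions for illustration, not an endpoint the operator is guaranteed to serve today:

  # Hypothetical: attach a liveness probe to the operator Deployment.
  # Path and port are placeholders and would need a real health endpoint.
  kubectl patch deployment minio-operator -n minio --type='json' -p='[
    {
      "op": "add",
      "path": "/spec/template/spec/containers/0/livenessProbe",
      "value": {
        "httpGet": {"path": "/healthz", "port": 4221},
        "initialDelaySeconds": 10,
        "periodSeconds": 20,
        "failureThreshold": 3
      }
    }
  ]'

With something like this in place, the kubelet would kill and restart the container once the endpoint stops answering, even if the process itself never exits.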

Current Behavior

The pod can't stop itself and keeps consuming 1000m of CPU.

I0402 13:30:27.416836       1 status.go:89] Hit conflict issue, getting latest version of tenant
I0402 13:30:29.713082       1 event.go:377] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio", Name:"lucca", UID:"6501a523-f3db-49fe-a80c-3d10341e66c6"
I0402 13:30:34.746203       1 event.go:377] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio", Name:"lucca", UID:"6501a523-f3db-49fe-a80c-3d10341e66c6"
I0402 13:30:40.095878       1 status.go:89] Hit conflict issue, getting latest version of tenant
I0402 13:57:30.403343       1 event.go:377] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio", Name:"lucca", UID:"6501a523-f3db-49fe-a80c-3d10341e66c6"
I0402 13:57:35.439961       1 event.go:377] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio", Name:"lucca", UID:"6501a523-f3db-49fe-a80c-3d10341e66c6"
I0410 05:54:03.233143       1 monitoring.go:123] 'minio/lucca' Failed to get cluster health: Get "http://minio.minio.svc.cluster.local/minio/health/cluster": d
E0430 20:05:28.927784       1 leaderelection.go:429] Failed to update lock optimitically: Put "https://10.3.0.1:443/apis/coordination.k8s.io/v1/namespaces/mini
E0430 20:05:28.927923       1 leaderelection.go:436] error retrieving resource lock minio/minio-operator-lock: client rate limiter Wait returned an error: cont
I0430 20:05:28.927941       1 leaderelection.go:297] failed to renew lease minio/minio-operator-lock: timed out waiting for the condition
E0430 20:05:43.928907       1 leaderelection.go:322] Failed to release lock: Put "https://10.3.0.1:443/apis/coordination.k8s.io/v1/namespaces/minio/leases/mini
I0430 20:05:43.928954       1 main-controller.go:559] leader lost, removing any leader labels that I 'minio-operator-9d9788f54-6vz29' might have
I0430 20:05:53.924466       1 main-controller.go:617] Stopping the minio controller webservers
W0430 20:05:53.924504       1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.Secret ended with: an error on the server (
W0430 20:05:53.924538       1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1beta1.PolicyBinding ended with: an error on
W0430 20:05:53.924587       1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.Deployment ended with: an error on the serv
W0430 20:05:53.924585       1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.Service ended with: an error on the server
W0430 20:05:53.924584       1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v2.Tenant ended with: an error on the server (
W0430 20:05:53.924614       1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.Pod ended with: an error on the server ("un
W0430 20:05:53.924601       1 reflector.go:484] k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: watch of *v1.StatefulSet ended with: an error on the ser
I0430 20:05:53.924652       1 main-controller.go:624] Stopping the minio controller
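
For anyone debugging the same thing: the lock the last log lines refer to is a coordination.k8s.io Lease, so once the API server is reachable again you can check who actually holds leadership, and delete the stuck pod as a workaround (pod name taken from the kubectl top output above):

  # Inspect the leader-election Lease named in the logs; .spec.holderIdentity
  # shows which operator pod currently owns the lock.
  kubectl get lease minio-operator-lock -n minio -o yaml

  # Workaround: the stuck pod never exits on its own, so delete it manually.
  kubectl delete pod minio-operator-9d9788f54-6vz29 -n minio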

Steps to Reproduce (for bugs)

  1. Deploy a cluster with a single control plane node
  2. Install MinIO Operator (make sure neither operator pod is scheduled on the control plane)
  3. Restart the control plane to cause API server downtime (see the sketch below for one way to simulate this)
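
For step 3, a sketch of one way to create a clean API server outage, assuming a kubeadm-style control plane where kube-apiserver runs as a static pod (the paths below are the kubeadm defaults; adjust for other distributions):

  # On the control plane node: moving the manifest out of the static-pod
  # directory makes the kubelet stop the kube-apiserver container.
  sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/

  # Keep it down long enough for the operator's lease renewal to time out.
  sleep 120

  # Restore the manifest; the kubelet restarts the API server.
  sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/

After the API server comes back, kubectl top pods -n minio should show whether one of the operator pods is pinned at ~1000m CPU.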

Your Environment

Thank you for your help.
