cluster-monitoring is being constantly re-deployed #19945

Closed
moschlar opened this issue May 2, 2019 · 12 comments
Labels: kind/bug (Issues that are defects reported by users or that we know have reached a real release)
Milestone: v2.2.3

Comments

@moschlar
moschlar commented May 2, 2019

What kind of request is this (question/bug/enhancement/feature request):
Bug

Steps to reproduce (least amount of steps as possible):
I'm using a single-node Rancher v2.2.2 installation to manage a 5-node custom cluster.
Rancher has been continuously upgraded since v2.0.something.

Since v2.2.0, I've deployed cluster monitoring.

Result:
When looking at the Rancher server logs, I notice that the cluster-monitoring app constantly gets redeployed (which is why it is already at release revision v2150).

Other details that may be helpful:

Rancher server log:

[main] 2019/05/02 12:34:10 Starting Tiller v2.10+unreleased (tls=false)
[main] 2019/05/02 12:34:10 GRPC listening on :44913
[main] 2019/05/02 12:34:10 Probes listening on :40971
[main] 2019/05/02 12:34:10 Storage driver is ConfigMap
[main] 2019/05/02 12:34:10 Max history per release is 0
[tiller] 2019/05/02 12:34:12 getting history for release cluster-monitoring
[storage] 2019/05/02 12:34:12 getting release history for "cluster-monitoring"
2019-05-02 12:34:13.054867 W | etcdserver: apply entries took too long [104.778757ms for 1 entries]
2019-05-02 12:34:13.054902 W | etcdserver: avoid queries with large range/delete range!
2019/05/02 12:34:19 [INFO] Handling backend connection request [c-dx942]
W0502 12:34:31.905776       6 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.ServiceAccount ended with: too old resource version: 43215153 (43215177)
W0502 12:34:37.406333       6 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1beta2.StatefulSet ended with: too old resource version: 42263581 (43215140)
[tiller] 2019/05/02 12:34:53 preparing update for cluster-monitoring
[storage] 2019/05/02 12:34:53 getting deployed releases from "cluster-monitoring" history
[storage] 2019/05/02 12:34:56 getting last revision of "cluster-monitoring"
[storage] 2019/05/02 12:34:56 getting release history for "cluster-monitoring"
2019-05-02 12:35:05.341889 W | etcdserver: apply entries took too long [159.086638ms for 1 entries]
2019-05-02 12:35:05.350246 W | etcdserver: avoid queries with large range/delete range!
I0502 12:35:08.220692       6 trace.go:76] Trace[893457890]: "List /apis/batch/v1/jobs" (started: 2019-05-02 12:35:06.995935195 +0000 UTC m=+346.801612872) (total time: 1.194511788s):
Trace[893457890]: [564.459245ms] [564.459245ms] About to List from storage
Trace[893457890]: [1.155893531s] [591.434286ms] Listing from storage done
I0502 12:35:08.693135       6 trace.go:76] Trace[1148687621]: "Get /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:35:07.123719087 +0000 UTC m=+346.929396864) (total time: 1.569368849s):
Trace[1148687621]: [922.22788ms] [922.22788ms] About to Get from storage
Trace[1148687621]: [1.560869836s] [638.641956ms] About to write a response
I0502 12:35:08.799540       6 trace.go:76] Trace[604434125]: "Get /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:35:06.995889195 +0000 UTC m=+346.801566972) (total time: 1.8035294s):
Trace[604434125]: [1.025679936s] [1.025679936s] About to Get from storage
Trace[604434125]: [1.697771742s] [672.091806ms] About to write a response
I0502 12:35:09.247551       6 trace.go:76] Trace[999981084]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:35:08.54257431 +0000 UTC m=+348.348259587) (total time: 704.921156ms):
Trace[999981084]: [681.879421ms] [517.404074ms] About to write a response
2019-05-02 12:35:09.706738 W | etcdserver: apply entries took too long [330.468894ms for 1 entries]
2019-05-02 12:35:09.709777 W | etcdserver: avoid queries with large range/delete range!
2019-05-02 12:35:10.265190 W | etcdserver: apply entries took too long [443.163764ms for 1 entries]
2019-05-02 12:35:10.265260 W | etcdserver: avoid queries with large range/delete range!
I0502 12:35:10.365530       6 trace.go:76] Trace[1924322950]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2019-05-02 12:35:09.143987711 +0000 UTC m=+348.949667488) (total time: 1.081415118s):
Trace[1924322950]: [898.620045ms] [808.17711ms] Transaction committed
Trace[1924322950]: [1.081415118s] [182.795073ms] END
I0502 12:35:10.366403       6 trace.go:76] Trace[1092698666]: "Update /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:35:09.066767795 +0000 UTC m=+348.872448072) (total time: 1.299569445s):
Trace[1092698666]: [1.299236345s] [1.235414149s] Object stored in database
I0502 12:35:10.367237       6 trace.go:76] Trace[1724272058]: "GuaranteedUpdate etcd3: *core.ConfigMap" (started: 2019-05-02 12:35:09.144949612 +0000 UTC m=+348.950629689) (total time: 1.22223483s):
Trace[1724272058]: [1.222051729s] [1.12942169s] Transaction committed
I0502 12:35:10.367527       6 trace.go:76] Trace[209556531]: "Update /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:35:09.066614995 +0000 UTC m=+348.872294272) (total time: 1.300868647s):
Trace[209556531]: [1.300667547s] [1.240385357s] Object stored in database
I0502 12:35:11.331300       6 trace.go:76] Trace[842726899]: "Get /api/v1/namespaces/default" (started: 2019-05-02 12:35:10.350041516 +0000 UTC m=+350.155721493) (total time: 977.166962ms):
Trace[842726899]: [974.508258ms] [974.488858ms] About to write a response
I0502 12:35:35.141236       6 trace.go:76] Trace[1085038790]: "Get /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:35:34.539153719 +0000 UTC m=+374.344836696) (total time: 524.029284ms):
Trace[1085038790]: [518.333576ms] [518.054976ms] About to write a response
[tiller] 2019/05/02 12:35:45 rendering rancher-monitoring chart using values
2019/05/02 12:35:45 info: manifest "rancher-monitoring/templates/metrics-service.yaml" is empty. Skipping.
2019/05/02 12:35:45 info: manifest "rancher-monitoring/templates/rbac.yaml" is empty. Skipping.
2019-05-02 12:35:45.639852 W | etcdserver: apply entries took too long [256.017983ms for 1 entries]
2019-05-02 12:35:45.642692 W | etcdserver: avoid queries with large range/delete range!
2019/05/02 12:35:45 info: manifest "rancher-monitoring/templates/deployment.yaml" is empty. Skipping.
2019/05/02 12:35:45 info: manifest "rancher-monitoring/templates/servicemonitor.yaml" is empty. Skipping.
2019/05/02 12:35:45 info: manifest "rancher-monitoring/charts/grafana/templates/rbac.yaml" is empty. Skipping.
[tiller] 2019/05/02 12:35:46 creating updated release for cluster-monitoring
[storage] 2019/05/02 12:35:46 creating release "cluster-monitoring.v2151"
[tiller] 2019/05/02 12:35:47 performing update for cluster-monitoring
[tiller] 2019/05/02 12:35:47 executing 0 pre-upgrade hooks for cluster-monitoring
[tiller] 2019/05/02 12:35:47 hooks complete for pre-upgrade cluster-monitoring
[kube] 2019/05/02 12:35:47 building resources from updated manifest
[kube] 2019/05/02 12:35:47 checking 45 resources for changes
[kube] 2019/05/02 12:35:47 Looks like there are no changes for Secret "prometheus-cluster-monitoring-additional-scrape-configs"
[kube] 2019/05/02 12:35:48 Looks like there are no changes for Secret "prometheus-cluster-monitoring-additional-alertmanager-configs"
[kube] 2019/05/02 12:35:50 Looks like there are no changes for ConfigMap "grafana-cluster-monitoring-dashboards"
[kube] 2019/05/02 12:35:50 Looks like there are no changes for ConfigMap "grafana-cluster-monitoring-nginx"
[kube] 2019/05/02 12:35:51 Looks like there are no changes for ConfigMap "grafana-cluster-monitoring-provisionings"
[kube] 2019/05/02 12:35:51 Looks like there are no changes for ConfigMap "prometheus-cluster-monitoring-nginx"
[kube] 2019/05/02 12:35:51 Looks like there are no changes for PersistentVolumeClaim "grafana-cluster-monitoring"
[kube] 2019/05/02 12:35:52 Looks like there are no changes for ServiceAccount "exporter-kube-state-cluster-monitoring"
[kube] 2019/05/02 12:35:52 Looks like there are no changes for ServiceAccount "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:35:53 Looks like there are no changes for ServiceAccount "cluster-monitoring"
I0502 12:35:54.754151       6 trace.go:76] Trace[1816935957]: "Update /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:35:54.087286572 +0000 UTC m=+393.892969649) (total time: 536.721403ms):
Trace[1816935957]: [472.985508ms] [469.552603ms] Object stored in database
2019-05-02 12:35:54.796815 W | etcdserver: apply entries took too long [120.14968ms for 1 entries]
2019-05-02 12:35:54.796864 W | etcdserver: avoid queries with large range/delete range!
I0502 12:35:55.052895       6 trace.go:76] Trace[1322689897]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:35:53.957728978 +0000 UTC m=+393.763407055) (total time: 1.095075639s):
Trace[1322689897]: [1.021682329s] [964.783743ms] About to write a response
[kube] 2019/05/02 12:35:55 Looks like there are no changes for ClusterRole "exporter-kube-state-cluster-monitoring"
I0502 12:35:57.849154       6 trace.go:76] Trace[1245451139]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:35:56.540114743 +0000 UTC m=+396.345793420) (total time: 1.308965558s):
Trace[1245451139]: [1.179745265s] [1.051463173s] About to write a response
[kube] 2019/05/02 12:35:58 Looks like there are no changes for ClusterRole "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:35:58 Looks like there are no changes for ClusterRole "prometheus-cluster-monitoring-cattle-prometheus"
[kube] 2019/05/02 12:35:59 Looks like there are no changes for ClusterRoleBinding "exporter-kube-state-cluster-monitoring"
[kube] 2019/05/02 12:35:59 Looks like there are no changes for ClusterRoleBinding "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:35:59 Looks like there are no changes for ClusterRoleBinding "prometheus-cluster-monitoring-cattle-prometheus"
2019-05-02 12:36:00.978950 W | etcdserver: apply entries took too long [524.338685ms for 1 entries]
2019-05-02 12:36:00.978996 W | etcdserver: avoid queries with large range/delete range!
[kube] 2019/05/02 12:36:01 Looks like there are no changes for Service "expose-kube-cm-metrics"
[kube] 2019/05/02 12:36:02 Looks like there are no changes for Service "expose-kube-etcd-metrics"
[kube] 2019/05/02 12:36:02 Looks like there are no changes for Service "expose-kube-scheduler-metrics"
[kube] 2019/05/02 12:36:03 Looks like there are no changes for Service "expose-kubernetes-metrics"
[kube] 2019/05/02 12:36:03 Looks like there are no changes for Service "expose-node-metrics"
[kube] 2019/05/02 12:36:04 Looks like there are no changes for Service "expose-grafana-metrics"
[kube] 2019/05/02 12:36:04 Looks like there are no changes for Service "access-grafana"
[kube] 2019/05/02 12:36:05 Looks like there are no changes for Service "expose-prometheus-metrics"
[kube] 2019/05/02 12:36:05 Looks like there are no changes for Service "access-prometheus"
[kube] 2019/05/02 12:36:05 Looks like there are no changes for DaemonSet "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:36:05 Looks like there are no changes for Deployment "exporter-kube-state-cluster-monitoring"
[kube] 2019/05/02 12:36:06 Looks like there are no changes for Deployment "grafana-cluster-monitoring"
[kube] 2019/05/02 12:36:06 Looks like there are no changes for Endpoints "expose-kube-cm-metrics"
[kube] 2019/05/02 12:36:07 Looks like there are no changes for Endpoints "expose-kube-scheduler-metrics"
[kube] 2019/05/02 12:36:07 Looks like there are no changes for Prometheus "cluster-monitoring"
[kube] 2019/05/02 12:36:07 Looks like there are no changes for PrometheusRule "exporter-kube-scheduler-cluster-monitoring"
[kube] 2019/05/02 12:36:07 Looks like there are no changes for PrometheusRule "exporter-kubernetes-cluster-monitoring"
[kube] 2019/05/02 12:36:08 Looks like there are no changes for PrometheusRule "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:36:08 Looks like there are no changes for ServiceMonitor "exporter-fluentd-cluster-monitoring"
[kube] 2019/05/02 12:36:08 Looks like there are no changes for ServiceMonitor "exporter-kube-controller-manager-cluster-monitoring"
[kube] 2019/05/02 12:36:08 Looks like there are no changes for ServiceMonitor "exporter-kube-scheduler-cluster-monitoring"
[kube] 2019/05/02 12:36:09 Looks like there are no changes for ServiceMonitor "exporter-kube-state-cluster-monitoring"
[kube] 2019/05/02 12:36:09 Looks like there are no changes for ServiceMonitor "exporter-kubelets-cluster-monitoring"
[kube] 2019/05/02 12:36:09 Looks like there are no changes for ServiceMonitor "exporter-kubernetes-cluster-monitoring"
[kube] 2019/05/02 12:36:09 Looks like there are no changes for ServiceMonitor "exporter-node-cluster-monitoring"
2019-05-02 12:36:11.087649 W | etcdserver: apply entries took too long [255.054882ms for 1 entries]
2019-05-02 12:36:11.087760 W | etcdserver: avoid queries with large range/delete range!
I0502 12:36:13.540290       6 trace.go:76] Trace[998317708]: "List /apis/batch/v1/jobs" (started: 2019-05-02 12:36:11.265347876 +0000 UTC m=+411.071028853) (total time: 2.274783903s):
Trace[998317708]: [2.274543103s] [2.274524303s] Listing from storage done
I0502 12:36:13.636303       6 trace.go:76] Trace[714306574]: "Get /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:36:10.611340097 +0000 UTC m=+410.417025374) (total time: 3.024863826s):
Trace[714306574]: [3.023739524s] [2.981793561s] About to write a response
I0502 12:36:13.663983       6 trace.go:76] Trace[1488463856]: "Get /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:36:11.237425734 +0000 UTC m=+411.043105011) (total time: 2.426462531s):
Trace[1488463856]: [2.423544226s] [2.423420026s] About to write a response
I0502 12:36:13.814720       6 trace.go:76] Trace[1710684550]: "List /api/v1/nodes" (started: 2019-05-02 12:36:13.195849664 +0000 UTC m=+413.001534941) (total time: 618.811126ms):
Trace[1710684550]: [527.074889ms] [527.040989ms] Listing from storage done
2019-05-02 12:36:15.013718 W | etcdserver: apply entries took too long [874.423208ms for 1 entries]
2019-05-02 12:36:15.014455 W | etcdserver: avoid queries with large range/delete range!
I0502 12:36:15.034090       6 trace.go:76] Trace[87873255]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2019-05-02 12:36:14.020251798 +0000 UTC m=+413.825937475) (total time: 1.013777217s):
Trace[87873255]: [1.013607016s] [953.940527ms] Transaction committed
I0502 12:36:15.034747       6 trace.go:76] Trace[886827659]: "Update /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:36:13.87457748 +0000 UTC m=+413.680255157) (total time: 1.159748835s):
Trace[886827659]: [1.159573735s] [1.101642749s] Object stored in database
2019-05-02 12:36:15.166229 W | etcdserver: apply entries took too long [151.702427ms for 1 entries]
2019-05-02 12:36:15.176109 W | etcdserver: avoid queries with large range/delete range!
I0502 12:36:15.266880       6 trace.go:76] Trace[1573627939]: "GuaranteedUpdate etcd3: *core.ConfigMap" (started: 2019-05-02 12:36:14.075586781 +0000 UTC m=+413.881264258) (total time: 1.189276679s):
Trace[1573627939]: [1.17634186s] [1.173612656s] Transaction committed
I0502 12:36:15.342717       6 trace.go:76] Trace[317205404]: "Update /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:36:14.026507307 +0000 UTC m=+413.832192884) (total time: 1.240698256s):
Trace[317205404]: [1.240421356s] [1.232373944s] Object stored in database
I0502 12:36:15.631714       6 trace.go:76] Trace[1318689638]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:36:11.496155121 +0000 UTC m=+411.301836098) (total time: 4.133583785s):
Trace[1318689638]: [3.975911349s] [3.975764249s] About to write a response
[kube] 2019/05/02 12:36:16 Looks like there are no changes for ServiceMonitor "grafana-cluster-monitoring"
I0502 12:36:17.116550       6 trace.go:76] Trace[133531972]: "Get /api/v1/namespaces/default" (started: 2019-05-02 12:36:16.027424101 +0000 UTC m=+415.833109678) (total time: 1.088996029s):
Trace[133531972]: [1.088745629s] [1.087600627s] About to write a response
2019-05-02 12:36:18.770343 W | etcdserver: apply entries took too long [156.701835ms for 1 entries]
2019-05-02 12:36:18.770418 W | etcdserver: avoid queries with large range/delete range!
2019-05-02 12:36:20.105068 W | etcdserver: apply entries took too long [782.09627ms for 1 entries]
2019-05-02 12:36:20.105261 W | etcdserver: avoid queries with large range/delete range!
I0502 12:36:20.131518       6 trace.go:76] Trace[552466000]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2019-05-02 12:36:18.110900618 +0000 UTC m=+417.916585995) (total time: 1.998022589s):
Trace[552466000]: [418.066526ms] [418.066526ms] initial value restored
Trace[552466000]: [1.997914789s] [1.57775636s] Transaction committed
I0502 12:36:20.132838       6 trace.go:76] Trace[843006566]: "Update /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:36:18.109459316 +0000 UTC m=+417.915144593) (total time: 2.023285527s):
Trace[843006566]: [2.022204225s] [2.021761724s] Object stored in database
I0502 12:36:20.151601       6 trace.go:76] Trace[1990736941]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2019-05-02 12:36:17.541201466 +0000 UTC m=+417.346887543) (total time: 2.610275105s):
Trace[1990736941]: [393.783989ms] [393.783989ms] initial value restored
Trace[1990736941]: [1.676912109s] [1.28312812s] Transaction prepared
Trace[1990736941]: [2.610017105s] [933.104996ms] Transaction committed
I0502 12:36:20.168575       6 trace.go:76] Trace[1216243324]: "GuaranteedUpdate etcd3: *core.ConfigMap" (started: 2019-05-02 12:36:19.437285703 +0000 UTC m=+419.242971180) (total time: 729.953592ms):
Trace[1216243324]: [727.483588ms] [725.545586ms] Transaction committed
I0502 12:36:20.210958       6 trace.go:76] Trace[752902646]: "Update /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:36:19.391714734 +0000 UTC m=+419.197392811) (total time: 818.107324ms):
Trace[752902646]: [777.008563ms] [776.772762ms] Object stored in database
I0502 12:36:20.499649       6 trace.go:76] Trace[301300681]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:36:19.476764362 +0000 UTC m=+419.282526139) (total time: 1.02270943s):
Trace[301300681]: [825.505935ms] [825.379235ms] About to write a response
Trace[301300681]: [1.02270943s] [197.203495ms] END
I0502 12:36:20.551690       6 trace.go:76] Trace[1520707606]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:36:19.359289586 +0000 UTC m=+419.164970663) (total time: 1.192315184s):
Trace[1520707606]: [1.152026023s] [1.151875723s] About to write a response
[kube] 2019/05/02 12:36:21 Looks like there are no changes for ServiceMonitor "prometheus-cluster-monitoring"
2019-05-02 12:36:21.348268 W | etcdserver: apply entries took too long [153.826331ms for 1 entries]
2019-05-02 12:36:21.348858 W | etcdserver: avoid queries with large range/delete range!
[tiller] 2019/05/02 12:36:22 executing 0 post-upgrade hooks for cluster-monitoring
[tiller] 2019/05/02 12:36:22 hooks complete for post-upgrade cluster-monitoring
[storage] 2019/05/02 12:36:22 updating release "cluster-monitoring.v2150"
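
A quick way to quantify how many revisions Tiller has already stored: the log above shows "Storage driver is ConfigMap", and with that driver every release revision is kept as a ConfigMap labelled OWNER=TILLER and NAME=<release>. Below is a minimal client-go sketch for counting them; it assumes a reasonably recent client-go and a kubeconfig that is allowed to list ConfigMaps cluster-wide (it lists across all namespaces rather than guessing where the embedded Tiller stores them).

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Point the kubeconfig at the cluster that holds Tiller's release ConfigMaps.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Every redeploy adds one more revision, i.e. one more ConfigMap carrying
	// Tiller's release labels.
	cms, err := client.CoreV1().ConfigMaps(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		LabelSelector: "OWNER=TILLER,NAME=cluster-monitoring",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("cluster-monitoring has %d stored release revisions\n", len(cms.Items))
}
```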

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): v2.2.2
  • Installation option (single install/HA): single install

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Custom
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): 5x VM, 6 CPUs, 12 GiB
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:30:48Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version (use docker version):
Client:
 Version:           18.09.5
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        e8ff056dbc
 Built:             Thu Apr 11 04:44:28 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.4
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       d14af54
  Built:            Wed Mar 27 18:01:48 2019
  OS/Arch:          linux/amd64
  Experimental:     false
@XzenTorXz

We have similar issues with other apps (https://github.com/getsentry/sentry/). The deployment takes a long time, then it seems a time limit is hit (after 5 minutes) and it starts to redeploy the whole app (I never noticed this on previous versions).

@happydenn

We have also experienced the same issue; here's a screenshot of the ConfigMaps resulting from the redeploys:

screenshot

Fresh Rancher 2.2.2 install
Kubernetes 1.13.5
Docker 18.09.5

@deniseschannon added this to the v2.2.3 milestone May 8, 2019
@deniseschannon added the kind/bug label May 8, 2019
@Oats87
Contributor
Oats87 commented May 8, 2019

Encountered this on a 2.2.2 environment. This is bad, because it will very quickly cause the environment to become non-functional for the system project of that specific cluster.

@jiaqiluo
Member

Reproduced the bug on v2.2.2

Steps:

  • add a cluster with 3 etcd nodes, 2 control plane nodes and any number of worker nodes
  • enable the cluster monitoring

Result:

  • right after enabling the cluster monitoring, the app cluster-monitoring is deployed 6 times.

screenshot

@jiaqiluo
Member
jiaqiluo commented May 10, 2019

The bug fix is validated on Rancher: master 5d74988

Steps:

  • add a cluster with 3 etcd nodes, 2 control plane nodes and any number of worker nodes
  • enable the cluster monitoring
  • enable the project monitoring

Result:

  • notice that the apps cluster-monitoring and project-monitoring are each deployed only once.

screenshot

screenshot


Test 2:

  • run Rancher: v2.2.2
  • add two identical clusters and each has 3 etcd nodes, 2 control plane nodes and any number of worker nodes
  • on cluster1 enable the cluster monitoring
  • upgrade Rancher to v2.2.3-rc8
  • check if the cluster-monitoring app gets re-deployed multiple times
  • on cluster1 enable the project monitoring
  • on cluster2 enable the cluster monitoring and the project monitoring

Result:

  • the apps cluster-monitoring and project-monitoring are deployed only once.

@jiaqiluo
Member
jiaqiluo commented May 10, 2019

The issue is not fixed, because the app cluster-monitoring still gets redeployed several times when cluster monitoring is enabled.

We also see that whenever we deploy apps and workloads, cluster-monitoring gets re-deployed as well.

screenshot

screenshot

@jiaqiluo reopened this May 10, 2019
alena1108 pushed a commit that referenced this issue May 11, 2019
Problem:

Monitoring gets redeployed because the etcd params get updated; however, this is not expected, as the etcd addresses don't change at all.

Solution:

Sort them before assigning.

Issue:

#19945
alena1108 pushed a second commit with the same message that referenced this issue May 11, 2019
cjellick pushed a commit with the same message that referenced this issue May 14, 2019
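
The commit message boils down to an ordering problem: the list of etcd addresses fed into the monitoring app's answers could come back in a different order on each sync, so Rancher saw "changed" parameters and triggered another Helm upgrade even though the set of addresses was identical. A minimal illustrative sketch of that idea (hypothetical helper, not Rancher's actual code):

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// buildEtcdEndpoints stands in for the code path described in the commit:
// node addresses may arrive in a non-deterministic order on every sync.
func buildEtcdEndpoints(addresses []string) []string {
	endpoints := make([]string, 0, len(addresses))
	for _, a := range addresses {
		endpoints = append(endpoints, fmt.Sprintf("https://%s:2379", a))
	}
	// The fix: sort before assigning to the app's answers, so the comparison
	// with the previously deployed values is stable and no spurious upgrade
	// is triggered when only the ordering differs.
	sort.Strings(endpoints)
	return endpoints
}

func main() {
	first := buildEtcdEndpoints([]string{"10.0.0.3", "10.0.0.1", "10.0.0.2"})
	second := buildEtcdEndpoints([]string{"10.0.0.2", "10.0.0.3", "10.0.0.1"})
	fmt.Println(reflect.DeepEqual(first, second)) // true: same answers, no redeploy
}
```
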
@cjellick

fixed in master

@jiaqiluo
Member

The bug fix is validated on Rancher: master beffccd

Steps:

  • add several different kinds of clusters; in my case an EC2 cluster, a custom RKE cluster, and an imported GKE cluster
  • enable the cluster monitoring
  • deploy some workloads and apps
  • run automation tests on the cluster
  • leave the cluster idle for a while

Result:

  • notice that the apps cluster-monitoring and project-monitoring are deployed only once.

@manarhusrieh

Will this fix be released in a 2.2.x version, or will we have to wait until 2.3.x? We are currently unable to deploy monitoring to our cluster.

@spencergilbert

@manarhusrieh this has been resolved for us; we're on 2.2.4, but I believe it has been resolved since 2.2.2.

@manarhusrieh

@spencergilbert We are currently running 2.2.2 and the problem still exists. We will apply the 2.2.5 patch once it is released.

@manarhusrieh

Just to confirm: the problem is resolved in patch 2.2.5.
