cluster-monitoring is being constantly re-deployed #19945

Closed
moschlar opened this issue May 2, 2019 · 12 comments
Labels: kind/bug (Issues that are defects reported by users or that we know have reached a real release)
Milestone: v2.2.3

Comments

@moschlar
moschlar commented May 2, 2019

What kind of request is this (question/bug/enhancement/feature request):
Bug

Steps to reproduce (least amount of steps as possible):
I'm using a single-node Rancher v2.2.2 installation to manage a 5-node custom cluster.
Rancher has been continuously upgraded since v2.0.something.

Since v2.2.0, I've deployed cluster monitoring.

Result:
When looking at the Rancher server logs, I notice that the cluster-monitoring app constantly gets redeployed (which is why it is already at release revision v2150).

Other details that may be helpful:

Rancher server log:

[main] 2019/05/02 12:34:10 Starting Tiller v2.10+unreleased (tls=false)
[main] 2019/05/02 12:34:10 GRPC listening on :44913
[main] 2019/05/02 12:34:10 Probes listening on :40971
[main] 2019/05/02 12:34:10 Storage driver is ConfigMap
[main] 2019/05/02 12:34:10 Max history per release is 0
[tiller] 2019/05/02 12:34:12 getting history for release cluster-monitoring
[storage] 2019/05/02 12:34:12 getting release history for "cluster-monitoring"
2019-05-02 12:34:13.054867 W | etcdserver: apply entries took too long [104.778757ms for 1 entries]
2019-05-02 12:34:13.054902 W | etcdserver: avoid queries with large range/delete range!
2019/05/02 12:34:19 [INFO] Handling backend connection request [c-dx942]
W0502 12:34:31.905776       6 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.ServiceAccount ended with: too old resource version: 43215153 (43215177)
W0502 12:34:37.406333       6 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1beta2.StatefulSet ended with: too old resource version: 42263581 (43215140)
[tiller] 2019/05/02 12:34:53 preparing update for cluster-monitoring
[storage] 2019/05/02 12:34:53 getting deployed releases from "cluster-monitoring" history
[storage] 2019/05/02 12:34:56 getting last revision of "cluster-monitoring"
[storage] 2019/05/02 12:34:56 getting release history for "cluster-monitoring"
2019-05-02 12:35:05.341889 W | etcdserver: apply entries took too long [159.086638ms for 1 entries]
2019-05-02 12:35:05.350246 W | etcdserver: avoid queries with large range/delete range!
I0502 12:35:08.220692       6 trace.go:76] Trace[893457890]: "List /apis/batch/v1/jobs" (started: 2019-05-02 12:35:06.995935195 +0000 UTC m=+346.801612872) (total time: 1.194511788s):
Trace[893457890]: [564.459245ms] [564.459245ms] About to List from storage
Trace[893457890]: [1.155893531s] [591.434286ms] Listing from storage done
I0502 12:35:08.693135       6 trace.go:76] Trace[1148687621]: "Get /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:35:07.123719087 +0000 UTC m=+346.929396864) (total time: 1.569368849s):
Trace[1148687621]: [922.22788ms] [922.22788ms] About to Get from storage
Trace[1148687621]: [1.560869836s] [638.641956ms] About to write a response
I0502 12:35:08.799540       6 trace.go:76] Trace[604434125]: "Get /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:35:06.995889195 +0000 UTC m=+346.801566972) (total time: 1.8035294s):
Trace[604434125]: [1.025679936s] [1.025679936s] About to Get from storage
Trace[604434125]: [1.697771742s] [672.091806ms] About to write a response
I0502 12:35:09.247551       6 trace.go:76] Trace[999981084]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:35:08.54257431 +0000 UTC m=+348.348259587) (total time: 704.921156ms):
Trace[999981084]: [681.879421ms] [517.404074ms] About to write a response
2019-05-02 12:35:09.706738 W | etcdserver: apply entries took too long [330.468894ms for 1 entries]
2019-05-02 12:35:09.709777 W | etcdserver: avoid queries with large range/delete range!
2019-05-02 12:35:10.265190 W | etcdserver: apply entries took too long [443.163764ms for 1 entries]
2019-05-02 12:35:10.265260 W | etcdserver: avoid queries with large range/delete range!
I0502 12:35:10.365530       6 trace.go:76] Trace[1924322950]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2019-05-02 12:35:09.143987711 +0000 UTC m=+348.949667488) (total time: 1.081415118s):
Trace[1924322950]: [898.620045ms] [808.17711ms] Transaction committed
Trace[1924322950]: [1.081415118s] [182.795073ms] END
I0502 12:35:10.366403       6 trace.go:76] Trace[1092698666]: "Update /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:35:09.066767795 +0000 UTC m=+348.872448072) (total time: 1.299569445s):
Trace[1092698666]: [1.299236345s] [1.235414149s] Object stored in database
I0502 12:35:10.367237       6 trace.go:76] Trace[1724272058]: "GuaranteedUpdate etcd3: *core.ConfigMap" (started: 2019-05-02 12:35:09.144949612 +0000 UTC m=+348.950629689) (total time: 1.22223483s):
Trace[1724272058]: [1.222051729s] [1.12942169s] Transaction committed
I0502 12:35:10.367527       6 trace.go:76] Trace[209556531]: "Update /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:35:09.066614995 +0000 UTC m=+348.872294272) (total time: 1.300868647s):
Trace[209556531]: [1.300667547s] [1.240385357s] Object stored in database
I0502 12:35:11.331300       6 trace.go:76] Trace[842726899]: "Get /api/v1/namespaces/default" (started: 2019-05-02 12:35:10.350041516 +0000 UTC m=+350.155721493) (total time: 977.166962ms):
Trace[842726899]: [974.508258ms] [974.488858ms] About to write a response
I0502 12:35:35.141236       6 trace.go:76] Trace[1085038790]: "Get /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:35:34.539153719 +0000 UTC m=+374.344836696) (total time: 524.029284ms):
Trace[1085038790]: [518.333576ms] [518.054976ms] About to write a response
[tiller] 2019/05/02 12:35:45 rendering rancher-monitoring chart using values
2019/05/02 12:35:45 info: manifest "rancher-monitoring/templates/metrics-service.yaml" is empty. Skipping.
2019/05/02 12:35:45 info: manifest "rancher-monitoring/templates/rbac.yaml" is empty. Skipping.
2019-05-02 12:35:45.639852 W | etcdserver: apply entries took too long [256.017983ms for 1 entries]
2019-05-02 12:35:45.642692 W | etcdserver: avoid queries with large range/delete range!
2019/05/02 12:35:45 info: manifest "rancher-monitoring/templates/deployment.yaml" is empty. Skipping.
2019/05/02 12:35:45 info: manifest "rancher-monitoring/templates/servicemonitor.yaml" is empty. Skipping.
2019/05/02 12:35:45 info: manifest "rancher-monitoring/charts/grafana/templates/rbac.yaml" is empty. Skipping.
[tiller] 2019/05/02 12:35:46 creating updated release for cluster-monitoring
[storage] 2019/05/02 12:35:46 creating release "cluster-monitoring.v2151"
[tiller] 2019/05/02 12:35:47 performing update for cluster-monitoring
[tiller] 2019/05/02 12:35:47 executing 0 pre-upgrade hooks for cluster-monitoring
[tiller] 2019/05/02 12:35:47 hooks complete for pre-upgrade cluster-monitoring
[kube] 2019/05/02 12:35:47 building resources from updated manifest
[kube] 2019/05/02 12:35:47 checking 45 resources for changes
[kube] 2019/05/02 12:35:47 Looks like there are no changes for Secret "prometheus-cluster-monitoring-additional-scrape-configs"
[kube] 2019/05/02 12:35:48 Looks like there are no changes for Secret "prometheus-cluster-monitoring-additional-alertmanager-configs"
[kube] 2019/05/02 12:35:50 Looks like there are no changes for ConfigMap "grafana-cluster-monitoring-dashboards"
[kube] 2019/05/02 12:35:50 Looks like there are no changes for ConfigMap "grafana-cluster-monitoring-nginx"
[kube] 2019/05/02 12:35:51 Looks like there are no changes for ConfigMap "grafana-cluster-monitoring-provisionings"
[kube] 2019/05/02 12:35:51 Looks like there are no changes for ConfigMap "prometheus-cluster-monitoring-nginx"
[kube] 2019/05/02 12:35:51 Looks like there are no changes for PersistentVolumeClaim "grafana-cluster-monitoring"
[kube] 2019/05/02 12:35:52 Looks like there are no changes for ServiceAccount "exporter-kube-state-cluster-monitoring"
[kube] 2019/05/02 12:35:52 Looks like there are no changes for ServiceAccount "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:35:53 Looks like there are no changes for ServiceAccount "cluster-monitoring"
I0502 12:35:54.754151       6 trace.go:76] Trace[1816935957]: "Update /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:35:54.087286572 +0000 UTC m=+393.892969649) (total time: 536.721403ms):
Trace[1816935957]: [472.985508ms] [469.552603ms] Object stored in database
2019-05-02 12:35:54.796815 W | etcdserver: apply entries took too long [120.14968ms for 1 entries]
2019-05-02 12:35:54.796864 W | etcdserver: avoid queries with large range/delete range!
I0502 12:35:55.052895       6 trace.go:76] Trace[1322689897]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:35:53.957728978 +0000 UTC m=+393.763407055) (total time: 1.095075639s):
Trace[1322689897]: [1.021682329s] [964.783743ms] About to write a response
[kube] 2019/05/02 12:35:55 Looks like there are no changes for ClusterRole "exporter-kube-state-cluster-monitoring"
I0502 12:35:57.849154       6 trace.go:76] Trace[1245451139]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:35:56.540114743 +0000 UTC m=+396.345793420) (total time: 1.308965558s):
Trace[1245451139]: [1.179745265s] [1.051463173s] About to write a response
[kube] 2019/05/02 12:35:58 Looks like there are no changes for ClusterRole "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:35:58 Looks like there are no changes for ClusterRole "prometheus-cluster-monitoring-cattle-prometheus"
[kube] 2019/05/02 12:35:59 Looks like there are no changes for ClusterRoleBinding "exporter-kube-state-cluster-monitoring"
[kube] 2019/05/02 12:35:59 Looks like there are no changes for ClusterRoleBinding "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:35:59 Looks like there are no changes for ClusterRoleBinding "prometheus-cluster-monitoring-cattle-prometheus"
2019-05-02 12:36:00.978950 W | etcdserver: apply entries took too long [524.338685ms for 1 entries]
2019-05-02 12:36:00.978996 W | etcdserver: avoid queries with large range/delete range!
[kube] 2019/05/02 12:36:01 Looks like there are no changes for Service "expose-kube-cm-metrics"
[kube] 2019/05/02 12:36:02 Looks like there are no changes for Service "expose-kube-etcd-metrics"
[kube] 2019/05/02 12:36:02 Looks like there are no changes for Service "expose-kube-scheduler-metrics"
[kube] 2019/05/02 12:36:03 Looks like there are no changes for Service "expose-kubernetes-metrics"
[kube] 2019/05/02 12:36:03 Looks like there are no changes for Service "expose-node-metrics"
[kube] 2019/05/02 12:36:04 Looks like there are no changes for Service "expose-grafana-metrics"
[kube] 2019/05/02 12:36:04 Looks like there are no changes for Service "access-grafana"
[kube] 2019/05/02 12:36:05 Looks like there are no changes for Service "expose-prometheus-metrics"
[kube] 2019/05/02 12:36:05 Looks like there are no changes for Service "access-prometheus"
[kube] 2019/05/02 12:36:05 Looks like there are no changes for DaemonSet "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:36:05 Looks like there are no changes for Deployment "exporter-kube-state-cluster-monitoring"
[kube] 2019/05/02 12:36:06 Looks like there are no changes for Deployment "grafana-cluster-monitoring"
[kube] 2019/05/02 12:36:06 Looks like there are no changes for Endpoints "expose-kube-cm-metrics"
[kube] 2019/05/02 12:36:07 Looks like there are no changes for Endpoints "expose-kube-scheduler-metrics"
[kube] 2019/05/02 12:36:07 Looks like there are no changes for Prometheus "cluster-monitoring"
[kube] 2019/05/02 12:36:07 Looks like there are no changes for PrometheusRule "exporter-kube-scheduler-cluster-monitoring"
[kube] 2019/05/02 12:36:07 Looks like there are no changes for PrometheusRule "exporter-kubernetes-cluster-monitoring"
[kube] 2019/05/02 12:36:08 Looks like there are no changes for PrometheusRule "exporter-node-cluster-monitoring"
[kube] 2019/05/02 12:36:08 Looks like there are no changes for ServiceMonitor "exporter-fluentd-cluster-monitoring"
[kube] 2019/05/02 12:36:08 Looks like there are no changes for ServiceMonitor "exporter-kube-controller-manager-cluster-monitoring"
[kube] 2019/05/02 12:36:08 Looks like there are no changes for ServiceMonitor "exporter-kube-scheduler-cluster-monitoring"
[kube] 2019/05/02 12:36:09 Looks like there are no changes for ServiceMonitor "exporter-kube-state-cluster-monitoring"
[kube] 2019/05/02 12:36:09 Looks like there are no changes for ServiceMonitor "exporter-kubelets-cluster-monitoring"
[kube] 2019/05/02 12:36:09 Looks like there are no changes for ServiceMonitor "exporter-kubernetes-cluster-monitoring"
[kube] 2019/05/02 12:36:09 Looks like there are no changes for ServiceMonitor "exporter-node-cluster-monitoring"
2019-05-02 12:36:11.087649 W | etcdserver: apply entries took too long [255.054882ms for 1 entries]
2019-05-02 12:36:11.087760 W | etcdserver: avoid queries with large range/delete range!
I0502 12:36:13.540290       6 trace.go:76] Trace[998317708]: "List /apis/batch/v1/jobs" (started: 2019-05-02 12:36:11.265347876 +0000 UTC m=+411.071028853) (total time: 2.274783903s):
Trace[998317708]: [2.274543103s] [2.274524303s] Listing from storage done
I0502 12:36:13.636303       6 trace.go:76] Trace[714306574]: "Get /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:36:10.611340097 +0000 UTC m=+410.417025374) (total time: 3.024863826s):
Trace[714306574]: [3.023739524s] [2.981793561s] About to write a response
I0502 12:36:13.663983       6 trace.go:76] Trace[1488463856]: "Get /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:36:11.237425734 +0000 UTC m=+411.043105011) (total time: 2.426462531s):
Trace[1488463856]: [2.423544226s] [2.423420026s] About to write a response
I0502 12:36:13.814720       6 trace.go:76] Trace[1710684550]: "List /api/v1/nodes" (started: 2019-05-02 12:36:13.195849664 +0000 UTC m=+413.001534941) (total time: 618.811126ms):
Trace[1710684550]: [527.074889ms] [527.040989ms] Listing from storage done
2019-05-02 12:36:15.013718 W | etcdserver: apply entries took too long [874.423208ms for 1 entries]
2019-05-02 12:36:15.014455 W | etcdserver: avoid queries with large range/delete range!
I0502 12:36:15.034090       6 trace.go:76] Trace[87873255]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2019-05-02 12:36:14.020251798 +0000 UTC m=+413.825937475) (total time: 1.013777217s):
Trace[87873255]: [1.013607016s] [953.940527ms] Transaction committed
I0502 12:36:15.034747       6 trace.go:76] Trace[886827659]: "Update /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:36:13.87457748 +0000 UTC m=+413.680255157) (total time: 1.159748835s):
Trace[886827659]: [1.159573735s] [1.101642749s] Object stored in database
2019-05-02 12:36:15.166229 W | etcdserver: apply entries took too long [151.702427ms for 1 entries]
2019-05-02 12:36:15.176109 W | etcdserver: avoid queries with large range/delete range!
I0502 12:36:15.266880       6 trace.go:76] Trace[1573627939]: "GuaranteedUpdate etcd3: *core.ConfigMap" (started: 2019-05-02 12:36:14.075586781 +0000 UTC m=+413.881264258) (total time: 1.189276679s):
Trace[1573627939]: [1.17634186s] [1.173612656s] Transaction committed
I0502 12:36:15.342717       6 trace.go:76] Trace[317205404]: "Update /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:36:14.026507307 +0000 UTC m=+413.832192884) (total time: 1.240698256s):
Trace[317205404]: [1.240421356s] [1.232373944s] Object stored in database
I0502 12:36:15.631714       6 trace.go:76] Trace[1318689638]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:36:11.496155121 +0000 UTC m=+411.301836098) (total time: 4.133583785s):
Trace[1318689638]: [3.975911349s] [3.975764249s] About to write a response
[kube] 2019/05/02 12:36:16 Looks like there are no changes for ServiceMonitor "grafana-cluster-monitoring"
I0502 12:36:17.116550       6 trace.go:76] Trace[133531972]: "Get /api/v1/namespaces/default" (started: 2019-05-02 12:36:16.027424101 +0000 UTC m=+415.833109678) (total time: 1.088996029s):
Trace[133531972]: [1.088745629s] [1.087600627s] About to write a response
2019-05-02 12:36:18.770343 W | etcdserver: apply entries took too long [156.701835ms for 1 entries]
2019-05-02 12:36:18.770418 W | etcdserver: avoid queries with large range/delete range!
2019-05-02 12:36:20.105068 W | etcdserver: apply entries took too long [782.09627ms for 1 entries]
2019-05-02 12:36:20.105261 W | etcdserver: avoid queries with large range/delete range!
I0502 12:36:20.131518       6 trace.go:76] Trace[552466000]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2019-05-02 12:36:18.110900618 +0000 UTC m=+417.916585995) (total time: 1.998022589s):
Trace[552466000]: [418.066526ms] [418.066526ms] initial value restored
Trace[552466000]: [1.997914789s] [1.57775636s] Transaction committed
I0502 12:36:20.132838       6 trace.go:76] Trace[843006566]: "Update /api/v1/namespaces/kube-system/endpoints/kube-scheduler" (started: 2019-05-02 12:36:18.109459316 +0000 UTC m=+417.915144593) (total time: 2.023285527s):
Trace[843006566]: [2.022204225s] [2.021761724s] Object stored in database
I0502 12:36:20.151601       6 trace.go:76] Trace[1990736941]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2019-05-02 12:36:17.541201466 +0000 UTC m=+417.346887543) (total time: 2.610275105s):
Trace[1990736941]: [393.783989ms] [393.783989ms] initial value restored
Trace[1990736941]: [1.676912109s] [1.28312812s] Transaction prepared
Trace[1990736941]: [2.610017105s] [933.104996ms] Transaction committed
I0502 12:36:20.168575       6 trace.go:76] Trace[1216243324]: "GuaranteedUpdate etcd3: *core.ConfigMap" (started: 2019-05-02 12:36:19.437285703 +0000 UTC m=+419.242971180) (total time: 729.953592ms):
Trace[1216243324]: [727.483588ms] [725.545586ms] Transaction committed
I0502 12:36:20.210958       6 trace.go:76] Trace[752902646]: "Update /api/v1/namespaces/kube-system/configmaps/cattle-controllers" (started: 2019-05-02 12:36:19.391714734 +0000 UTC m=+419.197392811) (total time: 818.107324ms):
Trace[752902646]: [777.008563ms] [776.772762ms] Object stored in database
I0502 12:36:20.499649       6 trace.go:76] Trace[301300681]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:36:19.476764362 +0000 UTC m=+419.282526139) (total time: 1.02270943s):
Trace[301300681]: [825.505935ms] [825.379235ms] About to write a response
Trace[301300681]: [1.02270943s] [197.203495ms] END
I0502 12:36:20.551690       6 trace.go:76] Trace[1520707606]: "Get /apis/management.cattle.io/v3/clusters/c-dx942" (started: 2019-05-02 12:36:19.359289586 +0000 UTC m=+419.164970663) (total time: 1.192315184s):
Trace[1520707606]: [1.152026023s] [1.151875723s] About to write a response
[kube] 2019/05/02 12:36:21 Looks like there are no changes for ServiceMonitor "prometheus-cluster-monitoring"
2019-05-02 12:36:21.348268 W | etcdserver: apply entries took too long [153.826331ms for 1 entries]
2019-05-02 12:36:21.348858 W | etcdserver: avoid queries with large range/delete range!
[tiller] 2019/05/02 12:36:22 executing 0 post-upgrade hooks for cluster-monitoring
[tiller] 2019/05/02 12:36:22 hooks complete for post-upgrade cluster-monitoring
[storage] 2019/05/02 12:36:22 updating release "cluster-monitoring.v2150"
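
A quick way to quantify how many revisions Tiller has already stored: the log above shows "Storage driver is ConfigMap", and with that driver every release revision is kept as a ConfigMap labelled OWNER=TILLER and NAME=<release>. Below is a minimal client-go sketch for counting them; it assumes a reasonably recent client-go and a kubeconfig that is allowed to list ConfigMaps cluster-wide (it lists across all namespaces rather than guessing where the embedded Tiller stores them).

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Point the kubeconfig at the cluster that holds Tiller's release ConfigMaps.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Every redeploy adds one more revision, i.e. one more ConfigMap carrying
	// Tiller's release labels.
	cms, err := client.CoreV1().ConfigMaps(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		LabelSelector: "OWNER=TILLER,NAME=cluster-monitoring",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("cluster-monitoring has %d stored release revisions\n", len(cms.Items))
}
```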

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): v2.2.2
  • Installation option (single install/HA): single install

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Custom
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): 5x VM, 6 CPUs, 12 GiB
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:30:48Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version (use docker version):
Client:
 Version:           18.09.5
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        e8ff056dbc
 Built:             Thu Apr 11 04:44:28 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.4
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       d14af54
  Built:            Wed Mar 27 18:01:48 2019
  OS/Arch:          linux/amd64
  Experimental:     false
@XzenTorXz

We have similar issues with other apps (https://github.com/getsentry/sentry/). The deployment takes a long time, then it seems a time limit is hit (after 5 minutes) and it starts to redeploy the whole app (I never noticed this on previous versions).

@happydenn

We have also experienced the same issue; here's a screenshot of the ConfigMaps resulting from the redeploys:

screenshot

Fresh Rancher 2.2.2 install
Kubernetes 1.13.5
Docker 18.09.5

@deniseschannon added this to the v2.2.3 milestone May 8, 2019
@deniseschannon added the kind/bug label May 8, 2019
@Oats87
Contributor
Oats87 commented May 8, 2019

Encountered this on a 2.2.2 environment. This is bad, because it will very quickly cause the environment to become non-functional for the system project of that specific cluster.

@jiaqiluo
Member

Reproduced the bug on v2.2.2

Steps:

  • add a cluster with 3 etcd nodes, 2 control plane nodes and any number of worker nodes
  • enable the cluster monitoring

Result:

  • right after enabling the cluster monitoring, the app cluster-monitoring is deployed 6 times.

screenshot

@jiaqiluo
Member
jiaqiluo commented May 10, 2019

The bug fix is validated on Rancher: master 5d74988

Steps:

  • add a cluster with 3 etcd nodes, 2 control plane nodes and any number of worker nodes
  • enable the cluster monitoring
  • enable the project monitoring

Result:

  • notice that the apps cluster-monitoring and project-monitoring are each deployed only once.

screenshot

screenshot


Test 2:

  • run Rancher: v2.2.2
  • add two identical clusters and each has 3 etcd nodes, 2 control plane nodes and any number of worker nodes
  • on cluster1 enable the cluster monitoring
  • upgrade Rancher to v2.2.3-rc8
  • check if the cluster-monitoring app gets re-deployed multiple times
  • on cluster1 enable the project monitoring
  • on cluster2 enable the cluster monitoring and the project monitoring

Result:

  • the apps cluster-monitoring and project-monitoring are deployed only once.

@jiaqiluo
Member
jiaqiluo commented May 10, 2019

The issue is not fixed, because the app cluster-monitoring still gets redeployed several times when cluster monitoring is enabled.

We also see that whenever we deploy apps and workloads, cluster-monitoring gets re-deployed as well.

screenshot

screenshot

@jiaqiluo reopened this May 10, 2019
alena1108 pushed a commit that referenced this issue May 11, 2019
Problem:

Monitoring gets redeployed because the etcd params get updated; however, this is not expected, as the etcd addresses don't change at all.

Solution:

Sort them before assigning.

Issue:

#19945
alena1108 pushed a second commit with the same message that referenced this issue May 11, 2019
cjellick pushed a commit with the same message that referenced this issue May 14, 2019
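
The commit message boils down to an ordering problem: the list of etcd addresses fed into the monitoring app's answers could come back in a different order on each sync, so Rancher saw "changed" parameters and triggered another Helm upgrade even though the set of addresses was identical. A minimal illustrative sketch of that idea (hypothetical helper, not Rancher's actual code):

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// buildEtcdEndpoints stands in for the code path described in the commit:
// node addresses may arrive in a non-deterministic order on every sync.
func buildEtcdEndpoints(addresses []string) []string {
	endpoints := make([]string, 0, len(addresses))
	for _, a := range addresses {
		endpoints = append(endpoints, fmt.Sprintf("https://%s:2379", a))
	}
	// The fix: sort before assigning to the app's answers, so the comparison
	// with the previously deployed values is stable and no spurious upgrade
	// is triggered when only the ordering differs.
	sort.Strings(endpoints)
	return endpoints
}

func main() {
	first := buildEtcdEndpoints([]string{"10.0.0.3", "10.0.0.1", "10.0.0.2"})
	second := buildEtcdEndpoints([]string{"10.0.0.2", "10.0.0.3", "10.0.0.1"})
	fmt.Println(reflect.DeepEqual(first, second)) // true: same answers, no redeploy
}
```
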
@cjellick

fixed in master

@jiaqiluo
Member

The bug fix is validated on Rancher: master beffccd

Steps:

  • add several different kinds of clusters; in my case an EC2 cluster, a custom RKE cluster, and an imported GKE cluster
  • enable the cluster monitoring
  • deploy some workloads and apps
  • run automation tests on the cluster
  • leave the cluster idle for a while

Result:

  • notice that the apps cluster-monitoring and project-monitoring are deployed only once.

@manarhusrieh

Will this fix be released in a 2.2.x version, or will we have to wait until 2.3.x? We are currently unable to deploy monitoring to our cluster.

@spencergilbert

@manarhusrieh this has been resolved for us; we're on 2.2.4, but I believe it has been resolved since 2.2.2.

@manarhusrieh

@spencergilbert We are currently running 2.2.2 and the problem still exists. We will apply the 2.2.5 patch once it is released.

@manarhusrieh

Just to confirm: the problem is resolved in patch 2.2.5.
