Description
Currently, silence metric collection happens at scrape time. When Alertmanager is under heavy load, lock contention can occur and cause high scrape latency. One such scenario is when there are many aggregation groups and new silences are being added.
Would it be acceptable to collect the silence counts in the background instead of at scrape time? Doing so would reduce scrape latency by removing the lock contention from the scrape path. Contention can still occur in the background goroutine, but it no longer blocks the scrape.
Profile captured during high scrape latency:
```
-----------+-------------------------------------------------------
runtime.gopark build/lib/src/runtime/proc.go:424
runtime.goparkunlock build/lib/src/runtime/proc.go:430 (inline)
runtime.semacquire1 build/lib/src/runtime/sema.go:178
sync.runtime_SemacquireMutex build/lib/src/runtime/sema.go:95
sync.(*Mutex).lockSlow build/lib/src/sync/mutex.go:173
sync.(*Mutex).Lock build/lib/src/sync/mutex.go:92 (inline)
sync.(*RWMutex).Lock build/lib/src/sync/rwmutex.go:148
github.com/prometheus/alertmanager/silence.(*Silences).Query /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:797
github.com/prometheus/alertmanager/silence.(*Silencer).Mutes /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:145
github.com/prometheus/alertmanager/notify.(*MuteStage).Exec /build/gopath/src/github.com/prometheus/alertmanager/notify/notify.go:599
github.com/prometheus/alertmanager/notify.MultiStage.Exec /build/gopath/src/github.com/prometheus/alertmanager/notify/notify.go:512
github.com/prometheus/alertmanager/notify.RoutingStage.Exec /build/gopath/src/github.com/prometheus/alertmanager/notify/notify.go:495
github.com/prometheus/alertmanager/dispatch.(*Dispatcher).processAlert.func1 /build/gopath/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:423
github.com/prometheus/alertmanager/dispatch.(*aggrGroup).run.func1 /build/gopath/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:548
github.com/prometheus/alertmanager/dispatch.(*aggrGroup).flush /build/gopath/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:611
github.com/prometheus/alertmanager/dispatch.(*aggrGroup).run /build/gopath/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:547
-----------+-------------------------------------------------------
runtime.gopark build/lib/src/runtime/proc.go:424
runtime.goparkunlock build/lib/src/runtime/proc.go:430 (inline)
runtime.semacquire1 build/lib/src/runtime/sema.go:178
sync.runtime_SemacquireMutex build/lib/src/runtime/sema.go:95
sync.(*Mutex).lockSlow build/lib/src/sync/mutex.go:173
sync.(*Mutex).Lock build/lib/src/sync/mutex.go:92 (inline)
sync.(*RWMutex).Lock build/lib/src/sync/rwmutex.go:148
github.com/prometheus/alertmanager/silence.(*Silences).Query /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:797
github.com/prometheus/alertmanager/silence.(*Silences).CountState /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:827
github.com/prometheus/alertmanager/silence.newSilenceMetricByState.func1 /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:242
github.com/prometheus/client_golang/prometheus.(*valueFunc).Write /build/gopath/src/github.com/prometheus/client_golang/prometheus/value.go:95
github.com/prometheus/client_golang/prometheus.processMetric /build/gopath/src/github.com/prometheus/client_golang/prometheus/registry.go:633
github.com/prometheus/client_golang/prometheus.(*Registry).Gather /build/gopath/src/github.com/prometheus/client_golang/prometheus/registry.go:502
-----------+-------------------------------------------------------
```
PR to collect silence counts in a separate goroutine: