Description
What did you do?
I added an `alert_relabel_configs` section to my Prometheus configuration. Afterwards I noticed that `prometheus_notifications_dropped_total` was increasing. I did not expect this metric to increase, because the dropping is intentionally configured.
Reading through `notifier.sendAll()`, it looks like the boolean result driving this metric is based on `numSuccess > 0`. I believe there is a scenario where the call to `relabelAlerts` returns a slice of length 0 for every Alertmanager, we `continue` each time, and `numSuccess == 0` when we hit the bottom of the function. The function then returns false, which incorrectly increments the dropped-alerts counter by the number of alerts before relabeling.
The way we caught this: we want to fire an alert internally to Prometheus but not send it to any of our Alertmanager instances.
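For illustration, such an internal-only alert carries a `notify` label that the `drop` rule in the configuration below matches. A minimal sketch of such a rule (the rule name and expression are hypothetical; only the label matches our real setup):

```yaml
groups:
  - name: internal-only-example
    rules:
      - alert: NodeConditionExample   # hypothetical name
        expr: vector(1)               # hypothetical expression
        labels:
          notify: node-condition-k8s  # matched by the drop relabel rule
```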
I would expect `prometheus_notifications_dropped_total` to exclude intentionally dropped alerts so that unintentional drops can be surfaced. Right now the metric includes both intentional and unintentional drops, which makes it very difficult to tell whether there is a problem sending alerts to Alertmanager.
What did you expect to see?
`prometheus_notifications_dropped_total` staying at 0.
What did you see instead? Under which circumstances?
`prometheus_notifications_dropped_total` increasing by the number of relabel `drop` matches.
System information
No response
Prometheus version
v3.1.0 linux/amd64
Prometheus configuration file
```yaml
alertmanagers:
  - alert_relabel_configs:
      - action: drop
        regex: node-condition-k8s
        source_labels:
          - notify
```
Alertmanager version
Alertmanager configuration file
Logs