-
Notifications
You must be signed in to change notification settings - Fork 9.6k
notifier: fix drop alertmanager on apply config step #13119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: tommy0 <tommy0@mail.ru>
d4838e8
to
7fe64e9
Compare
We already have a proper fix for this, which is the command line flag Can you please try it? |
@roidelapluie |
@roidelapluie any news? This bug can't be fixed with new service discovery and it looks important enough to me as users lose some alerts just because of config reload. |
I need to continue testing this. I have started already. |
@roidelapluie Any updates? |
@roidelapluie sorry for ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came across this, again.
Thanks for looking into this.
I have some remarks.
@@ -264,6 +264,14 @@ func (n *Manager) ApplyConfig(conf *config.Config) error { | |||
return err | |||
} | |||
|
|||
// If alertmanagers already exists, don't skip it, in reload step will be refreshed new targets. | |||
oldAmSets, ok := n.alertmanagers[k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think we should rely on k
for this, given how it’s constructed:
Lines 910 to 917 in 5c41768
// ToMap converts a slice of *AlertmanagerConfig to a map. | |
func (a AlertmanagerConfigs) ToMap() map[string]*AlertmanagerConfig { | |
ret := make(map[string]*AlertmanagerConfig) | |
for i := range a { | |
ret[fmt.Sprintf("config-%d", i)] = a[i] | |
} | |
return ret | |
} |
ams
which isn't desired.
I'd expect a test case for that as well.
I think we'll need to compare the configs as we do in other places. If the AlertmanagerConfig
is already known, temporarily use the corresponding alertmanagerSet
's ams
.
We'll need to agree on certain behaviors:
- Should we set the existing
droppedAms
as well? - Perhaps we don't want the entire
AlertmanagerConfig
to be the same. Maybe equalServiceDiscoveryConfigs
(andRelabelConfigs
) are sufficient, but it wouldn't be safe.
I think this is fixed by #14174 |
I don't think #14174 was meant to fix this, the I'll try to revive this with the fix I proposed here #13119 (comment) |
we can close this as #14987 was merged. |
When the config is applied, set of alertmanagers are recreated with an empty list of URLs (
ams []alertmanager
init alertmanagerSet). Before reload will happen,sendAll
will skip the notification batch, because alertmnager list is empty ( skip for ). I think it is necessary to save the list of alertmanagers when applying the config, since it will be updated at the next reload step.A test case is added to demonstrate issue above.
Fixes #13064
In our cases, we observe that metric prometheus_notifications_dropped_total increases and heartbeat notifications in alertmanager are skipped, this fix repare this.