Sentry
- On Monday, check all the alerts from the past weekend (see the sketch after this list).
- Fix recurring alerts.
- Deploy fixes to production for issues that cause major disruption or complete downtime.
- Verify that no alerts are being triggered anymore.
- Bring more complicated issues to the team and prioritize them correctly.
- Clean up all past alerts so we have an easy-to-navigate dashboard.
- Link GitHub and Sentry issues.
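A minimal sketch of pulling the past week's unresolved issues from the Sentry API for the Monday review, assuming an auth token in SENTRY_TOKEN and placeholder organization/project slugs (the real slugs differ):

```python
# Sketch: list unresolved Sentry issues seen during the past 7 days.
# ORG and PROJECT are hypothetical placeholders, not our real slugs.
import os

import requests

ORG = "example-org"          # hypothetical organization slug
PROJECT = "example-project"  # hypothetical project slug

resp = requests.get(
    f"https://sentry.io/api/0/projects/{ORG}/{PROJECT}/issues/",
    headers={"Authorization": f"Bearer {os.environ['SENTRY_TOKEN']}"},
    params={"query": "is:unresolved", "statsPeriod": "7d"},
    timeout=30,
)
resp.raise_for_status()

for issue in resp.json():
    print(issue["count"], issue["shortId"], issue["title"], issue["permalink"])
```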
Grafana
React to alerts arriving through email: check the SLO monitoring page (Packit section) and respond to the email so others know what is happening. Suggest updates to the alert thresholds if needed.
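To cross-check what arrived by email, here is a rough sketch that lists the currently firing alerts, assuming the instance uses Grafana unified alerting with the built-in Alertmanager and that GRAFANA_URL/GRAFANA_TOKEN are placeholders; the actual setup may expose alerts differently:

```python
# Sketch: list alerts currently firing in Grafana's built-in Alertmanager.
# GRAFANA_URL and GRAFANA_TOKEN are assumptions, not real values.
import os

import requests

base = os.environ.get("GRAFANA_URL", "https://grafana.example.com")
resp = requests.get(
    f"{base}/api/alertmanager/grafana/api/v2/alerts",
    headers={"Authorization": f"Bearer {os.environ['GRAFANA_TOKEN']}"},
    params={"active": "true"},
    timeout=30,
)
resp.raise_for_status()

for alert in resp.json():
    labels = alert.get("labels", {})
    print(alert.get("startsAt"), labels.get("alertname"), labels.get("severity"))
```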
Watch our other two Grafana dashboards as well:
SLO1 issues investigation
We are investigating SLO1 issues. They could be related to short-running tasks taking more than half a minute to complete.
When looking at the Celery monitoring dashboard, pay attention to short-running tasks and how long they took to complete.
For the moment, we can report misbehaving tasks here.
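As an illustration only (not our actual monitoring code), a sketch that uses Celery's task_prerun/task_postrun signals to log tasks that go over the 30-second mark; the threshold value and logger name are assumptions:

```python
# Sketch: flag Celery tasks that take longer than 30 s to complete.
# The real numbers come from the Celery monitoring dashboard in Grafana.
import logging
import time

from celery.signals import task_postrun, task_prerun

THRESHOLD_SECONDS = 30  # assumed threshold from the SLO1 investigation
_started = {}

logger = logging.getLogger("slo1-investigation")


@task_prerun.connect
def record_start(task_id=None, task=None, **kwargs):
    # Remember when the task started.
    _started[task_id] = time.monotonic()


@task_postrun.connect
def report_duration(task_id=None, task=None, **kwargs):
    # Log tasks whose runtime exceeded the threshold.
    started = _started.pop(task_id, None)
    if started is None:
        return
    duration = time.monotonic() - started
    if duration > THRESHOLD_SECONDS:
        logger.warning(
            "Task %s took %.1f s (over %d s)", task.name, duration, THRESHOLD_SECONDS
        )
```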
CI/Zuul
You are responsible throughout the week for keeping the CI green, that is, looking for and driving the resolution of systematic CI failures.
It can happen that a CI system has an outage. For problems related to Zuul, please reach out to the team in the #sf-ops channel on matrix.org or in the #rhos-ops Slack channel.
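One possible way to spot systematic failures, sketched against the GitHub Checks API with a placeholder repository, branch, and a GITHUB_TOKEN environment variable; CI results reported only as commit statuses would need the statuses endpoint instead:

```python
# Sketch: list failed check runs on the tip of an assumed default branch
# of a hypothetical repository, to spot systematic CI failures.
import os

import requests

OWNER = "example-org"   # hypothetical
REPO = "example-repo"   # hypothetical
BRANCH = "main"         # assumed default branch

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/commits/{BRANCH}/check-runs",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    timeout=30,
)
resp.raise_for_status()

for run in resp.json()["check_runs"]:
    if run["conclusion"] == "failure":
        print(run["name"], run["html_url"])
```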
pre-commit-ci
Once the pre-commit-ci user creates updates to our pre-commit configs, take care of the pull requests:
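A small sketch, with a placeholder repository and a GITHUB_TOKEN environment variable, that lists the open pull requests opened by the pre-commit-ci[bot] account so none of them are forgotten:

```python
# Sketch: list open pull requests opened by pre-commit.ci in a
# hypothetical repository.
import os

import requests

OWNER = "example-org"   # hypothetical
REPO = "example-repo"   # hypothetical

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    params={"state": "open"},
    timeout=30,
)
resp.raise_for_status()

for pr in resp.json():
    if pr["user"]["login"] == "pre-commit-ci[bot]":
        print(f"#{pr['number']}", pr["title"], pr["html_url"])
```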
OpenShift
If you think there's something wrong with the OpenShift instance we're running in:
- Automotive cluster - ask in packit-auto-shared-infra in internal Google Chat, or email auto-packit-shared-infra@redhat.com
- Managed Platform Plus - ask in #help-it-cloud-openshift in internal Slack