For services that are not webapps, what does Dockerflow recommend we do for healthcheck endpoints?
For example, the Socorro processor is not a webapp and doesn't have anything to respond to HTTP, so there's nothing to implement healthchecks with.
Is it the case that all services must implement a webapp to handle Dockerflow healthcheck endpoints? Should we have something else for non-webapp services?
process_email_from_sqs.py is a long-running process that loops to poll an AWS queue and processes any emails. It periodically writes a healthcheck file to disk with the timestamp and some data. Email is unpredictable, and the standard library email processing expects spec-compliant emails, so there are uncaught exceptions that cause the process to crash. The AWS client library has some built-in retry logic, so connection issues can appear as a stuck process.
check_health.py is a second management command that attempts to read the healthcheck file. If the file doesn't exist, or there is an issue such as the in-data timestamp being too old, it exits with an error code. If everything is copacetic, it exits with code 0 for success.
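The write-then-check pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual management commands: the file location, the JSON layout, the `MAX_AGE_SECONDS` threshold, and the function names `write_healthcheck` and `check_healthcheck` are all assumptions made for the example.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional

# Hypothetical defaults; the real path and age limit are not given in this thread.
HEALTHCHECK_PATH = Path(tempfile.gettempdir()) / "healthcheck.json"
MAX_AGE_SECONDS = 120.0


def write_healthcheck(path: Path, data: dict) -> None:
    """Record a timestamp plus some data; meant to be called from the polling loop."""
    payload = {"timestamp": time.time(), **data}
    # Write to a sibling file and rename it, so a reader never sees a partial file.
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps(payload))
    tmp_path.replace(path)


def check_healthcheck(path: Path, max_age: float, now: Optional[float] = None) -> int:
    """Return 0 if the file exists, parses, and is fresh; return 1 otherwise."""
    now = time.time() if now is None else now
    try:
        payload = json.loads(path.read_text())
        age = now - float(payload["timestamp"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, invalid JSON, or a missing or malformed timestamp.
        return 1
    return 0 if age <= max_age else 1
```

A checker command along these lines would end with something like `sys.exit(check_healthcheck(HEALTHCHECK_PATH, MAX_AGE_SECONDS))`, so the probe sees a nonzero exit code whenever the worker has stopped updating the file.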
The process_email_from_sqs.py command is run as a Kubernetes deployment with several replicas. The check_health.py command runs as a liveness probe. The spec looks something like this:
Sentry reports hundreds of liveness probe failures a day, but it takes several failures in a row to terminate a process. It is more common for a process to terminate due to an uncaught exception, but the liveness check does prevent zombie replicas from sticking around until the next deployment.
I'm negative on a webservice for each background process, but we could re-implement this as a webservice that runs process_emails_from_sqs.py in a fork, sends health data over a pipe, and serves the health data at /__heartbeat__, with a proper status code for a stalled process. I don't think it would make much sense to expose this webservice to the world; it would exist only to make a background service look like a web service.
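For comparison, the wrapper idea could look roughly like the sketch below. It is only an illustration under several assumptions: it uses `multiprocessing.Process` and `multiprocessing.Pipe` in place of a raw fork, the `worker` function is a stand-in for the real polling loop, and the names `Monitor`, `pump`, `make_handler`, the 60-second threshold, the 503 status for a stalled worker, and the port are all hypothetical.

```python
import json
import multiprocessing
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_SILENCE_SECONDS = 60.0  # hypothetical threshold for "stalled"


def worker(conn) -> None:
    """Stand-in for the polling loop; sends a heartbeat after each cycle."""
    while True:
        # ... poll the queue and process messages here ...
        conn.send({"timestamp": time.time()})
        time.sleep(1)


class Monitor:
    """Holds the most recent heartbeat received from the child process."""

    def __init__(self) -> None:
        self.last = None
        self._lock = threading.Lock()

    def record(self, beat: dict) -> None:
        with self._lock:
            self.last = beat

    def status(self, now: float, max_silence: float = MAX_SILENCE_SECONDS):
        with self._lock:
            last = self.last
        if last is None or now - last["timestamp"] > max_silence:
            return 503, {"status": "stalled", "last": last}
        return 200, {"status": "ok", "last": last}


def pump(conn, monitor: Monitor) -> None:
    """Copy heartbeats from the pipe into the monitor until the child exits."""
    try:
        while True:
            monitor.record(conn.recv())
    except EOFError:
        pass  # child closed its end; status() will report a stall over time


def make_handler(monitor: Monitor):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/__heartbeat__":
                self.send_error(404)
                return
            code, body = monitor.status(time.time())
            raw = json.dumps(body).encode()
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(raw)))
            self.end_headers()
            self.wfile.write(raw)

    return Handler


def main() -> None:
    # One-way pipe: the parent reads, the child writes.
    parent_conn, child_conn = multiprocessing.Pipe(duplex=False)
    child = multiprocessing.Process(target=worker, args=(child_conn,), daemon=True)
    child.start()
    child_conn.close()  # the parent keeps only the read end
    monitor = Monitor()
    threading.Thread(target=pump, args=(parent_conn, monitor), daemon=True).start()
    # Bind to localhost only; the endpoint is meant for the probe, not the world.
    ThreadingHTTPServer(("127.0.0.1", 8000), make_handler(monitor)).serve_forever()
```

Calling `main()` under an `if __name__ == "__main__":` guard would start the child and the server. The sketch also shows the cost of this approach: an HTTP server, a pipe reader thread, and a child process exist solely so the probe can use HTTP instead of an exit code.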