Replies: 4 comments
-
Since your load balancers are already redirecting traffic away from servers running old code versions, I think it makes sense to also stop the RQ workers running on those servers and start them again after the new code is properly deployed. Does this make sense? Having said that, I'm also open to ideas about implementing some kind of "filter" such that certain workers will skip certain jobs, but I'm not sure how we can achieve that in a scalable way.
-
Hey @selwin, thanks for the reply! I misspoke about our load balancers' capabilities. They don't detect old code versions; they only detect servers undergoing deployment. So in a three-server deployment A-B-C, server B will not receive traffic during its "deployment", while A and C will receive traffic while running different code versions. This is typically fine, but with workers, server A could enqueue a job for a function that C doesn't have yet (since it's behind a code version). If server C's workers pick up that job, it fails. The only thing we can come up with so far is not a "filter", but an override:

```python
import time

from rq.worker import SimpleWorker, WorkerStatus


class AssertCodeVersionSimpleWorker(SimpleWorker):
    def check_appropriate_code_version(self):
        # TODO: DETERMINE IF THE CODE VERSION IS WRONG - HOW DO WE DO THIS?
        return result

    def check_for_suspension(self, *args, **kwargs):
        before_state = None
        notified = False
        super().check_for_suspension(*args, **kwargs)

        wrong_code_version = not self.check_appropriate_code_version()
        while wrong_code_version and not self._stop_requested:
            if not notified:
                self.log.info('Worker suspended due to outdated code version')
                before_state = self.get_state()
                self.set_state(WorkerStatus.SUSPENDED)
                notified = True
            time.sleep(5)  # we'll probably use 1 second in production
            wrong_code_version = not self.check_appropriate_code_version()

        if before_state:
            self.set_state(before_state)
```

Do you see any issues with this implementation? It works well in our isolated tests, although the biggest struggle right now is the code for `check_appropriate_code_version`; we hope there is some scalable way of doing that. This implementation overrides `check_for_suspension` since it runs on every work-loop iteration before a job is dequeued.
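One possible shape for `check_appropriate_code_version`, assuming the deploy pipeline stamps a version file onto each server and publishes the fleet-wide target version to a Redis key. Both the file path and the key name `deploy:leader_version` are assumptions of this sketch, not RQ conventions:

```python
# Hypothetical sketch, not RQ API: compare this server's stamped version
# against the fleet-wide "leader" version published at deploy time.

def read_local_version(path="/srv/app/VERSION"):
    """Version stamped onto this server by the deploy pipeline (assumed)."""
    with open(path) as f:
        return f.read().strip()

def is_code_current(local_version, leader_version):
    """The worker should only take jobs when the versions match."""
    return local_version is not None and local_version == leader_version

# Inside the worker subclass it could look like (hedged):
#
#     def check_appropriate_code_version(self):
#         leader = self.connection.get("deploy:leader_version")
#         return is_code_current(read_local_version(),
#                                leader.decode() if leader else None)
```

The pure comparison is kept separate from the Redis lookup so it can be unit-tested without a running Redis.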
-
Maybe you could use a different Queue for the other version? Old workers with "old" queues and new workers with "new" queues, for example.
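For example, a hedged sketch that versions the queue names themselves; the naming scheme is invented here, not an RQ feature:

```python
# Each deploy enqueues to, and listens on, a queue suffixed with its
# own code version, so old workers never see jobs from newer code.

def versioned_queue_name(base: str, code_version: str) -> str:
    """e.g. 'default' + 'v42' -> 'default-v42'."""
    return f"{base}-{code_version}"

# Usage (hedged; CODE_VERSION and my_task are placeholders):
#
#     from redis import Redis
#     from rq import Queue
#
#     q = Queue(versioned_queue_name("default", CODE_VERSION),
#               connection=Redis())
#     q.enqueue(my_task)
#
# and the workers on the same server would listen only on that queue:
#
#     rq worker default-<CODE_VERSION>
```

One downside to watch: jobs left on an old version's queue after its last worker stops would need to be drained or migrated.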
-
This is an interesting issue indeed. However interesting it may be, I do think this will always be very specific to the use case/environment: what a "code version" is will differ per setup. This does seem like a perfect example of how useful custom worker classes can be. I'm moving this to a Discussion.
-
We use django-rq across a multi-server fleet, with rolling deployments that roll out new code one server at a time. This leaves significant time windows where different servers are running different code versions.
Our load balancers properly redirect traffic away from servers running old code versions, but those servers have RQ workers that will continue to pick up jobs from the queues.
We are very interested in any advice on how we could override either the worker class or the queue class in order to verify the code version of the running worker before taking a job out of the queue and running it on that worker.
Basically, a "filter" for which jobs a worker takes: the worker checks its code version against the "leader" code version and passes on any jobs that don't match, until it's updated and restarted. I hope this makes sense!
Thanks to anyone who has devised a solution for this.
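To make the "leader" version idea concrete, here's a rough sketch of the deploy-side half: the pipeline records the fleet-wide target version once the rollout completes. The Redis key name is purely hypothetical:

```python
# Hypothetical deploy-side counterpart to a worker-side version check:
# after the last server is updated, record the fleet-wide target version.

LEADER_KEY = "deploy:leader_version"  # assumed key name, not an RQ convention

def publish_leader_version(redis_conn, version: str) -> None:
    """Called by the deploy pipeline when the rollout completes."""
    redis_conn.set(LEADER_KEY, version)

def get_leader_version(redis_conn):
    """What a worker would compare its local version against."""
    raw = redis_conn.get(LEADER_KEY)
    return raw.decode() if isinstance(raw, bytes) else raw
```

Workers on servers that haven't been updated yet would see a leader version that differs from their own and could skip or defer jobs until restarted.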