Description
This issue was motivated by the PR #2010, that addresses #646 and #652. The problem, however, is more general and possibly causes problems when stopping multiple services/reactors.
In short, when we call Stop()
for an implementation of service.Service
, the service.BaseService
implementation of this method on service/service.go does the following:
- Sanity check: is the service started and not already stopped
- Changes the status of the service to stopped, causing
IsRunning()
to return false - Calls
OnStop()
, which is supposed to be overridden by service implementations - Closes the
Quit()
channel, which also releases blocking calls toWait()
Implementations of the OnStop()
method are supposed, in particular, to stop sub-routines, typically started upon OnStart()
. The problem is how the routines know they have to leave, when the service was or is being stopped.
The typical approach is to use the Quit()
channel for that, since once it is closed the service has been stopped. But this channel is only closed once closed once OnStop()
returns (see above). While OnStop()
implementations Are supposed to stop all sub-routines. We thus have short circuit here.
The solution adopted in #2010 consists in overriding the Stop()
method, not the OnStop()
method as expected by the service definition. The overridden method calls BaseService.Stop()
, so that to, in particular, close the Quit()
channel. The sub-routines should eventually, once realizing that this channel is closed, return and notify this to a synchronized abstraction (typically, a sync.WaitGroup
). The overridden Stop()
method will then block on the synchronized abstraction before returning.
While this solution, despite the added complexity, mostly works, we have an additional problem here. The Wait()
call returns once the Quit()
channel is closed. This channel is closed by BaseService.Stop()
, once the OnStop()
method returns. This is the right way to implement this functionality. But when we override the Stop()
method, so that it waits for all sub-routines to quit, we first call BaseService.Stop()
, as above described. Which means that the Wait()
method is released and returns. While the service is not yet completely stopped, as the overridden version of Stop()
is still waiting for sub-routines to quit. Moreover, the same problem arises if another component or service is using Quit()
to identify whether this service has been stopped.
In summary, we can get rid of the short circuit created by OnStop()
unblocking Quit()
, while Quit()
being unblocked is a condition for sub-routines to quit, which is what OnStop()
is supposed to implement. But with this work-around we break the definition of the Wait()
and Quit()
methods, which do not actually wait until the service is fully stopped.