Description
After the Scorch presentation at GopherCon UK, a suggestion was made to consider alternative goroutine patterns for Scorch.
Background
Scorch currently uses dedicated goroutines for the following activities:
- introducer (introduces new segments into the current snapshot)
- persister (persists in-memory segments to disk, and also persists snapshot information)
- merger (attempts to merge segments so that they conform to the desired distribution)
One of the biggest challenges we face is how to balance the behaviors of these goroutines. First, both the persister and merger write to disk, so no matter what we do they compete for I/O bandwidth. Second, merge operations can become quite time-consuming as the segments grow larger, which makes it harder to "schedule" when to merge because the other operations complete significantly faster. In practice this often means the persister can "run far ahead of the merger", so we consume too many file descriptors. Letting the merger run helps (by reducing the number of files), but it means introducing some artificial coupling between the merger and persister. Attempts to do this so far have been unsatisfactory. Finally, when we force the persister to go slower, we end up consuming more memory because the introducer now runs ahead, introducing more in-memory segments.
One aspect not covered in the talk is that we also perform some in-memory merging of segments, which could also be key to a complete solution. But again, the current situation is a bit unsatisfactory, as the in-memory merging is done by the persister (perhaps not expected, obvious, or ideal).
Suggestion
https://twitter.com/marccoury/status/1025700998332014592
One goroutine per segment (a more Go-like approach)