This repository was archived by the owner on Jun 20, 2024. It is now read-only.
don't stall gossip broadcasting when there are blocked connections #1831
Closed
- level 2 headings for top-level instead of level 3
- put all status reporting as L3s under one L2 heading
- add 'Reboots' to ToC
- move 'Stopping weave' into its own section and move that
Update 'reboots' info Fixes #1774.
and eliminate duplication Fixes #1842.
And also make the error message more meaningful. Fixes #1843.
instead just don't auto-detect TLS args in that case. This makes `weave --local launch` work (again). Fixes #1844.
...rather than via an env var. This is cleaner.
changed my mind This reverts commit 20a8488.
…line" changed my mind This reverts commit be8260a.
clean up GossipSender
Previously, broadcasts were handled by one GossipSender per broadcast source (and channel). That sender delivers the broadcast to all next hops, and hence can stall when a single destination is stalled.

Here we get rid of these per-broadcast-source GossipSenders. Instead, broadcasts are sent to per-connection GossipSenders for all next hops. For this we use the existing GossipSenders we have set up for ordinary gossip.

In order to deal with broadcasts, they need one GossipData cell per broadcast source, since only broadcasts from the same source can be Merge()ed. So we add a PeerName->GossipData map of cells, in addition to the existing cell for ordinary gossip.

The GossipSender goroutine picks one of the cells at a time, Encode()s the contents and sends it. It prefers the ordinary gossip cell over the broadcast cells, since ordinary gossip is typically more important.

To reduce coupling, the GossipSenders don't actually know how to construct protocol messages. They just invoke a couple of functions for that, one for ordinary gossip and one for broadcast, which are supplied by the GossipChannel.

There are two downsides to this change:

1. Broadcasts get encoded per connection, rather than just once.
2. We can potentially end up with O(n_peers^2) cells, each containing accumulated (via GossipData.Merge()) broadcasts. For this to happen, the peer must
   - deal with gossip from most nodes. This doesn't happen in (near-)complete connection topologies. Hypercube topologies are probably the worst-case uniform topology, and a star topology is the worst case for a single peer (the centre).
   - have backlogged connections to most of its neighbours, without those connections being completely stalled (since that would cause heartbeat timeouts to terminate them).

Furthermore, for this to matter in practice, the accumulated broadcast GossipData must be sizeable:

- For topology gossip, GossipData is just a set of PeerNames, which takes very little space and is bounded, since there is a finite number of peers. And if we could get rid of the workaround for #1793 (cbaa92d), then topology broadcasts would only ever contain information about the source peer, so the GossipData would contain just one PeerName.
- IPAM only employs broadcast during initialisation and shutdown.
- DNS broadcasts contain DNS entries for containers on the source peer. Each entry will typically be 100-200 bytes. In the worst case, the accumulated broadcast GossipData from a peer will contain all of that peer's DNS entries. This includes tombstones, i.e. entries for containers that have died. If there is churn, i.e. DNS entries being added and removed continuously, and the churn rate exceeds the rate at which we can forward those entries, then the accumulated broadcast GossipData can grow unbounded. Note that this is the case on master too; the difference here is that we can have up to n_peers copies of that GossipData.
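The cell layout described above can be illustrated with a minimal, single-threaded sketch. It is not the code from this PR: the `sender` type, its method names, the `nameSet` toy data and the message formats are all hypothetical, and the real GossipSender runs a goroutine with locking that is omitted here. It only shows one ordinary gossip cell plus a per-source map of broadcast cells, with the gossip cell preferred when picking what to send next.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PeerName identifies a broadcast source (hypothetical stand-in type).
type PeerName string

// GossipData is a simplified stand-in: data that can be merged and encoded.
type GossipData interface {
	Merge(other GossipData) GossipData
	Encode() []byte
}

// nameSet is a toy GossipData: a set of strings.
type nameSet map[string]struct{}

// Merge returns the union; it assumes other is also a nameSet.
func (s nameSet) Merge(other GossipData) GossipData {
	merged := nameSet{}
	for k := range s {
		merged[k] = struct{}{}
	}
	for k := range other.(nameSet) {
		merged[k] = struct{}{}
	}
	return merged
}

// Encode renders the set as a sorted, comma-separated list.
func (s nameSet) Encode() []byte {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return []byte(strings.Join(keys, ","))
}

// sender holds the pending cells for ONE connection (hypothetical type).
type sender struct {
	gossip     GossipData              // cell for ordinary gossip
	broadcasts map[PeerName]GossipData // one cell per broadcast source
	// Message constructors are injected, so the sender does not need
	// to know the protocol message formats.
	makeGossipMsg    func(buf []byte) string
	makeBroadcastMsg func(src PeerName, buf []byte) string
}

func newSender() *sender {
	return &sender{
		broadcasts:       map[PeerName]GossipData{},
		makeGossipMsg:    func(buf []byte) string { return "gossip:" + string(buf) },
		makeBroadcastMsg: func(src PeerName, buf []byte) string { return "broadcast from " + string(src) + ":" + string(buf) },
	}
}

// Send merges data into the ordinary gossip cell.
func (s *sender) Send(data GossipData) {
	if s.gossip == nil {
		s.gossip = data
		return
	}
	s.gossip = s.gossip.Merge(data)
}

// Broadcast merges data into the cell for src only; cells of
// different sources are never merged with each other.
func (s *sender) Broadcast(src PeerName, data GossipData) {
	if pending, ok := s.broadcasts[src]; ok {
		s.broadcasts[src] = pending.Merge(data)
		return
	}
	s.broadcasts[src] = data
}

// next empties one cell and returns its message, preferring ordinary
// gossip. It reports false when nothing is pending.
func (s *sender) next() (string, bool) {
	if s.gossip != nil {
		data := s.gossip
		s.gossip = nil
		return s.makeGossipMsg(data.Encode()), true
	}
	for src, data := range s.broadcasts { // arbitrary pending source
		delete(s.broadcasts, src)
		return s.makeBroadcastMsg(src, data.Encode()), true
	}
	return "", false
}

func main() {
	s := newSender()
	s.Broadcast("a", nameSet{"c1": {}})
	s.Broadcast("a", nameSet{"c2": {}}) // merged into the cell for "a"
	s.Send(nameSet{"x": {}})
	for msg, ok := s.next(); ok; msg, ok = s.next() {
		fmt.Println(msg)
	}
}
```

Because each connection owns its own `sender`, a backlog on one connection only grows that connection's cells; the cost is that the same broadcast is held and encoded once per connection, which is the trade-off the downsides list describes.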
replaced by #1855
See commits for explanation.
Note that this PR is based on #1826.