database/raft: wait for ConfChange to be applied #1335

jbowens · 2017-06-14T00:58:12Z

When adding a new node to the cluster, wait for the conf change to be
committed and applied before taking a state snapshot and responding
to the new node.

Previously, Join had a race condition between the application of the
conf change entry and the taking of the snapshot. If the snapshot was
taken before the conf change was committed or applied, the new node
would try to boot its state machine from a state that did not include
itself in the configuration. It could mistakenly conclude that it was a
single-node cluster and try to elect itself, panicking when its node
ID was missing from the progress list. See issue #1330.

Waiting for the conf change to be applied ensures that:

* The /raft/join endpoint only returns if adding the node to the cluster
  was committed.
* The snapshot returned from the /raft/join endpoint includes the new
  node in its configuration.
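
For illustration only, the ordering described above (propose the conf change, block until it has been applied, then snapshot) can be sketched as follows. This is not the code from this PR: the `cluster` type, `join`, `applyConfChange`, and the `propose` callback are hypothetical names, and the toy membership map stands in for the real raft state machine.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
	"time"
)

// cluster is a toy stand-in for a raft-backed service (hypothetical type).
type cluster struct {
	mu      sync.Mutex
	members map[uint64]bool          // applied configuration
	waiters map[uint64]chan struct{} // closed once the conf change for that node ID is applied
}

func newCluster(self uint64) *cluster {
	return &cluster{
		members: map[uint64]bool{self: true},
		waiters: make(map[uint64]chan struct{}),
	}
}

// applyConfChange is what an apply loop would call after the entry commits.
// It updates the membership and wakes any join call waiting on this node ID.
func (c *cluster) applyConfChange(id uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.members[id] = true
	if ch, ok := c.waiters[id]; ok {
		close(ch)
		delete(c.waiters, id)
	}
}

// snapshot returns the applied membership as a sorted slice of node IDs.
func (c *cluster) snapshot() []uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	ids := make([]uint64, 0, len(c.members))
	for id := range c.members {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// join registers a waiter before proposing, so an apply that happens
// immediately cannot be missed, and snapshots only after the apply.
func (c *cluster) join(id uint64, timeout time.Duration, propose func(uint64)) ([]uint64, error) {
	applied := make(chan struct{})
	c.mu.Lock()
	c.waiters[id] = applied
	c.mu.Unlock()

	propose(id)

	select {
	case <-applied:
		return c.snapshot(), nil
	case <-time.After(timeout):
		c.mu.Lock()
		delete(c.waiters, id)
		c.mu.Unlock()
		return nil, errors.New("timed out waiting for conf change to be applied")
	}
}

func main() {
	c := newCluster(1)
	// Simulate the entry being committed and applied a little later.
	propose := func(id uint64) {
		go func() {
			time.Sleep(10 * time.Millisecond)
			c.applyConfChange(id)
		}()
	}
	snap, err := c.join(2, time.Second, propose)
	fmt.Println(snap, err) // prints "[1 2] <nil>"
}
```

Registering the waiter before calling `propose` is the design choice that matters in this sketch: if the waiter were registered afterwards, an apply that completed first would never wake the caller.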

This is a backport fix for the 1.2.x release line.

jbowens · 2017-06-14T00:58:52Z

PTAL, backport of #1332 onto 1.2-stable

tessr · 2017-06-14T01:00:45Z

LGTM

jbowens added 1.2 PTAL labels Jun 14, 2017

tessr approved these changes Jun 14, 2017

auto rev id

1ae36ae

iampogo merged commit 2778c25 into 1.2-stable Jun 14, 2017

iampogo deleted the 1.2-join-wait branch June 14, 2017 01:04

jbowens mentioned this pull request Jun 14, 2017

net/raft: panic joining existing cluster #1330

Closed
