database/raft: wait for ConfChange to be applied by jbowens · Pull Request #1335 · chain/Core · GitHub

database/raft: wait for ConfChange to be applied #1335

Merged: 2 commits, Jun 14, 2017

jbowens (Contributor) commented Jun 14, 2017

When adding a new node to the cluster, wait for the conf change to be
committed and applied before taking a state snapshot and responding
to the new node.

Previously, Join had a race condition between applying the conf change
entry and taking the snapshot. If the snapshot was taken before the
conf change was committed or applied, the new node would try to boot
its state machine from a state that did not include itself in the
configuration. It could mistakenly conclude that it was a single-node
cluster and try to elect itself, panicking because its node id did not
exist in the progress list. See issue #1330.

Waiting for the conf change to be applied ensures that:

  • The /raft/join endpoint returns only after the conf change adding the
    node to the cluster has been committed.
  • The snapshot returned from the /raft/join endpoint includes the new
    node in its configuration.

This is a backport fix for the 1.2.x release line.

jbowens (Contributor, Author) commented Jun 14, 2017

PTAL, backport of #1332 onto 1.2-stable

tessr (Contributor) commented Jun 14, 2017

LGTM

iampogo merged commit 2778c25 into 1.2-stable on Jun 14, 2017
iampogo deleted the 1.2-join-wait branch on June 14, 2017 01:04