8000 Implement uncoordinated upgrades on e2e by sergio-mena · Pull Request #238 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Implement uncoordinated upgrades on e2e #238

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jan 31, 2023
Merged

Conversation

sergio-mena
Copy link
Contributor
@sergio-mena sergio-mena commented Jan 30, 2023

Closes #56 and #167

This PR builds on existing "multiversion" infra to change a node's docker image on the fly (stop, change image, restart).

I spent some time researching solutions, with the intention of finding solutions as surgical as possible. Please review the two approaches included in this PR:

  • Approach 1: please review corresponding commit in isolation.
    • This approach changes the docker compose file just after having started all nodes
    • Then, upon a node upgrade, we call "docker-compose up", which recreates the container with the new image, but keeps the old data and config (since they are stored in a volume)
    • This is more surgical than Approach 2, but has a BIG disadvantage: we lose the docker logs for the original container
    • This commit has been reverted, because Approach2 was preferred
  • Approach 2: please review corresponding commit in isolation.
    • In this approach, when the upgrade version and the node's version don't match, we create an extra container for that node in docker compose, which shares the same volumes (same config and data dirs)
    • Upgrades are modeled as perturbations (surgical), which allows for uncoordinated (minor) upgrade
    • When it is time to upgrade, we stop the node's initial container, and start the extra container
    • This is less surgical than Approach 1, but it is preferred because we can keep all logs

The first commit of this PR is also fixing the e2e infra for tendermint-based releases (<= v0.34.24)

With this PR, we can define testnet manifests that can:

  • Test uncoordinated upgrades for minor versions. Many nodes can be defined with a mix of initial versions
  • Test upgrades for major versions with a single node
  • The logic can be easily extended to test coordinated (major) upgrades (not done in this PR)

I still need to update the changelog and the doc, but first, I'd like to have the two approaches reviewed.


PR checklist

  • Tests written/updated, or no t 8000 ests needed
  • CHANGELOG_PENDING.md updated, or no changelog entry needed
  • Updated relevant documentation (docs/) and code comments, or no
    documentation updates needed

@sergio-mena sergio-mena requested a review from a team as a code owner January 30, 2023 10:10
@sergio-mena sergio-mena self-assigned this Jan 30, 2023
@sergio-mena sergio-mena added the rename Renaming our fork label Jan 30, 2023
Copy link
Contributor
@thanethomson thanethomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I tried running this locally but got the same kinds of errors that were seen in the nightlies, so I assume that #237 will fix that.

Comment on lines 73 to +74
- ./{{ .Name }}:/cometbft
- ./{{ .Name }}:/tendermint
Copy link
Contributor
@thanethomson thanethomson Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work as expected? (i.e. the same host folder being mounted in two different places)

Never tried this before with Docker 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do this, since older images ( < v0.34.24) use /tendermint as home path, and we can no longer change them.

In normal conditions, I agree it's dangerous to mount the same volume in two different places, but in this case, I think we are safe because:

  • Old images will only use /tendermint
  • New images will only use /cometbft

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, the docker-compose didn't work for old images

{{- if eq .ABCIProtocol "builtin" }}
entrypoint: /usr/bin/entrypoint-builtin
{{- else }}{{ if eq .ABCIProtocol "builtin_unsync" }}
{{- if or (eq .ABCIProtocol "builtin") (eq .ABCIProtocol "builtin_unsync") }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Didn't know one could do this in templates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah :-)
I needed to get familiar with templates while working on this, and found this trick by chance

@sergio-mena
Copy link
Contributor Author

so I assume that #237 will fix that.

Let me know if that's not the case. I'll keep this PR open until tomorrow

@sergio-mena sergio-mena merged commit b27bb3a into main 8000 Jan 31, 2023
@sergio-mena sergio-mena deleted the sergio/e2e-upgrade branch January 31, 2023 09:46
@sergio-mena sergio-mena added the backport-to-v0.37.x Tell Mergify to backport the PR to v0.37.x label Jan 31, 2023
mergify bot pushed a commit that referenced this pull request Jan 31, 2023
* Fix multiversion e2e when `version <= v0.34.24`

* Changes we want no matter what we do

* Approach 1: With docker-compose up recreating image

* Revert "Approach 1: With docker-compose up recreating image"

This reverts commit 068f5b78d3b7cf99fb21296707117e4f4e2ef3e0.

* Approach 2: With distinct docker-compose service

* Fix the case when 'upgrade' is not the last perturbation

* Update generator

* Simplify template

* bump

* Update test/e2e/pkg/manifest.go

* changelog

* doc

* fix doc

(cherry picked from commit b27bb3a)
sergio-mena added a commit that referenced this pull request Jan 31, 2023
* Fix multiversion e2e when `version <= v0.34.24`

* Changes we want no matter what we do

* Approach 1: With docker-compose up recreating image

* Revert "Approach 1: With docker-compose up recreating image"

This reverts commit 068f5b78d3b7cf99fb21296707117e4f4e2ef3e0.

* Approach 2: With distinct docker-compose service

* Fix the case when 'upgrade' is not the last perturbation

* Update generator

* Simplify template

* bump

* Update test/e2e/pkg/manifest.go

* changelog

* doc

* fix doc

(cherry picked from commit b27bb3a)

Co-authored-by: Sergio Mena <sergio@informal.systems>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-v0.37.x Tell Mergify to backport the PR to v0.37.x rename Renaming our fork
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Improve QA infra to test upgrades on e2e/testnets
2 participants
0