sudo reboot mb validator recovery fails · Issue #35190 · solana-labs/solana · GitHub
This repository was archived by the owner on Jan 22, 2025. It is now read-only.
sudo reboot mb validator recovery fails #35190
Closed
@john-smith-solana

Description

Problem

Version 1.17.20, tested 10-12 Feb 2024.

I was curious about a mainnet-beta (mb) validator’s ability to recover, so I used a spare non-voting mb validator to see whether it could recover from an abrupt sudo reboot.

I ran the default validator settings, so incremental snapshots are taken every minute and full snapshots every 3 hours.

I tried 2 different validator startup scripts:
Script A: included --use-snapshot-archives-at-startup when-newest
Script B: omitted that flag
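
The difference between the two scripts can be sketched as follows. This is only an illustration: the issue does not show the real startup scripts, so the base arguments (the ledger path in particular) are assumptions, and `build_command` is a hypothetical helper. Only the `--use-snapshot-archives-at-startup when-newest` pair comes from the description above.

```python
import shlex

# Hypothetical base command; the real startup script is not shown in the issue.
BASE_ARGS = [
    "solana-validator",
    "--ledger", "/mnt/solana-ledger",  # assumed path, not taken from the issue
    "--no-voting",                     # the test validator was non-voting
]


def build_command(use_startup_flag: bool) -> list[str]:
    """Return argv for Script A (flag present) or Script B (flag absent)."""
    args = list(BASE_ARGS)
    if use_startup_flag:
        args += ["--use-snapshot-archives-at-startup", "when-newest"]
    return args


script_a = build_command(True)
script_b = build_command(False)

# Print both command lines so the single-flag difference is easy to see.
print(shlex.join(script_a))
print(shlex.join(script_b))
```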


Test Methodology 1:

sudo reboot when incremental snapshots are available and less than 1 minute old:

Script A: 3 reboots, 3 successful recoveries each in approx 13 mins
Script B: 3 reboots, 3 successful recoveries each in approx 15 mins

So far so good!


Test Methodology 2:

However, approximately 10 mins before the 3-hour full snapshot is due, the validator stops creating minute-by-minute incrementals and creates only the next full snapshot. As a result, the last incremental can become up to ~15 mins old. [At least, this is my interpretation of what it looks like it's doing!]
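
To reproduce this, it may help to measure how old the newest incremental snapshot is just before rebooting. The following is a minimal sketch, not part of the original test: it assumes the archives follow the `incremental-snapshot-<base slot>-<slot>-<hash>.tar.<ext>` naming pattern and uses file modification time as a proxy for age. The function name and the directory argument are hypothetical.

```python
import re
import time
from pathlib import Path
from typing import Optional

# Assumed archive naming: incremental-snapshot-<base slot>-<slot>-<hash>.tar.<ext>
INCREMENTAL_RE = re.compile(r"^incremental-snapshot-(\d+)-(\d+)-\w+\.tar(\.\w+)?$")


def newest_incremental_age_minutes(
    snapshot_dir: str, now: Optional[float] = None
) -> Optional[float]:
    """Age in minutes of the most recently written incremental archive.

    Returns None when the directory holds no incremental archive.
    """
    now = time.time() if now is None else now
    mtimes = [
        p.stat().st_mtime
        for p in Path(snapshot_dir).iterdir()
        if p.is_file() and INCREMENTAL_RE.match(p.name)
    ]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 60.0
```

Calling this on the validator's snapshot directory immediately before issuing the reboot would give the "Incremental N mins old" figure used in the results.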

sudo reboot at various times with an old/aging incremental during full snapshot creation (for clarity this is approx a 15 minute window every 3 hours):

Script A:
Incremental 8 mins old: Failed
Incremental 6 mins old: Failed
Incremental 5 mins old: Failed

Script B:
Incremental 7 mins old: Success, took 19 mins
Incremental 9 mins old: Success, took 20 mins
Incremental 13 mins old: Success, took 23 mins

For Test Methodology 2 using Script A, every attempt failed with the same error:

ERROR solana_ledger::bank_forks_utils] Failed to load bank: AccountsFile error: AppendVecError: incorrect layout/length/data in the appendvec at path /mnt/solana-accounts/run/247471897.30603
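
When comparing failures across reboots, it can be useful to pull the offending file out of the log line. The sketch below extracts the path from the error message quoted above. Reading the trailing `247471897.30603` as `<slot>.<file id>` is an assumption about the accounts file naming, and `parse_appendvec_error` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional


class AppendVecFailure(NamedTuple):
    path: str
    slot: int      # assumed meaning of the number before the dot
    file_id: int   # assumed meaning of the number after the dot


# Matches the path that follows "appendvec at path" and its "<digits>.<digits>" name.
PATH_RE = re.compile(r"appendvec at path (\S*/(\d+)\.(\d+))")


def parse_appendvec_error(line: str) -> Optional[AppendVecFailure]:
    """Extract the accounts file path and its two numeric parts, if present."""
    m = PATH_RE.search(line)
    if m is None:
        return None
    return AppendVecFailure(m.group(1), int(m.group(2)), int(m.group(3)))


line = (
    "ERROR solana_ledger::bank_forks_utils] Failed to load bank: "
    "AccountsFile error: AppendVecError: incorrect layout/length/data "
    "in the appendvec at path /mnt/solana-accounts/run/247471897.30603"
)
print(parse_appendvec_error(line))
```

If the slot interpretation holds, the extracted number could be compared against the slot of the newest incremental snapshot to see whether the failing file was written after that snapshot was taken.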

Proposed Solution

On the advice of Brooks in the Discord, I am opening this issue to address the Script A failures under Test Methodology 2.

Metadata

Labels: community (Community contribution)
