feat: do not panic when block validation fails #259
Conversation
Signed-off-by: jeluard <jeluard@users.noreply.github.com>
Walkthrough

This update tweaks error handling and logging in the consensus forward-chain and ledger stages. It changes how hash values are shown in logs, simplifies error propagation, and makes the control flow more explicit by matching on results and logging errors directly instead of panicking.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Worker
    participant Ledger
    Worker->>Ledger: roll_forward(block)
    alt roll_forward returns Ok(None)
        Worker-->>Worker: return BlockValidated
    else roll_forward returns Ok(Some(err))
        Worker-->>Worker: return BlockValidationFailed
    else roll_forward returns Err(err)
        Worker->>Worker: log error
        Worker-->>Worker: return BlockValidationFailed
    end
```
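For concreteness, here is a minimal, runnable Rust sketch of the flow above. Everything in it – `Point`, `Span`, the `String` error types, and the `roll_forward` signature – is a simplified stand-in assumed for illustration, not the actual amaru API:

```rust
// Simplified, hypothetical stand-ins for the amaru stage types.
struct Point;
struct Span;

enum BlockValidationResult {
    BlockValidated { point: Point, block: Vec<u8>, span: Span },
    BlockValidationFailed { point: Point, span: Span },
}

// Stand-in for the ledger's roll_forward:
//   Ok(None)      -> the block passed all ledger rules
//   Ok(Some(err)) -> the block broke a ledger rule
//   Err(err)      -> the validation machinery itself failed
fn roll_forward(_point: &Point, _block: &[u8]) -> Result<Option<String>, String> {
    Ok(None)
}

fn handle_validated(point: Point, block: Vec<u8>, span: Span) -> BlockValidationResult {
    match roll_forward(&point, &block) {
        Ok(None) => BlockValidationResult::BlockValidated { point, block, span },
        Ok(Some(_err)) => BlockValidationResult::BlockValidationFailed { point, span },
        Err(err) => {
            // Log and degrade gracefully instead of panicking.
            eprintln!("Failed to validate block: {err:?}");
            BlockValidationResult::BlockValidationFailed { point, span }
        }
    }
}

fn main() {
    let _ = handle_validated(Point, vec![0u8; 4], Span);
}
```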
Actionable comments posted: 0
🧹 Nitpick comments (2)
crates/amaru/src/stages/ledger.rs (2)
115-118: Missing telemetry of why a block is invalid

You now bubble `BlockValidation::Invalid` up as `Ok(Some(err))`, but the caller fully discards `err`, leaving us none the wiser about which rule tripped. A single `tracing::warn!(?err, …)` right here would give ops folk the clue-bat they need without re-plumbing types downstream.

```diff
- BlockValidation::Invalid(err) => Ok(Some(err)),
+ BlockValidation::Invalid(err) => {
+     tracing::warn!(?err, "block failed ledger rules");
+     Ok(Some(err))
+ },
```
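As a side note on the suggestion: `tracing`'s `?field` shorthand records a value through its `Debug` impl, so the error type needs no `Display`. A minimal sketch, assuming the `tracing` and `tracing-subscriber` crates and an invented `RuleViolation` error type:

```rust
use tracing::warn;

// Hypothetical error type standing in for the ledger's rule error.
#[derive(Debug)]
struct RuleViolation(&'static str);

fn main() {
    // Emit structured events to stdout for the demo.
    tracing_subscriber::fmt::init();

    let err = RuleViolation("fees-too-low");
    // `?err` captures `err` via its Debug impl, as in the diff above.
    warn!(?err, "block failed ledger rules");
}
```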
175-187: Early clones chew memory like Pac-Man – only clone when you know you need 'em

`point.clone()` and `block.to_vec()` happen before we know the verdict, so for most invalid blocks that allocation is wasted work. Shuffle the cloning inside each branch and you'll spare a few cycles per block – not earth-shattering, but every allocation counts on the validation hot path.

```diff
  ValidateBlockEvent::Validated { point, block, span } => {
-     let point = point.clone();
-     let block = block.to_vec();
      let span = restore_span(span);
-     match stage.roll_forward(point.clone(), block.clone()) {
-         Ok(None) => BlockValidationResult::BlockValidated { point, block, span },
-         Ok(Some(_)) => BlockValidationResult::BlockValidationFailed { point, span },
+     match stage.roll_forward(point.clone(), block.to_vec()) {
+         Ok(None) => BlockValidationResult::BlockValidated {
+             point: point.clone(),
+             block: block.to_vec(),
+             span,
+         },
+         Ok(Some(_)) => BlockValidationResult::BlockValidationFailed { point: point.clone(), span },
          Err(err) => {
              error!(?err, "Failed to validate block");
              BlockValidationResult::BlockValidationFailed { point: point.clone(), span }
          }
      }
```
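The same clone-deferral pattern in a self-contained toy (all names invented): take borrows up front, and produce owned copies only in the branch that actually stores them.

```rust
#[derive(Clone, Debug)]
struct Payload(Vec<u8>);

enum Outcome {
    Kept(Payload), // this branch needs an owned copy
    Dropped,       // this branch needs nothing
}

fn classify(keep: bool, payload: &Payload) -> Outcome {
    if keep {
        // Clone only on the path that stores the value.
        Outcome::Kept(payload.clone())
    } else {
        // No allocation on the rejection path.
        Outcome::Dropped
    }
}

fn main() {
    let p = Payload(vec![1, 2, 3]);
    let _kept = classify(true, &p);
    let _dropped = classify(false, &p);
}
```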
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- crates/amaru/src/stages/consensus/forward_chain.rs (1 hunks)
- crates/amaru/src/stages/ledger.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Coverage
- GitHub Check: Build on ubuntu-latest with target riscv32im-risc0-zkvm-elf
- GitHub Check: Snapshots (preprod, 10.1.4)
- GitHub Check: Simulation
- GitHub Check: Build on ubuntu-latest with target x86_64-unknown-linux-gnu
- GitHub Check: Build on ubuntu-latest with target wasm32-unknown-unknown
- GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
- GitHub Check: Build on ubuntu-latest with target aarch64-unknown-linux-musl
- GitHub Check: Build on macos-latest with target aarch64-apple-darwin
- GitHub Check: Sanity
🔇 Additional comments (1)
crates/amaru/src/stages/consensus/forward_chain.rs (1)
318-323: Sweet catch swapping to `{:?}` – avoids a compile-time face-plant

Switching to the debug formatter removes the need for an explicit `Display` impl on `Hash<32>`, neatly dodging the old "doesn't implement `fmt::Display`" compile error. Logs still print the hex thanks to `Debug` on `Hash`, so we keep human-readability.

No further action from me; log on, mate.
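To see the point in miniature, a sketch with a stand-in newtype – the real `Hash<32>` prints hex via its own `Debug` impl, whereas the derive here is only an approximation:

```rust
// Hypothetical stand-in for a Hash<32>-like newtype without Display.
#[derive(Debug)]
struct Hash32([u8; 4]); // shortened to 4 bytes for readability

fn main() {
    let tip = Hash32([0xde, 0xad, 0xbe, 0xef]);

    // Compiles: `{:?}` only needs the derived Debug impl.
    println!("setting tip to {:?}", tip);

    // Would not compile: `{}` requires an explicit Display impl,
    // which this type (like Hash<32> in the reviewed code) lacks.
    // println!("setting tip to {}", tip);
}
```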
Codecov Report

Attention: Patch coverage is … and 1 file with indirect coverage changes.
Great, thank you! For now, while we're receiving invalid blocks from upstream (trusted) peers, we know that something is wrong with our logic. But once we're connected to unknown peers we'll have to handle this appropriately. This is one piece of that.
What we need to do in this case is to disconnect from the upstream peer, which, for all intents and purposes, at the current stage of the system, translates to crashing. What do you refer to?
Yes, from the ledger perspective it should just continue processing blocks. There are consensus and networking things that have to occur, as Arnaud said, such as removing the peer. When we have a single peer, that is effectively stopping the process, so it's no different from the current situation. But I don't mind removing the panic and preparing for more robust logic. I don't know enough about the networking/consensus specs, so @abailly will be better for that side.
@yHSJ right, I meant the consensus panic, not the ledger one.
Definitely! But as I said, the correct behaviour when a block is invalid is to assume the remote peer is adversarial and therefore to disconnect from it. In the current state of amaru this is exactly equivalent to crashing, because there's no more progress the node can make once disconnected from its one and only peer. So do you suggest we just ignore block validation failures and assume we can proceed, at least until we add this disconnection behaviour?
In the absence of node rating and eventual disconnection, I would assume we consider them honest. Thus amaru should keep accepting blocks from them? The current error sounds a bit unrelated.
Once a block fails to validate, the tip of the chain is not updated, and therefore all subsequent blocks will fail to validate, won't they? There's another issue in the …
As I said on Discord, happy to discuss f2f if that helps.
Right! I think it shouldn't crash.
Fair enough. Then happy to go down that path; let's talk on Monday?
Let `amaru` continue validating after a block validation fails.

This currently does not work and still panics higher up in the stack due to mishandling in the consensus layer. That will be addressed in a subsequent PR.
Summary by CodeRabbit

- Bug Fixes
- Refactor