Refactor Script::bytes_to_asm_fmt to use iterator #662

Kixunil · 2021-09-20T17:20:04Z

Because of this change being somewhat large I didn't do it together with #658.
I also don't mind if this is merged after Taproot (feel free to put off review).

This refactors `Script::bytes_to_asm_fmt`` function to use an iterator
instead of index. Such change makes it easier to reason about overflows
or out-of-bounds accesses. As a result this also fixes three unlikely
overflows and happens to improve formatting to not output space at the
beginning in some weird cases.

To improve robustness even better it also moves read_uint
implementation to internal function which returns a more specific error
type which can be exhaustively matched on to guarantee correct error
handling. Probably because of lack of this the code was previously
checking the same condition twice, the second time being unreachable and
attempting to behave differently than the first one.

Finally this uses macro to deduplicate code which differs only in single
number, ensuring the code stays in sync across all branches.

I also attempted to get this work with no_panic crate to prove absence of panic but looks
like the compiler can not do that. (Yes, I did change writer to be trivial and a bunch of other things.)
If someone wants to give it a try I can publish that change.

Unresolved question

Since Error seems to be related to script execution maybe UintError should be public?
At least NumericOverflow variant is strange.

apoelstra

Wow! This is a huge improvement over the buggy C-style code.

ACK 55c6277

src/blockdata/script.rs

TheBlueMatt · 2021-09-27T21:16:47Z

Would be really nice to get a fuzz test for this, though.

apoelstra · 2021-09-27T22:19:12Z

Will let @Kixunil answer Matt's comments before merging.

Kixunil · 2021-09-27T22:31:02Z

Yeah, I hoped I can get rustc to prove there's no panic which would make fuzz redundant. Sadly I was unable to do it so fuzz it will be.

sanket1729 · 2021-09-28T00:28:23Z

I would like to review this(hopefully by tomorrow). Requesting to hold off merging till then

dr-orlovsky

Just a small nit. Do not approve yet since run out of time to complete review and will do it later

dr-orlovsky · 2021-09-28T05:45:56Z

src/blockdata/script.rs

@@ -154,6 +154,22 @@ impl fmt::Display for Error {
 #[cfg_attr(docsrs, doc(cfg(feature = "std")))]
 impl ::std::error::Error for Error {}

+// Our internal error proves that we only return these two cases from `read_uint_iter`.
+// Since it's private we don't bother with trait impls besides From.
+enum UintError {


Nit: I do not see any harm of doing derives here with just a single line. But pls do not discard other reviews just because of this one Nit

The type is private (at least for now) and none of the derives is needed, so let's not needlessly blow up compile time. If we decide to make it public (I'm in favor of this) there will be all applicable derives and trait impls.

"data types should eagerly implement derives" (from rust guidelines). :)

I am not insisting at all, just trying to understand the logic. Ok, the logic is not to blow the computing time.

However from my experience frequently these private error types become public and ppl forget to add derives to them - at least it cost me a lot of time to update rust-secp256k1, rust-bitcoin and minor crates with derives required to do a proper error handling on my own error types :(

I'd rather not add gratuitous derives. I find it confusing to read derives on private types because then I need to understand why they are needed.

If this type ever becomes public we can add derives at that time.

@dr-orlovsky I should have realized there's "public" word missing in that line when reviewing. Yeah, forgetting is unfortunate. Maybe have some checklists for various common tasks that could include this when changing a type to public? I'm thinking of something similar LND has.

dr-orlovsky

utACK 659468681743458e5467bbb92e47ee3c6f084411 with two nits

src/blockdata/script.rs

RCasatta · 2021-09-28T09:25:36Z

src/blockdata/script.rs

    if data.len() < size {
-        Err(Error::EarlyEndOfScript)
+        Err(UintError::EarlyEndOfScript)
+    } else if size > usize::from(u16::max_value() / 8) {


maybe even be more strict with size > 8 since we haven't arch greater than 64 bits?

I agree. This line is clever but we probably don't get any extra generality vs just using 8 as a bound.

Was thinking about it but felt wrong to return NumericOverflow for what is actually too large input. Was also thinking of

enum UintSize { OneByte = 1, TwoBytes = 2, FourBytes = 4, EightBytes = 8, }

This enforces sane values but is API-breaking and a bit cumbersome. Maybe worth it considering most uses will use literal anyway?

(Also, maybe "width" is better than "size"?)

src/blockdata/script.rs

Kixunil · 2021-09-28T17:48:03Z

I have force-pushed all obvious style changes. Will address size checking when replied and fuzz test in a different commit, not sure if I will have time to do it today.

Kixunil · 2021-09-28T17:59:35Z

CI error: error: failed to run LLVM passes: unknown pass name 'sancov'

dr-orlovsky · 2021-09-28T23:40:36Z

@Kixunil this is known issue: #667

#664 will fix that and after we will need a rebase for nearly all PRs

This refactors `Script::bytes_to_asm_fmt`` function to use an iterator instead of index. Such change makes it easier to reason about overflows or out-of-bounds accesses. As a result this also fixes three unlikely overflows and happens to improve formatting to not output space at the beginning in some weird cases. To improve robustness even better it also moves `read_uint` implementation to internal function which returns a more specific error type which can be exhaustively matched on to guarantee correct error handling. Probably because of lack of this the code was previously checking the same condition twice, the second time being unreachable and attempting to behave differently than the first one. Finally this uses macro to deduplicate code which differs only in single number, ensuring the code stays in sync across all branches.

Kixunil · 2021-09-30T12:39:05Z

Rebased on top of master to include #664 and added the fuzz test. Disclaimer: didn't try to compile the fuzz test, (ab)using CI for that.

This adds fuzz target for `Script::bytes_to_asm_fmt` which could panic due to overflow in the past. Fuzzing should decrease the risk of other panics.

Kixunil · 2021-09-30T13:06:58Z

@dr-orlovsky that was not CI related but my mistake. :)

dr-orlovsky

utACK 0e1b993, however extended test will be welcome (see my comment)

dr-orlovsky · 2021-09-30T13:31:18Z

src/blockdata/script.rs

        assert_eq!(hex_script!("4effffffff01").asm(),
-                   " OP_PUSHDATA4 <push past end>");
+                   "OP_PUSHDATA4 <push past end>");


We may need tests with > 1 push opcode to check formatting when multiple mixed instructions are present

sanket1729

ACK 0e1b993 as this PR as it for bytes_to_asm_fmt. Though I would like to note the panic in instructions API I mentioned previously is not addressed by this.

#658 (review)

This instructions().next().next() should panic for the last testcase in the first commit on 32 bit machines.

Kixunil · 2021-10-01T07:01:16Z

@sanket1729 ah, I got confused when reading your comment previously. You're right there's a bunch of similar issues. I hope I can look at them soon. (In a different PR, also changing the slice to iterator) I came here just for quick sync need to go AFK.

dr-orlovsky · 2021-10-01T11:44:42Z

@TheBlueMatt comment was addressed, so merging now

This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin#662 (review)

2c28d3b Fix handling of empty slice in Instructions (Martin Habovštiak) e6ff754 Fix doc of take_slice_or_kill (Martin Habovštiak) 0ec6d96 Cleanup after `Instructions` refactoring (Martin Habovstiak) bc76325 Move repeated code to functions in script (Martin Habovstiak) 1f55edf Use iterator in `blockdata::script::Instructions` (Martin Habovstiak) Pull request description: This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: #662 (review) ACKs for top commit: tcharding: ACK 2c28d3b sanket1729: ACK 2c28d3b. I don't want to hold ACKs on minor things as they can be in a fixup later. Tree-SHA512: 9dc770b9f7958efbd0df2cc2d3546e23deca5df2f94ea2c42b089df628f4b99f08032ca4aa8822caf6643a8892903e1bda41228b78c8519b90bcaa1255d9acc6

This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin/rust-bitcoin#662 (review)

…pt::Instructions` 2c28d3b Fix handling of empty slice in Instructions (Martin Habovštiak) e6ff754 Fix doc of take_slice_or_kill (Martin Habovštiak) 0ec6d96 Cleanup after `Instructions` refactoring (Martin Habovstiak) bc76325 Move repeated code to functions in script (Martin Habovstiak) 1f55edf Use iterator in `blockdata::script::Instructions` (Martin Habovstiak) Pull request description: This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin/rust-bitcoin#662 (review) ACKs for top commit: tcharding: ACK 2c28d3b sanket1729: ACK 2c28d3b. I don't want to hold ACKs on minor things as they can be in a fixup later. Tree-SHA512: 9dc770b9f7958efbd0df2cc2d3546e23deca5df2f94ea2c42b089df628f4b99f08032ca4aa8822caf6643a8892903e1bda41228b78c8519b90bcaa1255d9acc6

Kixunil force-pushed the script-fmt-iter branch from 172aa0a to 6594686 Compare September 20, 2021 17:22

Kixunil mentioned this pull request Sep 20, 2021

[backport #658] Check for overflow in Script::bytes_to_asm_fmt() #661

Merged

dr-orlovsky added this to the 0.28.0 milestone Sep 25, 2021

apoelstra previously approved these changes Sep 27, 2021

View reviewed changes

TheBlueMatt previously approved these changes Sep 27, 2021

View reviewed changes

src/blockdata/script.rs Outdated Show resolved Hide resolved

dr-orlovsky reviewed Sep 28, 2021

View reviewed changes

dr-orlovsky previously approved these changes Sep 28, 2021

View reviewed changes

src/blockdata/script.rs Show resolved Hide resolved

RCasatta previously approved these changes Sep 28, 2021

View reviewed changes

Kixunil dismissed stale reviews from RCasatta, dr-orlovsky, TheBlueMatt, and apoelstra via c83ef8f September 28, 2021 17:46

Kixunil force-pushed the script-fmt-iter branch from 6594686 to c83ef8f Compare September 28, 2021 17:46

Kixunil force-pushed the script-fmt-iter branch from c83ef8f to 2f237ee Compare September 30, 2021 12:37

Kixunil force-pushed the script-fmt-iter branch 2 times, most recently from a6e6bb1 to 339664e Compare September 30, 2021 12:55

Added fuzz test for Script::bytes_to_asm_fmt

0e1b993

This adds fuzz target for `Script::bytes_to_asm_fmt` which could panic due to overflow in the past. Fuzzing should decrease the risk of other panics.

Kixunil force-pushed the script-fmt-iter branch from 339664e to 0e1b993 Compare September 30, 2021 13:06

dr-orlovsky approved these changes Sep 30, 2021

View reviewed changes

sanket1729 approved these changes Sep 30, 2021

View reviewed changes

dr-orlovsky merged commit a961ab4 into rust-bitcoin:master Oct 1, 2021

Kixunil deleted the script-fmt-iter branch October 2, 2021 06:59

Kixunil mentioned this pull request Oct 2, 2021

Use iterator in blockdata::script::Instructions #673

Merged

Kixunil mentioned this pull request Oct 13, 2021

New Witness struct to improve ser/de perfomance #672

Merged

dr-orlovsky mentioned this pull request Jan 15, 2022

Bump version to v0.28 #785

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor Script::bytes_to_asm_fmt to use iterator #662

Refactor Script::bytes_to_asm_fmt to use iterator #662

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Refactor Script::bytes_to_asm_fmt to use iterator #662

Refactor Script::bytes_to_asm_fmt to use iterator #662

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!