-
Notifications
You must be signed in to change notification settings - Fork 831
Refactor Script::bytes_to_asm_fmt to use iterator #662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
172aa0a
to
6594686
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! This is a huge improvement over the buggy C-style code.
ACK 55c6277
Would be really nice to get a fuzz test for this, though. |
Will let @Kixunil answer Matt's comments before merging. |
Yeah, I hoped I can get rustc to prove there's no panic which would make fuzz redundant. Sadly I was unable to do it so fuzz it will be. |
I would like to review this(hopefully by tomorrow). Requesting to hold off merging till then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small nit. Do not approve yet since run out of time to complete review and will do it later
@@ -154,6 +154,22 @@ impl fmt::Display for Error { | |||
#[cfg_attr(docsrs, doc(cfg(feature = "std")))] | |||
impl ::std::error::Error for Error {} | |||
|
|||
// Our internal error proves that we only return these two cases from `read_uint_iter`. | |||
// Since it's private we don't bother with trait impls besides From. | |||
enum UintError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I do not see any harm of doing derives here with just a single line. But pls do not discard other reviews just because of this one Nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type is private (at least for now) and none of the derives is needed, so let's not needlessly blow up compile time. If we decide to make it public (I'm in favor of this) there will be all applicable derives and trait impls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"data types should eagerly implement derives" (from rust guidelines). :)
I am not insisting at all, just trying to understand the logic. Ok, the logic is not to blow the computing time.
However from my experience frequently these private error types become public and ppl forget to add derives to them - at least it cost me a lot of time to update rust-secp256k1
, rust-bitcoin
and minor crates with derives required to do a proper error handling on my own error types :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not add gratuitous derive
s. I find it confusing to read derive
s on private types because then I need to understand why they are needed.
If this type ever becomes public we can add derives at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-orlovsky I should have realized there's "public" word missing in that line when reviewing. Yeah, forgetting is unfortunate. Maybe have some checklists for various common tasks that could include this when changing a type to public? I'm thinking of something similar LND has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 659468681743458e5467bbb92e47ee3c6f084411 with two nits
if data.len() < size { | ||
Err(Error::EarlyEndOfScript) | ||
Err(UintError::EarlyEndOfScript) | ||
} else if size > usize::from(u16::max_value() / 8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe even be more strict with size > 8
since we haven't arch greater than 64 bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This line is clever but we probably don't get any extra generality vs just using 8 as a bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was thinking about it but felt wrong to return NumericOverflow
for what is actually too large input. Was also thinking of
enum UintSize {
OneByte = 1,
TwoBytes = 2,
FourBytes = 4,
EightBytes = 8,
}
This enforces sane values but is API-breaking and a bit cumbersome. Maybe worth it considering most uses will use literal anyway?
(Also, maybe "width" is better than "size"?)
c83ef8f
6594686
to
c83ef8f
Compare
I have force-pushed all obvious style changes. Will address size checking when replied and fuzz test in a different commit, not sure if I will have time to do it today. |
CI error: |
This refactors `Script::bytes_to_asm_fmt`` function to use an iterator instead of index. Such change makes it easier to reason about overflows or out-of-bounds accesses. As a result this also fixes three unlikely overflows and happens to improve formatting to not output space at the beginning in some weird cases. To improve robustness even better it also moves `read_uint` implementation to internal function which returns a more specific error type which can be exhaustively matched on to guarantee correct error handling. Probably because of lack of this the code was previously checking the same condition twice, the second time being unreachable and attempting to behave differently than the first one. Finally this uses macro to deduplicate code which differs only in single number, ensuring the code stays in sync across all branches.
c83ef8f
to
2f237ee
Compare
Rebased on top of master to include #664 and added the fuzz test. Disclaimer: didn't try to compile the fuzz test, (ab)using CI for that. |
a6e6bb1
to
339664e
Compare
This adds fuzz target for `Script::bytes_to_asm_fmt` which could panic due to overflow in the past. Fuzzing should decrease the risk of other panics.
339664e
to
0e1b993
Compare
@dr-orlovsky that was not CI related but my mistake. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 0e1b993, however extended test will be welcome (see my comment)
assert_eq!(hex_script!("4effffffff01").asm(), | ||
" OP_PUSHDATA4 <push past end>"); | ||
"OP_PUSHDATA4 <push past end>"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need tests with > 1 push opcode to check formatting when multiple mixed instructions are present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 0e1b993 as this PR as it for bytes_to_asm_fmt
. Though I would like to note the panic in instructions API I mentioned previously is not addressed by this.
This instructions().next().next()
should panic for the last testcase in the first commit on 32 bit machines.
@sanket1729 ah, I got confused when reading your comment previously. You're right there's a bunch of similar issues. I hope I can look at them soon. (In a different PR, also changing the slice to iterator) I came here just for quick sync need to go AFK. |
@TheBlueMatt comment was addressed, so merging now |
This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin#662 (review)
2c28d3b Fix handling of empty slice in Instructions (Martin Habovštiak) e6ff754 Fix doc of take_slice_or_kill (Martin Habovštiak) 0ec6d96 Cleanup after `Instructions` refactoring (Martin Habovstiak) bc76325 Move repeated code to functions in script (Martin Habovstiak) 1f55edf Use iterator in `blockdata::script::Instructions` (Martin Habovstiak) Pull request description: This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: #662 (review) ACKs for top commit: tcharding: ACK 2c28d3b sanket1729: ACK 2c28d3b. I don't want to hold ACKs on minor things as they can be in a fixup later. Tree-SHA512: 9dc770b9f7958efbd0df2cc2d3546e23deca5df2f94ea2c42b089df628f4b99f08032ca4aa8822caf6643a8892903e1bda41228b78c8519b90bcaa1255d9acc6
This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin/rust-bitcoin#662 (review)
…pt::Instructions` 2c28d3b Fix handling of empty slice in Instructions (Martin Habovštiak) e6ff754 Fix doc of take_slice_or_kill (Martin Habovštiak) 0ec6d96 Cleanup after `Instructions` refactoring (Martin Habovstiak) bc76325 Move repeated code to functions in script (Martin Habovstiak) 1f55edf Use iterator in `blockdata::script::Instructions` (Martin Habovstiak) Pull request description: This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin/rust-bitcoin#662 (review) ACKs for top commit: tcharding: ACK 2c28d3b sanket1729: ACK 2c28d3b. I don't want to hold ACKs on minor things as they can be in a fixup later. Tree-SHA512: 9dc770b9f7958efbd0df2cc2d3546e23deca5df2f94ea2c42b089df628f4b99f08032ca4aa8822caf6643a8892903e1bda41228b78c8519b90bcaa1255d9acc6
Because of this change being somewhat large I didn't do it together with #658.
I also don't mind if this is merged after Taproot (feel free to put off review).
This refactors `Script::bytes_to_asm_fmt`` function to use an iterator
instead of index. Such change makes it easier to reason about overflows
or out-of-bounds accesses. As a result this also fixes three unlikely
overflows and happens to improve formatting to not output space at the
beginning in some weird cases.
To improve robustness even better it also moves
read_uint
implementation to internal function which returns a more specific error
type which can be exhaustively matched on to guarantee correct error
handling. Probably because of lack of this the code was previously
checking the same condition twice, the second time being unreachable and
attempting to behave differently than the first one.
Finally this uses macro to deduplicate code which differs only in single
number, ensuring the code stays in sync across all branches.
I also attempted to get this work with
no_panic
crate to prove absence of panic but lookslike the compiler can not do that. (Yes, I did change writer to be trivial and a bunch of other things.)
If someone wants to give it a try I can publish that change.
Unresolved question
Since
Error
seems to be related to script execution maybeUintError
should be public?At least
NumericOverflow
variant is strange.