8000 Refactor Script::bytes_to_asm_fmt to use iterator by Kixunil · Pull Request #662 · rust-bitcoin/rust-bitcoin · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Refactor Script::bytes_to_asm_fmt to use iterator #662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 1, 2021

Conversation

Kixunil
Copy link
Collaborator
@Kixunil Kixunil commented Sep 20, 2021

Because of this change being somewhat large I didn't do it together with #658.
I also don't mind if this is merged after Taproot (feel free to put off review).

This refactors `Script::bytes_to_asm_fmt`` function to use an iterator
instead of index. Such change makes it easier to reason about overflows
or out-of-bounds accesses. As a result this also fixes three unlikely
overflows and happens to improve formatting to not output space at the
beginning in some weird cases.

To improve robustness even better it also moves read_uint
implementation to internal function which returns a more specific error
type which can be exhaustively matched on to guarantee correct error
handling. Probably because of lack of this the code was previously
checking the same condition twice, the second time being unreachable and
attempting to behave differently than the first one.

Finally this uses macro to deduplicate code which differs only in single
number, ensuring the code stays in sync across all branches.

I also attempted to get this work with no_panic crate to prove absence of panic but looks
like the compiler can not do that. (Yes, I did change writer to be trivial and a bunch of other things.)
If someone wants to give it a try I can publish that change.

Unresolved question

Since Error seems to be related to script execution maybe UintError should be public?
At least NumericOverflow variant is strange.

apoelstra
apoelstra previously approved these changes Sep 27, 2021
Copy link
Member
@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! This is a huge improvement over the buggy C-style code.

ACK 55c6277

TheBlueMatt
TheBlueMatt previously approved these changes Sep 27, 2021
@TheBlueMatt
Copy link
Member

Would be really nice to get a fuzz test for this, though.

@apoelstra
Copy link
Member

Will let @Kixunil answer Matt's comments before merging.

@Kixunil
Copy link
Collaborator Author
Kixunil commented Sep 27, 2021

Yeah, I hoped I can get rustc to prove there's no panic which would make fuzz redundant. Sadly I was unable to do it so fuzz it will be.

@sanket1729
Copy link
Member

I would like to review this(hopefully by tomorrow). Requesting to hold off merging till then

Copy link
Collaborator
@dr-orlovsky dr-orlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small nit. Do not approve yet since run out of time to complete review and will do it later

@@ -154,6 +154,22 @@ impl fmt::Display for Error {
#[cfg_attr(docsrs, doc(cfg(feature = "std")))]
impl ::std::error::Error for Error {}

// Our internal error proves that we only return these two cases from `read_uint_iter`.
// Since it's private we don't bother with trait impls besides From.
enum UintError {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I do not see any harm of doing derives here with just a single line. But pls do not discard other reviews just because of this one Nit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type is private (at least for now) and none of the derives is needed, so let's not needlessly blow up compile time. If we decide to make it public (I'm in favor of this) there will be all applicable derives and trait impls.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"data types should eagerly implement derives" (from rust guidelines). :)

I am not insisting at all, just trying to understand the logic. Ok, the logic is not to blow the computing time.

However from my experience frequently these private error types become public and ppl forget to add derives to them - at least it cost me a lot of time to update rust-secp256k1, rust-bitcoin and minor crates with derives required to do a proper error handling on my own error types :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not add gratuitous derives. I find it confusing to read derives on private types because then I need to understand why they are needed.

If this type ever becomes public we can add derives at that time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-orlovsky I should have realized there's "public" word missing in that line when reviewing. Yeah, forgetting is unfortunate. Maybe have some checklists for various common tasks that could include this when changing a type to public? I'm thinking of something similar LND has.

dr-orlovsky
dr-orlovsky previously approved these changes Sep 28, 2021
Copy link
Collaborator
@dr-orlovsky dr-orlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 659468681743458e5467bbb92e47ee3c6f084411 with two nits

RCasatta
RCasatta previously approved these changes Sep 28, 2021
if data.len() < size {
Err(Error::EarlyEndOfScript)
Err(UintError::EarlyEndOfScript)
} else if size > usize::from(u16::max_value() / 8) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe even be more strict with size > 8 since we haven't arch greater than 64 bits?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This line is clever but we probably don't get any extra generality vs just using 8 as a bound.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking about it but felt wrong to return NumericOverflow for what is actually too large input. Was also thinking of

enum UintSize {
    OneByte = 1,
    TwoBytes = 2,
    FourBytes = 4,
    EightBytes = 8,
}

This enforces sane values but is API-breaking and a bit cumbersome. Maybe worth it considering most uses will use literal anyway?

(Also, maybe "width" is better than "size"?)

@Kixunil
Copy link
Collaborator Author
Kixunil commented Sep 28, 2021

I have force-pushed all obvious style changes. Will address size checking when replied and fuzz test in a different commit, not sure if I will have time to do it today.

@Kixunil
Copy link
Collaborator Author
Kixunil commented Sep 28, 2021

CI error: error: failed to run LLVM passes: unknown pass name 'sancov'

@dr-orlovsky
Copy link
Collaborator

@Kixunil this is known issue: #667

#664 will fix that and after we will need a rebase for nearly all PRs

This refactors `Script::bytes_to_asm_fmt`` function to use an iterator
instead of index. Such change makes it easier to reason about overflows
or out-of-bounds accesses. As a result this also fixes three unlikely
overflows and happens to improve formatting to not output space at the
beginning in some weird cases.

To improve robustness even better it also moves `read_uint`
implementation to internal function which returns a more specific error
type which can be exhaustively matched on to guarantee correct error
handling. Probably because of lack of this the code was previously
checking the same condition twice, the second time being unreachable and
attempting to behave differently than the first one.

Finally this uses macro to deduplicate code which differs only in single
number, ensuring the code stays in sync across all branches.
@Kixunil
Copy link
Collaborator Author
Kixunil commented Sep 30, 2021

Rebased on top of master to include #664 and added the fuzz test. Disclaimer: didn't try to compile the fuzz test, (ab)using CI for that.

@Kixunil Kixunil force-pushed the script-fmt-iter branch 2 times, most recently from a6e6bb1 to 339664e Compare September 30, 2021 12:55
This adds fuzz target for `Script::bytes_to_asm_fmt` which could
panic due to overflow in the past. Fuzzing should decrease the risk of
other panics.
@Kixunil
Copy link
Collaborator Author
Kixunil commented Sep 30, 2021

@dr-orlovsky that was not CI related but my mistake. :)

Copy link
Collaborator
@dr-orlovsky dr-orlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 0e1b993, however extended test will be welcome (see my comment)

assert_eq!(hex_script!("4effffffff01").asm(),
" OP_PUSHDATA4 <push past end>");
"OP_PUSHDATA4 <push past end>");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need tests with > 1 push opcode to check formatting when multiple mixed instructions are present

Copy link
Member
@sanket1729 sanket1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0e1b993 as this PR as it for bytes_to_asm_fmt. Though I would like to note the panic in instructions API I mentioned previously is not addressed by this.

#658 (review)

This instructions().next().next() should panic for the last testcase in the first commit on 32 bit machines.

@Kixunil
Copy link
Collaborator Author
Kixunil commented Oct 1, 2021

@sanket1729 ah, I got confused when reading your comment previously. You're right there's a bunch of similar issues. I hope I can look at them soon. (In a different PR, also changing the slice to iterator) I came here just for quick sync need to go AFK.

@dr-orlovsky
Copy link
Collaborator

@TheBlueMatt comment was addressed, so merging now

@dr-orlovsky dr-orlovsky merged commit a961ab4 into rust-bitcoin:master Oct 1, 2021
@Kixunil Kixunil deleted the script-fmt-iter branch October 2, 2021 06:59
Kixunil added a commit to Kixunil/rust-bitcoin that referenced this pull request Apr 20, 2022
This refactors `blockdata::script::Instructions` to use
`::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express
the intention and to avoid some slicing mistakes. Similarly to a
previous change this uses a macro to deduplicate the common logic and
the new `read_uint_iter` internal function to automatically advance the
iterator.

Addresses:
rust-bitcoin#662 (review)
sanket1729 added a commit that referenced this pull request Apr 30, 2022
2c28d3b Fix handling of empty slice in Instructions (Martin Habovštiak)
e6ff754 Fix doc of take_slice_or_kill (Martin Habovštiak)
0ec6d96 Cleanup after `Instructions` refactoring (Martin Habovstiak)
bc76325 Move repeated code to functions in script (Martin Habovstiak)
1f55edf Use iterator in `blockdata::script::Instructions` (Martin Habovstiak)

Pull request description:

  This refactors `blockdata::script::Instructions` to use
  `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express
  the intention and to avoid some slicing mistakes. Similarly to a
  previous change this uses a macro to deduplicate the common logic and
  the new `read_uint_iter` internal function to automatically advance the
  iterator.

  Addresses:
  #662 (review)

ACKs for top commit:
  tcharding:
    ACK 2c28d3b
  sanket1729:
    ACK 2c28d3b. I don't want to hold ACKs on minor things as they can be in a fixup later.

Tree-SHA512: 9dc770b9f7958efbd0df2cc2d3546e23deca5df2f94ea2c42b089df628f4b99f08032ca4aa8822caf6643a8892903e1bda41228b78c8519b90bcaa1255d9acc6
ChallengeDev210 pushed a commit to ChallengeDev210/rust-bitcoin that referenced this pull request Aug 1, 2022
This refactors `blockdata::script::Instructions` to use
`::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express
the intention and to avoid some slicing mistakes. Similarly to a
previous change this uses a macro to deduplicate the common logic and
the new `read_uint_iter` internal function to automatically advance the
iterator.

Addresses:
rust-bitcoin/rust-bitcoin#662 (review)
ChallengeDev210 pushed a commit to ChallengeDev210/rust-bitcoin that referenced this pull request Aug 1, 2022
…pt::Instructions`

2c28d3b Fix handling of empty slice in Instructions (Martin Habovštiak)
e6ff754 Fix doc of take_slice_or_kill (Martin Habovštiak)
0ec6d96 Cleanup after `Instructions` refactoring (Martin Habovstiak)
bc76325 Move repeated code to functions in script (Martin Habovstiak)
1f55edf Use iterator in `blockdata::script::Instructions` (Martin Habovstiak)

Pull request description:

  This refactors `blockdata::script::Instructions` to use
  `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express
  the intention and to avoid some slicing mistakes. Similarly to a
  previous change this uses a macro to deduplicate the common logic and
  the new `read_uint_iter` internal function to automatically advance the
  iterator.

  Addresses:
  rust-bitcoin/rust-bitcoin#662 (review)

ACKs for top commit:
  tcharding:
    ACK 2c28d3b
  sanket1729:
    ACK 2c28d3b. I don't want to hold ACKs on minor things as they can be in a fixup later.

Tree-SHA512: 9dc770b9f7958efbd0df2cc2d3546e23deca5df2f94ea2c42b089df628f4b99f08032ca4aa8822caf6643a8892903e1bda41228b78c8519b90bcaa1255d9acc6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0