-
Notifications
You must be signed in to change notification settings - Fork 831
Hash functions for public keys and scripts #388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #388 +/- ##
==========================================
- Coverage 85.55% 84.94% -0.62%
==========================================
Files 40 40
Lines 8660 7311 -1349
==========================================
- Hits 7409 6210 -1199
+ Misses 1251 1101 -150
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I like to have these methods spread around. I'd be more interested in some kind of ScriptHash::from_script
constructor so all the methods are confined to the hash types. But I'm also not sure of this opinion yet.
@stevenroose I am not happy with all hash-related stuff still. Hashes are used everywhere, and even after hash type refactoring there is a lot of duplicated code (#391 (comment)) or unclear hash generation conventions ( My arguments on this matter are the following:
|
Hmm, yeah I agree semantically it makes more sense to have I don't think the engine-based hash generation needs a shortcut. It's actually pretty common to do that in many other languages: you create a hasher (engine), write data in (we use |
I'd actually even say that most languages have only a hasher by default (except obviously python) even the rust That's not to say that one line hash can't be useful. it is very useful. but I don't think that here and there doing 3 lines is that bad |
I think I can explain the pattern with the Engine. It's kind of Java-style which have appeared from Enterprise patterns and used for doing some work that require internal state, costly at initialization. So instead of initializing all internal state time after time for each operation, a class is preferred, that stores that pre-initialized state and does operation-specific tasks in form of instance functions. However, this is not the case if you initialize such engine each time, so in our situation from best practice perspective, we either have to add a class-level method for engine to do the hashing task, or work on the persistence of pre-initialized |
The reason hashes usually come in engines is different. |
However bitcoin hashes operate with a very small-sized data, "streamed" in 95% cases at once, so they do not benefit from this pattern at all. And when this is the case (like Taproot tagged hashes), the midstate is stored in a different way and embedded into the type itself... |
I think even for small values, we can benefit from this pattern. There is by definition 0 overhead of using an engine over using the So even for very small bits of data, where allocation might be small, it's still more efficient. But then there is the problem that even for small values, the compiler nor the code can estimate the size of the allocation. So in many cases there will be frequent re-allocations. The two main things that are hashed are block headers and transactions. Headers are always 80 bytes, but the compiler doesn't know that and our code also doesn't anticipate that. So here it depends on the default allocation size of For transactions, it becomes a lot more complex. There is really no way for the compiler, nor the serialization code, to know the size of a serialized tx. Transactions can range in size from a ~100 bytes to many kilobytes. While serializing them into a I really think the usage of the hash engines you mention is both standard practice and not a problem for readability or concise-ness. It's 2 lines of boilerplate instead of 1. |
So I would suggest to reduce this PR to just adding those hash methods where they are missing, using the same pattern used elsewhere. let mut engine = HashYouWant::engine();
whatever_data_to_be_hashes.consensus_encode(&mut engine).expect("engines don't error");
more_data_to_be_hashes.consensus_encode(&mut engine).expect("engines don't error");
HashYouWant::from_engine(engine) |
Maybe it worth adding something like macro_rules! hash {
($hash:ident, $($item:expr),+) => {
{
let mut engine = $hash::engine();
$(
$item.consensus_encode(&mut engine).expect("engines don't error");
)+
$hash::from_engine(engine)
}
}
} |
While that looks nice in theory, in many cases it won't work because also other data is pushed into the hash engine. And I feel like it's better to always do the same thing instead of have a macro that hides what's actually going on in half of the cases. With
109: let mut encoder = WitnessCommitment::engine();
110- witness_root.consensus_encode(&mut encoder).unwrap();
111- encoder.input(witness_reserved_value);
112- WitnessCommitment::from_engine(encoder)
566: let mut engine = XpubIdentifier::engine();
567- self.public_key.write_into(&mut engine);
568- XpubIdentifier::from_engine(engine)
49: let mut enc = SigHash::engine();
50- for txin in &tx.input {
51- txin.previous_output.consensus_encode(&mut enc).unwrap();
52- }
53- SigHash::from_engine(enc)
--
57: let mut enc = SigHash::engine();
58- for txin in &tx.input {
59- txin.sequence.consensus_encode(&mut enc).unwrap();
60- }
61- SigHash::from_engine(enc)
--
65: let mut enc = SigHash::engine();
66- for txout in &tx.output {
67- txout.consensus_encode(&mut enc).unwrap();
68- }
69- SigHash::from_engine(enc)
246: let mut hash_engine = PubkeyHash::engine();
247- pk.write_into(&mut hash_engine);
--
268: let mut hash_engine = WPubkeyHash::engine();
269- pk.write_into(&mut hash_engine);
--
283: let mut hash_engine = WPubkeyHash::engine();
284- pk.write_into(&mut hash_engine); |
True. Ok, agree with you approach then. Sorry for taking that long to discuss: I'd like to understand rationale for design decisions in rust-bitcoin better (to follow it in other projects on top of it), that's why keep digging :) |
We have to decide on closing this issue. While there is not a lot of use cases for the macro in The reference implementation is here: https://github.com/pandoracore/rust-bitcoin/blob/ddb3efff95d8b39ce7a6fee0514c18d1a9c785b6/src/hash_types.rs#L84 If you are up for it I'll add it to this PR instead of the current code. Otherwise I will close the request. |
My feeling is that this functionality belongs in miniscript. I'm also not thrilled about doing things to support p2sh, which I feel is insecure (only 80 bits collision resistant), and is less efficient and more complicated than p2wsh in all cases. |
I try to move it there, will share WIP |
acbef2d
to
ce5a7f3
Compare
Rebased and fixed serializaiton (now it's non-allocating one, @stevenroose; I did it as a separate commit for you to easier review) @apoelstra pls let me know if you still think it belongs to miniscript – it seems your comment was related to other PR (#387, see discussion #387 (comment)). I was going to transfer it to miniscript, but it's impossible to extend external rust types, and the function looks very native for Regarding support for p2sh, well, it is already in the library and this function adds nothing to it. |
Update: @stevenroose Sorry, after re-reviewing this I do not see where the allocation happens in directly calling |
ce5a7f3
to
e7cc8f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I skipped over the discussion about hash engines vs single-line functions, as it seems like it's moot now.
e7cc8f9
to
1342d73
Compare
@stevenroose can you pls look at this PR so we can close it? |
Introduces functions to compute bitcoin hashes for
PublicKey
andScript
which were strangely missed out