Hash functions for public keys and scripts #388

dr-orlovsky · 2020-01-10T23:09:40Z

Introduces functions to compute bitcoin hashes for PublicKey and Script which were strangely missed out

codecov-io · 2020-01-10T23:23:25Z

Codecov Report

Merging #388 into master will decrease coverage by 0.61%.
The diff coverage is 94.28%.

@@            Coverage Diff             @@
##           master     #388      +/-   ##
==========================================
- Coverage   85.55%   84.94%   -0.62%     
==========================================
  Files          40       40              
  Lines        8660     7311    -1349     
==========================================
- Hits         7409     6210    -1199     
+ Misses       1251     1101     -150

Impacted Files	Coverage Δ
src/util/key.rs	`75.44% <100%> (+2.67%)`	⬆️
src/blockdata/script.rs	`78.65% <84.61%> (-1.11%)`	⬇️
src/network/message_blockdata.rs	`89.23% <0%> (-3.7%)`	⬇️
src/blockdata/block.rs	`78.83% <0%> (-1.59%)`	⬇️
src/util/contracthash.rs	`78.5% <0%> (-1.29%)`	⬇️
src/blockdata/transaction.rs	`93.92% <0%> (-0.94%)`	⬇️
src/util/hash.rs	`93.75% <0%> (-0.85%)`	⬇️
src/util/bip32.rs	`87.14% <0%> (-0.83%)`	⬇️
src/consensus/encode.rs	`85.15% <0%> (-0.44%)`	⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7587c4b...acbef2d. Read the comment docs.

stevenroose

I'm not sure I like to have these methods spread around. I'd be more interested in some kind of ScriptHash::from_script constructor so all the methods are confined to the hash types. But I'm also not sure of this opinion yet.

src/blockdata/script.rs

src/util/key.rs

dr-orlovsky · 2020-01-14T10:02:13Z

@stevenroose I am not happy with all hash-related stuff still. Hashes are used everywhere, and even after hash type refactoring there is a lot of duplicated code (#391 (comment)) or unclear hash generation conventions (txid, wtxid, bitcoin_hash, we got the rid of, block_hash). I would prefer to have a compatible APIs for Pubkeys/Scripts, so if we generate a hash from the data structure with *_hash method, why we need to change it now? Or if we change it, isn't it that the rest of hash generations have to be refactored, to have Txid::from_tx(&tx) instead of tx.txid()?

My arguments on this matter are the following:

Hashes are part of the bitcoin protocol rules, so there is always a definite set of hashes that need to be created out of bitcoin data structures. Mostly it's a single hash from a data structure, with the only known exclusion of Transaction. From this point of view, having the hash generation methods right in the data structures like <datatype>.<hash_type>() is more logical than delegating this to Hash type constructors with Hash::from_<datatype>. The later also seems to be more verbose (compare block.block_hash() with BlockHash::from(&block)).
If we have sicked to some hash type naming conventions (New hash data types (Txid etc) #284 (comment) and below), we shall follow the same with names for hash generation methods: txid()->Txid, block_hash()->BlockHash, wpubkey_hash()->WPubkeyHash i.e. we convert snake case of hash type into a camel case of method name with the exclusion of single-letter camel cases.

stevenroose · 2020-01-14T11:13:58Z

Hmm, yeah I agree semantically it makes more sense to have <type>.<hash-type>() methods.

I don't think the engine-based hash generation needs a shortcut. It's actually pretty common to do that in many other languages: you create a hasher (engine), write data in (we use consensus_encode) and then finalize the hasher (Hash::from_engine).

elichai · 2020-01-14T11:24:56Z

I don't think the engine-based hash generation needs a shortcut. It's actually pretty common to do that in many other languages: you create a hasher (engine), write data in (we use consensus_encode) and then finalize the hasher (Hash::from_engine).

I'd actually even say that most languages have only a hasher by default (except obviously python) even the rust sha2 crate by default shows to use the engine.

That's not to say that one line hash can't be useful. it is very useful. but I don't think that here and there doing 3 lines is that bad

dr-orlovsky · 2020-01-14T13:03:15Z

I think I can explain the pattern with the Engine. It's kind of Java-style which have appeared from Enterprise patterns and used for doing some work that require internal state, costly at initialization. So instead of initializing all internal state time after time for each operation, a class is preferred, that stores that pre-initialized state and does operation-specific tasks in form of instance functions. However, this is not the case if you initialize such engine each time, so in our situation from best practice perspective, we either have to add a class-level method for engine to do the hashing task, or work on the persistence of pre-initialized HashEngine instances with sort of singleton pattern or such.

elichai · 2020-01-14T15:03:09Z

I think I can explain the pattern with the Engine. It's kind of Java-style which have appeared from Enterprise patterns and used for doing some work that require internal state, costly at initialization. So instead of initializing all internal state time after time for each operation, a class is preferred, that stores that pre-initialized state and does operation-specific tasks in form of instance functions. However, this is not the case if you initialize such engine each time, so in our situation from best practice perspective, we either have to add a class-level method for engine to do the hashing task, or work on the persistence of pre-initialized HashEngine instances with sort of singleton pattern or such.

The reason hashes usually come in engines is different.
it meant such that you don't need to store all the data in a buffer to hash it, you can stream it into the engine and whenever the "block" is full it will run the internal hash function and continue streaming, that way you can ie easily hash a 1GB file without doing any heap allocation or something like that (taking minimal memory)

dr-orlovsky · 2020-01-14T16:13:31Z

However bitcoin hashes operate with a very small-sized data, "streamed" in 95% cases at once, so they do not benefit from this pattern at all. And when this is the case (like Taproot tagged hashes), the midstate is stored in a different way and embedded into the type itself...

stevenroose · 2020-01-15T11:26:49Z

I think even for small values, we can benefit from this pattern. There is by definition 0 overhead of using an engine over using the HashTrait::hash function, because that one uses an engine internally. So the difference is the allocation of the data when using the latter method.

So even for very small bits of data, where allocation might be small, it's still more efficient. But then there is the problem that even for small values, the compiler nor the code can estimate the size of the allocation. So in many cases there will be frequent re-allocations.

The two main things that are hashed are block headers and transactions. Headers are always 80 bytes, but the compiler doesn't know that and our code also doesn't anticipate that. So here it depends on the default allocation size of Vec<u8> to take enough space whether it will be 1 or multiple allocations.

For transactions, it becomes a lot more complex. There is really no way for the compiler, nor the serialization code, to know the size of a serialized tx. Transactions can range in size from a ~100 bytes to many kilobytes. While serializing them into a Vec<u8>, the vector will be frequently re-allocated to make more space.

I really think the usage of the hash engines you mention is both standard practice and not a problem for readability or concise-ness. It's 2 lines of boilerplate instead of 1.

stevenroose · 2020-01-15T11:28:59Z

So I would suggest to reduce this PR to just adding those hash methods where they are missing, using the same pattern used elsewhere.

let mut engine = HashYouWant::engine();
whatever_data_to_be_hashes.consensus_encode(&mut engine).expect("engines don't error");
more_data_to_be_hashes.consensus_encode(&mut engine).expect("engines don't error");
HashYouWant::from_engine(engine)

8000

dr-orlovsky · 2020-01-15T13:47:40Z

Maybe it worth adding something like

macro_rules! hash {
    ($hash:ident, $($item:expr),+) => {
        {
            let mut engine = $hash::engine();
            $(
                $item.consensus_encode(&mut engine).expect("engines don't error");
            )+
            $hash::from_engine(engine)
        }
    }
}

stevenroose · 2020-01-15T15:16:46Z

While that looks nice in theory, in many cases it won't work because also other data is pushed into the hash engine. And I feel like it's better to always do the same thing instead of have a macro that hides what's actually going on in half of the cases.

With rg -trust "::engine()" -A 10 I found 7 cases where we could apply your macro, and 8 cases where we couldn't:

blockdata/block.rs:

109:        let mut encoder = WitnessCommitment::engine();
110-        witness_root.consensus_encode(&mut encoder).unwrap();
111-        encoder.input(witness_reserved_value);
112-        WitnessCommitment::from_engine(encoder)

util/bip32.rs:

566:        let mut engine = XpubIdentifier::engine();
567-        self.public_key.write_into(&mut engine);
568-        XpubIdentifier::from_engine(engine)

util/bip143.rs:

49:            let mut enc = SigHash::engine();
50-            for txin in &tx.input {
51-                txin.previous_output.consensus_encode(&mut enc).unwrap();
52-            }
53-            SigHash::from_engine(enc)
--
57:            let mut enc = SigHash::engine();
58-            for txin in &tx.input {
59-                txin.sequence.consensus_encode(&mut enc).unwrap();
60-            }
61-            SigHash::from_engine(enc)
--
65:            let mut enc = SigHash::engine();
66-            for txout in &tx.output {
67-                txout.consensus_encode(&mut enc).unwrap();
68-            }
69-            SigHash::from_engine(enc)

util/address.rs:

246:        let mut hash_engine = PubkeyHash::engine();
247-        pk.write_into(&mut hash_engine);
--
268:        let mut hash_engine = WPubkeyHash::engine();
269-        pk.write_into(&mut hash_engine);
--
283:        let mut hash_engine = WPubkeyHash::engine();
284-        pk.write_into(&mut hash_engine);

dr-orlovsky · 2020-01-15T17:15:16Z

True. Ok, agree with you approach then. Sorry for taking that long to discuss: I'd like to understand rationale for design decisions in rust-bitcoin better (to follow it in other projects on top of it), that's why keep digging :)

dr-orlovsky · 2020-01-23T03:32:57Z

We have to decide on closing this issue. While there is not a lot of use cases for the macro in rust-bitcoin itself, in the library we are building for L2/L3 (https://github.com/lnp-bp/rust-lnpbp) there are some; so it's either have to go into rust-bitcoin or be kept on top of it. What would you suggest?

The reference implementation is here: https://github.com/pandoracore/rust-bitcoin/blob/ddb3efff95d8b39ce7a6fee0514c18d1a9c785b6/src/hash_types.rs#L84

If you are up for it I'll add it to this PR instead of the current code. Otherwise I will close the request.

apoelstra · 2020-01-24T19:47:56Z

My feeling is that this functionality belongs in miniscript. I'm also not thrilled about doing things to support p2sh, which I feel is insecure (only 80 bits collision resistant), and is less efficient and more complicated than p2wsh in all cases.

dr-orlovsky · 2020-01-24T22:47:49Z

I try to move it there, will share WIP

dr-orlovsky · 2020-09-03T14:54:21Z

Rebased and fixed serializaiton (now it's non-allocating one, @stevenroose; I did it as a separate commit for you to easier review)

@apoelstra pls let me know if you still think it belongs to miniscript – it seems your comment was related to other PR (#387, see discussion #387 (comment)). I was going to transfer it to miniscript, but it's impossible to extend external rust types, and the function looks very native for (W)ScriptHash and (W)PubkeyHash; moreover they are quire similar by semantics to Transaction.(w)txid(), Block.block_hash() and other functions already residing in rust-bitcoin.

Regarding support for p2sh, well, it is already in the library and this function adds nothing to it.

dr-orlovsky · 2020-09-03T22:51:40Z

Update: @stevenroose Sorry, after re-reviewing this I do not see where the allocation happens in directly calling hash when a single &[u8] is given, like in (w)pubkey_hash() - pls see my comment in the code. As for script serialization, it's true that we need to use script.as_bytes() instead of script.serialize() (this saves from vec allocation), but again, it's safe to use hash and not duplicate the code with the engine.

apoelstra

Looks good! I skipped over the discussion about hash engines vs single-line functions, as it seems like it's moot now.

dr-orlovsky · 2020-10-07T16:38:53Z

@stevenroose can you pls look at this PR so we can close it?

dr-orlovsky changed the title ~~Hashtypes fns~~ Hash functions for public keys and scripts Jan 10, 2020

dr-orlovsky mentioned this pull request Jan 10, 2020

Refactoring script generating functions into a single place #387

Merged

stevenroose reviewed Jan 13, 2020

View reviewed changes

src/blockdata/script.rs Outdated Show resolved Hide resolved

src/blockdata/script.rs Outdated Show resolved Hide resolved

src/util/key.rs Show resolved Hide resolved

src/util/key.rs Outdated Show resolved Hide resolved

stevenroose mentioned this pull request Jan 14, 2020

Revert internal BIP-143 hashes to sha256d::Hash #391

Closed

dr-orlovsky force-pushed the hashtypes-fns branch from acbef2d to ce5a7f3 Compare September 3, 2020 14:51

dr-orlovsky force-pushed the hashtypes-fns branch from ce5a7f3 to e7cc8f9 Compare September 3, 2020 22:54

apoelstra previously approved these changes Sep 9, 2020

View reviewed changes

dr-orlovsky added 3 commits September 11, 2020 15:04

Pubkey hash functions (normal and witness)

1d9f531

Script hash functions (normal and witness)

8363c76

Script hash functions with non-allocating serialization

1342d73

dr-orlovsky dismissed apoelstra’s stale review via 1342d73 September 11, 2020 13:11

dr-orlovsky force-pushed the hashtypes-fns branch from e7cc8f9 to 1342d73 Compare September 11, 2020 13:11

dr-orlovsky requested a review from apoelstra September 11, 2020 13:12

dr-orlovsky requested a review from stevenroose September 11, 2020 13:12

WPubkeyHash constructor failing on uncompressed PublicKey

ff1b4a8

dr-orlovsky mentioned this pull request Sep 12, 2020

Make upstream PRs accepted & publish v0.1.0 release on crates.io LNP-BP/rust-lnpbp#77

Closed

22 tasks

apoelstra approved these changes Oct 7, 2020

View reviewed changes

stevenroose approved these changes Oct 7, 2020

View reviewed changes

apoelstra merged commit 71bf8d7 into rust-bitcoin:master Oct 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Hash functions for public keys and scripts #388

Hash functions for public keys and scripts #388

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Hash functions for public keys and scripts #388

Hash functions for public keys and scripts #388

Uh oh!

Conversation

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!