8000 Hash functions for public keys and scripts by dr-orlovsky · Pull Request #388 · rust-bitcoin/rust-bitcoin · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Hash functions for public keys and scripts #388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 7, 2020

Conversation

dr-orlovsky
Copy link
Collaborator

Introduces functions to compute bitcoin hashes for PublicKey and Script which were strangely missed out

@dr-orlovsky dr-orlovsky changed the title Hashtypes fns Hash functions for public keys and scripts Jan 10, 2020
@codecov-io
Copy link

Codecov Report

Merging #388 into master will decrease coverage by 0.61%.
The diff coverage is 94.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #388      +/-   ##
==========================================
- Coverage   85.55%   84.94%   -0.62%     
==========================================
  Files          40       40              
  Lines        8660     7311    -1349     
==========================================
- Hits         7409     6210    -1199     
+ Misses       1251     1101     -150
Impacted Files Coverage Δ
src/util/key.rs 75.44% <100%> (+2.67%) ⬆️
src/blockdata/script.rs 78.65% <84.61%> (-1.11%) ⬇️
src/network/message_blockdata.rs 89.23% <0%> (-3.7%) ⬇️
src/blockdata/block.rs 78.83% <0%> (-1.59%) ⬇️
src/util/contracthash.rs 78.5% <0%> (-1.29%) ⬇️
src/blockdata/transaction.rs 93.92% <0%> (-0.94%) ⬇️
src/util/hash.rs 93.75% <0%> (-0.85%) ⬇️
src/util/bip32.rs 87.14% <0%> (-0.83%) ⬇️
src/consensus/encode.rs 85.15% <0%> (-0.44%) ⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7587c4b...acbef2d. Read the comment docs.

Copy link
Collaborator
@stevenroose stevenroose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like to have these methods spread around. I'd be more interested in some kind of ScriptHash::from_script constructor so all the methods are confined to the hash types. But I'm also not sure of this opinion yet.

@dr-orlovsky
Copy link
Collaborator Author
dr-orlovsky commented Jan 14, 2020

@stevenroose I am not happy with all hash-related stuff still. Hashes are used everywhere, and even after hash type refactoring there is a lot of duplicated code (#391 (comment)) or unclear hash generation conventions (txid, wtxid, bitcoin_hash, we got the rid of, block_hash). I would prefer to have a compatible APIs for Pubkeys/Scripts, so if we generate a hash from the data structure with *_hash method, why we need to change it now? Or if we change it, isn't it that the rest of hash generations have to be refactored, to have Txid::from_tx(&tx) instead of tx.txid()?

My arguments on this matter are the following:

  • Hashes are part of the bitcoin protocol rules, so there is always a definite set of hashes that need to be created out of bitcoin data structures. Mostly it's a single hash from a data structure, with the only known exclusion of Transaction. From this point of view, having the hash generation methods right in the data structures like <datatype>.<hash_type>() is more logical than delegating this to Hash type constructors with Hash::from_<datatype>. The later also seems to be more verbose (compare block.block_hash() with BlockHash::from(&block)).
  • If we have sicked to some hash type naming conventions (New hash data types (Txid etc) #284 (comment) and below), we shall follow the same with names for hash generation methods: txid()->Txid, block_hash()->BlockHash, wpubkey_hash()->WPubkeyHash i.e. we convert snake case of hash type into a camel case of method name with the exclusion of single-letter camel cases.

@stevenroose
Copy link
Collaborator

Hmm, yeah I agree semantically it makes more sense to have <type>.<hash-type>() methods.

I don't think the engine-based hash generation needs a shortcut. It's actually pretty common to do that in many other languages: you create a hasher (engine), write data in (we use consensus_encode) and then finalize the hasher (Hash::from_engine).

@elichai
Copy link
Member
elichai commented Jan 14, 2020

I don't think the engine-based hash generation needs a shortcut. It's actually pretty common to do that in many other languages: you create a hasher (engine), write data in (we use consensus_encode) and then finalize the hasher (Hash::from_engine).

I'd actually even say that most languages have only a hasher by default (except obviously python) even the rust sha2 crate by default shows to use the engine.

That's not to say that one line hash can't be useful. it is very useful. but I don't think that here and there doing 3 lines is that bad

@dr-orlovsky
Copy link
Collaborator Author

I think I can explain the pattern with the Engine. It's kind of Java-style which have appeared from Enterprise patterns and used for doing some work that require internal state, costly at initialization. So instead of initializing all internal state time after time for each operation, a class is preferred, that stores that pre-initialized state and does operation-specific tasks in form of instance functions. However, this is not the case if you initialize such engine each time, so in our situation from best practice perspective, we either have to add a class-level method for engine to do the hashing task, or work on the persistence of pre-initialized HashEngine instances with sort of singleton pattern or such.

@elichai
Copy link
Member
elichai commented Jan 14, 2020

I think I can explain the pattern with the Engine. It's kind of Java-style which have appeared from Enterprise patterns and used for doing some work that require internal state, costly at initialization. So instead of initializing all internal state time after time for each operation, a class is preferred, that stores that pre-initialized state and does operation-specific tasks in form of instance functions. However, this is not the case if you initialize such engine each time, so in our situation from best practice perspective, we either have to add a class-level method for engine to do the hashing task, or work on the persistence of pre-initialized HashEngine instances with sort of singleton pattern or such.

The reason hashes usually come in engines is different.
it meant such that you don't need to store all the data in a buffer to hash it, you can stream it into the engine and whenever the "block" is full it will run the internal hash function and continue streaming, that way you can ie easily hash a 1GB file without doing any heap allocation or something like that (taking minimal memory)

@dr-orlovsky
Copy link
Collaborator Author

However bitcoin hashes operate with a very small-sized data, "streamed" in 95% cases at once, so they do not benefit from this pattern at all. And when this is the case (like Taproot tagged hashes), the midstate is stored in a different way and embedded into the type itself...

@stevenroose
Copy link
Collaborator

I think even for small values, we can benefit from this pattern. There is by definition 0 overhead of using an engine over using the HashTrait::hash function, because that one uses an engine internally. So the difference is the allocation of the data when using the latter method.

So even for very small bits of data, where allocation might be small, it's still more efficient. But then there is the problem that even for small values, the compiler nor the code can estimate the size of the allocation. So in many cases there will be frequent re-allocations.

The two main things that are hashed are block headers and transactions. Headers are always 80 bytes, but the compiler doesn't know that and our code also doesn't anticipate that. So here it depends on the default allocation size of Vec<u8> to take enough space whether it will be 1 or multiple allocations.

For transactions, it becomes a lot more complex. There is really no way for the compiler, nor the serialization code, to know the size of a serialized tx. Transactions can range in size from a ~100 bytes to many kilobytes. While serializing them into a Vec<u8>, the vector will be frequently re-allocated to make more space.

I really think the usage of the hash engines you mention is both standard practice and not a problem for readability or concise-ness. It's 2 lines of boilerplate instead of 1.

@stevenroose
Copy link
Collaborator

So I would suggest to reduce this PR to just adding those hash methods where they are missing, using the same pattern used elsewhere.

let mut engine = HashYouWant::engine();
whatever_data_to_be_hashes.consensus_encode(&mut engine).expect("engines don't error");
more_data_to_be_hashes.consensus_encode(&mut engine).expect("engines don't error");
HashYouWant::from_engine(engine)
8000

@dr-orlovsky
Copy link
Collaborator Author

Maybe it worth adding something like

macro_rules! hash {
    ($hash:ident, $($item:expr),+) => {
        {
            let mut engine = $hash::engine();
            $(
                $item.consensus_encode(&mut engine).expect("engines don't error");
            )+
            $hash::from_engine(engine)
        }
    }
}

@stevenroose
Copy link
Collaborator

While that looks nice in theory, in many cases it won't work because also other data is pushed into the hash engine. And I feel like it's better to always do the same thing instead of have a macro that hides what's actually going on in half of the cases.

With rg -trust "::engine()" -A 10 I found 7 cases where we could apply your macro, and 8 cases where we couldn't:

  • blockdata/block.rs:
109:        let mut encoder = WitnessCommitment::engine();
110-        witness_root.consensus_encode(&mut encoder).unwrap();
111-        encoder.input(witness_reserved_value);
112-        WitnessCommitment::from_engine(encoder)
  • util/bip32.rs:
566:        let mut engine = XpubIdentifier::engine();
567-        self.public_key.write_into(&mut engine);
568-        XpubIdentifier::from_engine(engine)
  • util/bip143.rs:
49:            let mut enc = SigHash::engine();
50-            for txin in &tx.input {
51-                txin.previous_output.consensus_encode(&mut enc).unwrap();
52-            }
53-            SigHash::from_engine(enc)
--
57:            let mut enc = SigHash::engine();
58-            for txin in &tx.input {
59-                txin.sequence.consensus_encode(&mut enc).unwrap();
60-            }
61-            SigHash::from_engine(enc)
--
65:            let mut enc = SigHash::engine();
66-            for txout in &tx.output {
67-                txout.consensus_encode(&mut enc).unwrap();
68-            }
69-            SigHash::from_engine(enc)
  • util/address.rs:
246:        let mut hash_engine = PubkeyHash::engine();
247-        pk.write_into(&mut hash_engine);
--
268:        let mut hash_engine = WPubkeyHash::engine();
269-        pk.write_into(&mut hash_engine);
--
283:        let mut hash_engine = WPubkeyHash::engine();
284-        pk.write_into(&mut hash_engine);

@dr-orlovsky
Copy link
Collaborator Author

True. Ok, agree with you approach then. Sorry for taking that long to discuss: I'd like to understand rationale for design decisions in rust-bitcoin better (to follow it in other projects on top of it), that's why keep digging :)

@dr-orlovsky
Copy link
Collaborator Author
dr-orlovsky commented Jan 23, 2020

We have to decide on closing this issue. While there is not a lot of use cases for the macro in rust-bitcoin itself, in the library we are building for L2/L3 (https://github.com/lnp-bp/rust-lnpbp) there are some; so it's either have to go into rust-bitcoin or be kept on top of it. What would you suggest?

The reference implementation is here: https://github.com/pandoracore/rust-bitcoin/blob/ddb3efff95d8b39ce7a6fee0514c18d1a9c785b6/src/hash_types.rs#L84

If you are up for it I'll add it to this PR instead of the current code. Otherwise I will close the request.

@apoelstra
Copy link
Member

My feeling is that this functionality belongs in miniscript. I'm also not thrilled about doing things to support p2sh, which I feel is insecure (only 80 bits collision resistant), and is less efficient and more complicated than p2wsh in all cases.

@dr-orlovsky
Copy link
Collaborator Author

I try to move it there, will share WIP

@dr-orlovsky
Copy link
Collaborator Author
dr-orlovsky commented Sep 3, 2020

Rebased and fixed serializaiton (now it's non-allocating one, @stevenroose; I did it as a separate commit for you to easier review)

@apoelstra pls let me know if you still think it belongs to miniscript – it seems your comment was related to other PR (#387, see discussion #387 (comment)). I was going to transfer it to miniscript, but it's impossible to extend external rust types, and the function looks very native for (W)ScriptHash and (W)PubkeyHash; moreover they are quire similar by semantics to Transaction.(w)txid(), Block.block_hash() and other functions already residing in rust-bitcoin.

Regarding support for p2sh, well, it is already in the library and this function adds nothing to it.

@dr-orlovsky
Copy link
Collaborator Author

Update: @stevenroose Sorry, after re-reviewing this I do not see where the allocation happens in directly calling hash when a single &[u8] is given, like in (w)pubkey_hash() - pls see my comment in the code. As for script serialization, it's true that we need to use script.as_bytes() instead of script.serialize() (this saves from vec allocation), but again, it's safe to use hash and not duplicate the code with the engine.

6D40
apoelstra
apoelstra previously approved these changes Sep 9, 2020
Copy link
Member
@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I skipped over the discussion about hash engines vs single-line functions, as it seems like it's moot now.

@dr-orlovsky
Copy link
Collaborator Author

@stevenroose can you pls look at this PR so we can close it?

@apoelstra apoelstra merged commit 71bf8d7 into rust-bitcoin:master Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0