BIPXXX: Taproot Annex Format by ariard · Pull Request #1381 · bitcoin/bips · GitHub

BIPXXX: Taproot Annex Format #1381


Closed

ariard wants to merge 1 commit from the 2022-07-bip-annex branch

Conversation

ariard
@ariard ariard commented Oct 10, 2022

This is WIP, see ML post for context.

@bitcoin bitcoin deleted a comment Nov 17, 2022
@ajtowns
Contributor
ajtowns commented Jan 4, 2023

I think https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-October/020991.html is the mailing list post in question.

@ariard
Author
ariard commented Jan 5, 2023

Yes, the current state of the TLV format discussion is here: ariard#1 and the implementation is here: bitcoin-inquisition/bitcoin#9

Comment on lines 143 to 146
Lengthy annex bytes stream could be given to nodes as a CPU DoS vector. Standard policy rules should be adequate
to prevent that concern. If many annex fields are considered as valid and/or their validation is expensive, a
compensation mechanism should be introduced to constrain witness producer to commit higher fees (e.g inflate witness
weight in function of annex size).
Contributor

I think this is a little bit backwards -- I think it would be better to say that the annex should always be simple and fast to parse and verify (eg, only using information from the transaction, its utxos, and block headers; only requiring a single pass to parse) and that any expensive computation (such as signature validation) should be left for script evaluation.

Either way, this seems more like a "rationale" thing?

Member

Right, putting stuff in annex is cheap for verifiers, unless new verification burdens are added.

@ariard ariard force-pushed the 2022-07-bip-annex branch from df5af5b to 6f3dcc2 on January 17, 2023 00:26
Co-authored-by: Anthony Towns <aj@erisian.com.au>
@ariard ariard force-pushed the 2022-07-bip-annex branch from 6f3dcc2 to 9dc3f74 on January 17, 2023 00:33
@ariard
Author
ariard commented Jan 17, 2023

Updated at 6f3dcc2 with the suggestions from ariard#1.

I think I still have two issues with the current approach:

  • in case of a script path spend, we might have a redeem path with <pubkey_alice> OP_CHECKSIG <pubkey_bob> OP_CHECKSIG, where Alice is committing to annex X and Bob is committing to annex Y as spending policies. The current approach of {type,length} delta encoding might prevent combining them. I don't know if it's a use case we care about, and if we should have some clawback mechanism to aggregate annexes from signers sharing the same tapscript.
  • re-using the delta for both type and length might be impractical, as accumulating the delta for the length might have no relation at all to the size of the data item.

@ajtowns
Contributor
ajtowns commented Jan 17, 2023
* in case of a script path spend, we might have a redeem path with `<pubkey_alice> OP_CHECKSIG <pubkey_bob> OP_CHECKSIG`, where Alice is committing to annex X and Bob is committing to annex Y as spending policies. The current approach of {type,length} delta encoding might prevent combining them. I don't know if it's a use case we care about, and if we should have some clawback mechanism to aggregate annexes from signers sharing the same tapscript.

CHECKSIG can only commit to an input's annex as a whole; so in the case above either X=Y=the entire annex, or one or both of the signatures are invalid/incompatible. You'd need something like:

ANNEXLENGTH 3 EQUALVERIFY
0 PUSHANNEX 2 EQUALVERIFY TOALT
PUSHANNEX 1 EQUALVERIFY EXTRASIGDATA CHECKSIGVERIFY
RESETSIGDATA FROMALT PUSHANNEX 1 EQUALVERIFY EXTRASIGDATA CHECKSIG

spent by having the following annex (eg): {0: [15, 1], 1: [alicepolicy], 15: [bobpolicy]}, where the novel opcodes behave as:

  • annexlength tells you the number of distinct tags in the annex
  • pushannex pops n off the stack, looks up annex entry n, and pushes each value from the annex with tag n onto the stack, followed by the count (possibly 0)
  • extrasigdata pops an element off the stack, hashes it, and will commit to the (cumulative) hash in subsequent checksig operations
  • resetsigdata resets that cumulative hash

So the "annexlength" check is used to prevent malleability, then the first "0 pushannex" will put [1 15 2] on the stack (2 at the top); the second pushannex will update the stack to [alicepolicy 1], extrasigdata will ensure alice's signature commits to "alicepolicy", the final pushannex will update the stack to [bobpolicy 1], etc. You'd also need a SIGHASH_NOANNEX or similar, of course.

Alice and Bob would still need to agree on the script that defines which subset of the annex they'll each commit to; currently that obviously has to be at the time they define their shared pubkey, but even with OP_EVAL or graftroot, while they could delay that agreement, they'd still need to do it sometime. You'd need a much more generic language to allow them to each choose which parts of the annex to sign at signing time.

* re-using the delta for both type and length might be impractical, as accumulating the delta for the length might have no relation at all to the size of the data item.

They don't need to have any relation? If the previous element had type X, size N1, and this element has type X+K and size N2, you just encode (K, N2), as:

  • if N2 < 127: K*128 + N2
  • if N2 >= 127: (K*128 + 127), (N2-127)
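
A minimal sketch of that (K, N2) packing (read_int and read_bytes below stand in for the draft's read_CompressedInt() and read_bytes(length); each packed integer would itself be serialized as a CompressedInt):

def pack_entry_header(type_delta, data_len):
    # Pack the type delta K and the value length N2 into one integer when
    # N2 < 127, otherwise carry the length overflow in a second integer.
    k, n2 = type_delta, data_len
    if n2 < 127:
        return [k * 128 + n2]
    return [k * 128 + 127, n2 - 127]

def unpack_entry(read_int, read_bytes, prev_type):
    # Inverse: recover (type, value) given the previous entry's type.
    first = read_int()
    k, n2 = first >> 7, first & 0x7F
    if n2 == 127:
        n2 = 127 + read_int()
    return prev_type + k, read_bytes(n2)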


=== Annex validation rules ===

* If the annex does not decode successfully (that is, if read_CompressedInt() or read_bytes(length) fail due to reaching eof early); fail.
Member

Suggested change
* If the annex does not decode successfully (that is, if read_CompressedInt() or read_bytes(length) fail due to reaching eof early); fail.
* If the annex does not decode successfully (e.g., if read_CompressedInt() or read_bytes(length) fail due to reaching eof early): fail.

=== Abstract ===

This BIP describes a validation format for the taproot annex ([https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki BIP341]).
It allows to extend the usual transaction fields with new data records allowing witness signatures to commit to them.
Member

Suggested change
It allows to extend the usual transaction fields with new data records allowing witness signatures to commit to them.
It allows extension of the usual transaction fields with new data records allowing taproot signatures to commit to them.

Comment on lines +30 to +32
released in the early days of the network, few soft-forks occurred extending the validation semantic
of some transaction fields (e.g [https://github.com/bitcoin/bips/blob/master/bip-0068.mediawiki BIP68])
or adding whole new field to solve the malleability issue (e.g [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).
Member

Suggested change
released in the early days of the network, few soft-forks occurred extending the validation semantic
of some transaction fields (e.g [https://github.com/bitcoin/bips/blob/master/bip-0068.mediawiki BIP68])
or adding whole new field to solve the malleability issue (e.g [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).
released in the early days of the network, soft-forks either extended the validation semantic
of some transaction fields (e.g [https://github.com/bitcoin/bips/blob/master/bip-0068.mediawiki BIP68])
or in one case added a whole new field to solve the malleability issue (e.g [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).


This proposal introduces a format to add new data fields in the Taproot annex. BIP341 mandates
that if a witness includes at least two elements and the first byte of the last element is 0x50,
this element is qualified as the annex. The remaining bytes semantics are defined by new validation
Member

Suggested change
this element is qualified as the annex. The remaining bytes semantics are defined by new validation
this element is the annex. This BIP defines the remaining bytes' semantics and validation

of use-cases. For now there is only one nLocktime field in a transaction and all inputs must share
the same value. It could be possible to define per-input lock-time enabling aggregation of off-chain
protocols transactions (e.g [https://github.com/lightning/bolts/blob/master/03-transactions.md#htlc-timeout-and-htlc-success-transactions Lightning HTLC-timeout]).
A commitment to historical block hash could be also a new annex data field to enable replay protection
Member

Suggested change
A commitment to historical block hash could be also a new annex data field to enable replay protection
A commitment to a historical block hash could be a new annex data field to enable replay protection

the same value. It could be possible to define per-input lock-time enabling aggregation of off-chain
protocols transactions (e.g [https://github.com/lightning/bolts/blob/master/03-transactions.md#htlc-timeout-and-htlc-success-transactions Lightning HTLC-timeout]).
A commitment to historical block hash could be also a new annex data field to enable replay protection
in case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
Member

Suggested change
in case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
in the case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed


== Specification ==

=== CompressedInt Integer Encoding ===
Member

Any historical precedent for this kind of encoding? If so, please add a reference.

Contributor
@ajtowns ajtowns Feb 15, 2023

This encoding is from bitcoin/bitcoin@4d6144f (bitcoin/bitcoin#1677)

I don't think there's any precedent for it in other BIPs

Contributor
@roconnor-blockstream roconnor-blockstream Feb 17, 2023

This is unsigned LEB128?

No, it is VLQ
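
For reference, a Python sketch of that encoding, assuming CompressedInt matches Core's serialize.h VARINT from the commit linked above (big-endian base-128 groups with a continuation bit, each non-final group reduced by one so every value has exactly one encoding):

def write_compressed_int(n: int) -> bytes:
    out = bytearray()
    while True:
        out.append((n & 0x7F) | (0x80 if out else 0x00))
        if n <= 0x7F:
            break
        n = (n >> 7) - 1
    return bytes(reversed(out))

def read_compressed_int(data: bytes, pos: int = 0):
    # Inverse of write_compressed_int; returns (value, next position).
    n = 0
    while True:
        ch = data[pos]
        pos += 1
        n = (n << 7) | (ch & 0x7F)
        if ch & 0x80:
            n += 1
        else:
            return n, pos

assert write_compressed_int(127) == bytes([0x7F])
assert write_compressed_int(128) == bytes([0x80, 0x00])
assert read_compressed_int(bytes([0x80, 0x00])) == (128, 2)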

A commitment to historical block hash could be also a new annex data field to enable replay protection
in case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
together to enable fee-bumping batching of off-chain protocols transactions. <ref> '''What if the
use-cases require access to the annex fields by Script operations ?''' A new PUSH_ANNEX_RECORD could be
Member

Can you make it more clear from the beginning that there are two ways to look at the annex records: out-of-script rule enforcement and in-script rule enforcement?

Contributor

Maybe "rules enforced directly by the presence of an annex element" (like nLockTime or nSequence making a tx invalid in context) and "interaction with script rules" (PUSH_ANNEX_RECORD behaves like OP_CLTV or OP_CSV requiring a particular nLockTime or nSequence value; or SIGHASH_GROUP proposes CHECKSIG hashing working differently depending on an annex value -- those rules should be independent of the tx's context, ie they either always pass or always fail, just as a mismatching nLockTime for your OP_CLTV will always mean the tx is invalid, no matter how long you wait before broadcasting it) ?


Rather than encoding the type directly, we encode the difference between
the previous type (initially 0), both minimising the encoding and ensuring
a canonical ordering for annex entries.
Contributor

canonical up to reordering of entries of the same type.

Contributor

The way I think about the annex is as a total function from non-negative integers to vectors of byte strings.

So if you're mapping type 9 to the vector ["abc", "def"] then that is distinct from mapping type 9 to the vector ["def", "abc"], and "canonical" here means there's exactly one encoding for each mapping.

In particular, I'm thinking that a script opcode to examine an annex entry might look like 9 PUSH_ANNEX 2 EQUALVERIFY -- now you have "abc" at the top of the stack and "def" beneath it. In that case the annex entries of a single type cannot always be reordered without changing their semantics.

(The current text says The annex is defined as containing an ordered set of "type, value" pairs, -- my thinking was that "ordered set" already captures the idea that reordering entries changes the semantics)

== Deployment ==


== Backwards compatibility ==
Member

This can't be empty.

@@ -0,0 +1,173 @@
<pre>
BIP: XXX
Layer: Consensus (soft fork)


I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks? Doesn't the requirement of this bip for another soft fork to make the field useable defeat that purpose?

Contributor

The annex field gives space for future protocol expansions, but each expansion requires a soft fork in order to give it semantics.

That said, this particular format BIP in principle could alternatively be defined so that invalidly formatted annex fields do not invalidate the transaction, and instead just prevent the interpretation of the annex in this format. There would be advantages and disadvantages to such an alternative definition.

Member

There could also be a policy-only relaxation (as we've done in the inquisition repo), but the same tradeoffs apply.

Author

I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks? Doesn't the requirement of this bip for another soft fork to make the field useable defeat that purpose?

The idea is to have a type-length-value record where new semantics can be assigned to each record without the record developers having to think about the consensus syntax issues (e.g. what if another record uses a multi-byte length), though the semantic issues will still be something to reason about.

I still think even if you have a policy-only relaxation of the annex, we have to deploy a relaxation for each new record.


Thinking about the most minimal change to make the annex usable in the short term. How about just defining in policy a way to express an 'unstructured' block of data for now? For example starting unstructured data with byte 0, and then later defining tlv or something else in a way that the first byte is never 0?


I detailed another format that would optimize the unstructured annex data case more in https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/021756.html

In short: any annex is considered unstructured and does not incur any overhead at all, unless the annex starts with a magic byte which is to be repeated for unstructured data. Other values of the second byte are reserved for future extensions.
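
One possible reading of that scheme (the magic value and escaping rule below are assumptions for illustration; the linked post fixes the actual details):

MAGIC = 0x01  # hypothetical magic value, chosen here only for illustration

def classify_annex_payload(payload: bytes):
    # Toy classifier for the format described above: the annex payload (after
    # BIP341's 0x50 prefix) is unstructured by default; a leading magic byte
    # selects a structured extension, and a doubled magic byte escapes
    # unstructured data that happens to begin with the magic value.
    if not payload or payload[0] != MAGIC:
        return "unstructured", payload
    if len(payload) >= 2 and payload[1] == MAGIC:
        return "unstructured", payload[1:]   # drop the escape byte
    return "extension", payload[1:]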

Contributor

Other than that I think tlv is great. I proposed 0-prefix for unstructured data to keep things as simple as possible initially without compromising future upgradeability, to avoid a potentially lengthy process to get to a standard for tlv.

If you don't like lengthy processes, don't extend them by proposing unnecessary alternatives when there's already a workable proposal?

Compared to this PR, what you propose adds overhead for encoding data that has consensus meaning, or where multiple items of data need to be included.


Compared to this PR, what you propose adds overhead for encoding data that has consensus meaning, or where multiple items of data need to be included.

It indeed depends on whether you want to space-optimize for structured or for unstructured data.

If you don't like lengthy processes, don't extend them by proposing unnecessary alternatives when there's already a workable proposal?

I think it is important to explore alternatives. It maps out the design space and shows the trade-offs, which also exist in the case of tlv. Tlv isn't strictly better.

A "workable" proposal doesn't necessarily mean that it won't need to go through a lengthy process still. In my experience, starting with a simpler alternative can often expedite things and depending on future usage patterns it may even be the optimal choice.

Most important for me though is that the annex becomes usable in some form regardless of the exact space requirements. If you're saying that the tlv proposal can easily be guided through the process and enabled in policy for the next release, it's all good for me.

Author

For large amounts of data, the overhead is indeed small. But there might be current or future use cases that only require smaller bits of unstructured data for which the overhead weighs more heavily. EDIT: For <127 bytes of unstructured data there is no overhead indeed in this proposal, so agree that this is hardly an argument.

If you have users leveraging smaller bits of unstructured data who cannot afford the fee overhead of the TLV record bytes, I think the economically rational position is to design an L2 system to lift the unstructured data from on-chain to off-chain?

Of course there is the question of whether you can maintain the accountability and visibility properties that your use-case is looking for with an L2 system relying on private state.

It indeed depends on whether you want to space-optimize for structured or for unstructured data.

On the question of space-optimization, in my mind if the annex is used for economically space-sensitive use-cases like channel factories or payment pools in the future, even a few bytes of witness waste translates into raising the economic bar to afford access to those advanced Bitcoin payment systems.

Contributor

I think it is important to explore alternatives.

It's valuable to explore alternatives if they potentially offer benefits; but this doesn't -- it just makes one use case slightly cheaper and other uses slightly more expensive. The cost to exploring alternatives is that it delays the entire process of making a decision, which was what you were complaining about in the first place.

@ariard
Author
ariard commented Jun 1, 2023

I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks?

There is an interesting open design question - whether we could have restrained soft-fork semantics introduced by an economic or proof-of-work mechanism, or with expiring enforcement. There was such an idea presented on the mailing list a while back: “Automatically reverting (“transitory”) soft forks”.

Can we design the taproot annex as a safe sandbox under some consensus boundaries?

@casey
casey commented Sep 25, 2023

We could consider using a prefix varint, where the number of leading 1s in the initial and subsequent bytes, until a 0 is reached, determines how many additional bytes follow. The only advantage is performance, since you don't have a potential branch on every byte, and you can load data bytes directly. I don't know if that's enough of an advantage to use a less-common varint encoding, but it's worth considering. Here's a good Hacker News post about the encoding.
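
A sketch of one common prefix-varint layout (this version keeps the whole run of leading 1s in the first byte, so it covers values below 2**49; the scheme described above generalizes by letting the run continue into later bytes):

def encode_prefix_varint(n: int) -> bytes:
    # The count of leading 1 bits in the first byte gives the number of
    # continuation bytes, so a decoder branches once per value rather than
    # once per byte. Capped at 6 continuation bytes in this sketch.
    extra = 0
    while n >= (1 << (7 + 7 * extra)):
        extra += 1
        if extra > 6:
            raise ValueError("value too large for this sketch")
    prefix = (0xFF << (8 - extra)) & 0xFF        # 'extra' leading 1s, then a 0
    first = prefix | (n >> (8 * extra))          # top 7-extra bits of the value
    rest = (n & ((1 << (8 * extra)) - 1)).to_bytes(extra, "big")
    return bytes([first]) + rest

def decode_prefix_varint(data: bytes, pos: int = 0):
    # Inverse; returns (value, next position).
    first = data[pos]
    extra = 0
    while extra < 7 and (first >> (7 - extra)) & 1:
        extra += 1
    value = first & ((1 << (7 - extra)) - 1)
    for b in data[pos + 1 : pos + 1 + extra]:
        value = (value << 8) | b
    return value, pos + 1 + extra

assert encode_prefix_varint(127) == bytes([0x7F])
assert encode_prefix_varint(128) == bytes([0x80, 0x80])
assert decode_prefix_varint(bytes([0x80, 0x80])) == (128, 2)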

@ariard
Author
ariard commented Sep 30, 2023

Yes, I browsed over the Hacker News post, where a 128-bit prefix varint is argued to be dramatically faster to decode and encode. I think it is unclear whether performance-over-space or space-over-performance should be favored (a classic time-space trade-off), and what the average annex payload can be expected to be. Maybe the performance gain is so cheap that it doesn’t matter to optimize to protect full-node CPU cycles, and we should favor cheap witness cost for annex users.

Note the annex policy-only discussion, where non-interactive annex composition among a set of multi-party users is being weighed.

@casey
casey commented Sep 30, 2023

I notice that there's no maximum varint size mentioned. Would it be a good idea to restrict varints to being no greater than one of {u32, u64, u128}::MAX? (Which one depends on how large varints are expected to be.) This would simplify code that has to pass around varints, since it can use a fixed-size value, instead of having to use big ints.

@ajtowns
Contributor
ajtowns commented Nov 17, 2023

I notice that there's no maximum varint size mentioned. Would it be a good idea to restrict varints to being no greater than one of {u32, u64, u128}::MAX? (Which one depends on how large varints are expected to be.) This would simplify code that has to pass around varints, since it can use a fixed-size value, instead of having to use big ints.

Particularly when combined with (a) a "PUSHANNEX" opcode that grabs an entry from the annex and puts it on the stack, or (b) signing or otherwise accessing annex data from other inputs (see inq#19), it might make sense to make the annex much more limited. For example, only allowing individual data items to be 0-127 bytes, and limiting tags to be integers between, perhaps, 0 and 2**24-1.

In that case, rather than putting 100kB of data in the annex, you'd put the 100kB on the stack and use a script like "HASH160 0 PUSHANNEX EQUALVERIFY CHECKSIG" to achieve the same result; the benefit being that other inputs that also sign your annex are only signing an extra 20 byte hash160 hash, not the 100kB of data.

Doing things with those limits would let you encode annex entries as:

  • 1 byte - bump_len
  • 3 byte optional - tag_bump (present iff (bump_len & 0x80) != 0)
  • (bump_len & 0x7F) bytes - data

So if you wanted to encode {0: [<1234>], 1: [800000, <54a0>]} you'd do it as 50 02 1234 83 010000 00350c 02 54a0, which gets decoded as: 50 -- annex prefix; 02 no tag_bump, 2 bytes of data, data is hex string 1234; 83 bump the tag, 3 bytes of data, tag is bumped by 0x000001 (little endian), data is 0x0c3500 or 800,000 (little-endian); 02 don't bump the tag, 2 bytes of data, data is hex string 54a0.

(My thinking is that this way you can define tag value 1 as per-input locktime, which accepts one or two data items, if there's one data item, it just requires that that block height has been reached; if there's two data items, it requires that block height has been reached and that block's hash matches the second data item)

If you make a bad encoding, either by not having enough data or bumping the tag to 2**24 or higher, that's "invalid", either causing the tx to be invalid, or causing the data to be inaccessible via PUSHANNEX. 2**24 is 16M different tags, which is surely more than enough; but perhaps 2**16 or even 2**8 would be fine?
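
A decoder sketch for that limited layout, following the worked example above (hypothetical code, not part of the draft):

def decode_limited_annex(annex: bytes):
    # Entries after the 0x50 prefix: 1-byte bump_len; a 3-byte little-endian
    # tag_bump iff the high bit of bump_len is set; then bump_len & 0x7F data
    # bytes. Truncation or a tag reaching 2**24 is treated as invalid, as
    # described above.
    assert annex and annex[0] == 0x50
    pos, tag = 1, 0
    result = {}
    while pos < len(annex):
        bump_len = annex[pos]
        pos += 1
        if bump_len & 0x80:
            if pos + 3 > len(annex):
                raise ValueError("truncated tag_bump")
            tag += int.from_bytes(annex[pos:pos + 3], "little")
            pos += 3
            if tag >= 1 << 24:
                raise ValueError("tag out of range")
        n = bump_len & 0x7F
        if pos + n > len(annex):
            raise ValueError("truncated data")
        result.setdefault(tag, []).append(annex[pos:pos + n])
        pos += n
    return result

# Reproduces the worked example: {0: [1234], 1: [00350c, 54a0]} (values as hex)
assert decode_limited_annex(bytes.fromhex("500212348301000000350c0254a0")) == {
    0: [bytes.fromhex("1234")],
    1: [bytes.fromhex("00350c"), bytes.fromhex("54a0")],
}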

@ariard
Author
ariard commented Apr 25, 2024

I’m more likely to work on the great consensus cleanup for the foreseeable future. If someone wishes to shepherd the taproot annex forward from here, feel free to ask AJ and/or me for input, as this current draft gathers common ideas.

@jonatack
Member

@ariard would you mind closing this pull, if you don't currently plan to work on it?

@ariard
Author
ariard commented Apr 26, 2024

@jonatack somehow the annex is one of the best known ways to fix technical debt in L2s:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-December/022198.html

when I say I don’t currently plan to work on it, I mean over the coming 2-3 years.

meantime, I think it can be good to keep collecting feedback, or for folks to discuss the implementation approach.

if you wish to say more on how we should deal with BIPs related to consensus changes, good.

in reality they’re shepherded by a plurality of authors over a very long span of time.

@ariard
Author
ariard commented Apr 28, 2024

Closing it; I did a backup of the comments for my own archive. If someone wants to pick this up, feel free to do so.

@ariard ariard closed this Apr 28, 2024