BIPXXX: Taproot Annex Format #1381
Conversation
I think https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-October/020991.html is the mailing list post in question.
Yes, the current state of the TLV format discussion is here: ariard#1, and the implementation is here: bitcoin-inquisition/bitcoin#9
bip-annex.mediawiki
Lengthy annex bytes stream could be given to nodes as a CPU DoS vector. Standard policy rules should be adequate
to prevent that concern. If many annex fields are considered as valid and/or their validation is expensive, a
compensation mechanism should be introduced to constrain witness producer to commit higher fees (e.g inflate witness
weight in function of annex size).
I think this is a little bit backwards -- I think it would be better to say that the annex should always be simple and fast to parse and verify (eg, only using information from the transaction, its utxos, and block headers; only requiring a single pass to parse) and that any expensive computation (such as signature validation) should be left for script evaluation.
Either way, this seems more like a "rationale" thing?
Right, putting stuff in annex is cheap for verifiers, unless new verification burdens are added.
Updated at 6f3dcc2 with the suggestions from ariard#1. I think I still have two issues with the current approach:
CHECKSIG can only commit to an input's annex as a whole; so in the case above either X=Y=the entire annex, or one or both of the signatures are invalid/incompatible. You'd need something like: ANNEXLENGTH 3 EQUALVERIFY spent by having the following annex (eg): {0: [15, 1], 1: [alicepolicy], 15: [bobpolicy]}, where the novel opcodes behave as:
So the "annexlength" check is used to prevent malleability, then the first "0 pushannex" will put [1 15 2] on the stack (2 at the top); the second pushannex will update the stack to [alicepolicy 1], extrasigdata will ensure alice's signature commits to "alicepolicy", the final pushannex will update the stack to [bobpolicy 1], etc. You'd also need a SIGHASH_NOANNEX or similar, of course. Alice and Bob would still need to agree on the script that defines which subset of the annex they'll each commit to; currently that obviously has to be at the time they define their shared pubkey, but even with OP_EVAL or graftroot, while they could delay that agreement, they'd still need to do it sometime. You'd need a much more generic language to allow them to each choose which parts of the annex to sign at signing time.
They don't need to have any relation? If the previous element had type X, size N1, and this element has type X+K and size N2, you just encode (K, N2), as:
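The exact byte layout that followed in this comment isn't preserved here; as a minimal illustration of the delta idea only (assuming each record carries a type delta and a length, each serialized elsewhere with CompressedInt):

```python
def delta_encode(entries):
    """entries: list of (type, value_bytes) pairs, sorted by type (repeats allowed)."""
    records = []
    prev_type = 0
    for entry_type, value in entries:
        k = entry_type - prev_type           # K = (X + K) - X, never negative if sorted
        records.append((k, len(value), value))
        prev_type = entry_type
    return records

# Types 1, 1, 15 encode as deltas 1, 0, 14 -- same-type repeats use delta 0.
print(delta_encode([(1, b"alicepolicy"), (1, b"extra"), (15, b"bobpolicy")]))
```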
=== Annex validation rules ===

* If the annex does not decode successfully (that is, if read_CompressedInt() or read_bytes(length) fail due to reaching eof early); fail.
Suggested change:
* If the annex does not decode successfully (e.g., if read_CompressedInt() or read_bytes(length) fail due to reaching eof early): fail.
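As a concrete illustration of this rule, here is a hedged sketch of a decoder that fails on early eof. The record framing assumed here — (CompressedInt type-delta, CompressedInt length, length value bytes) — and the helper names mirroring the draft's read_CompressedInt()/read_bytes() are assumptions, not the draft's exact layout:

```python
def read_compressed_int(data, pos):
    """CompressedInt reader (see the encoding sketch further down); raises EOFError."""
    n = 0
    while True:
        if pos >= len(data):
            raise EOFError("eof while reading CompressedInt")
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1          # continuation: the +1 offset keeps encodings unique
        else:
            return n, pos

def read_bytes(data, pos, length):
    if pos + length > len(data):
        raise EOFError("eof while reading value bytes")
    return data[pos:pos + length], pos + length

def decode_annex(payload):
    """Return a list of (type, value) pairs, or None if the annex does not decode."""
    entries, pos, cur_type = [], 0, 0
    try:
        while pos < len(payload):
            delta, pos = read_compressed_int(payload, pos)
            length, pos = read_compressed_int(payload, pos)
            value, pos = read_bytes(payload, pos, length)
            cur_type += delta
            entries.append((cur_type, value))
    except EOFError:
        return None   # the rule quoted above: fail the whole annex
    return entries

print(decode_annex(bytes([1, 3]) + b"abc"))   # [(1, b'abc')]
print(decode_annex(bytes([1, 5]) + b"abc"))   # None -- value bytes run past eof
```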
=== Abstract ===

This BIP describes a validation format for the taproot annex ([https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki BIP341]).
It allows to extend the usual transaction fields with new data records allowing witness signatures to commit to them.
Suggested change:
It allows extension of the usual transaction fields with new data records allowing taproot signatures to commit to them.
released in the early days of the network, few soft-forks occurred extending the validation semantic
of some transaction fields (e.g [https://github.com/bitcoin/bips/blob/master/bip-0068.mediawiki BIP68])
or adding whole new field to solve the malleability issue (e.g [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).
Suggested change:
released in the early days of the network, soft-forks either extended the validation semantic
of some transaction fields (e.g [https://github.com/bitcoin/bips/blob/master/bip-0068.mediawiki BIP68])
or in one case added a whole new field to solve the malleability issue (e.g [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).
This proposal introduces a format to add new data fields in the Taproot annex. BIP341 mandates
that if a witness includes at least two elements and the first byte of the last element is 0x50,
this element is qualified as the annex. The remaining bytes semantics are defined by new validation
Suggested change:
this element is the annex. This BIP defines the remaining bytes' semantics and validation
of use-cases. For now there is only one nLocktime field in a transaction and all inputs must share
the same value. It could be possible to define per-input lock-time enabling aggregation of off-chain
protocols transactions (e.g [https://github.com/lightning/bolts/blob/master/03-transactions.md#htlc-timeout-and-htlc-success-transactions Lightning HTLC-timeout]).
A commitment to historical block hash could be also a new annex data field to enable replay protection
Suggested change:
A commitment to a historical block hash could be a new annex data field to enable replay protection
the same value. It could be possible to define per-input lock-time enabling aggregation of off-chain
protocols transactions (e.g [https://github.com/lightning/bolts/blob/master/03-transactions.md#htlc-timeout-and-htlc-success-transactions Lightning HTLC-timeout]).
A commitment to historical block hash could be also a new annex data field to enable replay protection
in case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
Suggested change:
in the case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
== Specification ==

=== CompressedInt Integer Encoding ===
Any historical precedent for this kind of encoding? If so, please add a reference.
This encoding is from bitcoin/bitcoin@4d6144f (bitcoin/bitcoin#1677)
I don't think there's any precedent for it in other BIPs
This is unsigned LEB128?
No, it is VLQ
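For reference, a hedged Python port of that encoding (Bitcoin Core's WriteVarInt/ReadVarInt from serialize.h), on the assumption that this is what the draft's CompressedInt refers to: MSB-first base-128 with a continuation bit, plus a +1 offset on continuation so every integer has exactly one encoding:

```python
def write_compressed_int(n: int) -> bytes:
    out = []
    while True:
        out.append((n & 0x7F) | (0x80 if out else 0x00))  # last 7-bit group has no flag
        if n <= 0x7F:
            break
        n = (n >> 7) - 1                                   # the +1 offset, inverted
    return bytes(reversed(out))                            # most significant group first

def read_compressed_int(data: bytes, pos: int = 0):
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1
        else:
            return n, pos

# 0..127 take one byte; 128 encodes as 0x80 0x00, 255 as 0x80 0x7F, and so on.
for v in (0, 127, 128, 255, 16511, 2**32):
    assert read_compressed_int(write_compressed_int(v))[0] == v
```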
A commitment to historical block hash could be also a new annex data field to enable replay protection
in case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
together to enable fee-bumping batching of off-chain protocols transactions. <ref> '''What if the
use-cases require access to the annex fields by Script operations ?''' A new PUSH_ANNEX_RECORD could be
Can you make it more clear from the beginning that there are two ways to look at the annex records: out-of-script rule enforcement and in-script rule enforcement?
Maybe "rules enforced directly by the presence of an annex element" (like nLockTime or nSequence making a tx invalid in context) and "interaction with script rules" (PUSH_ANNEX_RECORD behaves like OP_CLTV or OP_CSV requiring a particular nLockTime or nSequence value; or SIGHASH_GROUP proposes CHECKSIG hashing working differently depending on an annex value -- those rules should be independent of the tx's context, ie they either always pass or always fail, just as a mismatching nLockTime for your OP_CLTV will always mean the tx is invalid, no matter how long you wait before broadcasting it) ?
Rather than encoding the type directly, we encode the difference between
the previous type (initially 0), both minimising the encoding and ensuring
a canonical ordering for annex entries.
canonical up to reordering of entries of the same type.
The way I think about the annex is as a total function from non-negative integers to vectors of byte strings. So if you're mapping type 9 to the vector ["abc", "def"], then that is distinct from mapping type 9 to the vector ["def", "abc"], and "canonical" here means there's exactly one encoding for each mapping.

In particular, I'm thinking that a script opcode to examine an annex entry might look like 9 PUSH_ANNEX 2 EQUALVERIFY -- now you have "abc" at the top of the stack and "def" beneath it. In that case the annex entries of a single type cannot always be reordered without changing their semantics.

(The current text says "The annex is defined as containing an ordered set of "type, value" pairs" -- my thinking was that "ordered set" already captures the idea that reordering entries changes the semantics.)
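To make the "total function" framing concrete, a small sketch (the grouping helper is illustrative, not part of the proposal) that keeps the order of same-type entries, so the two vectors above correspond to distinct mappings and hence distinct encodings:

```python
from collections import defaultdict

def as_mapping(entries):
    """Group decoded (type, value) pairs per type; conceptually, absent types map to the empty vector."""
    mapping = defaultdict(list)
    for entry_type, value in entries:
        mapping[entry_type].append(value)   # order within a type is preserved
    return dict(mapping)

print(as_mapping([(9, b"abc"), (9, b"def")]))  # {9: [b'abc', b'def']}
print(as_mapping([(9, b"def"), (9, b"abc")]))  # {9: [b'def', b'abc']} -- a different annex
```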
== Deployment ==

== Backwards compatibility ==
This can't be empty.
@@ -0,0 +1,173 @@
<pre>
BIP: XXX
Layer: Consensus (soft fork)
I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks? Doesn't the requirement of this bip for another soft fork to make the field useable defeat that purpose?
The annex field gives space for future protocol expansions, but each expansion requires a soft fork in order to give it semantics.
That said, this particular format BIP in principle could alternatively be defined so that invalidly formatted annex fields do not invalidate the transaction, and instead just prevent the interpretation of the annex in this format. There would be advantages and disadvantages to such an alternative definition.
There could also be a policy-only relaxation (as we've done in the inquisition repo), but the same tradeoffs apply.
I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks? Doesn't the requirement of this bip for another soft fork to make the field useable defeat that purpose?
The idea is to have a type-length-value record where new semantics can be assigned to each record without the record developers having to think about the consensus syntax issues (e.g what if another record uses a multi-byte length), though the semantic issues will still be something to reason about.
I still think even if you have a policy-only relaxation of the annex, we have to deploy a relaxation for each new record.
Thinking about the most minimal change to make the annex usable in the short term: how about just defining in policy a way to express an 'unstructured' block of data for now? For example, starting unstructured data with byte 0, and then later defining tlv or something else in a way that the first byte is never 0?
I detailed another format that would optimize the unstructured annex data case more in https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/021756.html
In short: any annex is considered unstructured and does not incur any overhead at all, unless the annex starts with a magic byte which is to be repeated for unstructured data. Other values of the second byte are reserved for future extensions.
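A hedged sketch of that classification rule as I read it (the MAGIC value below is a placeholder, not a proposed constant):

```python
MAGIC = 0x00  # placeholder value for the magic byte, not a proposed constant

def classify(annex_payload: bytes) -> str:
    """annex_payload: the annex bytes after BIP341's leading 0x50 tag."""
    if not annex_payload or annex_payload[0] != MAGIC:
        return "unstructured"       # the common case: no overhead at all
    if len(annex_payload) >= 2 and annex_payload[1] == MAGIC:
        return "unstructured"       # magic repeated: escaped unstructured data
    return "reserved"               # other second bytes: future structured extensions

print(classify(b"\x42arbitrary data"))  # unstructured, zero overhead
print(classify(b"\x00\x00more data"))   # unstructured, two-byte escape
print(classify(b"\x00\x01..."))         # reserved for a future structured format
```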
Other than that I think tlv is great. I proposed the 0-prefix for unstructured data to keep things as simple as possible initially without compromising future upgradeability, to avoid a potentially lengthy process to get to a standard for tlv.
If you don't like lengthy processes, don't extend them by proposing unnecessary alternatives when there's already a workable proposal?
Compared to this PR, what you propose adds overhead for encoding data that has consensus meaning, or where multiple items of data need to be included.
Compared to this PR, what you propose adds overhead for encoding data that has consensus meaning, or where multiple items of data need to be included.
It indeed depends on whether you want to space-optimize for structured or for unstructured data.
If you don't like lengthy processes, don't extend them by proposing unnecessary alternatives when there's already a workable proposal?
I think it is important to explore alternatives. It maps out the design space and shows the trade-offs, which also exist in the case of tlv. Tlv isn't strictly better.
A "workable" proposal doesn't necessarily mean that it won't need to go through a lengthy process still. In my experience, starting with a simpler alternative can often expedite things and depending on future usage patterns it may even be the optimal choice.
Most important for me though is that the annex becomes usable in some form regardless of the exact space requirements. If you're saying that the tlv proposal can easily be guided through the process and enabled in policy for the next release, it's all good for me.
For large amounts of data, the overhead is indeed small. But there might be current or future use cases that only require smaller bits of unstructured data for which the overhead weighs more heavily. EDIT: For <127 bytes of unstructured data there is no overhead indeed in this proposal, so agree that this is hardly an argument.
If you have users leveraging smaller bits of unstructured data who cannot afford the fee overhead of the TLV record bytes, I think the economically rational position is to design an L2 system to uplift the unstructured data from on-chain to off-chain?
Of course there is the question of whether you can maintain the accountability and visibility properties that your use-case is looking for with an L2 system relying on private state.
It indeed depends on whether you want to space-optimize for structured or for unstructured data.
On the question of space-optimization, in my mind if the annex is used for economically space-sensitive use-cases like channel factories or payment pools in the future, even a few bytes of witness waste translate into raising the economic bar to afford access to those advanced Bitcoin payment systems.
I think it is important to explore alternatives.
It's valuable to explore alternatives if they potentially offer benefits; but this doesn't -- it just makes one use case slightly cheaper and other uses slightly more expensive. The cost to exploring alternatives is that it delays the entire process of making a decision, which was what you were complaining about in the first place.
There is an interesting open design question: could we have restrained soft-fork semantics introduced by an economic or proof-of-work mechanism, or with expiring enforcement? There was such an idea presented on the mailing list a while back, “Automatically reverting (“transitory”) soft forks”. Can we design the taproot annex as a safe sandbox under some consensus boundaries?
We could consider using a prefix varint, where the number of leading 1s in the initial and subsequent bytes, until a 0 is reached, determines how many additional bytes follow. The only advantage is performance, since you don't have a potential branch on every byte, and you can load data bytes directly. I don't know if that's enough of an advantage to use a less-common varint encoding, but it's worth considering. Here's a good Hacker News post about the encoding.
Yes, I browsed over the Hacker News post where a 128-bit prefix varint is argued to be dramatically faster to decode and encode. I think it is unclear whether performance-over-space or space-over-performance should be favored (a classic time-space trade-off), and what the average annex payload can be expected to be. Maybe the performance gain is so cheap that it doesn't matter to optimize to protect full-node CPU cycles, and we should favor cheap witness cost for annex users. Note the annex policy-only discussion, where non-interactive annex composition among a set of multi-party users is being weighed.
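For illustration only, a hedged sketch of a prefix varint along the lines described above (the count of leading 1 bits, terminated by a 0 bit, gives the number of additional bytes, so the total length is known up front); this sketch caps values at 56 bits and is not the format proposed in this PR:

```python
def encode_prefix_varint(n: int) -> bytes:
    for extra in range(8):                          # 0..7 additional bytes (up to 56-bit values)
        payload_bits = (7 - extra) + 8 * extra      # bits left in byte 0 plus full extra bytes
        if n < (1 << payload_bits):
            prefix = (0xFF << (8 - extra)) & 0xFF   # 'extra' leading 1 bits, then a 0 bit
            first = prefix | (n >> (8 * extra))
            rest = (n & ((1 << (8 * extra)) - 1)).to_bytes(extra, "big") if extra else b""
            return bytes([first]) + rest
    raise ValueError("value too large for this sketch")

def decode_prefix_varint(data: bytes):
    first, extra = data[0], 0
    while extra < 8 and (first << extra) & 0x80:    # count leading 1 bits in the first byte
        extra += 1
    n = first & (0xFF >> (extra + 1))               # payload bits remaining in the first byte
    for b in data[1:1 + extra]:                     # then whole additional bytes, big-endian
        n = (n << 8) | b
    return n, 1 + extra

for v in (0, 127, 128, 300, 2**20, 2**40):
    assert decode_prefix_varint(encode_prefix_varint(v))[0] == v
```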
I notice that there's no maximum varint size mentioned. Would it be a good idea to restrict varints to being no greater than one of
Particularly when combined with (a) a "PUSHANNEX" opcode that grabs an entry from the annex and puts it on the stack, or (b) signing or otherwise accessing annex data from other inputs (see inq#19), it might make sense to make the annex much more limited. For example, only allowing individual data items to be 0-127 bytes, and limiting tags to be integers between, perhaps,

In that case, rather than putting 100kB of data in the annex, you'd put the 100kB on the stack and use a script like "HASH160 0 PUSHANNEX EQUALVERIFY CHECKSIG" to achieve the same result; the benefit being that other inputs that also sign your annex are only signing an extra 20 byte hash160 hash, not the 100kB of data.

Doing things with those limits would let you encode annex entries as:
So if you wanted to encode (My thinking is that this way you can define tag value 1 as per-input locktime, which accepts one or two data items: if there's one data item, it just requires that that block height has been reached; if there's two data items, it requires that block height has been reached and that block's hash matches the second data item.) If you make a bad encoding, either by not having enough data or bumping the tag to 2**24 or higher, that's "invalid", either causing the tx to be invalid, or causing the data to be inaccessible via
I'm more likely to work on the great consensus cleanup in the near future. If someone wishes to shepherd the taproot annex forward from here, feel free to ask AJ and/or me for input, as this current draft gathers common ideas.
@ariard would you mind closing this pull, if you don't currently plan to work on it?
@jonatack the annex is somehow one of the best known ways to fix technical debt in L2s: when I say I don't currently plan to work on it, that's over the coming 2/3 years. In the meantime, I think it can be good to keep collecting feedback, or for folks who want to discuss the implementation approach. If you wish to say more on how we shall deal with BIPs related to consensus changes, good. In reality they're shepherded by a plurality of authors over a very long span of time.
Closing it; I made a backup of the comments for my own archive. If someone wants to pick it up, feel free to do so.
This is WIP, see ML post for context.