BIPXXX: Taproot Annex Format by ariard · Pull Request #1381 · bitcoin/bips · GitHub

BIPXXX: Taproot Annex Format #1381


Closed

ariard wants to merge 1 commit from the 2022-07-bip-annex branch

Conversation

ariard
@ariard ariard commented Oct 10, 2022

This is WIP, see ML post for context.

@bitcoin bitcoin deleted a comment Nov 17, 2022
@ajtowns
Contributor
ajtowns commented Jan 4, 2023

I think https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-October/020991.html is the mailing list post in question.

@ariard
Author
ariard commented Jan 5, 2023

Yes, the current state of the TLV format discussion is here: ariard#1 and the implementation is here: bitcoin-inquisition/bitcoin#9

Comment on lines 143 to 146
Lengthy annex bytes stream could be given to nodes as a CPU DoS vector. Standard policy rules should be adequate
to prevent that concern. If many annex fields are considered as valid and/or their validation is expensive, a
compensation mechanism should be introduced to constrain witness producer to commit higher fees (e.g inflate witness
weight in function of annex size).
Contributor

I think this is a little bit backwards -- I think it would be better to say that the annex should always be simple and fast to parse and verify (eg, only using information from the transaction, its utxos, and block headers; only requiring a single pass to parse) and that any expensive computation (such as signature validation) should be left for script evaluation.

Either way, this seems more like a "rationale" thing?

Member

Right, putting stuff in annex is cheap for verifiers, unless new verification burdens are added.

@ariard ariard force-pushed the 2022-07-bip-annex branch from df5af5b to 6f3dcc2 on January 17, 2023 00:26
Co-authored-by: Anthony Towns <aj@erisian.com.au>
@ariard ariard force-pushed the 2022-07-bip-annex branch from 6f3dcc2 to 9dc3f74 on January 17, 2023 00:33
@ariard
Author
ariard commented Jan 17, 2023

Updated at 6f3dcc2 with the suggestions from ariard#1.

I think I still have two issues with the current approach:

  • in case of a script path spend, we might have a redeem path with <pubkey_alice> OP_CHECKSIG <pubkey_bob> OP_CHECKSIG, where Alice is committing to annex X and Bob is committing to annex Y as spending policies. The current approach of {type,length} delta encoding might prevent combining them. I don't know if it's a use case we care about, and if we should have some clawback mechanism to aggregate annexes from signers sharing the same tapscript.
  • re-using the delta for both type and length might be impractical, as accumulating the delta for the length might have no relation at all to the size of the data item.

@ajtowns
Contributor
ajtowns commented Jan 17, 2023
* in case of a script path spend, we might have a redeem path with `<pubkey_alice> OP_CHECKSIG <pubkey_bob> OP_CHECKSIG`, where Alice is committing to annex X and Bob is committing to annex Y as spending policies. The current approach of {type,length} delta encoding might prevent combining them. I don't know if it's a use case we care about, and if we should have some clawback mechanism to aggregate annexes from signers sharing the same tapscript.

CHECKSIG can only commit to an input's annex as a whole; so in the case above either X=Y=the entire annex, or one or both of the signatures are invalid/incompatible. You'd need something like:

ANNEXLENGTH 3 EQUALVERIFY
0 PUSHANNEX 2 EQUALVERIFY TOALT
PUSHANNEX 1 EQUALVERIFY EXTRASIGDATA CHECKSIGVERIFY
RESETSIGDATA FROMALT PUSHANNEX 1 EQUALVERIFY EXTRASIGDATA CHECKSIG

spent by having the following annex (eg): {0: [15, 1], 1: [alicepolicy], 15: [bobpolicy]}, where the novel opcodes behave as:

  • annexlength tells you the number of distinct tags in the annex
  • pushannex pops n off the stack, looks up annex entry n, and pushes each value from the annex with tag n onto the stack, followed by the count (possibly 0)
  • extrasigdata pops an element off the stack, hashes it, and will commit to the (cumulative) hash in subsequent checksig operations
  • resetsigdata resets that cumulative hash

So the "annexlength" check is used to prevent malleability, then the first "0 pushannex" will put [1 15 2] on the stack (2 at the top); the second pushannex will update the stack to [alicepolicy 1], extrasigdata will ensure alice's signature commits to "alicepolicy", the final pushannex will update the stack to [bobpolicy 1], etc. You'd also need a SIGHASH_NOANNEX or similar, of course.

Alice and Bob would still need to agree on the script that defines which subset of the annex they'll each commit to; currently that obviously has to be at the time they define their shared pubkey, but even with OP_EVAL or graftroot, while they could delay that agreement, they'd still need to do it sometime. You'd need a much more generic language to allow them to each choose which parts of the annex to sign at signing time.

* re-using the delta for both type and length might be impractical, as accumulating the delta for the length might have no relation at all to the size of the data item.

They don't need to have any relation? If the previous element had type X, size N1, and this element has type X+K and size N2, you just encode (K, N2), as:

  • if N2 < 127: K*128 + N2
  • if N2 >= 127: (K*128 + 127), (N2-127)
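
A minimal sketch of that (K, N2) packing (read_int and read_bytes below stand in for the draft's read_CompressedInt() and read_bytes(length); each packed integer would itself be serialized as a CompressedInt):

def pack_entry_header(type_delta, data_len):
    # Pack the type delta K and the value length N2 into one integer when
    # N2 < 127, otherwise carry the length overflow in a second integer.
    k, n2 = type_delta, data_len
    if n2 < 127:
        return [k * 128 + n2]
    return [k * 128 + 127, n2 - 127]

def unpack_entry(read_int, read_bytes, prev_type):
    # Inverse: recover (type, value) given the previous entry's type.
    first = read_int()
    k, n2 = first >> 7, first & 0x7F
    if n2 == 127:
        n2 = 127 + read_int()
    return prev_type + k, read_bytes(n2)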


=== Annex validation rules ===

* If the annex does not decode successfully (that is, if read_CompressedInt() or read_bytes(length) fail due to reaching eof early); fail.
Member

Suggested change
* If the annex does not decode successfully (that is, if read_CompressedInt() or read_bytes(length) fail due to reaching eof early); fail.
* If the annex does not decode successfully (e.g., if read_CompressedInt() or read_bytes(length) fail due to reaching eof early): fail.

=== Abstract ===

This BIP describes a validation format for the taproot annex ([https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki BIP341]).
It allows to extend the usual transaction fields with new data records allowing witness signatures to commit to them.
Member

Suggested change
It allows to extend the usual transaction fields with new data records allowing witness signatures to commit to them.
It allows extension of the usual transaction fields with new data records allowing taproot signatures to commit to them.

Comment on lines +30 to +32
released in the early days of the network, few soft-forks occurred extending the validation semantic
of some transaction fields (e.g [https://github.com/bitcoin/bips/blob/master/bip-0068.mediawiki BIP68])
or adding whole new field to solve the malleability issue (e.g [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).
Member

Suggested change
released in the early days of the network, few soft-forks occurred extending the validation semantic
of some transaction fields (e.g [https://github.com/bitcoin/bips/blob/master/bip-0068.mediawiki BIP68])
or adding whole new field to solve the malleability issue (e.g [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).
released in the early days of the network, soft-forks either extended the validation semantic
of some transaction fields (e.g [https://github.com/bitcoin/bips/blob/master/bip-0068.mediawiki BIP68])
or in one case added a whole new field to solve the malleability issue (e.g [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).


This proposal introduces a format to add new data fields in the Taproot annex. BIP341 mandates
that if a witness includes at least two elements and the first byte of the last element is 0x50,
this element is qualified as the annex. The remaining bytes semantics are defined by new validation
Member

Suggested change
this element is qualified as the annex. The remaining bytes semantics are defined by new validation
this element is the annex. This BIP defines the remaining bytes' semantics and validation

of use-cases. For now there is only one nLocktime field in a transaction and all inputs must share
the same value. It could be possible to define per-input lock-time enabling aggregation of off-chain
protocols transactions (e.g [https://github.com/lightning/bolts/blob/master/03-transactions.md#htlc-timeout-and-htlc-success-transactions Lightning HTLC-timeout]).
A commitment to historical block hash could be also a new annex data field to enable replay protection
Member

Suggested change
A commitment to historical block hash could be also a new annex data field to enable replay protection
A commitment to a historical block hash could be a new annex data field to enable replay protection

the same value. It could be possible to define per-input lock-time enabling aggregation of off-chain
protocols transactions (e.g [https://github.com/lightning/bolts/blob/master/03-transactions.md#htlc-timeout-and-htlc-success-transactions Lightning HTLC-timeout]).
A commitment to historical block hash could be also a new annex data field to enable replay protection
in case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
Member

Suggested change
in case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
in the case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed


== Specification ==

=== CompressedInt Integer Encoding ===
Member

Any historical precedent for this kind of encoding? If so, please add a reference.

Contributor
@ajtowns ajtowns Feb 15, 2023

This encoding is from bitcoin/bitcoin@4d6144f (bitcoin/bitcoin#1677)

I don't think there's any precedent for it in other BIPs

Contributor
@roconnor-blockstream roconnor-blockstream Feb 17, 2023

This is unsigned LEB128?

No, it is VLQ
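
For reference, a Python sketch of that encoding, assuming CompressedInt matches Core's serialize.h VARINT from the commit linked above (big-endian base-128 groups with a continuation bit, each non-final group reduced by one so every value has exactly one encoding):

def write_compressed_int(n: int) -> bytes:
    out = bytearray()
    while True:
        out.append((n & 0x7F) | (0x80 if out else 0x00))
        if n <= 0x7F:
            break
        n = (n >> 7) - 1
    return bytes(reversed(out))

def read_compressed_int(data: bytes, pos: int = 0):
    # Inverse of write_compressed_int; returns (value, next position).
    n = 0
    while True:
        ch = data[pos]
        pos += 1
        n = (n << 7) | (ch & 0x7F)
        if ch & 0x80:
            n += 1
        else:
            return n, pos

assert write_compressed_int(127) == bytes([0x7F])
assert write_compressed_int(128) == bytes([0x80, 0x00])
assert read_compressed_int(bytes([0x80, 0x00])) == (128, 2)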

A commitment to historical block hash could be also a new annex data field to enable replay protection
in case of persisting forks. Another use-case, a group of input-outputs could be bundled and signed
together to enable fee-bumping batching of off-chain protocols transactions. <ref> '''What if the
use-cases require access to the annex fields by Script operations ?''' A new PUSH_ANNEX_RECORD could be
Member

Can you make it more clear from the beginning that there are two ways to look at the annex records: out-of-script rule enforcement and in-script rule enforcement?

Contributor

Maybe "rules enforced directly by the presence of an annex element" (like nLockTime or nSequence making a tx invalid in context) and "interaction with script rules" (PUSH_ANNEX_RECORD behaves like OP_CLTV or OP_CSV requiring a particular nLockTime or nSequence value; or SIGHASH_GROUP proposes CHECKSIG hashing working differently depending on an annex value -- those rules should be independent of the tx's context, ie they either always pass or always fail, just as a mismatching nLockTime for your OP_CLTV will always mean the tx is invalid, no matter how long you wait before broadcasting it) ?


Rather than encoding the type directly, we encode the difference between
the previous type (initially 0), both minimising the encoding and ensuring
a canonical ordering for annex entries.
Contributor

canonical up to reordering of entries of the same type.

Contributor

The way I think about the annex is as a total function from non-negative integers to vectors of byte strings.

So if you're mapping type 9 to the vector ["abc", "def"] then that is distinct from mapping type 9 to the vector ["def", "abc"], and "canonical" here means there's exactly one encoding for each mapping.

In particular, I'm thinking that a script opcode to examine an annex entry might look like 9 PUSH_ANNEX 2 EQUALVERIFY -- now you have "abc" at the top of the stack and "def" beneath it. In that case the annex entries of a single type cannot always be reordered without changing their semantics.

(The current text says The annex is defined as containing an ordered set of "type, value" pairs, -- my thinking was that "ordered set" already captures the idea that reordering entries changes the semantics)

== Deployment ==


== Backwards compatibility ==
Member

This can't be empty.

@@ -0,0 +1,173 @@
<pre>
BIP: XXX
Layer: Consensus (soft fork)


I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks? Doesn't the requirement of this bip for another soft fork to make the field useable defeat that purpose?

Contributor

The annex field gives space for future protocol expansions, but each expansion requires a soft fork in order to give it semantics.

That said, this particular format BIP in principle could alternatively be defined so that invalidly formatted annex fields do not invalidate the transaction, and instead just prevent the interpretation of the annex in this format. There would be advantages and disadvantages to such an alternative definition.

Member

There could also be a policy-only relaxation (as we've done in the inquisition repo), but the same tradeoffs apply.

Author

I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks? Doesn't the requirement of this bip for another soft fork to make the field useable defeat that purpose?

The idea is to have a type-length-value record where new semantics can be assigned to each record without the record developers having to think about the consensus syntax issues (e.g. what if another record uses a multi-byte length), though the semantic issues will still be something to reason about.

I still think even if you have a policy-only relaxation of the annex, we have to deploy a relaxation for each new record.


Thinking about the most minimal change to make the annex usable in the short term. How about just defining in policy a way to express an 'unstructured' block of data for now? For example starting unstructured data with byte 0, and then later defining tlv or something else in a way that the first byte is never 0?


I detailed another format that would optimize the unstructured annex data case more in https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/021756.html

In short: any annex is considered unstructured and does not incur any overhead at all, unless the annex starts with a magic byte which is to be repeated for unstructured data. Other values of the second byte are reserved for future extensions.
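
One possible reading of that scheme (the magic value and escaping rule below are assumptions for illustration; the linked post fixes the actual details):

MAGIC = 0x01  # hypothetical magic value, chosen here only for illustration

def classify_annex_payload(payload: bytes):
    # Toy classifier for the format described above: the annex payload (after
    # BIP341's 0x50 prefix) is unstructured by default; a leading magic byte
    # selects a structured extension, and a doubled magic byte escapes
    # unstructured data that happens to begin with the magic value.
    if not payload or payload[0] != MAGIC:
        return "unstructured", payload
    if len(payload) >= 2 and payload[1] == MAGIC:
        return "unstructured", payload[1:]   # drop the escape byte
    return "extension", payload[1:]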

Contributor

Other than that I think tlv is great. I proposed 0-prefix for unstructured data to keep things as simple as possible initially without compromising future upgradeability, to avoid a potentially lengthy process to get to a standard for tlv.

If you don't like lengthy processes, don't extend them by proposing unnecessary alternatives when there's already a workable proposal?

Compared to this PR, what you propose adds overhead for encoding data that has consensus meaning, or where multiple items of data need to be included.


Compared to this PR, what you propose adds overhead for encoding data that has consensus meaning, or where multiple items of data need to be included.

It indeed depends on whether you want to space-optimize for structured or for unstructured data.

If you don't like lengthy processes, don't extend them by proposing unnecessary alternatives when there's already a workable proposal?

I think it is important to explore alternatives. It maps out the design space and shows the trade-offs, which also exist in the case of tlv. Tlv isn't strictly better.

A "workable" proposal doesn't necessarily mean that it won't need to go through a lengthy process still. In my experience, starting with a simpler alternative can often expedite things and depending on future usage patterns it may even be the optimal choice.

Most important for me though is that the annex becomes usable in some form regardless of the exact space requirements. If you're saying that the tlv proposal can easily be guided through the process and enabled in policy for the next release, it's all good for me.

Author

For large amounts of data, the overhead is indeed small. But there might be current or future use cases that only require smaller bits of unstructured data for which the overhead weighs more heavily. EDIT: For <127 bytes of unstructured data there is no overhead indeed in this proposal, so agree that this is hardly an argument.

If you have users leveraging smaller bits of unstructured data who cannot afford the fee overhead of the TLV record bytes, I think the economically rational position is to design an L2 system to lift the unstructured data from on-chain to off-chain?

Of course there is the question of whether you can maintain the accountability and visibility properties that your use-case is looking for with an L2 system relying on private state.

It indeed depends on whether you want to space-optimize for structured or for unstructured data.

On the question of space-optimization, in my mind if the annex is used for economically space-sensitive use-cases like channel factories or payment pools in the future, even a few bytes of witness waste translates into raising the economic bar to afford access to those advanced Bitcoin payment systems.

Contributor

I think it is important to explore alternatives.

It's valuable to explore alternatives if they potentially offer benefits; but this doesn't -- it just makes one use case slightly cheaper and other uses slightly more expensive. The cost to exploring alternatives is that it delays the entire process of making a decision, which was what you were complaining about in the first place.

@ariard
Author
ariard commented Jun 1, 2023

I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks?

There is an interesting open design question - whether we could have restrained soft-fork semantics introduced by an economic or proof-of-work mechanism, or with expiring enforcement. There was such an idea presented on the mailing list a while back: “Automatically reverting (“transitory”) soft forks”.

Can we design the taproot annex as a safe sandbox under some consensus boundaries?

@casey
casey commented Sep 25, 2023

We could consider using a prefix varint, where the number of leading 1s in the initial and subsequent bytes, until a 0 is reached, determines how many additional bytes follow. The only advantage is performance, since you don't have a potential branch on every byte, and you can load data bytes directly. I don't know if that's enough of an advantage to use a less-common varint encoding, but it's worth considering. Here's a good Hacker News post about the encoding.
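
A sketch of one common prefix-varint layout (this version keeps the whole run of leading 1s in the first byte, so it covers values below 2**49; the scheme described above generalizes by letting the run continue into later bytes):

def encode_prefix_varint(n: int) -> bytes:
    # The count of leading 1 bits in the first byte gives the number of
    # continuation bytes, so a decoder branches once per value rather than
    # once per byte. Capped at 6 continuation bytes in this sketch.
    extra = 0
    while n >= (1 << (7 + 7 * extra)):
        extra += 1
        if extra > 6:
            raise ValueError("value too large for this sketch")
    prefix = (0xFF << (8 - extra)) & 0xFF        # 'extra' leading 1s, then a 0
    first = prefix | (n >> (8 * extra))          # top 7-extra bits of the value
    rest = (n & ((1 << (8 * extra)) - 1)).to_bytes(extra, "big")
    return bytes([first]) + rest

def decode_prefix_varint(data: bytes, pos: int = 0):
    # Inverse; returns (value, next position).
    first = data[pos]
    extra = 0
    while extra < 7 and (first >> (7 - extra)) & 1:
        extra += 1
    value = first & ((1 << (7 - extra)) - 1)
    for b in data[pos + 1 : pos + 1 + extra]:
        value = (value << 8) | b
    return value, pos + 1 + extra

assert encode_prefix_varint(127) == bytes([0x7F])
assert encode_prefix_varint(128) == bytes([0x80, 0x80])
assert decode_prefix_varint(bytes([0x80, 0x80])) == (128, 2)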

@ariard
Author
ariard commented Sep 30, 2023

Yes, I browsed over the Hacker News post, where a 128-bit prefix varint is argued to be dramatically faster to decode and encode. I think it is unclear whether performance-over-space or space-over-performance should be favored (a classic time-space trade-off), and what the average annex payload can be expected to be. Maybe the performance gain is so cheap that it doesn’t matter to optimize to protect full-node CPU cycles, and we should favor cheap witness cost for annex users.

Note the annex policy-only discussion, where non-interactive annex composition among a set of multi-party users is being weighed.

@casey
casey commented Sep 30, 2023

I notice that there's no maximum varint size mentioned. Would it be a good idea to restrict varints to being no greater than one of {u32, u64, u128}::MAX? (Which one depends on how large varints are expected to be.) This would simplify code that has to pass around varints, since it can use a fixed-size value, instead of having to use big ints.

@ajtowns
Contributor
ajtowns commented Nov 17, 2023

I notice that there's no maximum varint size mentioned. Would it be a good idea to restrict varints to being no greater than one of {u32, u64, u128}::MAX? (Which one depends on how large varints are expected to be.) This would simplify code that has to pass around varints, since it can use a fixed-size value, instead of having to use big ints.

Particularly when combined with (a) a "PUSHANNEX" opcode that grabs an entry from the annex and puts it on the stack, or (b) signing or otherwise accessing annex data from other inputs (see inq#19), it might make sense to make the annex much more limited. For example, only allowing individual data items to be 0-127 bytes, and limiting tags to be integers between, perhaps, 0 and 2**24-1.

In that case, rather than putting 100kB of data in the annex, you'd put the 100kB on the stack and use a script like "HASH160 0 PUSHANNEX EQUALVERIFY CHECKSIG" to achieve the same result; the benefit being that other inputs that also sign your annex are only signing an extra 20 byte hash160 hash, not the 100kB of data.

Doing things with those limits would let you encode annex entries as:

  • 1 byte - bump_len
  • 3 byte optional - tag_bump (present iff (bump_len & 0x80) != 0)
  • (bump_len & 0x7F) bytes - data

So if you wanted to encode {0: [<1234>], 1: [800000, <54a0>]} you'd do it as 50 02 1234 83 010000 00350c 02 54a0, which gets decoded as: 50 -- annex prefix; 02 no tag_bump, 2 bytes of data, data is hex string 1234; 83 bump the tag, 3 bytes of data, tag is bumped by 0x000001 (little endian), data is 0x0c3500 or 800,000 (little-endian); 02 don't bump the tag, 2 bytes of data, data is hex string 54a0.

(My thinking is that this way you can define tag value 1 as per-input locktime, which accepts one or two data items, if there's one data item, it just requires that that block height has been reached; if there's two data items, it requires that block height has been reached and that block's hash matches the second data item)

If you make a bad encoding, either by not having enough data or bumping the tag to 2**24 or higher, that's "invalid", either causing the tx to be invalid, or causing the data to be inaccessible via PUSHANNEX. 2**24 is 16M different tags, which is surely more than enough; but perhaps 2**16 or even 2**8 would be fine?
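
A decoder sketch for that limited layout, following the worked example above (hypothetical code, not part of the draft):

def decode_limited_annex(annex: bytes):
    # Entries after the 0x50 prefix: 1-byte bump_len; a 3-byte little-endian
    # tag_bump iff the high bit of bump_len is set; then bump_len & 0x7F data
    # bytes. Truncation or a tag reaching 2**24 is treated as invalid, as
    # described above.
    assert annex and annex[0] == 0x50
    pos, tag = 1, 0
    result = {}
    while pos < len(annex):
        bump_len = annex[pos]
        pos += 1
        if bump_len & 0x80:
            if pos + 3 > len(annex):
                raise ValueError("truncated tag_bump")
            tag += int.from_bytes(annex[pos:pos + 3], "little")
            pos += 3
            if tag >= 1 << 24:
                raise ValueError("tag out of range")
        n = bump_len & 0x7F
        if pos + n > len(annex):
            raise ValueError("truncated data")
        result.setdefault(tag, []).append(annex[pos:pos + n])
        pos += n
    return result

# Reproduces the worked example: {0: [1234], 1: [00350c, 54a0]} (values as hex)
assert decode_limited_annex(bytes.fromhex("500212348301000000350c0254a0")) == {
    0: [bytes.fromhex("1234")],
    1: [bytes.fromhex("00350c"), bytes.fromhex("54a0")],
}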

@ariard
Author
ariard commented Apr 25, 2024

I’m more likely to work on the great consensus cleanup for the foreseeable future. If someone wishes to shepherd the taproot annex forward from here, feel free to ask AJ and/or me for input, as this current draft gathers common ideas.

@jonatack
Member

@ariard would you mind closing this pull, if you don't currently plan to work on it?

@ariard
Author
ariard commented Apr 26, 2024

@jonatack somehow the annex is one of the best known ways to fix technical debt in L2s:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-December/022198.html

when I say I don’t currently plan to work on it, I mean over the coming 2-3 years.

meantime, I think it can be good to keep collecting feedback, or for folks to discuss the implementation approach.

if you wish to say more on how we should deal with BIPs related to consensus changes, good.

in reality they’re shepherded by a plurality of authors over a very long span of time.

@ariard
Author
ariard commented Apr 28, 2024

Closing it; I did a backup of the comments for my own archive. If someone wants to pick this up, feel free to do so.

@ariard ariard closed this Apr 28, 2024