Add developer docs to explain pagination tokens

The comment docs explain the general ideas behind pagination tokens well, but when I was confronted with `s2633508_17_338_6732159_1082514_541479_274711_265584_1`, it wasn't obvious how to decipher it. I recognized the `stream_ordering` part (`s2633508`), but it was unclear what the other numbers were, and the comment docs don't explain that part. I only really figured it out while drafting this issue and reading more of the code.

Relevant code:

Relevant endpoints:

- `/sync`
  - `?since`
  - `next_batch`
  - `prev_batch`
- `/messages`
  - `?from`
  - `?to`
  - `start`
  - `end`
- others

Live tokens (`stream_ordering`)

`synapse/synapse/types.py`, lines 436 to 437 in 3c41d87:

    Live tokens start with an "s" followed by the "stream_ordering" id of the
    event it comes after. Historic tokens start with a "t" followed by the

Example:

    s2633508_17_338_6732159_1082514_541479_274711_265584_1

1. `room_key`: `s2633508` -> `stream_ordering` 2633508
2. `presence_key`: 17
3. `typing_key`: 338
4. `receipt_key`: 6732159
5. `account_data_key`: 1082514
6. `push_rules_key`: 541479
7. `to_device_key`: 274711
8. `device_list_key`: 265584
9. `groups_key`: 1

Other examples: `s1_33_0_1_1_1_1_7_1`, `s843_0_0_0_0_0_0_0_0`
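A minimal sketch of decoding a live token into named keys, assuming the nine-field layout at commit 3c41d87 listed above. `STREAM_TOKEN_KEYS` and `parse_live_token` are hypothetical helpers, not part of Synapse, and other Synapse versions may add or remove keys.

```python
# Hypothetical helper: field order taken from StreamToken.to_string at 3c41d87.
STREAM_TOKEN_KEYS = (
    "room_key",
    "presence_key",
    "typing_key",
    "receipt_key",
    "account_data_key",
    "push_rules_key",
    "to_device_key",
    "device_list_key",
    "groups_key",
)


def parse_live_token(token: str) -> dict:
    """Split a live "s..." token into its named keys (assumes 9 fields)."""
    parts = token.split("_")
    if len(parts) != len(STREAM_TOKEN_KEYS):
        raise ValueError(f"expected {len(STREAM_TOKEN_KEYS)} fields, got {len(parts)}")
    room_key = parts[0]
    if not room_key.startswith("s"):
        raise ValueError("not a live token: room_key must start with 's'")
    fields = {"room_key": room_key}
    # Every key after the room key is a plain integer stream position.
    for name, value in zip(STREAM_TOKEN_KEYS[1:], parts[1:]):
        fields[name] = int(value)
    # Strip the "s" prefix to get the event stream_ordering.
    fields["stream_ordering"] = int(room_key[1:])
    return fields


fields = parse_live_token("s2633508_17_338_6732159_1082514_541479_274711_265584_1")
print(fields["stream_ordering"], fields["typing_key"])  # 2633508 338
```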

The keys are concatenated, separated by `_` (`_SEPARATOR`), in this order:

`synapse/synapse/types.py`, lines 636 to 649 in 3c41d87:

    async def to_string(self, store: "DataStore") -> str:
        return self._SEPARATOR.join(
            [
                await self.room_key.to_string(store),
                str(self.presence_key),
                str(self.typing_key),
                str(self.receipt_key),
                str(self.account_data_key),
                str(self.push_rules_key),
                str(self.to_device_key),
                str(self.device_list_key),
                str(self.groups_key),
            ]
        )
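To experiment with tokens outside Synapse, the same ordering can be mirrored with a plain dataclass. `TokenSketch` is hypothetical: it stores the room key as an already-serialized string, whereas the real `to_string` in the excerpt above is async and takes a `store`, presumably because encoding the room key can need a database lookup (for example for the instance IDs used in "m" tokens).

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class TokenSketch:
    """Hypothetical stand-in for StreamToken; field order matches to_string."""

    room_key: str  # already serialized, e.g. "s2633508" or "t175-530"
    presence_key: int
    typing_key: int
    receipt_key: int
    account_data_key: int
    push_rules_key: int
    to_device_key: int
    device_list_key: int
    groups_key: int

    # Unannotated, so this is a class attribute rather than a dataclass field.
    _SEPARATOR = "_"

    def to_string(self) -> str:
        # astuple() yields the fields in declaration order, same as the excerpt.
        return self._SEPARATOR.join(str(value) for value in astuple(self))


token = TokenSketch("s12", 4, 0, 1, 1, 1, 1, 4, 1)
print(token.to_string())  # s12_4_0_1_1_1_1_4_1
```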

Together, they record the position reached in each of the streams behind the various fields of the `/sync` response:

{
  "next_batch": "s12_4_0_1_1_1_1_4_1",
  "presence": {
    "events": [
      {
        "type": "m.presence",
        "sender": "@the-bridge-user:hs1",
        "content": {
          "presence": "offline",
          "last_active_ago": 103
        }
      }
    ]
  },
  "device_lists": {
    "changed": [
      "@alice:hs1"
    ]
  },
  "device_one_time_keys_count": {
    "signed_curve25519": 0
  },
  "org.matrix.msc2732.device_unused_fallback_key_types": [],
  "device_unused_fallback_key_types": [],
  "rooms": {
    "join": {
      "!QrZlfIDQLNLdZHqTnt:hs1": {
        "timeline": {
          "events": [],
          "prev_batch": "s10_4_0_1_1_1_1_4_1",
          "limited": false
        },
        "state": {
          "events": []
        },
        "account_data": {
          "events": []
        },
        "ephemeral": {
          "events": []
        },
        "unread_notifications": {
          "notification_count": 1,
          "highlight_count": 0
        },
        "summary": {},
        "org.matrix.msc2654.unread_count": 1
      }
    }
  }
}
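Comparing two tokens key by key shows which streams moved between them. In the response above, the room's `prev_batch` (`s10_4_0_1_1_1_1_4_1`) and the top-level `next_batch` (`s12_4_0_1_1_1_1_4_1`) differ only in the room key. `changed_keys` is a hypothetical helper, and it assumes both tokens are nine-field tokens from the same server.

```python
# Field order as in StreamToken.to_string at 3c41d87 (assumed).
KEYS = (
    "room_key", "presence_key", "typing_key", "receipt_key", "account_data_key",
    "push_rules_key", "to_device_key", "device_list_key", "groups_key",
)


def changed_keys(old: str, new: str) -> dict:
    """Return {key: (old_value, new_value)} for every field that differs."""
    old_parts, new_parts = old.split("_"), new.split("_")
    if len(old_parts) != len(KEYS) or len(new_parts) != len(KEYS):
        raise ValueError("expected two 9-field tokens")
    return {
        name: (a, b)
        for name, a, b in zip(KEYS, old_parts, new_parts)
        if a != b
    }


print(changed_keys("s10_4_0_1_1_1_1_4_1", "s12_4_0_1_1_1_1_4_1"))
# {'room_key': ('s10', 's12')}
```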

Historic tokens (`topological_ordering`/`depth`)

`synapse/synapse/types.py`, lines 437 to 439 in 3c41d87:

    event it comes after. Historic tokens start with a "t" followed by the
    "topological_ordering" id of the event it comes after, followed by "-",
    followed by the "stream_ordering" id of the event it comes after.

Example:

    t175-530_0_0_0_0_0_0_0_0

1. `topological_ordering`: `t175` -> 175 (depth)
2. `stream_ordering`: 530
3. `presence_key`: 0
4. `typing_key`: 0
5. `receipt_key`: 0
6. `account_data_key`: 0
7. `push_rules_key`: 0
8. `to_device_key`: 0
9. `device_list_key`: 0
10. `groups_key`: 0

Items 1 and 2 together make up the `room_key` (`t175-530`).

- You will probably see these from `/messages`, because that endpoint is scoped to a single room, and so is depth.
- `topological_ordering` is the same as `depth` in Synapse.
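A minimal sketch of splitting out the room key of a historic token and decoding it, following the `t{topological_ordering}-{stream_ordering}` format from the comment above. `HistoricRoomKey` and `parse_historic_room_key` are hypothetical names, not Synapse's own parser.

```python
from typing import NamedTuple


class HistoricRoomKey(NamedTuple):
    topological_ordering: int  # the event's depth
    stream_ordering: int


def parse_historic_room_key(room_key: str) -> HistoricRoomKey:
    """Parse the room_key part of a historic token, e.g. "t175-530"."""
    if not room_key.startswith("t"):
        raise ValueError("historic room keys start with 't'")
    topo, sep, stream = room_key[1:].partition("-")
    if not sep:
        raise ValueError("expected 't{topological_ordering}-{stream_ordering}'")
    return HistoricRoomKey(int(topo), int(stream))


token = "t175-530_0_0_0_0_0_0_0_0"
room_key = token.split("_")[0]  # the room key is the first "_"-separated field
print(parse_historic_room_key(room_key))
# HistoricRoomKey(topological_ordering=175, stream_ordering=530)
```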

Min-position tokens

This one seems pretty well explained by the comment docs already:

Example: `m56~2.58~3.59`

`synapse/synapse/types.py`, lines 441 to 461 in 3c41d87:

    There is also a third mode for live tokens where the token starts with "m",
    which is sometimes used when using sharded event persisters. In this case
    the events stream is considered to be a set of streams (one for each writer)
    and the token encodes the vector clock of positions of each writer in their
    respective streams.

    The format of the token in such case is an initial integer min position,
    followed by the mapping of instance ID to position separated by '.' and '~':

        m{min_pos}~{writer1}.{pos1}~{writer2}.{pos2}. ...

    The `min_pos` corresponds to the minimum position all writers have persisted
    up to, and then only writers that are ahead of that position need to be
    encoded. An example token is:

        m56~2.58~3.59

    Which corresponds to a set of three (or more writers) where instances 2 and
    3 (these are instance IDs that can be looked up in the DB to fetch the more
    commonly used instance names) are at positions 58 and 59 respectively, and
    all other instances are at position 56.
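A min-position room key could be decoded along these lines. The names are hypothetical, and the sketch assumes instance IDs are integers as in the example; it does not look up instance names in the database. Writers that are not listed fall back to `min_pos`, as the comment describes.

```python
from typing import Dict, NamedTuple


class MultiWriterRoomKey(NamedTuple):
    min_pos: int
    positions: Dict[int, int]  # instance ID -> stream position

    def position_for(self, instance_id: int) -> int:
        # Unlisted writers are at min_pos, per the comment above.
        return self.positions.get(instance_id, self.min_pos)


def parse_multi_writer_room_key(room_key: str) -> MultiWriterRoomKey:
    """Parse "m{min_pos}~{writer}.{pos}~...", e.g. "m56~2.58~3.59"."""
    if not room_key.startswith("m"):
        raise ValueError("multi-writer room keys start with 'm'")
    min_part, *writer_parts = room_key[1:].split("~")
    positions = {}
    for part in writer_parts:
        instance_id, pos = part.split(".")
        positions[int(instance_id)] = int(pos)
    return MultiWriterRoomKey(int(min_part), positions)


key = parse_multi_writer_room_key("m56~2.58~3.59")
print(key.position_for(2), key.position_for(3), key.position_for(7))  # 58 59 56
```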
