Description
If a bucket becomes huge, sending it during rebalancing blocks the TX thread for the whole time the bucket is collected from all spaces, encoded, and sent.
The proposal: before sending a bucket, set it to the SENDING state. On all sharded spaces, set on_replace triggers that collect all changes to this bucket. Start reading the bucket in batches (10k tuples each) and send them to the receiver, which applies them immediately. With each batch, send not only the bucket tuples that were read, but also the tuples collected by the on_replace triggers. When all data has been read and only the collected updates remain, lock the bucket against write requests, send the remaining updates, and set the bucket to SENT.
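A minimal sketch of how this could look on the sending side. Everything here is illustrative, not an existing API: `bucket_status`, `sharded_spaces()`, `send_batch()`, `take_changes()`, `lock_bucket_writes()` and the `bucket_id` index name are all assumptions made up for the example.

```lua
-- Illustrative sketch only; all helpers here are hypothetical.
local BATCH_SIZE = 10000

local bucket_status = {}  -- bucket_id -> 'SENDING' / 'SENT' / ...
local changelog = {}      -- bucket_id -> {{space_id, op, tuple}, ...}

-- Trigger installed on every sharded space: remember changes to
-- buckets that are currently being sent.
local function make_on_replace(space_id)
    return function(old_tuple, new_tuple)
        local t = new_tuple or old_tuple
        local bucket_id = t.bucket_id  -- assumes a named 'bucket_id' field
        if bucket_status[bucket_id] ~= 'SENDING' then
            return
        end
        local log = changelog[bucket_id] or {}
        table.insert(log, {space_id, new_tuple and 'REPLACE' or 'DELETE', t})
        changelog[bucket_id] = log
    end
end

-- Pop the changes accumulated so far for a bucket.
local function take_changes(bucket_id)
    local log = changelog[bucket_id] or {}
    changelog[bucket_id] = {}
    return log
end

local function send_bucket(bucket_id, destination)
    bucket_status[bucket_id] = 'SENDING'
    for _, space in pairs(sharded_spaces()) do  -- hypothetical helper
        local batch = {}
        for _, tuple in space.index.bucket_id:pairs({bucket_id}) do
            table.insert(batch, tuple)
            if #batch >= BATCH_SIZE then
                -- Ship the batch plus whatever the triggers collected
                -- while we were reading; the receiver applies both.
                send_batch(destination, space.id, batch,
                           take_changes(bucket_id))
                batch = {}
            end
        end
        if #batch > 0 then
            send_batch(destination, space.id, batch, take_changes(bucket_id))
        end
    end
    -- Only live updates remain: forbid writes, flush the tail, finish.
    lock_bucket_writes(bucket_id)  -- hypothetical helper
    send_batch(destination, nil, {}, take_changes(bucket_id))
    bucket_status[bucket_id] = 'SENT'
end
```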
The only problem I see here is that on_replace can remember the new/old tuple, but it cannot detect whether the transaction that created these tuples was aborted after on_replace fired. To verify that an update was actually committed, bucket_send would need to extract the primary key and look it up in the primary index to make sure the tuple really exists. That is too slow.
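For illustration, that rejected per-tuple check would look roughly like this, i.e. one primary-index lookup per logged change (a sketch; it uses the key_def module available in recent Tarantool versions, and only covers the REPLACE case, since a DELETE would need the inverse check):

```lua
-- Sketch of the rejected commit check: one primary-index lookup per
-- logged change is what makes this approach too slow.
local key_def = require('key_def')

local function is_committed(space, tuple)
    local kd = key_def.new(space.index[0].parts)
    local key = kd:extract_key(tuple)
    -- A REPLACE is committed only if the tuple is still reachable
    -- through the primary index.
    return space.index[0]:get(key) ~= nil
end
```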
An alternative: create a special temporary space with only four columns: bucket_id, space_id, type, tuple. In the on_replace trigger, insert each update into this space instead of into a Lua table. If a transaction is aborted, its inserts into this space are rolled back together with it, so aborted updates never reach the space. Here the field 'type' is used to distinguish REPLACE from DELETE.
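A sketch of that alternative. The space and trigger names are illustrative, the autoincrement 'id' field is an addition made here only so the space can have a primary key, and `bucket_status` is the same hypothetical state map as above:

```lua
-- Illustrative sketch: a temporary journal space whose inserts are
-- rolled back together with the user's transaction, so an aborted
-- transaction leaves no trace in it.
local journal = box.schema.create_space('_bucket_changes', {
    temporary = true,  -- data is not persisted
    format = {
        {'id', 'unsigned'},        -- ordering key, added for the PK
        {'bucket_id', 'unsigned'},
        {'space_id', 'unsigned'},
        {'type', 'string'},        -- 'REPLACE' or 'DELETE'
        {'tuple', 'any'},
    },
})
journal:create_index('pk', {sequence = true})

-- on_replace trigger for every sharded space: journal the change
-- inside the same transaction instead of remembering it in Lua.
local function journal_on_replace(old_tuple, new_tuple, space_name)
    local t = new_tuple or old_tuple
    local bucket_id = t.bucket_id  -- assumes a named 'bucket_id' field
    if bucket_status[bucket_id] ~= 'SENDING' then
        return
    end
    local op = new_tuple and 'REPLACE' or 'DELETE'
    journal:insert({box.NULL, bucket_id,
                    box.space[space_name].id, op, t:totable()})
end
```

With this, bucket_send no longer needs any commit check: it simply reads the journal, and whatever it finds there is committed by construction.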