Optimize slice handling to accelerate the large batch transfer operation #557
In scenarios such as those described in #499 (add support for batch transfer to accelerate transfer operation), where transfers involve a large batch size (thousands or more) but each chunk within the batch is relatively small (tens to hundreds of KiB), a substantial number of slices and work requests must be generated for the transfer. The current implementation introduces non-negligible latency due to the following issues:
1. In `RdmaTransport::submitTransferTask`, all slices must be allocated before posting to `RdmaContext::submitPostSend`. When there are many requests, the volume of slices can overwhelm the `ThreadLocalSliceCache`, causing page faults and delaying transfer initiation.
2. `TransferRequest` length may not be a multiple of `GlobalConfig::slice_size`, leading to smaller final slices. Each slice becomes a separate work request, and when many small slices are created, overhead increases, reducing throughput.

`new` operator reaches a predefined watermark, even if there are still pending requests to be processed in `RdmaTransport::submitTransferTask`.

Run the Python script provided by #499 (add support for batch transfer to accelerate transfer operation).
Since the modification occurs within `RdmaTransport::submitTransferTask`, the two results have been merged for comparison as follows.

The results show that the modification achieves a 20% to 30% boost in throughput.
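
To make the tail-slice issue above concrete, here is a minimal sketch of how a request length that is not a multiple of the slice size produces a small final slice. The function names `splitFixed` and `splitMergingTail`, the `min_tail` parameter, and the 64 KiB slice size used in the note below are all hypothetical and chosen for illustration. They are not taken from the Transfer Engine code, and the tail-merging variant is only one possible mitigation, not necessarily the approach this PR takes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Plain fixed-size splitting: every slice is slice_size bytes, except
// possibly the last one, which carries whatever remainder is left.
std::vector<std::size_t> splitFixed(std::size_t length, std::size_t slice_size) {
    assert(slice_size > 0);
    std::vector<std::size_t> slices;
    std::size_t remaining = length;
    while (remaining > 0) {
        std::size_t n = std::min(remaining, slice_size);
        slices.push_back(n);
        remaining -= n;
    }
    return slices;
}

// Hypothetical mitigation (an assumption for illustration only): if the
// final slice is smaller than min_tail bytes, fold it into the previous
// slice so that it does not become a separate work request.
std::vector<std::size_t> splitMergingTail(std::size_t length,
                                          std::size_t slice_size,
                                          std::size_t min_tail) {
    std::vector<std::size_t> slices = splitFixed(length, slice_size);
    if (slices.size() >= 2 && slices.back() < min_tail) {
        std::size_t tail = slices.back();
        slices.pop_back();
        slices.back() += tail;
    }
    return slices;
}
```

With an assumed 64 KiB slice size, a 200 KiB request splits into three full slices plus an 8 KiB tail under `splitFixed`. With `min_tail` set to 16 KiB, `splitMergingTail` yields three slices, the last being 72 KiB. Across thousands of requests, each avoided tail slice is one fewer work request to post.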