8000 Optimize slice handling to accelerate the large batch transfer operation by SCDESPERTATE · Pull Request #557 · kvcache-ai/Mooncake · GitHub

Optimize slice handling to accelerate the large batch transfer operation #557


Open · wants to merge 4 commits into main
Conversation

SCDESPERTATE (Contributor)
  • Motivation
    In scenarios such as those described in add support for batch transfer to accelerate transfer operation #499, transfers involve a large batch size (thousands of requests or more) while each chunk within the batch is relatively small (tens to hundreds of KiB), so a substantial number of slices and work requests must be generated. The current implementation introduces non-negligible latency due to the following issues:
    • In RdmaTransport::submitTransferTask, all slices must be allocated before they are posted via RdmaContext::submitPostSend. When there are many requests, the volume of slices can overwhelm the ThreadLocalSliceCache, causing page faults and delaying transfer initiation.
    • A TransferRequest length may not be a multiple of GlobalConfig::slice_size, leaving a small final slice. Since each slice becomes a separate work request, many small slices inflate per-work-request overhead and reduce throughput.
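To make the second issue concrete, here is a minimal sketch (a hypothetical helper, not code from this PR) of splitting a request into fixed-size slices: a length that is not a multiple of the slice size yields an undersized tail slice that still costs a full work request.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: split a transfer of `length` bytes into slices of at
// most `slice_size` bytes. A 130 KiB request with a 64 KiB slice size yields
// slices of 64 KiB, 64 KiB, and a 2 KiB tail -- three work requests in total.
std::vector<uint64_t> splitIntoSlices(uint64_t length, uint64_t slice_size) {
    std::vector<uint64_t> slices;
    for (uint64_t offset = 0; offset < length; offset += slice_size)
        slices.push_back(std::min(slice_size, length - offset));
    return slices;
}
```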
  • Modification
    • Submit the accumulated slice_list once the number of slices allocated with operator new reaches a predefined watermark, even if there are still pending requests to be processed in RdmaTransport::submitTransferTask.
    • Merge the final slice with the previous slice if its size is below a specified threshold.
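The two modifications above can be sketched as follows. This is an illustrative outline under stated assumptions, not the actual patch: kSliceSize, kMinTailSlice, kSubmitWatermark, and all function names are hypothetical identifiers and values chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint64_t kSliceSize = 64 * 1024;     // assumed slice size
constexpr uint64_t kMinTailSlice = 16 * 1024;  // assumed merge threshold
constexpr size_t kSubmitWatermark = 1024;      // assumed flush watermark

struct Slice { uint64_t offset, length; };

// Split one request into slices, folding an undersized tail slice into its
// predecessor so it does not become a separate work request.
void sliceRequest(uint64_t base, uint64_t length, std::vector<Slice>& out) {
    for (uint64_t off = 0; off < length; off += kSliceSize) {
        uint64_t n = std::min(kSliceSize, length - off);
        if (n < kMinTailSlice && !out.empty() &&
            out.back().offset + out.back().length == base + off) {
            out.back().length += n;  // merge small tail into previous slice
        } else {
            out.push_back({base + off, n});
        }
    }
}

// Process a batch of (base, length) requests, flushing the pending slice
// list via `post` as soon as the watermark is reached, rather than
// allocating every slice up front.
template <typename PostFn>
void submitBatch(const std::vector<std::pair<uint64_t, uint64_t>>& reqs,
                 PostFn post) {
    std::vector<Slice> pending;
    for (const auto& [base, len] : reqs) {
        sliceRequest(base, len, pending);
        if (pending.size() >= kSubmitWatermark) {  // early flush
            post(pending);
            pending.clear();
        }
    }
    if (!pending.empty()) post(pending);  // flush the remainder
}
```

In the real patch the pending list would be handed to RdmaContext::submitPostSend; the watermark bounds how many slices are held before the first post, so the NIC can start transferring while later requests are still being sliced.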
  • Result
    Run the Python script provided in add support for batch transfer to accelerate transfer operation #499.
    Since the modification lives in RdmaTransport::submitTransferTask, the results without (wo-opt) and with (w-opt) the optimization are merged below for comparison:
============================================================================================
SUMMARY
============================================================================================
Test Case            wo-opt(s)      wo-opt(GB/s)   w-opt(s)       w-opt(GB/s)    Speedup   
--------------------------------------------------------------------------------------------
200MB/5000chunks     0.008          27.921         0.007          32.471         16.29%    
200MB/10000chunks    0.011          19.145         0.008          24.819         29.63%    
300MB/8000chunks     0.012          26.849         0.010          32.463         20.90%    
400MB/10000chunks    0.015          27.521         0.013          33.013         19.95%    
500MB/15000chunks    0.021          25.069         0.017          30.392         21.23%    
600MB/12000chunks    0.021          30.130         0.018          34.885         15.78%    
700MB/20000chunks    0.029          25.571         0.024          31.191         21.97%    
700MB/10000chunks    0.025          29.294         0.019          38.352         30.91%    

Average Speedup: 22.08%
Maximum Speedup: 30.91%
Average Batch Throughput: 32.198 GB/s
Average Non-Batch Throughput: 26.438 GB/s

The results show that the modification delivers roughly a 16%–31% throughput boost (22.08% on average) across the tested cases.

@SCDESPERTATE SCDESPERTATE changed the title Optimize to accelerate the large batch transfer operation Optimize slice handling to accelerate the large batch transfer operation Jun 25, 2025
@SCDESPERTATE SCDESPERTATE marked this pull request as ready for review June 26, 2025 15:21
@alogfans (Collaborator) left a comment


Let @doujiang24 double-confirm it. LGTM.
