git clone https://github.com/xxyux/SpInfer.git
cd SpInfer
git submodule update --init --recursive
source Init_SpInfer.sh
cd $SpInfer_HOME/third_party/FasterTransformer && git apply ../ft_spinfer.patch
cd $SpInfer_HOME/third_party/sputnik && git apply ../sputnik.patch
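Running `git apply` a second time on an already patched tree fails, so the two patch steps above can be made repeatable. The following is a minimal sketch, not part of the repository: the helper name `apply_patch_once` is hypothetical, and it relies only on the standard `git apply --check` and `--reverse` options. It does nothing unless `SpInfer_HOME` is set.

```shell
# Hypothetical helper (not from SpInfer): apply a patch only if it is not applied yet.
# $1 = repository directory, $2 = patch path (absolute, or relative to $1).
apply_patch_once() {
  repo="$1"; patch="$2"
  if git -C "$repo" apply --reverse --check "$patch" 2>/dev/null; then
    # The reverse patch applies cleanly, so the patch is already in the tree.
    echo "already applied: $patch"
  elif git -C "$repo" apply --check "$patch"; then
    git -C "$repo" apply "$patch" && echo "applied: $patch"
  else
    echo "cannot apply cleanly: $patch" >&2
    return 1
  fi
}

# Same directories and patch files as in the commands above.
if [ -n "${SpInfer_HOME:-}" ]; then
  apply_patch_once "$SpInfer_HOME/third_party/FasterTransformer" ../ft_spinfer.patch
  apply_patch_once "$SpInfer_HOME/third_party/sputnik" ../sputnik.patch
fi
```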
- Requirements:
  - Ubuntu 16.04+
  - gcc >= 7.3
  - cmake >= 3.30.3
  - CUDA >= 12.2 and nvcc >= 12.0
  - NVIDIA GPU with sm >= 80 (e.g., Ampere A6000 and Ada RTX 4090).
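The version floors above can be checked before building. The following is a minimal sketch, not part of the repository: the helper names `version_ge` and `report` are hypothetical, the comparison assumes GNU `sort -V`, and the `compute_cap` query assumes a reasonably recent NVIDIA driver. The Ubuntu and CUDA toolkit floors are not checked here.

```shell
# Hypothetical pre-flight check; helper names are illustrative, not from SpInfer.

# Succeeds when version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints one status line for a tool: $1 = name, $2 = found version, $3 = required.
report() {
  if [ -z "$2" ]; then
    echo "$1: NOT FOUND (need >= $3)"
  elif version_ge "$2" "$3"; then
    echo "$1: $2 OK (need >= $3)"
  else
    echo "$1: $2 TOO OLD (need >= $3)"
  fi
}

# Each probe yields an empty string when the tool is missing.
gcc_v=$(gcc -dumpfullversion -dumpversion 2>/dev/null || true)
cmake_v=$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}' || true)
nvcc_v=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\).*/\1/p' || true)
sm_v=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1 || true)

report gcc "$gcc_v" 7.3
report cmake "$cmake_v" 3.30.3
report nvcc "$nvcc_v" 12.0
report "GPU compute capability" "$sm_v" 8.0   # sm >= 80 corresponds to 8.0
```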
- 2.1 Install conda on the system (Tutorial).
- 2.2 Create a conda environment:
cd $SpInfer_HOME
conda env create -f spinfer.yml
conda activate spinfer
The libSpMM_API.so and SpMM_API.cuh will be available for easy integration after:
cd $SpInfer_HOME/build && make -j
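After `make` finishes, it may help to confirm that both files were produced before linking against them. This is a minimal sketch under stated assumptions: the helper name `find_artifact` is hypothetical, and because the exact output directories are not given here, it searches the whole `$SpInfer_HOME` tree. It does nothing unless `SpInfer_HOME` is set.

```shell
# Hypothetical helper (not from SpInfer): print the first file named $2 under
# directory $1; returns non-zero when no such file exists.
find_artifact() {
  hit=$(find "$1" -name "$2" -type f 2>/dev/null | head -n1)
  [ -n "$hit" ] && echo "$hit"
}

# Look for the two files named in the sentence above.
if [ -n "${SpInfer_HOME:-}" ]; then
  for f in libSpMM_API.so SpMM_API.cuh; do
    find_artifact "$SpInfer_HOME" "$f" || echo "missing: $f" >&2
  done
fi
```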
- Build Sputnik.
cd $SpInfer_HOME/third_party/
source build_sputnik.sh
- Build SparTA.
cd $SpInfer_HOME/third_party/
source preparse_cusparselt.sh
- Reproduce Figure 10.
cd $SpInfer_HOME/kernel_benchmark
source test_env
make -j
source benchmark.sh
Check the results in the raw CSV files and the reproduced Figure10.png (Fig. 10).
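A quick way to see which CSV files the benchmark produced is to list them with their line counts. This is a minimal sketch: the helper name `summarize_csv` is hypothetical, and it assumes the CSV files are written directly into `$SpInfer_HOME/kernel_benchmark`, which is not stated above. It makes no assumption about the column layout.

```shell
# Hypothetical helper (not from SpInfer): print "<file>: <N> lines" for each
# CSV file directly inside directory $1. The count includes any header line.
summarize_csv() {
  for f in "$1"/*.csv; do
    [ -e "$f" ] || continue          # the glob matched nothing
    n=$(( $(wc -l < "$f") ))
    echo "$(basename "$f"): $n lines"
  done
}

# Assumed location of the raw results; adjust if benchmark.sh writes elsewhere.
if [ -n "${SpInfer_HOME:-}" ]; then
  summarize_csv "$SpInfer_HOME/kernel_benchmark"
fi
```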
Follow the steps in SpInfer/docs/LLMInferenceExample
- Building Faster-Transformer with (SpInfer, Flash-llm or Standard) integration
- Downloading & Converting OPT models
- Configuration Note: Model_dir is different for SpInfer, Flash-llm and Faster-Transformer.
cd $SpInfer_HOME/third_party/
bash run_1gpu_loop.sh
- Check the results (Fig. 13/14) in
$SpInfer_HOME/third_party/FasterTransformer/OutputFile_1gpu_our_60_inlen64/
- Test tensor_para_size=2 using
bash run_2gpu_loop.sh
- Test tensor_para_size=4 using
bash run_4gpu_loop.sh
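The three loop scripts above differ only in the tensor-parallel size, so a small wrapper can run the ones the machine has enough GPUs for. This is a minimal sketch: `loop_script_for` and `gpu_count` are hypothetical helpers, and only the script names and directory come from the steps above. It does nothing unless `SpInfer_HOME` is set.

```shell
# Hypothetical helper (not from SpInfer): map a tensor_para_size ($1) to the
# loop script named in this guide.
loop_script_for() {
  case "$1" in
    1|2|4) echo "run_${1}gpu_loop.sh" ;;
    *) echo "unsupported tensor_para_size: $1" >&2; return 1 ;;
  esac
}

# Counts visible GPUs; prints 0 when nvidia-smi is unavailable.
gpu_count() {
  nvidia-smi --list-gpus 2>/dev/null | wc -l
}

# Run every configuration that fits on this machine, smallest first.
if [ -n "${SpInfer_HOME:-}" ]; then
  n=$(gpu_count)
  for size in 1 2 4; do
    if [ "$n" -ge "$size" ]; then
      (cd "$SpInfer_HOME/third_party/" && bash "$(loop_script_for "$size")")
    fi
  done
fi
```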
cd $FlashLLM_HOME/third_party/
bash run_1gpu_loop.sh
- Check the results in
$FlashLLM_HOME/third_party/FasterTransformer/OutputFile_1gpu_our_60_inlen64/
- Test tensor_para_size=1 using
bash run_1gpu_loop.sh
cd $FT_HOME/third_party/
bash run_2gpu_loop.sh
- Check the results in
$FT_HOME/FasterTransformer/OutputFile_2gpu_our_60_inlen64/
cd $SpInfer_HOME/end2end_inference/ds_scripts
pip install -r requirements.txt
bash run_ds_loop.sh
- Check the results in
$SpInfer_HOME/end2end_inference/ds_scripts/ds_result/