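🙋🙋🙋 The build-your-own LLM inference framework project is in full swing; add the WeChat contact below to learn more.
A step-by-step guide to writing, from scratch, a large-model inference framework that supports LLama inference with CUDA acceleration.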
- google glog https://github.com/google/glog
- google gtest https://github.com/google/googletest
- sentencepiece https://github.com/google/sentencepiece
- armadillo + openblas https://arma.sourceforge.net/download.html
  OpenBLAS serves as Armadillo's backend math library and accelerates operations such as matrix multiplication; Intel MKL can be used instead.
- llama2 https://pan.baidu.com/s/1PF5KqvIvNFR8yDIY1HmTYA?pwd=ma8r or https://huggingface.co/fushenshen/lession_model/tree/main
```shell
# Assumes the third-party dependencies listed above are already installed
mkdir build
cd build
cmake ..
make -j16
./llama_infer llama2_7b.bin tokenizer.model
```