GitHub - thu-pacman/chitu: High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.


Chitu (赤兔)

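Latest News

Introduction

Benchmark Results

Deploying DeepSeek-R1-671B on a single-node, 8-GPU H20 (96 GB) server

Deploying DeepSeek-R1-671B on a two-node, 16-GPU H20 (96 GB) cluster

Heterogeneous deployment of DeepSeek-R1-671B on a Xeon 8480P + H20 (96 GB) server

Deploying DeepSeek-R1-671B on an A800 (40 GB) cluster

Deploying DeepSeek-R1-671B and DeepSeek-R1-Distill-Llama-70B on a MetaX (沐曦) cluster

Quick Start

Installing from Source

Viewing Supported Models

Single-GPU Inference

Hybrid Parallelism (TP+PP)

Launching the Service

Performance Benchmarking

FAQ

Contributing Guide

Discussion

License

Acknowledgements

Technical Services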

Releases: 9

Contributors: 7

Output rate (tokens/s)	chitu 0.3.0, original FP8	chitu 0.3.0, FP4->FP8	chitu 0.3.0, FP4->BF16
bs=1	24.30	20.70	19.78
bs=16	203.71	89.56	110.68
bs=64	OOM	237.20	232.14
bs=128	OOM	360.80	351.73
MMLU score	89.8	88.0	88.0
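The output rates above are aggregate figures for the whole batch. A rough way to see what a single request experiences is to divide by the batch size. The sketch below does that with the numbers copied from the table; the helper name `per_sequence_rate` is hypothetical, and it assumes throughput is split evenly across the sequences in a batch, which real schedulers do not guarantee.

```python
# Aggregate output rates (tokens/s) copied from the table above.
# None marks an OOM entry.
RATES = {
    "chitu 0.3.0, original FP8": {1: 24.30, 16: 203.71, 64: None, 128: None},
    "chitu 0.3.0, FP4->FP8": {1: 20.70, 16: 89.56, 64: 237.20, 128: 360.80},
    "chitu 0.3.0, FP4->BF16": {1: 19.78, 16: 110.68, 64: 232.14, 128: 351.73},
}


def per_sequence_rate(aggregate: float, batch_size: int) -> float:
    """Average tokens/s seen by one sequence, assuming an even split across the batch."""
    return aggregate / batch_size


def main() -> None:
    for config, by_bs in RATES.items():
        for bs, agg in sorted(by_bs.items()):
            if agg is None:
                print(f"{config:28s} bs={bs:<4d} OOM")
            else:
                rate = per_sequence_rate(agg, bs)
                print(f"{config:28s} bs={bs:<4d} {rate:6.2f} tokens/s per sequence")


if __name__ == "__main__":
    main()
```

For example, original FP8 at bs=16 works out to roughly 12.7 tokens/s per sequence, about half of the bs=1 rate, while the aggregate rate is more than eight times higher.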

Output rate (tokens/s)	chitu 0.1.0, original FP8
bs=1	22.1
bs=16	202.1
bs=256	780.3

Layers fully placed on GPU	Number of GPUs	Output rate (tokens/s), bs=1	Output rate (tokens/s), bs=16
0	1	10.61	28.16
24	2	14.04	42.57
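To compare the two heterogeneous configurations, the ratio of their output rates can be computed directly from the table. This is a minimal sketch; the `speedup` helper is hypothetical. Note that the second row changes both the number of GPU-resident layers and the GPU count, so the ratio cannot be attributed to either change alone.

```python
# Rows from the heterogeneous-deployment table above:
# (layers fully on GPU, number of GPUs) -> output tokens/s at bs=1 and bs=16.
ROWS = {
    (0, 1): {1: 10.61, 16: 28.16},
    (24, 2): {1: 14.04, 16: 42.57},
}


def speedup(baseline: float, improved: float) -> float:
    """Ratio of throughputs; values above 1.0 mean the second setup is faster."""
    return improved / baseline


def main() -> None:
    base = ROWS[(0, 1)]
    more_gpu = ROWS[(24, 2)]
    for bs in (1, 16):
        print(f"bs={bs}: {speedup(base[bs], more_gpu[bs]):.2f}x")


if __name__ == "__main__":
    main()
```

On these numbers the second configuration is about 1.32x faster at bs=1 and about 1.51x faster at bs=16.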

Batch size	6 nodes, BF16	3 nodes, FP8
1	29.8	22.7
4	78.8	70.1
8	129.8	108.9
16	181.4	159.0
32	244.1	214.5
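Because the BF16 and FP8 runs use different node counts, per-node throughput is a fairer comparison than the raw columns. The sketch below assumes the values are output tokens/s (the table header does not state the unit) and that throughput divides evenly across nodes; the `per_node` helper is hypothetical.

```python
# Columns from the A800 table above; values are assumed to be output tokens/s.
BATCH_SIZES = [1, 4, 8, 16, 32]
BF16_6_NODES = [29.8, 78.8, 129.8, 181.4, 244.1]
FP8_3_NODES = [22.7, 70.1, 108.9, 159.0, 214.5]


def per_node(rate: float, nodes: int) -> float:
    """Throughput divided evenly over the nodes in the deployment."""
    return rate / nodes


def main() -> None:
    for bs, bf16, fp8 in zip(BATCH_SIZES, BF16_6_NODES, FP8_3_NODES):
        print(
            f"bs={bs:<3d} BF16/node={per_node(bf16, 6):6.2f} "
            f"FP8/node={per_node(fp8, 3):6.2f} FP8/BF16 total={fp8 / bf16:.2f}"
        )


if __name__ == "__main__":
    main()
```

At every batch size in the table, the 3-node FP8 setup delivers more throughput per node than the 6-node BF16 setup; at bs=32 it reaches about 88% of the BF16 total with half the nodes.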
