Ditto is an open-source framework that enables direct conversion of HuggingFace `PreTrainedModel`s into TensorRT-LLM engines. Normally, building a TensorRT-LLM engine consists of two steps - checkpoint conversion and `trtllm-build` - both of which rely on pre-defined model architectures. As a result, converting a novel model requires porting the model with TensorRT-LLM's Python API and writing a custom checkpoint conversion script. By automating these tedious procedures, Ditto aims to make TensorRT-LLM more accessible to the broader AI community.
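For comparison, below is a minimal sketch of both workflows. The checkpoint-conversion script path, flags, and model ID are illustrative assumptions that vary by model family and TensorRT-LLM version; only `ditto build <huggingface-model-name>` is Ditto's documented interface.

```
# Conventional TensorRT-LLM workflow (illustrative; script path and flags vary
# by model family and TensorRT-LLM version)
python examples/llama/convert_checkpoint.py \
    --model_dir ./Llama-3.1-8B-Instruct \
    --output_dir ./tllm_checkpoint \
    --dtype float16
trtllm-build --checkpoint_dir ./tllm_checkpoint --output_dir ./tllm_engine

# Ditto: a single command straight from the HuggingFace model name
ditto build meta-llama/Llama-3.1-8B-Instruct
```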
- [2025/02] Blog post introducing Ditto is published! [Blog]
- [2025/02] Ditto 0.1.0 released!
- [2025/04] Ditto 0.2.0 released with new features - MoE, Quantization
- Ease-of-use: Ditto enables users to convert models with a single command.
  ```
  ditto build <huggingface-model-name>
  ```
- Enables conversion of novel model architectures into TensorRT engines, including models that are not supported in TensorRT-LLM due to the absence of checkpoint conversion scripts.
  - For example, as of the publication date of this document (February 10, 2025), Helium is supported in Ditto, while it is not in TensorRT-LLM. (Note that you need to re-install the transformers nightly build after installing Ditto with `pip install git+https://github.com/huggingface/transformers.git`; see the example after this list.)
- Directly converts quantized HuggingFace models. (Future Work)
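As a concrete end-to-end example of the points above, the sketch below builds an engine for Helium. The HuggingFace model ID is an assumption for illustration; the `pip install` command and the `ditto build` form come from the notes above.

```
# Re-install the transformers nightly build after installing Ditto
pip install git+https://github.com/huggingface/transformers.git

# Build a TensorRT-LLM engine directly from the HuggingFace model
# (model ID assumed for illustration)
ditto build kyutai/helium-1-preview-2b
```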
We have conducted comprehensive benchmarks for both output quality and inference performance to validate the conversion process of Ditto. Llama3.3-70B-Instruct, Llama3.1-8B-Instruct, and Helium1-preview-2B were used for the benchmarks, and all benchmarks were performed with both the GEMM and GPT attention plugins enabled.
We used the TensorRT-LLM llmapi integrated with lm-evaluation-harness for quality evaluation. For the Helium model, the ifeval task was excluded since it is not an instruction-tuned model.
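For reference, a hedged sketch of an equivalent lm-evaluation-harness invocation over the tasks in the table below is shown here. It uses the stock `hf` backend and an assumed model ID purely for illustration; the results reported in this section were obtained through a TensorRT-LLM llmapi integration instead.

```
# Illustrative lm-evaluation-harness run over the benchmark tasks (0-shot);
# the reported numbers were produced via a TensorRT-LLM llmapi integration,
# not the stock `hf` backend shown here.
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct \
    --tasks mmlu,wikitext,gpqa_main_zeroshot,arc_challenge,ifeval \
    --num_fewshot 0
```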
| | | MMLU (Accuracy) | wikitext2 (PPL) | gpqa_main_zeroshot (Accuracy) | arc_challenge (Accuracy) | ifeval (Accuracy) |
|---|---|---|---|---|---|---|
| Llama3.3-70B-Instruct | Ditto | 0.819 | 3.96 | 0.507 | 0.928 | 0.915 |
| | TRT-LLM | 0.819 | 3.96 | 0.507 | 0.928 | 0.915 |
| Llama3.1-8B-Instruct | Ditto | 0.680 | 8.64 | 0.350 | 0.823 | 0.815 |
| | TRT-LLM | 0.680 | 8.64 | 0.350 | 0.823 | 0.815 |
| Helium1-preview-2B | Ditto | 0.486 | 11.37 | 0.263 | 0.578 | - |
| | TRT-LLM | Not Supported | | | | |
NOTE: All tasks were evaluated 0-shot.
Performance benchmarks were conducted using TensorRT-LLM `gptManagerBenchmark`. A100 in the table refers to the A100-SXM4-80GB.
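A hedged sketch of a typical `gptManagerBenchmark` invocation is shown below; the flags and the dataset path are assumptions based on older TensorRT-LLM releases and may differ in your version.

```
# Illustrative throughput benchmark (flags assumed; check your TensorRT-LLM
# version's benchmarks/cpp documentation). The dataset JSON is produced with
# TensorRT-LLM's dataset preparation utilities.
./benchmarks/cpp/gptManagerBenchmark \
    --engine_dir ./tllm_engine \
    --dataset ./dataset.json \
    --max_num_samples 500
```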
| | | TP | A100 (token/sec) | A6000 (token/sec) | L40 (token/sec) |
|---|---|---|---|---|---|
| Llama3.3-70B-Instruct | Ditto | 4 | 1759.2 | - | - |
| | TRT-LLM | 4 | 1751.6 | - | - |
| Llama3.1-8B-Instruct | Ditto | 1 | 3357.9 | 1479.8 | 1085.2 |
| | TRT-LLM | 1 | 3318.0 | 1508.6 | 1086.5 |
| Helium1-preview-2B | Ditto | 1 | - | 1439.5 | 1340.5 |
| | TRT-LLM | 1 | Not Supported | | |
- Llama2-7B
- Llama3-8B
- Llama3.1-8B
- Llama3.2
- Llama3.3-70B
- Mistral-7B
- Gemma2-9B
- Phi4
- Phi3.5-mini
- Qwen2-7B
- Codellama
- Codestral
- ExaOne3.5-8B
- aya-expanse-8B
- Llama-DNA-1.0-8B
- SOLAR-10.7B
- Falcon
- Nemotron
- 42dot_LLM-SFT-1.3B
- Helium1-2B
- Sky-T1-32B
- SmolLM2-1.7B
- Mixtral-8x7B
- Qwen-MoE
- DeepSeek-V1, V2
- and many others that we haven't tested yet
- Multi LoRA
- Tensor Parallelism / Pipeline Parallelism
- Mixture of Experts
- Quantization - Weight-only & FP8 (AutoAWQ, AutoGPTQ, Compressed Tensors)
The features below are planned to be supported in Ditto in the near future. Feel free to reach out if you have any questions or suggestions.
- Additional Quantization Support
- Expert Parallelism
- Multimodal
- Speculative Decoding
- Prefix Caching
- State Space Model
- Encoder-Decoder Model