Figure: HunyuanVideo training throughput.
TeleTron features flexible parallel strategies and fused CUDA kernels to facilitate long-context, efficient, and flexible training of multi-modal transformer models.
- Long-Context
- TeleTron combines a mixed parallel strategy, activation checkpointing, and fused CUDA kernels to optimize GPU memory usage, enabling HunyuanVideo training on 720P video clips longer than 30 seconds.
- Efficient
- With fused CUDA kernels, TeleTron trains faster than general-purpose training optimization libraries such as DeepSpeed.
- Flexible
- Across varying video sequence lengths and model sizes, TeleTron supports flexible adjustment of the parallel strategy among data parallel, context parallel, and/or tensor parallel; see the sketch below.
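As a rough sketch of how these parallel degrees compose (an assumption following the common Megatron-style relation, not TeleTron's documented behavior), the data-parallel size is whatever remains once the tensor- and context-parallel sizes are fixed:

# illustrative arithmetic only: assumes world_size = DP x CP x TP
NUM_GPUS=8                      # GPUs on one node
TP=2                            # tensor parallel size
CP=2                            # context parallel size
DP=$((NUM_GPUS / (TP * CP)))    # remaining data parallel size, here 2
echo "DP=${DP}, CP=${CP}, TP=${TP} across ${NUM_GPUS} GPUs"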
TeleTron has released code for HunyuanVideo I2V fine-tuning and has been supporting TeleAI VAST (Video As Storyboard from Text) for high-resolution video generation training (code to be released).
To save effort on environment setup, it is recommended to use the NVIDIA NGC pytorch:24.10-py3 container image.
# pull docker image
docker pull nvcr.io/nvidia/pytorch:24.10-py3
# start docker container
sudo docker run --gpus all -itd --shm-size 512G --name teletron nvcr.io/nvidia/pytorch:24.10-py3 /bin/bash
# enter the container
sudo docker exec -it teletron /bin/bash
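Once inside the container, a quick generic check (not part of TeleTron itself) is to confirm that all GPUs are visible:

# confirm the GPUs are visible from inside the container
nvidia-smi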
Inside the Docker container, follow the script below to set up TeleTron.
# get TeleTron
git clone git@github.com:Tele-AI/TeleTron.git --recurse-submodules
cd TeleTron
# install requirements
pip install -r requirements.txt
# install TeleTron fused kernels
cd teletron_op && bash install.sh && cd -
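As an optional, generic follow-up check (independent of TeleTron), you can confirm that PyTorch still resolves CUDA after installing the requirements, since dependency installs occasionally replace the container's PyTorch build:

# verify PyTorch and its CUDA support survived the dependency install
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"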
The script below runs a tiny version of HunyuanVideo on fake data. It serves as a sanity check that the environment is correctly set up.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 MASTER_PORT=12345 bash examples/hunyuanvideo/run_unified_sanity_check.sh 1 1
- Single-node training
bash examples/hunyuanvideo/run_unified.sh 2 2 9
Note that the numbers "2 2 9" above denote the TP size, CP size, and number of frames, respectively. The default video resolution is 720P; you may also change the training video resolution by adding --video-resolution {width} {height} to the training arguments in the shell script run_unified.sh.
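For example, on a single 8-GPU node the parallel sizes and frame count can be varied independently. The combinations below are purely illustrative (including the example resolution), not recommended settings:

# context-parallel-heavy run: TP=1, CP=4, 33 frames
bash examples/hunyuanvideo/run_unified.sh 1 4 33
# tensor-parallel-heavy run: TP=4, CP=2, 17 frames
bash examples/hunyuanvideo/run_unified.sh 4 2 17
# to lower the resolution, edit run_unified.sh and append, e.g.,
# --video-resolution 960 544 to the training arguments before launching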
- Multi-node training
Run the script below on each of 4 nodes (8 H800 GPUs per node) to launch 129-frame 720P training. Note that for full fine-tuning you still need to download and convert the HunyuanVideo pretrained weights.
bash examples/hunyuanvideo/run_unified.sh 1 4 129
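The rendezvous details for the four nodes (master address, node rank, and so on) are not shown above; the variables below follow the common torchrun/Megatron convention and are an assumption about how run_unified.sh is wired rather than its documented interface, so check the script for the exact names it expects:

# hypothetical per-node launch; NODE_RANK would be 0, 1, 2, 3 on the four nodes
MASTER_ADDR=<master-node-ip> MASTER_PORT=12345 NNODES=4 NODE_RANK=0 \
  bash examples/hunyuanvideo/run_unified.sh 1 4 129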
- 2025/5/16: TeleTron's first release, with code for HunyuanVideo full fine-tuning and inference!
- Ulysses Context Parallel
- Tensor Parallel
- AdaLayerNorm fused kernel
- RmsNorm fused kernel
- Unified Sequence Parallel