Chenxin An¹, Lean Wang², Xu Sun², Lingpeng Kong¹, Qi Liu¹
This repo contains the official implementation of our paper "Temporal Reasoning Transfer from Text to Video".
Please refer to `./probing` for details on the probing experiments.
For LongVA experiments, we mix the Open-LLaVA-NeXT dataset with our T3 dataset. The data mixing process is implemented in `t3_sft/data_creation.py`. The script handles:
- Loading and processing the Open-LLaVA-NeXT data
- Loading our Video T3 dataset, which covers various temporal-reasoning aspects
The data mixing script allows for the following (sketched in the example below):
- Customizable dataset ratios (see Table 2 of the main paper and Figure 9 of the Appendix for best practices on mixing ratios)
- Text-length filtering (to avoid OOM when GPU memory is limited)
- Token-length analysis and visualization
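
The actual implementation lives in `t3_sft/data_creation.py`; the snippet below is only a minimal, self-contained sketch of the mixing logic, assuming both datasets are stored as LLaVA-style JSON lists with a `conversations` field. The file names, ratio, length threshold, tokenizer choice, and the `count_tokens` helper are all illustrative, not the repo's actual settings.

```python
import json
import random

from transformers import AutoTokenizer  # pip install transformers

# Illustrative paths/parameters -- the real script may differ.
OPEN_LLAVA_NEXT_JSON = "open_llava_next.json"
VIDEO_T3_JSON = "video_t3.json"
T3_RATIO = 0.2          # fraction of the final mix drawn from T3 (cf. Table 2)
MAX_TEXT_TOKENS = 4096  # drop overly long samples to avoid OOM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

def count_tokens(sample):
    """Token count over all conversation turns of a LLaVA-style sample."""
    text = " ".join(turn["value"] for turn in sample["conversations"])
    return len(tokenizer(text).input_ids)

def load_filtered(path, max_tokens):
    """Load a JSON list of samples, keeping only those under the length cap."""
    with open(path) as f:
        data = json.load(f)
    return [s for s in data if count_tokens(s) <= max_tokens]

llava_data = load_filtered(OPEN_LLAVA_NEXT_JSON, MAX_TEXT_TOKENS)
t3_data = load_filtered(VIDEO_T3_JSON, MAX_TEXT_TOKENS)

# Subsample T3 so it makes up roughly T3_RATIO of the final mixture.
n_t3 = int(len(llava_data) * T3_RATIO / (1 - T3_RATIO))
random.seed(42)
mixed = llava_data + random.sample(t3_data, min(n_t3, len(t3_data)))
random.shuffle(mixed)

with open("mixed_sft_data.json", "w") as f:
    json.dump(mixed, f)
print(f"{len(llava_data)} LLaVA + {len(mixed) - len(llava_data)} T3 = {len(mixed)} samples")

# Optional: visualize the token-length distribution of the mixture.
import matplotlib.pyplot as plt
plt.hist([count_tokens(s) for s in mixed], bins=50)
plt.xlabel("tokens per sample")
plt.savefig("token_lengths.png")
```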
We use the LongVA codebase for training LongVA models. Please set up the environment according to LongVA's instructions. The training script is located at `t3_sft/longva_exp/longva_t3.sh`.
For Qwen2VL models, we use LLaMA-Factory for fine-tuning. Please set up the environment according to LLaMA-Factory's instructions. The training configurations and scripts for the 7B and 72B models can be found under `t3_sft/qwen_exp/7b` and `t3_sft/qwen_exp/72b`, respectively.
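
The authoritative configs are the ones shipped under `t3_sft/qwen_exp/7b` and `t3_sft/qwen_exp/72b`. As a rough illustration of what a LLaMA-Factory SFT config for a Qwen2-VL model looks like, the sketch below writes one out: the key names follow LLaMA-Factory's YAML schema, but every value here (model path, dataset name, hyperparameters, output dir) is a placeholder, not the setting used in the paper.

```python
import yaml  # pip install pyyaml

# Placeholder values -- consult t3_sft/qwen_exp/7b for the real settings.
config = {
    "model_name_or_path": "Qwen/Qwen2-VL-7B-Instruct",
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "full",
    "dataset": "t3_mixed_sft",  # must be registered in LLaMA-Factory's dataset_info.json
    "template": "qwen2_vl",
    "cutoff_len": 4096,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": 1.0e-5,
    "num_train_epochs": 1.0,
    "bf16": True,
    "output_dir": "saves/qwen2vl-7b-t3",
}

with open("qwen2vl_7b_t3_sft.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Launch with LLaMA-Factory's CLI:
#   llamafactory-cli train qwen2vl_7b_t3_sft.yaml
```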