

Video-T3

This repo contains the official implementation of our ICLR 2025 paper "Temporal Reasoning Transfer from Text to Video".

Lei Li, Yuanxin Liu, Linli Yao, Peiyuan Zhang, Chenxin An, Lean Wang, Xu Sun, Lingpeng Kong, Qi Liu
¹The University of Hong Kong  ²Peking University  ³UCSD
*Equal Contribution

Video LLM Temporal Bottleneck Probing

Please refer to ./probing for details.

SFT Data Preparation

For LongVA experiments, we mix the Open-LLaVA-NeXT dataset with our T3 dataset. The data mixing process is implemented in t3_sft/data_creation.py. The script handles:

  • Loading and processing Open-LLaVA-NeXT data
  • Loading our Video-T3 dataset, which covers various temporal reasoning aspects

The data mixing script also supports the following (an example invocation follows this list):

  • Customizable dataset ratios (see Table 2 of the main paper and Figure 9 in the Appendix for recommended mixing ratios)
  • Text length filtering (to avoid OOM when GPU memory is limited)
  • Token length analysis and visualization
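
For example, the preparation step can be launched from the repository root as sketched below. This is only an illustration: the mixing ratios, length thresholds, and output paths are configured via t3_sft/data_creation.py itself, and any command-line arguments it expects are not reproduced here.

```bash
# Illustrative invocation of the data mixing script; check the script for its
# actual arguments and configuration before running.
python t3_sft/data_creation.py
```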

SFT Training

LongVA Training

We use the LongVA codebase for training LongVA models. Please set up the environment according to the LongVA repository. The training script is located at t3_sft/longva_exp/longva_t3.sh.
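
Once the environment is ready, training can be launched with the provided script, for example as below (the data and model paths used by the script may need to be adapted to your setup):

```bash
# Launch LongVA SFT with the T3 data mix from the repository root.
bash t3_sft/longva_exp/longva_t3.sh
```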

Qwen2VL Models

For Qwen2VL models, we use LLaMA-Factory for fine-tuning. Please set up the environment according to the LLaMA-Factory repository. The training configurations and scripts for the 7B and 72B Qwen2VL models can be found under t3_sft/qwen_exp/7b and t3_sft/qwen_exp/72b, respectively.
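
A typical LLaMA-Factory launch looks like the following; the config filename is a placeholder, since the exact YAML names under t3_sft/qwen_exp/7b and t3_sft/qwen_exp/72b are not listed here.

```bash
# Illustrative LLaMA-Factory launch for the 7B model; replace <config>.yaml
# with the actual config file under t3_sft/qwen_exp/7b.
llamafactory-cli train t3_sft/qwen_exp/7b/<config>.yaml
```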
