Official PyTorch implementation of Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
- Set up a conda environment (Python >= 3.10) using:
conda create -n cat2 python=3.10 -y
conda activate cat2
- Install the requirements:
pip install -e .
- Download checkpoints:
cd checkpoints && \
./download_ckpts.sh && \
cd ..
- Run inference:
bash inference.sh
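If installation or inference fails, it can help to confirm that the active environment meets the Python >= 3.10 requirement stated above. The following is a minimal sketch and not part of this repository: `version_ge` and `check_python` are hypothetical helper names, and the check assumes `python` on your `PATH` is the interpreter from the `cat2` environment.

```shell
# Hypothetical helper (not shipped with this repo):
# succeeds if version $1 is at least version $2, comparing major.minor only.
version_ge() {
  have_major=$(printf '%s\n' "$1" | cut -d. -f1)
  have_minor=$(printf '%s\n' "$1" | cut -s -d. -f2)
  want_major=$(printf '%s\n' "$2" | cut -d. -f1)
  want_minor=$(printf '%s\n' "$2" | cut -s -d. -f2)
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
    return $?
  fi
  # A missing minor component is treated as 0.
  [ "${have_minor:-0}" -ge "${want_minor:-0}" ]
}

# Hypothetical helper: report whether the active interpreter is new enough.
check_python() {
  py_version=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 1
  if version_ge "$py_version" "3.10"; then
    echo "Python $py_version satisfies >= 3.10"
  else
    echo "Python $py_version is older than 3.10; recreate the conda environment" >&2
    return 1
  fi
}
```

To use it, paste the functions into a shell where `conda activate cat2` has already been run, then call `check_python`.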
If you find this work useful for your research or applications, please cite using this BibTeX:
@inproceedings{tang2025cat-v,
title={Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting},
author={Tang, Yunlong and Bi, Jing and Hua, Hang and Xiao, Yunzhong and Song, Yizhi and Liu, Pinxin and Huang, Chao and Feng, Mingqian and Guo, Junjia and Liu, Zhuo and Song, Luchuan and Liang, Susan and Shimada, Daiki and Vosoughi, Ali and He, Jinxi and He, Liu and Zhang, Zeliang and Luo, Jiebo and Xu, Chenliang},
journal={arXiv},
year={2025}
}
This work was supported by Sony Group Corporation. We would like to thank Sayaka Nakamura and Jerry Jun Yokono for their insightful discussion.
We are also grateful to the following awesome projects from which our CAT-V arose:
Our project wouldn't be possible without the contributions of these amazing people! Thank you all for making this project better.
- Yunlong Tang @ University of Rochester
- Jing Bi @ University of Rochester
- Chao Huang @ University of Rochester
- Susan Liang @ University of Rochester
- Daiki Shimada @ Sony Group Corporation
- Hang Hua @ University of Rochester
- Yunzhong Xiao @ Carnegie Mellon University
- Yizhi Song @ Purdue University
- Pinxin Liu @ University of Rochester
- Mingqian Feng @ University of Rochester
- Junjia Guo @ University of Rochester
- Zhuo Liu @ University of Rochester
- Luchuan Song @ University of Rochester
- Ali Vosoughi @ University of Rochester
- Jinxi He @ University of Rochester
- Liu He @ Purdue University
- Zeliang Zhang @ University of Rochester
- Jiebo Luo @ University of Rochester
- Chenliang Xu @ University of Rochester