Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting

License

BSD-3-Clause and 3 other licenses found:

  BSD-3-Clause: LICENSE-Caption-Anything.txt
  Apache-2.0: LICENSE-Qwen2-VL.txt
  Apache-2.0: LICENSE-SAMURAI.txt
  Apache-2.0: LICENSE-VideoLLaMA2.txt

yunlong10/CAT-V

CAT-V

Official PyTorch implementation of Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

[Figure: CAT-V framework]

🚀 Updates

🕹️ Demo

🛠️ Getting Started

  1. Set up a conda environment (Python >= 3.10):
     conda create -n cat2 python=3.10 -y
     conda activate cat2
  2. Install the requirements:
     pip install -e .
  3. Download the checkpoints:
     cd checkpoints && \
     ./download_ckpts.sh && \
     cd ..
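
Before running inference, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the repository: the `check_environment` helper is hypothetical, and it assumes only what the steps above state (Python 3.10 or newer, a PyTorch-based package, and a `checkpoints` directory populated by `download_ckpts.sh`). It does not assume any particular checkpoint filenames; it simply lists whatever non-script files are present.

```python
import importlib.util
import sys
from pathlib import Path


def check_environment(checkpoint_dir="checkpoints", min_python=(3, 10)):
    """Return a dict of basic setup checks (hypothetical helper, not repo code)."""
    ckpt_path = Path(checkpoint_dir)
    # Assumption: anything that is not a shell script was downloaded by the script.
    downloaded = (
        [p.name for p in ckpt_path.iterdir() if p.suffix != ".sh"]
        if ckpt_path.is_dir()
        else []
    )
    return {
        # The README asks for Python >= 3.10.
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec only locates the package; it does not import torch.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "checkpoints_found": sorted(downloaded),
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

Run it from the repository root inside the activated `cat2` environment. An empty `checkpoints_found` list suggests the download step did not complete.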

🏃 Run

bash inference.sh

📖 Citation

If you find this work useful for your research or applications, please cite using this BibTeX:

@inproceedings{tang2025cat-v,
  title={Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting},
  author={Tang, Yunlong and Bi, Jing and Hua, Hang and Xiao, Yunzhong and Song, Yizhi and Liu, Pinxin and Huang, Chao and Feng, Mingqian and Guo, Junjia and Liu, Zhuo and Song, Luchuan and Liang, Susan and Shimada, Daiki and Vosoughi, Ali and He, Jinxi and He, Liu and Zhang, Zeliang and Luo, Jiebo and Xu, Chenliang},
  journal={arXiv},
  year={2025}
}

🙏 Acknowledgements

This work was supported by Sony Group Corporation. We would like to thank Sayaka Nakamura and Jerry Jun Yokono for their insightful discussion.

We are also grateful to the following awesome projects, from which CAT-V arose:

👩‍💻 Contributors

Our project wouldn't be possible without the contributions of these amazing people! Thank you all for making this project better.
