Stars
Gemini is a modern LaTeX beamerposter theme 🖼
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Let your Claude think
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
xlliu7 / TadTR
[TIP 2022] End-to-end Temporal Action Detection with Transformer
Large-scale text-video dataset. 10 million captioned short videos.
Segment Anything in High Quality [NeurIPS 2023]
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
[CVPR 2024] Official implementation of the paper "DePT: Decoupled Prompt Tuning"
Code release for "VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning"
Accepted as a Spotlight Presentation at NeurIPS 2024
Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23
Code and documentation to train Stanford's Alpaca models, and generate the data.
Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.