Stars
[ICLR 2025] Implementation of "Decoupling Layout from Glyph in Online Chinese Handwriting Generation"
A synthetic data generator for text recognition
[CVPR 2025] Multiple Object Tracking as ID Prediction
[IEEE TIP] TOPIC: A Parallel Association Paradigm for Multi-Object Tracking under Complex Motions and Diverse Scenes
[CVPR 2023] Unifying Short and Long-Term Tracking with Graph Hierarchies
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
[CVPR 2025] "A Distractor-Aware Memory for Visual Object Tracking with SAM2"
TechXueXi: an open-source web assistant for Xuexi Qiangguo (学习强国, "xuexiqiangguo") — an automated point-earning and study tool that supports quiz answering and Docker deployment, up to 45 points/day
CO-MOT: Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking
[CVPR2023] MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors
[ECCV2022] MOTR: End-to-End Multiple-Object Tracking with TRansformer
Heterogeneous Pre-trained Transformer (HPT) as Scalable Policy Learner.
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
[ACCV 2024 (Oral)] Official Implementation of "Moving Object Segmentation: All You Need Is SAM (and Flow)" Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman
Fast and memory-efficient exact attention
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
An open-source framework for training large multimodal models.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.