- Shenzhen, China
- https://scholar.google.com/citations?user=853-0n8AAAAJ
Stars
The code for the paper "Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning" (CVPR'25).
This repository provides a valuable reference for researchers in the field of multimodality — start your exploration of RL-based reasoning MLLMs here!
Implementing DeepSeek R1's GRPO algorithm from scratch
Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision
gimpong / CVPR25-AutoSSVH
Forked from EliSpectre/CVPR25-AutoSSVH. The code for the paper "AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing" (CVPR'25).
A comprehensive collection of process reward models.
[ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
This repository contains the PyTorch implementation of our work at CVPR 2025
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
✨First Open-Source R1-like Video-LLM [2025/02/18]
Collection of papers and repos for multimodal chain-of-thought
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Official pytorch repository for "Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval" (AAAI 2025 Paper)
FastVideo is a unified framework for accelerated video generation.
A collection of AI for Science paper walkthroughs (continuously updated); papers, datasets, and tutorials available for download at hyper.ai
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A collection of vision foundation models unifying understanding and generation.
The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).
The code for the paper "BoostAdapter: Improving Test-Time Adaptation via Regional Bootstrapping" (NeurIPS'24).
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
High-performance Image Tokenizers for VAR and AR
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
[TMLR 2025🔥] A survey for the autoregressive models in vision.
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
A suite of image and video neural tokenizers
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training