Stars
[CVPR 2025 Oral] VGGT: Visual Geometry Grounded Transformer
CVPR 2025 - R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
EthoML / VAME
Forked from LINCellularNeuroscience/VAME
Variational Animal Motion Embedding - A tool for time series embedding and clustering
Official code for the manuscript "Three-dimensional surface motion capture of multiple freely moving pigs using MAMMAL"
A Python toolbox for analysing body movements across space and time
Official implementation of DeepLabCut: Markerless pose estimation of user-defined features with deep learning for all animals incl. humans
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Modified Python 3.0 implementation of MotionMapper (https://github.com/gordonberman/MotionMapper)
(Supports DeepSeek R1) An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models.
Witness the aha moment of VLM with less than $3.
Solve Visual Understanding with Reinforced VLMs
AIDE: AI-Driven Exploration in the Space of Code. State-of-the-art machine learning engineering agents that automate AI R&D.
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
[AAAI 2025] The official repository of our paper "Target Semantics Clustering via Text Representations for Robust Universal Domain Adaptation"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT)