Stars
[BMVC2023] Widely Applicable Strong Baseline for Sports Ball Detection and Tracking
Code for LATTE-MV: Learning to Anticipate Table Tennis hits from Monocular Videos
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
FlashMLA: Efficient MLA decoding kernels
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
The world's first open-source "Vibe Workflow" platform for complex tasks.
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
Cross-platform, customizable ML solutions for live and streaming media.
A generative world for general-purpose robotics & embodied AI learning.
There can be more than Notion and Miro. AFFiNE (pronounced [ə‘fain]) is a next-gen knowledge base that brings planning, sorting and creating all together. Privacy first, open-source, customizable an…
[CVPR 2025] RollingDepth: Video Depth without Video Models
First base model for full-duplex conversational audio
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
Official implementation for HybridDepth Model [WACV 2025, ISMAR 2024]
Open-source and strong foundation image recognition models.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
OpenMMLab Detection Toolbox and Benchmark
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Python sample codes and textbook for robotics algorithms.
Unofficial implementation of "TTNet: Real-time temporal and spatial video analysis of table tennis" (CVPR 2020)
KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations (CVPR 2020)
LLM Transparency Tool (LLM-TT), an open-source interactive toolkit for analyzing internal workings of Transformer-based language models. *Check out demo at* https://huggingface.co/spaces/facebook/l…
OpenMMLab Pose Estimation Toolbox and Benchmark.
[NeurIPS 2023] Official Code for "SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation"