PyTorch code and models for VJEPA2 self-supervised learning from video.
Official repository for the paper "SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation."
[CVPR25] Official repository for the paper: "SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation"
Robustness in Both Domains: CLIP Needs a Robust Text Encoder
Interactive Pytorch forward pass visualization in notebooks
Turn any computer or edge device into a command center for your computer vision projects.
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
Run Kokoro TTS locally on device using Expo & ONNX Runtime
react-native-mlkit - The definitive MLKit wrapper for React Native and Expo
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Swift Package to implement a transformers-like API in Swift
On-device AI across mobile, embedded and edge for PyTorch
Declarative way to run AI models in React Native on device, powered by ExecuTorch.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
📚 Jupyter notebook tutorials for OpenVINO™
Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.
LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.
A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models"
Real-time webcam demo with SmolVLM and llama.cpp server
[CVPR 2025 Best Paper Award Candidate] VGGT: Visual Geometry Grounded Transformer
Code for training and testing the CountGD model from the paper "CountGD: Multi-Modal Open-World Counting."
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 of 60 public benchmarks.