Stars
Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)
Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
PyTorch code and models for VJEPA2 self-supervised learning from video.
[ICML'25] Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
PyTorch implementation of Zero-Shot Vision Encoder Grafting via LLM Surrogates [ICCV 2025]
Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"
Processed / Cleaned Data for Paper Copilot
Tensors and Dynamic neural networks in Python with strong GPU acceleration
An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
GIF: Generative Inspiration for Face Recognition at Scale
[ECCV2020] A Large-Scale Face Anti-Spoofing Dataset
[TPAMI] Searching prompt modules for parameter-efficient transfer learning.
The simplest, fastest repository for training/finetuning small-sized VLMs.
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
Official PyTorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.
PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.
A free open source IT asset/license management system
An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowe…
⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.