Starred repositories
A lightweight LMM-based Document Parsing Model
🪐 ✨ Model Context Protocol (MCP) Server for Jupyter.
Official implementation of SAM-Med2D
A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Doctor is a tool for discovering, crawling, and indexing websites to be exposed as an MCP server for LLM agents.
Python version of the Playwright testing and automation library.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
The simplest, fastest repository for training/finetuning small-sized VLMs.
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
A Python package to build AI-powered real-time audio applications
Python package for Real-time, Local Speech-to-Text and Speaker Diarization. FastAPI Server & Web Interface
caviri / BetterWhisperX
Forked from m-bain/whisperX
Better WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
Projects motion of pixels to a voxel
Formatron empowers everyone to control the format of language models' output with minimal overhead.
FULL v0, Cursor, Manus, Same.dev, Lovable, Devin, Replit Agent, Windsurf Agent, VSCode Agent, Dia Browser & Trae AI (And other Open Sourced) System Prompts, Tools & AI Models.
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning.
SGLang is a fast serving framework for large language models and vision language models.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
This repository analyzes satellite imagery to track the impact of the war on Gaza, using Sentinel Hub and Planet.com APIs for image retrieval and visualization.
Pruned CoTracker architecture for tracking the myocardium in 2D echo images.