Stars
Open-source and strong foundation image recognition models.
Rankings include: Align3R BetterDepth ChronoDepth CUT3R Deep3D Depth Any Video Depth Anything Depth Pro DepthCrafter Geo4D GRIN L4P MASt3R Metric3D Metric-Solver MoGe MonST3R NVDS RollingDepth Ster…
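Python3 package for Chinese/English OCR, with paddleocr-v4 onnx model (~14MB). Inference runs on the ppocr-v4-onnx model, giving accurate OCR predictions on CPU in milliseconds; Chinese/English OCR in general scenarios reaches open-source SOTA.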
Local Deployment of OmniParser v2.0 with pyautogui for True Automated Clicking!
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
A simple screen parsing tool towards pure vision based GUI agent
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
Texas Poker Multi-Agent Game
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, Comfy…
PE3R: Perception-Efficient 3D Reconstruction. Take 2 - 3 photos with your phone, upload them, wait a few minutes, and then start exploring your 3D world via text!
Official implementations for paper: VACE: All-in-One Video Creation and Editing
Open-Sora: Democratizing Efficient Video Production for All
The fastest digital human algorithm, now on your desktop.
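The simplest reproduction of R1 results on small models, explaining the most important essence of O1-like models and DeepSeek R1. Think is all you need. Experiments support the claim that, for strong reasoning ability, the content of the thinking process is the core of AGI/ASI.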
Real time interactive streaming digital human
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
The official code for “Recurrent Generic Contour-based Instance Segmentation with Progressive Learning”, TCSVT, 2024.
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
Introductory tutorial on recommender systems. Read online: https://datawhalechina.github.io/fun-rec/
A collaboration friendly studio for NeRFs
hulk006 / mmsegmentation
Forked from open-mmlab/mmsegmentation
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, 讯飞星火, 文心一言 and more, discover the best answers
Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
A PyTorch implementation of CASENet for the Cityscapes Dataset
Semantic Segmentation on PyTorch (includes FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)