Lists (30)
Book
C/C++
Colorization
ComfyUI
Computer Vision
Object Detection / Segmentation / Recognition, Optical Character Recognition (OCR), Vision-Language Models (VLM)
Dataset
Deep Learning
Emacs
Fonts
Games
i3wm
Image/Video Generation
Generative Adversarial Networks (GAN), Autoregressive Models, Diffusion Models (DM), Latent Diffusion Models (LDM)
Image/Video Restoration
Denoising, Super-Resolution, Colorization, Inpainting
Inpainting
Image and Video Inpainting
Language Models
Natural Language Processing (NLP), Large Language Models (LLM)
Mathematics
Media
Metrics
Multimodal Foundation Models
Obsidian
Programming Languages
Python
PyTorch
Rust
Speech and Audio
Stable Diffusion
Transformer
Utils
ViT
Web
Stars
🪐 Markdown with superpowers — from ideas to papers, presentations and books.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
MOVED TO CODEBERG - Web-based environment for live coding algorithmic patterns, incorporating a faithful port of TidalCycles to JavaScript
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training (see the pipeline sketch after this list).
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
Chat with private and local large language models
Official implementation of the CVPR 2022 paper "Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution"
A Conversational Speech Generation Model
DSPy: The framework for programming, not prompting, language models (see the sketch after this list)
A TTS model capable of generating ultra-realistic dialogue in one pass.
🔥 [ICCV 2025] InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Simple, unified interface to multiple Generative AI providers (see the sketch after this list)
YOLO-World + EfficientViT SAM
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
MAGI-1: Autoregressive Video Generation at Scale
Unofficial implementation of YOLO-World + EfficientSAM for ComfyUI
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Cut and paste your surroundings using AR
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model (see the sketch after this list).
ComfyUI Yolo World EfficientSAM custom node
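
As referenced in the 🤗 Transformers entry above, the library exposes single-call inference through its pipeline API. A minimal sketch, assuming the distilbert-base-uncased-finetuned-sst-2-english checkpoint; any text-classification model on the Hub works the same way:

```python
# Minimal Transformers pipeline inference; the checkpoint name is an assumption.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # assumed Hub checkpoint
)

# Returns a list of {'label': ..., 'score': ...} dicts, one per input string.
print(classifier("Transformers makes model inference a one-liner."))
```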
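For the DSPy entry, a minimal sketch of what "programming, not prompting" looks like; the model string openai/gpt-4o-mini and the configured API key are assumptions:

```python
# Minimal DSPy sketch: declare the task as a signature and let DSPy build the prompt.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # assumed provider/model string

# "question -> answer" is a declarative signature; ChainOfThought adds reasoning.
qa = dspy.ChainOfThought("question -> answer")
prediction = qa(question="What does SAM stand for in computer vision?")
print(prediction.answer)
```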
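For the aisuite entry ("Simple, unified interface to multiple Generative AI providers"), a minimal sketch of calling different providers behind one chat interface; the provider:model strings and the required API keys are assumptions:

```python
# Minimal aisuite sketch: one OpenAI-style chat call routed to multiple providers.
import aisuite as ai

client = ai.Client()
messages = [{"role": "user", "content": "One sentence on diffusion models."}]

for model in ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20240620"]:  # assumed models
    response = client.chat.completions.create(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)
```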
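For the Segment Anything entry, a minimal sketch of point-prompted inference with the segment_anything package; the checkpoint path and the placeholder image are assumptions:

```python
# Minimal SAM inference sketch: load a checkpoint, set an image, prompt with one point.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # assumed local checkpoint
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for an RGB image (H, W, 3)
predictor.set_image(image)

# One foreground point prompt; SAM returns candidate masks with quality scores.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)
```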