Starred repositories
Official repository for "Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment"
🔥 [CVPR 2020] STEFANN: Scene Text Editor using Font Adaptive Neural Network (official code).
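Assorted educational materials from kindergarten through primary and secondary school, covering Xueersi, Wanwei, Yuanfudao and other institutions; more are added continuously.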
DreamO: A Unified Framework for Image Customization
ACE-Step: A Step Towards Music Generation Foundation Model
A TTS model capable of generating ultra-realistic dialogue in one pass.
RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"
Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields
Deezer source separation library including pretrained models.
SkyReels V1: The first and most advanced open-source human-centric video foundation model
Enable AI models for video production in the browser
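Regularly shares high-quality, interesting and practical open-source tutorials, developer tools, programming websites and tech news from GitHub. A list of cool, interesting projects on GitHub.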
Arbitrary-steps Image Super-resolution via Diffusion Inversion (CVPR 2025)
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Official implementation of OneDiffusion paper (CVPR 2025)
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
StoryMaker: Towards consistent characters in text-to-image generation
[AAAI 2025]👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high …
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction…