Stars
"A White-Box Guide to Building Large Models": a Tiny-Universe built entirely by hand
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Memory for AI Agents; Announcing OpenMemory MCP - local and secure memory management.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
Diffusion Model-Based Image Editing: A Survey (TPAMI 2025)
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
SSA + FastSAM / Semantic Fast Segment Anything, or Fast Semantic Segment Anything
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Reference implementation for DPO (Direct Preference Optimization)
✨✨Latest Advances on Multimodal Large Language Models
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Luotuo (骆驼, "Camel"): Open-Source Chinese Language Models. Developed by Qiyuan Chen @ Central China Normal University, Lulu Li @ SenseTime, and Ziang Leng @ SenseTime.
This is an AI agent for Street Fighter II Champion Edition.
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
A collection of libraries to optimize AI model performance
Transfer learning / domain adaptation / domain generalization / multi-task learning, etc.: papers, code, datasets, applications, and tutorials.
Awesome Incremental Learning
A paper list on simultaneous (streaming) translation, covering both text-to-text machine translation and speech-to-text translation.
Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)
Solving algorithm problems is all about patterns; labuladong is all you need! English version supported! Crack LeetCode: learn not only how, but also why.
The code repository for "Cross-Modal and Hierarchical Modeling of Video and Text" in PyTorch
Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Implementation for MAF: Multimodal Alignment Framework
Improving One-stage Visual Grounding by Recursive Sub-query Construction, ECCV 2020
A curated list of deep learning resources for video-text retrieval.