Stars
Train transformer language models with reinforcement learning.
Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning"
[Lumina Embodied AI Community] A paper list for Embodied AI / Robotics
A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
[CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
A curated list of balanced multimodal learning methods.
A chatbot/GraphRAG framework that creates multiple LLM agents from social platform user comments and lets them debate specific topics.
⭐⭐⭐FightingCV Paper Reading, which helps you understand the most advanced research work in an easier way 🍀 🍀 🍀
🏭
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
QLoRA: Efficient Finetuning of Quantized LLMs
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A distributed task scheduling framework (XXL-JOB, a distributed task scheduling platform).
Stable Diffusion web UI
A collaboration friendly studio for NeRFs
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
We developed a Python UI based on labelme and segment-anything for pixel-level annotation. It supports generating multiple masks with SAM (box/point prompts), efficient polygon modification and category…
PyTorch implementations of the C3D, R3D, and R2Plus1D models for video activity recognition.