- Beijing University of Posts and Telecommunications
- Beijing
Stars
Open Source framework for voice and multimodal conversational AI
verl: Volcano Engine Reinforcement Learning for LLMs
[ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
👤 | Real-Time Head Pose Estimation: Accurate head pose estimation using ResNet 18/34/50 and MobileNet V2/V3 models. Evaluate yaw, pitch, and roll with pre-trained weights for quick integration.
Toolkit for linearizing PDFs for LLM datasets/training
Bolt is a deep learning library with high performance and heterogeneous flexibility.
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
Solve Visual Understanding with Reinforced VLMs
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
Fully open reproduction of DeepSeek-R1
Minimal reproduction of DeepSeek R1-Zero
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Witness the aha moment of VLM with less than $3.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
[ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations
Scalable RL solution for advanced reasoning of language models
[AAAI2025 Oral] Predicting the Original Appearance of Damaged Historical Documents
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the layout; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/MCP/Docker/Zotero