Stars
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The development and future prospects of multimodal reasoning models.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
BELLE: Be Everyone's Large Language model Engine (open-source Chinese dialogue large language model)
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
A state-of-the-art open visual language model | multimodal pretrained model
✨✨Latest Advances on Multimodal Large Language Models
A high-throughput and memory-efficient inference and serving engine for LLMs
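MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.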
Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models"
Chinese safety prompts for evaluating and improving the safety of LLMs.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.