Stars
[NeurIPS 2024] How do Large Language Models Handle Multilingualism?
Trying to prototype a multimodal LLM that takes text and audio as input and outputs text.
Build your own visual reasoning model
Open neural machine translation models and web services
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
Democratizing Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Multilingual Generative Pretrained Model
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Quantized attention that achieves 2-3x and 3-5x speedups over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
Paper reproduction of Google's SCoRe (Training Language Models to Self-Correct via Reinforcement Learning)
STACL simultaneous translation model with PaddlePaddle
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Llama3 and Llama3.1 Chinese post-training repository - fine-tuned and modified versions with interesting weights, plus tutorial videos and docs for training, inference, evaluation, and deployment.
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practices for training LLaMA models in Megatron-LM
MAD: The first work to explore Multi-Agent Debate with Large Language Models :D
GEMBA — GPT Estimation Metric Based Assessment
The official Python library for the OpenAI API
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)