Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, or other RLHF techniques, always keeping a data-first approach
Pivotal Token Search
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation (ICML 2025)
Train Large Language Models on MLX.
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
CodeUltraFeedback: aligning large language models to coding preferences
[NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β
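
Most of the projects above implement or build on Direct Preference Optimization (DPO), so a minimal sketch of the standard DPO loss may help orient readers. The function and tensor names below are illustrative only and are not taken from any listed repository; β-DPO, listed above, differs from this baseline by making the `beta` coefficient dynamic rather than fixed.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss (Rafailov et al., 2023).

    Each tensor holds per-sequence log-probabilities (token log-probs
    summed over the completion) for the chosen / rejected responses
    under the trained policy and a frozen reference model.
    """
    # Implicit reward of each completion: how much more the policy
    # favors it than the reference model does, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid of the chosen-vs-rejected reward margin;
    # a dynamic-beta variant would adjust `beta` during training instead.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```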