Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, or other RLHF techniques, always keeping a data-first approach
Pivotal Token Search
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation (ICML 2025)
Train Large Language Models on MLX.
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
CodeUltraFeedback: aligning large language models to coding preferences
[NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β
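
Most of the projects above implement or build on Direct Preference Optimization (DPO), so a minimal sketch of the standard DPO loss may help orient readers. The function and tensor names below are illustrative only and are not taken from any listed repository; β-DPO, listed above, differs from this baseline by making the `beta` coefficient dynamic rather than fixed.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss (Rafailov et al., 2023).

    Each tensor holds per-sequence log-probabilities (token log-probs
    summed over the completion) for the chosen / rejected responses
    under the trained policy and a frozen reference model.
    """
    # Implicit reward of each completion: how much more the policy
    # favors it than the reference model does, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid of the chosen-vs-rejected reward margin;
    # a dynamic-beta variant would adjust `beta` during training instead.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```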