[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Dataset collection and preprocessing framework for NLP extreme multitask learning
Efficient LLM inference on Slurm clusters using vLLM.
Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives"
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
Learning to route instances for Human vs AI Feedback
[ACL 2024 Findings] DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
The code used in the paper "DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging"
Building an LLM with RLHF: fine-tuning on human-labeled preferences. Following "Learning to Summarize from Human Feedback", it combines supervised fine-tuning, reward modeling, and PPO to improve response quality and alignment (see the reward-model training sketch after this list).
Source code of our paper "Transferring Textual Preferences to Vision-Language Understanding through Model Merging"
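The reward-modeling step referenced above is commonly trained with a pairwise Bradley-Terry objective over preference pairs. The following is a minimal PyTorch sketch of that idea, not code from any repository listed here; the RewardModel class and its sizes are hypothetical stand-ins for a real scalar-head scorer over prompt+response tokens.

# Minimal sketch of pairwise (Bradley-Terry) reward-model training.
# RewardModel is a hypothetical toy scorer, not from the repos above.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy scorer: embeds token ids and maps the mean embedding to a scalar reward."""
    def __init__(self, vocab_size: int = 32000, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> one scalar reward per sequence
        return self.head(self.embed(token_ids).mean(dim=1)).squeeze(-1)

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Maximize P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

if __name__ == "__main__":
    model = RewardModel()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # Dummy preference pair: token ids for (prompt + chosen) and (prompt + rejected)
    chosen = torch.randint(0, 32000, (4, 64))
    rejected = torch.randint(0, 32000, (4, 64))
    loss = bradley_terry_loss(model(chosen), model(rejected))
    loss.backward()
    opt.step()
    print(f"pairwise loss: {loss.item():.4f}")

In a full RLHF pipeline this trained reward model then scores rollouts from the policy during the PPO stage.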