We introduce MePO, a lightweight and locally deployable prompt optimization model trained under a merits-guided preference framework. MePO is designed to optimize prompts effectively for downstream use in small language models.
The datasets used for training and evaluation are available on Hugging Face (a minimal loading sketch follows this list):
- MePO
- MePO_BPO — Optimized prompts based on the BPO dataset
- MePO_Alpaca — Optimized prompts based on the Alpaca dataset
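To inspect the data before training, it can be pulled with the Hugging Face `datasets` library. Note that the repository namespace below is a placeholder, since this README lists only the short dataset names; substitute the actual ID from the links above. Field names depend on the released schema.

```python
from datasets import load_dataset

# "<org>" is a placeholder -- replace it with the actual Hugging Face
# namespace hosting the MePO datasets.
ds = load_dataset("<org>/MePO_BPO", split="train")

# Print one record to see the actual column names and structure.
print(ds[0])
```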
To train your own prompt optimization model with MePO, download the dataset, place it under the folder path expected by the training script, and run:

```bash
pip install -r requirements.txt
python MePO_run_train.py
```
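`MePO_run_train.py` is the authoritative entry point. For orientation only, preference-based objectives of this kind are commonly trained with a DPO-style trainer; the sketch below is an illustrative assumption (the model name, dataset ID, column names, and the use of `trl`'s `DPOTrainer` are not confirmed by this repository) and is not a substitute for the training script.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# All identifiers below are assumptions for illustration only.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # any small causal LM
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Assumes the dataset exposes DPO-style preference columns:
# "prompt", "chosen" (merit-aligned rewrite), "rejected" (weaker rewrite).
train_ds = load_dataset("<org>/MePO_BPO", split="train")

config = DPOConfig(output_dir="mepo-dpo", per_device_train_batch_size=2)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_ds,
    processing_class=tokenizer,  # `tokenizer=` on older trl versions
)
trainer.train()
```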
📌 Recommendation:
Based on our empirical results, we recommend using MePO_BPO to train prompt optimizers for lightweight LLMs (<7B parameters), especially for chatbot-style prompt optimization tasks.
For a chatbot-style testing demonstration:

```bash
python MePO_prompt_optimization.py
```

To generate optimized prompts for downstream tasks:

```bash
python MePO_optimized_downstream_task.py
```
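For a quick sense of what these scripts do, here is a minimal inference sketch that rewrites a single raw prompt with a trained MePO checkpoint. The checkpoint ID and the instruction wrapper are assumptions for illustration; the scripts above define the actual templates used in practice.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint ID -- replace with your trained MePO model path.
model_id = "<org>/MePO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

raw_prompt = "tell me about black holes"
# The instruction wrapper below is an assumption; use the template from
# MePO_prompt_optimization.py in practice.
messages = [{"role": "user", "content": f"Optimize this prompt: {raw_prompt}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the optimized prompt).
optimized = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(optimized)
```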
If you use our code, dataset, or model, please cite our paper:
```bibtex
@misc{zhu2025rethinkingpromptoptimizersprompt,
  title         = {Rethinking Prompt Optimizers: From Prompt Merits to Optimization},
  author        = {Zixiao Zhu and Hanzhang Zhou and Zijian Feng and Tianjiao Li and Chua Jia Jim Deryl and Mak Lee Onn and Gee Wah Ng and Kezhi Mao},
  year          = {2025},
  eprint        = {2505.09930},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.09930}
}
```