Welcome to the official repository for MT-R1-Zero, the first open-source adaptation of the R1-Zero Reinforcement Learning (RL) paradigm for Machine Translation (MT). MT-R1-Zero achieves highly competitive translation quality without supervised fine-tuning or cold-start data, using a Rule-Metric Mixed Reward mechanism that guides LLMs with feedback from metrics such as BLEU and COMETKiwi. Our 7B-parameter models match or exceed advanced models on the WMT'24 EN-ZH benchmarks. We observed many interesting findings during training, which we invite you to explore in our paper. This work highlights the potential of pure, metric-guided RL for advancing Natural Language Generation tasks.
The training dynamics are fascinating! We strongly encourage you to try our code firsthand.
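As a rough illustration of the Rule-Metric Mixed Reward idea, the sketch below gates a metric score behind a format rule: outputs that violate the expected format receive a fixed penalty, and only well-formed outputs are scored by an MT metric. The tag names and penalty value here are illustrative assumptions, not the exact format used in our training scripts.

```python
import re

# Illustrative format rule: reasoning inside <think> tags, final translation
# inside <translate> tags. These tag names are hypothetical placeholders.
FORMAT = re.compile(r"<think>.*?</think>\s*<translate>(.*?)</translate>", re.DOTALL)

def mixed_reward(output: str, metric_fn) -> float:
    """Rule reward gates the metric reward: malformed outputs are penalized,
    well-formed ones are scored by a metric such as BLEU or COMETKiwi."""
    match = FORMAT.fullmatch(output.strip())
    if match is None:
        return -1.0  # rule reward: format violation (penalty value is an assumption)
    return metric_fn(match.group(1).strip())  # metric reward on the extracted translation
```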
- [2025/04/15] Our paper is released on arXiv: [MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning](https://arxiv.org/abs/2504.10160).
- [2025/04/14] We release the code and data of MT-R1-Zero.
Installation:

```bash
conda create -n mtzero python=3.10
conda activate mtzero
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip install vllm==0.6.3 ray
pip install flash-attn --no-build-isolation
pip install -e .
pip install wandb IPython matplotlib sacrebleu "sacrebleu[ja]" unbabel-comet
```
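After installing, a quick sanity check (a minimal sketch, nothing repository-specific) confirms that the key dependencies resolve:

```python
# Verify that the core dependencies installed above are importable
# and print their versions.
from importlib.metadata import version

for pkg in ["torch", "vllm", "ray", "sacrebleu", "unbabel-comet"]:
    print(f"{pkg}: {version(pkg)}")

import torch
print("CUDA available:", torch.cuda.is_available())
```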
Our training process uses parallel corpora with the following structure:
```json
{
  "data_source": "train",
  "lg": "en-zh",
  "en": "I love machine translation.",
  "zh": "我爱机器翻译。"
}
```
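A minimal sketch for loading and validating records in this format (the path is one of the training files shipped in `data/`):

```python
import json

# Read one JSONL file of parallel data and check that every record carries
# the language pair plus both sides of the sentence pair.
path = "data/train/json/train_enzh_6565.jsonl"
with open(path, encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        src_lang, tgt_lang = record["lg"].split("-")
        assert src_lang in record and tgt_lang in record, record
```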
You can either use the data provided in the `data/` directory or process your own data with:

```bash
python3 data/process_data.py \
    --train_files "data/train/json/train_zhen_6565.jsonl" "data/train/json/train_enzh_6565.jsonl" \
    --test_files "data/test/json/wmt23_zhen.jsonl" "data/test/json/wmt24_enzh.jsonl" \
    --template_type "base" \
    --train_output_file ${train_file_path} \
    --test_output_file ${test_file_path}
```

For more details, please refer to `data/process_data.py`.
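The script writes parquet files that the training run consumes. To eyeball the result (a sketch assuming pandas and pyarrow are available; the path is whatever you passed as `--train_output_file`):

```python
import pandas as pd

# Inspect the processed training data (hypothetical output path).
df = pd.read_parquet("data/train.parquet")
print(df.columns.tolist())
print(df.head(2))
```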
To start training:

```bash
conda activate mtzero
bash main_grpo.sh
```
Parameters:

- `model_path`: Path to your base model
- `train_file_path`: Path to processed training data in parquet format
- `test_file_path`: Path to processed test data in parquet format
- `comet_model_path`: Path to the reference-based metric checkpoint (e.g., XCOMET, COMET-22)
- `comet_free_model_path`: Path to the reference-free metric checkpoint (e.g., COMETKiwi, XCOMET)
- `train_batch_size`: Training batch size (default: 8)
- `rollout_num`: Number of generated samples for each input during training (default: 8)
- `comet_rm`: Whether to use reference-based COMET as the reward (True/False)
- `comet_free_rm`: Whether to use reference-free COMETKiwi as the reward (True/False)
- `reward_metric`: Type of metric reward to use ('Model', 'BLEU', or 'Merge'):
  - 'Model': when using COMET-based metrics (`comet_rm` or `comet_free_rm` is True)
  - 'BLEU': when using BLEU as the metric reward
  - 'Merge': when using a combination of BLEU and COMETKiwi as metric rewards (see the sketch after this list)

If you want to support larger models, you can increase `tensor_model_parallel_size`. We have successfully run our code on 4xH800 (7B) and 4xA40 (3B) GPUs.
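As a rough illustration of the 'Merge' setting, the sketch below averages a sentence-level BLEU score (scaled to [0, 1]) with a reference-free COMETKiwi score. The equal weighting and the checkpoint path are assumptions for illustration, not the repository's exact implementation:

```python
import sacrebleu
from comet import load_from_checkpoint

# Hypothetical checkpoint path; point this at the same checkpoint you pass
# as comet_free_model_path.
kiwi = load_from_checkpoint("checkpoints/wmt22-cometkiwi-da/model.ckpt")

def merge_reward(src: str, hyp: str, ref: str) -> float:
    """Average BLEU and COMETKiwi; equal weights are an assumption."""
    # "zh" tokenizer suits EN-ZH; pick the tokenizer matching your target language.
    bleu = sacrebleu.sentence_bleu(hyp, [ref], tokenize="zh").score / 100.0
    kiwi_score = kiwi.predict([{"src": src, "mt": hyp}], batch_size=1, gpus=0).scores[0]
    return 0.5 * bleu + 0.5 * kiwi_score
```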
For inference and evaluation, first replace `BASE_MODEL_NAME`, `BASE_PATH`, `BASE_SAVE_DIR`, `comet_model_path`, and `comet_free_model_path` in `main_inference_eval.sh` with your paths, then run:

```bash
conda activate mtzero
bash main_inference_eval.sh
```
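To double-check corpus-level BLEU on the saved outputs offline, sacrebleu can be used directly (a sketch; the one-sentence-per-line file layout is an assumption):

```python
import sacrebleu

# Hypothetical files: one hypothesis / reference per line, aligned by index.
with open("outputs/wmt24_enzh.hyp", encoding="utf-8") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("outputs/wmt24_enzh.ref", encoding="utf-8") as f:
    refs = [line.rstrip("\n") for line in f]

# Use the Chinese tokenizer for EN-ZH; adjust for other language pairs.
print(sacrebleu.corpus_bleu(hyps, [refs], tokenize="zh").score)
```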
Citation:

```bibtex
@misc{feng2025mtr1zero,
  title={MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning},
  author={Zhaopeng Feng and Shaosheng Cao and Jiahan Ren and Jiayuan Su and Ruizhe Chen and Yan Zhang and Zhe Xu and Yao Hu and Jian Wu and Zuozhu Liu},
  year={2025},
  eprint={2504.10160},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.10160},
}
```