8000 GitHub - fzp0424/MT-R1-Zero: Code for paper "MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning"
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Code for paper "MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning"

License

Notifications You must be signed in to change notification settings

fzp0424/MT-R1-Zero

Repository files navigation

MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning

 

Overview 📚

Welcome to the official repository for MT-R1-Zero, the first open-source adaptation of the R1-Zero Reinforcement Learning (RL) paradigm for Machine Translation (MT). MT-R1-Zero achieves highly competitive translation quality without supervised fine-tuning or cold-start data by using the Rule-Metric Mixed Reward mechanism that guides LLMs via feedback from metrics like BLEU and COMETKiwi. Our 7B parameter models demonstrate performance on par with or exceeding advanced models on WMT'24 EN-ZH benchmarks. We observed many interesting findings during the training process, which we invite you to explore in our paper. This work highlights the potential of pure, metric-guided RL for advancing Natural Language Generation tasks.

The training dynamics are fascinating! We strongly encourage you to try our code firsthand.

zhen_training.png

intro.png

News 📢

Environment Setup 🔧

conda create -n mtzero python=3.10
conda activate mtzero
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e . 
pip install wandb IPython matplotlib sacrebleu sacrebleu[ja] unbabel-comet

Data Preparation 📦

Our training process uses parallel corpora with the following structure:

{
  "data_source": "train",
  "lg": "en-zh", 
  "en": "I love machine translation.", 
  "zh": "我爱机器翻译。"
}

You can either use the data provided in the /data directory or process your own data with:

python3 data/process_data.py \
    --train_files "data/train/json/train_zhen_6565.jsonl" "data/train/json/train_enzh_6565.jsonl" \
    --test_files "data/test/json/wmt23_zhen.jsonl" "data/test/json/wmt24_enzh.jsonl" \
    --template_type "base" \
    --train_output_file ${train_file_path} \
    --test_output_file ${test_file_path}

For more details, please refer to data/process_data.py.

GRPO Training 🎬️

conda activate mtzero
bash main_grpo.sh

Parameters:

  • model_path: Path to your base model

  • train_file_path: Path to processed training data in parquet format

  • test_file_path: Path to processed test data in parquet format

  • comet_model_path: Path to the reference-based metric checkpoint(e.g., XCOMET, COMET-22)

  • comet_free_model_path: Path to the reference-free metric checkpoint (e.g., COMETKiwi, XCOMET)

  • train_batch_size: Training batch size (default: 8)

  • rollout_num: Number of generated samples for each input during training (default: 8)

  • comet_rm: Whether to use reference-based COMET as reward (True/False)

  • comet_free_rm: Whether to use reference-free COMETKiwi as reward (True/False)

  • reward_metric: Type of metric reward to use ('Model', 'BLEU', or 'Merge'):

    • 'Model': When using COMET-based metrics (comet_rm or comet_free_rm is True)
    • 'BLEU': When using BLEU as the metric reward
    • 'Merge': When using a combination of BLEU and COMETKiwi as metric rewards

    If you want to support larger models, you can increase tensor_model_parallel_size.

    We have successfully run our code on 4xH800 (7B) and 4xA40 (3B) GPUs.

Evaluation 🎰

First, replace BASE_MODEL_NAMEBASE_PATHBASE_SAVE_DIRcomet_model_path, and comet_free_model_path in main_inference_eval.sh with your paths.

conda activate mtzero
bash main_inference_eval.sh

Citation 📝

@misc{feng2025mtr1zero,
      title={MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning}, 
      author={Zhaopeng Feng and Shaosheng Cao and Jiahan Ren and Jiayuan Su and Ruizhe Chen and Yan Zhang and Zhe Xu and Yao Hu and Jian Wu and Zuozhu Liu},
      year={2025},
      eprint={2504.10160},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.10160}, 
}

About

Code for paper "MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
0