Description
I am trying to reproduce the evaluation results of Qwen2.5-7B-Instruct on the MMLU-Pro, BBH, and TheoremQA datasets, but my results differ considerably from those reported in the official technical report. I have carefully read the README and related documents in the official GitHub repository, but I could not find clear instructions on the steps or scripts needed to reproduce these results. Could you kindly share this information?