[REQUEST] Script to reproduce the results of Qwen2.5-7B-Instruct on the MMLU-Pro, BBH, and TheoremQA datasets

I am trying to reproduce the evaluation results of Qwen2.5-7B-Instruct on the MMLU-Pro, BBH, and TheoremQA datasets, but my results differ considerably from those in the official technical report. I have carefully read the README and related documentation in the official GitHub repository, but I could not find clear instructions or scripts for reproducing these results. Could you kindly share this information?
