docs: Add missing arguments to DeepScaler evaluation #502

butsugiri · 2025-06-11T07:13:20Z

What does this PR do?

This PR fixes the documentation for the DeepScaler experiments.

The documented evaluation setup currently omits necessary arguments, which leads to poor evaluation results:

============================================================
model_name='step_300-hf' dataset_name='aime_2024'
max_new_tokens=2048 temperature=0.0 top_p=1.0 top_k=-1

metric='pass@1' num_tests_per_prompt=1

score=0.0333 (1.0/30)
============================================================

Specifying cot.txt (as the training setup does) improves the result a bit:

============================================================
model_name='step_300-hf' dataset_name='aime_2024'
max_new_tokens=2048 temperature=0.0 top_p=1.0 top_k=-1

metric='pass@1' num_tests_per_prompt=1

score=0.1333 (4.0/30)
============================================================

Raising max_new_tokens from 2048 to 8192, so that generation is no longer cut off at 2048 tokens, improves the result further (this PR):

============================================================
model_name='step_300-hf' dataset_name='aime_2024'
max_new_tokens=8192 temperature=0.0 top_p=1.0 top_k=-1

metric='pass@1' num_tests_per_prompt=1

score=0.3667 (11.0/30)
============================================================
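For readers comparing the three runs, here is a minimal Python sketch of how the reported scores follow from the counts in the logs. It assumes that with num_tests_per_prompt=1 the pass@1 score is simply the number of correct answers divided by the number of prompts; the function name `pass_at_1` and the run labels are hypothetical and are not taken from the evaluation code.

```python
def pass_at_1(num_correct: float, num_prompts: int) -> float:
    """Fraction of prompts answered correctly with one sample per prompt."""
    if num_prompts <= 0:
        raise ValueError("num_prompts must be positive")
    return num_correct / num_prompts


# (correct, total) pairs copied from the three log excerpts above.
# The labels are illustrative descriptions, not names used by the tool.
runs = {
    "run 1 (no cot.txt, max_new_tokens=2048)": (1.0, 30),
    "run 2 (cot.txt, max_new_tokens=2048)": (4.0, 30),
    "run 3 (this PR, max_new_tokens=8192)": (11.0, 30),
}

for label, (correct, total) in runs.items():
    score = pass_at_1(correct, total)
    print(f"{label}: score={score:.4f} ({correct}/{total})")
```

Under this assumption the three runs correspond to 1, 4, and 11 solved problems out of the 30 in aime_2024, matching the scores 0.0333, 0.1333, and 0.3667 shown in the logs.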

Issues

n/a

Usage

n/a

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests? --> n/a
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests --> n/a
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs. --> n/a

Signed-off-by: Shun Kiyono <shun.kiyono@sbintuitions.co.jp>

SahilJain314 · 2025-06-26T22:19:43Z

Thanks for the PR! Slipped past us for a bit.

abukharin-nv

LGTM! I would also suggest increasing max_len to 32K, but that is kind of a subjective choice.

add missing arguments

cf2c168

Signed-off-by: Shun Kiyono <shun.kiyono@sbintuitions.co.jp>

github-actions bot added the documentation Improvements or additions to documentation label Jun 11, 2025

butsugiri changed the title ~~Add missing arguments to DeepScaler evaluation~~ docs: Add missing arguments to DeepScaler evaluation Jun 11, 2025

parthchadha requested a review from abukharin-nv June 26, 2025 22:19

abukharin-nv approved these changes Jun 27, 2025
