feat: Enable vLLM cudagraphs #498
base: main
Conversation
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Force-pushed from de91c35 to e84ff82
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
@jiemingz can you also add a timing plot to the MR description showing the benefits of enabling CUDA graphs vs. not?
There is a unit test failure here caused by the missing `eager` key: @jiemingz
Addresses: !186
Generation throughput shows a ~3% speedup for Llama-8B on 4 nodes.
What does this PR do ?
Enables vLLM CUDA graphs during generation, yielding a ~3% generation throughput speedup for Llama-8B on 4 nodes.
Issues
Addresses !186
Usage
# Add a code snippet demonstrating how to use this
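A minimal sketch of how a generation config flag could map onto vLLM's `enforce_eager` engine argument (in vLLM, `enforce_eager=True` disables CUDA graph capture, so enabling CUDA graphs means leaving it off). The helper name and config shape here are illustrative assumptions, not this PR's actual interface:

```python
def build_vllm_engine_kwargs(use_cuda_graphs: bool) -> dict:
    """Hypothetical helper: translate a high-level CUDA-graphs toggle
    into vLLM engine kwargs.

    vLLM captures CUDA graphs for decode by default; passing
    enforce_eager=True forces eager-mode execution and skips capture.
    """
    return {"enforce_eager": not use_cuda_graphs}


# Enabling CUDA graphs leaves enforce_eager off:
print(build_vllm_engine_kwargs(True))   # {'enforce_eager': False}
# Disabling them forces eager mode:
print(build_vllm_engine_kwargs(False))  # {'enforce_eager': True}
```

In a real setup these kwargs would be forwarded to the vLLM engine constructor, e.g. `LLM(model=..., **build_vllm_engine_kwargs(True))`.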
Before your PR is "Ready for review"
Pre checks:
Additional Information