feat: Enable vLLM cudagraphs by jiemingz · Pull Request #498 · NVIDIA-NeMo/RL · GitHub

feat: Enable vLLM cudagraphs #498


Open · jiemingz wants to merge 9 commits into main

Conversation

@jiemingz (Contributor) commented Jun 10, 2025

Addresses: !186

[3 images]

Generation throughput shows a speedup of about 3% for llama8b on 4 nodes.

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
@jiemingz changed the title from "Draft: Enable vLLM cudagraphs" to "Enable vLLM cudagraphs" Jun 11, 2025
@jiemingz jiemingz self-assigned this Jun 11, 2025
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
@jiemingz changed the title from "Enable vLLM cudagraphs" to "feat: Enable vLLM cudagraphs" Jun 13, 2025
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
jiemingz added 2 commits June 17, 2025 12:52
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
@parthchadha (Contributor) commented

@jiemingz can you also add a timing plot to the MR description showing the benefit of enabling CUDA graphs versus not?

Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
parthchadha previously approved these changes Jun 17, 2025
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
@parthchadha parthchadha added this pull request to the merge queue Jun 26, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 27, 2025
@SahilJain314 (Contributor) commented

Unit test failure here with the eager key missing: @jiemingz
E File "/opt/nemo-rl/nemo_rl/models/generation/vllm.py", line 336, in __init__
E enforce_eager=self.cfg["vllm_cfg"]["enforce_eager"],
E ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
E KeyError: 'enforce_eager'
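
One way to avoid this kind of failure for configs that predate the new key is to read it with a fallback instead of indexing directly. The sketch below is illustrative only and is not taken from the PR: `resolve_enforce_eager` is a hypothetical helper, the `False` default (CUDA graphs enabled when the key is absent) is an assumption, and `tensor_parallel_size` is just a stand-in for whatever other keys a config carries. In vLLM, `enforce_eager=True` disables CUDA graph capture.

```python
from typing import Any, Mapping


def resolve_enforce_eager(cfg: Mapping[str, Any], default: bool = False) -> bool:
    """Hypothetical helper: read cfg["vllm_cfg"]["enforce_eager"] with a fallback.

    Returns `default` when the key (or the whole vllm_cfg section) is missing,
    instead of raising KeyError. The default of False is an assumption.
    """
    vllm_cfg = cfg.get("vllm_cfg") or {}
    return bool(vllm_cfg.get("enforce_eager", default))


# A config written before the key existed (other keys are illustrative).
old_style_cfg = {"vllm_cfg": {"tensor_parallel_size": 1}}
# A config that explicitly opts out of CUDA graphs.
new_style_cfg = {"vllm_cfg": {"enforce_eager": True}}

print(resolve_enforce_eager(old_style_cfg))
print(resolve_enforce_eager(new_style_cfg))
```

The alternative is to keep the strict lookup and add `enforce_eager` to every config and test fixture, which makes the setting explicit everywhere at the cost of touching more files.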
