feat: use cuda_graph by default for vllm #116

parthchadha · 2025-04-01T21:51:45Z

What does this PR do?

Enables CUDA graphs by default for vLLM generation.
For llama-8b, I am seeing 10-15% faster generation compared to eager mode.

Issues

Closes #115.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this
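A minimal sketch of what the toggle could look like, assuming the change maps onto vLLM's `enforce_eager` argument (vLLM captures CUDA graphs unless `enforce_eager=True`). `GenerationConfig`, `use_cuda_graph`, `build_llm_kwargs`, and `create_llm` are hypothetical names for illustration and are not taken from `nemo_reinforcer/models/generation/vllm.py`.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class GenerationConfig:
    """Hypothetical config holder; not the actual nemo_reinforcer schema."""

    model_name: str
    # Assumed flag name; defaults to True to mirror the intent of this PR.
    use_cuda_graph: bool = True
    gpu_memory_utilization: float = 0.9


def build_llm_kwargs(cfg: GenerationConfig) -> Dict[str, Any]:
    """Translate the hypothetical config into keyword arguments for vllm.LLM."""
    return {
        "model": cfg.model_name,
        # enforce_eager=True disables CUDA graph capture and runs eager mode,
        # so the flag is inverted here.
        "enforce_eager": not cfg.use_cuda_graph,
        "gpu_memory_utilization": cfg.gpu_memory_utilization,
    }


def create_llm(cfg: GenerationConfig):
    """Construct the engine; vLLM is imported lazily so the helper above
    can be exercised on machines without vLLM or a GPU."""
    from vllm import LLM

    return LLM(**build_llm_kwargs(cfg))


# Example (requires vLLM and a GPU; the model name is a placeholder):
# llm = create_llm(GenerationConfig(model_name="<your-model-name>"))
# To fall back to eager mode:
# llm = create_llm(GenerationConfig(model_name="<your-model-name>", use_cuda_graph=False))
```

Keeping the flag inversion in one small helper makes it easy to switch back to eager mode when debugging or when graph capture uses too much GPU memory.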

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

Signed-off-by: Parth Chadha <pchadha@nvidia.com>

nemo_reinforcer/models/generation/vllm.py

Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

feat: use cuda_graph by default for vllm

54a3b4a

Signed-off-by: Parth Chadha <pchadha@nvidia.com>

parthchadha requested review from SahilJain314 and terrykong April 1, 2025 21:51

parthchadha added the "Run CICD" label (set to run CI; unset + set to rerun) Apr 1, 2025

terrykong reviewed Apr 1, 2025

nemo_reinforcer/models/generation/vllm.py (resolved)

Merge branch 'main' into pchadha/vllm-cuda-graph

f78e4f8

parthchadha added and removed the "Run CICD" label Apr 1, 2025

terrykong approved these changes Apr 1, 2025

parthchadha enabled auto-merge (squash) April 1, 2025 22:56

parthchadha merged commit d9277a8 into main Apr 1, 2025
11 checks passed

parthchadha deleted the pchadha/vllm-cuda-graph branch April 1, 2025 22:58

yfw pushed a commit that referenced this pull request Apr 2, 2025

feat: use cuda_graph by default for vllm (#116)

97c5e1b

Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

KiddoZhu pushed a commit that referenced this pull request May 6, 2025

feat: use cuda_graph by default for vllm (#116)

64b39f3

Signed-off-by: Parth Chadha <pchadha@nvidia.com>
