Startup times are slow · Issue #384 · volcengine/verl · GitHub

Startup times are slow #384

Open
casper-hansen opened this issue Feb 25, 2025 · 4 comments

Comments

@casper-hansen

Starting a training run with veRL and vLLM takes a long time.

By my estimate, veRL takes 3.7 minutes to start up, excluding the time to spin up Ray on multiple nodes.

Image

@vermouth1992
Collaborator

Yeah, this is a known problem. What model size are you using?

@casper-hansen
Author

This is with Qwen 2 7B.

@casper-hansen
Author

Today, with 4 nodes and Qwen 2.5 7B, I logged 6.7 minutes until step 1. @vermouth1992, do you have any idea which part of the initialization is taking so long and whether we can optimize it?
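
One way to narrow down which part of the init is slow is to wrap each startup stage in a wall-clock timer and print the stages sorted by duration. The snippet below is only a rough sketch using the Python standard library. It does not call any veRL, vLLM, or Ray API, and the `StageTimer` helper and the stage names (`load_weights`, `init_rollout_engine`) are hypothetical placeholders rather than names taken from the veRL codebase.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations for named startup stages."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        # Time the enclosed block and accumulate the result under `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def report(self):
        # Return one line per stage, slowest first, with its share of the total.
        total = sum(self.durations.values())
        ordered = sorted(self.durations.items(), key=lambda kv: kv[1], reverse=True)
        lines = []
        for name, seconds in ordered:
            share = 100.0 * seconds / total if total > 0 else 0.0
            lines.append(f"{name}: {seconds:.2f}s ({share:.0f}%)")
        return lines


timer = StageTimer()

# Hypothetical stages: replace each sleep with the real init call to measure.
with timer.stage("load_weights"):
    time.sleep(0.01)
with timer.stage("init_rollout_engine"):
    time.sleep(0.05)

for line in timer.report():
    print(line)
```

In a multi-node run, each worker process would need its own timer, since this only measures the process it runs in.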

@dddraxxx

same
