(serve.llm) Remove test leakage from placement bundle logic #53723
Merged: kouroshHakha merged 11 commits into ray-project:master from nrghosh:fix-test_code_leakage on Jun 27, 2025
Conversation
Force-pushed from 8ee9b19 to b20ea10
- Replace hacky fix that bypassed placement groups when mock_resource was present
- Remove coupling between test infra and prod code
- Remove mock_resource checks; add a test fixture that configures placement_bundles as [] when testing
- Tests now run without requiring GPUs by using a test-layer patch instead of modifying production logic
- Addresses the TODO in server_models.py

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Force-pushed from b20ea10 to 78a9359
- Root cause: the patch in conftest.py only worked within the pytest process, but the failing test (test_build_openai_app_with_config) spawned a subprocess using serve run, which didn't inherit the monkey patch
- Monkey patch for in-process tests: continues to work for regular pytest tests
- Environment variable for subprocesses: RAYLLM_TEST_DISABLE_PLACEMENT_GROUPS=1 is set by the fixture and checked by the production code

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
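A minimal sketch of the mechanism this commit describes, assuming a pytest `monkeypatch`-based fixture; the fixture name `disable_placement_groups_env` and the helper `placement_groups_disabled` are hypothetical, while `RAYLLM_TEST_DISABLE_PLACEMENT_GROUPS` is the variable named in the commit (this approach was removed again in a later commit).

```python
import os

import pytest


@pytest.fixture
def disable_placement_groups_env(monkeypatch):
    # Set the flag in the pytest process; subprocesses started afterwards by
    # `serve run` inherit the environment, so they see it as well.
    monkeypatch.setenv("RAYLLM_TEST_DISABLE_PLACEMENT_GROUPS", "1")
    yield


def placement_groups_disabled() -> bool:
    # Hypothetical production-side check described in the commit message.
    return os.environ.get("RAYLLM_TEST_DISABLE_PLACEMENT_GROUPS") == "1"
```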
eicherseiji reviewed on Jun 11, 2025
python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_models.py (outdated review thread, resolved)
- Stop using an env variable to control placement groups
- Global monkey patching broke tests by overriding behavior for all of them
- Tests ensure placement bundle logic works
- Eliminate leakage while still maintaining test coverage
- Fix tests by relaxing some assertions on the serve_options dict

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
kouroshHakha approved these changes on Jun 19, 2025
LGTM. waiting for the tests.
…resource needs. Apply placement bundle override to overlooked tests; fix type errors.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
The refactoring to explicit placement group mocking missed some fixtures that use mock engines but still tried to create real placement groups.

Changes:
- Add a disable_placement_bundles dependency to the testing_model fixtures in conftest.py
- Set accelerator_type: None in test_application_builders.py subprocess configs
- Remove the environment variable approach to avoid test code leakage

This fixes hanging tests in both the CPU and GPU test suites that use mock engines (MockVLLMEngine) but were attempting to create real GPU placement groups. Mock vs. real engine tests need different placement group handling, not CPU vs. GPU tests.

Fixes:
- test_build_openai_app_with_config (CPU subprocess test)
- test_lora_unavailable_base_model (CPU mock test)
- test_models (GPU integration test with mock engine)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
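A hedged sketch of the fixture dependency this commit describes; the body of `testing_model` is elided and its return value is a placeholder, only the dependency on `disable_placement_bundles` is the point.

```python
import pytest


@pytest.fixture
def testing_model(disable_placement_bundles):
    # Depending on disable_placement_bundles guarantees that mock-engine tests
    # (MockVLLMEngine) never attempt to create real GPU placement groups.
    yield "mock-model"  # placeholder for the fixture's real return value
```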
…t groups

- Fix broad initial approach that was breaking some tests
- More targeted approach (explicitly declare test resources in config)
- Revert core placement_bundles logic to preserve production behavior
- Add resources_per_bundle: {"CPU": 1} to the test fixture for CPU-only testing
- Fixes test_build_openai_app_with_config deployment timeouts without breaking existing tests

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Force-pushed from b032aef to 2f67dc1
eicherseiji reviewed on Jun 26, 2025
```python
@pytest.fixture
def disable_placement_bundles():
```
Maybe in a future change: to avoid adding `disable_placement_bundles` to each test, could we instead
- create a new `conftest.py` under `cpu`,
- move this fixture declaration there,
- and use `@pytest.fixture(autouse=True)`?
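A minimal sketch of this suggestion, assuming the fixture body introduced in the PR (patching `VLLMEngineConfig.placement_bundles` to `[]`); the import path follows the file reviewed above.

```python
# cpu/conftest.py (sketch)
from unittest.mock import PropertyMock, patch

import pytest

from ray.llm._internal.serve.deployments.llm.vllm.vllm_models import VLLMEngineConfig


@pytest.fixture(autouse=True)
def disable_placement_bundles():
    # autouse=True applies the patch to every test collected under cpu/,
    # so individual tests no longer need to request the fixture explicitly.
    with patch.object(
        VLLMEngineConfig, "placement_bundles", new_callable=PropertyMock
    ) as mock_bundles:
        mock_bundles.return_value = []
        yield
```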
eicherseiji approved these changes on Jun 26, 2025
kouroshHakha approved these changes on Jun 27, 2025
minerharry pushed a commit to minerharry/ray that referenced this pull request on Jun 27, 2025:
…ect#53723)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Fixing Test Code Leakage in LLM Placement Bundle Logic
Problem
Test Code Leakage: Production code had test-specific logic that didn't follow clean separation of concerns:
- `accelerator_type: A10G` in YAML configs causes `placement_bundles` to return `[{"GPU": 1, "accelerator_type:A10G": 0.001}]`
- A `mock_resource: 0` check was embedded in production code to bypass this and set `bundles = []`
Solution
Clean Test/Production Separation: implemented proper test fixtures that isolate test behavior without modifying production code.

Test Fixture Approach (`conftest.py`):
- Added a `disable_placement_bundles` fixture using `unittest.mock.patch`
- Patches the `VLLMEngineConfig.placement_bundles` property to return `[]` during tests

Subprocess Test Handling:
- Tests that spawn subprocesses (`serve run`) declare `resources_per_bundle: {"CPU": 1}` for CPU-only testing scenarios

Production Code Cleanup:
- Removed the `mock_resource` checks from `server_models.py`
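A hedged sketch of the fixture approach described above, assuming a `PropertyMock`-based patch (this section only names `unittest.mock.patch` and the property); the test name and body are illustrative.

```python
# conftest.py (sketch)
from unittest.mock import PropertyMock, patch

import pytest

from ray.llm._internal.serve.deployments.llm.vllm.vllm_models import VLLMEngineConfig


@pytest.fixture
def disable_placement_bundles():
    # Patch the placement_bundles property so no GPU bundles are requested
    # while a test runs; production code is left untouched.
    with patch.object(
        VLLMEngineConfig, "placement_bundles", new_callable=PropertyMock
    ) as mock_bundles:
        mock_bundles.return_value = []
        yield


def test_mock_engine_runs_on_cpu(disable_placement_bundles):
    # Tests opt in by requesting the fixture; any VLLMEngineConfig created here
    # reports no bundles, so a mock-engine deployment fits on a CPU-only node.
    ...
```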
Test Coverage
Fixed Test Categories:
- Subprocess tests launched via `serve run` (proper resource declarations)

Key Tests Fixed:
- `test_build_openai_app_with_config` (CPU subprocess test)
- `test_lora_unavailable_base_model` (CPU mock test)
- `test_models` (GPU integration test with mock engine)

Architecture
Before (Test Leakage): production `placement_bundles` logic carried a test-only `mock_resource` bypass.
After (Clean Separation): production logic is unchanged; tests patch `VLLMEngineConfig.placement_bundles` through the `disable_placement_bundles` fixture.
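An illustrative before/after sketch of the separation described here; the classes are simplified stand-ins for the real config objects, and only `placement_bundles`, `mock_resource`, and the A10G bundle shape come from this PR.

```python
from typing import Dict, List, Optional


class BeforeEngineConfig:
    """Before: a test-only mock_resource bypass lives inside production logic."""

    def __init__(self, accelerator_type: Optional[str] = None,
                 mock_resource: Optional[int] = None):
        self.accelerator_type = accelerator_type
        self.mock_resource = mock_resource

    @property
    def placement_bundles(self) -> List[Dict[str, float]]:
        if self.mock_resource == 0:
            return []  # test escape hatch embedded in production code
        bundle: Dict[str, float] = {"GPU": 1}
        if self.accelerator_type:
            # accelerator_type: A10G -> {"GPU": 1, "accelerator_type:A10G": 0.001}
            bundle[f"accelerator_type:{self.accelerator_type}"] = 0.001
        return [bundle]


class AfterEngineConfig:
    """After: no test-only branches; tests patch placement_bundles instead."""

    def __init__(self, accelerator_type: Optional[str] = None):
        self.accelerator_type = accelerator_type

    @property
    def placement_bundles(self) -> List[Dict[str, float]]:
        bundle: Dict[str, float] = {"GPU": 1}
        if self.accelerator_type:
            bundle[f"accelerator_type:{self.accelerator_type}"] = 0.001
        return [bundle]
```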
Approach Evolution
Earlier broad monkey-patching broke existing tests, which led to a more refined approach:
- Targeted resource declarations (`resources_per_bundle: {"CPU": 1}`) for CPU-only scenarios
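A hedged sketch of the targeted-resource approach for subprocess tests: the config handed to `serve run` declares CPU-only bundles explicitly instead of relying on an in-process patch. `resources_per_bundle: {"CPU": 1}` and `accelerator_type: None` are the values quoted in this PR; the surrounding config structure is illustrative only.

```python
import yaml

# Illustrative LLM config for a CPU-only subprocess test.
llm_config = {
    "model_loading_config": {"model_id": "test-model"},  # illustrative entry
    "accelerator_type": None,             # no GPU accelerator requested
    "resources_per_bundle": {"CPU": 1},   # CPU-only placement bundle for tests
}

with open("test_llm_config.yaml", "w") as f:
    yaml.safe_dump(llm_config, f)

# The subprocess test can then launch the app without needing a GPU, e.g.:
#   serve run <app that loads test_llm_config.yaml>
```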