Apply resources appropriately to both launcher and node containers by jskswamy · Pull Request #2653 · kubeflow/trainer · GitHub

Apply resources appropriately to both launcher and node containers #2653

Open: jskswamy wants to merge 1 commit into master from fix-resource-allocation

jskswamy

The Trainer method has been updated to apply resources to both the launcher and node containers, depending on the `runLauncherAsNode` flag in the MPI policy.

Key changes include:

  • Added the isRunLauncherAsNode method to determine if the launcher should be run as a node.
  • Updated the Trainer method to conditionally apply resource configurations to the launcher container based on the runLauncherAsNode value.
  • Enhanced test cases to cover scenarios for resource application to both launcher and node pods based on the MPI policy settings.
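
The conditional behaviour described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the `Resources`, `MPIPolicy`, and `Container` types, the container names `launcher` and `node`, and the `applyResources` helper are all hypothetical stand-ins. It also assumes that the launcher receives the per-node resources only when the flag is set to true, which is an interpretation of the description rather than something the PR text states outright.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified stand-in for a container's
// resource requirements.
type Resources struct {
	CPU string
	GPU int
}

// MPIPolicy is a hypothetical, simplified MPI policy. Only the flag
// discussed in this PR is modelled; nil means "not set".
type MPIPolicy struct {
	RunLauncherAsNode *bool
}

// Container is a hypothetical, simplified container spec.
type Container struct {
	Name      string
	Resources *Resources
}

// isRunLauncherAsNode reports whether the launcher should also act as a
// training node. A nil policy or an unset flag is treated as false
// (an assumption made for this sketch).
func isRunLauncherAsNode(p *MPIPolicy) bool {
	return p != nil && p.RunLauncherAsNode != nil && *p.RunLauncherAsNode
}

// applyResources returns a copy of containers in which the node container
// always receives res, and the launcher container receives res only when
// the launcher runs as a node. The input slice is left unchanged.
func applyResources(containers []Container, p *MPIPolicy, res Resources) []Container {
	out := make([]Container, len(containers))
	copy(out, containers)
	for i := range out {
		switch out[i].Name {
		case "node":
			r := res
			out[i].Resources = &r
		case "launcher":
			if isRunLauncherAsNode(p) {
				r := res
				out[i].Resources = &r
			}
		}
	}
	return out
}

func main() {
	containers := []Container{{Name: "launcher"}, {Name: "node"}}
	res := Resources{CPU: "4", GPU: 2}

	for _, flag := range []bool{true, false} {
		f := flag
		result := applyResources(containers, &MPIPolicy{RunLauncherAsNode: &f}, res)
		for _, c := range result {
			fmt.Printf("runLauncherAsNode=%v container=%s hasResources=%v\n",
				flag, c.Name, c.Resources != nil)
		}
	}
}
```

Returning a modified copy instead of mutating the input keeps the sketch easy to test for both flag values, which mirrors the launcher and node scenarios the enhanced test cases are said to cover.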

Which issue(s) this PR fixes: Fixes #2650


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coveralls

Pull Request Test Coverage Report for Build 15338735909

Details

  • 30 of 37 (81.08%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.6%) to 30.448%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/runtime/framework/plugins/jobset/builder.go | 30 | 37 | 81.08% |

Totals

  • Change from base Build 15284619907: 1.6%
  • Covered Lines: 890
  • Relevant Lines: 2923

💛 - Coveralls

@jskswamy jskswamy force-pushed the fix-resource-allocation branch from c80c3ff to c0d40e8 Compare June 2, 2025 04:57
The Trainer method has been updated to apply resources appropriately
to both the launcher and node containers based on this flag.

Key changes include:
- Added the `isRunLauncherAsNode` method to determine if the
  launcher should be run as a node.
- Updated the Trainer method to conditionally apply resource
  configurations to the launcher container based on the
  `runLauncherAsNode` value.
- Enhanced test cases to cover scenarios for resource
  application to both launcher and node pods based on the
  MPI policy settings.

Signed-off-by: Krishnaswamy Subramanian <subramk@thoughtworks.com>
@jskswamy jskswamy force-pushed the fix-resource-allocation branch from c0d40e8 to 6925f41 Compare June 5, 2025 08:21
Successfully merging this pull request may close these issues.

Support for ResourcesPerNode in DeepSpeed Training Job Containers