[Core] Native CPU affinity support for accelerators by HollowMan6 · Pull Request #51719 · ray-project/ray · GitHub

[Core] Native CPU affinity support for accelerators #51719

Open, wants to merge 1 commit into base: master

HollowMan6 (Contributor)

Why are these changes needed?

This PR adds native CPU affinity support for accelerators. Users can now specify CPU affinity masks with `accelerator_cpu_mask`, given as a string of digits separated by commas. The mapping is applied per node: on every node, tasks with the same accelerator ID receive the same mask (even for different accelerator kinds). If the number of accelerators exceeds the number of elements in this list, the elements are reused as needed, starting again from the beginning of the list.
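A minimal sketch of the lookup described above, in Python. It assumes each comma-separated entry is one opaque mask string selected by accelerator ID; how an entry is decoded into CPU cores is not covered here. The helpers `parse_accelerator_cpu_mask` and `mask_for_accelerator` are hypothetical names for illustration, not Ray's API. Only the accelerator ID is an input, matching the note that the accelerator kind does not change the mapping.

```python
from typing import List


def parse_accelerator_cpu_mask(value: str) -> List[str]:
    """Split an accelerator_cpu_mask string into its entries (hypothetical helper).

    Assumption: entries are comma-separated runs of digits, e.g. "3,12,48".
    """
    entries = [entry.strip() for entry in value.split(",")]
    if not all(entry.isdigit() for entry in entries):
        raise ValueError(f"expected comma-separated digits, got {value!r}")
    return entries


def mask_for_accelerator(entries: List[str], accelerator_id: int) -> str:
    """Pick the entry for an accelerator ID, wrapping around when IDs outnumber entries.

    The accelerator kind is deliberately not a parameter: the same ID always
    maps to the same entry.
    """
    if accelerator_id < 0:
        raise ValueError("accelerator_id must be non-negative")
    return entries[accelerator_id % len(entries)]


# Usage: 3 entries shared by 6 accelerators on one node.
entries = parse_accelerator_cpu_mask("3,12,48")
print([mask_for_accelerator(entries, i) for i in range(6)])
# ['3', '12', '48', '3', '12', '48']
```

The modulo index is what makes the list wrap: accelerator 3 reuses the first entry, accelerator 4 the second, and so on.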

The affinity is set via `psutil.Process().cpu_affinity` before an actor task starts.
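The step of applying the affinity can be sketched as below. It calls `psutil.Process().cpu_affinity` as the PR describes and, as an assumption of this sketch only, falls back to the standard library's `os.sched_setaffinity` when psutil is not installed. Neither API is available on macOS, so the helpers return `None` there. The sketch also assumes the mask entry has already been decoded into a list of CPU IDs; `get_affinity` and `set_affinity` are hypothetical names, not Ray's code.

```python
import os

try:
    import psutil  # the PR applies affinity through psutil
except ImportError:  # assumption: keep the sketch runnable without psutil
    psutil = None


def _psutil_has_affinity() -> bool:
    # psutil only defines Process.cpu_affinity on platforms that support it.
    return psutil is not None and hasattr(psutil.Process, "cpu_affinity")


def get_affinity():
    """Return the sorted CPU IDs the current process may run on, or None if unsupported."""
    if _psutil_has_affinity():
        return sorted(psutil.Process().cpu_affinity())
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


def set_affinity(cpu_ids):
    """Pin the current process to cpu_ids and return the resulting affinity, or None."""
    cpu_ids = sorted(set(cpu_ids))
    if not cpu_ids:
        raise ValueError("cpu_ids must not be empty")
    if _psutil_has_affinity():
        psutil.Process().cpu_affinity(cpu_ids)
    elif hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpu_ids)
    else:
        return None  # no affinity API on this platform (e.g. macOS)
    return get_affinity()
```

Passing the IDs through `set(...)` and `sorted(...)` makes duplicate or unordered entries harmless, and rejecting an empty list avoids platform-specific behavior for that case.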

Related issue number

N/A

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

HollowMan6 marked this pull request as draft March 26, 2025 20:18
HollowMan6 marked this pull request as ready for review March 26, 2025 20:24
HollowMan6 force-pushed the cpu_gpu branch 5 times, most recently from 71b6073 to e93eb86, March 27, 2025 22:28
jcotant1 added the core label (Issues that should be addressed in Ray Core) Mar 29, 2025
HollowMan6 force-pushed the cpu_gpu branch 6 times, most recently from 60f4943 to 9ce1b58, April 3, 2025 08:19
hainesmichaelc added the community-contribution label (Contributed by the community) Apr 4, 2025
HollowMan6 force-pushed the cpu_gpu branch 3 times, most recently from d921471 to 41fc56f, April 18, 2025 08:45
HollowMan6 force-pushed the cpu_gpu branch 2 times, most recently from 60f98cd to c4a6aa1, May 13, 2025 14:09
HollowMan6 force-pushed the cpu_gpu branch 4 times, most recently from 4189356 to 283acb7, June 6, 2025 07:14

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

github-actions bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Jun 25, 2025
Signed-off-by: Hollow Man <hollowman@opensuse.org>
github-actions bot removed the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Jun 27, 2025
Labels: community-contribution (Contributed by the community), core (Issues that should be addressed in Ray Core)

4 participants