Fallback to NCCL shared lib if static one is not found by nvcastet · Pull Request #3500 · horovod/horovod

Fallback to NCCL shared lib if static one is not found #3500

Merged: nvcastet merged 2 commits into horovod:master from the fallback_shared_nccl branch on Mar 30, 2022

nvcastet (Collaborator) commented Mar 29, 2022

Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

By default, our build links against the NCCL static library. If the static library is not found (e.g. in NGC containers), the build fails.
This PR adds a fallback: if the static library is not found and HOROVOD_NCCL_LINK is not set to STATIC, the build tries to link against the NCCL shared library instead.
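
The decision order described above can be sketched roughly as follows. This is an illustration only, not the code changed in this PR: the helper `pick_nccl_library`, its parameters, and the library file names `libnccl_static.a` / `libnccl.so` are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional

# Library file names assumed for this illustration.
STATIC_NAME = "libnccl_static.a"
SHARED_NAME = "libnccl.so"


def pick_nccl_library(lib_dir: str, link_mode: Optional[str] = None) -> Path:
    """Hypothetical helper: choose which NCCL library file to link against.

    link_mode mirrors HOROVOD_NCCL_LINK: "STATIC", "SHARED", or None if unset.
    """
    static_lib = Path(lib_dir) / STATIC_NAME
    shared_lib = Path(lib_dir) / SHARED_NAME

    if link_mode == "STATIC":
        # Explicit STATIC request: no fallback is allowed.
        candidates = [static_lib]
    elif link_mode == "SHARED":
        candidates = [shared_lib]
    else:
        # Unset: prefer static (the default), then fall back to shared.
        candidates = [static_lib, shared_lib]

    for candidate in candidates:
        if candidate.is_file():
            return candidate

    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"NCCL library not found; tried: {tried}")
```

With only `libnccl.so` present in `lib_dir`, `pick_nccl_library(lib_dir)` returns the shared library, while `pick_nccl_library(lib_dir, "STATIC")` still raises, which matches the behavior described: the fallback applies only when STATIC was not explicitly requested.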

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>
nvcastet requested review from romerojosh and tgaddair on March 29, 2022 20:05
Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>
github-actions commented

Unit Test Results

   849 files +28        849 suites +28     9h 38m 36s ⏱️ -16m 19s
   765 tests ±0         722 ✔️ ±0             43 💤 ±0       0 ❌ ±0
19 568 runs  +732    14 021 ✔️ +468        5 547 💤 +264     0 ❌ ±0

Results for commit 35ef810. ± Comparison against base commit 12f9f9a.

github-actions commented

Unit Test Results (with flaky tests)

   929 files +24        929 suites +24     9h 59m 29s ⏱️ -18m 24s
   765 tests ±0         722 ✔️ ±0             43 💤 ±0       0 ❌ ±0
21 552 runs  +520    15 177 ✔️ +292        6 375 💤 +228     0 ❌ ±0

Results for commit 35ef810. ± Comparison against base commit 12f9f9a.

romerojosh (Collaborator) left a comment

LGTM!

nvcastet merged commit b4db405 into horovod:master on Mar 30, 2022
nvcastet deleted the fallback_shared_nccl branch on March 30, 2022 18:50