8000 add support for local variables for BroadcastGlobalVariablesCallback by MrAta · Pull Request #3685 · horovod/horovod · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

add support for local variables for BroadcastGlobalVariablesCallback #3685

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 0 commits into from

Conversation

MrAta
Copy link
Contributor
@MrAta MrAta commented Sep 8, 2022

Signed-off-by: Ata FatahiBaarzi afatahibaarzi@linkedin.com

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

This PR is continuation of #3628, #3643, and #3600 and adds support for local variables for BroadcastGlobalVariablesCallback. So in model parallel use cases, the only the initial value of non-local variables are broadcasted.
An example use case is provided in the unit test included with this PR, but, in short, the callback will be used as follows:

...
local_variables = <model's local variables>
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(root_rank=root_rank, local_variables=local_variables)]
model.fit(..., callbacks=callbacks)
...

@MrAta
Copy link
Contributor Author
MrAta commented Sep 8, 2022

cc @romerojosh @maxhgerlach would appreciate your review for this PR.
One thing that I wasn't sure about is that whether TF1 code uses this callback or not? currently the PR doesn't handle the case for TF1.

@github-actions
Copy link
github-actions bot commented Sep 8, 2022

Unit Test Results

     678 files   -    166     678 suites   - 166   8h 55m 18s ⏱️ - 39m 18s
     817 tests  -      16     697 ✔️  -      11     120 💤 +  12  0 ±0 
13 517 runs   - 2 712  9 312 ✔️  - 1 765  4 205 💤  - 891  0 ±0 

Results for commit 01d78d4. ± Comparison against base commit dfe5b8e.

@github-actions
Copy link
github-actions bot commented Sep 8, 2022

Unit Test Results (with flaky tests)

     793 files   -    257       793 suites   - 257   9h 47m 7s ⏱️ - 26m 32s
     817 tests  -      16       696 ✔️  -      12     120 💤 +     12  1 +1 
16 017 runs   - 3 480  10 803 ✔️  - 2 279  5 213 💤  - 1 086  1 +1 

For more details on these failures, see this check.

Results for commit 01d78d4. ± Comparison against base commit dfe5b8e.

@EnricoMi
Copy link
Collaborator
EnricoMi commented Sep 9, 2022

Please rebase with latest master, which includes a fix for the CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0