Add elastic run api by EnricoMi · Pull Request #3503 · horovod/horovod · GitHub

Add elastic run api #3503


Merged
EnricoMi merged 13 commits into master from branch-elastic-run-api
Jun 17, 2022

Conversation

EnricoMi (Collaborator) commented Apr 4, 2022

Currently, the elastic training mode can only be used through horovodrun and not the existing horovod.run API.

This PR allows horovod.run to be called with min_num_proc or host_discovery_script set, which runs func in elastic mode.
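A minimal sketch of how a caller might use this, assuming `horovod.run` accepts `min_num_proc` and `host_discovery_script` as keyword arguments as described above. The helper names `elastic_run_kwargs` and `launch` and the script path `./discover_hosts.sh` are hypothetical and only serve the illustration; other arguments that a real elastic job may need are not shown.

```python
def elastic_run_kwargs(min_num_proc=None, host_discovery_script=None):
    """Collect the keyword arguments that request elastic mode.

    Hypothetical helper: it only mirrors the two parameter names
    mentioned in this PR and checks that at least one of them is set.
    """
    if min_num_proc is None and host_discovery_script is None:
        raise ValueError(
            "elastic mode needs min_num_proc or host_discovery_script"
        )
    kwargs = {}
    if min_num_proc is not None:
        kwargs["min_num_proc"] = min_num_proc
    if host_discovery_script is not None:
        kwargs["host_discovery_script"] = host_discovery_script
    return kwargs


def launch(func, **elastic_kwargs):
    """Run func through horovod.run with the given elastic arguments.

    Horovod is imported lazily so the helper above can be used
    (and tested) on machines where Horovod is not installed.
    """
    import horovod

    return horovod.run(func, **elastic_kwargs)


if __name__ == "__main__":
    # Assumption: ./discover_hosts.sh is a user-provided discovery script.
    kwargs = elastic_run_kwargs(
        min_num_proc=2,
        host_discovery_script="./discover_hosts.sh",
    )
    print(kwargs)
    # launch(train_fn, **kwargs)  # train_fn is a placeholder for your function
```

The call to `launch` is left commented out because it needs a working Horovod installation and reachable hosts.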

github-actions bot commented Apr 4, 2022

Unit Test Results

   836 files  +19       836 suites  +19    9h 48m 38s ⏱️ +35m 37s
   770 tests  + 2       727 ✔️  + 2        43 💤 ±0     0 ❌ ±0
18 776 runs   +38    13 431 ✔️  +34     5 345 💤 +4     0 ❌ ±0

Results for commit af09b9a. ± Comparison against base commit a304c81.

♻️ This comment has been updated with latest results.

github-actions bot commented Apr 4, 2022

Unit Test Results (with flaky tests)

   968 files  +19       968 suites  +19   10h 21m 50s ⏱️ +40m 19s
   770 tests  + 2       727 ✔️  + 2        43 💤 ±0     0 ❌ ±0
22 042 runs   +38    15 335 ✔️  +34     6 707 💤 +4     0 ❌ ±0

Results for commit af09b9a. ± Comparison against base commit a304c81.

♻️ This comment has been updated with latest results.

@EnricoMi EnricoMi changed the base branch from master to branch-test-run-api-examples April 6, 2022 09:50
@EnricoMi EnricoMi added this to the v0.25.0 milestone Apr 22, 2022
@EnricoMi EnricoMi force-pushed the branch-elastic-run-api branch from 0362ab5 to 923f511 Compare April 22, 2022 12:58
@EnricoMi EnricoMi marked this pull request as ready for review April 26, 2022 21:17
@EnricoMi EnricoMi force-pushed the branch-elastic-run-api branch from 1398fb9 to 439502f Compare April 26, 2022 21:19
@EnricoMi EnricoMi marked this pull request as draft April 26, 2022 21:21
Base automatically changed from branch-test-run-api-examples to master April 30, 2022 20:53
@EnricoMi EnricoMi force-pushed the branch-elastic-run-api branch from 439502f to 0cac76a Compare April 30, 2022 20:55
@EnricoMi EnricoMi marked this pull request as ready for review April 30, 2022 20:56
@EnricoMi EnricoMi force-pushed the branch-elastic-run-api branch 2 times, most recently from ac778f9 to 4dc5aab Compare June 16, 2022 11:32
EnricoMi added 10 commits June 16, 2022 13:34
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
@EnricoMi EnricoMi force-pushed the branch-elastic-run-api branch from 4dc5aab to 3eaf15b Compare June 16, 2022 11:34
EnricoMi (Collaborator, Author) commented Jun 16, 2022

I managed to move the KVStoreServer code from launch.py into gloo_run.py, which is where the remote hosts first become known and the common interface can be determined. This removes the ugly need to pass all driver IPs to run_task.py with a very low timeout.
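To illustrate what "determining the common interface" means conceptually, here is a small sketch that intersects the interface names reported by each host. This is not Horovod's actual code; the function name `common_interfaces` and the input shape (a dict from host name to interface names) are assumptions made for the example.

```python
def common_interfaces(interfaces_by_host):
    """Return the interface names that every host reports, sorted.

    Hypothetical illustration: interfaces_by_host maps a host name to
    an iterable of network interface names seen on that host.
    """
    sets = [set(names) for names in interfaces_by_host.values()]
    if not sets:
        return []
    # An interface is "common" only if it appears on all hosts.
    return sorted(set.intersection(*sets))


if __name__ == "__main__":
    hosts = {
        "driver": ["lo", "eth0", "eth1"],
        "worker-1": ["lo", "eth0"],
    }
    print(common_interfaces(hosts))
```

The point of the sketch is that the intersection can only be computed once the remote hosts are known, which matches the reason given above for moving the server start-up to that place.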

…hanges

Reverts changes to run_task.py, launch.py and http_client.py.

Signed-off-by: Enrico Minack <github@enrico.minack.dev>
@EnricoMi EnricoMi force-pushed the branch-elastic-run-api branch from 3eaf15b to fa2ea9c Compare June 16, 2022 14:52
EnricoMi added 2 commits June 16, 2022 21:49
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
@EnricoMi EnricoMi force-pushed the branch-elastic-run-api branch from fa2ea9c to af09b9a Compare June 16, 2022 19:50
@EnricoMi EnricoMi merged commit aeb960c into master Jun 17, 2022
@EnricoMi EnricoMi deleted the branch-elastic-run-api branch June 17, 2022 08:19
Labels
None yet
2 participants