-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Re-enabling new keras optimizers #3860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc8f797
to
43e2045
Compare
Unit Test Results 1 156 files - 35 1 156 suites - 35 13h 1m 30s ⏱️ - 1h 9m 14s For more details on these failures, see this check. Results for commit 9fe3eec. ± Comparison against base commit a418125. This pull request removes 2 and adds 1 tests. Note that renamed tests count towards both.
This pull request skips 5 tests.
♻️ This comment has been updated with latest results. |
Unit Test Results (with flaky tests) 1 520 files - 93 1 520 suites - 93 15h 49m 17s ⏱️ - 1h 29m 14s For more details on these failures, see this check. Results for commit 9fe3eec. ± Comparison against base commit a418125. This pull request removes 2 and adds 1 tests. Note that renamed tests count towards both.
This pull request skips 5 tests.
♻️ This comment has been updated with latest results. |
@MrAta Maybe, you have an idea on the remaining issue. It is failing because |
I think flattening the list did the trick. |
86e0030
to
dd0cfbc
Compare
@maxhgerlach Do you know why the build heads are failing? |
@@ -59,7 +54,7 @@ def main(): | |||
|
|||
# Horovod: adjust learning rate based on number of GPUs. | |||
scaled_lr = 0.001 * hvd.size() | |||
opt = optimizers.Adam(scaled_lr) | |||
opt = tf.optimizers.Adam(scaled_lr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but a question about tf.keras
and keras
compatibility in horovod:
- After tf 2.6, tf.keras == keras.
- Before tf 2.6, this might be an issue?
- Or our CI already dropped support for TF2 versions < 2.6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly about the test matrix of our CI.
Technically it should work for tf < 2.6 version when using keras package. I maintained the code path for pure keras (older version of TF and Keras).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just not sure TF2.6-
compatibilities.
Wait for head CIs to pass?
There are build fixes for tf-head in #3864. Ideally someone other than me should approve the PR as I added C++ changes to it. |
dd0cfbc
to
320be63
Compare
Argh, there appears to be a new change on tf-head:
|
2e694aa
to
752580a
Compare
New Keras optimizers should work without issues with Horovod. Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
2bfef56
to
ff7052c
Compare
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
The TF head failures are unrelated to those changes. Those can be fixed/investigated separately. |
Hi, may I know when will Horovod v0.28.0 release? I use v0.27.0 now and want to use branch with this PR for TF keras 2.12. |
I think once the CI is updated and we can be sure tests pass with TF 2.12, we should be able to get the next release out soon. Does Horovod master work for you in the mean time, @LuFinch? |
@maxhgerlach Thanks for the info. We are rebasing our fork repo to v0.27.0 but some models using keras optimizers meet issues, so we want to directly rebese to v0.28.0 including this PR. Glad to hear that you will release v0.28.0 soon and thanks for your jobs! |
Description
Re-enabling TF Keras 2.11+ optimizers.
Those and the legacy ones should work out the box with Horovod.Using new optimizers with
LossScaleOptimizer
is broken in 2.11 but fixed in 2.12.See discussions at keras-team/tf-keras#349
and commit at keras-team/keras@c98ea36
EDIT:
They are quite a few changes in API between the 2 optimizer versions. But hopefully this PR properly integrates Horovod to the new ones while maintaining support for the old ones.