-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Remove std::regex use for better cxx11 abi compatibility #3584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8b642ca
to
bfd191c
Compare
Signed-off-by: Trevor Morris <tmorris@nvidia.com>
bfd191c
to
def3f29
Compare
Very interesting! The issue sounds similar to this Stack Overflow question: https://stackoverflow.com/questions/51382355/stdregex-and-dual-abi So if in your case TensorFlow is built with horovod/horovod/tensorflow/CMakeLists.txt Line 71 in d15c914
Would it help if the rest of Horovod (sources in Your change looks great, @trevor-m. I'm just a little afraid that there might be more hidden corruption if the ABIs are mixed. |
Ah, that could be it. I didn't realize only the TF lib would be built with a I definitely understand and agree with your concern about more hidden corruption. Perhaps we can look into it a little more and do a follow-up PR? I can check if changing the horovod |
I'm definitely in favor of merging this current PR (once the CI passes). But it would be great to better understand the root cause.
If you find the time, that would be fabulous! |
Unit Test Results (with flaky tests) 1 074 files + 27 1 074 suites +27 10h 30m 1s ⏱️ + 23m 23s For more details on these failures, see this check. Results for commit def3f29. 8000 ± Comparison against base commit 48e0aff. |
Actually, I was mistaken there: It is already the case that all Horovod sources are compiled with Alternatively, could the crash be related to some other library that your are linking to / importing in Python, where In any case, making this conflict less likely to turn up by dropping |
Thanks @maxhgerlach! |
Checklist before submitting
Description
Since TensorFlow has moved to
_GLIBCXX_USE_CXX11_ABI=1
, we have ran into some issues with Horovod related to the use ofstd::regex
when we build TF/Horovod on manylinux2014.HorovodBroadcastInplaceOp
was crashing whenstd::regex
was used.To fix the problem, I replaced the regex with something simpler which is probably also faster and more lightweight.
Review process to land