Open
Description
What happened + What you expected to happen
See the Limitation section of this PR.
The case in question is a node failure. To handle it, we should track the Node ID -> Worker ID mapping in the GCS and, when a node fails, record the metadata of the workers that were running on it.
We should fix this so that all worker exits are reported properly.
This is needed to produce a complete exit report.
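The bookkeeping described above could look roughly like the following. This is a minimal Python sketch of the idea only, not Ray's actual GCS code (which is C++). The class `NodeWorkerTracker`, the record type `WorkerExitRecord`, the callback names, and the `"NODE_DIED"` exit type string are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WorkerExitRecord:
    # Hypothetical shape of the worker metadata recorded on exit.
    worker_id: str
    node_id: str
    exit_type: str
    exit_detail: str


class NodeWorkerTracker:
    """Hypothetical GCS-side tracker for the Node ID -> Worker ID mapping."""

    def __init__(self):
        # node_id -> set of worker_ids currently alive on that node
        self._workers_by_node = defaultdict(set)
        self.exit_records = []

    def on_worker_started(self, node_id, worker_id):
        self._workers_by_node[node_id].add(worker_id)

    def on_worker_exited(self, node_id, worker_id, exit_type, exit_detail):
        # Normal path: the worker exit is reported while the node is alive.
        self._workers_by_node[node_id].discard(worker_id)
        self.exit_records.append(
            WorkerExitRecord(worker_id, node_id, exit_type, exit_detail)
        )

    def on_node_failed(self, node_id):
        # Failure path: no worker on this node can report its own exit,
        # so record one exit entry per worker still mapped to the node.
        workers = self._workers_by_node.pop(node_id, set())
        for worker_id in sorted(workers):
            self.exit_records.append(
                WorkerExitRecord(
                    worker_id, node_id, "NODE_DIED", f"node {node_id} failed"
                )
            )
        return len(workers)
```

With a mapping like this, a node failure yields an exit record for every worker that had not already reported its exit, which is what a complete exit report would need.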
Versions / Dependencies
master
Reproduction script
N/A
Issue Severity
Low: It annoys or frustrates me.