The four unit tests with "dropout" failed with the following error (traceback from test_transducer_joint_pack_relu_dropout):
Traceback (most recent call last):
  File "/apex/apex/contrib/test/transducer/test_transducer_joint.py", line 149, in test_transducer_joint_pack_relu_dropout
    self.run_transducer_joint(for_vector_kernel=False, pack_output=True, relu=True, dropout=True)
  File "/apex/apex/contrib/test/transducer/test_transducer_joint.py", line 109, in run_transducer_joint
    mask=mask if dropout else None)
  File "/apex/apex/contrib/test/transducer/transducer_ref.py", line 94, in transducer_joint_reference
    h.backward(h_grad)
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 402, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 193, in backward
    allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [4, 101, 25, 509]], which is output 0 of ReluBackward0, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
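For context, this class of error can be triggered in isolation. The following is a minimal sketch, not the code from transducer_ref.py: the helper name `backward_after_inplace` and the tensor shape are made up for illustration, and it runs on CPU. It assumes only that ReLU's backward pass reuses the ReLU output, so modifying that output in place before calling `backward` invalidates the saved tensor.

```python
import torch


def backward_after_inplace(out_of_place: bool) -> str:
    """Run backward through relu followed by a scaling step (hypothetical helper).

    Returns "ok" if backward succeeds and "error" if autograd raises.
    """
    x = torch.randn(4, 3, requires_grad=True)
    h = torch.relu(x)  # autograd keeps the ReLU output for the backward pass
    if out_of_place:
        h = h * 2.0    # creates a new tensor; the saved ReLU output is untouched
    else:
        h.mul_(2.0)    # in-place; bumps the version counter of the saved output
    try:
        h.backward(torch.ones_like(h))
        return "ok"
    except RuntimeError:
        # Same family of error as in the traceback above:
        # "... has been modified by an inplace operation ..."
        return "error"


print(backward_after_inplace(out_of_place=False))
print(backward_after_inplace(out_of_place=True))
```

If the failure can be reproduced, wrapping the failing call with `torch.autograd.set_detect_anomaly(True)`, as the error message suggests, should point to the forward operation whose gradient could not be computed.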
The unit test test_transducer_joint_pack_relu failed with the following error:
Traceback (most recent call last):
  File "/apex/apex/contrib/test/transducer/test_transducer_joint.py", line 137, in test_transducer_joint_pack_relu
    self.run_transducer_joint(for_vector_kernel=False, pack_output=True, relu=True, dropout=False)
  File "/apex/apex/contrib/test/transducer/test_transducer_joint.py", line 115, in run_transducer_joint
    self.assertTrue(torch.allclose(f_grad_ref, f_grad_tst, atol=1e-5, rtol=1e-5))
AssertionError: False is not true
These failures are not reproducible locally with the Docker image rocm/pytorch:latest (rocm5.2_ubuntu20.04_py3.7_pytorch_staging). We may need to mark them as flaky tests in the future or adjust the tolerance for ROCm.
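To put the tolerance question in perspective, here is a small standard-library sketch of the scalar form of the `torch.allclose` criterion, `|a - b| <= atol + rtol * |b|`. It assumes the compared gradients are half precision, as the HalfTensor in the first traceback suggests. The helper names and the looser tolerance of 1e-3 are illustrative only, not values taken from the test suite.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def is_close(a: float, b: float, atol: float = 1e-5, rtol: float = 1e-5) -> bool:
    """Scalar version of the torch.allclose check: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


ref = to_half(1.0)
tst = to_half(1.0 + 2.0 ** -10)  # the next representable half-precision value above 1.0

print(tst - ref)                                # spacing of half-precision values near 1.0
print(is_close(ref, tst))                       # tolerances used by the failing assertion
print(is_close(ref, tst, atol=1e-3, rtol=1e-3)) # hypothetical looser tolerances
```

Under that assumption, two results that differ by a single half-precision rounding step near 1.0 already fail the 1e-5 check, so small differences in accumulation order between platforms could plausibly flip the assertion.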