The failing unit tests in test_transducer_joint.py · Issue #89 · ROCm/apex

The failing unit tests in test_transducer_joint.py #89


Open
hubertlu-tw opened this issue Aug 24, 2022 · 0 comments
hubertlu-tw commented Aug 24, 2022
  • test_transducer_joint_pack_relu_dropout (test_transducer_joint.TransducerJointTest)
  • test_transducer_joint_relu_dropout (test_transducer_joint.TransducerJointTest)
  • test_transducer_joint_vec_pack_relu_dropout (test_transducer_joint.TransducerJointTest)
  • test_transducer_joint_vec_relu_dropout (test_transducer_joint.TransducerJointTest)

The four unit tests above, all of which enable dropout, failed with the following error message:

Traceback (most recent call last):
  File "/apex/apex/contrib/test/transducer/test_transducer_joint.py", line 149, in test_transducer_joint_pack_relu_dropout
    self.run_transducer_joint(for_vector_kernel=False, pack_output=True, relu=True, dropout=True)
  File "/apex/apex/contrib/test/transducer/test_transducer_joint.py", line 109, in run_transducer_joint
    mask=mask if dropout else None)
  File "/apex/apex/contrib/test/transducer/transducer_ref.py", line 94, in transducer_joint_reference
    h.backward(h_grad)
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 402, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 193, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [4, 101, 25, 509]], which is output 0 of ReluBackward0, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
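
For readers unfamiliar with this class of error, here is a minimal sketch of how it can arise. It is not the code in transducer_ref.py: the helper `relu_then_mask`, the tensor shapes, and the CPU float tensors are all hypothetical. It assumes only that an in-place operation touches the output of a ReLU, which autograd saves for the backward pass.

```python
import torch


def relu_then_mask(x, mask, in_place):
    """Hypothetical helper: apply ReLU, then a dropout-style mask."""
    h = torch.relu(x)
    if in_place:
        # In-place multiply bumps the version counter of the tensor
        # that ReluBackward0 saved for its gradient computation.
        h *= mask
    else:
        # Out-of-place multiply leaves the saved ReLU output untouched.
        h = h * mask
    return h


x = torch.randn(3, 4, requires_grad=True)
mask = (torch.rand(3, 4) > 0.5).float()

# In-place variant: backward() is expected to raise the RuntimeError.
try:
    relu_then_mask(x, mask, in_place=True).sum().backward()
    inplace_error = None
except RuntimeError as err:
    inplace_error = str(err)

# Out-of-place variant: backward() completes and populates x.grad.
relu_then_mask(x, mask, in_place=False).sum().backward()
grad_ok = x.grad is not None
```

As the hint in the error message suggests, wrapping the failing call in `torch.autograd.set_detect_anomaly(True)` should point at the forward operation whose gradient could not be computed.
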
  • test_transducer_joint_pack (test_transducer_joint.TransducerJointTest)
  • test_transducer_joint_pack_relu (test_transducer_joint.TransducerJointTest)

The above unit tests failed with an assertion error; the traceback below is from test_transducer_joint_pack_relu:

Traceback (most recent call last):
  File "/apex/apex/contrib/test/transducer/test_transducer_joint.py", line 137, in test_transducer_joint_pack_relu
    self.run_transducer_joint(for_vector_kernel=False, pack_output=True, relu=True, dropout=False)
  File "/apex/apex/contrib/test/transducer/test_transducer_joint.py", line 115, in run_transducer_joint
    self.assertTrue(torch.allclose(f_grad_ref, f_grad_tst, atol=1e-5, rtol=1e-5))
AssertionError: False is not true

These failures are not reproducible locally with the Docker image rocm/pytorch:latest (== rocm5.2_ubuntu20.04_py3.7_pytorch_staging). We may need to mark them as flaky tests in the future or adjust the tolerance for ROCm.
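
To illustrate why the tolerance might matter, the sketch below uses only the Python standard library to emulate half precision. It assumes the compared gradients are half-precision values, like the HalfTensor in the first traceback, and it applies the elementwise criterion documented for torch.allclose. The helper names and the relaxed tolerance of 1e-3 are hypothetical illustrations, not recommended values.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


def is_close(a, b, atol, rtol):
    """Elementwise criterion documented for torch.allclose(a, b)."""
    return abs(a - b) <= atol + rtol * abs(b)


a = to_fp16(1.0)
b = to_fp16(1.0 + 2 ** -10)  # next representable fp16 value above 1.0
ulp = b - a                  # spacing between adjacent fp16 values near 1.0

# Tolerances used by the failing assertion in the test.
strict = is_close(a, b, atol=1e-5, rtol=1e-5)

# Hypothetical relaxed tolerances, for comparison only.
relaxed = is_close(a, b, atol=1e-3, rtol=1e-3)

print(f"fp16 spacing near 1.0: {ulp}")
print(f"passes at 1e-5: {strict}, passes at 1e-3: {relaxed}")
```

Under these assumptions, two results that differ by a single rounding step near 1.0 already exceed the 1e-5 tolerances, so a different accumulation order on another backend could plausibly trip the assertion without indicating a real bug.
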

hubertlu-tw added the "bug" label on Aug 24, 2022
hubertlu-tw removed the "bug" label on Aug 29, 2022