Torch.compile does not fill force tensor in PT2.7 · Issue #1008 · ACEsuit/mace · GitHub
Torch.compile does not fill force tensor in PT2.7 #1008
Open
@vbharadwaj-bk

Description

Describe the bug
In PT2.7, torch.compile does not appear to fill the force tensor, which is left at exactly zero. The compile test passes anyway because the ground-truth force is very close to zero. To trigger an explicit error (as suggested in NVIDIA/cuEquivariance#77), add fullgraph=True to torch.compile; this produces an exception in dynamo.
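To illustrate why the test can pass vacuously: `torch.testing.assert_close` uses a default absolute tolerance (1e-5 for float32), so an all-zero tensor compares as "close" to any reference whose entries fall below that tolerance. The following is a minimal sketch of a stricter check; `assert_forces_match` is a hypothetical helper, not part of MACE or its test suite, and the relative tolerance is an arbitrary choice.

```python
import torch
from torch.testing import assert_close


def assert_forces_match(reference, candidate, rtol=1e-4):
    """Hypothetical helper: compare force tensors without letting an
    all-zero candidate pass against a small but nonzero reference."""
    # An exactly-zero candidate alongside a nonzero reference suggests the
    # forces were never written, rather than being numerically small.
    if torch.count_nonzero(reference) > 0 and torch.count_nonzero(candidate) == 0:
        raise AssertionError("candidate forces are exactly zero but the reference is not")
    # Scale the absolute tolerance to the reference magnitude so that tiny
    # forces are still compared meaningfully.
    atol = rtol * reference.abs().max().item()
    assert_close(candidate, reference, rtol=rtol, atol=atol)


if __name__ == "__main__":
    reference = torch.full((4, 3), 1e-7)  # small but nonzero "ground truth"
    unfilled = torch.zeros(4, 3)          # stands in for the unfilled force tensor
    assert_close(unfilled, reference)     # passes under the default tolerances
    try:
        assert_forces_match(reference, unfilled)
    except AssertionError as err:
        print(f"stricter check caught it: {err}")
```

A check of this kind in the compile test would turn the silent zero tensor into a failure even when the reference forces are small.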

To Reproduce

  1. Install PT2.7 and the latest MACE from GitHub.
  2. Modify test_mace in tests/test_compile.py as follows:
# skip if on windows
@pytest.mark.skipif(os.name == "nt", reason="Not supported on Windows")
@pytest.mark.parametrize("device", ["cuda"])
def test_mace(device, default_dtype):  # pylint: disable=W0621
    print(f"using default dtype = {default_dtype}")
    if device == "cuda" and not torch.cuda.is_available():
        pytest.skip(reason="cuda is not available")

    model_defaults = create_mace(device)
    tmp_model = mace_compile.prepare(create_mace)(device)
    model_compiled = torch.compile(tmp_model)  # add fullgraph=True to see the dynamo exception

    batch = create_batch(device)
    output1 = model_defaults(batch, training=True)
    output2 = model_compiled(batch, training=True)

    print(torch.norm(output1["forces"]))
    print(torch.norm(output2["forces"]))
    assert False  # deliberate failure so pytest shows the printed norms; the checks below are not reached
    assert_close(output1["energy"], output2["energy"])
    assert_close(output1["forces"], output2["forces"])

Expected behavior
The first tensor is close to (but not exactly) zero; the second tensor is suspiciously exactly zero, which is why the test passes. Adding fullgraph=True yields the following exception:

E               torch._dynamo.exc.Unsupported: SKIPPED INLINING <code object forward at 0x7fafdfbb24a0, file "/global/cfs/cdirs/m1982/vbharadw/equivariant_spmm/mace_oeq_integration/mace/modules/symmetric_contraction.py", line 212>: 
E               
E               from user code:
E                  File "/global/cfs/cdirs/m1982/vbharadw/equivariant_spmm/mace_oeq_integration/mace/modules/models.py", line 430, in forward
E                   node_feats = product(
E                 File "/global/cfs/cdirs/m1982/vbharadw/equivariant_spmm/mace_oeq_integration/mace/modules/blocks.py", line 301, in forward
E                   node_feats = self.symmetric_contractions(node_feats, node_attrs)
E                 File "/global/cfs/cdirs/m1982/vbharadw/equivariant_spmm/mace_oeq_integration/mace/modules/symmetric_contraction.py", line 82, in forward
E                   outs = [contraction(x, y) for contraction in self.contractions]
E                 File "/global/cfs/cdirs/m1982/vbharadw/equivariant_spmm/mace_oeq_integration/mace/modules/symmetric_contraction.py", line 82, in <listcomp>
E                   outs = [contraction(x, y) for contraction in self.contractions]
E               
E               Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

/global/cfs/projectdirs/m1982/vbharadw/conda/pt27/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:659: Unsupported

In PT2.6, both tensors are nonzero and the error is not raised.
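The effect of fullgraph=True can be sketched on a toy function rather than on MACE itself. Without the flag, dynamo falls back to eager execution at a graph break and says nothing; with it, the break surfaces as an exception. In this sketch `toy_forward` and `fails_under_fullgraph` are hypothetical names, the graph break is forced artificially with `torch._dynamo.graph_break()`, and `backend="eager"` is used only to keep the example lightweight. It does not reproduce the symmetric_contraction failure above.

```python
import torch
import torch._dynamo


def toy_forward(x):
    # Stand-in for a model forward pass; the call below forces a graph break.
    y = x * 2.0
    torch._dynamo.graph_break()
    return y + 1.0


def fails_under_fullgraph(fn, *args):
    """Return True if calling fn compiled with fullgraph=True raises."""
    torch._dynamo.reset()  # clear previously cached compilations
    compiled = torch.compile(fn, fullgraph=True, backend="eager")
    try:
        compiled(*args)
    except Exception:  # caught broadly; the exact exception type may vary by version
        return True
    return False


if __name__ == "__main__":
    x = torch.ones(3)
    print("fullgraph=True raises:", fails_under_fullgraph(toy_forward, x))
    torch._dynamo.reset()
    # Without fullgraph, the graph break is handled silently.
    print("default compile output:", torch.compile(toy_forward, backend="eager")(x))
```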

Platform
NERSC Perlmutter A100 nodes

ETA: I tested this behavior with MACE 3.10 as well; the issue doesn't seem to arise from some new diff.
