[Bug]: NaN in PyTorch SDPA on RTX5080

Model Series

Qwen3

What are the models used?

Qwen3-0.6B

What is the scenario where the problem happened?

Training and inference with native PyTorch

Is this a known issue?

I have followed the GitHub README.
I have checked the Qwen documentation and cannot find an answer there.
I have checked the documentation of the related framework and cannot find useful information.
I have searched the issues and there is not a similar one.

Information about environment

python 3.12
GPU: a single RTX5080
CUDA Version: 12.9
torch==2.7.0+cu128
transformers==4.51.3

Log output

2.7.0+cu128
4.51.3
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\TensorCompare.cu:112: block: [0,0,0], thread: [0,0,0] Assertion `input[0] != 0` failed.
Traceback (most recent call last):
  File "D:\PycharmProjects\Qwen3\bug_report.py", line 48, in <module>
    output_ids = model.generate(**inputs, max_new_tokens=120)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\transformers\generation\utils.py", line 2465, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\transformers\generation\utils.py", line 3476, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Description

Steps to reproduce

This happens with Qwen3-0.6B and Qwen3-1.7B (other models were not tested).
The problem can be reproduced with the following steps:

Run the following code:

import warnings

import transformers

warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message="Failed to load image Python extension.*")

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, get_scheduler
from transformers.models.qwen2.tokenization_qwen2_fast import Qwen2TokenizerFast
from transformers.models.qwen3.modeling_qwen3 import Qwen3ForCausalLM

from torch.optim import Adam
import os


if __name__ == '__main__':
    print(torch.__version__)
    print(transformers.__version__)

    # Make CUDA kernel launches synchronous so the error is reported at the failing call.
    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
    model_name = "../dl_models/Qwen3-0.6B" # local
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer: Qwen2TokenizerFast = AutoTokenizer.from_pretrained(model_name)
    model: Qwen3ForCausalLM = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.95))
    optimizer.zero_grad()
    model.train()

    # A single training step on one sequence.
    train_seq = ["<think>\n\n</think>\n\n翻译：行我觉得如果我想要在前赶回宿舍的话，我就得尽快把事情做完。<|im_end|>"]
    inputs = tokenizer(train_seq, return_tensors="pt", padding=True).to(device)

    outputs = model(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        labels=inputs.input_ids
    )
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    # Sampling after the weight update is where the error is raised.
    with torch.no_grad():
        model.eval()
        test_seq = "<|im_start|>user\n你好。<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
        inputs = tokenizer(test_seq, return_tensors="pt").to(device)
        output_ids = model.generate(**inputs, max_new_tokens=120)

The error shown above will occur.

Here, model_name = "../dl_models/Qwen3-0.6B" is the latest model pulled from Hugging Face.

Expected results

The code runs correctly, with no error.

Attempts to fix

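Any one of the following makes the code run normally:

Changing to device = "cpu"
Changing train_seq to a different sentence, including merely removing a single Chinese quotation mark from it (the one after the two characters "翻译")
Removing optimizer.step()
Leaving the code unchanged and running it on: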

RTX2070
CUDA Version: 12.9
torch==2.7.0+cu126
transformers==4.51.3

Anything else helpful for investigation

The error originates from:

transformers\generation\utils.py line 3476:
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
where all values in probs are NaN
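
To narrow down where the NaN first appears (the parameters after optimizer.step(), the gradients, or only the logits), the tensors can be scanned for non-finite values. This is a minimal sketch; find_nonfinite is a hypothetical helper, not part of torch or transformers.

```python
import torch


def find_nonfinite(named_tensors):
    """Return the names of tensors that contain at least one NaN or Inf."""
    bad = []
    for name, tensor in named_tensors:
        if not torch.isfinite(tensor).all():
            bad.append(name)
    return bad


# Tiny CPU demonstration with made-up tensors.
demo = {
    "ok": torch.ones(3),
    "broken": torch.tensor([1.0, float("nan")]),
}
print(find_nonfinite(demo.items()))
```

In the script above, calling find_nonfinite(model.named_parameters()) right after optimizer.step() would show whether the update itself writes NaN into the weights. Calling find_nonfinite([("logits", outputs.logits)]) before loss.backward() would show whether the forward pass is already affected.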
