[Bug]: NaN in PyTorch SDPA on RTX5080 · Issue #1499 · QwenLM/Qwen3 · GitHub
[Bug]: NaN in PyTorch SDPA on RTX5080  #1499
Closed
@O5-7

Description

Model Series

Qwen3

What are the models used?

Qwen3-0.6B

What is the scenario where the problem happened?

Training and inference with native torch

Is this a known issue?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find an answer there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

python 3.12
GPU: a single RTX5080
CUDA Version: 12.9
torch==2.7.0+cu128
transformers==4.51.3

Log output

2.7.0+cu128
4.51.3
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\TensorCompare.cu:112: block: [0,0,0], thread: [0,0,0] Assertion `input[0] != 0` failed.
Traceback (most recent call last):
  File "D:\PycharmProjects\Qwen3\bug_report.py", line 48, in <module>
    output_ids = model.generate(**inputs, max_new_tokens=120)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\transformers\generation\utils.py", line 2465, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\transformers\generation\utils.py", line 3476, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
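
The failing call in the traceback is torch.multinomial, which rejects a probability tensor containing NaN; on CUDA that surfaces as the opaque device-side assert above. The following is a minimal sketch, not part of transformers, of a sampling step that checks for non-finite probabilities first and raises a readable error. The helper name sample_next_token is hypothetical.

```python
import torch


def sample_next_token(logits: torch.Tensor) -> torch.Tensor:
    """Sample one token id per row; fail early if the probabilities are NaN/Inf.

    Hypothetical helper for debugging, not the transformers implementation.
    """
    probs = torch.softmax(logits.float(), dim=-1)
    finite = torch.isfinite(probs)
    if not finite.all():
        # Report which batch rows are affected instead of a device-side assert.
        bad_rows = (~finite).any(dim=-1).nonzero().flatten().tolist()
        raise ValueError(f"non-finite probabilities in rows {bad_rows}")
    return torch.multinomial(probs, num_samples=1).squeeze(1)
```

A single NaN logit is enough to turn the whole softmax row into NaN, so the check fires for the entire row.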

Description

Steps to reproduce

This happens with Qwen3-0.6B and Qwen3-1.7B (other models were not tested).
The problem can be reproduced with the following steps:

Run the following code:

import warnings

import transformers

warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message="Failed to load image Python extension.*")

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, get_scheduler
from transformers.models.qwen2.tokenization_qwen2_fast import Qwen2TokenizerFast
from transformers.models.qwen3.modeling_qwen3 import Qwen3ForCausalLM

from torch.optim import Adam
import os


if __name__ == '__main__':
    print(torch.__version__)
    print(transformers.__version__)

    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
    model_name = "../dl_models/Qwen3-0.6B" # local
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer: Qwen2TokenizerFast = AutoTokenizer.from_pretrained(model_name)
    model: Qwen3ForCausalLM = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.95))
    optimizer.zero_grad()
    model.train()

    train_seq = ["<think>\n\n</think>\n\n翻译:行我觉得如果我想要在前赶回宿舍的话,我就得尽快把事情做完。<|im_end|>"]
    inputs = tokenizer(train_seq, return_tensors="pt", padding=True).to(device)

    outputs = model(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        labels=inputs.input_ids
    )
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        model.eval()
        test_seq = "<|im_start|>user\n你好。<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
        inputs = tokenizer(test_seq, return_tensors="pt").to(device)
        output_ids = model.generate(**inputs, max_new_tokens=120)

The error shown above will occur.

Here, model_name = "../dl_models/Qwen3-0.6B" is the latest model pulled from HF.

Expected results

The code runs correctly, with no error.

Attempts to fix

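Any one of the following makes the code run without the error:

Set device = "cpu".
Change train_seq to any other sentence, including merely removing a single Chinese quotation mark from it (the one right after the two characters “翻译”).
Remove optimizer.step().
Leave the code unchanged and run it in this environment instead: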

RTX2070
CUDA Version: 12.9
torch==1.7.0+cu126
transformers==4.51.3
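
Since removing optimizer.step() avoids the error, one way to narrow things down is to check whether the gradients or the updated weights already contain NaN/Inf before generate is called. This is a minimal sketch under that assumption; nonfinite_tensors is a hypothetical helper, not an existing torch or transformers function.

```python
import torch


def nonfinite_tensors(model: torch.nn.Module) -> dict[str, list[str]]:
    """List the names of parameters and gradients that contain NaN or Inf."""
    report: dict[str, list[str]] = {"params": [], "grads": []}
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            report["params"].append(name)
        if param.grad is not None and not torch.isfinite(param.grad).all():
            report["grads"].append(name)
    return report
```

In the reproduction script it could be called once after loss.backward() and once after optimizer.step(). Non-finite gradients after backward would point at the forward/backward pass; finite gradients followed by non-finite parameters would point at the optimizer update.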

Anything else helpful for investigation

The error originates from:

transformers\generation\utils.py line 3476:
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
where every element of probs is NaN.
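
The title attributes the NaN to PyTorch SDPA. One way to test that suspicion is to compare torch.nn.functional.scaled_dot_product_attention against attention written out step by step, on the same device and dtype. This is a simplified sketch: it assumes plain causal attention with equal query and key lengths and ignores Qwen3 specifics such as grouped-query attention and the padding mask. The function names are hypothetical.

```python
import math

import torch
import torch.nn.functional as F


def reference_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Causal attention computed explicitly in float32, for comparison only."""
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = (q.float() @ k.float().transpose(-2, -1)) * scale
    seq_q, seq_k = scores.shape[-2:]
    causal = torch.ones(seq_q, seq_k, dtype=torch.bool, device=q.device).tril()
    scores = scores.masked_fill(~causal, float("-inf"))
    return (torch.softmax(scores, dim=-1) @ v.float()).to(q.dtype)


def compare_sdpa(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> dict:
    """Run the fused SDPA kernel and the reference, then report NaNs and the gap."""
    fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    ref = reference_attention(q, k, v)
    return {
        "fused_has_nan": bool(torch.isnan(fused).any()),
        "ref_has_nan": bool(torch.isnan(ref).any()),
        "max_abs_diff": (fused.float() - ref.float()).abs().max().item(),
    }
```

If the fused output contains NaN while the reference does not, loading the model with attn_implementation="eager" in from_pretrained would bypass SDPA; whether that avoids the error on this setup is not confirmed in this issue.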
