Description
Model Series
Qwen3
What are the models used?
Qwen3-0.6B
What is the scenario where the problem happened?
Training and inference with native PyTorch.
Is this a known issue?
- I have followed the GitHub README.
- I have checked the Qwen documentation and cannot find an answer there.
- I have checked the documentation of the related framework and cannot find useful information.
- I have searched the issues and there is not a similar one.
Information about environment
python 3.12
GPU: single RTX 5080
CUDA Version: 12.9
torch==2.7.0+cu128
transformers==4.51.3
Log output
2.7.0+cu128
4.51.3
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\TensorCompare.cu:112: block: [0,0,0], thread: [0,0,0] Assertion `input[0] != 0` failed.
Traceback (most recent call last):
  File "D:\PycharmProjects\Qwen3\bug_report.py", line 48, in <module>
    output_ids = model.generate(**inputs, max_new_tokens=120)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\transformers\generation\utils.py", line 2465, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\transformers\generation\utils.py", line 3476, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
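A note on reading the trace: CUDA device-side asserts are reported asynchronously, so the Python stack can lag behind the kernel launch that actually failed. A minimal sketch, assuming a stock pip wheel of torch (where recompiling with TORCH_USE_CUDA_DSA is impractical), is to make launches synchronous via CUDA_LAUNCH_BLOCKING:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # read when the CUDA context is created, so set it before the first CUDA call
import torch  # importing afterwards is fine; the variable only needs to be set before CUDA initializes

The repro script below sets the same variable before any CUDA work, so the trace above should already point close to the real failure site.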
Description
Steps to reproduce
This happens with Qwen3-0.6B and Qwen3-1.7B (other models not tested).
The problem can be reproduced with the following steps:
Run the following code:
import warnings
import transformers
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message="Failed to load image Python extension.*")
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, get_scheduler
from transformers.models.qwen2.tokenization_qwen2_fast import Qwen2TokenizerFast
from transformers.models.qwen3.modeling_qwen3 import Qwen3ForCausalLM
from torch.optim import Adam
import os
if __name__ == '__main__':
    print(torch.__version__)
    print(transformers.__version__)
    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

    model_name = "../dl_models/Qwen3-0.6B"  # local
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer: Qwen2TokenizerFast = AutoTokenizer.from_pretrained(model_name)
    model: Qwen3ForCausalLM = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.95))
    optimizer.zero_grad()

    # single training step on one sequence
    model.train()
    train_seq = ["<think>\n\n</think>\n\n翻译:行我觉得如果我想要在前赶回宿舍的话,我就得尽快把事情做完。<|im_end|>"]
    inputs = tokenizer(train_seq, return_tensors="pt", padding=True).to(device)
    outputs = model(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        labels=inputs.input_ids
    )
    loss = outputs.loss
    loss.backward()
    optimizer.step()

    # generation right after the training step triggers the assert
    with torch.no_grad():
        model.eval()
        test_seq = "<|im_start|>user\n你好。<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
        inputs = tokenizer(test_seq, return_tensors="pt").to(device)
        output_ids = model.generate(**inputs, max_new_tokens=120)
The error shown in the log output above will occur.
Here model_name = "../dl_models/Qwen3-0.6B" is the latest model pulled from Hugging Face.
Expected results
The code runs correctly with no errors.
Attempts to fix
Any of the following changes makes the code run normally:
- Setting device = "cpu"
- Changing train_seq to a different sentence; even just removing a single Chinese quotation mark from it (the one after the word “翻译”) is enough
- Removing optimizer.step() (see the diagnostic sketch after this list)
- Making no changes and porting the code to the following environment:
RTX2070
CUDA Version: 12.9
torch==2.7.0+cu126
transformers==4.51.3
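Since removing optimizer.step() alone avoids the crash, a hedged diagnostic sketch (my assumption, not verified on the failing machine): check whether loss.backward() already produced non-finite gradients that the Adam update would then write into the weights. Insert between loss.backward() and optimizer.step() in the repro script:

# list parameters whose gradient is NaN/Inf after backward(); if any show up,
# optimizer.step() will propagate the non-finite values into the weights
bad_grads = [name for name, p in model.named_parameters()
             if p.grad is not None and not torch.isfinite(p.grad).all()]
print("loss:", loss.item())
print("non-finite grads:", bad_grads or "none")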
Anything else helpful for investigation
The error originates from transformers\generation\utils.py line 3476:
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
where every entry of probs is NaN.
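To narrow down where the NaN first appears, a sketch (an assumption, not from the original run) that can be appended after optimizer.step(): if the updated weights are already non-finite, every downstream softmax will be all-NaN, matching the probs observed above.

# scan the updated weights for NaN/Inf
bad_params = [name for name, p in model.named_parameters()
              if not torch.isfinite(p).all()]
print("non-finite params:", bad_params or "none")

# check the raw next-token logits that generate() samples from
with torch.no_grad():
    logits = model(**inputs).logits
print("logits all finite:", torch.isfinite(logits).all().item())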