TensorRT produces wrong results when running valid onnx model on GPU 3080 · Issue #4473 · NVIDIA/TensorRT · GitHub

TensorRT produces wrong results when running valid onnx model on GPU 3080 #4473


Open
coffezhou opened this issue May 29, 2025 · 0 comments
Labels
Module:Accuracy (Output mismatch between TensorRT and other frameworks), triaged (Issue has been triaged by maintainers)

Description

For the following valid ONNX model,

[Image: ONNX model graph]

it can be executed by onnxruntime. The results are as follows:

ONNXRuntime:
 [array(0.48268864, dtype=float32)]

However, when I run this model with TensorRT, it produces incorrect results:

TensorRT:
 [array(0., dtype=float32)]
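
For reference, the mismatch can also be checked programmatically. A minimal sketch, assuming ort_output and trt_output are the result lists produced by the reproduction script below (the tolerances are arbitrary choices):

import numpy as np

# hypothetical check appended after both runs of the script below;
# it fails here because TensorRT returns 0. while ONNXRuntime returns ~0.4827
for ort_out, trt_out in zip(ort_output, trt_output):
    np.testing.assert_allclose(np.asarray(trt_out), np.asarray(ort_out), rtol=1e-3, atol=1e-5)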

Environment

TensorRT Version: 10.11.0.33

NVIDIA GPU: GeForce RTX 3080

NVIDIA Driver Version: 535.183.01

CUDA Version: 12.2

CUDNN Version: none

Operating System: Ubuntu 20.04

Python Version (if applicable): 3.12.9

Tensorflow Version (if applicable): none

PyTorch Version (if applicable): none

Baremetal or Container (if so, version): none

Steps To Reproduce

This bug can be reproduced with the following code and the model in the attached testcase.zip. As shown in the code, the model can be executed by onnxruntime.

from typing import Dict, List, Literal, Optional
import sys
import os

import numpy as np
import onnx
import onnxruntime
from onnx import ModelProto, TensorProto, helper, mapping

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

import argparse
import pickle


def test():
    onnx_model = onnx.load('1111.onnx')
    
    with open("inputs.pkl", "rb") as fp:
        inputs = pickle.load(fp)

    try:
        ort_session = onnxruntime.InferenceSession(
            onnx_model.SerializeToString(), providers=["CPUExecutionProvider"]
        )
        ort_output = ort_session.run([], inputs)
    except Exception as e:
        print(e)
        print("This model cannot be executed by onnxruntime!")
        sys.exit(1)
    
    print("ONNXRuntime:\n", ort_output)
    
    #--------------------------------------------------------
        
    trt_logger = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(trt_logger, '')
    builder = trt.Builder(trt_logger)
    #network = builder.create_network()
    network = builder.create_network(flags=1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

    parser = trt.OnnxParser(network, trt_logger)
    with open('1111.onnx', 'rb') as model_file:
        if not parser.parse(model_file.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            sys.exit(1)
            
    config = builder.create_builder_config()
    serialized_engine = builder.build_serialized_network(network, config)
    
    with open("engine.trt", "wb") as f:
        f.write(serialized_engine)
        
    with open("engine.trt", "rb") as f, trt.Runtime(trt_logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
        
    context = engine.create_execution_context()

    inputs_trt, outputs_trt, bindings = [], [], []
    stream = cuda.Stream()
    input_name = []
    output_shape_dtype = []
    #------------------------------------------------------------
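    # allocate a page-locked host buffer and a matching device buffer for each engine I/O tensor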
    for binding in engine:
        size = trt.volume(engine.get_tensor_shape(binding))
        dtype = trt.nptype(engine.get_tensor_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append({'name':binding, 'address':int(device_mem)})
        
        if engine.get_tensor_mode(binding) == trt.TensorIOMode.INPUT:
            inputs_trt.append({'host': host_mem, 'device': device_mem})
            input_name.append(binding)
        else:
            outputs_trt.append({'host': host_mem, 'device': device_mem})
            output_shape = engine.get_tensor_shape(binding)
            output_shape_dtype.append({'shape':output_shape, 'dtype':dtype})

    for i, input_mem in enumerate(inputs_trt):
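        # flatten the input, copy it into the page-locked host buffer, then asynchronously to the device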
        inp = np.ravel(inputs[input_name[i]])
        np.copyto(input_mem['host'], inp)
        cuda.memcpy_htod_async(input_mem['device'], input_mem['host'], stream)

    for bind in bindings:
        name = bind['name']
        addr = bind['address']
        context.set_tensor_address(name, addr)
    
    context.execute_async_v3(stream_handle=stream.handle)
    
    trt_output = []
    for i, output_mem in enumerate(outputs_trt):
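        # copy the output back from the device and reshape it to the engine's reported shape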
        cuda.memcpy_dtoh_async(output_mem['host'], output_mem['device'], stream)
        out_shape = output_shape_dtype[i]['shape']
        out = output_mem['host'].reshape(out_shape)
        trt_output.append(out)

    stream.synchronize()
    
    print("TensorRT:\n",trt_output)

    
if __name__ == "__main__":
    test()
    

testcase.zip

Commands or scripts:

Have you tried the latest release?: yes

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): the model can be executed by onnxruntime.
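
The same ONNXRuntime-vs-TensorRT comparison can presumably also be expressed with Polygraphy's Python API. A minimal sketch, assuming Polygraphy is installed and inputs.pkl holds a dict mapping input names to numpy arrays (as in the reproduction script above):

import pickle

from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

with open("inputs.pkl", "rb") as fp:
    feed = pickle.load(fp)

# build an ONNXRuntime runner and a TensorRT runner from the same ONNX file
runners = [
    OnnxrtRunner(SessionFromOnnx("1111.onnx")),
    TrtRunner(EngineFromNetwork(NetworkFromOnnxPath("1111.onnx"))),
]

# run both backends on the pickled inputs and compare their outputs
results = Comparator.run(runners, data_loader=[feed])
assert bool(Comparator.compare_accuracy(results))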

@poweiw added the triaged (Issue has been triaged by maintainers) and Module:Accuracy (Output mismatch between TensorRT and other frameworks) labels on Jun 4, 2025