Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request. Since I am adding challenging model optimizations and fixing bugs almost daily, I sometimes embed potential bugs that slip through CI's regression testing. Therefore, if you encounter a new problem, I recommend trying a package that is a few versions older, or the latest package that will be released in a few days.
https://github.com/PINTO0309/onnx2tf/wiki/model_status
✔️: Supported / ✅: Partial support / Help wanted: Pull Requests are welcome
See the list of supported layers
| OP | Status | OP | Status | OP | Status | OP | Status |
|:---|:---:|:---|:---:|:---|:---:|:---|:---:|
| Abs | ✔️ | Acosh | ✔️ | Acos | ✔️ | Add | ✔️ |
| And | ✔️ | ArgMax | ✔️ | ArgMin | ✔️ | Asinh | ✔️ |
| Asin | ✔️ | Atanh | ✔️ | Atan | ✔️ | AveragePool | ✔️ |
| BatchNormalization | ✔️ | Bernoulli | ✔️ | BitShift | ✔️ | BitwiseAnd | Help wanted |
| BitwiseNot | Help wanted | BitwiseOr | Help wanted | BitwiseXor | Help wanted | Cast | ✔️ |
| Ceil | ✔️ | Celu | ✔️ | CenterCropPad | Help wanted | Clip | ✔️ |
| Col2Im | ✅ | Compress | ✔️ | ConcatFromSequence | ✔️ | Concat | ✔️ |
| ConstantOfShape | ✔️ | Constant | ✔️ | Conv | ✔️ | ConvInteger | ✅ |
| ConvTranspose | ✔️ | Cosh | ✔️ | Cos | ✔️ | CumSum | ✔️ |
| DeformConv | Help wanted | DepthToSpace | ✔️ | Det | ✔️ | DequantizeLinear | ✔️ |
| DFT | Help wanted | Div | ✔️ | Dropout | ✔️ | DynamicQuantizeLinear | ✔️ |
| Einsum | ✔️ | Elu | ✔️ | Equal | ✔️ | Erf | ✔️ |
| Expand | ✔️ | Exp | ✔️ | EyeLike | ✔️ | Flatten | ✔️ |
| Floor | ✔️ | FusedConv | ✔️ | GatherElements | ✔️ | GatherND | ✔️ |
| Gather | ✔️ | Gelu | ✔️ | Gemm | ✔️ | GlobalAveragePool | ✔️ |
| GlobalLpPool | ✔️ | GlobalMaxPool | ✔️ | GreaterOrEqual | ✔️ | Greater | ✔️ |
| GridSample | ✅ | GroupNormalization | Help wanted | GRU | ✔️ | HammingWindow | ✅ |
| HannWindow | ✅ | Hardmax | ✔️ | HardSigmoid | ✔️ | HardSwish | ✔️ |
| Identity | ✔️ | If | ✔️ | Input | ✔️ | InstanceNormalization | ✔️ |
| Inverse | ✔️ | IsInf | ✔️ | IsNaN | ✔️ | LayerNormalization | ✔️ |
| LeakyRelu | ✔️ | LessOrEqual | ✔️ | Less | ✔️ | Log | ✔️ |
| LogSoftmax | ✔️ | Loop | Help wanted | LpNormalization | ✔️ | LRN | ✔️ |
| LSTM | ✔️ | MatMul | ✔️ | MatMulInteger | ✔️ | MaxPool | ✔️ |
| Max | ✔️ | MaxRoiPool | Help wanted | MaxUnpool | ✔️ | Mean | ✔️ |
| MeanVarianceNormalization | ✔️ | MelWeightMatrix | ✔️ | Min | ✔️ | Mish | ✔️ |
| Mod | ✔️ | Mul | ✔️ | Multinomial | ✔️ | Neg | ✔️ |
| NonMaxSuppression | ✔️ | NonZero | ✔️ | Optional | Help wanted | OptionalGetElement | ✔️ |
| OptionalHasElement | ✔️ | Not | ✔️ | OneHot | ✔️ | Or | ✔️ |
| Pad | ✔️ | Pow | ✔️ | PRelu | ✔️ | QLinearAdd | ✔️ |
| QLinearConcat | ✔️ | QLinearConv | ✔️ | QLinearLeakyRelu | ✔️ | QLinearMatMul | ✔️ |
| QLinearMul | ✔️ | QLinearSigmoid | ✔️ | QLinearSoftmax | ✔️ | QuantizeLinear | ✔️ |
| RandomNormalLike | ✔️ | RandomNormal | ✔️ | RandomUniformLike | ✔️ | RandomUniform | ✔️ |
| Range | ✔️ | Reciprocal | ✔️ | ReduceL1 | ✔️ | ReduceL2 | ✔️ |
| ReduceLogSum | ✔️ | ReduceLogSumExp | ✔️ | ReduceMax | ✔️ | ReduceMean | ✔️ |
| ReduceMin | ✔️ | ReduceProd | ✔️ | ReduceSum | ✔️ | ReduceSumSquare | ✔️ |
| Relu | ✔️ | Reshape | ✔️ | Resize | ✔️ | ReverseSequence | ✔️ |
| RNN | ✔️ | RoiAlign | ✔️ | Round | ✔️ | ScaleAndTranslate | ✔️ |
| Scatter | ✔️ | ScatterElements | ✔️ | ScatterND | ✔️ | Scan | Help wanted |
| Selu | ✔️ | SequenceAt | ✔️ | SequenceConstruct | ✔️ | SequenceEmpty | ✔️ |
| SequenceErase | ✔️ | SequenceInsert | ✔️ | SequenceLength | ✔️ | Shape | ✔️ |
| Shrink | ✔️ | Sigmoid | ✔️ | Sign | ✔️ | Sinh | ✔️ |
| Sin | ✔️ | Size | ✔️ | Slice | ✔️ | Softmax | ✔️ |
| Softplus | ✔️ | Softsign | ✔️ | SpaceToDepth | ✔️ | Split | ✔️ |
| SplitToSequence | ✔️ | Sqrt | ✔️ | Squeeze | ✔️ | STFT | ✅ |
| StringNormalizer | ✅ | Sub | ✔️ | Sum | ✔️ | Tanh | ✔️ |
| Tan | ✔️ | TfIdfVectorizer | Help wanted | ThresholdedRelu | ✔️ | Tile | ✔️ |
| TopK | ✔️ | Transpose | ✔️ | Trilu | ✔️ | Unique | ✔️ |
| Unsqueeze | ✔️ | Upsample | ✔️ | Where | ✔️ | Xor | ✔️ |
The video is played back approximately 50 times slower than actual speed.
- Linux / Windows
- onnx-simplifier==0.4.33 or 0.4.30 (if one version raises the error `onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] (op_type:Slice, node name: /xxxx/Slice): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (x) vs (y)`, try the other)
- onnx_graphsurgeon
- simple_onnx_processing_tools
- tensorflow==2.15.0, Note: #515, Special bugs: #436
- psutil==5.9.5
- ml_dtypes==0.2.0
- flatbuffers-compiler (Optional. Required only when using the `-coion` option. Provides the executable named `flatc`.)

  # Custom flatc v23.5.26 binary for Ubuntu 20.04+
  # https://github.com/PINTO0309/onnx2tf/issues/196
  wget https://github.com/PINTO0309/onnx2tf/releases/download/1.16.31/flatc.tar.gz \
    && tar -zxvf flatc.tar.gz \
    && sudo chmod +x flatc \
    && sudo mv flatc /usr/bin/
Note: If you are using TensorFlow v2.13.0 or earlier, use a version older than onnx2tf v1.17.5. onnx2tf v1.17.6 or later will not work properly due to changes in TensorFlow's API. See: #515
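As a quick sanity check, a minimal sketch that only restates the note above (the version boundaries come from that note, not from any onnx2tf API):

```python
# Minimal sketch: check which onnx2tf line matches the installed TensorFlow,
# based on the note above (TF <= 2.13.x -> onnx2tf <= 1.17.5, see issue #515).
import tensorflow as tf

major, minor = (int(v) for v in tf.__version__.split(".")[:2])
if (major, minor) <= (2, 13):
    print("TensorFlow", tf.__version__, "-> use onnx2tf v1.17.5 or older (see #515).")
else:
    print("TensorFlow", tf.__version__, "-> onnx2tf v1.17.6 or later should work.")
```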
- HostPC
When using GHCR, see Authenticating to the Container registry.
# PAT authentication is required to pull from GHCR.
docker login ghcr.io
Username (xxxx): {Enter}
Password: {Personal Access Token}
Login Succeeded

docker run --rm -it \
  -v `pwd`:/workdir \
  -w /workdir \
  ghcr.io/pinto0309/onnx2tf:1.19.6

or

# Authentication is not required for pulls from Docker Hub.
docker run --rm -it \
  -v `pwd`:/workdir \
  -w /workdir \
  docker.io/pinto0309/onnx2tf:1.19.6

or

pip install -U \
&& pip install -U nvidia-pyindex \
&& pip install -U onnx-graphsurgeon \
&& pip install -U \
&& pip install -U \
&& pip install -U simple_onnx_processing_tools \
&& pip install -U onnx2tf \
&& pip install -U h5py==3.7.0 \
&& pip install -U psutil==5.9.5 \
&& pip install -U ml_dtypes==0.2.0

or

pip install -e .
or
- Google Colaboratory Python3.10
!sudo apt-get -y update
!sudo apt-get -y install python3-pip
!sudo apt-get -y install python-is-python3

!wget https://github.com/PINTO0309/onnx2tf/releases/download/1.16.31/flatc.tar.gz \
  && tar -zxvf flatc.tar.gz \
  && sudo chmod +x flatc \
  && sudo mv flatc /usr/bin/

!pip install -U pip \
  && pip install tensorflow==2.15.0 \
  && pip install -U \
  && python -m pip install onnx_graphsurgeon \
       --index-url https://pypi.ngc.nvidia.com \
  && pip install -U \
  && pip install -U \
  && pip install -U simple_onnx_processing_tools \
  && pip install -U onnx2tf \
  && pip install -U protobuf==3.20.3 \
  && pip install -U h5py==3.7.0 \
  && pip install -U psutil==5.9.5 \
  && pip install -U ml_dtypes==0.2.0
Only the patterns that are expected to be used particularly frequently are described below. In addition, there are several other options, such as disabling Flex OPs and options for improving inference performance. See: CLI Parameter
# Float32, Float16
# This is the fastest way to generate tflite,
# but the accompanying saved_model will not have a signature.
# "ValueError: Only support at least one signature key."
# If you are having trouble with this error, please use the `-osd` option.
wget https://github.com/PINTO0309/onnx2tf/releases/download/0.0.2/resnet18-v1-7.onnx
onnx2tf -i resnet18-v1-7.onnx
# saved_model with signaturedefs added.
# Output in the form of saved_model that can be used for serving
# or conversion to other frameworks. e.g. TensorFlow.js, CoreML, etc
# https://github.com/PINTO0309/onnx2tf#15-conversion-to-tensorflowjs
# https://github.com/PINTO0309/onnx2tf#16-conversion-to-coreml
wget https://github.com/PINTO0309/onnx2tf/releases/download/0.0.2/resnet18-v1-7.onnx
onnx2tf -i resnet18-v1-7.onnx -osd
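If you want to confirm that the `-osd` output actually carries a SignatureDef, a minimal sketch with the TensorFlow API looks like this (it assumes onnx2tf wrote to the default `saved_model` directory; adjust the path if you changed the output folder):

```python
# Minimal sketch: verify that the saved_model produced with `-osd` has signatures.
# Assumes onnx2tf wrote its output to the default "saved_model" directory.
import tensorflow as tf

loaded = tf.saved_model.load("saved_model")
print("Signature keys:", list(loaded.signatures.keys()))  # e.g. ['serving_default']

infer = loaded.signatures["serving_default"]
print("Inputs :", infer.structured_input_signature)
print("Outputs:", infer.structured_outputs)
```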
# In the interest of efficiency for my development and debugging of onnx2tf,
# the default configuration shows a large amount of debug level logs.
# However, for most users, a large number of debug logs are unnecessary.
# If you want to reduce the amount of information displayed in the conversion log,
# you can change the amount of information in the log by specifying the
# `--verbosity` or `-v` option as follows.
# Possible values are "debug", "info", "warn", and "error".
wget https://github.com/PINTO0309/onnx2tf/releases/download/0.0.2/resnet18-v1-7.onnx
onnx2tf -i resnet18-v1-7.onnx -v info
# Override undefined batch size or other dimensions with static values.
# If the model has undefined dimensions, rewriting them to a static size will significantly
# improve the success rate of the conversion.
# The `-b` option overwrites the batch size (dimension 0) with the specified number
# and does not require the input OP name.
# Note that if there are multiple input OPs, dimension 0 of all input OPs is
# forcibly rewritten.
# The `-ois` option allows undefined dimensions in any position, including
# dimension 0, to be overwritten with a static shape, but requires
# the input OP name to be specified.
# e.g. -ois data1:1,3,224,224 data2:1,255 data3:1,224,6
wget https://github.com/PINTO0309/onnx2tf/releases/download/0.0.2/resnet18-v1-7.onnx
onnx2tf -i resnet18-v1-7.onnx -b 1
or
onnx2tf -i resnet18-v1-7.onnx -ois data:1,3,224,224
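If you are not sure whether a model has undefined dimensions (and therefore whether `-b` or `-ois` is needed), you can inspect the ONNX inputs directly; a small sketch using the `onnx` package:

```python
# Minimal sketch: list each input OP and flag dimensions that are undefined
# (symbolic dim_param or missing dim_value), which are candidates for -b / -ois.
import onnx

model = onnx.load("resnet18-v1-7.onnx")
for inp in model.graph.input:
    dims = []
    for d in inp.type.tensor_type.shape.dim:
        if d.HasField("dim_value"):
            dims.append(str(d.dim_value))
        else:
            dims.append(d.dim_param or "?")  # undefined / symbolic dimension
    print(f"{inp.name}: [{', '.join(dims)}]")
```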
# Suppress automatic transposition of input OPs from NCW, NCHW, NCDHW to NWC, NHWC, NDHWC.
# By specification, onnx2tf automatically transposes the input OPs to [N,H,W,C] format
# before converting the model. However, since onnx2tf cannot determine from the structure of
# the model whether the input data is image data, audio data, or something else, it
# unconditionally transposes the channels. Therefore, STT/TTS models, whose inputs are not
# NHWC, tend to have particular problems with the automatic transposition of the input OP.
# If you do not want input OPs to be automatically transposed, you can disable automatic
# transposition of input OPs by specifying the `-kat` option.
wget https://github.com/PINTO0309/onnx2tf/releases/download/1.1.28/double_gru.onnx
# INPUT OPs: "spec": float32[1,3,257,1], "states_in": float32[2,1,32]
# The following command suppresses the automatic transposition of "states_in" and converts it.
onnx2tf -i double_gru.onnx -kat states_in
# Keras h5 format
# .h5, .json, .keras, .weights.h5, .weights.keras, .data-00000-of-00001, .index
wget https://github.com/PINTO0309/onnx2tf/releases/download/0.0.2/resnet18-v1-7.onnx
onnx2tf -i resnet18-v1-7.onnx -oh5
# Keras keras_v3 format (TensorFlow v2.12.0 or later only)
wget https://github.com/PINTO0309/onnx2tf/releases/download/0.0.2/resnet18-v1-7.onnx
onnx2tf -i resnet18-v1-7.onnx -okv3
# TensorFlow v1 (.pb) format
wget https://github.com/PINTO0309/onnx2tf/releases/download/0.0.2/resnet18-v1-7.onnx
onnx2tf -i resnet18-v1-7.onnx -otfv1pb
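A frozen TensorFlow v1 graph produced with `-otfv1pb` can be loaded with the `tf.compat.v1` API. A minimal sketch follows; the `.pb` file name below is an assumption, so use whatever file onnx2tf actually wrote to your output directory:

```python
# Minimal sketch: import a frozen TF v1 GraphDef produced with -otfv1pb.
# The file name "saved_model/resnet18-v1-7_float32.pb" is an assumption;
# use the actual .pb written to your output directory.
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("saved_model/resnet18-v1-7_float32.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.compat.v1.import_graph_def(graph_def, name="")
    print("Number of nodes:", len(graph_def.node))
    print("First op names:", [n.name for n in graph_def.node[:5]])
```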
# INT8 Quantization, Full INT8 Quantization
# INT8 Quantization with INT16 activation, Full INT8 Quantization with INT16 activation
# Dynamic Range Quantization
wget https://github.com/PINTO0309/onnx2tf/releases/download/1.1.1/emotion-ferplus-8.onnx
# INT8 Quantization (per-channel)
onnx2tf -i emotion-ferplus-8.onnx -oiqt
# INT8 Quantization (per-tensor)
onnx2tf -i emotion-ferplus-8.onnx -oiqt -qt per-tensor
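To feed a fully quantized model correctly, inputs have to be quantized with the scale/zero-point stored in the tflite file and outputs dequantized again. The sketch below shows this with `tf.lite.Interpreter`; the quantized file name is an assumption, so pick the integer-quantized `.tflite` that onnx2tf actually produced in your output directory:

```python
# Minimal sketch: run an INT8-quantized tflite model produced with -oiqt.
# The file name below is an assumption; use the integer-quantized .tflite
# that onnx2tf actually wrote to your output directory.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="saved_model/emotion-ferplus-8_full_integer_quant.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantize a float input with the stored (scale, zero_point), run, then dequantize.
scale, zero_point = inp["quantization"]
x = np.random.rand(*inp["shape"]).astype(np.float32)
if scale > 0:  # input tensor is quantized (e.g. int8/uint8)
    info = np.iinfo(inp["dtype"])
    x = np.clip(x / scale + zero_point, info.min, info.max).round().astype(inp["dtype"])
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()

y = interpreter.get_tensor(out["index"]).astype(np.float32)
o_scale, o_zero_point = out["quantization"]
if o_scale > 0:
    y = (y - o_zero_point) * o_scale
print("Dequantized output:", y.flatten()[:5])
```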
# Split the model at the middle position for debugging
# Specify the output name of the OP
onnx2tf -i resnet18-v1-7.onnx -onimc resnetv15_stage2_conv1_fwd resnetv15_stage2_conv2_fwd
# Suppress generation of Flex OP and replace with Pseudo-Function
# [Asin, Acos, Atan, Abs, PReLU, LeakyReLU, Power, GatherND, Neg, HardSwish, Erf, GeLU]
# Below is a sample of replacing Erf with another set of operations.
wget https://s3.ap-northeast-2.wasabisys.com/temp-models/onnx2tf_readme/Erf_11.onnx
onnx2tf -i Erf_11.onnx -rtpo Erf
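For intuition about what a pseudo-function replacement means, the sketch below approximates Erf with elementary operations only (an Abramowitz–Stegun style polynomial). This is purely illustrative and not necessarily the exact operation set onnx2tf emits for `-rtpo Erf`:

```python
# Minimal sketch: approximate Erf using only elementary ops (exp, mul, add),
# in the spirit of replacing a Flex/unsupported OP with a pseudo-function.
# NOTE: illustrative only; onnx2tf's actual -rtpo replacement may differ.
import math
import numpy as np

def pseudo_erf(x: np.ndarray) -> np.ndarray:
    # Abramowitz & Stegun 7.1.26 polynomial approximation (~1.5e-7 max error).
    a1, a2, a3, a4, a5 = 0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429
    p = 0.3275911
    sign = np.sign(x)
    ax = np.abs(x)
    t = 1.0 / (1.0 + p * ax)
    y = 1.0 - (((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t) * np.exp(-ax * ax)
    return sign * y

xs = np.linspace(-3.0, 3.0, 7)
ref = np.array([math.erf(v) for v in xs])
print("max abs error:", np.max(np.abs(pseudo_erf(xs) - ref)))  # ~1e-7
```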
# High-dimensional Transpose decomposition
# If you do not like FlexTranspose being generated, try `-nodaftc`.
# Suppresses the generation of FlexTranspose by decomposing Transpose
# to the specified number of dimensions.
# In TensorFlow v2.12.0 and later, up to 6 dimensions are converted to normal Transpose;
# in v2.11.0 and earlier, up to 5 dimensions are converted to normal Transpose.
# Note that specifying `2` for the `-nodaftc` option causes all Transpose OPs to disappear
# from the model structure.
# Below is an example of decomposing a Transpose of 5 or more dimensions into a Transpose
# of 4 dimensions.
onnx2tf -i xxxx.onnx -nodaftc 4
# High-dimensional Slice(StridedSlice) decomposition
# If your special circumstances do not allow you to deploy a `StridedSlice` with more than
# 5 dimensions to a device, you can use the `-nodafsc` option to decompose the `StridedSlice`
# into a process with 4 or fewer dimensions.
# Below is an example of decomposing a `StridedSlice` of 5 or more dimensions into a
# `StridedSlice` of 4 dimensions.
onnx2tf -i xxxx.onnx -nodafsc 4
# Float16 inference doubling on devices with ARM64 ARMv8.2 or higher instruction set
# Double the inference speed with Float16 precision tflite models on devices with
# high-performance CPUs such as Snapdragon.
# (Pixel 3a, Pixel 5a, Pixel 7, Galaxy M12 and Galaxy S22, ...)
# XNNPACK float16 inference on certain ARM64 cores is 2x faster.
# Unfortunately, Float16 inference cannot be accelerated when using the RaspberryPi4's
# ARM64 CPU.
onnx2tf -i xxxx.onnx -eatfp16
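To check whether the Float16 model is actually faster on your target, a rough latency measurement with the TFLite interpreter looks like the sketch below. The model path is a placeholder, and the 2x speed-up only appears on ARMv8.2+ cores, so run it on the target device:

```python
# Minimal sketch: measure average latency of a Float16 tflite model.
# The XNNPACK Float16 speed-up only shows up on ARMv8.2+ CPUs,
# so run this on the target device. The model path is a placeholder.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="saved_model/xxxx_float16.tflite", num_threads=4)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(
    inp["index"], np.random.rand(*inp["shape"]).astype(inp["dtype"]))

interpreter.invoke()  # warm-up
runs = 100
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
print(f"mean latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")
```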
# Parameter replacement (Resize,Transpose,Softmax)
rm replace.json
wget https://github.com/PINTO0309/onnx2tf/releases/download/1.1.27/human_segmentation_pphumanseg_2021oct.onnx
wget https://github.com/PINTO0309/onnx2tf/releases/download/1.1.27/replace.json
onnx2tf -i human_segmentation_pphumanseg_2021oct.onnx -prf replace.json
Perform error checking of ONNX output and TensorFlow output. Verify that the error of all outputs, one operation at a time, is below a certain threshold. This automatically determines before and after which OPs the tool's automatic conversion of the model failed, so you can see where dimensional compression, dimensional expansion, and dimensional transposition by `Reshape` and `Transpose` are failing. Once you have identified the problem area, you can refer to the tutorial on Parameter replacement to modify the tool's behavior.
After many upgrades, the need for JSON parameter correction has become much less common, but there are still some edge cases where JSON correction is required. If the PC has sufficient free RAM, onnx2tf converts the model while carefully performing accuracy checks on all OPs. Thus, in exchange for a higher conversion success rate, the conversion speed is a little slower. If the amount of RAM required for the accuracy check is expected to exceed 80% of the total available RAM of the entire PC, the conversion is performed without an accuracy check. Therefore, if the accuracy of the converted model turns out to be significantly degraded, re-converting on a PC with a large amount of RAM may automatically correct it. For example, my PC has 128GB of RAM, but the StableDiffusion v1.5 model is so complex in structure that it consumed about 180GB in total, including 50GB of swap space.
- `-ois`: an option to overwrite input OPs with a static size when they have undefined dimensions.
- `-cotof`: checks the accuracy of all OPs one by one.
- `-cotoa`: the threshold error value used to decide whether an accuracy error has occurred.

If there are undefined dimensions in the input OPs, it is better to fix them to a static shape to improve the reliability of the accuracy measurement. You can also use the `-cind` option to specify custom inputs for `-cotof` instead of the default dummy inputs; otherwise, all input values are set to 1. For more information about the `-cind` option, please refer to here.
Note that the `-cotof` option only compares the original ONNX and the converted TensorFlow (Keras) models at Float32 precision, not at Float16 or INT8 precision.
onnx2tf -i mobilenetv2-12.onnx -ois input:1,3,224,224 -cotof -cotoa 1e-1
or
onnx2tf -i mobilenetv2-12.onnx -b 1 -cotof -cotoa 1e-1
or
onnx2tf -i mobilenetv2-12.onnx -cotof -cotoa 1e-1 -cind "input" "/your/path/x.npy"
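The `-cind` input is just a NumPy array saved as `.npy` with the shape of the ONNX input OP. The sketch below creates such a file and then reproduces, by hand, the kind of Float32 final-output comparison that `-cotof` automates; the tflite file path is an assumption based on the default output naming, so adapt it to your model:

```python
# Minimal sketch: create a custom input for -cind and reproduce, by hand,
# the kind of Float32 error check that -cotof performs automatically.
# The tflite file path is an assumption; use the file actually produced.
import numpy as np
import onnxruntime
import tensorflow as tf

# 1. Save a custom input for -cind (shape must match the ONNX input OP).
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
np.save("x.npy", x)

# 2. Run the original ONNX model.
sess = onnxruntime.InferenceSession("mobilenetv2-12.onnx")
onnx_out = sess.run(None, {"input": x})[0]

# 3. Run the converted tflite model (NHWC), transposing the input from NCHW.
interpreter = tf.lite.Interpreter(
    model_path="saved_model/mobilenetv2-12_float32.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], x.transpose(0, 2, 3, 1))
interpreter.invoke()
tfl_out = interpreter.get_tensor(out["index"])

# 4. Compare the final outputs against a threshold (here the same 1e-1 as -cotoa).
print("max abs error:", np.max(np.abs(onnx_out - tfl_out)))
```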
If you want to match tflite's input/output OP names and the order of input/output OPs with ONNX, output the tflite file with the `-coion` / `--copy_onnx_input_output_names_to_tflite` option and then run inference with `interpreter.get_signature_runner()`. See: PINTO0309#228
import torch
import onnxruntime
import numpy as np
import onnx2tf
import tensorflow as tf
from tensorflow.lite.python import interpreter as tflite_interpreter
class Model(torch.nn.Module):
def forward(self, x, y):
return {
"add": x + y,
"sub": x - y,
}
# Let's double check what PyTorch gives us
model = Model()
pytorch_output = model.forward(10, 2)
print("[PyTorch] Model Predictions:", pytorch_output)
# First, export the above model to ONNX
torch.onnx.export(
Model(),
{"x": 10, "y": 2},
"model.onnx",
opset_version=16,
input_names=["x", "y"],
output_names=["add", "sub"],
)
# And check its output
session = onnxruntime.InferenceSession("model.onnx")
onnx_output = session.run(["add", "sub"], {"x": np.array(10), "y": np.array(2)})
print("[ONNX] Model Outputs:", [o.name for o in session.get_outputs()])
print("[ONNX] Model Predictions:", onnx_output)
# Now, let's convert the ONNX model to TF
onnx2tf.convert(
input_onnx_file_path="model.onnx",
output_folder_path="model.tf",
copy_onnx_input_output_names_to_tflite=True,
non_verbose=True,
)
# Now, test the newer TFLite model
interpreter = tf.lite.Interpreter(model_path="model.tf/model_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
'x': np.asarray([10], dtype=np.int64),
'y': np.asarray([2], dtype=np.int64),
}
tf_lite_output = tf_lite_model(**inputs)
print("[TFLite] Model Predictions:", tf_lite_output)
[PyTorch] Model Predictions:
{
'add': 12,
'sub': 8
}
[ONNX] Model Outputs:
[
'add',
'sub'
]
[ONNX] Model Predictions:
[
array(12, dtype=int64),
array(8, dtype=int64)
]
[TFLite] Model Predictions:
{
'add': array([12]),
'sub': array([8])
}
If you do not like tflite input/output names such as `serving_default_*:0` or `StatefulPartitionedCall:0`, you can rewrite them using the following tool and procedure. Any name can be rewritten to any other name, so you are not stuck with `serving_default_*:0` or `StatefulPartitionedCall:0`.
https://github.com/PINTO0309/tflite-input-output-rewriter
# Install custom flatc
wget https://github.com/PINTO0309/onnx2tf/releases/download/1.7.3/flatc.tar.gz \
&& tar -zxvf flatc.tar.gz \
&& sudo chmod +x flatc \
&& sudo mv flatc /usr/bin/ \
&& rm flatc.tar.gz
# Path check
which flatc
/usr/bin/flatc
# Install tfliteiorewriter
pip install -U tfliteiorewriter
- Before
tfliteiorewriter \
  -i xxxx.tflite \
  -r serving_default_input_1:0 aaa \
  -r StatefulPartitionedCall:0 bbb