Some problems I found in Vitis AI 1.4.1 and Vitis AI 2.x with quantization and compilation step · Issue #997 · Xilinx/Vitis-AI

Some problems I found in Vitis AI 1.4.1 and Vitis AI 2.x with quantization and compilation step #997


Open
vaan2010 opened this issue Sep 2, 2022 · 12 comments

Comments

vaan2010 commented Sep 2, 2022

Hi, I ran into problems with the quantization and compilation steps in Vitis AI 1.4.1 and Vitis AI 2.x respectively.

First, the Vitis AI 1.4.1 (or 1.4) part.
The environment and training-model sources I used are the following:

  1. Vitis AI github 1.4.1
  2. Vitis AI Docker Image 1.4.1.978
  3. Tensorflow 2.6 for Training
  4. Reference Yolov4-tiny from this github

Let me mention up front that the reference Yolov4-tiny contains the tf.split operator, which Vitis AI does not yet support, so it had to be replaced with 1x1 convolutions before the model could be converted to an xmodel.
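For illustration, here is a minimal sketch (the helper name and shapes are my own, not from the referenced repo) of how such a channel split can be emulated with a frozen 1x1 convolution, which the compiler does support:

import numpy as np
import tensorflow as tf

def split_half_as_conv1x1(x, half_index):
    """Emulate tf.split(x, 2, axis=-1)[half_index] with a frozen 1x1 conv."""
    num_channels = int(x.shape[-1])
    half = num_channels // 2
    # Identity kernel that copies the selected half of the input channels.
    kernel = np.zeros((1, 1, num_channels, half), dtype=np.float32)
    for i in range(half):
        kernel[0, 0, half_index * half + i, i] = 1.0
    conv = tf.keras.layers.Conv2D(half, 1, use_bias=False, trainable=False)
    y = conv(x)  # calling the layer builds it, so the weights can then be set
    conv.set_weights([kernel])
    return y

Because the kernel is a fixed identity selection, the conv is numerically equivalent to the split while remaining an operator the DPU can ingest.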

After training the model, I use the following code to quantize it:

from dataset import synth_input_fn
from dataset import input_fn, NUM_IMAGES
from dataset import get_images_infor_from_file, ImagenetSequence
from nets.yolo import yolo_body
from utils.utils import get_classes
from tensorflow_model_optimization.quantization.keras import vitis_quantize
import tensorflow as tf

input_shape     = [416, 416]
anchors_mask    = [[3, 4, 5], [1, 2, 3]]
phi             = 0
classes_path    = 'model_data/voc_classes.txt'
weight_decay    = 0
model_path      = './yolov4-tiny.h5'
TF2_NETWORK_PATH = '../../../'

img_paths, labels = get_images_infor_from_file(TF2_NETWORK_PATH+'images/', TF2_NETWORK_PATH+'val.txt', 1)
imagenet_seq = ImagenetSequence(img_paths[0:1000], labels[0:1000], 50)

class_names, num_classes = get_classes(classes_path)

model_body  = yolo_body((input_shape[0], input_shape[1], 3), anchors_mask, num_classes, phi = phi, weight_decay = weight_decay)

model_body.load_weights(model_path)

quantizer = vitis_quantize.VitisQuantizer(model_body)

# Post-training quantization: 2 calibration steps of batch size 10,
# i.e. 20 images out of the 1000-image calibration sequence.
quantized_model = quantizer.quantize_model(calib_dataset=imagenet_seq, calib_batch_size=10,
                                           calib_steps=2, fold_conv_bn=True, fold_bn=True,
                                           separate_conv_act=False)
# save quantized model
quantized_model.save('./quantized_model_tf2.h5')
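As a quick sanity check (my own addition, not part of the original script), the quantized model can be compared against the float model on one calibration batch; yolo_body returns a list of output tensors, hence the loop:

import numpy as np

batch = imagenet_seq[0][0]  # assumes the Sequence yields (images, labels) pairs
float_outs = model_body.predict(batch)
quant_outs = quantized_model.predict(batch)
for f, q in zip(float_outs, quant_outs):
    print("max abs diff:", np.max(np.abs(f - q)))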

After quantizing the model, I use the following command to compile it:

export TF2_NETWORK_PATH='tf2_resnet50_imagenet_224_224_7.76G_2.0'

vai_c_tensorflow2 -m ${TF2_NETWORK_PATH}/code/com/quantized_model_tf2.h5 \
                  -a /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json \
                  -o ${TF2_NETWORK_PATH}/vai_c_output \
                  -n yolov4-tiny

The result of the compilation is as follows:

[image: vai_c_tensorflow2 compilation log]

The compiled model in Netron is as follows:

[image: compiled model in Netron, split into many subgraphs]

It can be seen that Vitis AI 1.4.1 does not seem to support the Leaky ReLU operation, so the DPU operations are split into many subgraphs during compilation, and the compiled xmodel cannot run on the KV260; the following error is displayed:

[image: runtime error on the KV260]
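One way to confirm how the compiled graph was partitioned (a sketch of my own; the xmodel path is an assumption) is to list the child subgraphs and their assigned devices with the xir Python API:

import xir

g = xir.Graph.deserialize("vai_c_output/yolov4-tiny.xmodel")
# Every child subgraph is mapped to a device, typically DPU or CPU.
for sg in g.get_root_subgraph().toposort_child_subgraph():
    device = sg.get_attr("device") if sg.has_attr("device") else "unknown"
    print(sg.get_name(), "->", device)

A fully DPU-mapped model should show a single DPU child subgraph besides the input subgraph.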

But I have previously compiled Leaky ReLU successfully and had the DPU form a single graph. The previous model looked like this:

[image: previously compiled model with a single DPU graph]

You can see that Leaky ReLU was successfully fused into the Conv2D and is supported by the DPU.

That conversion also used Vitis AI 1.4.1, so why does it fail now? Could the code in my quantization script have anything to do with it?
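For what it is worth, the DPU only fuses Leaky ReLU with a negative slope of roughly 0.1 (approximated in hardware as 26/256), so one thing worth double-checking is how the activation is declared; a minimal Keras sketch, assuming a standard Conv-BN-LeakyReLU block:

import tensorflow as tf

inputs = tf.keras.Input(shape=(416, 416, 3))
x = tf.keras.layers.Conv2D(32, 3, padding="same", use_bias=False)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
# alpha must be (close to) 0.1 for the compiler to fuse the activation
# into the preceding Conv2D on the DPU; other slopes fall back to CPU.
x = tf.keras.layers.LeakyReLU(alpha=0.1)(x)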

So I tried using Vitis AI 2.x to quantize and compile.
Unfortunately, Vitis AI 2.x has problems of its own.
I followed the solution from this github,
but during the Vitis AI 2.x quantization process the following error message keeps appearing:

[image: Vitis AI 2.x quantization error message]

I checked the overall model architecture and did not find the shape [14, 14, 256] anywhere, and I also verified that there is nothing wrong with the Concat operation, so I believe Vitis AI 2.x has a quantization bug around Concat.
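To rule out a genuine mismatch, one can search the float model for any tensor with the reported shape; a small sketch of my own, with the target shape taken from the error message above:

# Print every layer whose output shape matches the one in the error.
for layer in model_body.layers:
    shapes = layer.output_shape
    if not isinstance(shapes, list):
        shapes = [shapes]
    for s in shapes:
        if tuple(s[1:]) == (14, 14, 256):
            print(layer.name, s)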

Conclusion:

  1. Does Vitis AI 1.4.x really support Leaky ReLU? If so, is there a specific way of writing the quantization code so that Leaky ReLU can be successfully compiled by Vitis AI 1.4.x?
  2. There is a bug in the quantization of the Concat operation in Vitis AI 2.x. How can it be solved?

In the end I replaced Leaky ReLU with ReLU and retrained the model; here is the graph after retraining and converting to xmodel:

[image: retrained ReLU model converted to xmodel]

It can run successfully on the KV260, but:
3. Compared with Leaky ReLU, ReLU is noticeably slower in execution speed (see the timing sketch below). Is this another bug in the DPU conversion process?
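For anyone wanting to reproduce the timing comparison, here is a hedged sketch of a raw DPU latency measurement with the VART Python API on the board (the xmodel path is an assumption; quantized tensors are int8):

import time
import numpy as np
import vart
import xir

g = xir.Graph.deserialize("yolov4-tiny.xmodel")
# Pick the DPU child subgraph from the compiled model.
dpu_sg = [s for s in g.get_root_subgraph().toposort_child_subgraph()
          if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]
runner = vart.Runner.create_runner(dpu_sg, "run")

in_t = runner.get_input_tensors()[0]
out_ts = runner.get_output_tensors()
inp = [np.zeros(tuple(in_t.dims), dtype=np.int8)]
outs = [np.zeros(tuple(t.dims), dtype=np.int8) for t in out_ts]

start = time.time()
for _ in range(100):
    job_id = runner.execute_async(inp, outs)
    runner.wait(job_id)
print("average DPU latency: %.2f ms" % ((time.time() - start) / 100 * 1000))

Running this on both the ReLU and the Leaky ReLU xmodels isolates the DPU kernel time from any CPU-side pre/post-processing.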

Attached below are the xmodel files from Vitis AI 1.4.x that ran successfully and unsuccessfully on the KV260:
Success: yolov4-tiny_success.xmodel
Fail: yolov4-tiny_fail.xmodel

I hope someone can give me some suggestions and solutions, thanks!

BR,
Norris



@zhenzhen-AMD
Contributor

Hi @vaan2010 ,
We're working to resolve this issue, and we'll let you know if there's progress.

@zhenzhen-AMD
Contributor

Hi @vaan2010 ,

Could you provide the floating-point model, the quantized model, and the code? We need these files to analyze the problem.
Thanks.

Regards

@vaan2010
Author
vaan2010 commented Sep 6, 2022

Hi @zhenzhen-AMD,

The following attachments are my model files:

Floating point model: yolov4-tiny-float.h5
Quantized model: yolov4-tiny-quantized.h5
Quantized yolov4-tiny code: code

And I put the quantization code and my float model at the Xilinx Vitis AI 1.4 lab location:
vai_q_c_tf2_pt/lab/tf2_resnet50_imagenet_224_224_7.76G_1.4/code/com/
and then put the above files under the Vitis AI directory.

Looking forward to your reply and solution.

BR,
Norris

@vaan2010
Author

Hi @zhenzhen-AMD,

Is there any progress on these quantization and compilation problems in Vitis AI 1.4.1 and Vitis AI 2.x?

@vaan2010
Author
vaan2010 commented Nov 2, 2022

Hi @zhenzhen-AMD ,
It has been almost two months of waiting for a solution to this issue; could you give me an update on the progress?

Thanks.

@zhenzhen-AMD
Contributor

Hi @vaan2010

Sorry for the late reply; I have been busy with development recently. This will be dealt with later.

Best Regards,
zhenzhen

@zhenzhen-AMD
Contributor

Hi @vaan2010 ,

Sorry for replying so late. I reproduced this issue with the Vitis AI 2.5 docker. The reproduction results are as follows:

  • Quantization stage: leaky_relu and concat show no problems.
  • The compilation stage has the following problem:
    [image: compiler error]
  • The compiler issue has now been fixed.

A new docker will be provided later; please use the new docker. Thank you very much.


Regards,
Zhenzhen

@vaan2010
Author
vaan2010 commented Jan 9, 2023

Hi @zhenzhen-AMD,

Thanks for your reply!
I still have my other questions: is Leaky ReLU supported by Vitis AI 1.4, and what about the shape-mismatch error in Vitis AI 2.5, as described in my original post?

Looking forward to your reply!

BR,
Norris

@zhenzhen-AMD
Contributor

Hi @vaan2010

Leaky ReLU has been supported since vai-1.4. With the version updates, we have improved the support for it.

It is recommended to use the latest released Vitis AI 3.0 docker to run your model.

Reference documentation for getting the latest docker:
https://xilinx.github.io/Vitis-AI/docs/install/install.html#option-2-build-the-docker-container-from-xilinx-recipes

If you still have problems using the latest 3.0 docker, please let us know promptly. Thank you.

@vaan2010
Author

Hi @zhenzhen-AMD,

> Leaky ReLU has been supported in vai-1.4

But when I use Vitis AI 1.4 to convert the TensorFlow 2 yolov4-tiny model to an xmodel, it always produces the following result:

[images: compilation output showing the graph split into multiple subgraphs]

That means using Leaky ReLU splits the xmodel graph into multiple subgraphs, just like in this issue:
#593
Is that right?

@huisunCompiler

Hi @vaan2010 ,
The compiler doesn't recognize the pattern conv2d-fix + fix2float + leaky-relu + float2fix. From the compiler's point of view, we need to delete the fix op between conv2d and leaky-relu from the original xmodel.

Here I use the following code to delete the fix ops between conv2d and leaky-relu manually. For details on why the quantization tool inserts a fix op between conv2d and leaky-relu, @zhenzhen-AMD, please provide more help. Many thanks.

import xir

g = xir.Graph.deserialize("quantized-yolov4-tiny.xmodel")
ops = g.toposort()
# Collect the ops whose only consumer is a leaky-relu; in the quantized
# xmodel these are the inserted fix ops.
fix_ops = [op for op in ops if op.get_fanout_num() >= 1 and op.get_fanout_ops()[0].get_type() == "leaky-relu"]
# Rewire each leaky-relu to read directly from the fix op's own input (the conv2d).
for op in fix_ops:
    succ = op.get_fanout_ops()[0]
    succ.replace_input_ops(op, op.get_input_ops()["input"][0])
# Remove the now-dangling fix ops and save the modified graph.
for op in fix_ops:
    g.remove_op(op)
g.serialize("quantized-yolov4-tiny_modify.xmodel")

Then compile the modified xmodel directly with xcompiler:

xcompiler -i quantized-yolov4-tiny_modify.xmodel -o quantized-yolov4-tiny_compiled_DPUCZDX8G_ISA0_B4096_MAX_BG2.xmodel -t DPUCZDX8G_ISA0_B4096_MAX_BG2
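A quick check of my own that the rewrite worked, i.e. that no fix op feeds a leaky-relu in the modified xmodel:

import xir

g2 = xir.Graph.deserialize("quantized-yolov4-tiny_modify.xmodel")
for op in g2.toposort():
    if op.get_type() == "leaky-relu":
        for producer in op.get_input_ops()["input"]:
            # Every leaky-relu should now read straight from the conv2d.
            assert producer.get_type() != "fix", producer.get_name()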

@zhenzhen-AMD
Contributor

Hi @vaan2010 ,
The fix op between conv2d and leaky-relu is a bug in the 1.4 quantization tool.
This bug has been fixed. Please use the latest released Docker version. Thank you.
