The TFLite format supports FP16 quantization, which is known to reduce application size. Unlike INT8 and INT16 quantization, FP16 does not require calibration.
However, Qualcomm's documentation does not provide an example of how to optimize a model for FP16.
Could you suggest how to convert a PyTorch model to an FP16 TFLite model? Is the path pytorch -> onnx -> fp16 onnx -> tflite correct, or is there another way?
Or is FP16 quantization simply not recommended?
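For context, one plausible route is to keep the model in FP32 through the ONNX and SavedModel stages and apply FP16 post-training quantization only at the TFLite conversion step, since the TFLite converter handles the FP16 cast itself. The sketch below assumes the `onnx2tf` package for the ONNX-to-SavedModel step and uses a placeholder torchvision model and file paths; none of this comes from Qualcomm's documentation.

```python
# Sketch: PyTorch -> ONNX -> TensorFlow SavedModel -> FP16 TFLite.
# The onnx2tf package and the model/paths below are assumptions for illustration.
import torch
import torchvision
import onnx2tf
import tensorflow as tf

# 1. Export the PyTorch model to ONNX (staying in FP32 here is fine).
model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model.onnx", opset_version=13)

# 2. Convert the ONNX model to a TensorFlow SavedModel.
onnx2tf.convert(input_onnx_file_path="model.onnx",
                output_folder_path="saved_model")

# 3. Apply FP16 post-training quantization during TFLite conversion;
#    unlike INT8/INT16, no calibration dataset is needed.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
with open("model_fp16.tflite", "wb") as f:
    f.write(converter.convert())
```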
mestrona-3 added the question label on Feb 26, 2025.