Releases: quic/ai-hub-models
v0.30.2
- Add option `--fetch-static-assets` to export scripts. This fetches static assets from Hugging Face (pinned to the qai-hub-models version) instead of compiling through AI Hub. Example:
  python -m qai_hub_models.models.ddrnet23_slim.export --fetch-static-assets --output-dir ddrnet
- Fix bug causing `--eval-mode on-device` not to work (see the example after this list).
- Changes in DDRNet23-Slim (`ddrnet23_slim`):
  - Add Cityscapes evaluation
  - Change input shape to 1024x2048 (now consistent with the model card)
- Add Carvana evaluation dataset, used by:
  - Unet-Segmentation (`unet_segmentation`)
- New variants added:
  - VIT (`vit`) / w8a16 / ORT
  - Segformer_Base (`segformer_base`) / w8a16 / ORT
- Models that were temporarily removed and have been re-instated:
  - Riffusion (`riffusion`)
  - ControlNet (`controlnet`)
- The following variants have been removed due to severe numerical accuracy issues:
  - Beit (`beit`) / w8a16 / ORT
  - Depth-Anything (`depth_anything`) / w8a16 / ORT
  - Depth-Anything-V2 (`depth_anything_v2`) / w8a16 / ORT
  - EfficientNet-B4 (`efficientnet_b4`) / w8a16 / ORT
  - MobileNet-v3-Small (`mobilenet_v3_small`) / w8a16 / ORT
  - EfficientViT-l2-seg (`efficientvit_l2_seg`) / float / ORT
  - Posenet-Mobilenet (`posenet_mobilenet`) / w8a8 / QNN
  - Posenet-Mobilenet (`posenet_mobilenet`) / w8a8 / TFLite
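A minimal sketch of the re-enabled on-device evaluation flow; the model chosen (`ddrnet23_slim`) and the pairing of `--eval-mode on-device` with its evaluate script are assumptions for illustration:

```
python -m qai_hub_models.models.ddrnet23_slim.evaluate --eval-mode on-device
```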
v0.29.1
v0.29
- Llama 3 changes:
  - As part of a refactor, these models are re-quantized; their outputs may differ slightly, but are not worse.
  - Model names changed from `_chat` to `_instruct` (these models have always been the "Instruct" version, so the names now reflect that correctly; see the example below):
    - `llama_v3_8b_chat` -> `llama_v3_8b_instruct`
    - `llama_v3_1_8b_chat` -> `llama_v3_1_8b_instruct`
    - `llama_v3_2_3b_chat` -> `llama_v3_2_3b_instruct`
- Whisper small enabled on ONNX
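For example, export invocations should now use the new `_instruct` folder names; the sketch below assumes the standard module-based export entry point and shows no additional flags:

```
# Old module path (renamed):
#   python -m qai_hub_models.models.llama_v3_8b_chat.export
# New module path:
python -m qai_hub_models.models.llama_v3_8b_instruct.export
```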
v0.28.2
v0.28.1
New additions:
- Added quantization + evaluation pipelines for the following models:
  - yolov8_seg
  - yolov11_seg
  - segformer_base
  - mobile_vit
  - mask2former
  - conditional/deformable detr
- New model: fomm
- Allow specifying `qnn_context_binary` as a target runtime in export (see the example after this list)
- Fixed export scripts for 5 models:
  - baichuan2_7b
  - controlnet
  - mistral_7b_instruct_v0_3
  - qwen2_7b_instruct
  - riffusion
- Website was updated to show quantized and unquantized performance info for a single model on the same webpage
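A minimal sketch of exporting a QNN context binary; the `--target-runtime` flag name and the model used here are assumptions for illustration:

```
python -m qai_hub_models.models.yolov8_seg.export --target-runtime qnn_context_binary
```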
Bug fixes and improvements:
- Removed the aimet-torch dependency entirely; all models now use quantize jobs or aimet-onnx. As a result, no models are constrained to Linux only.
- Allow variable input size for yolov7
- Updated performance numbers for all models
- Moved several quantized models to w8a16 due to accuracy issues with w8a8
- Whisper demo was fixed to run locally on X Elite
- Quantized model folders were deleted. Use `--quantize` in the export script of the unquantized model to create a quantized model (see the example below).
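A minimal sketch, assuming `--quantize` accepts a precision value as described in the v0.26.1 notes below; the model name and precision are illustrative:

```
python -m qai_hub_models.models.yolov8_seg.export --quantize w8a8
```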
v0.27.1
v0.27
Models
- Added MobileSAM
- Removed YoloNAS and OpenPose
Llama Python Demo
- Added GPU Support
- Allow loading prompt from a file
- Allow passing a "raw" prompt to the model instead of prepending/appending prompt helper tags
- Improved top-k/top-p sampling to prevent poor results
v0.26.1
New Models:
New Features:
- Allow customizing the number of default calibration samples for quantize jobs.
- CLIP has been rewritten as a single model
- All YOLO class outputs export as int8
Deprecation Note:
Most model folders with the suffix `_quantized` are being deprecated. Use the equivalent unquantized model folder instead.
When running export.py or evaluate.py, use the `--quantize` or `--precision` flag to choose the desired precision. The README file for each model outlines which precisions are supported if the model can be quantized (see the sketch below).
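A minimal sketch of the new flow, assuming `--precision` accepts the precision names used elsewhere in these notes (e.g. w8a8, w8a16); the model name is illustrative:

```
# Export a quantized variant from the unquantized model folder
python -m qai_hub_models.models.yolov8_seg.export --precision w8a8

# Evaluate the same model at the same precision
python -m qai_hub_models.models.yolov8_seg.evaluate --precision w8a8
```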
0.25.5
0.25.2
New models:
- BGNet (`bgnet`)
- BiseNet (`bisetnet`)
- DeformableDETR (`deformable_detr`)
- NASNet (`nasnet`)
- Nomic-Embed-Text (`nomic_embed_text`)
- PidNet (`pidnet`)
- ResNet-2Plus1D-Quantized (`resnet_2plus1d_quantized`)
- ResNet-3D-Quantized (`resnet_3d_quantized`)
- ResNet-Mixed-Convolution-Quantized (`resnet_mixed_quantized`)
- RTMDet (`rtmdet`)
- Video-MAE (`video_mae`)
- Video-MAE-Quantized (`video_mae_quantized`)
- YamNet (`yamnet`)
- Yolo-X (`yolox`)
Reinstated models:
- ConvNext-Tiny-W8A16-Quantized (`convnext_tiny_w8a16_quantized`)
- Midas-Quantized (`midas_quantized`)
- Simple-Bev (`simple_bev_cam`)
- Stable-Diffusion-v1.5 (`stable_diffusion_v1_5_w8a16_quantized`)
- Whisper-Medium-En (`whisper_medium_en`)
Performance numbers:
- Updated performance numbers, including an upgrade to QAIRT 2.32.
Bug fixes:
- Llama3-TAIDE-LX-8B-Chat-Alpha1 export bug fix.
- Various minor bug fixes.