Releases: quic/ai-hub-models
v0.30.2
- Add option `--fetch-static-assets` to export scripts. This fetches static assets from Hugging Face (pinned to the qai-hub-models version) instead of compiling through AI Hub. Example:
  python -m qai_hub_models.models.ddrnet23_slim.export --fetch-static-assets --output-dir ddrnet
- Fix bug causing `--eval-mode on-device` not to work (see the example after this list).
- Changes in DDRNet23-Slim (`ddrnet23_slim`):
  - Add Cityscapes evaluation
  - Change input shape to 1024x2048 (now consistent with the model card)
- Add Carvana evaluation dataset, used by:
  - Unet-Segmentation (`unet_segmentation`)
- New variants added:
  - VIT (`vit`) / w8a16 / ORT
  - Segformer_Base (`segformer_base`) / w8a16 / ORT
- Models that were temporarily removed and have been re-instated:
  - Riffusion (`riffusion`)
  - ControlNet (`controlnet`)
- The following variants have been removed due to severe numerical accuracy issues:
  - Beit (`beit`) / w8a16 / ORT
  - Depth-Anything (`depth_anything`) / w8a16 / ORT
  - Depth-Anything-V2 (`depth_anything_v2`) / w8a16 / ORT
  - EfficientNet-B4 (`efficientnet_b4`) / w8a16 / ORT
  - MobileNet-v3-Small (`mobilenet_v3_small`) / w8a16 / ORT
  - EfficientViT-l2-seg (`efficientvit_l2_seg`) / float / ORT
  - Posenet-Mobilenet (`posenet_mobilenet`) / w8a8 / QNN
  - Posenet-Mobilenet (`posenet_mobilenet`) / w8a8 / TFLite
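A minimal sketch of the re-enabled on-device evaluation flow; the model chosen (`ddrnet23_slim`) and the pairing of `--eval-mode on-device` with its evaluate script are assumptions for illustration:

```
python -m qai_hub_models.models.ddrnet23_slim.evaluate --eval-mode on-device
```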
v0.29.1
v0.29
- Llama 3 changes:
  - As part of a refactor, these models are re-quantized; their outputs may differ slightly, but are not worse.
  - Model names changed from `_chat` to `_instruct` (these models have always been the "Instruct" version, so the names now reflect that correctly; see the example below):
    - `llama_v3_8b_chat` -> `llama_v3_8b_instruct`
    - `llama_v3_1_8b_chat` -> `llama_v3_1_8b_instruct`
    - `llama_v3_2_3b_chat` -> `llama_v3_2_3b_instruct`
- Whisper small enabled on ONNX
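For example, export invocations should now use the new `_instruct` folder names; the sketch below assumes the standard module-based export entry point and shows no additional flags:

```
# Old module path (renamed):
#   python -m qai_hub_models.models.llama_v3_8b_chat.export
# New module path:
python -m qai_hub_models.models.llama_v3_8b_instruct.export
```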
v0.28.2
v0.28.1
New additions:
- Added quantization + evaluation pipelines for the following models:
  - yolov8_seg
  - yolov11_seg
  - segformer_base
  - mobile_vit
  - mask2former
  - conditional/deformable detr
- New model: fomm
- Allow specifying `qnn_context_binary` as a target runtime in export (see the example after this list)
- Fixed export scripts for 5 models:
  - baichuan2_7b
  - controlnet
  - mistral_7b_instruct_v0_3
  - qwen2_7b_instruct
  - riffusion
- Website was updated to show quantized and unquantized performance info for a single model on the same webpage
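A minimal sketch of exporting a QNN context binary; the `--target-runtime` flag name and the model used here are assumptions for illustration:

```
python -m qai_hub_models.models.yolov8_seg.export --target-runtime qnn_context_binary
```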
Bug fixes and improvements:
- Removed the aimet-torch dependency entirely; all models now use quantize jobs or aimet-onnx. As a result, no models are constrained to Linux only.
- Allow variable input size for yolov7
- Updated performance numbers for all models
- Moved several quantized models to w8a16 due to accuracy issues with w8a8
- Whisper demo was fixed to run locally on X Elite
- Quantized model folders were deleted. Use `--quantize` in the export script of the unquantized model to create a quantized model (see the example below).
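A minimal sketch, assuming `--quantize` accepts a precision value as described in the v0.26.1 notes below; the model name and precision are illustrative:

```
python -m qai_hub_models.models.yolov8_seg.export --quantize w8a8
```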
v0.27.1
v0.27
Models
- Added MobileSAM
- Removed YoloNAS and OpenPose
Llama Python Demo
- Added GPU Support
- Allow loading prompt from a file
- Allow passing a "raw" prompt to the model instead of prepending/appending prompt helper tags
- Improved top-k/top-p sampling to prevent poor results
v0.26.1
New Models:
New Features:
- Allow customizing the number of default calibration samples for quantize jobs.
- CLIP has been rewritten as a single model
- All YOLO class outputs export as int8
Deprecation Note:
Most model folders with the suffix `_quantized` are being deprecated. Use the equivalent unquantized model folder instead.
When running export.py or evaluate.py, use the `--quantize` or `--precision` flag to choose the desired precision. The README file for each model outlines which precisions are supported if the model can be quantized (see the sketch below).
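A minimal sketch of the new flow, assuming `--precision` accepts the precision names used elsewhere in these notes (e.g. w8a8, w8a16); the model name is illustrative:

```
# Export a quantized variant from the unquantized model folder
python -m qai_hub_models.models.yolov8_seg.export --precision w8a8

# Evaluate the same model at the same precision
python -m qai_hub_models.models.yolov8_seg.evaluate --precision w8a8
```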
0.25.5
0.25.2
New models:
- BGNet (`bgnet`)
- BiseNet (`bisetnet`)
- DeformableDETR (`deformable_detr`)
- NASNet (`nasnet`)
- Nomic-Embed-Text (`nomic_embed_text`)
- PidNet (`pidnet`)
- ResNet-2Plus1D-Quantized (`resnet_2plus1d_quantized`)
- ResNet-3D-Quantized (`resnet_3d_quantized`)
- ResNet-Mixed-Convolution-Quantized (`resnet_mixed_quantized`)
- RTMDet (`rtmdet`)
- Video-MAE (`video_mae`)
- Video-MAE-Quantized (`video_mae_quantized`)
- YamNet (`yamnet`)
- Yolo-X (`yolox`)
Reinstated models:
- ConvNext-Tiny-W8A16-Quantized (`convnext_tiny_w8a16_quantized`)
- Midas-Quantized (`midas_quantized`)
- Simple-Bev (`simple_bev_cam`)
- Stable-Diffusion-v1.5 (`stable_diffusion_v1_5_w8a16_quantized`)
- Whisper-Medium-En (`whisper_medium_en`)
Performance numbers:
- Updated performance numbers, including an upgrade to QAIRT 2.32.
Bug fixes:
- Llama3-TAIDE-LX-8B-Chat-Alpha1 export bug fix.
- Various minor bug fixes.