This is a patch release containing the following changes to v3.8:
- Fixed correctness issue in reorder primitive with non-trivial strides on Intel CPUs (a762d32)
- Fixed runtime error in convolution weight gradient on Xe2 architecture-based Intel GPUs (a8fac73, c409ef9)
- Fixed performance regression in `bf16` convolution on Intel Datacenter GPU Max Series (98170d0, c6bae4a, c5edd53, bb1a591)
- Improved performance of `fp16` matmul with `fp8` compressed weights on Intel GPUs (58f3ec1, abff176, ffd7dd3, 3b1e855, 2e140de, 3429f79)
- Fixed runtime error in `fp16` pooling primitive on Xe2 architecture-based Intel GPUs (c0f6b6d)
- Improved performance of `fp16` matmul with `int4` weights and `32 < m <= 64` on Intel GPUs (2fa7072)
- Fixed correctness issues in `bf16` matmul with 3 or more dimensional tensors on processors with Intel AMX support (dd20965, ea1b4a1)
- Fixed performance regression in `fp16` or `bf16` matmul with transposed source and weight tensors on Intel Datacenter GPU Max Series (e45e1aa)
- Improved performance of `bf16` matmul with `int4` weights on Intel GPUs (7a15c23)
- Fixed runtime error in `fp16` SDPA subgraph with head size `512` on Intel Core Ultra (Series 2) processor integrated GPU (bde6985)