Release v3.8.1 · uxlfoundation/oneDNN

v3.8.1

@vpirogov released this 27 May 22:56
· 1 commit to rls-v3.8 since this release

This is a patch release containing the following changes to v3.8:

  • Fixed correctness issue in reorder primitive with non-trivial strides on Intel CPUs (a762d32); a sketch of what such a strided reorder looks like in the API follows this list
  • Fixed runtime error in convolution weight gradient on Xe2 architecture-based Intel GPUs (a8fac73, c409ef9)
  • Fixed performance regression in bf16 convolution on Intel Datacenter GPU Max Series (98170d0, c6bae4a, c5edd53, bb1a591)
  • Improved performance of fp16 matmul with fp8 compressed weights on Intel GPUs (58f3ec1, abff176, ffd7dd3, 3b1e855, 2e140de, 3429f79)
  • Fixed runtime error in fp16 pooling primitive on Xe2 architecture-based Intel GPUs (c0f6b6d)
  • Improved performance of fp16 matmul with int4 weights and 32 < m <= 64 on Intel GPUs (2fa7072)
  • Fixed correctness issues in bf16 matmul with tensors of three or more dimensions on processors with Intel AMX support (dd20965, ea1b4a1)
  • Fixed performance regression in fp16 or bf16 matmul with transposed source and weight tensors on Intel Datacenter GPU Max Series (e45e1aa)
  • Improved performance of bf16 matmul with int4 weights on Intel GPUs (7a15c23)
  • Fixed runtime error in fp16 SDPA subgraph with head size 512 on the integrated GPU of Intel Core Ultra (Series 2) processors (bde6985)
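
For reference, the snippet below is a minimal sketch of the strided-reorder scenario mentioned in the first item: a reorder primitive whose source memory descriptor uses explicit, non-contiguous strides, expressed with the oneDNN C++ API (dnnl.hpp). The tensor shape, stride values, and f32 data type are illustrative assumptions and are not taken from the referenced commits.

```cpp
// Minimal sketch of a reorder with a non-trivially strided source.
// Shapes, strides, and data types are illustrative assumptions.
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // 2x3x4x5 source tensor described with explicit, non-trivial strides
    // (note the innermost dimension is not dense).
    memory::dims dims = {2, 3, 4, 5};
    memory::dims src_strides = {240, 80, 20, 4};
    auto src_md = memory::desc(dims, memory::data_type::f32, src_strides);

    // Dense destination in plain nchw layout.
    auto dst_md = memory::desc(dims, memory::data_type::f32,
                               memory::format_tag::nchw);

    memory src_mem(src_md, eng);
    memory dst_mem(dst_md, eng);

    // The reorder primitive copies the data into the dense layout.
    reorder(src_mem, dst_mem).execute(strm, src_mem, dst_mem);
    strm.wait();
    return 0;
}
```

A sketch like this would typically be compiled against libdnnl, e.g. `g++ -std=c++11 strided_reorder.cpp -ldnnl`.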