Release v0.24.0 · ml-explore/mlx · GitHub

v0.24.0

@jagrit06 released this 20 Mar 22:31
· 160 commits to main since this release
1177d28

Highlights

  • Much faster fused attention with support for causal masking
    • Benchmarks
    • Improvements in prompt processing speed and memory use, benchmarks
    • Much faster small batch fused attention for e.g. speculative decoding, benchmarks
  • Major redesign of CPU back-end for faster CPU-GPU synchronization

Core

Performance

  • Support fused masking in scaled_dot_product_attention
  • Support transposed head/seq for fused vector scaled_dot_product_attention
  • SDPA support for small batch (over sequence) queries
  • Enable fused attention for head dim 128
  • Redesign CPU back-end for faster CPU/GPU synchronization

Features

  • Allow debugging in distributed mode
  • Support mx.fast.rms_norm without scale
  • Add nuclear norm support in mx.linalg.norm
  • Add XOR on arrays
  • Add mlx::core::version()
  • Allow non-square lu in mx.linalg.lu
  • Double for lapack ops (eigh, svd, etc)
  • Add a prepare tb ring script
  • Ring docs
  • Affine quant always in fp32

Optimizers

  • Add a multi optimizer optimizers.MultiOptimizer

Bug Fixes

  • Do not define MLX_VERSION globally
  • Reduce binary size post fast synch
  • Fix vmap for flatten
  • Fix copy for large arrays with JIT
  • Fix grad with inplace updates
  • Use same accumulation precision in gemv as gemm
  • Fix slice data size
  • Use a heap for small sizes
  • Fix donation in scan
  • Ensure linspace always contains start and stop
  • Raise an exception in the rope op if input is integer
  • Limit compile buffers by
  • Fix mx.float64 type promotion
  • Fix CPU SIMD erf_inv
  • Update smooth_l1_loss in losses