Releases
v0.24.0
Highlights
Much faster fused attention with support for causal masking, benchmarks
Improvements in prompt processing speed and memory use, benchmarks
Much faster small batch fused attention for e.g. speculative decoding, benchmarks
Major redesign of CPU back-end for faster CPU-GPU synchronization
Core
Performance
Support fused masking in scaled_dot_product_attention (see the sketch after this list)
Support transposed head/seq for fused vector scaled_dot_product_attention
SDPA support for small batch (over sequence) queries
Enable fused attention for head dim 128
Redesign CPU back-end for faster CPU/GPU synchronization
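A minimal sketch of the fused attention path with a causal mask. Passing the string "causal" as the mask and the illustrative shapes below are assumptions, not confirmed details of the release:

```python
import mlx.core as mx

# Illustrative shapes: (batch, heads, seq, head_dim); head_dim 128 now hits the fused kernel.
q = mx.random.normal((1, 8, 256, 128))
k = mx.random.normal((1, 8, 256, 128))
v = mx.random.normal((1, 8, 256, 128))

# Passing the string "causal" is an assumption about how fused masking is
# exposed; an explicit additive mask array should also work.
out = mx.fast.scaled_dot_product_attention(
    q, k, v, scale=1.0 / 128**0.5, mask="causal"
)
mx.eval(out)
```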
Features
Allow debugging in distributed mode
Support mx.fast.rms_norm without scale (see the sketches after this list)
Add nuclear norm support in mx.linalg.norm
Add XOR on arrays
Add mlx::core::version()
Allow non-square lu in mx.linalg.lu
Double precision for LAPACK ops (eigh, svd, etc.)
Add a prepare tb (Thunderbolt) ring script
Docs for the ring distributed back-end
Affine quantization always in fp32
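A few hedged usage sketches for the features above. The exact spellings (weight=None for the unscaled rms_norm, the NumPy-style ord="nuc" for the nuclear norm, and running the linear-algebra ops on the CPU stream) are assumptions rather than confirmed signatures:

```python
import mlx.core as mx

x = mx.random.normal((4, 64))
# rms_norm without a learned scale: passing None for the weight is an assumption.
y = mx.fast.rms_norm(x, None, 1e-5)

a = mx.random.normal((8, 8))
# Nuclear norm, assuming the NumPy-style ord="nuc" spelling; linear-algebra
# ops typically run on the CPU stream.
nuc = mx.linalg.norm(a, ord="nuc", stream=mx.cpu)

# XOR on integer arrays (shown here via mx.bitwise_xor).
bits = mx.bitwise_xor(mx.array([0b1010]), mx.array([0b0110]))

# LU factorization of a non-square matrix, also on the CPU stream.
rect = mx.random.normal((6, 4))
factors = mx.linalg.lu(rect, stream=mx.cpu)
```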
Optimizers
Add a multi-optimizer, optimizers.MultiOptimizer (see the sketch below)
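A sketch of the new MultiOptimizer, assuming it takes a list of optimizers plus per-optimizer filters over parameter paths, with the last optimizer catching anything the filters do not match; check the optimizers docs for the exact filter signature:

```python
import mlx.nn as nn
import mlx.optimizers as optim

model = nn.Linear(16, 16)

# Assumed behavior: parameters whose path matches the filter go to SGD,
# everything else falls through to Adam. The (path, parameter) filter
# signature is an assumption, not a confirmed API.
opt = optim.MultiOptimizer(
    [optim.SGD(learning_rate=1e-2), optim.Adam(learning_rate=1e-3)],
    [lambda path, p: "bias" in path],
)
# opt.update(model, grads) is then used as with any other optimizer.
```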
Bug Fixes
Do not define MLX_VERSION globally
Reduce binary size after the fast-synchronization changes
Fix vmap for flatten
Fix copy for large arrays with JIT
Fix grad with inplace updates
Use the same accumulation precision in gemv as in gemm
Fix slice data size
Use a heap for small sizes
Fix donation in scan
Ensure linspace always contains start and stop (see the check after this list)
Raise an exception in the rope op if input is integer
Limit compile buffers by
Fix mx.float64 type promotion
Fix CPU SIMD erf_inv
Update smooth_l1_loss in losses.
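A quick check of the linspace endpoint fix; the printed values are what one would expect, not output copied from a run:

```python
import mlx.core as mx

# linspace should now always include both endpoints.
x = mx.linspace(0.0, 1.0, num=5)
print(x)  # expected: [0.0, 0.25, 0.5, 0.75, 1.0]
```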