This repository investigates the performance of several matrix multiplication implementations on the CPU. The project uses picobench for benchmarking.
The following implementations are compared:
- Naive implementation
- Naive implementation with accumulator
- Cache-friendly implementation
- SIMD implementation
- Cache-friendly implementation with OpenMP parallelization
- Advanced SIMD implementation
- Eigen library implementation
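To make the first three variants concrete, here is a minimal sketch of how they typically differ. It is not the repository's code: the function names, the `Matrix` alias, the `float` element type, and the flat row-major layout are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed layout: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<float>;

// Naive: c[i][j] is read and written on every step of the inner loop.
void naive_sketch(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            c[i * n + j] = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
        }
}

// Accumulator: sum into a local variable and store the result once.
void naive_acc_sketch(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}

// Cache-friendly: i-k-j loop order, so the inner loop walks a row of b
// and a row of c contiguously instead of striding down a column of b.
void cache_sketch(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    std::fill(c.begin(), c.end(), 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

All three compute the same product; only the memory access pattern changes. In the naive order the inner loop jumps `n` elements through `b` on every step, which is the usual reason a loop reordering alone can give a large speedup.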
The benchmarks were run on an Apple M1 Pro processor. Here are the latest results:
Name (* = baseline) | Dim | Total ms | ns/op | Baseline | Ops/second |
---|---|---|---|---|---|
mat_mul_naive_b * | 1 | 1730.107 | 1730100000 | - | 0.6 |
mat_mul_naive_acc_b | 1 | 1645.850 | 1645850000 | 0.951 | 0.6 |
mat_mul_cache_b | 1 | 107.201 | 107201000 | 0.062 | 9.3 |
mat_mul_simd_b | 1 | 45.826 | 45825750 | 0.026 | 21.8 |
mat_mul_simd_advanced_b | 1 | 17.634 | 17633750 | 0.010 | 56.7 |
mat_mul_cache_omp_b | 1 | 16.527 | 16527500 | 0.010 | 60.5 |
mat_mul_eigen_b | 1 | 9.785 | 9785125 | 0.006 | 102.2 |
Note:
- All benchmarks were performed on an Apple M1 Pro processor.
- A 'Dim' of 1 means each implementation is run for a single iteration.
- Each run multiplies two 1000x1000 matrices (N = 1000 in the code).
The repository includes the following matrix multiplication implementations:
- `mat_mul_naive`: A basic triple-nested loop implementation
- `mat_mul_naive_acc`: A slightly optimized version of the naive implementation using an accumulator
- `mat_mul_cache`: A cache-friendly implementation with reordered loops
- `mat_mul_simd`: An implementation using SIMD instructions (ARM NEON)
- `mat_mul_cache_omp`: A cache-friendly implementation using OpenMP for parallelization
- `mat_mul_simd_advanced`: An advanced implementation combining SIMD and cache optimization techniques
- `mat_mul_eigen`: An implementation using the Eigen linear algebra library
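The OpenMP and NEON variants can be sketched roughly as follows. This is an illustration under assumptions, not the repository's implementation: the function names, the `float` element type, and the flat row-major layout are hypothetical, and the NEON path is compiled only when the compiler defines `__ARM_NEON`, with a scalar loop as the fallback.

```cpp
#include <algorithm>
#include <vector>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Assumed layout: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<float>;

// i-k-j kernel with the rows of c divided among threads.
// Without -fopenmp the pragma is ignored and the loop runs serially.
void cache_omp_sketch(const Matrix& a, const Matrix& b, Matrix& c, int n) {
    std::fill(c.begin(), c.end(), 0.0f);
    const float* ap = a.data();
    const float* bp = b.data();
    float* cp = c.data();
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            const float aik = ap[i * n + k];
            for (int j = 0; j < n; ++j)
                cp[i * n + j] += aik * bp[k * n + j];
        }
}

// Same kernel, with the inner loop processing four floats per step on NEON.
void simd_sketch(const Matrix& a, const Matrix& b, Matrix& c, int n) {
    std::fill(c.begin(), c.end(), 0.0f);
    const float* ap = a.data();
    const float* bp = b.data();
    float* cp = c.data();
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            const float aik = ap[i * n + k];
            int j = 0;
#if defined(__ARM_NEON)
            const float32x4_t va = vdupq_n_f32(aik);  // broadcast a[i][k]
            for (; j + 4 <= n; j += 4) {
                float32x4_t vc = vld1q_f32(cp + i * n + j);
                const float32x4_t vb = vld1q_f32(bp + k * n + j);
                vc = vmlaq_f32(vc, va, vb);           // vc + va * vb
                vst1q_f32(cp + i * n + j, vc);
            }
#endif
            // Scalar tail, or the whole row when NEON is unavailable.
            for (; j < n; ++j)
                cp[i * n + j] += aik * bp[k * n + j];
        }
}
```

Parallelizing over `i` keeps each thread writing to its own rows of `c`, so no synchronization is needed inside the loop. An advanced variant would typically add blocking and register tiling on top of the SIMD inner loop, which is not shown here.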
To run the benchmarks, execute `make run`.
The project depends on:
- picobench: A micro-benchmarking library for C++
- OpenMP: Used for parallelization in various implementations
- Eigen: A C++ template library for linear algebra

Building requires:
- Clang compiler with OpenMP support
- Homebrew-installed OpenMP and Eigen libraries (for macOS)
Note: The Makefile assumes you have OpenMP and Eigen installed via Homebrew on macOS. If you're using a different system or setup, you may need to adjust the compiler flags in the Makefile.