H-div is a parallelized implementation of hierarchical-matrix (H-matrix) construction using Cilk Plus and Tascell, based on the sequential Fortran implementation in the HACApK library by Prof. Akihiro Ida et al.
- To learn more about HACApK, please visit the HACApK introduction or the HACApK GitHub repository.
- To learn more about Tascell (by Prof. Tasuku Hiraishi et al.), please visit the Tascell introduction or the Tascell GitHub repository.
- For details of this implementation on shared-memory systems, please read the paper Parallelization of Matrix Partitioning in Construction of Hierarchical Matrices using Task Parallel Languages.
- For details of this implementation on distributed-memory systems, please read the paper Parallelization of Matrix Partitioning in Hierarchical Matrix Construction on Distributed Memory Systems.
- For details of the parallel implementation covering both matrix partitioning and the filling operation using Tascell, please read the paper Construction of Hierarchical Matrix on Distributed Memory Systems using a Task Parallel Language.
- The papers listed above have been summarized in my Ph.D. dissertation, Research on Parallel Hierarchical Matrix Construction.
- One or more Intel multi-core CPUs
- To run the Cilk Plus versions: Intel C++ Compiler version >= 17.
  (Note: Cilk Plus is no longer supported by Intel and will be removed from the Intel compiler in a future release.)
- To run the Tascell versions on shared-memory systems:
  - Tascell compiler version later than Jan 21, 2019
  - GCC version >= 4.8.5 (or ICC compatible with GCC version 4.8.5 or higher)
- To run the Tascell versions on distributed-memory systems:
  - Tascell compiler version later than May 15, 2022, from the `mpi-bcst` branch
  - Intel C++ Compiler version >= 17
  - Intel MPI version >= 17
```
git clone https://github.com/simon2/H-div.git
make hmat_div
./hmat_div
```
- Sequential (matrix partitioning)
  - hmat_div.c: The first sequential C implementation, translated directly from the Fortran implementation in HACApK.
  - hmat_div_direct.c: Sequential C implementation that exchanges data elements directly instead of exchanging their indices. This is the sequential baseline in the paper.
  - hmat_div.cpp: Sequential C++ implementation.
  - hmat_div.sc: Sequential implementation in S-expression-based syntax; this is the basis of the Tascell version.
  - hmat_div_array.c: Stores the CT (cluster tree) in a pre-allocated array rather than a linked tree (to be described in the next paper); see the allocation sketch after this list.
- Cilk Plus
  - hmat_div_cilk.c: The final implementation using Cilk Plus.
  - hmat_div_BCT_cilk_list_reducer.cpp: Tried CILK_LIST_REDUCER; not included in the paper due to poor performance.
  - hmat_div_CT_cilk_parByLevel.c: Tried switching from parallel to sequential code based on tree-level information (a sketch of this pattern appears after this list); not included in the paper due to poor performance.
  - hmat_div_BCT_cilk_malloc.c: Creates a private BCT array for each worker using malloc; not included in the paper due to poor performance.
- Tascell
  - hmat_div.tcell: The final implementation using Tascell.
  - hmat_div_locality.tcell: Tried executing the upper tree levels sequentially and the lower levels in parallel for better data locality; however, the resulting speedup is not significant.
  - hmat_dist.tcell: Baseline version parallelized for distributed-memory systems.
  - hmat_dist_bcst.tcell: Adds broadcast to `hmat_dist.tcell`.
  - hmat_dist_cas.tcell: Uses CAS (compare-and-swap) to store CT nodes; see the sketch after this list.
  - hmat_dist_casc.tcell: Uses CAS to store CT nodes, but claims them in chunks.
- OpenMP
  - hmat_div_omp.c: The OpenMP implementation mentioned in the paper.
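Some of the partitioning variants above (hmat_div_CT_cilk_parByLevel.c, hmat_div_locality.tcell) switch between parallel and sequential execution depending on the tree level. The following is only a minimal sketch of that pattern, written here with OpenMP tasks rather than Cilk Plus or Tascell; PAR_DEPTH, the node layout, and the midpoint split are illustrative stand-ins, not code from this repository.

```c
#include <stdlib.h>

#define PAR_DEPTH 4              /* assumed cutoff level, not a repo value */

typedef struct node {
    int begin, end;              /* index range of the cluster */
    struct node *left, *right;
} node_t;

static node_t *new_node(int begin, int end)
{
    node_t *n = malloc(sizeof *n);
    n->begin = begin; n->end = end;
    n->left = n->right = NULL;
    return n;
}

static node_t *build_ct(int begin, int end, int depth)
{
    node_t *n = new_node(begin, end);
    if (end - begin <= 1)
        return n;                 /* leaf cluster */
    int mid = (begin + end) / 2;  /* stand-in for the real geometric bisection */
    if (depth < PAR_DEPTH) {
        /* upper levels: run one child as a task, the other in place */
        #pragma omp task shared(n)
        n->left = build_ct(begin, mid, depth + 1);
        n->right = build_ct(mid, end, depth + 1);
        #pragma omp taskwait
    } else {
        /* lower levels: plain sequential recursion */
        n->left  = build_ct(begin, mid, depth + 1);
        n->right = build_ct(mid, end, depth + 1);
    }
    return n;
}

/* Typical call site:
 *   #pragma omp parallel
 *   #pragma omp single
 *   root = build_ct(0, npoints, 0);
 */
```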
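Similarly, hmat_div_array.c keeps the cluster tree in a pre-allocated array, and hmat_dist_cas.tcell / hmat_dist_casc.tcell reserve slots in such an array with compare-and-swap, one node or one chunk at a time. Below is a sketch of that allocation idea using C11 atomics; the node layout, MAX_NODES, and CHUNK are assumptions, not values taken from the repository.

```c
#include <stdatomic.h>

#define MAX_NODES (1 << 20)   /* assumed capacity */
#define CHUNK     64          /* assumed chunk size */

typedef struct {
    int begin, end;           /* cluster range */
    int left, right;          /* children as array indices, -1 if none */
} ct_node_t;

static ct_node_t  ct[MAX_NODES];   /* pre-allocated cluster tree */
static atomic_int next_free;       /* index of the next unclaimed slot */

/* Claim a single slot (cf. hmat_dist_cas.tcell); returns -1 when full. */
static int claim_node(void)
{
    int idx = atomic_load(&next_free);
    while (idx < MAX_NODES &&
           !atomic_compare_exchange_weak(&next_free, &idx, idx + 1))
        ;  /* a failed CAS reloads idx; retry */
    return idx < MAX_NODES ? idx : -1;
}

/* Claim CHUNK consecutive slots per CAS to reduce contention
 * (cf. hmat_dist_casc.tcell); the worker then hands out slots from
 * [idx, idx + CHUNK) locally. Returns -1 when full. */
static int claim_chunk(void)
{
    int idx = atomic_load(&next_free);
    while (idx + CHUNK <= MAX_NODES &&
           !atomic_compare_exchange_weak(&next_free, &idx, idx + CHUNK))
        ;
    return idx + CHUNK <= MAX_NODES ? idx : -1;
}
```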
- Sequential (filling)
  - hmat_filling.c: Sequential C version that performs filling after all leaf nodes are created.
  - hmat_array_filling_wBCT.c: Sequential C version that performs filling as each leaf node is created.
- MPI + OpenMP
  - hmat_array_filling_MPI.c: Parallelizes `hmat_filling.c` using MPI.
  - hmat_array_filling_dynamic.c: Parallelizes `hmat_filling.c` using MPI and OpenMP with dynamic scheduling; a sketch of this structure follows this list.
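As a rough illustration of the MPI + OpenMP structure (not this repository's actual code): leaf blocks can be divided cyclically among MPI ranks, with each rank filling its share in an OpenMP loop under dynamic scheduling, since the cost of filling a block varies widely between dense and low-rank leaves. N_LEAVES and fill_block() are hypothetical placeholders.

```c
#include <mpi.h>

enum { N_LEAVES = 100000 };   /* assumed total number of leaf blocks */

/* Hypothetical stand-in: real code would compute the matrix entries
 * (or a low-rank approximation) of one leaf block here. */
static void fill_block(int leaf) { (void)leaf; }

int main(int argc, char **argv)
{
    int rank, size, provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank takes every size-th leaf; within a rank, threads pick
     * leaves dynamically because per-block filling cost is irregular. */
    #pragma omp parallel for schedule(dynamic)
    for (int leaf = rank; leaf < N_LEAVES; leaf += size)
        fill_block(leaf);

    MPI_Finalize();
    return 0;
}
```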
- Zhengyang Bai
- Tasuku Hiraishi
- Hiroshi Nakashima
- Akihiro Ida
- Masahiro Yasugi
- Keiichiro Fukazawa