SPLA provides specialized functions for linear algebra computations with a C++ and C interface, inspired by requirements of computational materials science codes.

Currently, SPLA provides functions for distributed matrix multiplication with specific matrix distributions, which cannot be used directly through a ScaLAPACK interface. All computations can optionally utilize GPUs through CUDA or ROCm, and matrices may reside in either host or device memory.
The function gemm(...) computes a local general matrix product and works similarly to cuBLASXt. If GPU support is enabled, the function may take any combination of host and device pointers. In addition, it may use custom multi-threading for host computations if the provided BLAS library does not support multi-threading.
The pgemm_ssb(...) function computes

    C ← α opᵀ(A) · B + β C

where matrices A and B are stored in a "stripe" distribution with variable block length. Matrix C can be in any supported block distribution, including the block-cyclic ScaLAPACK layout. Matrix A may be read as transposed or conjugate transposed.
```
                ------ T       ------
               |      |       |      |
               |      |       |      |
                ------         ------
 -------       |      |       |      |       -------
|       |      |      |       |      |      |       |
 -------  <--   ------    *    ------    +   -------
|       |      |      |       |      |      |       |
 -------       |      |       |      |       -------
    C           ------         ------           C
               |      |       |      |
               |      |       |      |
                ------         ------
                  A               B
```
The pgemm_sbs(...) function computes

    C ← α A · B + β C

where matrices A and C are stored in a "stripe" distribution with variable block length. Matrix B can be in any supported block distribution, including the block-cyclic ScaLAPACK layout.
```
 ------         ------                      ------
|      |       |      |                    |      |
|      |       |      |                    |      |
 ------         ------                      ------
|      |       |      |       -------     |      |
|      |       |      |      |       |    |      |
 ------    <--  ------    *   -------   +  ------
|      |       |      |      |       |    |      |
|      |       |      |       -------     |      |
 ------         ------           B         ------
|      |       |      |                    |      |
|      |       |      |                    |      |
 ------         ------                      ------
   C               A                          C
```
Documentation can be found here.
The build system follows the standard CMake workflow. Example:
```shell
mkdir build
cd build
cmake .. -DSPLA_OMP=ON -DSPLA_GPU_BACKEND=CUDA -DCMAKE_INSTALL_PREFIX=${path_to_install_to}
make -j8 install
```
| Option | Default | Description |
|---|---|---|
| SPLA_OMP | ON | Enable multi-threading with OpenMP |
| SPLA_GPU_BACKEND | OFF | Select GPU backend. Can be OFF, CUDA or ROCM |
| SPLA_BUILD_TESTS | OFF | Build test executables for development purposes |
| SPLA_INSTALL | ON | Add library to install target |
This work was supported by:
- Swiss Federal Institute of Technology in Zurich
- Swiss National Supercomputing Centre
- MAterials design at the eXascale (Horizon2020, grant agreement MaX CoE, No. 824143)