dualop_explicit_test

My testing project for explicit assembly of the dual operator using accelerators, with an emphasis on Intel.

THIS REPO IS OUTDATED. I AM NOW WORKING DIRECTLY IN THE ESPRESO LIBRARY.

How to compile

To install all the required dependencies (mainly the SuiteSparse package), run the install_dependencies.sh script.

Alternatively, put a symlink pointing to your SuiteSparse installation into the dependencies directory.

The compilation commands differ for each machine and software stack. To compile for a certain platform, find it in the Makefile (lines 70-100), load the necessary modules as noted in the comments, and run make with the associated target: one or more of dpcpp, hostmkl, cuda, hipamd, hipnvidia.

How to run the program

To run the program, one has to pass it several parameters, e.g.

make cuda
./program.cuda.x 2 SQUARE4 64 128 100 NGPEDSDMMS

This runs the program with matrices representing a 2-dimensional mesh divided into 64x64 SQUARE4 elements; there will be 128 such matrices, and each of them will then be applied 100 times. An Nvidia GPU will be used, the kernels are all submitted in Parallel, we Explicitly and Directly assemble the matrix F, use a Sparse triangle for the triangular solve, sparse matrices are converted to dense on the Device, we Manually transpose the triangular factor, use a single call to trsM, and as the final operation we use Syrk.

For a brief summary of the options, run make help. The most relevant are the mesh dimensionality and size, and the CPU/GPU option.

What this program actually does

It computes the matrix product

F = B K^+ B^{\top}

where

K^+ = K_{reg}^{-1} = (K + RegMat)^{-1}

This appears in the TFETI method; the F matrix is usually called the dual operator.

The matrices B, K, RegMat and the known correct result F are loaded from files; the program calculates the matrix product and compares the result with the correct one loaded from the file. B and K_{reg} are sparse, F is dense.
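
A minimal NumPy sketch of that check (an illustration only, using tiny dense stand-in matrices with hypothetical values instead of the actual sparse matrices loaded from the program's files):

import numpy as np

# hypothetical small dense stand-ins for the loaded matrices
K      = np.array([[4.0, 1.0], [1.0, 3.0]])
RegMat = np.diag([0.1, 0.1])
B      = np.array([[1.0, -1.0]])

K_reg = K + RegMat
F_ref = B @ np.linalg.solve(K_reg, B.T)   # F = B K_reg^{-1} B^T
# the program compares its computed F against the known correct F, roughly:
# ok = np.allclose(F_loaded, F_ref)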

Because K_{reg} is symmetric positive definite, we use the Cholesky decomposition, factorizing

K_{reg} = U^{\top}U

on the CPU using the cholmod library from SuiteSparse. We can then rewrite

F = B (U^{\top}U)^{-1} B^{\top} = B U^{-1} U^{-\top} B^{\top}

This can be approached in two ways (both can run on an accelerator). Either we go from right to left, first solving U^{\top}X = B^{\top}, followed by solving UY = X, and multiplying F = BY (this is the gemm variant). The other way is to use the parenthesised formula

F = (U^{-\top} B^{\top})^{\top} (U^{-\top} B^{\top})

thus first solving U^{\top}X = B^{\top} and then multiplying F = X^{\top}X (this is the syrk variant).
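
A minimal NumPy/SciPy sketch of the two variants (dense and for illustration only; the actual program uses cholmod for the factorization and BLAS/accelerator kernels for the solves and multiplications; the matrix values are hypothetical):

import numpy as np
from scipy.linalg import cholesky, solve_triangular

K_reg = np.array([[4.1, 1.0], [1.0, 3.1]])   # hypothetical SPD K_reg
B     = np.array([[1.0, -1.0], [0.0, 2.0]])

U = cholesky(K_reg, lower=False)             # K_reg = U^T U, U upper triangular

# gemm variant: go from right to left
X = solve_triangular(U, B.T, trans='T')      # solve U^T X = B^T
Y = solve_triangular(U, X)                   # solve U Y = X
F_gemm = B @ Y                               # F = B Y

# syrk variant: F = (U^{-T} B^T)^T (U^{-T} B^T) = X^T X
F_syrk = X.T @ X

assert np.allclose(F_gemm, F_syrk)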

A totally different way to calculate F is to calculate the Schur complement S of K_{reg} in the matrix

\begin{bmatrix} K_{reg} & B^{\top} \\ B & 0 \end{bmatrix}

and then assign F = -S, since S = 0 - B K_{reg}^{-1} B^{\top} = -F. We use Pardiso for this; it cannot be offloaded to an accelerator, but it is already very fast on the CPU.
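
A minimal dense NumPy sketch of this variant (an illustration only; the actual program obtains the Schur complement from Pardiso on the sparse blocks; the matrix values are hypothetical):

import numpy as np

K_reg = np.array([[4.1, 1.0], [1.0, 3.1]])   # hypothetical SPD K_reg
B     = np.array([[1.0, -1.0], [0.0, 2.0]])

# Schur complement of K_reg in [[K_reg, B^T], [B, 0]]
S = np.zeros((2, 2)) - B @ np.linalg.solve(K_reg, B.T)
F = -S                                       # F = -S = B K_reg^{-1} B^T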