dualop_explicit_test

My testing project for explicit assembly of the dual operator using accelerators, with an emphasis on Intel.

THIS REPO IS OUTDATED. I AM NOW WORKING DIRECTLY IN THE ESPRESO LIBRARY.

How to compile

To install all the required dependencies (mainly the SuiteSparse package), run the install_dependencies.sh script.

Alternatively, put a symlink pointing to your SuiteSparse installation into the dependencies directory.

The compilation commands differ for each machine and software stack. To compile for a certain platform, find it in the Makefile (lines 70-100), load the necessary modules as noted in the comments, and run make with the associated target: one or more of dpcpp, hostmkl, cuda, hipamd, hipnvidia.

How to run the program

To run the program, one has to pass it several parameters, e.g.

make cuda
./program.cuda.x 2 SQUARE4 64 128 100 NGPEDSDMMS

This runs the program with matrices representing a 2-dimensional mesh divided into 64x64 SQUARE4 elements; there will be 128 such matrices, and each of them will then be applied 100 times. An Nvidia GPU will be used, the kernels are all submitted in Parallel, we Explicitly and Directly assemble the matrix F, use a Sparse triangle for the triangular solve, sparse matrices are converted to dense on the Device, we Manually transpose the triangular factor, use a single call to trsM, and as the final operation we use Syrk.

For a brief summary of the options, run make help. The most relevant are the mesh dimensionality and size, and the CPU/GPU option.

What this program actually does

It computes the matrix product

F = B K^+ B^{\top}

where

K^+ = K_{reg}^{-1} = (K + RegMat)^{-1}

This appears in the TFETI method; the F matrix is usually called the dual operator.

The matrices B, K, RegMat and the known correct result F are loaded from files; the program calculates the matrix product and compares the result with the correct one loaded from the file. B and K_{reg} are sparse, F is dense.
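
A minimal NumPy sketch of that check (an illustration only, using tiny dense stand-in matrices with hypothetical values instead of the actual sparse matrices loaded from the program's files):

import numpy as np

# hypothetical small dense stand-ins for the loaded matrices
K      = np.array([[4.0, 1.0], [1.0, 3.0]])
RegMat = np.diag([0.1, 0.1])
B      = np.array([[1.0, -1.0]])

K_reg = K + RegMat
F_ref = B @ np.linalg.solve(K_reg, B.T)   # F = B K_reg^{-1} B^T
# the program compares its computed F against the known correct F, roughly:
# ok = np.allclose(F_loaded, F_ref)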

Because K_{reg} is symmetric positive definite, we use the Cholesky decomposition, factorizing

K_{reg} = U^{\top}U

on the CPU using the cholmod library from SuiteSparse. We can then rewrite

F = B (U^{\top}U)^{-1} B^{\top} = B U^{-1} U^{-\top} B^{\top}

This can be approached in two ways (both can run on an accelerator). Either we go from right to left, first solving U^{\top}X = B^{\top}, followed by solving UY = X, and multiplying F = BY (this is the gemm variant). The other way is to use the parenthesised formula

F = (U^{-\top} B^{\top})^{\top} (U^{-\top} B^{\top})

thus first solving U^{\top}X = B^{\top} and then multiplying F = X^{\top}X (this is the syrk variant).
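
A minimal NumPy/SciPy sketch of the two variants (dense and for illustration only; the actual program uses cholmod for the factorization and BLAS/accelerator kernels for the solves and multiplications; the matrix values are hypothetical):

import numpy as np
from scipy.linalg import cholesky, solve_triangular

K_reg = np.array([[4.1, 1.0], [1.0, 3.1]])   # hypothetical SPD K_reg
B     = np.array([[1.0, -1.0], [0.0, 2.0]])

U = cholesky(K_reg, lower=False)             # K_reg = U^T U, U upper triangular

# gemm variant: go from right to left
X = solve_triangular(U, B.T, trans='T')      # solve U^T X = B^T
Y = solve_triangular(U, X)                   # solve U Y = X
F_gemm = B @ Y                               # F = B Y

# syrk variant: F = (U^{-T} B^T)^T (U^{-T} B^T) = X^T X
F_syrk = X.T @ X

assert np.allclose(F_gemm, F_syrk)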

A totally different way to calculate F is to calculate the Schur complement S of K_{reg} in the matrix

\begin{bmatrix} K_{reg} & B^{\top} \\ B & 0 \end{bmatrix}

and then assign F = -S, since S = 0 - B K_{reg}^{-1} B^{\top} = -F. We use Pardiso for this; it cannot be offloaded to an accelerator, but it is already very fast on the CPU.
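
A minimal dense NumPy sketch of this variant (an illustration only; the actual program obtains the Schur complement from Pardiso on the sparse blocks; the matrix values are hypothetical):

import numpy as np

K_reg = np.array([[4.1, 1.0], [1.0, 3.1]])   # hypothetical SPD K_reg
B     = np.array([[1.0, -1.0], [0.0, 2.0]])

# Schur complement of K_reg in [[K_reg, B^T], [B, 0]]
S = np.zeros((2, 2)) - B @ np.linalg.solve(K_reg, B.T)
F = -S                                       # F = -S = B K_reg^{-1} B^T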