Commit 115754a4 authored by Branislav Jansik

Edit README.md

parent ef030157
@@ -60,6 +60,7 @@ Use the following source files according to instruction set and precision:
|[accumulator-mfma-cdna-f32.cpp](accumulator-mfma-cdna-f32.cpp)| CDNA MFMA| double | AMD CDNA | Uses AMD Instinct MI100 ISA MFMA 16x16x4 matrix multiplication instructions. Optimized for AMD Instinct MI100. Set NBLOCKS and NTHREADS accordingly for other devices. |
|[mandelbrot-real-fma-sve-f64-omp.c](mandelbrot-real-fma-sve-f64-omp.c)| SVE FMA| double | AArch64 | ARM AArch64. Supports vector-length agnostic SVE FMA instructions. The code automatically adapts to different SVE register vector lengths. |
|[jansik-real-fmla-neon-f64-omp.c](jansik-real-fmla-neon-f64-omp.c) | NEON FMLA | double | AArch64 | ARM AArch64. Supports ARM NEON (128bit) FMLA instructions. The code runs Mandelbrot-inspired Jansik iterations to accommodate FMLA semantics. |
|[mandelbrot-real-neon-f64-omp.c](mandelbrot-real-neon-f64-omp.c) | NEON | double | AArch64 | ARM AArch64. ARM NEON (128bit) FMUL + FADD instructions. |
|[mandelbrot-real-fma-power-f64-omp.c](mandelbrot-real-fma-power-f64-omp.c) | VSX FMA | double | Power ppc64/ppc64le | Mandelbrot variant for the OpenPOWER Power ISA architecture. Executes Power VSX (128bit) FMA instructions. |
|[accumulator-ger-power-f64-omp.c](accumulator-ger-power-f64-omp.c) | VSX GER | double | Power ppc64/ppc64le | Executes Power VSX (128bit) rank-2 update GER instructions. These compute the outer product of 4x1 and 2x1 double precision vectors and store the results in dedicated accumulator registers. |
|[cpuid.c](cpuid.c) | x86 CPUID | N/A | x86 | Runs the CPUID instruction to report x86 CPU capabilities. |
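
The VSX GER row above describes an outer product of a 4x1 and a 2x1 double precision vector accumulated into a dedicated register. As a rough illustration only, the arithmetic of one such update can be modeled in portable scalar C. This is a sketch of the math, not the code in `accumulator-ger-power-f64-omp.c`; the helper name `ger_f64_update` and the 4x2 array layout are assumptions made for this example, and no Power intrinsics are used.

```c
#include <stddef.h>

/* Shape of the modeled accumulator: 4 rows (from x) by 2 columns (from y). */
enum { GER_ROWS = 4, GER_COLS = 2 };

/*
 * Scalar model of one outer-product update: acc += x * y^T.
 * ger_f64_update is a hypothetical helper for illustration; on real
 * hardware a single GER instruction performs this on accumulator registers.
 */
static void ger_f64_update(double acc[GER_ROWS][GER_COLS],
                           const double x[GER_ROWS],
                           const double y[GER_COLS])
{
    for (size_t i = 0; i < GER_ROWS; ++i) {
        for (size_t j = 0; j < GER_COLS; ++j) {
            /* One multiply and one add per accumulator element. */
            acc[i][j] += x[i] * y[j];
        }
    }
}
```

Each call in this model performs 8 multiply-add pairs, that is 16 floating-point operations, which is why instructions of this kind are attractive for a peak-performance benchmark.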