@@ -60,6 +60,7 @@ Use the following source files according to instruction set and precision:
|[accumulator-mfma-cdna-f32.cpp](accumulator-mfma-cdna-f32.cpp)| CDNA MFMA| double | AMD CDNA | Uses AMD Instinct MI100 ISA MFMA 16x16x4 matrix multiplication instructions. Optimized for AMD Instinct MI100. Set NBLOCKS and NTHREADS accordingly for other devices. |
|[mandelbrot-real-fma-sve-f64-omp.c](mandelbrot-real-fma-sve-f64-omp.c)| SVE FMA| double | AArch64 | ARM AArch64. Supports vector-length agnostic SVE FMA instructions. The code automatically adapts to different SVE register vector lengths. |
|[jansik-real-fmla-neon-f64-omp.c](jansik-real-fmla-neon-f64-omp.c) | NEON FMLA | double | AArch64 | ARM AArch64. Supports ARM NEON (128bit) FMLA instructions. The code runs Mandelbrot-inspired Jansik iterations to accommodate FMLA semantics. |
|[mandelbrot-real-neon-f64-omp.c](mandelbrot-real-neon-f64-omp.c) | NEON | double | AArch64 | ARM AArch64. ARM NEON (128bit) FMUL + FADD instructions. |
|[mandelbrot-real-fma-power-f64-omp.c](mandelbrot-real-fma-power-f64-omp.c) | VSX FMA | double | Power ppc64/ppc64le | Mandelbrot variant for the OpenPOWER Power ISA architecture. Executes Power VSX (128bit) FMA instructions. |
|[accumulator-ger-power-f64-omp.c](accumulator-ger-power-f64-omp.c) | VSX GER | double | Power ppc64/ppc64le | Executes Power VSX (128bit) rank-2 update GER instructions. These compute the outer product of 4x1 and 2x1 double precision vectors and store the results in dedicated accumulator registers. |
|[cpuid.c](cpuid.c) | x86 CPUID | N/A | x86 | Runs the CPUID instruction to query x86 CPU capabilities. |