Commit 38dfb7e4 authored by Branislav Jansik

Update README.md

parent 688ec7b5
......@@ -48,6 +48,7 @@ Use the following source files according to instruction set and precision:
* AVX-512 FMA (Xeon Phi, Skylake) double precision: mandelbrot-real-fma-mpi-dump-mic.c . Uses AVX-512 FMA instructions. Works for MIC, SKX and KNL architectures
* PTX FMA (NVIDIA CUDA PTX) double precision: mandelbrot-real-fma-ptx-dump.cu . Uses PTX FMA instructions. Optimized for NVIDIA K20 peak performance. Set NBLOCKS and NTHREADS accordingly for other devices
* PTX WMMA (NVIDIA CUDA WMMA) half precision: mandelbrot-real-wmma-ptx-f16-dump.cu . Uses PTX WMMA (Warp Matrix-matrix Multiply Add) instructions, targeting NVIDIA V100 tensor cores.
* PTX WMMA (NVIDIA CUDA WMMA) double precision: mandelbrot-real-wmma-ptx-f64-dump.cu . Uses PTX WMMA (Warp Matrix-matrix Multiply Add) instructions, targeting NVIDIA A100 tensor cores.
* CPUID: cpuid.c. Find out about CPU capabilities.
## Build
......