Update nvidia-cuda.md

d0fd0ee4 · Jan Siwiec · 97f1c648 · d0fd0ee4
Commit d0fd0ee4 authored 5 years ago by Jan Siwiec
--- a/docs.it4i/software/nvidia-cuda.md
+++ b/docs.it4i/software/nvidia-cuda.md
 # NVIDIA CUDA
-Guide to NVIDIA CUDA Programming and GPU Usage
+Guide to NVIDIA CUDA Programming and GPU Usage.
 ## CUDA Programming on Anselm
-The default programming model for GPU accelerators on Anselm is Nvidia CUDA. To set up the environment for CUDA use;
+The default programming model for GPU accelerators on Anselm is NVIDIA CUDA. To set up the environment for CUDA, use:
 ```console
 $ ml av cuda
 $ ml cuda **or** ml CUDA
 ```
-If the user code is hybrid and uses both CUDA and MPI, the MPI environment has to be set up as well. One way to do this is to use the PrgEnv-gnu module, which sets up the correct combination of the GNU compiler and MPI library;
+If the user code is hybrid and uses both CUDA and MPI, the MPI environment has to be set up as well. One way to do this is to use the PrgEnv-gnu module, which sets up the correct combination of the GNU compiler and MPI library:
 ```console
 $ ml PrgEnv-gnu
 ```
-CUDA code can be compiled directly on login1 or login2 nodes. The user does not have to use compute nodes with GPU accelerators for compilation. To compile CUDA source code, use an nvcc compiler;
+CUDA code can be compiled directly on login1 or login2 nodes. The user does not have to use compute nodes with GPU accelerators for compilation. To compile CUDA source code, use the NVCC compiler:
 ```console
 $ nvcc --version
 ```
-The CUDA Toolkit comes with large number of examples which can be a helpful reference to start with. To compile and test these examples, users should copy them to their home directory;
+The CUDA Toolkit comes with a large number of examples, which can be a helpful reference to start with. To compile and test these examples, users should copy them to their home directory:
 ```console
 $ cd ~
@@ -38,7 +38,7 @@ $ cd ~/cuda-samples/1_Utilities/deviceQuery
 $ make
 ```
-To run the code, the user can use PBS interactive session to get access to a node from qnvidia queue (note: use your project name with parameter -A in the qsub command) and execute the binary file;
+To run the code, the user can use a PBS interactive session to get access to a node from the qnvidia queue (note: use your project name with the -A parameter in the qsub command) and execute the binary file:
 ```console
 $ qsub -I -q qnvidia -A OPEN-0-0
@@ -46,7 +46,7 @@ $ ml cuda
 $ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery
 ```
-The expected output of the deviceQuery example executed on a node with a Tesla K20m is;
+The expected output of the deviceQuery example executed on a node with a Tesla K20m is:
 ```console
    CUDA Device Query (Runtime API) version (CUDART static linking)
@@ -89,7 +89,7 @@ The expected output of the deviceQuery example executed on a node with a Tesla K
 ### Code Example
-In this section we provide a basic CUDA based vector addition code example. You can directly copy and paste the code to test it.
+In this section, we provide a basic CUDA based vector addition code example. You can directly copy and paste the code to test it:
 ```cpp
 $ vim test.cu
@@ -179,13 +179,13 @@ int main( void ) {
 }
 ```
-This code can be compiled using the following command;
+This code can be compiled using the following command:
 ```console
 $ nvcc test.cu -o test_cuda
 ```
-To run the code, use an interactive PBS session to get access to one of the GPU accelerated nodes;
+To run the code, use an interactive PBS session to get access to one of the GPU accelerated nodes:
 ```console
 $ qsub -I -q qnvidia -A OPEN-0-0
@@ -201,7 +201,7 @@ The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accel
 #### cuBLAS Example: SAXPY
-The SAXPY function multiplies the vector x by the scalar alpha, and adds it to the vector y, overwriting the latest vector with the result. A description of the cuBLAS function can be found in [NVIDIA CUDA documentation][b]. Code can be pasted in the file and compiled without any modification.
+The SAXPY function multiplies the vector x by the scalar alpha and adds it to the vector y, overwriting the latest vector with the result. A description of the cuBLAS function can be found in the [NVIDIA CUDA documentation][b]. The code can be pasted in the file and compiled without any modification:
 ```cpp
 /* Includes, system */
@@ -286,7 +286,7 @@ int main(int argc, char **argv)
    - [cublasSetVector][c] - transfers data from CPU to GPU memory
    - [cublasGetVector][d] - transfers data from GPU to CPU memory
-To compile the code using the NVCC compiler a "-lcublas" compiler flag has to be specified:
+To compile the code using the NVCC compiler, the "-lcublas" compiler flag has to be specified:
 ```console
 $ ml cuda
@@ -300,7 +300,7 @@ $ ml cuda
 $ gcc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart
 ```
-To compile the same code with an Intel compiler:
+To compile the same code with the Intel compiler:
 ```console
 $ ml cuda