Commit 3131904d authored by John Cawley

Update nvidia-cuda.md PROOFREAD

NB I haven't proofread the lines of code, I don't dare.

line 41;  qnvidia needs either 'a' or 'the' depending on whether there is one or more
line 204; do you mean 'Code can be copied and pasted from the file and com....'
parent 3c976183
5 merge requests: !368 Update prace.md to document the change from qprace to qprod as the default..., !367 Update prace.md to document the change from qprace to qprod as the default..., !366 Update prace.md to document the change from qprace to qprod as the default..., !323 extended-acls-storage-section, !171 WIP: Resolve "John's froofreading"
@@ -4,26 +4,26 @@ Guide to NVIDIA CUDA Programming and GPU Usage
## CUDA Programming on Anselm
-The default programming model for GPU accelerators on Anselm is Nvidia CUDA. To set up the environment for CUDA use
+The default programming model for GPU accelerators on Anselm is Nvidia CUDA. To set up the environment for CUDA use;
```console
$ ml av cuda
$ ml cuda **or** ml CUDA
```
-If the user code is hybrid and uses both CUDA and MPI, the MPI environment has to be set up as well. One way to do this is to use the PrgEnv-gnu module, which sets up correct combination of GNU compiler and MPI library.
+If the user code is hybrid and uses both CUDA and MPI, the MPI environment has to be set up as well. One way to do this is to use the PrgEnv-gnu module, which sets up the correct combination of the GNU compiler and MPI library;
```console
$ ml PrgEnv-gnu
```
-CUDA code can be compiled directly on login1 or login2 nodes. User does not have to use compute nodes with GPU accelerator for compilation. To compile a CUDA source code, use nvcc compiler.
+CUDA code can be compiled directly on login1 or login2 nodes. The user does not have to use compute nodes with GPU accelerators for compilation. To compile CUDA source code, use an nvcc compiler;
```console
$ nvcc --version
```
-CUDA Toolkit comes with large number of examples, that can be helpful to start with. To compile and test these examples user should copy them to its home directory
+The CUDA Toolkit comes with large number of examples which can be a helpful reference to start with. To compile and test these examples, users should copy them to their home directory;
```console
$ cd ~
@@ -31,14 +31,14 @@ $ mkdir cuda-samples
$ cp -R /apps/nvidia/cuda/6.5.14/samples/* ~/cuda-samples/
```
-To compile an examples, change directory to the particular example (here the example used is deviceQuery) and run "make" to start the compilation
+To compile examples, change directory to the particular example (here the example used is deviceQuery) and run "make" to start the compilation;
```console
$ cd ~/cuda-samples/1_Utilities/deviceQuery
$ make
```
-To run the code user can use PBS interactive session to get access to a node from qnvidia queue (note: use your project name with parameter -A in the qsub command) and execute the binary file
+To run the code, the user can use PBS interactive session to get access to a node from qnvidia queue (note: use your project name with parameter -A in the qsub command) and execute the binary file;
```console
$ qsub -I -q qnvidia -A OPEN-0-0
@@ -46,7 +46,7 @@ $ ml cuda
$ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery
```
-Expected output of the deviceQuery example executed on a node with Tesla K20m is
+The expected output of the deviceQuery example executed on a node with a Tesla K20m is;
```console
CUDA Device Query (Runtime API) version (CUDART static linking)
@@ -179,13 +179,13 @@ int main( void ) {
}
```
-This code can be compiled using following command
+This code can be compiled using the following command;
```console
$ nvcc test.cu -o test_cuda
```
-To run the code use interactive PBS session to get access to one of the GPU accelerated nodes
+To run the code, use an interactive PBS session to get access to one of the GPU accelerated nodes;
```console
$ qsub -I -q qnvidia -A OPEN-0-0
@@ -197,11 +197,11 @@ $ ./test.cuda
### cuBLAS
-The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library with 152 standard BLAS routines. Basic description of the library together with basic performance comparison with MKL can be found [here](https://developer.nvidia.com/cublas "Nvidia cuBLAS").
+The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library with 152 standard BLAS routines. A basic description of the library together with basic performance comparisons with MKL can be found [here](https://developer.nvidia.com/cublas "Nvidia cuBLAS").
#### cuBLAS Example: SAXPY
-SAXPY function multiplies the vector x by the scalar alpha and adds it to the vector y overwriting the latest vector with the result. The description of the cuBLAS function can be found in [NVIDIA CUDA documentation](http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy "Nvidia CUDA documentation "). Code can be pasted in the file and compiled without any modification.
+The SAXPY function multiplies the vector x by the scalar alpha, and adds it to the vector y, overwriting the latest vector with the result. A description of the cuBLAS function can be found in [NVIDIA CUDA documentation](http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy "Nvidia CUDA documentation "). Code can be pasted in the file and compiled without any modification.
```cpp
/* Includes, system */
@@ -286,7 +286,7 @@ int main(int argc, char **argv)
- [cublasSetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector) - transfers data from CPU to GPU memory
- [cublasGetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector) - transfers data from GPU to CPU memory
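
Editor's sketch (not part of the commit): the SAXPY operation described above, y = alpha * x + y, can be written as a plain C routine on the CPU. This may be useful as a reference when checking the vector copied back with cublasGetVector. The names `saxpy_reference` and `max_abs_diff` are hypothetical helpers introduced here for illustration; they are not part of cuBLAS or of the documented example.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for SAXPY: y[i] = alpha * x[i] + y[i] for i in [0, n).
 * saxpy_reference is a hypothetical helper name, not a cuBLAS function. */
static void saxpy_reference(size_t n, float alpha, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}

/* Largest absolute element-wise difference between two vectors of length n.
 * Intended for comparing a GPU result against the CPU reference. */
static float max_abs_diff(size_t n, const float *a, const float *b)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = a[i] - b[i];
        if (d < 0.0f) {
            d = -d;
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

One possible use, assuming the host keeps a copy of the original y: run `saxpy_reference` on that copy, then call `max_abs_diff` against the vector returned from the GPU and compare the result with a small tolerance rather than with exact zero.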
-To compile the code using NVCC compiler a "-lcublas" compiler flag has to be specified:
+To compile the code using the NVCC compiler a "-lcublas" compiler flag has to be specified:
```console
$ ml cuda
@@ -300,7 +300,7 @@ $ ml cuda
$ gcc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart
```
-To compile the same code with Intel compiler:
+To compile the same code with an Intel compiler:
```console
$ ml cuda