Commit b890a4e4 authored by Sweety Wadhwa

Update README.md

parent 88706542
**CUDA DGEMV for Block Diagonal matrices**
• Computes y = Ax in double precision, where A is a block-diagonal matrix
• `dgemv_without_pointer.cu`
• `dgemv_with_pointer.cu` (calls a `__device__` function)
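A minimal sketch of the computation, assuming a storage layout the README does not specify: `nb` square `n`×`n` diagonal blocks stored back to back, each in column-major order, with one CUDA thread block per diagonal block and one thread per row. The kernel names `dgemv_bd_inline` and `dgemv_bd_pointer`, the `row_dot` helper, the launcher and the CPU reference are all hypothetical; the repository's two files may be organized differently.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical layout (an assumption, not taken from the repository):
// nb square n x n diagonal blocks stored back to back, each block column-major,
// so entry (i, j) of block b is A[b*n*n + j*n + i]; x and y hold nb*n entries.

// Variant 1 ("without pointer"): all indexing is done inline in the kernel.
__global__ void dgemv_bd_inline(const double *A, const double *x, double *y, int n)
{
    int b = blockIdx.x;   // one CUDA thread block per diagonal block
    int i = threadIdx.x;  // one thread per row of that block
    if (i >= n) return;
    double sum = 0.0;
    for (int j = 0; j < n; ++j)
        // Column-major: consecutive threads i read consecutive addresses of A.
        sum += A[(size_t)b * n * n + (size_t)j * n + i] * x[(size_t)b * n + j];
    y[(size_t)b * n + i] = sum;
}

// Variant 2 ("with pointer"): the kernel computes per-block pointers and
// hands them to a __device__ helper that computes one row's dot product.
__device__ double row_dot(const double *Ab, const double *xb, int n, int i)
{
    double sum = 0.0;
    for (int j = 0; j < n; ++j)
        sum += Ab[(size_t)j * n + i] * xb[j];
    return sum;
}

__global__ void dgemv_bd_pointer(const double *A, const double *x, double *y, int n)
{
    int b = blockIdx.x;
    int i = threadIdx.x;
    if (i >= n) return;
    const double *Ab = A + (size_t)b * n * n;  // start of diagonal block b
    const double *xb = x + (size_t)b * n;      // matching slice of x
    y[(size_t)b * n + i] = row_dot(Ab, xb, n, i);
}

// Host-side launcher: nb thread blocks of n threads each (device pointers).
void launch_dgemv_bd(const double *dA, const double *dx, double *dy, int n, int nb)
{
    assert(n > 0 && n <= 1024);  // 1024 threads per block is the limit on sm_35 (K20m)
    dgemv_bd_inline<<<nb, n>>>(dA, dx, dy, n);
}

// CPU reference with the same layout, for checking GPU results.
void dgemv_bd_ref(const double *A, const double *x, double *y, int n, int nb)
{
    for (int b = 0; b < nb; ++b)
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int j = 0; j < n; ++j)
                sum += A[(size_t)b * n * n + (size_t)j * n + i] * x[(size_t)b * n + j];
            y[(size_t)b * n + i] = sum;
        }
}
```

Column-major storage inside each block means the threads of a warp read consecutive addresses of A on every loop iteration, so those loads coalesce; with row-major storage each thread would stride by `n` elements. Every thread in a block also reads the same `x[j]`, so staging that slice of x in shared memory is a possible refinement.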
**Compiling**
`export CUDA_PATH=....`

`nvcc dgemv_without_pointer.cu -o dgemv_without_pointer`

or, using the provided Makefile:

`make`
**Executing**
`./dgemv_without_pointer`
**Tests on Anselm**
• Node cn199
• NVIDIA Tesla K20m: 13 SMX units, 192 CUDA cores per SMX
• Square diagonal blocks
• Bandwidth averaged over 100 runs
• Assumed peak bandwidth: 149 GB/s
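The bandwidth figures below can be computed along the lines of this sketch. It assumes a traffic model the README does not state (each run reads A and x once and writes y once, with 1 GB = 10^9 bytes) and times the kernel with CUDA events over back-to-back launches after one untimed warm-up launch. The function names and the `launch` callback are hypothetical.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Assumed traffic model (not confirmed by the README): one run reads A and x
// once and writes y once, i.e. nb*(n*n + 2*n) doubles.
double bytes_moved(int n, int nb)
{
    return (double)nb * ((double)n * n + 2.0 * n) * sizeof(double);
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for an average run time in ms.
double effective_bandwidth_gbs(int n, int nb, double avg_ms)
{
    return bytes_moved(n, nb) / (avg_ms * 1e-3) / 1e9;
}

// Bandwidth as a percentage of the assumed 149 GB/s peak.
double percent_of_peak(double gbs, double peak_gbs = 149.0)
{
    return 100.0 * gbs / peak_gbs;
}

// Average time in ms over `runs` back-to-back launches, measured with CUDA events.
// `launch` is a hypothetical callback that enqueues one DGEMV kernel launch.
float average_kernel_ms(void (*launch)(void *), void *arg, int runs)
{
    assert(runs > 0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    launch(arg);                 // untimed warm-up launch
    cudaEventRecord(start);
    for (int r = 0; r < runs; ++r)
        launch(arg);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches have finished
    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return total_ms / runs;
}
```

Under this model, 1300 blocks of 384 rows move about 1.54 GB per run, so the 128.11 GB/s reported for that case would correspond to roughly 12 ms per run; independently of the model, 128.11 GB/s is about 86% of the assumed 149 GB/s peak.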
**130 Blocks**
| Rows | Threads | Bandwidth without pointer (GB/s) | Bandwidth with pointer (GB/s) |
| :--- | :---: | ---: | ---: |
| 32 | 32 | 20.53 | 8.99 |
| ... | ... | ... | ... |
| 384 | 384 | 117.44 | 15.27 |
**1300 Blocks**
| Rows | Threads | Bandwidth without pointer (GB/s) | Bandwidth with pointer (GB/s) |
| :--- | :---: | ---: | ---: |
| ... | ... | ... | ... |
| 384 | 384 | 128.11 | 15.81 |
**6500 Blocks**
| Rows | Threads | Bandwidth without pointer (GB/s) | Bandwidth with pointer (GB/s) |
| :--- | :---: | ---: | ---: |
| ... | ... | ... | ... |
**26 Blocks**
| Rows | Threads | Bandwidth without pointer (GB/s) | Bandwidth with pointer (GB/s) |
| :--- | :---: | ---: | ---: |
| ... | ... | ... | ... |