Commit 87835800 authored by David Hrbáč's avatar David Hrbáč

Merge branch 'mic' into 'master'

MIC

See merge request !174
parents 8af03cf3 90ffe4be
Showing with 775 additions and 586 deletions
...@@ -24,7 +24,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const
* **qexp**, the Express queue: This queue is dedicated to testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator); a maximum of 8 nodes is available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512GB RAM (cn208-209). This allows testing and tuning of accelerated code and code with higher RAM requirements. The nodes may be allocated on a per-core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour.
* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerator are included. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, the Long queue: This queue is intended for long production runs. It is required that an active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times the standard qprod time, 3 x 48 h).
* **qnvidia**, **qmic**, **qfat**, the Dedicated queues: The queue qnvidia is dedicated to accessing the Nvidia accelerated nodes, the qmic to accessing the MIC nodes and the qfat the Fat nodes. It is required that an active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node, are allocated. The queues run with very high priority; the jobs will be scheduled ahead of jobs coming from the qexp queue. A PI needs to explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated with her/his Project.
* **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a Project has exhausted all its allocated computational resources (does not apply to DD projects by default; DD projects have to request permission to use qfree after exhaustion of computational resources). It is required that an active project is specified to enter the queue; however, no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
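For illustration, a queue is entered by naming the project with `-A` and the queue with `-q` in a qsub submission; the project ID, script name, and resource counts below are placeholders, not prescriptions:

```console
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l walltime=24:00:00 ./myjob.sh
$ qsub -q qexp -l select=1:ncpus=16 -I
```

The first command runs a batch script under project OPEN-0-0 in qprod; the second opens an interactive session in qexp, which needs no project.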
## Queue Notes
...@@ -62,4 +62,6 @@ local $
Although we have taken every care to ensure the accuracy of the content, mistakes do happen.
If you find an inconsistency or error, please report it by visiting <http://support.it4i.cz/rt>, creating a new ticket, and entering the details.
By doing so, you can save other readers from frustration and help us improve.
!!! tip
    We will fix the problem as soon as possible.
# Available Modules
## Compiler
| Module | Description |
| ------ | ----------- |
| [icc](http://software.intel.com/en-us/intel-compilers/) | Intel C and C++ compilers |
## Devel
| Module | Description |
| ------ | ----------- |
| devel_environment | &nbsp; |
| M4 | &nbsp; |
| ncurses | &nbsp; |
## Lang
| Module | Description |
| ------ | ----------- |
| [Bison](http://www.gnu.org/software/bison) | Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR(1) parser tables. |
| [flex](http://flex.sourceforge.net/) | Flex (Fast Lexical Analyzer) is a tool for generating scanners. A scanner, sometimes called a tokenizer, is a program which recognizes lexical patterns in text. |
| [Tcl](http://www.tcl.tk/) | Tcl (Tool Command Language) is a very powerful but easy to learn dynamic programming language, suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more. |
## Lib
| Module | Description |
| ------ | ----------- |
| [libreadline](http://cnswww.cns.cwru.edu/php/chet/readline/rltop.html) | The GNU Readline library provides a set of functions for use by applications that allow users to edit command lines as they are typed in. Both Emacs and vi editing modes are available. The Readline library includes additional functions to maintain a list of previously-entered command lines, to recall and perhaps reedit those lines, and perform csh-like history expansion on previous commands. |
| [zlib](http://www.zlib.net/) | zlib is designed to be a free, general-purpose, legally unencumbered -- that is, not covered by any patents -- lossless data-compression library for use on virtually any computer hardware and operating system. |
## Math
| Module | Description |
| ------ | ----------- |
| [Octave](http://www.gnu.org/software/octave/) | GNU Octave is a high-level interpreted language, primarily intended for numerical computations. |
## Mpi
| Module | Description |
| ------ | ----------- |
| [impi](http://software.intel.com/en-us/intel-mpi-library/) | Intel MPI Library, compatible with MPICH ABI |
## Toolchain
| Module | Description |
| ------ | ----------- |
| [iccifort](http://software.intel.com/en-us/intel-cluster-toolkit-compiler/) | Intel C, C++ & Fortran compilers |
| [ifort](http://software.intel.com/en-us/intel-compilers/) | Intel Fortran compiler |
## Tools
| Module | Description |
| ------ | ----------- |
| bzip2 | &nbsp; |
| cURL | &nbsp; |
| [expat](http://expat.sourceforge.net/) | Expat is an XML parser library written in C. It is a stream-oriented parser in which an application registers handlers for things the parser might find in the XML document (like start tags) |
## Vis
| Module | Description |
| ------ | ----------- |
| gettext | &nbsp; |
...@@ -69,6 +69,32 @@ $ qsub -A OPEN-0-0 -I -q qlong -l select=4:ncpus=24:accelerator=True:naccelerat
In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores), with 2 Xeon Phi 7120p cards per node (totalling 8 Phi cards), running an interactive job for 56 hours. The accelerator model name was omitted.
#### Intel Xeon Phi - Queue QMIC
Example executions:
```console
-l select=1
exec_vnode = (r21u05n581-mic0:naccelerators=1:ncpus=0)
-l select=4
(r21u05n581-mic0:naccelerators=1:ncpus=0)+(r21u05n581-mic1:naccelerators=1:ncpus=0)+(r21u06n582-mic0:naccelerators=1:ncpus=0)+(r21u06n582-mic1:naccelerators=1:ncpus=0)
-l select=4:naccelerators=1
(r21u05n581-mic0:naccelerators=1:ncpus=0)+(r21u05n581-mic1:naccelerators=1:ncpus=0)+(r21u06n582-mic0:naccelerators=1:ncpus=0)+(r21u06n582-mic1:naccelerators=1:ncpus=0)
-l select=1:naccelerators=2
(r21u05n581-mic0:naccelerators=1+r21u05n581-mic1:naccelerators=1)
-l select=2:naccelerators=2
(r21u05n581-mic0:naccelerators=1+r21u05n581-mic1:naccelerators=1)+(r21u06n582-mic0:naccelerators=1+r21u06n582-mic1:naccelerators=1)
-l select=1:ncpus=24:naccelerators=2
(r22u32n610:ncpus=24+r22u32n610-mic0:naccelerators=1+r22u32n610-mic1:naccelerators=1)
-l select=1:ncpus=24:naccelerators=0+4
(r33u17n878:ncpus=24:naccelerators=0)+(r33u13n874-mic0:naccelerators=1:ncpus=0)+(r33u13n874-mic1:naccelerators=1:ncpus=0)+(r33u16n877-mic0:naccelerators=1:ncpus=0)+(r33u16n877-mic1:naccelerators=1:ncpus=0)
```
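Combining one of the select statements above with a full qsub invocation, a session with two MIC cards might be requested as follows (project ID and walltime are illustrative):

```console
$ qsub -A OPEN-0-0 -q qmic -l select=1:naccelerators=2 -l walltime=01:00:00 -I
```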
### UV2000 SMP
!!! note
...@@ -10,6 +10,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const
* **qprod**, the Production queue
* **qlong**, the Long queue
* **qmpp**, the Massively parallel queue
* **qmic**, the 864 MIC nodes queue
* **qfat**, the queue to access SMP UV2000 machine
* **qfree**, the Free resource utilization queue
...@@ -14,8 +14,9 @@ The resources are allocated to the job in a fair-share fashion, subject to const
| **qlong** Long queue | yes | > 0 | 256 nodes, max 40 per job, only non-accelerated nodes allowed | 24 | 0 | no | 72 / 144h |
| **qmpp** Massive parallel queue | yes | > 0 | 1006 nodes | 24 | 0 | yes | 2 / 4h |
| **qfat** UV2000 queue | yes | > 0 | 1 (uv1) | 8 | 0 | yes | 24 / 48h |
| **qfree** Free resource queue | yes | none required | 752 nodes, max 86 per job | 24 | -1024 | no | 12 / 12h |
| **qviz** Visualization queue | yes | none required | 2 (with NVIDIA Quadro K5000) | 4 | 150 | no | 1 / 8h |
| **qmic** Intel Xeon Phi cards | yes | > 0 | 864 Intel Xeon Phi cards, max 8 mic per job | 0 | 0 | no | 24 / 48h |
!!! note
    **The qfree queue is not free of charge**. [Normal accounting](#resource-accounting-policy) applies. However, it allows for utilization of free resources, once a Project has exhausted all its allocated computational resources. This does not apply to Director's Discretion (DD) projects by default but may be allowed upon request.
...@@ -27,6 +28,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const
* **qfat**, the UV2000 queue. This queue is dedicated to accessing the fat SGI UV2000 SMP machine. The machine (uv1) has 112 Intel IvyBridge cores at 3.3GHz and 3.25TB RAM (8 cores and 128GB RAM are dedicated to the system). A PI needs to explicitly ask support for authorization to enter the queue for all users associated with her/his Project.
* **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a Project has exhausted all its allocated computational resources (does not apply to DD projects by default; DD projects have to request permission to use qfree after exhaustion of computational resources). It is required that an active project is specified to enter the queue; however, no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 24 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
* **qviz**, the Visualization queue: Intended for pre-/post-processing using OpenGL accelerated graphics. Currently, when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 73 GB of RAM and 1/7 of the GPU capacity (the default "chunk"). If more GPU power or RAM is required, it is recommended to allocate more chunks (with 4 cores each), up to one whole node per user, so that all 28 cores, 512 GB RAM and the whole GPU are exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default; the user may ask for 2 hours maximum.
* **qmic**, the MIC queue: The queue qmic is dedicated to accessing the MIC nodes. It is required that an active project with nonzero remaining resources is specified to enter the qmic. All 864 MICs are included.
!!! note
    To access a node with a Xeon Phi co-processor, the user needs to specify that in the [job submission select statement](job-submission-and-execution/).
...@@ -551,12 +551,6 @@ First example "CapsBasic" detects OpenCL compatible hardware, here CPU and MIC,
To compile and run the example, copy it to your home directory, get a PBS interactive session on one of the nodes with a MIC, and run make for compilation. The Makefiles are very basic and show how the OpenCL code can be compiled on Salomon.
```console
$ cp /apps/intel/opencl-examples/CapsBasic/* .
$ qsub -I -q qmic -A NONE-0-0
$ make
```
The compilation command for this example is:
```console
...@@ -594,21 +588,6 @@ CL_DEVICE_TYPE_ACCELERATOR[0]
!!! note
    More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/>
The second example that can be found in the "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the same procedure to download the example to your directory and compile it.
```console
$ cp -r /apps/intel/opencl-examples/* .
$ qsub -I -q qmic -A NONE-0-0
$ cd GEMM
$ make
```
The compilation command for this example is:
```console
$ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/
```
To see the performance of the Intel Xeon Phi performing the DGEMM, run the example as follows:
```console
# Intel Xeon Phi Environment
The Intel Xeon Phi (so-called MIC) accelerator can be used in several modes ([Offload](../intel/intel-xeon-phi-salomon/#offload-mode) and [Native](#native-mode)). The default mode on the cluster is offload mode, but all modes described in this document are supported.
See sections below for more details.
## Intel Utilities for Xeon Phi
Continue [here](../intel/intel-xeon-phi-salomon/)
## GCC With [KNC](https://en.wikipedia.org/wiki/Xeon_Phi) Support
On the Salomon cluster, the module `GCC/5.1.1-knc` provides cross-compiled (offload) support (gcc, g++ and gfortran).
!!! warning
    Available on the Salomon cluster only.
To compile code using the GCC compiler, run the following commands:
* Create `reduce_mul.c`
```console
$ vim reduce_mul.c
```
```c
#include <immintrin.h>
double reduce(double* values)
{
__m512d val = _mm512_load_pd(values);
return _mm512_reduce_mul_pd(val);
}
```
* Create `main.c`
```console
$ vim main.c
```
```c
#include <immintrin.h>
#include <stdio.h>
#include <stdlib.h>
double reduce(double* values);
int main(int argc, char* argv[])
{
// Generate random input vector of [-1, 1] values.
double values[8] __attribute__((aligned(64)));
for (int i = 0; i < 8; i++)
values[i] = 2 * (0.5 - rand() / (double)RAND_MAX);
double vector = reduce(values);
double scalar = values[0];
for (int i = 1; i < 8; i++)
scalar *= values[i];
printf("%f vs %f\n", vector, scalar);
fflush(stdout);
return 0;
}
```
* Compile
```console
$ ml GCC/5.1.1-knc
$ gcc -mavx512f -O3 -c reduce_mul.c -o reduce_mul.s -S
$ gcc -O3 -c reduce_mul.s -o reduce_mul.o
$ gcc -std=c99 -O3 -c main.c -o main_gcc.o
$ gcc -O3 reduce_mul.o main_gcc.o -o reduce_mul
```
* To execute the code, run the following command on the host
```console
$ micnativeloadex ./reduce_mul
-0.004276 vs -0.004276
```
## Native Mode
In native mode, a program is executed directly on the Intel Xeon Phi without involvement of the host machine. Similarly to offload mode, the code is compiled on the host computer with Intel compilers.

To compile code, the user has to be connected to a compute node with a MIC and load the Intel compilers module. To get an interactive session on a compute node with an Intel Xeon Phi and load the module, use the following commands:
```console
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True -A NONE-0-0
$ ml intel/2017b
```
To produce a binary compatible with the Intel Xeon Phi architecture, the user has to specify the `-mmic` compiler flag. Two compilation examples are shown below. The first example shows how to compile the OpenMP parallel code `vect-add.c` for the host only:
```c
#include <stdio.h>
#include <stdlib.h>

typedef int T;

#define SIZE 1000

#pragma offload_attribute(push, target(mic))
T in1[SIZE];
T in2[SIZE];
T res[SIZE];
#pragma offload_attribute(pop)

// MIC function to add two vectors
__attribute__((target(mic))) void add_mic(T *a, T *b, T *c, int size) {
  int i = 0;
  #pragma omp parallel for
  for (i = 0; i < size; i++)
    c[i] = a[i] + b[i];
}

// CPU function to add two vectors
void add_cpu(T *a, T *b, T *c, int size) {
  int i;
  for (i = 0; i < size; i++)
    c[i] = a[i] + b[i];
}

// CPU function to generate a vector of random numbers
void random_T(T *a, int size) {
  int i;
  for (i = 0; i < size; i++)
    a[i] = rand() % 10000; // random number between 0 and 9999
}

// CPU function to compare two vectors
int compare(T *a, T *b, int size) {
  int pass = 0;
  int i;
  for (i = 0; i < size; i++) {
    if (a[i] != b[i]) {
      printf("Value mismatch at location %d, values %d and %d\n", i, a[i], b[i]);
      pass = 1;
    }
  }
  if (pass == 0) printf("Test passed\n"); else printf("Test Failed\n");
  return pass;
}

int main()
{
  int i;
  random_T(in1, SIZE);
  random_T(in2, SIZE);

  #pragma offload target(mic) in(in1,in2) inout(res)
  {
    // Parallel loop from main function
    #pragma omp parallel for
    for (i = 0; i < SIZE; i++)
      res[i] = in1[i] + in2[i];

    // or the parallel loop is called inside the function
    add_mic(in1, in2, res, SIZE);
  }

  // Check the results with the CPU implementation
  T res_cpu[SIZE];
  add_cpu(in1, in2, res_cpu, SIZE);
  compare(res, res_cpu, SIZE);

  return 0;
}
```
```console
$ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host
```
* To run this code on the host, use:
```console
$ ./vect-add-host
Test passed
```
* The second example shows how to compile the same code for the Intel Xeon Phi:
```console
$ icc -mmic -fopenmp vect-add.c -o vect-add-mic
```
* Execution of the Program in Native Mode on Intel Xeon Phi
Users access the Intel Xeon Phi through SSH. Since user home directories are mounted via NFS on the accelerator, users do not have to copy binary files or libraries between the host and the accelerator. Get the PATH of the MIC-enabled libraries for the currently used Intel compiler.
* To run this code on the Intel Xeon Phi:
```console
$ ssh mic0
$ ./vect-add-mic
./vect-add-mic: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/all/icc/2017.4.196-GCC-6.4.0-2.28/compilers_and_libraries/linux/lib/mic
$ ./vect-add-mic
Test passed
```
!!! tip
    Or use the procedure from the chapter [Devel Environment](#devel-environment).
## Only Intel Xeon Phi Cards
Execute a native job:
```console
$ qsub -A NONE-0-0 -q qmic -l select=1 -l walltime=10:00:00 -I
r21u01n577-mic1:~$
```
## Devel Environment
To get an overview of the currently loaded modules, use `module list` or `ml` (without specifying extra arguments).
```console
r21u02n578-mic0:~$ ml
No modules loaded
```
To get an overview of all available modules, you can use `ml avail` or simply `ml av`:
```console
r21u02n578-mic0:~$ ml av
-------------- /apps/phi/system/devel --------------------------
devel_environment/1.0 (S)
Where:
S: Module is Sticky, requires --force to unload or purge
```
Activate the devel environment:
```console
r21u02n578-mic0:~$ ml devel_environment
```
And again, to get an overview of all available modules, use `ml avail` or simply `ml av`:
```console
r21u02n578-mic0:~$ ml av
-------------- /apps/phi/modules/compiler --------------------------
icc/2017.4.196-GCC-6.4.0-2.28
-------------- /apps/phi/modules/devel --------------------------
M4/1.4.18 devel_environment/1.0 (S) ncurses/6.0
-------------- /apps/phi/modules/lang --------------------------
Bison/3.0.4 Tcl/8.6.6 flex/2.6.4
-------------- /apps/phi/modules/lib --------------------------
libreadline/7.0 zlib/1.2.11
-------------- /apps/phi/modules/math --------------------------
Octave/3.8.2
-------------- /apps/phi/modules/mpi --------------------------
impi/2017.3.196-iccifort-2017.4.196-GCC-6.4.0-2.28
-------------- /apps/phi/modules/toolchain --------------------------
iccifort/2017.4.196-GCC-6.4.0-2.28 ifort/2017.4.196-GCC-6.4.0-2.28
-------------- /apps/phi/modules/tools --------------------------
bzip2/1.0.6 cURL/7.53.1 expat/2.2.5
-------------- /apps/phi/modules/vis --------------------------
gettext/0.19.8
Where:
S: Module is Sticky, requires --force to unload or purge
```
After loading the `devel_environment` module, modules for the k1om-mpss-linux architecture become available, as does system software (gcc, cmake, make, git, htop, vim, ...).
* Example
```console
r21u02n578-mic0:~$ gcc --version
gcc (GCC) 5.1.1
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
r21u02n578-mic0:~$ cmake --version
cmake version 2.8.7
r21u02n578-mic0:~$ git --version
git version 1.7.7
r21u02n578-mic0:~$ make --version
GNU Make 3.82
Built for k1om-mpss-linux-gnu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
r21u02n578-mic0:~$ perl --version
This is perl 5, version 14, subversion 2 (v5.14.2) built for k1om-linux
Copyright 1987-2011, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
...
```
* Execute the previously cross-compiled code `vect-add-mic`:
```console
r21u01n577-mic1:~$ ml devel_environment
r21u01n577-mic1:~$ ml icc
r21u01n577-mic1:~$ ./vect-add-mic
Test passed
```
!!! tip
    The PATH of the MIC libraries for the Intel compiler is set automatically.
## Modules
Examples of module usage.
### MPI
Load the devel environment module `devel_environment` and the MPI module `impi/2017.3.196-iccifort-2017.4.196-GCC-6.4.0-2.28` (intel/2017b), then execute the test:
```console
$ qsub -A SERVICE -q qmic -l select=4 -l walltime=01:00:00 -I
r21u01n577-mic0:~$ ml devel_environment
r21u01n577-mic0:~$ ml impi
r21u01n577-mic0:~$ ml
Currently Loaded Modules:
1) devel_environment/1.0 (S) 3) ifort/2017.4.196-GCC-6.4.0-2.28 5) impi/2017.3.196-iccifort-2017.4.196-GCC-6.4.0-2.28
2) icc/2017.4.196-GCC-6.4.0-2.28 4) iccifort/2017.4.196-GCC-6.4.0-2.28
Where:
S: Module is Sticky, requires --force to unload or purge
r21u01n577-mic0:~$ mpirun -n 244 hostname | sort | uniq -c | sort -n
61 r21u01n577-mic0
61 r21u01n577-mic1
61 r21u02n578-mic0
61 r21u02n578-mic1
r21u01n577-mic0:~$ mpirun -n 976 hostname | sort | uniq -c | sort -n
244 r21u01n577-mic0
244 r21u01n577-mic1
244 r21u02n578-mic0
244 r21u02n578-mic1
r21u01n577-mic0:~$ mpirun hostname | sort | uniq -c | sort -n
1 r21u01n577-mic0
1 r21u01n577-mic1
1 r21u02n578-mic0
1 r21u02n578-mic1
```
!!! warning
    The icc, ifort and iccifort modules provide only libraries and headers, not compilers. To compile, use the procedure from the chapter [Native Mode](#native-mode).
### Octave/3.8.2
Load the devel environment module `devel_environment`, load the `Octave/3.8.2` module, and run the test:
```console
r21u01n577-mic0:~$ ml devel_environment
r21u01n577-mic0:~$ ml Octave/3.8.2
r21u01n577-mic0:~$ octave -q /apps/phi/software/Octave/3.8.2/example/test0.m
warning: docstring file '/apps/phi/software/Octave/3.8.2/share/octave/3.8.2/etc/built-in-docstrings' not found
warning: readline is not linked, so history control is not available
Use some basic operators ...
Work with some small matrixes ...
Save matrix to file ...
Load matrix from file ...
Display matrix ...
m3 =
39.200 19.600 39.200
58.800 117.600 156.800
254.800 411.600 686.000
Work with some big matrixes ...
Sum ...
Multiplication ...
r21u01n577-mic0:~$ cat test.mat
# Created by Octave 3.8.2, Thu Dec 07 11:11:09 2017 CET <kru0052@r21u01n577-mic0>
# name: m3
# type: matrix
# rows: 3
# columns: 3
39.2 19.6 39.2
58.8 117.6 156.8
254.8 411.6 686
```
## Native Build Software With Devel Environment
Compiler
* gcc (GCC) 5.1.1 **without** gfortran support
Architecture (depends on compiled software):
* k1om-unknown-linux-gnu
* k1om-mpss-linux-gnu
* x86_64-k1om-linux
* k1om-mpss-linux
Configure step (for `configure`, `make` and `make install` software):

* specify the architecture via `--build=`
```console
./configure --prefix=/apps/phi/software/ncurses/6.0 --build=k1om-mpss-linux
```
Modulefile and Lmod:

* Read the [Lmod](../modules/lmod/) documentation
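A minimal Lmod modulefile for a natively built package might look like the sketch below. The file name, paths, and environment variables are illustrative only; the install root follows the `--prefix` used in the configure example above, so adjust it to your actual installation.

```lua
-- Hypothetical modulefile: ncurses/6.0.lua (paths are illustrative)
help([[ncurses 6.0 built natively for the Xeon Phi (k1om) architecture.]])

whatis("Name: ncurses")
whatis("Version: 6.0")

-- Install root matching the --prefix from the configure example
local root = "/apps/phi/software/ncurses/6.0"

prepend_path("PATH", pathJoin(root, "bin"))
prepend_path("LD_LIBRARY_PATH", pathJoin(root, "lib"))
prepend_path("CPATH", pathJoin(root, "include"))
prepend_path("MANPATH", pathJoin(root, "share/man"))
```

Placed in a directory on `MODULEPATH`, the package then loads with `ml ncurses/6.0` inside the devel environment.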
...@@ -61,9 +61,11 @@ pages:
   - Software:
     - Modules:
       - Lmod Environment: software/modules/lmod.md
+      - Intel Xeon Phi Environment: software/mic/mic_environment.md
       - Modules Matrix: modules-matrix.md
       - Available Salomon Modules: modules-salomon.md
       - Available Salomon Modules on UV: modules-salomon-uv.md
+      - Available Salomon Modules on PHI Cards: modules-salomon-phi.md
       - Available Anselm Modules: modules-anselm.md
       - ISV Licenses: software/isv_licenses.md
   - Bioinformatics:
...@@ -110,8 +112,8 @@ pages:
       - Intel TBB: software/intel/intel-suite/intel-tbb.md
       - Intel Trace Analyzer and Collector: software/intel/intel-suite/intel-trace-analyzer-and-collector.md
     - Intel Xeon Phi:
-      - Intel Xeon Phi Salomon: software/intel/intel-xeon-phi.md
+      - Intel Xeon Phi Salomon: software/intel/intel-xeon-phi-salomon.md
-      - Intel Xeon Phi Anselm: software/intel/intel-xeon-phi.anselm.md
+      - Intel Xeon Phi Anselm: software/intel/intel-xeon-phi-anselm.md
     - Machine Learning:
       - Introduction: software/machine-learning/introduction.md
       - TensorFlow: software/machine-learning/tensorflow.md
...
...@@ -2,3 +2,4 @@
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/anselm.csv -o modules-anselm.csv
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/salomon.csv -o modules-salomon.csv
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/uv2000.csv -o modules-salomon-uv.csv
+curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/phi.csv -o modules-salomon-phi.csv
...@@ -2,6 +2,8 @@
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/anselm.md -o docs.it4i/modules-anselm.md
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/salomon.md -o docs.it4i/modules-salomon.md
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/uv2000.md -o docs.it4i/modules-salomon-uv.md
+curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/phi.md -o docs.it4i/modules-salomon-phi.md
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/anselm.csv -o scripts/modules-anselm.csv
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/salomon.csv -o scripts/modules-salomon.csv
 curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/uv2000.csv -o scripts/modules-salomon-uv.csv
+curl -s https://code.it4i.cz/sccs/it4i-modules/raw/master/phi.csv -o scripts/modules-salomon-phi.csv
...@@ -9,7 +9,7 @@ def get_data(filename):
     '''function to read the data form the input csv file to use in the analysis'''
     reader = [] # Just in case the file open fails
     with open(filename, 'rb') as f:
         reader = csv.reader(f,delimiter=',')
         #returns all the data from the csv file in list form
         #f.close() # May need to close the file when done
     return list(reader) # only return the reader when you have finished.
...@@ -63,5 +63,5 @@ packages = {}
 for m in sorted(software.items(), key=lambda i: i[0].lower()):
     packages[m[0]]=sorted(m[1], key=LooseVersion)[len(m[1])-1]
 data = {'total': len(packages), 'projects': packages }
 print json.dumps(data)
...@@ -8,7 +8,7 @@ def get_data(filename):
     '''function to read the data form the input csv file to use in the analysis'''
     reader = [] # Just in case the file open fails
     with open(filename, 'rb') as f:
         reader = csv.reader(f,delimiter=',')
         #returns all the data from the csv file in list form
         #f.close() # May need to close the file when done
     return list(reader) # only return the reader when you have finished.
...
docs.it4i/anselm/software/numerical-languages/introduction.md
docs.it4i/anselm/software/numerical-languages/matlab.md
docs.it4i/anselm/software/numerical-languages/matlab_1314.md
docs.it4i/anselm/software/numerical-languages/octave.md
docs.it4i/anselm/software/numerical-languages/r.md
docs.it4i/salomon/software/comsol/licensing-and-available-versions.md
docs.it4i/salomon/software/java.md
docs.it4i/salomon/software/numerical-languages/introduction.md
docs.it4i/salomon/software/numerical-languages/matlab.md
docs.it4i/salomon/software/numerical-languages/octave.md
docs.it4i/salomon/software/numerical-languages/opencoarrays.md
docs.it4i/salomon/software/numerical-languages/r.md
./docs.it4i/anselm/software/ansys/ansys-cfx.md
./docs.it4i/anselm/software/ansys/ansys-fluent.md
./docs.it4i/anselm/software/ansys/ansys-ls-dyna.md
./docs.it4i/anselm/software/ansys/ansys-mechanical-apdl.md
./docs.it4i/anselm/software/ansys/ansys.md
./docs.it4i/anselm/software/ansys/ls-dyna.md
./docs.it4i/salomon/software/ansys/ansys-cfx.md
./docs.it4i/salomon/software/ansys/ansys-fluent.md
./docs.it4i/salomon/software/ansys/ansys-ls-dyna.md
./docs.it4i/salomon/software/ansys/ansys-mechanical-apdl.md
./docs.it4i/salomon/software/ansys/ansys.md
./docs.it4i/salomon/software/ansys/licensing.md
./docs.it4i/salomon/software/ansys/setting-license-preferences.md
./docs.it4i/salomon/software/ansys/workbench.md
./docs.it4i/anselm/software/machine-learning/introduction.md
./docs.it4i/anselm/software/machine-learning/tensorflow.md
./docs.it4i/salomon/software/machine-learning/introduction.md
./docs.it4i/salomon/software/machine-learning/tensorflow.md
./docs.it4i/anselm/software/debuggers
./docs.it4i/anselm/software/debuggers/allinea-ddt.md
./docs.it4i/anselm/software/debuggers/allinea-performance-reports.md
./docs.it4i/anselm/software/debuggers/cube.md
./docs.it4i/anselm/software/debuggers/debuggers.md
./docs.it4i/anselm/software/debuggers/intel-performance-counter-monitor.md
./docs.it4i/anselm/software/debuggers/intel-vtune-amplifier.md
./docs.it4i/anselm/software/debuggers/papi.md
./docs.it4i/anselm/software/debuggers/scalasca.md
./docs.it4i/anselm/software/debuggers/score-p.md
./docs.it4i/anselm/software/debuggers/total-view.md
./docs.it4i/anselm/software/debuggers/valgrind.md
./docs.it4i/anselm/software/debuggers/vampir.md
./docs.it4i/salomon/software/debuggers
./docs.it4i/salomon/software/debuggers/Introduction.md
./docs.it4i/salomon/software/debuggers/aislinn.md
./docs.it4i/salomon/software/debuggers/allinea-ddt.md
./docs.it4i/salomon/software/debuggers/allinea-performance-reports.md
./docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md
./docs.it4i/salomon/software/debuggers/mympiprog_32p_2014-10-15_16-56.html
./docs.it4i/salomon/software/debuggers/mympiprog_32p_2014-10-15_16-56.txt
./docs.it4i/salomon/software/debuggers/total-view.md
./docs.it4i/salomon/software/debuggers/valgrind.md
./docs.it4i/salomon/software/debuggers/vampir.md
./docs.it4i/anselm/software/numerical-libraries
./docs.it4i/anselm/software/numerical-libraries/fftw.md
./docs.it4i/anselm/software/numerical-libraries/gsl.md
./docs.it4i/anselm/software/numerical-libraries/hdf5.md
./docs.it4i/anselm/software/numerical-libraries/intel-numerical-libraries.md
./docs.it4i/anselm/software/numerical-libraries/magma-for-intel-xeon-phi.md
./docs.it4i/anselm/software/numerical-libraries/petsc.md
./docs.it4i/anselm/software/numerical-libraries/trilinos.md
./docs.it4i/anselm/software/intel-suite
./docs.it4i/anselm/software/intel-suite/intel-compilers.md
./docs.it4i/anselm/software/intel-suite/intel-debugger.md
./docs.it4i/anselm/software/intel-suite/intel-integrated-performance-primitives.md
./docs.it4i/anselm/software/intel-suite/intel-mkl.md
./docs.it4i/anselm/software/intel-suite/intel-tbb.md
./docs.it4i/anselm/software/intel-suite/introduction.md
./docs.it4i/salomon/software/intel-suite
./docs.it4i/salomon/software/intel-suite/intel-advisor.md
./docs.it4i/salomon/software/intel-suite/intel-compilers.md
./docs.it4i/salomon/software/intel-suite/intel-debugger.md
./docs.it4i/salomon/software/intel-suite/intel-inspector.md
./docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md
./docs.it4i/salomon/software/intel-suite/intel-mkl.md
./docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md
./docs.it4i/salomon/software/intel-suite/intel-tbb.md
./docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md
./docs.it4i/anselm/software/paraview.md
./docs.it4i/anselm/software/compilers.md
./docs.it4i/salomon/software/compilers.md
./docs.it4i/salomon/software/paraview.md