diff --git a/docs.it4i/salomon/capacity-computing.md b/docs.it4i/salomon/capacity-computing.md index 702ef7f5220722e447a562f2e1397cb6c79e85f4..39b4c029903b04c067c9f9e2d7e48d13fac3f133 100644 --- a/docs.it4i/salomon/capacity-computing.md +++ b/docs.it4i/salomon/capacity-computing.md @@ -41,7 +41,7 @@ Assume we have 900 input files with name beginning with "file" (e. g. file001, . First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example: -```bash +```console $ find . -name 'file*' > tasklist ``` @@ -78,7 +78,7 @@ If huge number of parallel multicore (in means of multinode multithread, e. g. M To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing/#array_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME -J 1-900 jobscript 506493[].isrv5 ``` @@ -87,7 +87,7 @@ In this example, we submit a job array of 900 subjobs. Each subjob will run on f Sometimes for testing purposes, you may need to submit only one-element array. This is not allowed by PBSPro, but there's a workaround: -```bash +```console $ qsub -N JOBNAME -J 9-10:2 jobscript ``` @@ -97,7 +97,7 @@ This will only choose the lower index (9 in this example) for submitting/running Check status of the job array by the qstat command. -```bash +```console $ qstat -a 506493[].isrv5 isrv5: @@ -111,7 +111,7 @@ The status B means that some subjobs are already running. Check status of the first 100 subjobs by the qstat command. -```bash +```console $ qstat -a 12345[1-100].isrv5 isrv5: @@ -129,7 +129,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time Delete the entire job array. Running subjobs will be killed, queueing subjobs will be deleted. -```bash +```console $ qdel 12345[].isrv5 ``` @@ -137,13 +137,13 @@ Deleting large job arrays may take a while. Display status information for all user's jobs, job arrays, and subjobs. -```bash +```console $ qstat -u $USER -t ``` Display status information for all user's subjobs. -```bash +```console $ qstat -u $USER -tJ ``` @@ -158,7 +158,7 @@ GNU parallel is a shell tool for executing jobs in parallel using one or more co For more information and examples see the parallel man page: -```bash +```console $ module add parallel $ man parallel ``` @@ -173,7 +173,7 @@ Assume we have 101 input files with name beginning with "file" (e. g. file001, . First, we create a tasklist file, listing all tasks - all input files in our example: -```bash +```console $ find . -name 'file*' > tasklist ``` @@ -211,7 +211,7 @@ In this example, tasks from tasklist are executed via the GNU parallel. The jobs To submit the job, use the qsub command. The 101 tasks' job of the [example above](capacity-computing/#gp_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME jobscript 12345.dm2 ``` @@ -241,13 +241,13 @@ Assume we have 992 input files with name beginning with "file" (e. g. file001, . First, we create a tasklist file, listing all tasks - all input files in our example: -```bash +```console $ find . -name 'file*' > tasklist ``` Next we create a file, controlling how many tasks will be executed in one subjob -```bash +```console $ seq 32 > numtasks ``` @@ -296,7 +296,7 @@ When deciding this values, think about following guiding rules : To submit the job array, use the qsub -J command. 
The 992 tasks' job of the [example above](capacity-computing/#combined_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME -J 1-992:32 jobscript 12345[].dm2 ``` @@ -312,7 +312,7 @@ Download the examples in [capacity.zip](capacity.zip), illustrating the above li Unzip the archive in an empty directory on Anselm and follow the instructions in the README file -```bash +```console $ unzip capacity.zip $ cd capacity $ cat README diff --git a/docs.it4i/salomon/environment-and-modules.md b/docs.it4i/salomon/environment-and-modules.md index 9671013566e7621e42b2d0cdf693eed783f13197..5de3931c3d2b060d69a544343836c46caba20509 100644 --- a/docs.it4i/salomon/environment-and-modules.md +++ b/docs.it4i/salomon/environment-and-modules.md @@ -4,7 +4,7 @@ After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions and module loads in the .bashrc file -```bash +```console # ./bashrc # Source global definitions @@ -32,7 +32,7 @@ In order to configure your shell for running particular application on Salomon w Application modules on Salomon cluster are built using [EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). The modules are divided into the following structure: -```bash +```console base: Default module class bio: Bioinformatics, biology and biomedical cae: Computer Aided Engineering (incl. CFD) @@ -63,33 +63,33 @@ The modules may be loaded, unloaded and switched, according to momentary needs. To check available modules use -```bash -$ module avail +```console +$ module avail **or** ml av ``` To load a module, for example the Open MPI module use -```bash -$ module load OpenMPI +```console +$ module load OpenMPI **or** ml OpenMPI ``` loading the Open MPI module will set up paths and environment variables of your active shell such that you are ready to run the Open MPI software To check loaded modules use -```bash -$ module list +```console +$ module list **or** ml ``` To unload a module, for example the Open MPI module use -```bash -$ module unload OpenMPI +```console +$ module unload OpenMPI **or** ml -OpenMPI ``` Learn more on modules by reading the module man page -```bash +```console $ man module ``` diff --git a/docs.it4i/salomon/job-submission-and-execution.md b/docs.it4i/salomon/job-submission-and-execution.md index e7a4c4ff0039815504804e9f5fcb30959e8713e6..dea86065b70048af16b40dc9252525cfc0816de0 100644 --- a/docs.it4i/salomon/job-submission-and-execution.md +++ b/docs.it4i/salomon/job-submission-and-execution.md @@ -16,7 +16,7 @@ When allocating computational resources for the job, please specify Submit the job using the qsub command: -```bash +```console $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript ``` @@ -27,25 +27,25 @@ The qsub submits the job into the queue, in another words the qsub command creat ### Job Submission Examples -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=24,walltime=03:00:00 ./myjob ``` In this example, we allocate 64 nodes, 24 cores per node, for 3 hours. We allocate these resources via the qprod queue, consumed resources will be accounted to the Project identified by Project ID OPEN-0-0. Jobscript myjob will be executed on the first node in the allocation. -```bash +```console $ qsub -q qexp -l select=4:ncpus=24 -I ``` In this example, we allocate 4 nodes, 24 cores per node, for 1 hour. We allocate these resources via the qexp queue. 
The resources will be available interactively -```bash +```console $ qsub -A OPEN-0-0 -q qlong -l select=10:ncpus=24 ./myjob ``` In this example, we allocate 10 nodes, 24 cores per node, for 72 hours. We allocate these resources via the qlong queue. Jobscript myjob will be executed on the first node in the allocation. -```bash +```console $ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=24 ./myjob ``` @@ -57,13 +57,13 @@ To allocate a node with Xeon Phi co-processor, user needs to specify that in sel The absence of specialized queue for accessing the nodes with cards means, that the Phi cards can be utilized in any queue, including qexp for testing/experiments, qlong for longer jobs, qfree after the project resources have been spent, etc. The Phi cards are thus also available to PRACE users. There's no need to ask for permission to utilize the Phi cards in project proposals. -```bash +```console $ qsub -A OPEN-0-0 -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 ./myjob ``` In this example, we allocate 1 node, with 24 cores, with 2 Xeon Phi 7120p cards, running batch job ./myjob. The default time for qprod is used, e. g. 24 hours. -```bash +```console $ qsub -A OPEN-0-0 -I -q qlong -l select=4:ncpus=24:accelerator=True:naccelerators=2 -l walltime=56:00:00 -I ``` @@ -78,13 +78,13 @@ In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores The UV2000 (node uv1) offers 3328GB of RAM and 112 cores, distributed in 14 NUMA nodes. A NUMA node packs 8 cores and approx. 236GB RAM. In the PBS the UV2000 provides 14 chunks, a chunk per NUMA node (see [Resource allocation policy](resources-allocation-policy/)). The jobs on UV2000 are isolated from each other by cpusets, so that a job by one user may not utilize CPU or memory allocated to a job by other user. Always, full chunks are allocated, a job may only use resources of the NUMA nodes allocated to itself. -```bash +```console $ qsub -A OPEN-0-0 -q qfat -l select=14 ./myjob ``` In this example, we allocate all 14 NUMA nodes (corresponds to 14 chunks), 112 cores of the SGI UV2000 node for 72 hours. Jobscript myjob will be executed on the node uv1. -```bash +```console $ qsub -A OPEN-0-0 -q qfat -l select=1:mem=2000GB ./myjob ``` @@ -94,13 +94,13 @@ In this example, we allocate 2000GB of memory on the UV2000 for 72 hours. By req All qsub options may be [saved directly into the jobscript](#example-jobscript-for-mpi-calculation-with-preloaded-inputs). In such a case, no options to qsub are needed. -```bash +```console $ qsub ./myjob ``` By default, the PBS batch system sends an e-mail only when the job is aborted. Disabling mail events completely can be done like this: -```bash +```console $ qsub -m n ``` @@ -113,13 +113,13 @@ $ qsub -m n Specific nodes may be selected using PBS resource attribute host (for hostnames): -```bash +```console qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:host=r24u35n680+1:ncpus=24:host=r24u36n681 -I ``` Specific nodes may be selected using PBS resource attribute cname (for short names in cns[0-1]+ format): -```bash +```console qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:host=cns680+1:ncpus=24:host=cns681 -I ``` @@ -142,7 +142,7 @@ Nodes directly connected to the one InifiBand switch can be allocated using node In this example, we request all 9 nodes directly connected to the same switch using node grouping placement. 
-```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob ``` @@ -155,13 +155,13 @@ Nodes directly connected to the specific InifiBand switch can be selected using In this example, we request all 9 nodes directly connected to r4i1s0sw1 switch. -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24:switch=r4i1s0sw1 ./myjob ``` List of all InifiBand switches: -```bash +```console $ qmgr -c 'print node @a' | grep switch | awk '{print $6}' | sort -u r1i0s0sw0 r1i0s0sw1 @@ -169,12 +169,11 @@ r1i1s0sw0 r1i1s0sw1 r1i2s0sw0 ... -... ``` List of all all nodes directly connected to the specific InifiBand switch: -```bash +```console $ qmgr -c 'p n @d' | grep 'switch = r36sw3' | awk '{print $3}' | sort r36u31n964 r36u32n965 @@ -203,7 +202,7 @@ Nodes located in the same dimension group may be allocated using node grouping o In this example, we allocate 16 nodes in the same [hypercube dimension](7d-enhanced-hypercube/) 1 group. -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=16:ncpus=24 -l place=group=ehc_1d -I ``` @@ -211,7 +210,7 @@ For better understanding: List of all groups in dimension 1: -```bash +```console $ qmgr -c 'p n @d' | grep ehc_1d | awk '{print $6}' | sort |uniq -c 18 r1i0 18 r1i1 @@ -222,7 +221,7 @@ $ qmgr -c 'p n @d' | grep ehc_1d | awk '{print $6}' | sort |uniq -c List of all all nodes in specific dimension 1 group: -```bash +```console $ $ qmgr -c 'p n @d' | grep 'ehc_1d = r1i0' | awk '{print $3}' | sort r1i0n0 r1i0n1 @@ -236,7 +235,7 @@ r1i0n11 !!! note Check status of your jobs using the **qstat** and **check-pbs-jobs** commands -```bash +```console $ qstat -a $ qstat -a -u username $ qstat -an -u username @@ -245,7 +244,7 @@ $ qstat -f 12345.isrv5 Example: -```bash +```console $ qstat -a srv11: @@ -261,7 +260,7 @@ In this example user1 and user2 are running jobs named job1, job2 and job3x. The Check status of your jobs using check-pbs-jobs command. Check presence of user's PBS jobs' processes on execution hosts. Display load, processes. Display job standard and error output. Continuously display (tail -f) job standard or error output. -```bash +```console $ check-pbs-jobs --check-all $ check-pbs-jobs --print-load --print-processes $ check-pbs-jobs --print-job-out --print-job-err @@ -271,7 +270,7 @@ $ check-pbs-jobs --jobid JOBID --tailf-job-out Examples: -```bash +```console $ check-pbs-jobs --check-all JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Check session id: OK @@ -282,7 +281,7 @@ r3i6n3: No process In this example we see that job 35141.dm2 currently runs no process on allocated node r3i6n2, which may indicate an execution error. -```bash +```console $ check-pbs-jobs --print-load --print-processes JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Print load @@ -298,7 +297,7 @@ r3i6n2: 99.7 run-task In this example we see that job 35141.dm2 currently runs process run-task on node r3i6n2, using one thread only, while node r3i6n3 is empty, which may indicate an execution error. 
-```bash +```console $ check-pbs-jobs --jobid 35141.dm2 --print-job-out JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Print job standard output: @@ -317,19 +316,19 @@ In this example, we see actual output (some iteration loops) of the job 35141.dm You may release your allocation at any time, using qdel command -```bash +```console $ qdel 12345.isrv5 ``` You may kill a running job by force, using qsig command -```bash +```console $ qsig -s 9 12345.isrv5 ``` Learn more by reading the pbs man page -```bash +```console $ man pbs_professional ``` @@ -345,7 +344,7 @@ The Jobscript is a user made script, controlling sequence of commands for execut !!! note The jobscript or interactive shell is executed on first of the allocated nodes. -```bash +```console $ qsub -q qexp -l select=4:ncpus=24 -N Name0 ./myjob $ qstat -n -u username @@ -362,7 +361,7 @@ In this example, the nodes r21u01n577, r21u02n578, r21u03n579, r21u04n580 were a !!! note The jobscript or interactive shell is by default executed in home directory -```bash +```console $ qsub -q qexp -l select=4:ncpus=24 -I qsub: waiting for job 15210.isrv5 to start qsub: job 15210.isrv5 ready @@ -380,7 +379,7 @@ The allocated nodes are accessible via ssh from login nodes. The nodes may acces Calculations on allocated nodes may be executed remotely via the MPI, ssh, pdsh or clush. You may find out which nodes belong to the allocation by reading the $PBS_NODEFILE file -```bash +```console qsub -q qexp -l select=2:ncpus=24 -I qsub: waiting for job 15210.isrv5 to start qsub: job 15210.isrv5 ready diff --git a/docs.it4i/salomon/network.md b/docs.it4i/salomon/network.md index 2f3f8a09f474c12ffe961781c39ea6fbea260a46..91da0de5ee2114ca159ee722f6b5f7db212a9c0d 100644 --- a/docs.it4i/salomon/network.md +++ b/docs.it4i/salomon/network.md @@ -16,7 +16,7 @@ The network provides **2170MB/s** transfer rates via the TCP connection (single ## Example -```bash +```console $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob $ qstat -n -u username Req'd Req'd Elap @@ -28,14 +28,14 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time In this example, we access the node r4i1n0 by Infiniband network via the ib0 interface. -```bash +```console $ ssh 10.17.35.19 ``` In this example, we get information of the Infiniband network. -```bash +```console $ ifconfig .... inet addr:10.17.35.19.... diff --git a/docs.it4i/salomon/prace.md b/docs.it4i/salomon/prace.md index bcd9c4bd6b73795b026f41abd3e0d3bd5351251e..a3c80fa840dd1ff4ccb0dd17cd1f3d82001bdbdb 100644 --- a/docs.it4i/salomon/prace.md +++ b/docs.it4i/salomon/prace.md @@ -36,14 +36,14 @@ Most of the information needed by PRACE users accessing the Salomon TIER-1 syste Before you start to use any of the services don't forget to create a proxy certificate from your certificate: -```bash - $ grid-proxy-init +```console +$ grid-proxy-init ``` To check whether your proxy certificate is still valid (by default it's valid 12 hours), use: -```bash - $ grid-proxy-info +```console +$ grid-proxy-info ``` To access Salomon cluster, two login nodes running GSI SSH service are available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners). 
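If the default 12-hour proxy lifetime is too short, for example for long-running transfers, a longer validity can be requested when the proxy is created. This is only a sketch; it assumes the standard Globus options -valid (hours:minutes) and -timeleft are available in the installed grid tools:

```console
$ grid-proxy-init -valid 24:00   # request a 24-hour proxy instead of the default 12 hours
$ grid-proxy-info -timeleft      # remaining lifetime in seconds
```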
@@ -60,14 +60,14 @@ It is recommended to use the single DNS name salomon-prace.it4i.cz which is dist | login3-prace.salomon.it4i.cz | 2222 | gsissh | login3 | | login4-prace.salomon.it4i.cz | 2222 | gsissh | login4 | -```bash - $ gsissh -p 2222 salomon-prace.it4i.cz +```console +$ gsissh -p 2222 salomon-prace.it4i.cz ``` When logging from other PRACE system, the prace_service script can be used: -```bash - $ gsissh `prace_service -i -s salomon` +```console +$ gsissh `prace_service -i -s salomon` ``` #### Access From Public Internet: @@ -82,27 +82,24 @@ It is recommended to use the single DNS name salomon.it4i.cz which is distribute | login3-prace.salomon.it4i.cz | 2222 | gsissh | login3 | | login4-prace.salomon.it4i.cz | 2222 | gsissh | login4 | -```bash - $ gsissh -p 2222 salomon.it4i.cz +```console +$ gsissh -p 2222 salomon.it4i.cz ``` When logging from other PRACE system, the prace_service script can be used: -```bash - $ gsissh `prace_service -e -s salomon` +```console +$ gsissh `prace_service -e -s salomon` ``` Although the preferred and recommended file transfer mechanism is [using GridFTP](prace/#file-transfers), the GSI SSH implementation on Salomon supports also SCP, so for small files transfer gsiscp can be used: -```bash - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ - - $ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ - - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ - - $ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ +```console +$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ ``` ### Access to X11 Applications (VNC) @@ -111,8 +108,8 @@ If the user needs to run X11 based graphical application and does not have a X11 If the user uses GSI SSH based access, then the procedure is similar to the SSH based access ([look here](../general/accessing-the-clusters/graphical-user-interface/x-window-system/)), only the port forwarding must be done using GSI SSH: -```bash - $ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961 +```console +$ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961 ``` ### Access With SSH @@ -138,26 +135,26 @@ There's one control server and three backend servers for striping and/or backup Copy files **to** Salomon by running the following commands on your local machine: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ ``` Copy files **from** Salomon: -```bash - $ globus-url-copy 
gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` ### Access From Public Internet @@ -171,26 +168,26 @@ Or by using prace_service script: Copy files **to** Salomon by running the following commands on your local machine: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ ``` Copy files **from** Salomon: -```bash - $ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Generally both shared file systems are available through GridFTP: @@ -222,8 +219,8 @@ All system wide installed software on the cluster is made available to the users PRACE users can use the "prace" module to use the [PRACE Common Production Environment](http://www.prace-ri.eu/prace-common-production-environment/). -```bash - $ module load prace +```console +$ module load prace ``` ### Resource Allocation and Job Execution @@ -251,8 +248,8 @@ Users who have undergone the full local registration procedure (including signin !!! note The **it4ifree** command is a part of it4i.portal.clients package, [located here](https://pypi.python.org/pypi/it4i.portal.clients). -```bash - $ it4ifree +```console +$ it4ifree Password: PID Total Used ...by me Free -------- ------- ------ -------- ------- @@ -262,9 +259,9 @@ Users who have undergone the full local registration procedure (including signin By default file system quota is applied. To check the current status of the quota (separate for HOME and SCRATCH) use -```bash - $ quota - $ lfs quota -u USER_LOGIN /scratch +```console +$ quota +$ lfs quota -u USER_LOGIN /scratch ``` If the quota is insufficient, please contact the [support](prace/#help-and-support) and request an increase. 
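For large transfers over the GridFTP endpoints above, throughput can often be improved by requesting several parallel data streams. The following is a sketch only, assuming the usual globus-url-copy options -fast and -p (number of parallel streams) are supported by the installed Globus tools:

```console
$ globus-url-copy -fast -p 8 file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
```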
diff --git a/docs.it4i/salomon/resources-allocation-policy.md b/docs.it4i/salomon/resources-allocation-policy.md index d705a527d4ed1e0988a4c76575687c23239e41de..296844a23f2b567b11f15ad2d79b630ac81f8fb5 100644 --- a/docs.it4i/salomon/resources-allocation-policy.md +++ b/docs.it4i/salomon/resources-allocation-policy.md @@ -46,13 +46,13 @@ Salomon users may check current queue configuration at <https://extranet.it4i.cz Display the queue status on Salomon: -```bash +```console $ qstat -q ``` The PBS allocation overview may be obtained also using the rspbs command. -```bash +```console $ rspbs Usage: rspbs [options] @@ -122,7 +122,7 @@ The resources that are currently subject to accounting are the core-hours. The c User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. -```bash +```console $ it4ifree Password: PID Total Used ...by me Free diff --git a/docs.it4i/salomon/shell-and-data-access.md b/docs.it4i/salomon/shell-and-data-access.md index 5d00acc957233f192d143143e5620da3326f6e36..f843691d68938ecea39be5abd199689a6b95f1e3 100644 --- a/docs.it4i/salomon/shell-and-data-access.md +++ b/docs.it4i/salomon/shell-and-data-access.md @@ -26,13 +26,13 @@ Private key authentication: On **Linux** or **Mac**, use -```bash +```console local $ ssh -i /path/to/id_rsa username@salomon.it4i.cz ``` If you see warning message "UNPROTECTED PRIVATE KEY FILE!", use this command to set lower permissions to private key file. -```bash +```console local $ chmod 600 /path/to/id_rsa ``` @@ -40,7 +40,7 @@ On **Windows**, use [PuTTY ssh client](../general/accessing-the-clusters/shell-a After logging in, you will see the command prompt: -```bash +```console _____ _ / ____| | | | (___ __ _| | ___ _ __ ___ ___ _ __ @@ -75,23 +75,23 @@ The authentication is by the [private key](../general/accessing-the-clusters/she On linux or Mac, use scp or sftp client to transfer the data to Salomon: -```bash +```console local $ scp -i /path/to/id_rsa my-local-file username@salomon.it4i.cz:directory/file ``` -```bash +```console local $ scp -i /path/to/id_rsa -r my-local-dir username@salomon.it4i.cz:directory ``` or -```bash +```console local $ sftp -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz ``` Very convenient way to transfer files in and out of the Salomon computer is via the fuse filesystem [sshfs](http://linux.die.net/man/1/sshfs) -```bash +```console local $ sshfs -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz:. mountpoint ``` @@ -136,7 +136,7 @@ It works by tunneling the connection from Salomon back to users workstation and Pick some unused port on Salomon login node (for example 6000) and establish the port forwarding: -```bash +```console local $ ssh -R 6000:remote.host.com:1234 salomon.it4i.cz ``` @@ -146,7 +146,7 @@ Port forwarding may be done **using PuTTY** as well. On the PuTTY Configuration Port forwarding may be established directly to the remote host. However, this requires that user has ssh access to remote.host.com -```bash +```console $ ssh -L 6000:localhost:1234 remote.host.com ``` @@ -160,7 +160,7 @@ First, establish the remote port forwarding form the login node, as [described a Second, invoke port forwarding from the compute node to the login node. 
Insert following line into your jobscript or interactive shell -```bash +```console $ ssh -TN -f -L 6000:localhost:6000 login1 ``` @@ -175,7 +175,7 @@ Port forwarding is static, each single port is mapped to a particular port on re To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: -```bash +```console local $ ssh -D 1080 localhost ``` @@ -183,7 +183,7 @@ On Windows, install and run the free, open source [Sock Puppet](http://sockspupp Once the proxy server is running, establish ssh port forwarding from Salomon to the proxy server, port 1080, exactly as [described above](#port-forwarding-from-login-nodes). -```bash +```console local $ ssh -R 6000:localhost:1080 salomon.it4i.cz ``` diff --git a/docs.it4i/salomon/software/ansys/ansys-fluent.md b/docs.it4i/salomon/software/ansys/ansys-fluent.md index 33e711b285cc8066604c43ebb7c943dcb1294fb6..4132b5724b0d6e2fba983992d8f6703afed0e88c 100644 --- a/docs.it4i/salomon/software/ansys/ansys-fluent.md +++ b/docs.it4i/salomon/software/ansys/ansys-fluent.md @@ -44,7 +44,7 @@ Working directory has to be created before sending pbs job into the queue. Input Journal file with definition of the input geometry and boundary conditions and defined process of solution has e.g. the following structure: -```bash +```console /file/read-case aircraft_2m.cas.gz /solve/init init @@ -58,7 +58,7 @@ The appropriate dimension of the problem has to be set by parameter (2d/3d). 1. Fast way to run Fluent from command line -```bash +```console fluent solver_version [FLUENT_options] -i journal_file -pbs ``` @@ -145,7 +145,7 @@ It runs the jobs out of the directory from which they are submitted (PBS_O_WORKD Fluent could be run in parallel only under Academic Research license. To do so this ANSYS Academic Research license must be placed before ANSYS CFD license in user preferences. To make this change anslic_admin utility should be run -```bash +```console /ansys_inc/shared_les/licensing/lic_admin/anslic_admin ``` diff --git a/docs.it4i/salomon/software/ansys/ansys.md b/docs.it4i/salomon/software/ansys/ansys.md index f93524a3e580f8a5c83302f8d1cd9997bb68c2be..d7e0f2e1444ddc77dd861a4cce4eef06b4c78a6c 100644 --- a/docs.it4i/salomon/software/ansys/ansys.md +++ b/docs.it4i/salomon/software/ansys/ansys.md @@ -6,8 +6,8 @@ Anselm provides as commercial as academic variants. Academic variants are distin To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL,...) load the module: -```bash - $ module load ansys +```console +$ ml ansys ``` ANSYS supports interactive regime, but due to assumed solution of extremely difficult tasks it is not recommended. 
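Several ANSYS releases are typically installed side by side, so it may be worth listing the available modules before loading one. A short sketch, using the same module commands as elsewhere in this documentation:

```console
$ ml av ansys
```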
diff --git a/docs.it4i/salomon/software/ansys/licensing.md b/docs.it4i/salomon/software/ansys/licensing.md index 04ff6513349ccede25a0846dd21227251e954732..eac78966d4b5183b2f0052d2ab6aea37f28eccc5 100644 --- a/docs.it4i/salomon/software/ansys/licensing.md +++ b/docs.it4i/salomon/software/ansys/licensing.md @@ -18,6 +18,7 @@ The licence intended to be used for science and research, publications, students * 16.1 * 17.0 +* 18.0 ## License Preferences diff --git a/docs.it4i/salomon/software/ansys/setting-license-preferences.md b/docs.it4i/salomon/software/ansys/setting-license-preferences.md index fe14541d46b1fe4cab38eb7b883c58e40e03dd32..b3f594d14863cde6aaa28f7a5139223d30a7d95b 100644 --- a/docs.it4i/salomon/software/ansys/setting-license-preferences.md +++ b/docs.it4i/salomon/software/ansys/setting-license-preferences.md @@ -6,8 +6,8 @@ Thus you need to configure preferred license order with ANSLIC_ADMIN. Please fol Launch the ANSLIC_ADMIN utility in a graphical environment: -```bash - $ANSYSLIC_DIR/lic_admin/anslic_admin +```console +$ANSYSLIC_DIR/lic_admin/anslic_admin ``` ANSLIC_ADMIN Utility will be run diff --git a/docs.it4i/salomon/software/ansys/workbench.md b/docs.it4i/salomon/software/ansys/workbench.md index 8ed07d789dea69798e68c177ac1612a3e391ec88..1b138ccd09fa64fd6ccbafbcb40ff14b2959bad4 100644 --- a/docs.it4i/salomon/software/ansys/workbench.md +++ b/docs.it4i/salomon/software/ansys/workbench.md @@ -8,7 +8,7 @@ It is possible to run Workbench scripts in batch mode. You need to configure sol Enable Distribute Solution checkbox and enter number of cores (eg. 48 to run on two Salomon nodes). If you want the job to run on more then 1 node, you must also provide a so called MPI appfile. In the Additional Command Line Arguments input field, enter: -```bash +```console -mpifile /path/to/my/job/mpifile.txt ``` diff --git a/docs.it4i/salomon/software/chemistry/nwchem.md b/docs.it4i/salomon/software/chemistry/nwchem.md index a26fc701ee44585dbab1f942685b92d9190adfa5..add429da99d2044e2ddaa64d29350e766c558bc2 100644 --- a/docs.it4i/salomon/software/chemistry/nwchem.md +++ b/docs.it4i/salomon/software/chemistry/nwchem.md @@ -15,8 +15,8 @@ The following versions are currently installed: For a current list of installed versions, execute: -```bash - module avail NWChem +```console +$ ml av NWChem ``` The recommend to use version 6.5. Version 6.3 fails on Salomon nodes with accelerator, because it attempts to communicate over scif0 interface. In 6.5 this is avoided by setting ARMCI_OPENIB_DEVICE=mlx4_0, this setting is included in the module. diff --git a/docs.it4i/salomon/software/chemistry/phono3py.md b/docs.it4i/salomon/software/chemistry/phono3py.md index 3f747d23bc9775f80137c0d6e4f1b4821d97439b..9dba8ab1809959a7b08afa811e41c26197ef3a4a 100644 --- a/docs.it4i/salomon/software/chemistry/phono3py.md +++ b/docs.it4i/salomon/software/chemistry/phono3py.md @@ -4,11 +4,14 @@ This GPL software calculates phonon-phonon interactions via the third order force constants. It allows to obtain lattice thermal conductivity, phonon lifetime/linewidth, imaginary part of self energy at the lowest order, joint density of states (JDOS) and weighted-JDOS. For details see Phys. Rev. B 91, 094306 (2015) and <http://atztogo.github.io/phono3py/index.html> -!!! 
note - Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module +Available modules + +```console +$ ml av phono3py +``` ```bash -$ module load phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 +$ ml phono3py ``` ## Example of Calculating Thermal Conductivity of Si Using VASP Code. @@ -17,7 +20,7 @@ $ module load phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 One needs to calculate second order and third order force constants using the diamond structure of silicon stored in [POSCAR](poscar-si) (the same form as in VASP) using single displacement calculations within supercell. -```bash +```console $ cat POSCAR Si 1.0 @@ -39,14 +42,14 @@ Direct ### Generating Displacement Using 2 by 2 by 2 Supercell for Both Second and Third Order Force Constants -```bash +```console $ phono3py -d --dim="2 2 2" -c POSCAR ``` 111 displacements is created stored in disp_fc3.yaml, and the structure input files with this displacements are POSCAR-00XXX, where the XXX=111. -```bash +```console disp_fc3.yaml POSCAR-00008 POSCAR-00017 POSCAR-00026 POSCAR-00035 POSCAR-00044 POSCAR-00053 POSCAR-00062 POSCAR-00071 POSCAR-00080 POSCAR-00089 POSCAR-00098 POSCAR-00107 POSCAR POSCAR-00009 POSCAR-00018 POSCAR-00027 POSCAR-00036 POSCAR-00045 POSCAR-00054 POSCAR-00063 POSCAR-00072 POSCAR-00081 POSCAR-00090 POSCAR-00099 POSCAR-00108 POSCAR-00001 POSCAR-00010 POSCAR-00019 POSCAR-00028 POSCAR-00037 POSCAR-00046 POSCAR-00055 POSCAR-00064 POSCAR-00073 POSCAR-00082 POSCAR-00091 POSCAR-00100 POSCAR-00109 @@ -60,7 +63,7 @@ POSCAR-00007 POSCAR-00016 POSCAR-00025 POSCAR-00034 POSCAR-00043 POSCAR-00052 For each displacement the forces needs to be calculated, i.e. in form of the output file of VASP (vasprun.xml). For a single VASP calculations one needs [KPOINTS](KPOINTS), [POTCAR](POTCAR), [INCAR](INCAR) in your case directory (where you have POSCARS) and those 111 displacements calculations can be generated by [prepare.sh](prepare.sh) script. Then each of the single 111 calculations is submitted [run.sh](run.sh) by [submit.sh](submit.sh). 
-```bash +```console $./prepare.sh $ls disp-00001 disp-00009 disp-00017 disp-00025 disp-00033 disp-00041 disp-00049 disp-00057 disp-00065 disp-00073 disp-00081 disp-00089 disp-00097 disp-00105 INCAR @@ -75,7 +78,7 @@ disp-00008 disp-00016 disp-00024 disp-00032 disp-00040 disp-00048 disp-00056 dis Taylor your run.sh script to fit into your project and other needs and submit all 111 calculations using submit.sh script -```bash +```console $ ./submit.sh ``` @@ -83,13 +86,13 @@ $ ./submit.sh Once all jobs are finished and vasprun.xml is created in each disp-XXXXX directory the collection is done by -```bash +```console $ phono3py --cf3 disp-{00001..00111}/vasprun.xml ``` and `disp_fc2.yaml, FORCES_FC2`, `FORCES_FC3` and disp_fc3.yaml should appear and put into the hdf format by -```bash +```console $ phono3py --dim="2 2 2" -c POSCAR ``` @@ -99,13 +102,13 @@ resulting in `fc2.hdf5` and `fc3.hdf5` The phonon lifetime calculations takes some time, however is independent on grid points, so could be splitted: -```bash +```console $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" --sigma 0.1 --wgp ``` ### Inspecting ir_grid_points.yaml -```bash +```console $ grep grid_point ir_grid_points.yaml num_reduced_ir_grid_points: 35 ir_grid_points: # [address, weight] @@ -148,18 +151,18 @@ ir_grid_points: # [address, weight] one finds which grid points needed to be calculated, for instance using following -```bash +```console $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" -c POSCAR --sigma 0.1 --br --write-gamma --gp="0 1 2 ``` one calculates grid points 0, 1, 2. To automize one can use for instance scripts to submit 5 points in series, see [gofree-cond1.sh](gofree-cond1.sh) -```bash +```console $ qsub gofree-cond1.sh ``` Finally the thermal conductivity result is produced by grouping single conductivity per grid calculations using -```bash +```console $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" --br --read_gamma ``` diff --git a/docs.it4i/salomon/software/comsol/comsol-multiphysics.md b/docs.it4i/salomon/software/comsol/comsol-multiphysics.md index ca79d5235ae1bd9afb4299a45d5e2d57a79cba24..7279793d99d0c8fc9d32711f076f6e3da0fde331 100644 --- a/docs.it4i/salomon/software/comsol/comsol-multiphysics.md +++ b/docs.it4i/salomon/software/comsol/comsol-multiphysics.md @@ -22,19 +22,19 @@ On the clusters COMSOL is available in the latest stable version. There are two To load the of COMSOL load the module -```bash -$ module load COMSOL/51-EDU +```console +$ ml COMSOL/51-EDU ``` By default the **EDU variant** will be loaded. If user needs other version or variant, load the particular version. To obtain the list of available versions use -```bash -$ module avail COMSOL +```console +$ ml av COMSOL ``` If user needs to prepare COMSOL jobs in the interactive mode it is recommend to use COMSOL on the compute nodes via PBS Pro scheduler. In order run the COMSOL Desktop GUI on Windows is recommended to use the [Virtual Network Computing (VNC)](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/). -```bash +```console $ xhost + $ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ppn=24 $ module load COMSOL @@ -76,7 +76,7 @@ COMSOL is the software package for the numerical solution of the partial differe LiveLink for MATLAB is available in both **EDU** and **COM** **variant** of the COMSOL release. 
On the clusters 1 commercial (**COM**) license and the 5 educational (**EDU**) licenses of LiveLink for MATLAB (please see the [ISV Licenses](../../../anselm/software/isv_licenses/)) are available. Following example shows how to start COMSOL model from MATLAB via LiveLink in the interactive mode. -```bash +```console $ xhost + $ qsub -I -X -A PROJECT_ID -q qexp -l select=1:ppn=24 $ module load MATLAB diff --git a/docs.it4i/salomon/software/debuggers/Introduction.md b/docs.it4i/salomon/software/debuggers/Introduction.md index a5c9cfb60154fbaf13faebaf15a508597b40703f..de8c00c66c96b98980ac789d7bf395fbd5eb1c99 100644 --- a/docs.it4i/salomon/software/debuggers/Introduction.md +++ b/docs.it4i/salomon/software/debuggers/Introduction.md @@ -10,9 +10,9 @@ Intel debugger is no longer available since Parallel Studio version 2015 The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. -```bash - $ module load intel - $ idb +```console +$ ml intel +$ idb ``` Read more at the [Intel Debugger](../intel-suite/intel-debugger/) page. @@ -21,9 +21,9 @@ Read more at the [Intel Debugger](../intel-suite/intel-debugger/) page. Allinea DDT, is a commercial debugger primarily for debugging parallel MPI or OpenMP programs. It also has a support for GPU (CUDA) and Intel Xeon Phi accelerators. DDT provides all the standard debugging features (stack trace, breakpoints, watches, view variables, threads etc.) for every thread running as part of your program, or for every process - even if these processes are distributed across a cluster using an MPI implementation. -```bash - $ module load Forge - $ forge +```console +$ ml Forge +$ forge ``` Read more at the [Allinea DDT](allinea-ddt/) page. @@ -32,9 +32,9 @@ Read more at the [Allinea DDT](allinea-ddt/) page. Allinea Performance Reports characterize the performance of HPC application runs. After executing your application through the tool, a synthetic HTML report is generated automatically, containing information about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes. -```bash - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 +```console +$ module load PerformanceReports/6.0 +$ perf-report mpirun -n 64 ./my_application argument01 argument02 ``` Read more at the [Allinea Performance Reports](allinea-performance-reports/) page. @@ -43,9 +43,9 @@ Read more at the [Allinea Performance Reports](allinea-performance-reports/) pag TotalView is a source- and machine-level debugger for multi-process, multi-threaded programs. Its wide range of tools provides ways to analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity. -```bash - $ module load TotalView/8.15.4-6-linux-x86-64 - $ totalview +```console +$ ml TotalView/8.15.4-6-linux-x86-64 +$ totalview ``` Read more at the [Totalview](total-view/) page. @@ -54,7 +54,7 @@ Read more at the [Totalview](total-view/) page. Vampir is a GUI trace analyzer for traces in OTF format. 
-```bash +```console $ module load Vampir/8.5.0 $ vampir ``` diff --git a/docs.it4i/salomon/software/debuggers/aislinn.md b/docs.it4i/salomon/software/debuggers/aislinn.md index e1dee28b8d6d78ef7be2371afb2f8884f2b5f364..89cf7538016c004b1ba9058bcf148bbf0761eb50 100644 --- a/docs.it4i/salomon/software/debuggers/aislinn.md +++ b/docs.it4i/salomon/software/debuggers/aislinn.md @@ -49,13 +49,13 @@ The program does the following: process 0 receives two messages from anyone and To verify this program by Aislinn, we first load Aislinn itself: -```bash -$ module load aislinn +```console +$ ml aislinn ``` Now we compile the program by Aislinn implementation of MPI. There are `mpicc` for C programs and `mpicxx` for C++ programs. Only MPI parts of the verified application has to be recompiled; non-MPI parts may remain untouched. Let us assume that our program is in `test.cpp`. -```bash +```console $ mpicc -g test.cpp -o test ``` @@ -63,7 +63,7 @@ The `-g` flag is not necessary, but it puts more debugging information into the Now we run the Aislinn itself. The argument `-p 3` specifies that we want to verify our program for the case of three MPI processes -```bash +```console $ aislinn -p 3 ./test ==AN== INFO: Aislinn v0.3.0 ==AN== INFO: Found error 'Invalid write' @@ -73,8 +73,8 @@ $ aislinn -p 3 ./test Aislinn found an error and produced HTML report. To view it, we can use any browser, e.g.: -```bash - $ firefox report.html +```console +$ firefox report.html ``` At the beginning of the report there are some basic summaries of the verification. In the second part (depicted in the following picture), the error is described. diff --git a/docs.it4i/salomon/software/debuggers/allinea-ddt.md b/docs.it4i/salomon/software/debuggers/allinea-ddt.md index 41dd4c6e8266e257a425c0e7a8b54330c38ccf04..b73d21a8a33378eff3ff4236efbc42fde0b94245 100644 --- a/docs.it4i/salomon/software/debuggers/allinea-ddt.md +++ b/docs.it4i/salomon/software/debuggers/allinea-ddt.md @@ -24,22 +24,21 @@ In case of debugging on accelerators: Load all necessary modules to compile the code. For example: -```bash - $ module load intel - $ module load impi ... or ... module load openmpi/X.X.X-icc +```console +$ ml intel +$ ml impi **or** ml OpenMPI/X.X.X-icc ``` Load the Allinea DDT module: -```bash - $ module load Forge +```console +$ module load Forge ``` Compile the code: -```bash +```console $ mpicc -g -O0 -o test_debug test.c - $ mpif90 -g -O0 -o test_debug test.f ``` @@ -56,22 +55,22 @@ Before debugging, you need to compile your code with theses flags: Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: -```bash - $ ssh -X username@anselm.it4i.cz +```console +$ ssh -X username@anselm.it4i.cz ``` Other options is to access login node using VNC. Please see the detailed information on how to [use graphic user interface on Anselm](/general/accessing-the-clusters/graphical-user-interface/x-window-system/) From the login node an interactive session **with X windows forwarding** (-X option) can be started by following command: -```bash - $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 +```console +$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 ``` Then launch the debugger with the ddt command followed by the name of the executable to debug: -```bash - $ ddt test_debug +```console +$ ddt test_debug ``` A submission window that appears have a prefilled path to the executable to debug. 
You can select the number of MPI processors and/or OpenMP threads on which to run and press run. Command line arguments to a program can be entered to the "Arguments " box. @@ -80,16 +79,16 @@ A submission window that appears have a prefilled path to the executable to debu To start the debugging directly without the submission window, user can specify the debugging and execution parameters from the command line. For example the number of MPI processes is set by option "-np 4". Skipping the dialog is done by "-start" option. To see the list of the "ddt" command line parameters, run "ddt --help". -```bash - ddt -start -np 4 ./hello_debug_impi +```console +ddt -start -np 4 ./hello_debug_impi ``` ## Documentation Users can find original User Guide after loading the DDT module: -```bash - $DDTPATH/doc/userguide.pdf +```console +$DDTPATH/doc/userguide.pdf ``` [1] Discipline, Magic, Inspiration and Science: Best Practice Debugging with Allinea DDT, Workshop conducted at LLNL by Allinea on May 10, 2013, [link](https://computing.llnl.gov/tutorials/allineaDDT/index.html) diff --git a/docs.it4i/salomon/software/debuggers/allinea-performance-reports.md b/docs.it4i/salomon/software/debuggers/allinea-performance-reports.md index 3d0826e994bb6434b9cd0cd100249393191c03d3..2fcaee9f6ca8674f11ff92cbcf59364a074648f1 100644 --- a/docs.it4i/salomon/software/debuggers/allinea-performance-reports.md +++ b/docs.it4i/salomon/software/debuggers/allinea-performance-reports.md @@ -12,8 +12,8 @@ Our license is limited to 64 MPI processes. Allinea Performance Reports version 6.0 is available -```bash - $ module load PerformanceReports/6.0 +```console +$ module load PerformanceReports/6.0 ``` The module sets up environment variables, required for using the Allinea Performance Reports. @@ -24,8 +24,8 @@ Use the the perf-report wrapper on your (MPI) program. Instead of [running your MPI program the usual way](../mpi/mpi/), use the the perf report wrapper: -```bash - $ perf-report mpirun ./mympiprog.x +```console +$ perf-report mpirun ./mympiprog.x ``` The mpi program will run as usual. The perf-report creates two additional files, in \*.txt and \*.html format, containing the performance report. Note that demanding MPI codes should be run within [the queue system](../../job-submission-and-execution/). @@ -36,23 +36,24 @@ In this example, we will be profiling the mympiprog.x MPI program, using Allinea First, we allocate some nodes via the express queue: -```bash - $ qsub -q qexp -l select=2:ppn=24:mpiprocs=24:ompthreads=1 -I +```console +$ qsub -q qexp -l select=2:ppn=24:mpiprocs=24:ompthreads=1 -I qsub: waiting for job 262197.dm2 to start qsub: job 262197.dm2 ready ``` Then we load the modules and run the program the usual way: -```bash - $ module load intel impi PerfReports/6.0 - $ mpirun ./mympiprog.x +```console +$ ml intel +$ ml PerfReports/6.0 +$ mpirun ./mympiprog.x ``` Now lets profile the code: -```bash - $ perf-report mpirun ./mympiprog.x +```console +$ perf-report mpirun ./mympiprog.x ``` Performance report files [mympiprog_32p\*.txt](mympiprog_32p_2014-10-15_16-56.txt) and [mympiprog_32p\*.html](mympiprog_32p_2014-10-15_16-56.html) were created. We can see that the code is very efficient on MPI and is CPU bounded. 
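The generated reports can be inspected directly on the cluster: the plain-text variant in the terminal, the HTML variant in a browser (over X forwarding or VNC). A sketch, using the file names from the example above:

```console
$ cat mympiprog_32p_2014-10-15_16-56.txt        # text report in the terminal
$ firefox mympiprog_32p_2014-10-15_16-56.html &
```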
diff --git a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md index 2fdbd18e166d3e553a8ad5719f7945f902cbd73c..192aece7e250dfb9b2938daebe83606a1f002b06 100644 --- a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md @@ -15,14 +15,14 @@ Intel *®* VTune™ Amplifier, part of Intel Parallel studio, is a GUI profiling To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on the login nodes, thus direct profiling on login nodes is not possible. By default, the kernel modules ale not loaded on compute nodes neither. In order to have the modules loaded, you need to specify vtune=version PBS resource at job submit. The version is the same as for environment module. For example to use VTune/2016_update1: -```bash - $ qsub -q qexp -A OPEN-0-0 -I -l select=1,vtune=2016_update1 +```console +$ qsub -q qexp -A OPEN-0-0 -I -l select=1,vtune=2016_update1 ``` After that, you can verify the modules sep\*, pax and vtsspp are present in the kernel : -```bash - $ lsmod | grep -e sep -e pax -e vtsspp +```console +$ lsmod | grep -e sep -e pax -e vtsspp vtsspp 362000 0 sep3_15 546657 0 pax 4312 0 @@ -30,14 +30,14 @@ After that, you can verify the modules sep\*, pax and vtsspp are present in the To launch the GUI, first load the module: -```bash - $ module add VTune/2016_update1 +```console +$ module add VTune/2016_update1 ``` and launch the GUI : -```bash - $ amplxe-gui +```console +$ amplxe-gui ``` The GUI will open in new window. Click on "New Project..." to create a new project. After clicking OK, a new window with project properties will appear. At "Application:", select the bath to your binary you want to profile (the binary should be compiled with -g flag). Some additional options such as command line arguments can be selected. At "Managed code profiling mode:" select "Native" (unless you want to profile managed mode .NET/Mono applications). After clicking OK, your project is created. @@ -50,8 +50,8 @@ VTune Amplifier also allows a form of remote analysis. In this mode, data for an The command line will look like this: -```bash - /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -app-working-dir /home/sta545/tmp -- /home/sta545/tmp/sgemm +```console +/apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -app-working-dir /home/sta545/tmp -- /home/sta545/tmp/sgemm ``` Copy the line to clipboard and then you can paste it in your jobscript or in command line. After the collection is run, open the GUI once again, click the menu button in the upper right corner, and select "Open > Result...". The GUI will load the results from the run. 
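If you prefer to stay on the command line, the collected result can also be summarized without opening the GUI, assuming the amplxe-cl report mode is available in the installed version. A sketch only; the result directory name (r000ah here) is hypothetical and will differ for each collection:

```console
$ amplxe-cl -report summary -r r000ah   # r000ah is a hypothetical result directory
```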
@@ -75,14 +75,14 @@ You may also use remote analysis to collect data from the MIC and then analyze i Native launch: -```bash - $ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-native:0 -collect advanced-hotspots -- /home/sta545/tmp/vect-add-mic +```console +$ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-native:0 -collect advanced-hotspots -- /home/sta545/tmp/vect-add-mic ``` Host launch: -```bash - $ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-host-launch:0 -collect advanced-hotspots -- /home/sta545/tmp/sgemm +```console +$ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-host-launch:0 -collect advanced-hotspots -- /home/sta545/tmp/sgemm ``` You can obtain this command line by pressing the "Command line..." button on Analysis Type screen. diff --git a/docs.it4i/salomon/software/debuggers/total-view.md b/docs.it4i/salomon/software/debuggers/total-view.md index f4f69278ff59e8f2cd35aad8b5c79bf78a4a0171..0235c845d012f4c0f5245e7ae2c5f8d96b6efe3c 100644 --- a/docs.it4i/salomon/software/debuggers/total-view.md +++ b/docs.it4i/salomon/software/debuggers/total-view.md @@ -6,7 +6,7 @@ TotalView is a GUI-based source code multi-process, multi-thread debugger. On the cluster users can debug OpenMP or MPI code that runs up to 64 parallel processes. These limitation means that: -```bash +```console 1 user can debug up 64 processes, or 32 users can debug 2 processes, etc. ``` @@ -21,23 +21,20 @@ You can check the status of the licenses [here](https://extranet.it4i.cz/rsweb/a Load all necessary modules to compile the code. For example: -```bash - module load intel - - module load impi ... or ... module load OpenMPI/X.X.X-icc +```console + ml intel ``` Load the TotalView module: -```bash - module load TotalView/8.15.4-6-linux-x86-64 +```console + ml TotalView/8.15.4-6-linux-x86-64 ``` Compile the code: -```bash +```console mpicc -g -O0 -o test_debug test.c - mpif90 -g -O0 -o test_debug test.f ``` @@ -54,16 +51,16 @@ Before debugging, you need to compile your code with theses flags: Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: -```bash - ssh -X username@salomon.it4i.cz +```console +ssh -X username@salomon.it4i.cz ``` Other options is to access login node using VNC. Please see the detailed information on how to use graphic user interface on Anselm. From the login node an interactive session with X windows forwarding (-X option) can be started by following command: -```bash - qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00 +```console +$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00 ``` Then launch the debugger with the totalview command followed by the name of the executable to debug. @@ -72,8 +69,8 @@ Then launch the debugger with the totalview command followed by the name of the To debug a serial code use: -```bash - totalview test_debug +```console +totalview test_debug ``` ### Debugging a Parallel Code - Option 1 @@ -83,7 +80,7 @@ To debug a parallel code compiled with **OpenMPI** you need to setup your TotalV !!! hint To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file. 
-```bash +```console proc mpi_auto_run_starter {loaded_id} { set starter_programs {mpirun mpiexec orterun} set executable_name [TV::symbol get $loaded_id full_pathname] @@ -105,23 +102,23 @@ To debug a parallel code compiled with **OpenMPI** you need to setup your TotalV The source code of this function can be also found in -```bash - /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl +```console +$ /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl ``` You can also add only following line to you ~/.tvdrc file instead of the entire function: -```bash -source /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl +```console +$ source /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl ``` You need to do this step only once. See also [OpenMPI FAQ entry](https://www.open-mpi.org/faq/?category=running#run-with-tv) Now you can run the parallel debugger using: -```bash - mpirun -tv -n 5 ./test_debug +```console +$ mpirun -tv -n 5 ./test_debug ``` When following dialog appears click on "Yes" @@ -138,10 +135,10 @@ Other option to start new parallel debugging session from a command line is to l The following example shows how to start debugging session with Intel MPI: -```bash - module load intel/2015b-intel-2015b impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25 TotalView/8.15.4-6-linux-x86-64 - - totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi +```console +$ ml intel +$ ml TotalView/8.15.4-6-linux-x86-64 +$ totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi ``` After running previous command you will see the same window as shown in the screenshot above. diff --git a/docs.it4i/salomon/software/debuggers/valgrind.md b/docs.it4i/salomon/software/debuggers/valgrind.md index 430118785a08bc43e67a4711396f9ac6b63c4afb..188f98502862effe90495934c6288aa64b042318 100644 --- a/docs.it4i/salomon/software/debuggers/valgrind.md +++ b/docs.it4i/salomon/software/debuggers/valgrind.md @@ -47,9 +47,9 @@ For example, lets look at this C code, which has two problems: Now, compile it with Intel compiler: -```bash - $ module add intel - $ icc -g valgrind-example.c -o valgrind-example +```console +$ module add intel +$ icc -g valgrind-example.c -o valgrind-example ``` Now, lets run it with Valgrind. The syntax is: @@ -58,8 +58,8 @@ valgrind [valgrind options] < your program binary > [your program options] If no Valgrind options are specified, Valgrind defaults to running Memcheck tool. Please refer to the Valgrind documentation for a full description of command line options. -```bash - $ valgrind ./valgrind-example +```console +$ valgrind ./valgrind-example ==12652== Memcheck, a memory error detector ==12652== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==12652== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info @@ -92,8 +92,8 @@ If no Valgrind options are specified, Valgrind defaults to running Memcheck tool In the output we can see that Valgrind has detected both errors - the off-by-one memory access at line 5 and a memory leak of 40 bytes. If we want a detailed analysis of the memory leak, we need to run Valgrind with --leak-check=full option: -```bash - $ valgrind --leak-check=full ./valgrind-example +```console +$ valgrind --leak-check=full ./valgrind-example ==23856== Memcheck, a memory error detector ==23856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. 
==23856== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info @@ -134,13 +134,13 @@ Now we can see that the memory leak is due to the malloc() at line 6. Although Valgrind is not primarily a parallel debugger, it can be used to debug parallel applications as well. When launching your parallel applications, prepend the valgrind command. For example: -```bash - $ mpirun -np 4 valgrind myapplication +```console +$ mpirun -np 4 valgrind myapplication ``` The default version without MPI support will however report a large number of false errors in the MPI library, such as: -```bash +```console ==30166== Conditional jump or move depends on uninitialised value(s) ==30166== at 0x4C287E8: strlen (mc_replace_strmem.c:282) ==30166== by 0x55443BD: I_MPI_Processor_model_number (init_interface.c:427) @@ -181,16 +181,16 @@ Lets look at this MPI example: There are two errors - use of uninitialized memory and invalid length of the buffer. Lets debug it with valgrind : -```bash - $ module add intel impi - $ mpiicc -g valgrind-example-mpi.c -o valgrind-example-mpi - $ module add Valgrind/3.11.0-intel-2015b - $ mpirun -np 2 -env LD_PRELOAD $EBROOTVALGRIND/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi +```console +$ module add intel impi +$ mpiicc -g valgrind-example-mpi.c -o valgrind-example-mpi +$ module add Valgrind/3.11.0-intel-2015b +$ mpirun -np 2 -env LD_PRELOAD $EBROOTVALGRIND/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi ``` Prints this output : (note that there is output printed for every launched MPI process) -```bash +```console ==31318== Memcheck, a memory error detector ==31318== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==31318== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info diff --git a/docs.it4i/salomon/software/debuggers/vampir.md b/docs.it4i/salomon/software/debuggers/vampir.md index 99053546c14b43c51d5ab7728dfa3824f2016170..852374d229d2c4f4a2e4c612c85d25b1c121faf0 100644 --- a/docs.it4i/salomon/software/debuggers/vampir.md +++ b/docs.it4i/salomon/software/debuggers/vampir.md @@ -6,11 +6,13 @@ Vampir is a commercial trace analysis and visualisation tool. It can work with t ## Installed Versions -Version 8.5.0 is currently installed as module Vampir/8.5.0 : +```console +$ ml av Vampir +``` -```bash - $ module load Vampir/8.5.0 - $ vampir & +```console +$ ml Vampir +$ vampir & ``` ## User Manual diff --git a/docs.it4i/salomon/software/intel-suite/intel-advisor.md b/docs.it4i/salomon/software/intel-suite/intel-advisor.md index 427f5c98cfccf29de4870043c08074ac1a246135..688deda17708cc23578fd50dc6063fb7716c5858 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-advisor.md +++ b/docs.it4i/salomon/software/intel-suite/intel-advisor.md @@ -16,8 +16,8 @@ Profiling is possible either directly from the GUI, or from command line. To profile from GUI, launch Advisor: -```bash - $ advixe-gui +```console +$ advixe-gui ``` Then select menu File -> New -> Project. Choose a directory to save project data to. After clicking OK, Project properties window will appear, where you can configure path to your binary, launch arguments, working directory etc. After clicking OK, the project is ready. 
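+
+A command line collection may look as follows. This is a minimal sketch, assuming the Advisor module providing `advixe-cl` is loaded; the project directory `./advi` and the binary `./myapp` are only illustrative placeholders:
+
+```console
+$ advixe-cl -collect survey -project-dir ./advi -- ./myapp
+```
+
+The project directory `./advi` can afterwards be opened in the GUI with `advixe-gui ./advi`.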
diff --git a/docs.it4i/salomon/software/intel-suite/intel-compilers.md b/docs.it4i/salomon/software/intel-suite/intel-compilers.md index 63a05bd91e15c04afa6a3cc8d21231ba030437bc..8e2ee714f6e5c61ec8b4e3b4522a3a06fdd11f46 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-compilers.md +++ b/docs.it4i/salomon/software/intel-suite/intel-compilers.md @@ -2,28 +2,28 @@ The Intel compilers in multiple versions are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. -```bash - $ module load intel - $ icc -v - $ ifort -v +```console +$ ml intel +$ icc -v +$ ifort -v ``` The intel compilers provide for vectorization of the code, via the AVX2 instructions and support threading parallelization via OpenMP For maximum performance on the Salomon cluster compute nodes, compile your programs using the AVX2 instructions, with reporting where the vectorization was used. We recommend following compilation options for high performance -```bash - $ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.f mysubroutines.f -o myprog.x +```console +$ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x +$ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.f mysubroutines.f -o myprog.x ``` In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggresive loop optimizations (-O3) and vectorization (-xCORE-AVX2) The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX2 vectorization. Enable the OpenMP parallelization by the **-openmp** compiler switch. -```bash - $ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.f mysubroutines.f -o myprog.x +```console +$ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.c mysubroutines.c -o myprog.x +$ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.f mysubroutines.f -o myprog.x ``` Read more at <https://software.intel.com/en-us/intel-cplusplus-compiler-16.0-user-and-reference-guide> diff --git a/docs.it4i/salomon/software/intel-suite/intel-debugger.md b/docs.it4i/salomon/software/intel-suite/intel-debugger.md index d0fef6ab7fbe2e50e8e7f8238585521bb5cb9695..15788c798785390777016856b8ffcc111227c1d2 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-debugger.md +++ b/docs.it4i/salomon/software/intel-suite/intel-debugger.md @@ -6,31 +6,30 @@ IDB is no longer available since Intel Parallel Studio 2015 The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. Use [X display](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/) for running the GUI. -```bash - $ module load intel/2014.06 - $ module load Java - $ idb +```console +$ ml intel +$ ml Java +$ idb ``` The debugger may run in text mode. To debug in text mode, use -```bash - $ idbc +```console +$ idbc ``` To debug on the compute nodes, module intel must be loaded. 
The GUI on compute nodes may be accessed using the same way as in [the GUI section](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/) Example: -```bash - $ qsub -q qexp -l select=1:ncpus=24 -X -I +```console +$ qsub -q qexp -l select=1:ncpus=24 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19654.srv11 ready - - $ module load intel - $ module load Java - $ icc -O0 -g myprog.c -o myprog.x - $ idb ./myprog.x +$ ml intel +$ ml Java +$ icc -O0 -g myprog.c -o myprog.x +$ idb ./myprog.x ``` In this example, we allocate 1 full compute node, compile program myprog.c with debugging options -O0 -g and run the idb debugger interactively on the myprog.x executable. The GUI access is via X11 port forwarding provided by the PBS workload manager. @@ -43,13 +42,12 @@ In this example, we allocate 1 full compute node, compile program myprog.c with For debugging small number of MPI ranks, you may execute and debug each rank in separate xterm terminal (do not forget the [X display](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/)). Using Intel MPI, this may be done in following way: -```bash - $ qsub -q qexp -l select=2:ncpus=24 -X -I +```console +$ qsub -q qexp -l select=2:ncpus=24 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19655.srv11 ready - - $ module load intel impi - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x +$ ml intel +$ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x ``` In this example, we allocate 2 full compute node, run xterm on each node and start idb debugger in command line mode, debugging two ranks of mympiprog.x application. The xterm will pop up for each rank, with idb prompt ready. The example is not limited to use of Intel MPI @@ -58,13 +56,12 @@ In this example, we allocate 2 full compute node, run xterm on each node and sta Run the idb debugger from within the MPI debug option. This will cause the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set break points and step the execution manually. Using Intel MPI: -```bash - $ qsub -q qexp -l select=2:ncpus=24 -X -I +```console +$ qsub -q qexp -l select=2:ncpus=24 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19655.srv11 ready - - $ module load intel impi - $ mpirun -n 48 -idb ./mympiprog.x +$ ml intel +$ mpirun -n 48 -idb ./mympiprog.x ``` ### Debugging Multithreaded Application diff --git a/docs.it4i/salomon/software/intel-suite/intel-inspector.md b/docs.it4i/salomon/software/intel-suite/intel-inspector.md index 6231a65347abc13d442aea0586d6003ac7d3c798..bd298923813d786c7620c751a3c267983bb2a48d 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-inspector.md +++ b/docs.it4i/salomon/software/intel-suite/intel-inspector.md @@ -18,8 +18,8 @@ Debugging is possible either directly from the GUI, or from command line. To debug from GUI, launch Inspector: -```bash - $ inspxe-gui & +```console +$ inspxe-gui & ``` Then select menu File -> New -> Project. Choose a directory to save project data to. After clicking OK, Project properties window will appear, where you can configure path to your binary, launch arguments, working directory etc. After clicking OK, the project is ready. 
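+
+A command line collection may look as follows. This is a minimal sketch, assuming the Inspector module providing `inspxe-cl` is loaded; the analysis type `mi1` is the lightest memory-error analysis, and the result directory `./insp_result` and the binary `./myapp` are only illustrative placeholders:
+
+```console
+$ inspxe-cl -collect mi1 -result-dir ./insp_result -- ./myapp
+```
+
+The result stored in `./insp_result` can afterwards be opened in the GUI with `inspxe-gui ./insp_result`.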
diff --git a/docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md index ead2008dc115bd5b8d7d76a623e9fe22b9161d56..60628eed0744d4305f79f4b77ff2f4de8e11c10d 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md +++ b/docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md @@ -6,8 +6,8 @@ Intel Integrated Performance Primitives, version 9.0.1, compiled for AVX2 vector Check out IPP before implementing own math functions for data processing, it is likely already there. -```bash - $ module load ipp +```console +$ ml ipp ``` The module sets up environment variables, required for linking and running ipp enabled applications. @@ -57,20 +57,18 @@ The module sets up environment variables, required for linking and running ipp e Compile above example, using any compiler and the ipp module. -```bash - $ module load intel - $ module load ipp - - $ icc testipp.c -o testipp.x -lippi -lipps -lippcore +```console +$ ml intel +$ ml ipp +$ icc testipp.c -o testipp.x -lippi -lipps -lippcore ``` You will need the ipp module loaded to run the ipp enabled executable. This may be avoided, by compiling library search paths into the executable -```bash - $ module load intel - $ module load ipp - - $ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore +```console +$ ml intel +$ ml ipp +$ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore ``` ## Code Samples and Documentation diff --git a/docs.it4i/salomon/software/intel-suite/intel-mkl.md b/docs.it4i/salomon/software/intel-suite/intel-mkl.md index 322492010827e5dc2cc63d6ccd7cb3452f1a4214..13bae44b6d1aa16d39cb8207da60c04fbe287420 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-mkl.md +++ b/docs.it4i/salomon/software/intel-suite/intel-mkl.md @@ -17,8 +17,8 @@ For details see the [Intel MKL Reference Manual](http://software.intel.com/sites Intel MKL version 11.2.3.187 is available on the cluster -```bash - $ module load imkl +```console +$ module load imkl ``` The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $CPATH, $LD_LIBRARY_PATH and $MKL_EXAMPLES @@ -40,8 +40,8 @@ Linking Intel MKL libraries may be complex. Intel [mkl link line advisor](http:/ You will need the mkl module loaded to run the mkl enabled executable. This may be avoided, by compiling library search paths into the executable. Include rpath on the compile line: -```bash - $ icc .... -Wl,-rpath=$LIBRARY_PATH ... +```console +$ icc .... -Wl,-rpath=$LIBRARY_PATH ... ``` ### Threading @@ -50,9 +50,9 @@ Advantage in using Intel MKL library is that it brings threaded parallelization For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS -```bash - $ export OMP_NUM_THREADS=24 - $ export KMP_AFFINITY=granularity=fine,compact,1,0 +```console +$ export OMP_NUM_THREADS=24 +$ export KMP_AFFINITY=granularity=fine,compact,1,0 ``` The application will run with 24 threads with affinity optimized for fine grain parallelization. 
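+
+Since MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS, the two variables may be combined to give the MKL routines a different thread count than the rest of an OpenMP application. A minimal sketch, the value 12 being only an illustration:
+
+```console
+$ export OMP_NUM_THREADS=24
+$ export MKL_NUM_THREADS=12
+```
+
+With this setting, OpenMP regions in the application run with 24 threads, while the threaded MKL routines are limited to 12 threads.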
@@ -63,50 +63,45 @@ Number of examples, demonstrating use of the Intel MKL library and its linking i ### Working With Examples -```bash - $ module load intel - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - - $ make sointel64 function=cblas_dgemm +```console +$ module load intel +$ module load imkl +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ make sointel64 function=cblas_dgemm ``` In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL example suite installed on clusters. ### Example: MKL and Intel Compiler -```bash - $ module load intel - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - $ - $ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x - $ ./cblas_dgemmx.x data/cblas_dgemmx.d +```console +$ module load intel +$ module load imkl +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ +$ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x +$ ./cblas_dgemmx.x data/cblas_dgemmx.d ``` In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL with icc -mkl option. Using the -mkl option is equivalent to: -```bash - $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 +```console +$ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 ``` In this example, we compile and link the cblas_dgemm example, using LP64 interface to threaded MKL and Intel OMP threads implementation. ### Example: Intel MKL and GNU Compiler -```bash - $ module load GCC - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - - $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm - - $ ./cblas_dgemmx.x data/cblas_dgemmx.d +```console +$ module load GCC +$ module load imkl +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm +$ ./cblas_dgemmx.x data/cblas_dgemmx.d ``` In this example, we compile, link and run the cblas_dgemm example, using LP64 interface to threaded MKL and gnu OMP threads implementation. diff --git a/docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md b/docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md index 4b1c9308957a43fafafb8f5c1280c11ba2bf81a1..b22274a0e0a4c32942b15ba90244621eba21aa54 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md +++ b/docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md @@ -17,10 +17,10 @@ Intel Parallel Studio XE The Intel compilers version 131.3 are available, via module iccifort/2013.5.192-GCC-4.8.3. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. -```bash - $ module load intel - $ icc -v - $ ifort -v +```console +$ ml intel +$ icc -v +$ ifort -v ``` Read more at the [Intel Compilers](intel-compilers/) page. @@ -31,9 +31,9 @@ IDB is no longer available since Parallel Studio 2015. The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. 
-```bash - $ module load intel - $ idb +```console +$ ml intel +$ idb ``` Read more at the [Intel Debugger](intel-debugger/) page. @@ -42,8 +42,8 @@ Read more at the [Intel Debugger](intel-debugger/) page. Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. -```bash - $ module load imkl +```console +$ ml imkl ``` Read more at the [Intel MKL](intel-mkl/) page. @@ -52,8 +52,8 @@ Read more at the [Intel MKL](intel-mkl/) page. Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX is available, via module ipp. The IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. -```bash - $ module load ipp +```console +$ ml ipp ``` Read more at the [Intel IPP](intel-integrated-performance-primitives/) page. @@ -62,8 +62,8 @@ Read more at the [Intel IPP](intel-integrated-performance-primitives/) page. Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. -```bash - $ module load tbb +```console +$ ml tbb ``` Read more at the [Intel TBB](intel-tbb/) page. diff --git a/docs.it4i/salomon/software/intel-suite/intel-tbb.md b/docs.it4i/salomon/software/intel-suite/intel-tbb.md index 94e32f39073b41801f20391b04cc5081f99649f7..59976aa7ef31d2e97e9799ced80578be11a2d8ab 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-tbb.md +++ b/docs.it4i/salomon/software/intel-suite/intel-tbb.md @@ -4,10 +4,10 @@ Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. The tasks are executed by a runtime scheduler and may be offloaded to [MIC accelerator](../intel-xeon-phi/). -Intel TBB version 4.3.5.187 is available on the cluster. +Intel is available on the cluster. -```bash - $ module load tbb +```console +$ ml av tbb ``` The module sets up environment variables, required for linking and running tbb enabled applications. @@ -18,21 +18,21 @@ Link the tbb library, using -ltbb Number of examples, demonstrating use of TBB and its built-in scheduler is available on Anselm, in the $TBB_EXAMPLES directory. 
-```bash - $ module load intel - $ module load tbb - $ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ - $ cd /tmp/parallel_reduce/primes - $ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb - $ ./primes.x +```console +$ ml intel +$ ml tbb +$ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ +$ cd /tmp/parallel_reduce/primes +$ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb +$ ./primes.x ``` In this example, we compile, link and run the primes example, demonstrating use of parallel task-based reduce in computation of prime numbers. You will need the tbb module loaded to run the tbb enabled executable. This may be avoided, by compiling library search paths into the executable. -```bash - $ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb +```console +$ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb ``` ## Further Reading diff --git a/docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md b/docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md index 5d4513d306d1b9a4bf159c71231c9677cc2b8165..ab8194b5aaab951ade7374f9e8d4862ce14ff3b9 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md +++ b/docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md @@ -12,9 +12,9 @@ Currently on Salomon is version 9.1.2.024 available as module itac/9.1.2.024 ITAC can collect traces from applications that are using Intel MPI. To generate a trace, simply add -trace option to your mpirun command : -```bash - $ module load itac/9.1.2.024 - $ mpirun -trace myapp +```console +$ ml itac/9.1.2.024 +$ mpirun -trace myapp ``` The trace will be saved in file myapp.stf in the current directory. @@ -23,9 +23,9 @@ The trace will be saved in file myapp.stf in the current directory. To view and analyze the trace, open ITAC GUI in a [graphical environment](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/): -```bash - $ module load itac/9.1.2.024 - $ traceanalyzer +```console +$ module load itac/9.1.2.024 +$ traceanalyzer ``` The GUI will launch and you can open the produced `*`.stf file. diff --git a/docs.it4i/salomon/software/intel-xeon-phi.md b/docs.it4i/salomon/software/intel-xeon-phi.md index 49746016bc8f96222d8c3bc125e7bf5cfea06a71..99c3c78be0fe95b654412748ef4fcefce6d7a9d9 100644 --- a/docs.it4i/salomon/software/intel-xeon-phi.md +++ b/docs.it4i/salomon/software/intel-xeon-phi.md @@ -2,150 +2,196 @@ ## Guide to Intel Xeon Phi Usage -Intel Xeon Phi can be programmed in several modes. The default mode on Anselm is offload mode, but all modes described in this document are supported. +Intel Xeon Phi accelerator can be programmed in several modes. The default mode on the cluster is offload mode, but all modes described in this document are supported. 
## Intel Utilities for Xeon Phi To get access to a compute node with Intel Xeon Phi accelerator, use the PBS interactive session -```bash - $ qsub -I -q qmic -A NONE-0-0 +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 ``` -To set up the environment module "Intel" has to be loaded +To set up the environment module "intel" has to be loaded, without specifying the version, default version is loaded (at time of writing this, it's 2015b) -```bash - $ module load intel/13.5.192 +```console + $ ml intel ``` Information about the hardware can be obtained by running the micinfo program on the host. -```bash +```console $ /usr/bin/micinfo ``` -The output of the "micinfo" utility executed on one of the Anselm node is as follows. (note: to get PCIe related details the command has to be run with root privileges) - -```bash - MicInfo Utility Log - - Created Mon Jul 22 00:23:50 2013 - - System Info - HOST OS : Linux - OS Version : 2.6.32-279.5.2.bl6.Bull.33.x86_64 - Driver Version : 6720-15 - MPSS Version : 2.1.6720-15 - Host Physical Memory : 98843 MB - - Device No: 0, Device Name: mic0 - - Version - Flash Version : 2.1.03.0386 - SMC Firmware Version : 1.15.4830 - SMC Boot Loader Version : 1.8.4326 - uOS Version : 2.6.38.8-g2593b11 - Device Serial Number : ADKC30102482 - - Board - Vendor ID : 0x8086 - Device ID : 0x2250 - Subsystem ID : 0x2500 - Coprocessor Stepping ID : 3 - PCIe Width : x16 - PCIe Speed : 5 GT/s - PCIe Max payload size : 256 bytes - PCIe Max read req size : 512 bytes - Coprocessor Model : 0x01 - Coprocessor Model Ext : 0x00 - Coprocessor Type : 0x00 - Coprocessor Family : 0x0b - Coprocessor Family Ext : 0x00 - Coprocessor Stepping : B1 - Board SKU : B1PRQ-5110P/5120D - ECC Mode : Enabled - SMC HW Revision : Product 225W Passive CS - - Cores - Total No of Active Cores : 60 - Voltage : 1032000 uV - Frequency : 1052631 kHz - - Thermal - Fan Speed Control : N/A - Fan RPM : N/A - Fan PWM : N/A - Die Temp : 49 C - - GDDR - GDDR Vendor : Elpida - GDDR Version : 0x1 - GDDR Density : 2048 Mb - GDDR Size : 7936 MB - GDDR Technology : GDDR5 - GDDR Speed : 5.000000 GT/s - GDDR Frequency : 2500000 kHz - GDDR Voltage : 1501000 uV +The output of the "micinfo" utility executed on one of the cluster node is as follows. 
(note: to get PCIe related details the command has to be run with root privileges) + +```console +MicInfo Utility Log +Created Mon Aug 17 13:55:59 2015 + + + System Info + HOST OS : Linux + OS Version : 2.6.32-504.16.2.el6.x86_64 + Driver Version : 3.4.1-1 + MPSS Version : 3.4.1 + Host Physical Memory : 131930 MB + +Device No: 0, Device Name: mic0 + + Version + Flash Version : 2.1.02.0390 + SMC Firmware Version : 1.16.5078 + SMC Boot Loader Version : 1.8.4326 + uOS Version : 2.6.38.8+mpss3.4.1 + Device Serial Number : ADKC44601414 + + Board + Vendor ID : 0x8086 + Device ID : 0x225c + Subsystem ID : 0x7d95 + Coprocessor Stepping ID : 2 + PCIe Width : x16 + PCIe Speed : 5 GT/s + PCIe Max payload size : 256 bytes + PCIe Max read req size : 512 bytes + Coprocessor Model : 0x01 + Coprocessor Model Ext : 0x00 + Coprocessor Type : 0x00 + Coprocessor Family : 0x0b + Coprocessor Family Ext : 0x00 + Coprocessor Stepping : C0 + Board SKU : C0PRQ-7120 P/A/X/D + ECC Mode : Enabled + SMC HW Revision : Product 300W Passive CS + + Cores + Total No of Active Cores : 61 + Voltage : 1007000 uV + Frequency : 1238095 kHz + + Thermal + Fan Speed Control : N/A + Fan RPM : N/A + Fan PWM : N/A + Die Temp : 60 C + + GDDR + GDDR Vendor : Samsung + GDDR Version : 0x6 + GDDR Density : 4096 Mb + GDDR Size : 15872 MB + GDDR Technology : GDDR5 + GDDR Speed : 5.500000 GT/s + GDDR Frequency : 2750000 kHz + GDDR Voltage : 1501000 uV + +Device No: 1, Device Name: mic1 + + Version + Flash Version : 2.1.02.0390 + SMC Firmware Version : 1.16.5078 + SMC Boot Loader Version : 1.8.4326 + uOS Version : 2.6.38.8+mpss3.4.1 + Device Serial Number : ADKC44500454 + + Board + Vendor ID : 0x8086 + Device ID : 0x225c + Subsystem ID : 0x7d95 + Coprocessor Stepping ID : 2 + PCIe Width : x16 + PCIe Speed : 5 GT/s + PCIe Max payload size : 256 bytes + PCIe Max read req size : 512 bytes + Coprocessor Model : 0x01 + Coprocessor Model Ext : 0x00 + Coprocessor Type : 0x00 + Coprocessor Family : 0x0b + Coprocessor Family Ext : 0x00 + Coprocessor Stepping : C0 + Board SKU : C0PRQ-7120 P/A/X/D + ECC Mode : Enabled + SMC HW Revision : Product 300W Passive CS + + Cores + Total No of Active Cores : 61 + Voltage : 998000 uV + Frequency : 1238095 kHz + + Thermal + Fan Speed Control : N/A + Fan RPM : N/A + Fan PWM : N/A + Die Temp : 59 C + + GDDR + GDDR Vendor : Samsung + GDDR Version : 0x6 + GDDR Density : 4096 Mb + GDDR Size : 15872 MB + GDDR Technology : GDDR5 + GDDR Speed : 5.500000 GT/s + GDDR Frequency : 2750000 kHz + GDDR Voltage : 1501000 uV ``` ## Offload Mode To compile a code for Intel Xeon Phi a MPSS stack has to be installed on the machine where compilation is executed. Currently the MPSS stack is only installed on compute nodes equipped with accelerators. -```bash - $ qsub -I -q qmic -A NONE-0-0 - $ module load intel/13.5.192 +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +$ ml intel ``` For debugging purposes it is also recommended to set environment variable "OFFLOAD_REPORT". Value can be set from 0 to 3, where higher number means more debugging information. -```bash +```console export OFFLOAD_REPORT=3 ``` -A very basic example of code that employs offload programming technique is shown in the next listing. - -!!! note - This code is sequential and utilizes only single core of the accelerator. +A very basic example of code that employs offload programming technique is shown in the next listing. 
Please note that this code is sequential and utilizes only single core of the accelerator. -```bash - $ vim source-offload.cpp +```console +$ cat source-offload.cpp - #include <iostream> +#include <iostream> - int main(int argc, char* argv[]) - { - const int niter = 100000; - double result = 0; +int main(int argc, char* argv[]) +{ + const int niter = 100000; + double result = 0; - #pragma offload target(mic) - for (int i = 0; i < niter; ++i) { - const double t = (i + 0.5) / niter; - result += 4.0 / (t * t + 1.0); - } - result /= niter; - std::cout << "Pi ~ " << result << 'n'; + #pragma offload target(mic) + for (int i = 0; i < niter; ++i) { + const double t = (i + 0.5) / niter; + result += 4.0 / (t * t + 1.0); } + result /= niter; + std::cout << "Pi ~ " << result << '\n'; +} ``` To compile a code using Intel compiler run -```bash - $ icc source-offload.cpp -o bin-offload +```console +$ icc source-offload.cpp -o bin-offload ``` To execute the code, run the following command on the host -```bash - ./bin-offload +```console +$ ./bin-offload ``` ### Parallelization in Offload Mode Using OpenMP One way of paralelization a code for Xeon Phi is using OpenMP directives. The following example shows code for parallel vector addition. -```bash - $ vim ./vect-add +```console +$ cat ./vect-add #include <stdio.h> @@ -224,10 +270,9 @@ One way of paralelization a code for Xeon Phi is using OpenMP directives. The fo During the compilation Intel compiler shows which loops have been vectorized in both host and accelerator. This can be enabled with compiler option "-vec-report2". To compile and execute the code run -```bash - $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add - - $ ./vect-add +```console +$ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add +$ ./vect-add ``` Some interesting compiler flags useful not only for code debugging are: @@ -244,7 +289,8 @@ Some interesting compiler flags useful not only for code debugging are: Intel MKL includes an Automatic Offload (AO) feature that enables computationally intensive MKL functions called in user code to benefit from attached Intel Xeon Phi coprocessors automatically and transparently. -Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. Complete list of controls is listed [here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). +!!! note + Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. Complete list of controls is listed [here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). The Automatic Offload may be enabled by either an MKL function call within the code: @@ -254,7 +300,7 @@ The Automatic Offload may be enabled by either an MKL function call within the c or by setting environment variable -```bash +```console $ export MKL_MIC_ENABLE=1 ``` @@ -264,68 +310,68 @@ To get more information about automatic offload please refer to "[Using Intel® At first get an interactive PBS session on a node with MIC accelerator and load "intel" module that automatically loads "mkl" module as well. 
-```bash - $ qsub -I -q qmic -A OPEN-0-0 -l select=1:ncpus=16 - $ module load intel +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +$ ml intel ``` -Following example show how to automatically offload an SGEMM (single precision - general matrix multiply) function to MIC coprocessor. The code can be copied to a file and compiled without any necessary modification. - -```bash - $ vim sgemm-ao-short.c +The code can be copied to a file and compiled without any necessary modification. - #include <stdio.h> - #include <stdlib.h> - #include <malloc.h> - #include <stdint.h> +```console +$ vim sgemm-ao-short.c - #include "mkl.h" +#include <stdio.h> +#include <stdlib.h> +#include <malloc.h> +#include <stdint.h> - int main(int argc, char **argv) - { - float *A, *B, *C; /* Matrices */ +#include "mkl.h" - MKL_INT N = 2560; /* Matrix dimensions */ - MKL_INT LD = N; /* Leading dimension */ - int matrix_bytes; /* Matrix size in bytes */ - int matrix_elements; /* Matrix size in elements */ +int main(int argc, char **argv) +{ + float *A, *B, *C; /* Matrices */ - float alpha = 1.0, beta = 1.0; /* Scaling factors */ - char transa = 'N', transb = 'N'; /* Transposition options */ + MKL_INT N = 2560; /* Matrix dimensions */ + MKL_INT LD = N; /* Leading dimension */ + int matrix_bytes; /* Matrix size in bytes */ + int matrix_elements; /* Matrix size in elements */ - int i, j; /* Counters */ + float alpha = 1.0, beta = 1.0; /* Scaling factors */ + char transa = 'N', transb = 'N'; /* Transposition options */ - matrix_elements = N * N; - matrix_bytes = sizeof(float) * matrix_elements; + int i, j; /* Counters */ - /* Allocate the matrices */ - A = malloc(matrix_bytes); B = malloc(matrix_bytes); C = malloc(matrix_bytes); + matrix_elements = N * N; + matrix_bytes = sizeof(float) * matrix_elements; - /* Initialize the matrices */ - for (i = 0; i < matrix_elements; i++) { - A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; - } + /* Allocate the matrices */ + A = malloc(matrix_bytes); B = malloc(matrix_bytes); C = malloc(matrix_bytes); - printf("Computing SGEMM on the hostn"); - sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); + /* Initialize the matrices */ + for (i = 0; i < matrix_elements; i++) { + A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; + } - printf("Enabling Automatic Offloadn"); - /* Alternatively, set environment variable MKL_MIC_ENABLE=1 */ - mkl_mic_enable(); + printf("Computing SGEMM on the host\n"); + sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); - int ndevices = mkl_mic_get_device_count(); /* Number of MIC devices */ - printf("Automatic Offload enabled: %d MIC devices presentn", ndevices); + printf("Enabling Automatic Offload\n"); + /* Alternatively, set environment variable MKL_MIC_ENABLE=1 */ + mkl_mic_enable(); + + int ndevices = mkl_mic_get_device_count(); /* Number of MIC devices */ + printf("Automatic Offload enabled: %d MIC devices present\n", ndevices); - printf("Computing SGEMM with automatic workdivisionn"); - sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); + printf("Computing SGEMM with automatic workdivision\n"); + sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); - /* Free the matrix memory */ - free(A); free(B); free(C); + /* Free the matrix memory */ + free(A); free(B); free(C); - printf("Donen"); + printf("Done\n"); - return 0; - } + return 0; +} ``` !!! 
note @@ -333,31 +379,74 @@ Following example show how to automatically offload an SGEMM (single precision - To compile a code using Intel compiler use: -```bash - $ icc -mkl sgemm-ao-short.c -o sgemm +```console +$ icc -mkl sgemm-ao-short.c -o sgemm ``` For debugging purposes enable the offload report to see more information about automatic offloading. -```bash - $ export OFFLOAD_REPORT=2 +```console +$ export OFFLOAD_REPORT=2 ``` The output of a code should look similar to following listing, where lines starting with [MKL] are generated by offload reporting: -```bash - Computing SGEMM on the host - Enabling Automatic Offload - Automatic Offload enabled: 1 MIC devices present - Computing SGEMM with automatic workdivision - [MKL] [MIC --] [AO Function] SGEMM - [MKL] [MIC --] [AO SGEMM Workdivision] 0.00 1.00 - [MKL] [MIC 00] [AO SGEMM CPU Time] 0.463351 seconds - [MKL] [MIC 00] [AO SGEMM MIC Time] 0.179608 seconds - [MKL] [MIC 00] [AO SGEMM CPU->MIC Data] 52428800 bytes - [MKL] [MIC 00] [AO SGEMM MIC->CPU Data] 26214400 bytes - Done -``` +```console +[user@r31u03n799 ~]$ ./sgemm +Computing SGEMM on the host +Enabling Automatic Offload +Automatic Offload enabled: 2 MIC devices present +Computing SGEMM with automatic workdivision +[MKL] [MIC --] [AO Function] SGEMM +[MKL] [MIC --] [AO SGEMM Workdivision] 0.44 0.28 0.28 +[MKL] [MIC 00] [AO SGEMM CPU Time] 0.252427 seconds +[MKL] [MIC 00] [AO SGEMM MIC Time] 0.091001 seconds +[MKL] [MIC 00] [AO SGEMM CPU->MIC Data] 34078720 bytes +[MKL] [MIC 00] [AO SGEMM MIC->CPU Data] 7864320 bytes +[MKL] [MIC 01] [AO SGEMM CPU Time] 0.252427 seconds +[MKL] [MIC 01] [AO SGEMM MIC Time] 0.094758 seconds +[MKL] [MIC 01] [AO SGEMM CPU->MIC Data] 34078720 bytes +[MKL] [MIC 01] [AO SGEMM MIC->CPU Data] 7864320 bytes +Done +``` + +!!! note "" + Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. Complete list of controls is listed [here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). + +### Automatic offload example #2 + +In this example, we will demonstrate automatic offload control via an environment vatiable MKL_MIC_ENABLE. The function DGEMM will be offloaded. + +At first get an interactive PBS session on a node with MIC accelerator. + +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +``` + +Once in, we enable the offload and run the Octave software. In octave, we generate two large random matrices and let them multiply together. + +```console +$ export MKL_MIC_ENABLE=1 +$ export OFFLOAD_REPORT=2 +$ ml Octave/3.8.2-intel-2015b +$ octave -q +octave:1> A=rand(10000); +octave:2> B=rand(10000); +octave:3> C=A*B; +[MKL] [MIC --] [AO Function] DGEMM +[MKL] [MIC --] [AO DGEMM Workdivision] 0.14 0.43 0.43 +[MKL] [MIC 00] [AO DGEMM CPU Time] 3.814714 seconds +[MKL] [MIC 00] [AO DGEMM MIC Time] 2.781595 seconds +[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 1145600000 bytes +[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 1382400000 bytes +[MKL] [MIC 01] [AO DGEMM CPU Time] 3.814714 seconds +[MKL] [MIC 01] [AO DGEMM MIC Time] 2.843016 seconds +[MKL] [MIC 01] [AO DGEMM CPU->MIC Data] 1145600000 bytes +[MKL] [MIC 01] [AO DGEMM MIC->CPU Data] 1382400000 bytes +octave:4> exit +``` + +On the example above we observe, that the DGEMM function workload was split over CPU, MIC 0 and MIC 1, in the ratio 0.14 0.43 0.43. 
The matrix multiplication was done on the CPU, accelerated by two Xeon Phi accelerators. ## Native Mode @@ -365,10 +454,9 @@ In the native mode a program is executed directly on Intel Xeon Phi without invo To compile a code user has to be connected to a compute with MIC and load Intel compilers module. To get an interactive session on a compute node with an Intel Xeon Phi and load the module use following commands: -```bash - $ qsub -I -q qmic -A NONE-0-0 - - $ module load intel/13.5.192 +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +$ ml intel ``` !!! note @@ -376,105 +464,108 @@ To compile a code user has to be connected to a compute with MIC and load Intel To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: -```bash - $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host +```console +$ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host ``` To run this code on host, use: -```bash - $ ./vect-add-host +```console +$ ./vect-add-host ``` The second example shows how to compile the same code for Intel Xeon Phi: -```bash - $ icc -mmic -fopenmp vect-add.c -o vect-add-mic +```console +$ icc -mmic -fopenmp vect-add.c -o vect-add-mic ``` ### Execution of the Program in Native Mode on Intel Xeon Phi The user access to the Intel Xeon Phi is through the SSH. Since user home directories are mounted using NFS on the accelerator, users do not have to copy binary files or libraries between the host and accelerator. +Get the PATH of MIC enabled libraries for currently used Intel Compiler (here was icc/2015.3.187-GNU-5.1.0-2.25 used): + +```console +$ echo $MIC_LD_LIBRARY_PATH +/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic +``` + To connect to the accelerator run: -```bash - $ ssh mic0 +```console +$ ssh mic0 ``` If the code is sequential, it can be executed directly: -```bash - mic0 $ ~/path_to_binary/vect-add-seq-mic +```console +mic0 $ ~/path_to_binary/vect-add-seq-mic ``` If the code is parallelized using OpenMP a set of additional libraries is required for execution. To locate these libraries new path has to be added to the LD_LIBRARY_PATH environment variable prior to the execution: -```bash - mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH +```console +mic0 $ export LD_LIBRARY_PATH=/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH ``` !!! note - The path exported contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. + Please note that the path exported in the previous example contains path to a specific compiler (here the version is 2015.3.187-GNU-5.1.0-2.25). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: !!! 
note - /apps/intel/composer_xe_2013.5.192/compiler/lib/mic + /apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic - - libiomp5.so - - libimf.so - - libsvml.so - - libirng.so - - libintlc.so.5 + libiomp5.so + libimf.so + libsvml.so + libirng.so + libintlc.so.5 Finally, to run the compiled code use: -```bash - $ ~/path_to_binary/vect-add-mic -``` - ## OpenCL OpenCL (Open Computing Language) is an open standard for general-purpose parallel programming for diverse mix of multi-core CPUs, GPU coprocessors, and other parallel processors. OpenCL provides a flexible execution model and uniform programming environment for software developers to write portable code for systems running on both the CPU and graphics processors or accelerators like the Intel® Xeon Phi. -On Anselm OpenCL is installed only on compute nodes with MIC accelerator, therefore OpenCL code can be compiled only on these nodes. +On Salomon OpenCL is installed only on compute nodes with MIC accelerator, therefore OpenCL code can be compiled only on these nodes. -```bash - module load opencl-sdk opencl-rt +```console +module load opencl-sdk opencl-rt ``` Always load "opencl-sdk" (providing devel files like headers) and "opencl-rt" (providing dynamic library libOpenCL.so) modules to compile and link OpenCL code. Load "opencl-rt" for running your compiled code. There are two basic examples of OpenCL code in the following directory: -```bash - /apps/intel/opencl-examples/ +```console +/apps/intel/opencl-examples/ ``` First example "CapsBasic" detects OpenCL compatible hardware, here CPU and MIC, and prints basic information about the capabilities of it. -```bash - /apps/intel/opencl-examples/CapsBasic/capsbasic +```console +/apps/intel/opencl-examples/CapsBasic/capsbasic ``` -To compile and run the example copy it to your home directory, get a PBS interactive session on of the nodes with MIC and run make for compilation. Make files are very basic and shows how the OpenCL code can be compiled on Anselm. +To compile and run the example copy it to your home directory, get a PBS interactive session on of the nodes with MIC and run make for compilation. Make files are very basic and shows how the OpenCL code can be compiled on Salomon. -```bash - $ cp /apps/intel/opencl-examples/CapsBasic/* . - $ qsub -I -q qmic -A NONE-0-0 - $ make +```console +$ cp /apps/intel/opencl-examples/CapsBasic/* . +$ qsub -I -q qmic -A NONE-0-0 +$ make ``` The compilation command for this example is: -```bash - $ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ +```console +$ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ ``` After executing the complied binary file, following output should be displayed. -```bash +```console ./capsbasic Number of available platforms: 1 @@ -505,22 +596,22 @@ After executing the complied binary file, following output should be displayed. The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. -```bash - $ cp -r /apps/intel/opencl-examples/* . - $ qsub -I -q qmic -A NONE-0-0 - $ cd GEMM - $ make +```console +$ cp -r /apps/intel/opencl-examples/* . 
+$ qsub -I -q qmic -A NONE-0-0 +$ cd GEMM +$ make ``` The compilation command for this example is: -```bash - $ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ +```console +$ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ ``` To see the performance of Intel Xeon Phi performing the DGEMM run the example as follows: -```bash +```console ./gemm -d 1 Platforms (1): [0] Intel(R) OpenCL [Selected] @@ -547,28 +638,48 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as ### Environment Setup and Compilation +To achieve best MPI performance always use following setup for Intel MPI on Xeon Phi accelerated nodes: + +```console +$ export I_MPI_FABRICS=shm:dapl +$ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1 +``` + +This ensures, that MPI inside node will use SHMEM communication, between HOST and Phi the IB SCIF will be used and between different nodes or Phi's on diferent nodes a CCL-Direct proxy will be used. + +!!! note + Other FABRICS like tcp,ofa may be used (even combined with shm) but there's severe loss of performance (by order of magnitude). + Usage of single DAPL PROVIDER (e. g. I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u) will cause failure of Host<->Phi and/or Phi<->Phi communication. + Usage of the I_MPI_DAPL_PROVIDER_LIST on non-accelerated node will cause failure of any MPI communication, since those nodes don't have SCIF device and there's no CCL-Direct proxy runnig. + Again an MPI code for Intel Xeon Phi has to be compiled on a compute node with accelerator and MPSS software stack installed. To get to a compute node with accelerator use: -```bash - $ qsub -I -q qmic -A NONE-0-0 +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 ``` The only supported implementation of MPI standard for Intel Xeon Phi is Intel MPI. To setup a fully functional development environment a combination of Intel compiler and Intel MPI has to be used. 
On a host load following modules before compilation: -```bash - $ module load intel/13.5.192 impi/4.1.1.036 +```console +$ module load intel ``` To compile an MPI code for host use: -```bash - $ mpiicc -xhost -o mpi-test mpi-test.c +```console +$ mpiicc -xhost -o mpi-test mpi-test.c ``` To compile the same code for Intel Xeon Phi architecture use: -```bash - $ mpiicc -mmic -o mpi-test-mic mpi-test.c +```console +$ mpiicc -mmic -o mpi-test-mic mpi-test.c +``` + +Or, if you are using Fortran : + +```console +$ mpiifort -mmic -o mpi-test-mic mpi-test.f90 ``` An example of basic MPI version of "hello-world" example in C language, that can be executed on both host and Xeon Phi is (can be directly copy and pasted to a .c file) @@ -613,17 +724,17 @@ Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: In this case all environment variables are set by modules, so to execute the compiled MPI program on a single node, use: -```bash - $ mpirun -np 4 ./mpi-test +```console +$ mpirun -np 4 ./mpi-test ``` The output should be similar to: -```bash - Hello world from process 1 of 4 on host cn207 - Hello world from process 3 of 4 on host cn207 - Hello world from process 2 of 4 on host cn207 - Hello world from process 0 of 4 on host cn207 +```console +Hello world from process 1 of 4 on host r38u31n1000 +Hello world from process 3 of 4 on host r38u31n1000 +Hello world from process 2 of 4 on host r38u31n1000 +Hello world from process 0 of 4 on host r38u31n1000 ``` ### Coprocessor-Only Model @@ -635,18 +746,25 @@ coprocessor; or 2.) lunch the task using "**mpiexec.hydra**" from a host. Similarly to execution of OpenMP programs in native mode, since the environmental module are not supported on MIC, user has to setup paths to Intel MPI libraries and binaries manually. One time setup can be done by creating a "**.profile**" file in user's home directory. This file sets up the environment on the MIC automatically once user access to the accelerator through the SSH. +At first get the LD_LIBRARY_PATH for currenty used Intel Compiler and Intel MPI: + +```console +$ echo $MIC_LD_LIBRARY_PATH +/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic/ +``` + +Use it in your ~/.profile: + ```bash - $ vim ~/.profile +PS1='[\u@\h \W]\$ ' +export PATH=/usr/bin:/usr/sbin:/bin:/sbin - PS1='[u@h W]$ ' - export PATH=/usr/bin:/usr/sbin:/bin:/sbin +#IMPI +export PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25/mic/bin/:$PATH - #OpenMP - export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH +#OpenMP (ICC, IFORT), IMKL and IMPI +export LD_LIBRARY_PATH=/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH - #Intel MPI - export LD_LIBRARY_PATH=/apps/intel/impi/4.1.1.036/mic/lib/:$LD_LIBRARY_PATH - export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH ``` !!! 
note @@ -655,29 +773,29 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta To access a MIC accelerator located on a node that user is currently connected to, use: -```bash - $ ssh mic0 +```console +$ ssh mic0 ``` or in case you need specify a MIC accelerator on a particular node, use: -```bash - $ ssh cn207-mic0 +```console +$ ssh r38u31n1000-mic0 ``` To run the MPI code in parallel on multiple core of the accelerator, use: -```bash - $ mpirun -np 4 ./mpi-test-mic +```console +$ mpirun -np 4 ./mpi-test-mic ``` The output should be similar to: -```bash - Hello world from process 1 of 4 on host cn207-mic0 - Hello world from process 2 of 4 on host cn207-mic0 - Hello world from process 3 of 4 on host cn207-mic0 - Hello world from process 0 of 4 on host cn207-mic0 +```console +Hello world from process 1 of 4 on host r38u31n1000-mic0 +Hello world from process 2 of 4 on host r38u31n1000-mic0 +Hello world from process 3 of 4 on host r38u31n1000-mic0 +Hello world from process 0 of 4 on host r38u31n1000-mic0 ``` #### Execution on Host @@ -686,20 +804,20 @@ If the MPI program is launched from host instead of the coprocessor, the environ First step is to tell mpiexec that the MPI should be executed on a local accelerator by setting up the environmental variable "I_MPI_MIC" -```bash - $ export I_MPI_MIC=1 +```console +$ export I_MPI_MIC=1 ``` Now the MPI program can be executed as: -```bash - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic +```console +$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic ``` or using mpirun ```bash - $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic +$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic ``` !!! note @@ -708,11 +826,11 @@ or using mpirun The output should be again similar to: -```bash - Hello world from process 1 of 4 on host cn207-mic0 - Hello world from process 2 of 4 on host cn207-mic0 - Hello world from process 3 of 4 on host cn207-mic0 - Hello world from process 0 of 4 on host cn207-mic0 +```console +Hello world from process 1 of 4 on host r38u31n1000-mic0 +Hello world from process 2 of 4 on host r38u31n1000-mic0 +Hello world from process 3 of 4 on host r38u31n1000-mic0 +Hello world from process 0 of 4 on host r38u31n1000-mic0 ``` !!! hint @@ -720,166 +838,151 @@ The output should be again similar to: A simple test to see if the file is present is to execute: -```bash - $ ssh mic0 ls /bin/pmi_proxy - /bin/pmi_proxy +```console +$ ssh mic0 ls /bin/pmi_proxy + /bin/pmi_proxy ``` #### Execution on Host - MPI Processes Distributed Over Multiple Accelerators on Multiple Nodes To get access to multiple nodes with MIC accelerator, user has to use PBS to allocate the resources. To start interactive session, that allocates 2 compute nodes = 2 MIC accelerators run qsub command with following parameters: -```bash - $ qsub -I -q qmic -A NONE-0-0 -l select=2:ncpus=16 - - $ module load intel/13.5.192 impi/4.1.1.036 +```console +$ qsub -I -q qprod -l select=2:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +$ module load intel impi ``` This command connects user through ssh to one of the nodes immediately. 
To see the other nodes that have been allocated use:

-```bash
- $ cat $PBS_NODEFILE
+```console
+$ cat $PBS_NODEFILE
```

For example:

-```bash
- cn204.bullx
- cn205.bullx
+```console
+r25u25n710.ib0.smc.salomon.it4i.cz
+r25u26n711.ib0.smc.salomon.it4i.cz
```

-This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators.
+This output means that the PBS allocated nodes r25u25n710 and r25u26n711, so the user has direct access to the "**r25u25n710-mic0**" and "**r25u26n711-mic0**" accelerators.

!!! note
    At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh:

-    - to connect to the second node : `$ ssh cn205`
-    - to connect to the accelerator on the first node from the first node: `$ ssh cn204-mic0` or `$ ssh mic0`
-    - to connect to the accelerator on the second node from the first node: `$ ssh cn205-mic0`
+    - to connect to the second node : `$ ssh r25u26n711`
+    - to connect to the accelerator on the first node from the first node: `$ ssh r25u25n710-mic0` or `$ ssh mic0`
+    - to connect to the accelerator on the second node from the first node: `$ ssh r25u26n711-mic0`

-At this point we expect that correct modules are loaded and binary is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC"
+At this point we expect that correct modules are loaded and binary is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC"; do not forget to have the correct FABRIC and PROVIDER defined.

-```bash
- $ export I_MPI_MIC=1
+```console
+$ export I_MPI_MIC=1
+$ export I_MPI_FABRICS=shm:dapl
+$ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1
```

The launch the MPI program use:

-```bash
- $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -host cn204-mic0 -n 4 ~/mpi-test-mic
- : -host cn205-mic0 -n 6 ~/mpi-test-mic
+```console
+$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -host r25u25n710-mic0 -n 4 ~/mpi-test-mic \
+: -host r25u26n711-mic0 -n 6 ~/mpi-test-mic
```

or using mpirun:

-```bash
- $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -host cn204-mic0 -n 4 ~/mpi-test-mic
- : -host cn205-mic0 -n 6 ~/mpi-test-mic
+```console
+$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -host r25u25n710-mic0 -n 4 ~/mpi-test-mic \
+: -host r25u26n711-mic0 -n 6 ~/mpi-test-mic
```

In this case four MPI processes are executed on accelerator cn204-mic and six processes are executed on accelerator cn205-mic0. 
The sample output (sorted after execution) is:

-```bash
- Hello world from process 0 of 10 on host cn204-mic0
- Hello world from process 1 of 10 on host cn204-mic0
- Hello world from process 2 of 10 on host cn204-mic0
- Hello world from process 3 of 10 on host cn204-mic0
- Hello world from process 4 of 10 on host cn205-mic0
- Hello world from process 5 of 10 on host cn205-mic0
- Hello world from process 6 of 10 on host cn205-mic0
- Hello world from process 7 of 10 on host cn205-mic0
- Hello world from process 8 of 10 on host cn205-mic0
- Hello world from process 9 of 10 on host cn205-mic0
+```console
+Hello world from process 0 of 10 on host r25u25n710-mic0
+Hello world from process 1 of 10 on host r25u25n710-mic0
+Hello world from process 2 of 10 on host r25u25n710-mic0
+Hello world from process 3 of 10 on host r25u25n710-mic0
+Hello world from process 4 of 10 on host r25u26n711-mic0
+Hello world from process 5 of 10 on host r25u26n711-mic0
+Hello world from process 6 of 10 on host r25u26n711-mic0
+Hello world from process 7 of 10 on host r25u26n711-mic0
+Hello world from process 8 of 10 on host r25u26n711-mic0
+Hello world from process 9 of 10 on host r25u26n711-mic0
```

In the same way, the MPI program can be executed on multiple hosts:

-```bash
- $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -host cn204 -n 4 ~/mpi-test
- : -host cn205 -n 6 ~/mpi-test
+```console
+$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -host r25u25n710 -n 4 ~/mpi-test \
+: -host r25u26n711 -n 6 ~/mpi-test
```

-\###Symmetric model
+### Symmetric Model

In the symmetric mode, MPI programs are executed on both the host computer(s) and the MIC accelerator(s). Since the MIC has a different architecture and requires a different binary produced by the Intel compiler, two different files have to be compiled before the MPI program is executed.

In the previous section we have compiled two binary files, one for the hosts ("**mpi-test**") and one for the MIC accelerators ("**mpi-test-mic**"). These two binaries can be executed at once using mpiexec.hydra:

-```bash
- $ mpiexec.hydra
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -host cn205 -n 2 ~/mpi-test
- : -host cn205-mic0 -n 2 ~/mpi-test-mic
+```console
+$ mpirun \
+ -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -host r38u32n1001 -n 2 ~/mpi-test \
+: -host r38u32n1001-mic0 -n 2 ~/mpi-test-mic
```

-In this example the first two parameters (line 2 and 3) sets up required environment variables for execution. The third line specifies binary that is executed on host (here cn205) and the last line specifies the binary that is execute on the accelerator (here cn205-mic0).
+In this example the second line sets up the required environment variable for execution, the third line specifies the binary that is executed on the host (here r38u32n1001), and the last line specifies the binary that is executed on the accelerator (here r38u32n1001-mic0).
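For completeness, the two binaries referenced above are typically built from a single source file, once for the host and once with the `-mmic` flag for a native coprocessor build. This is only a sketch assuming the Intel MPI wrapper `mpiicc` and a source file `mpi-test.c`; the exact commands used earlier in this chapter may differ:

```console
$ mpiicc -o mpi-test mpi-test.c            # host (Xeon) binary
$ mpiicc -mmic -o mpi-test-mic mpi-test.c  # native coprocessor (MIC) binary
```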
The output of the program is:

-```bash
- Hello world from process 0 of 4 on host cn205
- Hello world from process 1 of 4 on host cn205
- Hello world from process 2 of 4 on host cn205-mic0
- Hello world from process 3 of 4 on host cn205-mic0
+```console
+Hello world from process 0 of 4 on host r38u32n1001
+Hello world from process 1 of 4 on host r38u32n1001
+Hello world from process 2 of 4 on host r38u32n1001-mic0
+Hello world from process 3 of 4 on host r38u32n1001-mic0
```

The execution procedure can be simplified by using the mpirun command with a machine file as a parameter. The machine file contains a list of all nodes and accelerators that should be used to execute MPI processes.

-An example of a machine file that uses 2 >hosts (**cn205** and **cn206**) and 2 accelerators **(cn205-mic0** and **cn206-mic0**) to run 2 MPI processes on each of them:
+An example of a machine file that uses 2 hosts (**r38u32n1001** and **r38u33n1002**) and 2 accelerators (**r38u32n1001-mic0** and **r38u33n1002-mic0**) to run 2 MPI processes on each of them:

-```bash
- $ cat hosts_file_mix
- cn205:2
- cn205-mic0:2
- cn206:2
- cn206-mic0:2
+```console
+$ cat hosts_file_mix
+r38u32n1001:2
+r38u32n1001-mic0:2
+r38u33n1002:2
+r38u33n1002-mic0:2
```

In addition, if a naming convention is set up so that the name of the binary for the host is **"bin_name"** and the name of the binary for the accelerator is **"bin_name-mic"**, then by setting the environment variable **I_MPI_MIC_POSTFIX** to **"-mic"** the user does not have to specify the names of both binaries. In this case mpirun needs just the name of the host binary file (i.e. "mpi-test") and uses the suffix to derive the name of the binary for the accelerator (i.e. "mpi-test-mic").

-```bash
- $ export I_MPI_MIC_POSTFIX=-mic
+```console
+$ export I_MPI_MIC_POSTFIX=-mic
```

To run the MPI code using mpirun and the machine file "hosts_file_mix" use:

-```bash
- $ mpirun
- -genv I_MPI_FABRICS shm:tcp
- -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -machinefile hosts_file_mix
- ~/mpi-test
+```console
+$ mpirun \
+ -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -machinefile hosts_file_mix \
+ ~/mpi-test
```

A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is:

-```bash
- Hello world from process 0 of 8 on host cn204
- Hello world from process 1 of 8 on host cn204
- Hello world from process 2 of 8 on host cn204-mic0
- Hello world from process 3 of 8 on host cn204-mic0
- Hello world from process 4 of 8 on host cn205
- Hello world from process 5 of 8 on host cn205
- Hello world from process 6 of 8 on host cn205-mic0
- Hello world from process 7 of 8 on host cn205-mic0
+```console
+Hello world from process 0 of 8 on host r38u32n1001
+Hello world from process 1 of 8 on host r38u32n1001
+Hello world from process 2 of 8 on host r38u32n1001-mic0
+Hello world from process 3 of 8 on host r38u32n1001-mic0
+Hello world from process 4 of 8 on host r38u33n1002
+Hello world from process 5 of 8 on host r38u33n1002
+Hello world from process 6 of 8 on host r38u33n1002-mic0
+Hello world from process 7 of 8 on host r38u33n1002-mic0
```

!!!
note diff --git a/docs.it4i/salomon/software/java.md b/docs.it4i/salomon/software/java.md index 703e53fc1093cf28aeb5c80b985174784e54ad90..83c3738c0802e612ba84c25868771c44fa51a1ab 100644 --- a/docs.it4i/salomon/software/java.md +++ b/docs.it4i/salomon/software/java.md @@ -2,24 +2,24 @@ Java is available on the cluster. Activate java by loading the Java module -```bash - $ module load Java +```console +$ ml Java ``` Note that the Java module must be loaded on the compute nodes as well, in order to run java on compute nodes. Check for java version and path -```bash - $ java -version - $ which java +```console +$ java -version +$ which java ``` With the module loaded, not only the runtime environment (JRE), but also the development environment (JDK) with the compiler is available. -```bash - $ javac -version - $ which javac +```console +$ javac -version +$ which javac ``` Java applications may use MPI for inter-process communication, in conjunction with Open MPI. Read more on <http://www.open-mpi.org/faq/?category=java>. This functionality is currently not supported on Anselm cluster. In case you require the java interface to MPI, please contact [cluster support](https://support.it4i.cz/rt/). diff --git a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md index 9aa54f09aa07ccde2daa1bfc5c6ff4daeab2b78b..e2633236ac6624c7a41ed56496bacb9795158901 100644 --- a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md +++ b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md @@ -10,16 +10,14 @@ Use the mpiexec to run the OpenMPI code. Example: -```bash - $ qsub -q qexp -l select=4:ncpus=24 -I +```console +$ qsub -q qexp -l select=4:ncpus=24 -I qsub: waiting for job 15210.isrv5 to start qsub: job 15210.isrv5 ready - - $ pwd +$ pwd /home/username - - $ module load OpenMPI - $ mpiexec -pernode ./helloworld_mpi.x +$ ml OpenMPI +$ mpiexec -pernode ./helloworld_mpi.x Hello world! from rank 0 of 4 on host r1i0n17 Hello world! from rank 1 of 4 on host r1i0n5 Hello world! from rank 2 of 4 on host r1i0n6 @@ -33,11 +31,10 @@ Note that the executable helloworld_mpi.x must be available within the same path You need to preload the executable, if running on the local ramdisk /tmp filesystem -```bash - $ pwd +```console +$ pwd /tmp/pbs.15210.isrv5 - - $ mpiexec -pernode --preload-binary ./helloworld_mpi.x +$ mpiexec -pernode --preload-binary ./helloworld_mpi.x Hello world! from rank 0 of 4 on host r1i0n17 Hello world! from rank 1 of 4 on host r1i0n5 Hello world! from rank 2 of 4 on host r1i0n6 @@ -54,12 +51,10 @@ The mpiprocs and ompthreads parameters allow for selection of number of running Follow this example to run one MPI process per node, 24 threads per process. -```bash - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=1:ompthreads=24 -I - - $ module load OpenMPI - - $ mpiexec --bind-to-none ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=1:ompthreads=24 -I +$ ml OpenMPI +$ mpiexec --bind-to-none ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 24 threads per socket, on 4 nodes. @@ -68,12 +63,10 @@ In this example, we demonstrate recommended way to run an MPI application, using Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpiexec. 
-```bash - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=2:ompthreads=12 -I - - $ module load OpenMPI - - $ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=2:ompthreads=12 -I +$ ml OpenMPI +$ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 12 threads per socket, each process and its threads bound to a separate processor socket of the node, on 4 nodes @@ -82,12 +75,10 @@ In this example, we demonstrate recommended way to run an MPI application, using Follow this example to run 24 MPI processes per node, 1 thread per process. Note the options to mpiexec. -```bash - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I - - $ module load OpenMPI - - $ mpiexec -bycore -bind-to-core ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I +$ ml OpenMPI +$ mpiexec -bycore -bind-to-core ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 24 MPI processes per node, single threaded. Each process is bound to separate processor core, on 4 nodes. @@ -99,21 +90,21 @@ In this example, we demonstrate recommended way to run an MPI application, using In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: -```bash - $ export GOMP_CPU_AFFINITY="0-23" +```console +$ export GOMP_CPU_AFFINITY="0-23" ``` or this one for Intel OpenMP: -```bash - $ export KMP_AFFINITY=granularity=fine,compact,1,0 +```console +$ export KMP_AFFINITY=granularity=fine,compact,1,0 ``` As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: -```bash - $ export OMP_PROC_BIND=true - $ export OMP_PLACES=cores +```console +$ export OMP_PROC_BIND=true +$ export OMP_PLACES=cores ``` ## OpenMPI Process Mapping and Binding @@ -126,7 +117,7 @@ MPI process mapping may be specified by a hostfile or rankfile input to the mpie Example hostfile -```bash +```console r1i0n17.smc.salomon.it4i.cz r1i0n5.smc.salomon.it4i.cz r1i0n6.smc.salomon.it4i.cz @@ -135,8 +126,8 @@ Example hostfile Use the hostfile to control process placement -```bash - $ mpiexec -hostfile hostfile ./helloworld_mpi.x +```console +$ mpiexec -hostfile hostfile ./helloworld_mpi.x Hello world! from rank 0 of 4 on host r1i0n17 Hello world! from rank 1 of 4 on host r1i0n5 Hello world! from rank 2 of 4 on host r1i0n6 @@ -153,7 +144,7 @@ Appropriate binding may boost performance of your application. Example rankfile -```bash +```console rank 0=r1i0n7.smc.salomon.it4i.cz slot=1:0,1 rank 1=r1i0n6.smc.salomon.it4i.cz slot=0:* rank 2=r1i0n5.smc.salomon.it4i.cz slot=1:1-2 @@ -170,7 +161,7 @@ rank 2 will be bounded to r1i0n5, socket1, core1 and core2 rank 3 will be bounded to r1i0n17, socket0 core1, socket1 core0, core1, core2 rank 4 will be bounded to r1i0n6, all cores on both sockets -```bash +```console $ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x [r1i0n17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . . . . . .][B B B . . . . . . . . .] (slot list 0:1,1:0-2) [r1i0n7:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . . . . . .][B B . . . . . . . . . .] 
(slot list 1:0,1) @@ -192,10 +183,10 @@ It is users responsibility to provide correct number of ranks, sockets and cores In all cases, binding and threading may be verified by executing for example: -```bash - $ mpiexec -bysocket -bind-to-socket --report-bindings echo - $ mpiexec -bysocket -bind-to-socket numactl --show - $ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS +```console +$ mpiexec -bysocket -bind-to-socket --report-bindings echo +$ mpiexec -bysocket -bind-to-socket numactl --show +$ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS ``` ## Changes in OpenMPI 1.8 diff --git a/docs.it4i/salomon/software/mpi/mpi.md b/docs.it4i/salomon/software/mpi/mpi.md index 411d54ddabae7b32ef32f894f2cc466e93eeb866..902fc4f7acfefd725006534a474d3b745ecddd48 100644 --- a/docs.it4i/salomon/software/mpi/mpi.md +++ b/docs.it4i/salomon/software/mpi/mpi.md @@ -15,8 +15,8 @@ MPI libraries are activated via the environment modules. Look up section modulefiles/mpi in module avail -```bash - $ module avail +```console +$ ml av ------------------------------ /apps/modules/mpi ------------------------------- impi/4.1.1.036-iccifort-2013.5.192 impi/4.1.1.036-iccifort-2013.5.192-GCC-4.8.3 @@ -35,16 +35,16 @@ There are default compilers associated with any particular MPI implementation. T Examples: -```bash - $ module load gompi/2015b +```console +$ ml gompi/2015b ``` In this example, we activate the latest OpenMPI with latest GNU compilers (OpenMPI 1.8.6 and GCC 5.1). Please see more information about toolchains in section [Environment and Modules](../../environment-and-modules/) . To use OpenMPI with the intel compiler suite, use -```bash - $ module load iompi/2015.03 +```console +$ module load iompi/2015.03 ``` In this example, the openmpi 1.8.6 using intel compilers is activated. It's used "iompi" toolchain. @@ -53,17 +53,17 @@ In this example, the openmpi 1.8.6 using intel compilers is activated. It's used After setting up your MPI environment, compile your program using one of the mpi wrappers -```bash - $ mpicc -v - $ mpif77 -v - $ mpif90 -v +```console +$ mpicc -v +$ mpif77 -v +$ mpif90 -v ``` When using Intel MPI, use the following MPI wrappers: -```bash - $ mpicc - $ mpiifort +```console +$ mpicc +$ mpiifort ``` Wrappers mpif90, mpif77 that are provided by Intel MPI are designed for gcc and gfortran. You might be able to compile MPI code by them even with Intel compilers, but you might run into problems (for example, native MIC compilation with -mmic does not work with mpif90). @@ -100,8 +100,8 @@ Example program: Compile the above example with -```bash - $ mpicc helloworld_mpi.c -o helloworld_mpi.x +```console +$ mpicc helloworld_mpi.c -o helloworld_mpi.x ``` ## Running MPI Programs diff --git a/docs.it4i/salomon/software/mpi/mpi4py-mpi-for-python.md b/docs.it4i/salomon/software/mpi/mpi4py-mpi-for-python.md index 160478b6ed3c4dbfaf7226759fab0fd8fb9ddc67..8b2a12823aee3f9ce87e8b1be3c26a4dea8d5e4e 100644 --- a/docs.it4i/salomon/software/mpi/mpi4py-mpi-for-python.md +++ b/docs.it4i/salomon/software/mpi/mpi4py-mpi-for-python.md @@ -14,28 +14,28 @@ On Anselm MPI4Py is available in standard Python modules. MPI4Py is build for OpenMPI. Before you start with MPI4Py you need to load Python and OpenMPI modules. You can use toolchain, that loads Python and OpenMPI at once. -```bash - $ module load Python/2.7.9-foss-2015g +```console +$ ml Python/2.7.9-foss-2015g ``` ## Execution You need to import MPI to your python program. 
Include the following line to the python script: -```bash +```console from mpi4py import MPI ``` The MPI4Py enabled python programs [execute as any other OpenMPI](Running_OpenMPI/) code.The simpliest way is to run -```bash - $ mpiexec python <script>.py +```console +$ mpiexec python <script>.py ``` For example -```bash - $ mpiexec python hello_world.py +```console +$ mpiexec python hello_world.py ``` ## Examples @@ -83,12 +83,10 @@ For example Execute the above code as: -```bash - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I - - $ module load Python/2.7.9-foss-2015g - - $ mpiexec --map-by core --bind-to core python hello_world.py +```console +$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I +$ ml Python/2.7.9-foss-2015g + $ mpiexec --map-by core --bind-to core python hello_world.py ``` In this example, we run MPI4Py enabled code on 4 nodes, 24 cores per node (total of 96 processes), each python process is bound to a different core. More examples and documentation can be found on [MPI for Python webpage](https://pypi.python.org/pypi/mpi4py). diff --git a/docs.it4i/salomon/software/numerical-languages/introduction.md b/docs.it4i/salomon/software/numerical-languages/introduction.md index 50f083a91c52acc731fcbd0abe849904df757221..6f140ef9d0a1a33f0656a69af2c03e729b77c178 100644 --- a/docs.it4i/salomon/software/numerical-languages/introduction.md +++ b/docs.it4i/salomon/software/numerical-languages/introduction.md @@ -10,9 +10,9 @@ This section contains a collection of high-level interpreted languages, primaril MATLAB®^ is a high-level language and interactive environment for numerical computation, visualization, and programming. -```bash - $ module load MATLAB - $ matlab +```console +$ module load MATLAB +$ matlab ``` Read more at the [Matlab page](matlab/). @@ -21,9 +21,9 @@ Read more at the [Matlab page](matlab/). GNU Octave is a high-level interpreted language, primarily intended for numerical computations. The Octave language is quite similar to Matlab so that most programs are easily portable. -```bash - $ module load Octave - $ octave +```console +$ module load Octave +$ octave ``` Read more at the [Octave page](octave/). @@ -32,9 +32,9 @@ Read more at the [Octave page](octave/). The R is an interpreted language and environment for statistical computing and graphics. -```bash - $ module load R - $ R +```console +$ module load R +$ R ``` Read more at the [R page](r/). diff --git a/docs.it4i/salomon/software/numerical-languages/matlab.md b/docs.it4i/salomon/software/numerical-languages/matlab.md index aec28baaedbec6491cfe8ba14a7442368dbdec17..31602bf0f5359ffc35365dcbdb867000c274b332 100644 --- a/docs.it4i/salomon/software/numerical-languages/matlab.md +++ b/docs.it4i/salomon/software/numerical-languages/matlab.md @@ -9,14 +9,14 @@ Matlab is available in versions R2015a and R2015b. There are always two variants To load the latest version of Matlab load the module -```bash - $ module load MATLAB +```console +$ module load MATLAB ``` By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use -```bash - $ module avail MATLAB +```console +$ module avail MATLAB ``` If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use Matlab on the compute nodes via PBS Pro scheduler. 
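A minimal sketch of such a compute-node session in text mode is shown below; the project ID is a placeholder and the qexp queue is used only as an example, while GUI forwarding and license requests are covered in the following paragraphs:

```console
$ qsub -I -A PROJECT_ID -q qexp -l select=1:ncpus=24,walltime=01:00:00
$ ml MATLAB
$ matlab -nodesktop -nosplash
```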
@@ -27,14 +27,14 @@ Matlab GUI is quite slow using the X forwarding built in the PBS (qsub -X), so u To run Matlab with GUI, use -```bash - $ matlab +```console +$ matlab ``` To run Matlab in text mode, without the Matlab Desktop GUI environment, use -```bash - $ matlab -nodesktop -nosplash +```console +$ matlab -nodesktop -nosplash ``` plots, images, etc... will be still available. @@ -49,7 +49,7 @@ Delete previously used file mpiLibConf.m, we have observed crashes when using In To use Distributed Computing, you first need to setup a parallel profile. We have provided the profile for you, you can either import it in MATLAB command line: -```bash +```console > parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonPBSPro.settings') ans = @@ -67,10 +67,9 @@ With the new mode, MATLAB itself launches the workers via PBS, so you can either Following example shows how to start interactive session with support for Matlab GUI. For more information about GUI based applications on Anselm see [this page](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/). -```bash - $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 +```console +$ xhost + +$ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 -l feature__matlab__MATLAB=1 ``` This qsub command example shows how to run Matlab on a single node. @@ -79,7 +78,7 @@ The second part of the command shows how to request all necessary licenses. In t Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: -```bash +```console r1i0n17$ module load MATLAB/2015a-EDU r1i0n17$ matlab & ``` @@ -115,15 +114,15 @@ This script may be submitted directly to the PBS workload manager via the qsub c Submit the jobscript using qsub -```bash - $ qsub ./jobscript +```console +$ qsub ./jobscript ``` ### Parallel Matlab Local Mode Program Example The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. -```bash +```console cluster = parcluster('local') ``` @@ -134,7 +133,7 @@ This script creates scheduler object "cluster" of type "local" that starts worke The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers. -```bash +```console parpool(cluster,24); @@ -146,7 +145,7 @@ The last step is to start matlabpool with "cluster" object and correct number of The complete example showing how to use Distributed Computing Toolbox in local mode is shown here. -```bash +```console cluster = parcluster('local'); cluster @@ -179,7 +178,7 @@ This mode uses PBS scheduler to launch the parallel pool. 
It uses the SalomonPBS This is an example of m-script using PBS mode: -```bash +```console cluster = parcluster('SalomonPBSPro'); set(cluster, 'SubmitArguments', '-A OPEN-0-0'); set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=24'); @@ -220,7 +219,7 @@ For this method, you need to use SalomonDirect profile, import it using [the sam This is an example of m-script using direct mode: -```bash +```console parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonDirect.settings') cluster = parcluster('SalomonDirect'); set(cluster, 'NumWorkers', 48); diff --git a/docs.it4i/salomon/software/numerical-languages/octave.md b/docs.it4i/salomon/software/numerical-languages/octave.md index 6461bc4cc003b806d0f75320d58d5c9009ab5b8b..5c679dd1b87e587965d802f2845997b755254fa2 100644 --- a/docs.it4i/salomon/software/numerical-languages/octave.md +++ b/docs.it4i/salomon/software/numerical-languages/octave.md @@ -8,16 +8,16 @@ Two versions of octave are available on the cluster, via module | ---------- | ------------ | ------ | | **Stable** | Octave 3.8.2 | Octave | -```bash - $ module load Octave +```console +$ ml Octave ``` The octave on the cluster is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many octave kernels, notably the linear algebra subroutines. Octave runs these heavy calculation kernels without any penalty. By default, octave would parallelize to 24 threads. You may control the threads by setting the OMP_NUM_THREADS environment variable. To run octave interactively, log in with ssh -X parameter for X11 forwarding. Run octave: -```bash - $ octave +```console +$ octave ``` To run octave in batch mode, write an octave script, then write a bash jobscript and execute via the qsub command. By default, octave will use 16 threads when running MKL kernels. @@ -49,8 +49,8 @@ This script may be submitted directly to the PBS workload manager via the qsub c The octave c compiler mkoctfile calls the GNU gcc 4.8.1 for compiling native c code. This is very useful for running native c subroutines in octave environment. -```bash - $ mkoctfile -v +```console +$ mkoctfile -v ``` Octave may use MPI for interprocess communication This functionality is currently not supported on the cluster cluster. In case you require the octave interface to MPI, please contact our [cluster support](https://support.it4i.cz/rt/). diff --git a/docs.it4i/salomon/software/numerical-languages/r.md b/docs.it4i/salomon/software/numerical-languages/r.md index 6a01926e1b69bdd97d695d19b7a056419408acde..1563b4e5ae7a4feb3e4da6382b255ffb5d449080 100644 --- a/docs.it4i/salomon/software/numerical-languages/r.md +++ b/docs.it4i/salomon/software/numerical-languages/r.md @@ -21,8 +21,8 @@ The R version 3.1.1 is available on the cluster, along with GUI interface Rstudi | **R** | R 3.1.1 | R/3.1.1-intel-2015b | | **Rstudio** | Rstudio 0.98.1103 | Rstudio | -```bash - $ module load R +```console +$ ml R ``` ## Execution @@ -33,9 +33,9 @@ The R on Anselm is linked to highly optimized MKL mathematical library. This pro To run R interactively, using Rstudio GUI, log in with ssh -X parameter for X11 forwarding. 
Run rstudio: -```bash - $ module load Rstudio - $ rstudio +```console +$ module load Rstudio +$ rstudio ``` ### Batch Execution @@ -78,14 +78,14 @@ The package parallel provides support for parallel computation, including by for The package is activated this way: -```bash +```console $ R > library(parallel) ``` More information and examples may be obtained directly by reading the documentation available in R -```bash +```console > ?parallel > library(help = "parallel") > vignette("parallel") @@ -152,9 +152,9 @@ Read more on Rmpi at <http://cran.r-project.org/web/packages/Rmpi/>, reference m When using package Rmpi, both openmpi and R modules must be loaded -```bash - $ module load OpenMPI - $ module load R +```console +$ ml OpenMPI +$ ml R ``` Rmpi may be used in three basic ways. The static approach is identical to executing any other MPI programm. In addition, there is Rslaves dynamic MPI approach and the mpi.apply approach. In the following section, we will use the number π integration example, to illustrate all these concepts. @@ -211,8 +211,8 @@ Static Rmpi example: The above is the static MPI example for calculating the number π. Note the **library(Rmpi)** and **mpi.comm.dup()** function calls. Execute the example as: -```bash - $ mpirun R --slave --no-save --no-restore -f pi3.R +```console +$ mpirun R --slave --no-save --no-restore -f pi3.R ``` ### Dynamic Rmpi @@ -283,8 +283,8 @@ The above example is the dynamic MPI example for calculating the number π. Both Execute the example as: -```bash - $ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R +```console +$ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R ``` Note that this method uses MPI_Comm_spawn (Dynamic process feature of MPI-2) to start the slave processes - the master process needs to be launched with MPI. In general, Dynamic processes are not well supported among MPI implementations, some issues might arise. Also, environment variables are not propagated to spawned processes, so they will not see paths from modules. @@ -351,8 +351,8 @@ The above is the mpi.apply MPI example for calculating the number π. Only the s Execute the example as: -```bash - $ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R +```console +$ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R ``` ## Combining Parallel and Rmpi @@ -398,8 +398,8 @@ For more information about jobscripts and MPI execution refer to the [Job submis By leveraging MKL, R can accelerate certain computations, most notably linear algebra operations on the Xeon Phi accelerator by using Automated Offload. To use MKL Automated Offload, you need to first set this environment variable before R execution: -```bash - $ export MKL_MIC_ENABLE=1 +```console +$ export MKL_MIC_ENABLE=1 ``` [Read more about automatic offload](../intel-xeon-phi/) diff --git a/docs.it4i/salomon/storage.md b/docs.it4i/salomon/storage.md index d83dbc119e5a9803b947a8d508a36aba0f265870..b0e401cde014a3decb5fc4c7199796735d923cf8 100644 --- a/docs.it4i/salomon/storage.md +++ b/docs.it4i/salomon/storage.md @@ -65,14 +65,14 @@ There is default stripe configuration for Salomon Lustre file systems. However, Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. 
-```bash +```console $ lfs getstripe dir | filename $ lfs setstripe -s stripe_size -c stripe_count -o stripe_offset dir | filename ``` Example: -```bash +```console $ lfs getstripe /scratch/work/user/username /scratch/work/user/username stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 @@ -87,7 +87,7 @@ In this example, we view current stripe setting of the /scratch/username/ direct Use lfs check OSTs to see the number and status of active OSTs for each file system on Salomon. Learn more by reading the man page -```bash +```console $ lfs check osts $ man lfs ``` @@ -112,13 +112,13 @@ Read more on <http://wiki.lustre.org/manual/LustreManual20_HTML/ManagingStriping User quotas on the Lustre file systems (SCRATCH) can be checked and reviewed using following command: -```bash +```console $ lfs quota dir ``` Example for Lustre SCRATCH directory: -```bash +```console $ lfs quota /scratch Disk quotas for user user001 (uid 1234): Filesystem kbytes quota limit grace files quota limit grace @@ -132,14 +132,14 @@ In this example, we view current quota size limit of 100TB and 8KB currently use HOME directory is mounted via NFS, so a different command must be used to obtain quota information: -```bash - $ quota +```console +$ quota ``` Example output: -```bash - $ quota +```console +$ quota Disk quotas for user vop999 (uid 1025): Filesystem blocks quota limit grace files quota limit grace home-nfs-ib.salomon.it4i.cz:/home @@ -148,13 +148,13 @@ Example output: To have a better understanding of where the space is exactly used, you can use following command to find out. -```bash +```console $ du -hs dir ``` Example for your HOME directory: -```bash +```console $ cd /home $ du -hs * .[a-zA-z0-9]* | grep -E "[0-9]*G|[0-9]*M" | sort -hr 258M cuda-samples @@ -168,11 +168,11 @@ This will list all directories which are having MegaBytes or GigaBytes of consum To have a better understanding of previous commands, you can read manpages. -```bash +```console $ man lfs ``` -```bash +```console $ man du ``` @@ -182,7 +182,7 @@ Extended ACLs provide another security mechanism beside the standard POSIX ACLs ACLs on a Lustre file system work exactly like ACLs on any Linux file system. They are manipulated with the standard tools in the standard manner. Below, we create a directory and allow a specific user access. -```bash +```console [vop999@login1.salomon ~]$ umask 027 [vop999@login1.salomon ~]$ mkdir test [vop999@login1.salomon ~]$ ls -ld test @@ -356,40 +356,40 @@ The SSHFS provides a very convenient way to access the CESNET Storage. The stora First, create the mount point -```bash - $ mkdir cesnet +```console +$ mkdir cesnet ``` Mount the storage. Note that you can choose among the ssh.du1.cesnet.cz (Plzen), ssh.du2.cesnet.cz (Jihlava), ssh.du3.cesnet.cz (Brno) Mount tier1_home **(only 5120M !)**: -```bash - $ sshfs username@ssh.du1.cesnet.cz:. cesnet/ +```console +$ sshfs username@ssh.du1.cesnet.cz:. cesnet/ ``` For easy future access from Anselm, install your public key -```bash - $ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys +```console +$ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys ``` Mount tier1_cache_tape for the Storage VO: -```bash - $ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ +```console +$ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ ``` View the archive, copy the files and directories in and out -```bash - $ ls cesnet/ - $ cp -a mydir cesnet/. - $ cp cesnet/myfile . +```console +$ ls cesnet/ +$ cp -a mydir cesnet/. 
+$ cp cesnet/myfile . ``` Once done, please remember to unmount the storage -```bash - $ fusermount -u cesnet +```console +$ fusermount -u cesnet ``` ### Rsync Access @@ -405,16 +405,16 @@ More about Rsync at [here](https://du.cesnet.cz/en/navody/rsync/start#pro_bezne_ Transfer large files to/from CESNET storage, assuming membership in the Storage VO -```bash - $ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . +```console +$ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. +$ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . ``` Transfer large directories to/from CESNET storage, assuming membership in the Storage VO -```bash - $ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . +```console +$ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. +$ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . ``` Transfer rates of about 28 MB/s can be expected.
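For very large transfers it can be convenient to make the copy resumable, so that an interrupted run continues where it stopped instead of starting over. A sketch using standard rsync options, with the same placeholder names as above:

```console
$ rsync --progress --partial -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/.
```

Re-running the same command after an interruption picks up the partially transferred files.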