diff --git a/README.md b/README.md index 374e8e393a7868a29e80fedae8880df5d39cb058..3ed6d55c0534aad18efde0e9a43c6811da460d5b 100644 --- a/README.md +++ b/README.md @@ -28,7 +28,7 @@ Mellanox ## Mathematical Formulae -Formulas are made with: +### Formulas are made with: * https://facelessuser.github.io/pymdown-extensions/extensions/arithmatex/ * https://www.mathjax.org/ diff --git a/docs.it4i/anselm/capacity-computing.md b/docs.it4i/anselm/capacity-computing.md index bae565e8590c5aed570e16a4b317737ad44c0d2f..b4a0c25b90aa93fccdf6a07d9c915d5da58411a1 100644 --- a/docs.it4i/anselm/capacity-computing.md +++ b/docs.it4i/anselm/capacity-computing.md @@ -41,7 +41,7 @@ Assume we have 900 input files with name beginning with "file" (e. g. file001, . First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example: -```bash +```console $ find . -name 'file*' > tasklist ``` @@ -78,7 +78,7 @@ If huge number of parallel multicore (in means of multinode multithread, e. g. M To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing/#array_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME -J 1-900 jobscript 12345[].dm2 ``` @@ -87,7 +87,7 @@ In this example, we submit a job array of 900 subjobs. Each subjob will run on f Sometimes for testing purposes, you may need to submit only one-element array. This is not allowed by PBSPro, but there's a workaround: -```bash +```console $ qsub -N JOBNAME -J 9-10:2 jobscript ``` @@ -97,7 +97,7 @@ This will only choose the lower index (9 in this example) for submitting/running Check status of the job array by the qstat command. -```bash +```console $ qstat -a 12345[].dm2 dm2: @@ -110,7 +110,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time The status B means that some subjobs are already running. Check status of the first 100 subjobs by the qstat command. -```bash +```console $ qstat -a 12345[1-100].dm2 dm2: @@ -128,20 +128,20 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time Delete the entire job array. Running subjobs will be killed, queueing subjobs will be deleted. -```bash +```console $ qdel 12345[].dm2 ``` Deleting large job arrays may take a while. Display status information for all user's jobs, job arrays, and subjobs. -```bash +```console $ qstat -u $USER -t ``` Display status information for all user's subjobs. -```bash +```console $ qstat -u $USER -tJ ``` @@ -156,7 +156,7 @@ GNU parallel is a shell tool for executing jobs in parallel using one or more co For more information and examples see the parallel man page: -```bash +```console $ module add parallel $ man parallel ``` @@ -171,7 +171,7 @@ Assume we have 101 input files with name beginning with "file" (e. g. file001, . First, we create a tasklist file, listing all tasks - all input files in our example: -```bash +```console $ find . -name 'file*' > tasklist ``` @@ -209,7 +209,7 @@ In this example, tasks from tasklist are executed via the GNU parallel. The jobs To submit the job, use the qsub command. The 101 tasks' job of the [example above](capacity-computing/#gp_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME jobscript 12345.dm2 ``` @@ -239,13 +239,13 @@ Assume we have 992 input files with name beginning with "file" (e. g. file001, . First, we create a tasklist file, listing all tasks - all input files in our example: -```bash +```console $ find . 
-name 'file*' > tasklist ``` Next we create a file, controlling how many tasks will be executed in one subjob -```bash +```console $ seq 32 > numtasks ``` @@ -294,7 +294,7 @@ When deciding this values, think about following guiding rules: To submit the job array, use the qsub -J command. The 992 tasks' job of the [example above](capacity-computing/#combined_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME -J 1-992:32 jobscript 12345[].dm2 ``` @@ -310,7 +310,7 @@ Download the examples in [capacity.zip](capacity.zip), illustrating the above li Unzip the archive in an empty directory on Anselm and follow the instructions in the README file -```bash +```console $ unzip capacity.zip $ cat README ``` diff --git a/docs.it4i/anselm/compute-nodes.md b/docs.it4i/anselm/compute-nodes.md index 57a6df29e675632b1c5d1951232a7c2807313f15..6df69cce1d57b11c172340ee24f845d954708ea6 100644 --- a/docs.it4i/anselm/compute-nodes.md +++ b/docs.it4i/anselm/compute-nodes.md @@ -85,7 +85,7 @@ Anselm is equipped with Intel Sandy Bridge processors Intel Xeon E5-2665 (nodes Nodes equipped with Intel Xeon E5-2665 CPU have set PBS resource attribute cpu_freq = 24, nodes equipped with Intel Xeon E5-2470 CPU have set PBS resource attribute cpu_freq = 23. -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16:cpu_freq=24 -I ``` @@ -93,8 +93,8 @@ In this example, we allocate 4 nodes, 16 cores at 2.4GHhz per node. Intel Turbo Boost Technology is used by default, you can disable it for all nodes of job by using resource attribute cpu_turbo_boost. -```bash - $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I +```console +$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I ``` ## Memory Architecture diff --git a/docs.it4i/anselm/environment-and-modules.md b/docs.it4i/anselm/environment-and-modules.md index 21230e9e07911f4632105d0f4c8006aff7393254..d460fa7023c41f16c9be748205061e78f26da3a9 100644 --- a/docs.it4i/anselm/environment-and-modules.md +++ b/docs.it4i/anselm/environment-and-modules.md @@ -4,7 +4,9 @@ After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions and module loads in the .bashrc file -```bash +```console +$ cat ./bashrc + # ./bashrc # Source global definitions @@ -39,33 +41,33 @@ The modules may be loaded, unloaded and switched, according to momentary needs. To check available modules use -```bash -$ module avail +```console +$ module avail **or** ml av ``` To load a module, for example the octave module use -```bash -$ module load octave +```console +$ module load octave **or** ml octave ``` loading the octave module will set up paths and environment variables of your active shell such that you are ready to run the octave software To check loaded modules use -```bash -$ module list +```console +$ module list **or** ml ``` To unload a module, for example the octave module use -```bash -$ module unload octave +```console +$ module unload octave **or** ml -octave ``` Learn more on modules by reading the module man page -```bash +```console $ man module ``` @@ -79,7 +81,7 @@ PrgEnv-intel sets up the INTEL development environment in conjunction with the I All application modules on Salomon cluster (and further) will be build using tool called [EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). In case that you want to use some applications that are build by EasyBuild already, you have to modify your MODULEPATH environment variable. 
-```bash +```console export MODULEPATH=$MODULEPATH:/apps/easybuild/modules/all/ ``` diff --git a/docs.it4i/anselm/job-submission-and-execution.md b/docs.it4i/anselm/job-submission-and-execution.md index 31b41151379a6511d7a7d370231ad4c4bacaa85a..d6584a8a96256e5a5ca02747de8b00587671cd05 100644 --- a/docs.it4i/anselm/job-submission-and-execution.md +++ b/docs.it4i/anselm/job-submission-and-execution.md @@ -16,7 +16,7 @@ When allocating computational resources for the job, please specify Submit the job using the qsub command: -```bash +```console $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript ``` @@ -24,25 +24,25 @@ The qsub submits the job into the queue, in another words the qsub command creat ### Job Submission Examples -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=16,walltime=03:00:00 ./myjob ``` In this example, we allocate 64 nodes, 16 cores per node, for 3 hours. We allocate these resources via the qprod queue, consumed resources will be accounted to the Project identified by Project ID OPEN-0-0. Jobscript myjob will be executed on the first node in the allocation. -```bash +```console $ qsub -q qexp -l select=4:ncpus=16 -I ``` In this example, we allocate 4 nodes, 16 cores per node, for 1 hour. We allocate these resources via the qexp queue. The resources will be available interactively -```bash +```console $ qsub -A OPEN-0-0 -q qnvidia -l select=10:ncpus=16 ./myjob ``` In this example, we allocate 10 nvidia accelerated nodes, 16 cores per node, for 24 hours. We allocate these resources via the qnvidia queue. Jobscript myjob will be executed on the first node in the allocation. -```bash +```console $ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=16 ./myjob ``` @@ -50,13 +50,13 @@ In this example, we allocate 10 nodes, 16 cores per node, for 12 hours. We alloc All qsub options may be [saved directly into the jobscript](#example-jobscript-for-mpi-calculation-with-preloaded-inputs). In such a case, no options to qsub are needed. -```bash +```console $ qsub ./myjob ``` By default, the PBS batch system sends an e-mail only when the job is aborted. Disabling mail events completely can be done like this: -```bash +```console $ qsub -m n ``` @@ -66,8 +66,8 @@ $ qsub -m n Specific nodes may be allocated via the PBS -```bash -qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16:host=cn171+1:ncpus=16:host=cn172 -I +```console +$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16:host=cn171+1:ncpus=16:host=cn172 -I ``` In this example, we allocate nodes cn171 and cn172, all 16 cores per node, for 24 hours. Consumed resources will be accounted to the Project identified by Project ID OPEN-0-0. The resources will be available interactively. @@ -81,7 +81,7 @@ Nodes equipped with Intel Xeon E5-2665 CPU have base clock frequency 2.4GHz, nod | Intel Xeon E5-2665 | 2.4GHz | cn[1-180], cn[208-209] | 24 | | Intel Xeon E5-2470 | 2.3GHz | cn[181-207] | 23 | -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16:cpu_freq=24 -I ``` @@ -95,8 +95,8 @@ Nodes sharing the same switch may be selected via the PBS resource attribute ibs We recommend allocating compute nodes of a single switch when best possible computational network performance is required to run the job efficiently: -```bash - qsub -A OPEN-0-0 -q qprod -l select=18:ncpus=16:ibswitch=isw11 ./myjob +```console +$ qsub -A OPEN-0-0 -q qprod -l select=18:ncpus=16:ibswitch=isw11 ./myjob ``` In this example, we request all the 18 nodes sharing the isw11 switch for 24 hours. 
Full chassis will be allocated. @@ -109,8 +109,8 @@ Intel Turbo Boost Technology is on by default. We strongly recommend keeping the If necessary (such as in case of benchmarking) you can disable the Turbo for all nodes of the job by using the PBS resource attribute cpu_turbo_boost -```bash - $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I +```console +$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I ``` More about the Intel Turbo Boost in the TurboBoost section @@ -119,8 +119,8 @@ More about the Intel Turbo Boost in the TurboBoost section In the following example, we select an allocation for benchmarking a very special and demanding MPI program. We request Turbo off, 2 full chassis of compute nodes (nodes sharing the same IB switches) for 30 minutes: -```bash - $ qsub -A OPEN-0-0 -q qprod +```console +$ qsub -A OPEN-0-0 -q qprod -l select=18:ncpus=16:ibswitch=isw10:mpiprocs=1:ompthreads=16+18:ncpus=16:ibswitch=isw20:mpiprocs=16:ompthreads=1 -l cpu_turbo_boost=0,walltime=00:30:00 -N Benchmark ./mybenchmark @@ -135,7 +135,7 @@ Although this example is somewhat artificial, it demonstrates the flexibility of !!! note Check status of your jobs using the **qstat** and **check-pbs-jobs** commands -```bash +```console $ qstat -a $ qstat -a -u username $ qstat -an -u username @@ -144,7 +144,7 @@ $ qstat -f 12345.srv11 Example: -```bash +```console $ qstat -a srv11: @@ -160,19 +160,17 @@ In this example user1 and user2 are running jobs named job1, job2 and job3x. The Check status of your jobs using check-pbs-jobs command. Check presence of user's PBS jobs' processes on execution hosts. Display load, processes. Display job standard and error output. Continuously display (tail -f) job standard or error output. -```bash +```console $ check-pbs-jobs --check-all $ check-pbs-jobs --print-load --print-processes $ check-pbs-jobs --print-job-out --print-job-err - $ check-pbs-jobs --jobid JOBID --check-all --print-all - $ check-pbs-jobs --jobid JOBID --tailf-job-out ``` Examples: -```bash +```console $ check-pbs-jobs --check-all JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 Check session id: OK @@ -183,7 +181,7 @@ cn165: No process In this example we see that job 35141.dm2 currently runs no process on allocated node cn165, which may indicate an execution error. -```bash +```console $ check-pbs-jobs --print-load --print-processes JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 Print load @@ -199,7 +197,7 @@ cn164: 99.7 run-task In this example we see that job 35141.dm2 currently runs process run-task on node cn164, using one thread only, while node cn165 is empty, which may indicate an execution error. -```bash +```console $ check-pbs-jobs --jobid 35141.dm2 --print-job-out JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 Print job standard output: @@ -218,19 +216,19 @@ In this example, we see actual output (some iteration loops) of the job 35141.dm You may release your allocation at any time, using qdel command -```bash +```console $ qdel 12345.srv11 ``` You may kill a running job by force, using qsig command -```bash +```console $ qsig -s 9 12345.srv11 ``` Learn more by reading the pbs man page -```bash +```console $ man pbs_professional ``` @@ -246,7 +244,7 @@ The Jobscript is a user made script, controlling sequence of commands for execut !!! note The jobscript or interactive shell is executed on first of the allocated nodes. 
-```bash +```console $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob $ qstat -n -u username @@ -262,7 +260,7 @@ In this example, the nodes cn17, cn108, cn109 and cn110 were allocated for 1 hou The jobscript or interactive shell is by default executed in home directory -```bash +```console $ qsub -q qexp -l select=4:ncpus=16 -I qsub: waiting for job 15210.srv11 to start qsub: job 15210.srv11 ready @@ -280,7 +278,7 @@ The allocated nodes are accessible via ssh from login nodes. The nodes may acces Calculations on allocated nodes may be executed remotely via the MPI, ssh, pdsh or clush. You may find out which nodes belong to the allocation by reading the $PBS_NODEFILE file -```bash +```console qsub -q qexp -l select=4:ncpus=16 -I qsub: waiting for job 15210.srv11 to start qsub: job 15210.srv11 ready diff --git a/docs.it4i/anselm/network.md b/docs.it4i/anselm/network.md index a2af06f97a85472d327eeffc4a743d5eb70d6bb1..79c6f1a37f0d22f286e4de57dac097dcea8d19e8 100644 --- a/docs.it4i/anselm/network.md +++ b/docs.it4i/anselm/network.md @@ -19,7 +19,7 @@ The compute nodes may be accessed via the regular Gigabit Ethernet network inter ## Example -```bash +```console $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob $ qstat -n -u username Req'd Req'd Elap diff --git a/docs.it4i/anselm/prace.md b/docs.it4i/anselm/prace.md index 7657e263f073da5865f4a54d20169c78ed6c2a48..061cd0a0714075f3caca51152363e10ff795176f 100644 --- a/docs.it4i/anselm/prace.md +++ b/docs.it4i/anselm/prace.md @@ -36,14 +36,14 @@ Most of the information needed by PRACE users accessing the Anselm TIER-1 system Before you start to use any of the services don't forget to create a proxy certificate from your certificate: -```bash - $ grid-proxy-init +```console +$ grid-proxy-init ``` To check whether your proxy certificate is still valid (by default it's valid 12 hours), use: -```bash - $ grid-proxy-info +```console +$ grid-proxy-info ``` To access Anselm cluster, two login nodes running GSI SSH service are available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners). 
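If a session or transfer needs to outlive the default 12-hour proxy lifetime mentioned above, a longer-lived proxy can be requested when it is created. A minimal sketch, assuming the standard Globus `-valid HH:MM` and `-timeleft` options are available in the installed toolkit (they are not stated in the page being patched):

```console
$ grid-proxy-init -valid 24:00    # request a proxy valid for 24 hours instead of the default 12
$ grid-proxy-info -timeleft       # remaining proxy lifetime, in seconds
```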
@@ -58,14 +58,14 @@ It is recommended to use the single DNS name anselm-prace.it4i.cz which is distr | login1-prace.anselm.it4i.cz | 2222 | gsissh | login1 | | login2-prace.anselm.it4i.cz | 2222 | gsissh | login2 | -```bash - $ gsissh -p 2222 anselm-prace.it4i.cz +```console +$ gsissh -p 2222 anselm-prace.it4i.cz ``` When logging from other PRACE system, the prace_service script can be used: -```bash - $ gsissh `prace_service -i -s anselm` +```console +$ gsissh `prace_service -i -s anselm` ``` #### Access From Public Internet: @@ -78,26 +78,26 @@ It is recommended to use the single DNS name anselm.it4i.cz which is distributed | login1.anselm.it4i.cz | 2222 | gsissh | login1 | | login2.anselm.it4i.cz | 2222 | gsissh | login2 | -```bash - $ gsissh -p 2222 anselm.it4i.cz +```console +$ gsissh -p 2222 anselm.it4i.cz ``` When logging from other PRACE system, the prace_service script can be used: -```bash - $ gsissh `prace_service -e -s anselm` +```console +$ gsissh `prace_service -e -s anselm` ``` Although the preferred and recommended file transfer mechanism is [using GridFTP](prace/#file-transfers), the GSI SSH implementation on Anselm supports also SCP, so for small files transfer gsiscp can be used: -```bash - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ +```console +$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ ``` ### Access to X11 Applications (VNC) @@ -106,8 +106,8 @@ If the user needs to run X11 based graphical application and does not have a X11 If the user uses GSI SSH based access, then the procedure is similar to the SSH based access, only the port forwarding must be done using GSI SSH: -```bash - $ gsissh -p 2222 anselm.it4i.cz -L 5961:localhost:5961 +```console +$ gsissh -p 2222 anselm.it4i.cz -L 5961:localhost:5961 ``` ### Access With SSH @@ -133,26 +133,26 @@ There's one control server and three backend servers for striping and/or backup Copy files **to** Anselm by running the following commands on your local machine: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ ``` Copy files **from** Anselm: -```bash - $ globus-url-copy gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy 
gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` ### Access From Public Internet @@ -166,26 +166,26 @@ Or by using prace_service script: Copy files **to** Anselm by running the following commands on your local machine: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ ``` Copy files **from** Anselm: -```bash - $ globus-url-copy gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Generally both shared file systems are available through GridFTP: @@ -209,8 +209,8 @@ All system wide installed software on the cluster is made available to the users PRACE users can use the "prace" module to use the [PRACE Common Production Environment](http://www.prace-ri.eu/prace-common-production-environment/). -```bash - $ module load prace +```console +$ module load prace ``` ### Resource Allocation and Job Execution @@ -241,8 +241,8 @@ Users who have undergone the full local registration procedure (including signin !!! hint The **it4ifree** command is a part of it4i.portal.clients package, [located here](https://pypi.python.org/pypi/it4i.portal.clients). -```bash - $ it4ifree +```console +$ it4ifree Password: PID Total Used ...by me Free -------- ------- ------ -------- ------- @@ -252,9 +252,9 @@ Users who have undergone the full local registration procedure (including signin By default file system quota is applied. To check the current status of the quota use -```bash - $ lfs quota -u USER_LOGIN /home - $ lfs quota -u USER_LOGIN /scratch +```console +$ lfs quota -u USER_LOGIN /home +$ lfs quota -u USER_LOGIN /scratch ``` If the quota is insufficient, please contact the [support](prace/#help-and-support) and request an increase. 
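The GridFTP endpoints shown in the hunks above are backed by several striping servers, so large transfers can often be sped up by requesting multiple parallel data streams. A hedged sketch using the public endpoint and the same placeholder paths as above, assuming the standard `globus-url-copy` options `-p` (number of parallel streams) and `-fast` (data-channel reuse) are available on the client:

```console
# 4 parallel TCP streams; -fast keeps data channels open between transfers
$ globus-url-copy -p 4 -fast \
    file://_LOCAL_PATH_TO_YOUR_FILE_ \
    gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
```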
diff --git a/docs.it4i/anselm/remote-visualization.md b/docs.it4i/anselm/remote-visualization.md index 7b0149fce735ac31592baa6f232cf3be2ffc5a54..e5a439b4654da5342101d15287212501b87c0df9 100644 --- a/docs.it4i/anselm/remote-visualization.md +++ b/docs.it4i/anselm/remote-visualization.md @@ -46,7 +46,7 @@ To have the OpenGL acceleration, **24 bit color depth must be used**. Otherwise This example defines desktop with dimensions 1200x700 pixels and 24 bit color depth. -```bash +```console $ module load turbovnc/1.2.2 $ vncserver -geometry 1200x700 -depth 24 @@ -58,7 +58,7 @@ Log file is /home/username/.vnc/login2:1.log #### 3. Remember Which Display Number Your VNC Server Runs (You Will Need It in the Future to Stop the Server) -```bash +```console $ vncserver -list TurboVNC server sessions: @@ -71,7 +71,7 @@ In this example the VNC server runs on display **:1**. #### 4. Remember the Exact Login Node, Where Your VNC Server Runs -```bash +```console $ uname -n login2 ``` @@ -82,7 +82,7 @@ In this example the VNC server runs on **login2**. To get the port you have to look to the log file of your VNC server. -```bash +```console $ grep -E "VNC.*port" /home/username/.vnc/login2:1.log 20/02/2015 14:46:41 Listening for VNC connections on TCP port 5901 ``` @@ -93,7 +93,7 @@ In this example the VNC server listens on TCP port **5901**. Tunnel the TCP port on which your VNC server is listenning. -```bash +```console $ ssh login2.anselm.it4i.cz -L 5901:localhost:5901 ``` @@ -109,7 +109,7 @@ Get it from: <http://sourceforge.net/projects/turbovnc/> Mind that you should connect through the SSH tunneled port. In this example it is 5901 on your workstation (localhost). -```bash +```console $ vncviewer localhost:5901 ``` @@ -123,7 +123,7 @@ Now you should have working TurboVNC session connected to your workstation. Don't forget to correctly shutdown your own VNC server on the login node! -```bash +```console $ vncserver -kill :1 ``` @@ -147,13 +147,13 @@ To access the visualization node, follow these steps: This step is necessary to allow you to proceed with next steps. -```bash +```console $ qsub -I -q qviz -A PROJECT_ID ``` In this example the default values for CPU cores and usage time are used. -```bash +```console $ qsub -I -q qviz -A PROJECT_ID -l select=1:ncpus=16 -l walltime=02:00:00 ``` @@ -163,7 +163,7 @@ In this example a whole node for 2 hours is requested. If there are free resources for your request, you will have a shell unning on an assigned node. Please remember the name of the node. -```bash +```console $ uname -n srv8 ``` @@ -174,7 +174,7 @@ In this example the visualization session was assigned to node **srv8**. Setup the VirtualGL connection to the node, which PBSPro allocated for our job. -```bash +```console $ vglconnect srv8 ``` @@ -182,19 +182,19 @@ You will be connected with created VirtualGL tunnel to the visualization ode, wh #### 3. Load the VirtualGL Module -```bash +```console $ module load virtualgl/2.4 ``` #### 4. Run Your Desired OpenGL Accelerated Application Using VirtualGL Script "Vglrun" -```bash +```console $ vglrun glxgears ``` If you want to run an OpenGL application which is vailable through modules, you need at first load the respective module. E.g. 
to run the **Mentat** OpenGL application from **MARC** software ackage use: -```bash +```console $ module load marc/2013.1 $ vglrun mentat ``` diff --git a/docs.it4i/anselm/resources-allocation-policy.md b/docs.it4i/anselm/resources-allocation-policy.md index 16cb7510d63075d413a19a9a9702ebbf23a4fb78..7ed577a23fbc25aa38487157915e482da168313e 100644 --- a/docs.it4i/anselm/resources-allocation-policy.md +++ b/docs.it4i/anselm/resources-allocation-policy.md @@ -43,13 +43,13 @@ Anselm users may check current queue configuration at <https://extranet.it4i.cz/ Display the queue status on Anselm: -```bash +```console $ qstat -q ``` The PBS allocation overview may be obtained also using the rspbs command. -```bash +```console $ rspbs Usage: rspbs [options] @@ -118,7 +118,7 @@ The resources that are currently subject to accounting are the core-hours. The c User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. -```bash +```console $ it4ifree Password: PID Total Used ...by me Free diff --git a/docs.it4i/anselm/shell-and-data-access.md b/docs.it4i/anselm/shell-and-data-access.md index 260945ed1b896b1740a98f5de44a5e2caa9910e3..e850c88133c723937ecdd17ec6e6eb08d7e7541f 100644 --- a/docs.it4i/anselm/shell-and-data-access.md +++ b/docs.it4i/anselm/shell-and-data-access.md @@ -22,13 +22,13 @@ Private key authentication: On **Linux** or **Mac**, use -```bash +```console local $ ssh -i /path/to/id_rsa username@anselm.it4i.cz ``` If you see warning message "UNPROTECTED PRIVATE KEY FILE!", use this command to set lower permissions to private key file. -```bash +```console local $ chmod 600 /path/to/id_rsa ``` @@ -36,7 +36,7 @@ On **Windows**, use [PuTTY ssh client](../general/accessing-the-clusters/shell-a After logging in, you will see the command prompt: -```bash +```console _ /\ | | / \ _ __ ___ ___| |_ __ ___ @@ -81,23 +81,23 @@ To achieve 160MB/s transfer rates, the end user must be connected by 10G line al On linux or Mac, use scp or sftp client to transfer the data to Anselm: -```bash +```console local $ scp -i /path/to/id_rsa my-local-file username@anselm.it4i.cz:directory/file ``` -```bash +```console local $ scp -i /path/to/id_rsa -r my-local-dir username@anselm.it4i.cz:directory ``` or -```bash +```console local $ sftp -o IdentityFile=/path/to/id_rsa username@anselm.it4i.cz ``` Very convenient way to transfer files in and out of the Anselm computer is via the fuse filesystem [sshfs](http://linux.die.net/man/1/sshfs) -```bash +```console local $ sshfs -o IdentityFile=/path/to/id_rsa username@anselm.it4i.cz:. mountpoint ``` @@ -105,7 +105,7 @@ Using sshfs, the users Anselm home directory will be mounted on your local compu Learn more on ssh, scp and sshfs by reading the manpages -```bash +```console $ man ssh $ man scp $ man sshfs @@ -142,7 +142,7 @@ It works by tunneling the connection from Anselm back to users workstation and f Pick some unused port on Anselm login node (for example 6000) and establish the port forwarding: -```bash +```console local $ ssh -R 6000:remote.host.com:1234 anselm.it4i.cz ``` @@ -152,7 +152,7 @@ Port forwarding may be done **using PuTTY** as well. On the PuTTY Configuration Port forwarding may be established directly to the remote host. 
However, this requires that user has ssh access to remote.host.com -```bash +```console $ ssh -L 6000:localhost:1234 remote.host.com ``` @@ -167,7 +167,7 @@ First, establish the remote port forwarding form the login node, as [described a Second, invoke port forwarding from the compute node to the login node. Insert following line into your jobscript or interactive shell -```bash +```console $ ssh -TN -f -L 6000:localhost:6000 login1 ``` @@ -182,7 +182,7 @@ Port forwarding is static, each single port is mapped to a particular port on re To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: -```bash +```console local $ ssh -D 1080 localhost ``` @@ -190,7 +190,7 @@ On Windows, install and run the free, open source [Sock Puppet](http://sockspupp Once the proxy server is running, establish ssh port forwarding from Anselm to the proxy server, port 1080, exactly as [described above](#port-forwarding-from-login-nodes). -```bash +```console local $ ssh -R 6000:localhost:1080 anselm.it4i.cz ``` diff --git a/docs.it4i/anselm/software/ansys/ansys-fluent.md b/docs.it4i/anselm/software/ansys/ansys-fluent.md index ff1f7cdd21a26283fd7522fc2cc286f00bde73a7..4521c758ed7def8e6795f9de97ecb0d698cd9dc9 100644 --- a/docs.it4i/anselm/software/ansys/ansys-fluent.md +++ b/docs.it4i/anselm/software/ansys/ansys-fluent.md @@ -44,7 +44,7 @@ Working directory has to be created before sending pbs job into the queue. Input Journal file with definition of the input geometry and boundary conditions and defined process of solution has e.g. the following structure: -```bash +```console /file/read-case aircraft_2m.cas.gz /solve/init init @@ -58,7 +58,7 @@ The appropriate dimension of the problem has to be set by parameter (2d/3d). ## Fast Way to Run Fluent From Command Line -```bash +```console fluent solver_version [FLUENT_options] -i journal_file -pbs ``` @@ -68,7 +68,7 @@ This syntax will start the ANSYS FLUENT job under PBS Professional using the qsu The sample script uses a configuration file called pbs_fluent.conf if no command line arguments are present. This configuration file should be present in the directory from which the jobs are submitted (which is also the directory in which the jobs are executed). The following is an example of what the content of pbs_fluent.conf can be: -```bash +```console input="example_small.flin" case="Small-1.65m.cas" fluent_args="3d -pmyrinet" @@ -145,7 +145,7 @@ It runs the jobs out of the directory from which they are submitted (PBS_O_WORKD Fluent could be run in parallel only under Academic Research license. To do so this ANSYS Academic Research license must be placed before ANSYS CFD license in user preferences. To make this change anslic_admin utility should be run -```bash +```console /ansys_inc/shared_les/licensing/lic_admin/anslic_admin ``` diff --git a/docs.it4i/anselm/software/ansys/ansys.md b/docs.it4i/anselm/software/ansys/ansys.md index 16be5639d93fc6d14baaff251a5b09a1d0e31b62..24b8b1c09721168d11a214f00a2ee50a109e6c20 100644 --- a/docs.it4i/anselm/software/ansys/ansys.md +++ b/docs.it4i/anselm/software/ansys/ansys.md @@ -6,8 +6,8 @@ Anselm provides commercial as well as academic variants. Academic variants are d To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL,...) 
load the module: -```bash - $ module load ansys +```console +$ ml ansys ``` ANSYS supports interactive regime, but due to assumed solution of extremely difficult tasks it is not recommended. diff --git a/docs.it4i/anselm/software/chemistry/nwchem.md b/docs.it4i/anselm/software/chemistry/nwchem.md index 9f09fe794a121ddc173d3a037fe0e6e3e7101163..e4f84d49f9b8a38cba53f212d7db1bc6c8c8c7d2 100644 --- a/docs.it4i/anselm/software/chemistry/nwchem.md +++ b/docs.it4i/anselm/software/chemistry/nwchem.md @@ -17,8 +17,8 @@ The following versions are currently installed: For a current list of installed versions, execute: -```bash - module avail nwchem +```console +$ ml av nwchem ``` ## Running diff --git a/docs.it4i/anselm/software/compilers.md b/docs.it4i/anselm/software/compilers.md index d1e59f29fd5c7862e8ad28780c1355ba837f8da1..71e60499b1bb335ddb7a6919e22457aa70b68fa5 100644 --- a/docs.it4i/anselm/software/compilers.md +++ b/docs.it4i/anselm/software/compilers.md @@ -22,20 +22,19 @@ For compatibility reasons there are still available the original (old 4.4.6-4) v It is strongly recommended to use the up to date version (4.8.1) which comes with the module gcc: -```bash - $ module load gcc - $ gcc -v - $ g++ -v - $ gfortran -v +```console +$ ml gcc +$ gcc -v +$ g++ -v +$ gfortran -v ``` With the module loaded two environment variables are predefined. One for maximum optimizations on the Anselm cluster architecture, and the other for debugging purposes: -```bash - $ echo $OPTFLAGS +```console +$ echo $OPTFLAGS -O3 -march=corei7-avx - - $ echo $DEBUGFLAGS +$ echo $DEBUGFLAGS -O0 -g ``` @@ -52,16 +51,16 @@ For more information about the possibilities of the compilers, please see the ma To use the GNU UPC compiler and run the compiled binaries use the module gupc -```bash - $ module add gupc - $ gupc -v - $ g++ -v +```console +$ module add gupc +$ gupc -v +$ g++ -v ``` Simple program to test the compiler -```bash - $ cat count.upc +```console +$ cat count.upc /* hello.upc - a simple UPC example */ #include <upc.h> @@ -79,14 +78,14 @@ Simple program to test the compiler To compile the example use -```bash - $ gupc -o count.upc.x count.upc +```console +$ gupc -o count.upc.x count.upc ``` To run the example with 5 threads issue -```bash - $ ./count.upc.x -fupc-threads-5 +```console +$ ./count.upc.x -fupc-threads-5 ``` For more information see the man pages. @@ -95,9 +94,9 @@ For more information see the man pages. To use the Berkley UPC compiler and runtime environment to run the binaries use the module bupc -```bash - $ module add bupc - $ upcc -version +```console +$ module add bupc +$ upcc -version ``` As default UPC network the "smp" is used. This is very quick and easy way for testing/debugging, but limited to one node only. 
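Before moving on to the InfiniBand runs covered in the next hunk, the default "smp" conduit is handy for a quick single-node smoke test of the `hello.upc` example shown below. A minimal sketch, assuming the installed Berkeley UPC accepts an explicit `-network=smp` flag (the thread count of 4 is only illustrative):

```console
$ module add bupc
$ upcc -network=smp -o hello.upc.x hello.upc   # build against the shared-memory conduit
$ upcrun -n 4 ./hello.upc.x                    # run 4 UPC threads on the current node
```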
@@ -109,8 +108,8 @@ For production runs, it is recommended to use the native Infiband implementation Example UPC code: -```bash - $ cat hello.upc +```console +$ cat hello.upc /* hello.upc - a simple UPC example */ #include <upc.h> @@ -128,22 +127,22 @@ Example UPC code: To compile the example with the "ibv" UPC network use -```bash - $ upcc -network=ibv -o hello.upc.x hello.upc +```console +$ upcc -network=ibv -o hello.upc.x hello.upc ``` To run the example with 5 threads issue -```bash - $ upcrun -n 5 ./hello.upc.x +```console +$ upcrun -n 5 ./hello.upc.x ``` To run the example on two compute nodes using all 32 cores, with 32 threads, issue -```bash - $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 - $ module add bupc - $ upcrun -n 32 ./hello.upc.x +```console +$ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 +$ module add bupc +$ upcrun -n 32 ./hello.upc.x ``` For more information see the man pages. diff --git a/docs.it4i/anselm/software/comsol-multiphysics.md b/docs.it4i/anselm/software/comsol-multiphysics.md index a23622f76c314ffb10dfa3ef98ae637e142270a9..74672428542f3643d754768b7d3c44ed22f22cb6 100644 --- a/docs.it4i/anselm/software/comsol-multiphysics.md +++ b/docs.it4i/anselm/software/comsol-multiphysics.md @@ -23,23 +23,23 @@ On the Anselm cluster COMSOL is available in the latest stable version. There ar To load the of COMSOL load the module -```bash - $ module load comsol +```console +$ ml comsol ``` By default the **EDU variant** will be loaded. If user needs other version or variant, load the particular version. To obtain the list of available versions use -```bash - $ module avail comsol +```console +$ ml av comsol ``` If user needs to prepare COMSOL jobs in the interactive mode it is recommend to use COMSOL on the compute nodes via PBS Pro scheduler. In order run the COMSOL Desktop GUI on Windows is recommended to use the Virtual Network Computing (VNC). -```bash - $ xhost + - $ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ncpus=16 - $ module load comsol - $ comsol +```console +$ xhost + +$ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ncpus=16 +$ ml comsol +$ comsol ``` To run COMSOL in batch mode, without the COMSOL Desktop GUI environment, user can utilized the default (comsol.pbs) job script and execute it via the qsub command. @@ -78,11 +78,11 @@ COMSOL is the software package for the numerical solution of the partial differe LiveLink for MATLAB is available in both **EDU** and **COM** **variant** of the COMSOL release. On Anselm 1 commercial (**COM**) license and the 5 educational (**EDU**) licenses of LiveLink for MATLAB (please see the [ISV Licenses](isv_licenses/)) are available. Following example shows how to start COMSOL model from MATLAB via LiveLink in the interactive mode. -```bash +```console $ xhost + $ qsub -I -X -A PROJECT_ID -q qexp -l select=1:ncpus=16 -$ module load matlab -$ module load comsol +$ ml matlab +$ ml comsol $ comsol server matlab ``` diff --git a/docs.it4i/anselm/software/debuggers/allinea-ddt.md b/docs.it4i/anselm/software/debuggers/allinea-ddt.md index 6c1c664fb22163d3f9eadd023486494870f2a0a9..f85848417002cc5c9f15d54ea437410ca4585f11 100644 --- a/docs.it4i/anselm/software/debuggers/allinea-ddt.md +++ b/docs.it4i/anselm/software/debuggers/allinea-ddt.md @@ -24,20 +24,20 @@ In case of debugging on accelerators: Load all necessary modules to compile the code. For example: -```bash - $ module load intel - $ module load impi ... or ... module load openmpi/X.X.X-icc +```console +$ ml intel +$ ml impi ... or ... 
module load openmpi/X.X.X-icc ``` Load the Allinea DDT module: -```bash - $ module load Forge +```console +$ ml Forge ``` Compile the code: -```bash +```console $ mpicc -g -O0 -o test_debug test.c $ mpif90 -g -O0 -o test_debug test.f @@ -55,22 +55,22 @@ Before debugging, you need to compile your code with theses flags: Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: -```bash - $ ssh -X username@anselm.it4i.cz +```console +$ ssh -X username@anselm.it4i.cz ``` Other options is to access login node using VNC. Please see the detailed information on how to [use graphic user interface on Anselm](/general/accessing-the-clusters/graphical-user-interface/x-window-system/) From the login node an interactive session **with X windows forwarding** (-X option) can be started by following command: -```bash - $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 +```console +$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 ``` Then launch the debugger with the ddt command followed by the name of the executable to debug: -```bash - $ ddt test_debug +```console +$ ddt test_debug ``` A submission window that appears have a prefilled path to the executable to debug. You can select the number of MPI processors and/or OpenMP threads on which to run and press run. Command line arguments to a program can be entered to the "Arguments " box. @@ -79,16 +79,16 @@ A submission window that appears have a prefilled path to the executable to debu To start the debugging directly without the submission window, user can specify the debugging and execution parameters from the command line. For example the number of MPI processes is set by option "-np 4". Skipping the dialog is done by "-start" option. To see the list of the "ddt" command line parameters, run "ddt --help". -```bash - ddt -start -np 4 ./hello_debug_impi +```console +ddt -start -np 4 ./hello_debug_impi ``` ## Documentation Users can find original User Guide after loading the DDT module: -```bash - $DDTPATH/doc/userguide.pdf +```console +$DDTPATH/doc/userguide.pdf ``` [1] Discipline, Magic, Inspiration and Science: Best Practice Debugging with Allinea DDT, Workshop conducted at LLNL by Allinea on May 10, 2013, [link](https://computing.llnl.gov/tutorials/allineaDDT/index.html) diff --git a/docs.it4i/anselm/software/debuggers/allinea-performance-reports.md b/docs.it4i/anselm/software/debuggers/allinea-performance-reports.md index 614e6277ba5fcb8401b9a68668626709aa143ede..a5399a61e7ae133d4c037391a1123b0170a132ec 100644 --- a/docs.it4i/anselm/software/debuggers/allinea-performance-reports.md +++ b/docs.it4i/anselm/software/debuggers/allinea-performance-reports.md @@ -12,8 +12,8 @@ Our license is limited to 64 MPI processes. Allinea Performance Reports version 6.0 is available -```bash - $ module load PerformanceReports/6.0 +```console +$ ml PerformanceReports/6.0 ``` The module sets up environment variables, required for using the Allinea Performance Reports. This particular command loads the default module, which is performance reports version 4.2. @@ -25,8 +25,8 @@ The module sets up environment variables, required for using the Allinea Perform Instead of [running your MPI program the usual way](../mpi/), use the the perf report wrapper: -```bash - $ perf-report mpirun ./mympiprog.x +```console +$ perf-report mpirun ./mympiprog.x ``` The mpi program will run as usual. 
The perf-report creates two additional files, in \*.txt and \*.html format, containing the performance report. Note that [demanding MPI codes should be run within the queue system](../../job-submission-and-execution/). @@ -37,23 +37,23 @@ In this example, we will be profiling the mympiprog.x MPI program, using Allinea First, we allocate some nodes via the express queue: -```bash - $ qsub -q qexp -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 -I +```console +$ qsub -q qexp -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 -I qsub: waiting for job 262197.dm2 to start qsub: job 262197.dm2 ready ``` Then we load the modules and run the program the usual way: -```bash - $ module load intel impi allinea-perf-report/4.2 - $ mpirun ./mympiprog.x +```console +$ ml intel impi allinea-perf-report/4.2 +$ mpirun ./mympiprog.x ``` Now lets profile the code: -```bash - $ perf-report mpirun ./mympiprog.x +```console +$ perf-report mpirun ./mympiprog.x ``` Performance report files [mympiprog_32p\*.txt](../../../src/mympiprog_32p_2014-10-15_16-56.txt) and [mympiprog_32p\*.html](../../../src/mympiprog_32p_2014-10-15_16-56.html) were created. We can see that the code is very efficient on MPI and is CPU bounded. diff --git a/docs.it4i/anselm/software/debuggers/debuggers.md b/docs.it4i/anselm/software/debuggers/debuggers.md index dd2bc60d833d9fa269c1df98d895fb969a601cd7..3d38fd6a59565a1814df261d6cc2383f9bef7c59 100644 --- a/docs.it4i/anselm/software/debuggers/debuggers.md +++ b/docs.it4i/anselm/software/debuggers/debuggers.md @@ -8,9 +8,9 @@ We provide state of the art programms and tools to develop, profile and debug HP The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. Use X display for running the GUI. -```bash - $ module load intel - $ idb +```console +$ ml intel +$ idb ``` Read more at the [Intel Debugger](intel-suite/intel-debugger/) page. @@ -19,9 +19,9 @@ Read more at the [Intel Debugger](intel-suite/intel-debugger/) page. Allinea DDT, is a commercial debugger primarily for debugging parallel MPI or OpenMP programs. It also has a support for GPU (CUDA) and Intel Xeon Phi accelerators. DDT provides all the standard debugging features (stack trace, breakpoints, watches, view variables, threads etc.) for every thread running as part of your program, or for every process even if these processes are distributed across a cluster using an MPI implementation. -```bash - $ module load Forge - $ forge +```console +$ ml Forge +$ forge ``` Read more at the [Allinea DDT](debuggers/allinea-ddt/) page. @@ -30,9 +30,9 @@ Read more at the [Allinea DDT](debuggers/allinea-ddt/) page. Allinea Performance Reports characterize the performance of HPC application runs. After executing your application through the tool, a synthetic HTML report is generated automatically, containing information about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes. -```bash - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 +```console +$ ml PerformanceReports/6.0 +$ perf-report mpirun -n 64 ./my_application argument01 argument02 ``` Read more at the [Allinea Performance Reports](debuggers/allinea-performance-reports/) page. 
@@ -41,9 +41,9 @@ Read more at the [Allinea Performance Reports](debuggers/allinea-performance-rep TotalView is a source- and machine-level debugger for multi-process, multi-threaded programs. Its wide range of tools provides ways to analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity. -```bash - $ module load totalview - $ totalview +```console +$ ml totalview +$ totalview ``` Read more at the [Totalview](debuggers/total-view/) page. @@ -52,9 +52,9 @@ Read more at the [Totalview](debuggers/total-view/) page. Vampir is a GUI trace analyzer for traces in OTF format. -```bash - $ module load Vampir/8.5.0 - $ vampir +```console +$ ml Vampir/8.5.0 +$ vampir ``` Read more at the [Vampir](vampir/) page. diff --git a/docs.it4i/anselm/software/debuggers/intel-performance-counter-monitor.md b/docs.it4i/anselm/software/debuggers/intel-performance-counter-monitor.md index f9e8e88dcaf2186ea59519f7a7b31305fd1287d6..b46b472b68577a3f0764199439de310a967a4bde 100644 --- a/docs.it4i/anselm/software/debuggers/intel-performance-counter-monitor.md +++ b/docs.it4i/anselm/software/debuggers/intel-performance-counter-monitor.md @@ -8,8 +8,8 @@ Intel PCM (Performance Counter Monitor) is a tool to monitor performance hardwar Currently installed version 2.6. To load the [module](../../environment-and-modules/), issue: -```bash - $ module load intelpcm +```console +$ ml intelpcm ``` ## Command Line Tools @@ -20,15 +20,15 @@ PCM provides a set of tools to monitor system/or application. Measures memory bandwidth of your application or the whole system. Usage: -```bash - $ pcm-memory.x <delay>|[external_program parameters] +```console +$ pcm-memory.x <delay>|[external_program parameters] ``` Specify either a delay of updates in seconds or an external program to monitor. If you get an error about PMU in use, respond "y" and relaunch the program. Sample output: -```bash +```console ---------------------------------------||--------------------------------------- -- Socket 0 --||-- Socket 1 -- ---------------------------------------||--------------------------------------- @@ -77,7 +77,7 @@ This command provides an overview of performance counters and memory usage. Usag Sample output : -```bash +```console $ pcm.x ./matrix Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100 ID=db05e43) @@ -246,14 +246,14 @@ Sample program using the API : Compile it with : -```bash - $ icc matrix.cpp -o matrix -lpthread -lpcm +```console +$ icc matrix.cpp -o matrix -lpthread -lpcm ``` Sample output: -```bash - $ ./matrix +```console +$ ./matrix Number of physical cores: 16 Number of logical cores: 16 Threads (logical cores) per physical core: 1 diff --git a/docs.it4i/anselm/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/anselm/software/debuggers/intel-vtune-amplifier.md index e9921046dd13f4b3b3b345f2666b426f2bd5ca9c..1d90aacfee0141246d4fbe41912ca8e3040b30db 100644 --- a/docs.it4i/anselm/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/anselm/software/debuggers/intel-vtune-amplifier.md @@ -16,14 +16,14 @@ Intel VTune Amplifier, part of Intel Parallel studio, is a GUI profiling tool de To launch the GUI, first load the module: -```bash - $ module add VTune/2016_update1 +```console +$ module add VTune/2016_update1 ``` and launch the GUI : -```bash - $ amplxe-gui +```console +$ amplxe-gui ``` !!! note @@ -39,8 +39,8 @@ VTune Amplifier also allows a form of remote analysis. 
In this mode, data for an The command line will look like this: -```bash - /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-and-callcount -mrte-mode=native -target-duration-type=veryshort -app-working-dir /home/sta545/test -- /home/sta545/test_pgsesv +```console +$ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-and-callcount -mrte-mode=native -target-duration-type=veryshort -app-working-dir /home/sta545/test -- /home/sta545/test_pgsesv ``` Copy the line to clipboard and then you can paste it in your jobscript or in command line. After the collection is run, open the GUI once again, click the menu button in the upper right corner, and select "_Open > Result..._". The GUI will load the results from the run. @@ -63,8 +63,8 @@ Note that we include source ~/.profile in the command to setup environment paths You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later : -```bash - $ amplxe-cl -collect knc-hotspots -no-auto-finalize -- ssh mic0 +```console +$ amplxe-cl -collect knc-hotspots -no-auto-finalize -- ssh mic0 "export LD_LIBRARY_PATH=/apps/intel/composer_xe_2015.2.164/compiler/lib/mic/:/apps/intel/composer_xe_2015.2.164/mkl/lib/mic/; export KMP_AFFINITY=compact; /tmp/app.mic" ``` diff --git a/docs.it4i/anselm/software/debuggers/papi.md b/docs.it4i/anselm/software/debuggers/papi.md index bc36923e83e2d464b40e41b3b43ce4316289c3f4..d03dd8354769895e3b7f8454f5a0dd613a626bc3 100644 --- a/docs.it4i/anselm/software/debuggers/papi.md +++ b/docs.it4i/anselm/software/debuggers/papi.md @@ -12,8 +12,8 @@ PAPI can be used with parallel as well as serial programs. To use PAPI, load [module](../../environment-and-modules/) papi: -```bash - $ module load papi +```console +$ ml papi ``` This will load the default version. Execute module avail papi for a list of installed versions. @@ -26,8 +26,8 @@ The bin directory of PAPI (which is automatically added to $PATH upon loading t Prints which preset events are available on the current CPU. The third column indicated whether the preset event is available on the current CPU. -```bash - $ papi_avail +```console +$ papi_avail Available events and hardware information. -------------------------------------------------------------------------------- PAPI Version : 5.3.2.0 @@ -108,7 +108,7 @@ PAPI can be used to query some system infromation, such as CPU name and MHz. [Se The following example prints MFLOPS rate of a naive matrix-matrix multiplication: -```bash +```cpp #include <stdlib.h> #include <stdio.h> #include "papi.h" @@ -149,9 +149,9 @@ The following example prints MFLOPS rate of a naive matrix-matrix multiplication Now compile and run the example : -```bash - $ gcc matrix.c -o matrix -lpapi - $ ./matrix +```console +$ gcc matrix.c -o matrix -lpapi +$ ./matrix Real_time: 8.852785 Proc_time: 8.850000 Total flpins: 6012390908 @@ -160,9 +160,9 @@ Now compile and run the example : Let's try with optimizations enabled : -```bash - $ gcc -O3 matrix.c -o matrix -lpapi - $ ./matrix +```console +$ gcc -O3 matrix.c -o matrix -lpapi +$ ./matrix Real_time: 0.000020 Proc_time: 0.000000 Total flpins: 6 @@ -179,9 +179,9 @@ Now we see a seemingly strange result - the multiplication took no time and only Now the compiler won't remove the multiplication loop. (However it is still not that smart to see that the result won't ever be negative). 
Now run the code again: -```bash - $ gcc -O3 matrix.c -o matrix -lpapi - $ ./matrix +```console +$ gcc -O3 matrix.c -o matrix -lpapi +$ ./matrix Real_time: 8.795956 Proc_time: 8.790000 Total flpins: 18700983160 @@ -195,39 +195,39 @@ Now the compiler won't remove the multiplication loop. (However it is still not To use PAPI in [Intel Xeon Phi](../intel-xeon-phi/) native applications, you need to load module with " -mic" suffix, for example " papi/5.3.2-mic" : -```bash - $ module load papi/5.3.2-mic +```console +$ ml papi/5.3.2-mic ``` Then, compile your application in the following way: -```bash - $ module load intel - $ icc -mmic -Wl,-rpath,/apps/intel/composer_xe_2013.5.192/compiler/lib/mic matrix-mic.c -o matrix-mic -lpapi -lpfm +```console +$ ml intel +$ icc -mmic -Wl,-rpath,/apps/intel/composer_xe_2013.5.192/compiler/lib/mic matrix-mic.c -o matrix-mic -lpapi -lpfm ``` To execute the application on MIC, you need to manually set LD_LIBRARY_PATH: -```bash - $ qsub -q qmic -A NONE-0-0 -I - $ ssh mic0 - $ export LD_LIBRARY_PATH=/apps/tools/papi/5.4.0-mic/lib/ - $ ./matrix-mic +```console +$ qsub -q qmic -A NONE-0-0 -I +$ ssh mic0 +$ export LD_LIBRARY_PATH="/apps/tools/papi/5.4.0-mic/lib/" +$ ./matrix-mic ``` Alternatively, you can link PAPI statically (-static flag), then LD_LIBRARY_PATH does not need to be set. You can also execute the PAPI tools on MIC : -```bash - $ /apps/tools/papi/5.4.0-mic/bin/papi_native_avail +```console +$ /apps/tools/papi/5.4.0-mic/bin/papi_native_avail ``` To use PAPI in offload mode, you need to provide both host and MIC versions of PAPI: -```bash - $ module load papi/5.4.0 - $ icc matrix-offload.c -o matrix-offload -offload-option,mic,compiler,"-L$PAPI_HOME-mic/lib -lpapi" -lpapi +```console +$ ml papi/5.4.0 +$ icc matrix-offload.c -o matrix-offload -offload-option,mic,compiler,"-L$PAPI_HOME-mic/lib -lpapi" -lpapi ``` ## References diff --git a/docs.it4i/anselm/software/debuggers/scalasca.md b/docs.it4i/anselm/software/debuggers/scalasca.md index 19daec04e24247f40721c8ef61632d17290daa80..a7cd44b1d5236eb3e257a24f5a3cfbdb96e6b0f5 100644 --- a/docs.it4i/anselm/software/debuggers/scalasca.md +++ b/docs.it4i/anselm/software/debuggers/scalasca.md @@ -33,8 +33,8 @@ After the application is instrumented, runtime measurement can be performed with An example : -```bash - $ scalasca -analyze mpirun -np 4 ./mympiprogram +```console + $ scalasca -analyze mpirun -np 4 ./mympiprogram ``` Some notable Scalasca options are: @@ -51,13 +51,13 @@ For the analysis, you must have [Score-P](score-p/) and [CUBE](cube/) modules lo To launch the analysis, run : -```bash +```console scalasca -examine [options] <experiment_directory> ``` If you do not wish to launch the GUI tool, use the "-s" option : -```bash +```console scalasca -examine -s <experiment_directory> ``` diff --git a/docs.it4i/anselm/software/debuggers/score-p.md b/docs.it4i/anselm/software/debuggers/score-p.md index 929d971faa2a8b465754c5563b09fa32f554eef2..3295933c45e6c7f8b7275a5bede4cef5064bd49f 100644 --- a/docs.it4i/anselm/software/debuggers/score-p.md +++ b/docs.it4i/anselm/software/debuggers/score-p.md @@ -25,7 +25,7 @@ There are three ways to instrument your parallel applications in order to enable is the easiest method. Score-P will automatically add instrumentation to every routine entry and exit using compiler hooks, and will intercept MPI calls and OpenMP regions. This method might, however, produce a large number of data. 
If you want to focus on profiler a specific regions of your code, consider using the manual instrumentation methods. To use automated instrumentation, simply prepend scorep to your compilation command. For example, replace: -```bash +```console $ mpif90 -c foo.f90 $ mpif90 -c bar.f90 $ mpif90 -o myapp foo.o bar.o @@ -33,7 +33,7 @@ $ mpif90 -o myapp foo.o bar.o with: -```bash +```console $ scorep mpif90 -c foo.f90 $ scorep mpif90 -c bar.f90 $ scorep mpif90 -o myapp foo.o bar.o diff --git a/docs.it4i/anselm/software/debuggers/total-view.md b/docs.it4i/anselm/software/debuggers/total-view.md index b4f710675111efe35ea5779625ac53046bc2722b..de618ace58562f36720e41a5dbb603c9b2478c06 100644 --- a/docs.it4i/anselm/software/debuggers/total-view.md +++ b/docs.it4i/anselm/software/debuggers/total-view.md @@ -6,7 +6,7 @@ TotalView is a GUI-based source code multi-process, multi-thread debugger. On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel processes. These limitation means that: -```bash +```console 1 user can debug up 64 processes, or 32 users can debug 2 processes, etc. ``` @@ -15,8 +15,8 @@ Debugging of GPU accelerated codes is also supported. You can check the status of the licenses here: -```bash - cat /apps/user/licenses/totalview_features_state.txt +```console +$ cat /apps/user/licenses/totalview_features_state.txt # totalview # ------------------------------------------------- @@ -33,24 +33,21 @@ You can check the status of the licenses here: Load all necessary modules to compile the code. For example: -```bash - module load intel - - module load impi ... or ... module load openmpi/X.X.X-icc +```console +$ ml intel **or** ml foss ``` Load the TotalView module: -```bash - module load totalview/8.12 +```console +$ ml totalview/8.12 ``` Compile the code: -```bash - mpicc -g -O0 -o test_debug test.c - - mpif90 -g -O0 -o test_debug test.f +```console +$ mpicc -g -O0 -o test_debug test.c +$ mpif90 -g -O0 -o test_debug test.f ``` ### Compiler Flags @@ -65,16 +62,16 @@ Before debugging, you need to compile your code with theses flags: Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: -```bash - ssh -X username@anselm.it4i.cz +```console +local $ ssh -X username@anselm.it4i.cz ``` Other options is to access login node using VNC. Please see the detailed information on how to use graphic user interface on Anselm. From the login node an interactive session with X windows forwarding (-X option) can be started by following command: -```bash - qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 +```console +$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 ``` Then launch the debugger with the totalview command followed by the name of the executable to debug. @@ -83,8 +80,8 @@ Then launch the debugger with the totalview command followed by the name of the To debug a serial code use: -```bash - totalview test_debug +```console +$ totalview test_debug ``` ### Debugging a Parallel Code - Option 1 @@ -94,7 +91,7 @@ To debug a parallel code compiled with **OpenMPI** you need to setup your TotalV !!! 
hint To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your `~/.tvdrc` file: -```bash +```console proc mpi_auto_run_starter {loaded_id} { set starter_programs {mpirun mpiexec orterun} set executable_name [TV::symbol get $loaded_id full_pathname] @@ -116,8 +113,8 @@ To debug a parallel code compiled with **OpenMPI** you need to setup your TotalV The source code of this function can be also found in -```bash - /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl +```console +$ /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl ``` !!! note @@ -128,8 +125,8 @@ You need to do this step only once. Now you can run the parallel debugger using: -```bash - mpirun -tv -n 5 ./test_debug +```console +$ mpirun -tv -n 5 ./test_debug ``` When following dialog appears click on "Yes" @@ -146,10 +143,10 @@ Other option to start new parallel debugging session from a command line is to l The following example shows how to start debugging session with Intel MPI: -```bash - module load intel/13.5.192 impi/4.1.1.036 totalview/8/13 - - totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi +```console +$ ml intel +$ ml totalview +$ totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi ``` After running previous command you will see the same window as shown in the screenshot above. diff --git a/docs.it4i/anselm/software/debuggers/valgrind.md b/docs.it4i/anselm/software/debuggers/valgrind.md index 2602fdbf24c9bdf16503740541ed81c536628b5a..0e381e945c86c1a53af181b8cb62194171535bee 100644 --- a/docs.it4i/anselm/software/debuggers/valgrind.md +++ b/docs.it4i/anselm/software/debuggers/valgrind.md @@ -48,9 +48,9 @@ For example, lets look at this C code, which has two problems : Now, compile it with Intel compiler : -```bash - $ module add intel - $ icc -g valgrind-example.c -o valgrind-example +```console +$ module add intel +$ icc -g valgrind-example.c -o valgrind-example ``` Now, lets run it with Valgrind. The syntax is : @@ -59,8 +59,8 @@ Now, lets run it with Valgrind. The syntax is : If no Valgrind options are specified, Valgrind defaults to running Memcheck tool. Please refer to the Valgrind documentation for a full description of command line options. -```bash - $ valgrind ./valgrind-example +```console +$ valgrind ./valgrind-example ==12652== Memcheck, a memory error detector ==12652== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==12652== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info @@ -93,8 +93,8 @@ If no Valgrind options are specified, Valgrind defaults to running Memcheck tool In the output we can see that Valgrind has detected both errors - the off-by-one memory access at line 5 and a memory leak of 40 bytes. If we want a detailed analysis of the memory leak, we need to run Valgrind with --leak-check=full option : -```bash - $ valgrind --leak-check=full ./valgrind-example +```console +$ valgrind --leak-check=full ./valgrind-example ==23856== Memcheck, a memory error detector ==23856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==23856== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info @@ -135,13 +135,13 @@ Now we can see that the memory leak is due to the malloc() at line 6. Although Valgrind is not primarily a parallel debugger, it can be used to debug parallel applications as well. When launching your parallel applications, prepend the valgrind command. 
For example : -```bash - $ mpirun -np 4 valgrind myapplication +```console +$ mpirun -np 4 valgrind myapplication ``` The default version without MPI support will however report a large number of false errors in the MPI library, such as : -```bash +```console ==30166== Conditional jump or move depends on uninitialised value(s) ==30166== at 0x4C287E8: strlen (mc_replace_strmem.c:282) ==30166== by 0x55443BD: I_MPI_Processor_model_number (init_interface.c:427) @@ -178,16 +178,16 @@ Lets look at this MPI example : There are two errors - use of uninitialized memory and invalid length of the buffer. Lets debug it with valgrind : -```bash - $ module add intel impi - $ mpicc -g valgrind-example-mpi.c -o valgrind-example-mpi - $ module add valgrind/3.9.0-impi - $ mpirun -np 2 -env LD_PRELOAD /apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi +```console +$ module add intel impi +$ mpicc -g valgrind-example-mpi.c -o valgrind-example-mpi +$ module add valgrind/3.9.0-impi +$ mpirun -np 2 -env LD_PRELOAD /apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi ``` Prints this output : (note that there is output printed for every launched MPI process) -```bash +```console ==31318== Memcheck, a memory error detector ==31318== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==31318== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info diff --git a/docs.it4i/anselm/software/debuggers/vampir.md b/docs.it4i/anselm/software/debuggers/vampir.md index 1c3009c8a4fe820473b812ec0067a83e3d1922d7..1dfa23e7b8eed6c9deaf04439df6b01ed6358480 100644 --- a/docs.it4i/anselm/software/debuggers/vampir.md +++ b/docs.it4i/anselm/software/debuggers/vampir.md @@ -8,9 +8,9 @@ Vampir is a commercial trace analysis and visualization tool. It can work with t Version 8.5.0 is currently installed as module Vampir/8.5.0 : -```bash - $ module load Vampir/8.5.0 - $ vampir & +```console +$ ml Vampir/8.5.0 +$ vampir & ``` ## User Manual diff --git a/docs.it4i/anselm/software/gpi2.md b/docs.it4i/anselm/software/gpi2.md index ec96e2653a3bfeb9614be13b969ff3273b3ee255..09241e15a96f7412f2e7652efda091d7868cd5d1 100644 --- a/docs.it4i/anselm/software/gpi2.md +++ b/docs.it4i/anselm/software/gpi2.md @@ -10,8 +10,8 @@ The GPI-2 library ([www.gpi-site.com/gpi2/](http://www.gpi-site.com/gpi2/)) impl The GPI-2, version 1.0.2 is available on Anselm via module gpi2: -```bash - $ module load gpi2 +```console +$ ml gpi2 ``` The module sets up environment variables, required for linking and running GPI-2 enabled applications. This particular command loads the default module, which is gpi2/1.0.2 @@ -25,18 +25,18 @@ Load the gpi2 module. Link using **-lGPI2** and **-libverbs** switches to link y ### Compiling and Linking With Intel Compilers -```bash - $ module load intel - $ module load gpi2 - $ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs +```console +$ ml intel +$ ml gpi2 +$ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs ``` ### Compiling and Linking With GNU Compilers -```bash - $ module load gcc - $ module load gpi2 - $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs +```console +$ ml gcc +$ ml gpi2 +$ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs ``` ## Running the GPI-2 Codes @@ -46,19 +46,19 @@ Load the gpi2 module. 
Link using **-lGPI2** and **-libverbs** switches to link y The gaspi_run utility is used to start and run GPI-2 applications: -```bash - $ gaspi_run -m machinefile ./myprog.x +```console +$ gaspi_run -m machinefile ./myprog.x ``` A machine file (** machinefile **) with the hostnames of nodes where the application will run, must be provided. The machinefile lists all nodes on which to run, one entry per node per process. This file may be hand created or obtained from standard $PBS_NODEFILE: -```bash - $ cut -f1 -d"." $PBS_NODEFILE > machinefile +```console +$ cut -f1 -d"." $PBS_NODEFILE > machinefile ``` machinefile: -```bash +```console cn79 cn80 ``` @@ -67,7 +67,7 @@ This machinefile will run 2 GPI-2 processes, one on node cn79 other on node cn80 machinefle: -```bash +```console cn79 cn79 cn80 @@ -81,8 +81,8 @@ This machinefile will run 4 GPI-2 processes, 2 on node cn79 o 2 on node cn80. Example: -```bash - $ qsub -A OPEN-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=16 -I +```console +$ qsub -A OPEN-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=16 -I ``` This example will produce $PBS_NODEFILE with 16 entries per node. @@ -137,29 +137,28 @@ Following is an example GPI-2 enabled code: Load modules and compile: -```bash - $ module load gcc gpi2 - $ gcc helloworld_gpi.c -o helloworld_gpi.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs +```console +$ ml gcc gpi2 +$ gcc helloworld_gpi.c -o helloworld_gpi.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs ``` Submit the job and run the GPI-2 application -```bash - $ qsub -q qexp -l select=2:ncpus=1:mpiprocs=1,place=scatter,walltime=00:05:00 -I +```console +$ qsub -q qexp -l select=2:ncpus=1:mpiprocs=1,place=scatter,walltime=00:05:00 -I qsub: waiting for job 171247.dm2 to start qsub: job 171247.dm2 ready - - cn79 $ module load gpi2 - cn79 $ cut -f1 -d"." $PBS_NODEFILE > machinefile - cn79 $ gaspi_run -m machinefile ./helloworld_gpi.x +cn79 $ ml gpi2 +cn79 $ cut -f1 -d"." $PBS_NODEFILE > machinefile +cn79 $ gaspi_run -m machinefile ./helloworld_gpi.x Hello from rank 0 of 2 ``` At the same time, in another session, you may start the gaspi logger: -```bash - $ ssh cn79 - cn79 $ gaspi_logger +```console +$ ssh cn79 +cn79 $ gaspi_logger GASPI Logger (v1.1) [cn80:0] Hello from rank 1 of 2 ``` diff --git a/docs.it4i/anselm/software/intel-suite/intel-compilers.md b/docs.it4i/anselm/software/intel-suite/intel-compilers.md index 66de3b77a06d7333464336ada10d68cd3a899aa8..d446655d915833a139353d5c76015f70db9a9645 100644 --- a/docs.it4i/anselm/software/intel-suite/intel-compilers.md +++ b/docs.it4i/anselm/software/intel-suite/intel-compilers.md @@ -2,28 +2,28 @@ The Intel compilers version 13.1.1 are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. -```bash - $ module load intel - $ icc -v - $ ifort -v +```console +$ ml intel +$ icc -v +$ ifort -v ``` The intel compilers provide for vectorization of the code, via the AVX instructions and support threading parallelization via OpenMP For maximum performance on the Anselm cluster, compile your programs using the AVX instructions, with reporting where the vectorization was used. 
We recommend following compilation options for high performance -```bash - $ icc -ipo -O3 -vec -xAVX -vec-report1 myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -vec -xAVX -vec-report1 myprog.f mysubroutines.f -o myprog.x +```console +$ icc -ipo -O3 -vec -xAVX -vec-report1 myprog.c mysubroutines.c -o myprog.x +$ ifort -ipo -O3 -vec -xAVX -vec-report1 myprog.f mysubroutines.f -o myprog.x ``` In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggressive loop optimizations (-O3) and vectorization (-vec -xAVX) The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX vectorization. Enable the OpenMP parallelization by the **-openmp** compiler switch. -```bash - $ icc -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.f mysubroutines.f -o myprog.x +```console +$ icc -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.c mysubroutines.c -o myprog.x +$ ifort -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.f mysubroutines.f -o myprog.x ``` Read more at <http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm> diff --git a/docs.it4i/anselm/software/intel-suite/intel-debugger.md b/docs.it4i/anselm/software/intel-suite/intel-debugger.md index f13086df7431676a95a75b5258a10667a3464c57..d3a5807fca1a0051c4424a5613f3faa57c26895a 100644 --- a/docs.it4i/anselm/software/intel-suite/intel-debugger.md +++ b/docs.it4i/anselm/software/intel-suite/intel-debugger.md @@ -4,30 +4,30 @@ The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. Use X display for running the GUI. -```bash - $ module load intel - $ idb +```console +$ ml intel +$ idb ``` The debugger may run in text mode. To debug in text mode, use -```bash - $ idbc +```console +$ idbc ``` To debug on the compute nodes, module intel must be loaded. The GUI on compute nodes may be accessed using the same way as in the GUI section Example: -```bash - $ qsub -q qexp -l select=1:ncpus=16 -X -I +```console +$ qsub -q qexp -l select=1:ncpus=16 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19654.srv11 ready - $ module load intel - $ module load java - $ icc -O0 -g myprog.c -o myprog.x - $ idb ./myprog.x +$ ml intel +$ ml java +$ icc -O0 -g myprog.c -o myprog.x +$ idb ./myprog.x ``` In this example, we allocate 1 full compute node, compile program myprog.c with debugging options -O0 -g and run the idb debugger interactively on the myprog.x executable. The GUI access is via X11 port forwarding provided by the PBS workload manager. @@ -40,13 +40,13 @@ Intel debugger is capable of debugging multithreaded and MPI parallel programs a For debugging small number of MPI ranks, you may execute and debug each rank in separate xterm terminal (do not forget the X display.
Using Intel MPI, this may be done in following way: -```bash - $ qsub -q qexp -l select=2:ncpus=16 -X -I +```console +$ qsub -q qexp -l select=2:ncpus=16 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19655.srv11 ready - $ module load intel impi - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x +$ ml intel +$ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x ``` In this example, we allocate 2 full compute node, run xterm on each node and start idb debugger in command line mode, debugging two ranks of mympiprog.x application. The xterm will pop up for each rank, with idb prompt ready. The example is not limited to use of Intel MPI @@ -55,13 +55,13 @@ In this example, we allocate 2 full compute node, run xterm on each node and sta Run the idb debugger from within the MPI debug option. This will cause the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set break points and step the execution manually. Using Intel MPI: -```bash +```console $ qsub -q qexp -l select=2:ncpus=16 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19655.srv11 ready - $ module load intel impi - $ mpirun -n 32 -idb ./mympiprog.x +$ ml intel +$ mpirun -n 32 -idb ./mympiprog.x ``` ### Debugging Multithreaded Application diff --git a/docs.it4i/anselm/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/anselm/software/intel-suite/intel-integrated-performance-primitives.md index b92f8d05f62d9305f9624e592d388cf2744b5081..8e0451c69a082275e114c92acd223e3514317389 100644 --- a/docs.it4i/anselm/software/intel-suite/intel-integrated-performance-primitives.md +++ b/docs.it4i/anselm/software/intel-suite/intel-integrated-performance-primitives.md @@ -7,8 +7,8 @@ Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX vector !!! note Check out IPP before implementing own math functions for data processing, it is likely already there. -```bash - $ module load ipp +```console +$ ml ipp ``` The module sets up environment variables, required for linking and running ipp enabled applications. @@ -58,20 +58,20 @@ The module sets up environment variables, required for linking and running ipp e Compile above example, using any compiler and the ipp module. -```bash - $ module load intel - $ module load ipp +```console +$ ml intel +$ ml ipp - $ icc testipp.c -o testipp.x -lippi -lipps -lippcore +$ icc testipp.c -o testipp.x -lippi -lipps -lippcore ``` You will need the ipp module loaded to run the ipp enabled executable. This may be avoided, by compiling library search paths into the executable -```bash - $ module load intel - $ module load ipp +```console +$ ml intel +$ ml ipp - $ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore +$ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore ``` ## Code Samples and Documentation diff --git a/docs.it4i/anselm/software/intel-suite/intel-mkl.md b/docs.it4i/anselm/software/intel-suite/intel-mkl.md index aed92ae69da6f721f676fa5e4180945711fe5fba..6594f8193b800fa1fb269b8611456c6311adafcf 100644 --- a/docs.it4i/anselm/software/intel-suite/intel-mkl.md +++ b/docs.it4i/anselm/software/intel-suite/intel-mkl.md @@ -15,10 +15,10 @@ Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, e For details see the [Intel MKL Reference Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm). 
-Intel MKL version 13.5.192 is available on Anselm +Intel MKL is available on Anselm -```bash - $ module load mkl +```console +$ ml imkl ``` The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES @@ -41,8 +41,8 @@ Linking MKL libraries may be complex. Intel [mkl link line advisor](http://softw You will need the mkl module loaded to run the mkl enabled executable. This may be avoided, by compiling library search paths into the executable. Include rpath on the compile line: -```bash - $ icc .... -Wl,-rpath=$LIBRARY_PATH ... +```console +$ icc .... -Wl,-rpath=$LIBRARY_PATH ... ``` ### Threading @@ -52,9 +52,9 @@ You will need the mkl module loaded to run the mkl enabled executable. This may For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS -```bash - $ export OMP_NUM_THREADS=16 - $ export KMP_AFFINITY=granularity=fine,compact,1,0 +```console +$ export OMP_NUM_THREADS=16 +$ export KMP_AFFINITY=granularity=fine,compact,1,0 ``` The application will run with 16 threads with affinity optimized for fine grain parallelization. @@ -65,50 +65,42 @@ Number of examples, demonstrating use of the MKL library and its linking is avai ### Working With Examples -```bash - $ module load intel - $ module load mkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - - $ make sointel64 function=cblas_dgemm +```console +$ ml intel +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ make sointel64 function=cblas_dgemm ``` In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL example suite installed on Anselm. ### Example: MKL and Intel Compiler -```bash - $ module load intel - $ module load mkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - $ - $ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x - $ ./cblas_dgemmx.x data/cblas_dgemmx.d +```console +$ ml intel +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x +$ ./cblas_dgemmx.x data/cblas_dgemmx.d ``` In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL with icc -mkl option. Using the -mkl option is equivalent to: -```bash - $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 +```console +$ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 ``` In this example, we compile and link the cblas_dgemm example, using LP64 interface to threaded MKL and Intel OMP threads implementation. 
### Example: MKL and GNU Compiler -```bash - $ module load gcc - $ module load mkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - - $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm - - $ ./cblas_dgemmx.x data/cblas_dgemmx.d +```console +$ ml gcc +$ ml imkl +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm +$ ./cblas_dgemmx.x data/cblas_dgemmx.d ``` In this example, we compile, link and run the cblas_dgemm example, using LP64 interface to threaded MKL and gnu OMP threads implementation. diff --git a/docs.it4i/anselm/software/intel-suite/intel-tbb.md b/docs.it4i/anselm/software/intel-suite/intel-tbb.md index 3c2495ba8c0592df6556ab7c41c078dd3cedf5af..497b26f5e46a62604b7eb542bd0579b2c7fbd358 100644 --- a/docs.it4i/anselm/software/intel-suite/intel-tbb.md +++ b/docs.it4i/anselm/software/intel-suite/intel-tbb.md @@ -7,8 +7,8 @@ be offloaded to [MIC accelerator](../intel-xeon-phi/). Intel TBB version 4.1 is available on Anselm -```bash - $ module load tbb +```console +$ ml tbb ``` The module sets up environment variables, required for linking and running tbb enabled applications. @@ -20,21 +20,21 @@ The module sets up environment variables, required for linking and running tbb e Number of examples, demonstrating use of TBB and its built-in scheduler is available on Anselm, in the $TBB_EXAMPLES directory. -```bash - $ module load intel - $ module load tbb - $ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ - $ cd /tmp/parallel_reduce/primes - $ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb - $ ./primes.x +```console +$ ml intel +$ ml tbb +$ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ +$ cd /tmp/parallel_reduce/primes +$ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb +$ ./primes.x ``` In this example, we compile, link and run the primes example, demonstrating use of parallel task-based reduce in computation of prime numbers. You will need the tbb module loaded to run the tbb enabled executable. This may be avoided, by compiling library search paths into the executable. -```bash - $ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb +```console +$ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb ``` ## Further Reading diff --git a/docs.it4i/anselm/software/intel-suite/introduction.md b/docs.it4i/anselm/software/intel-suite/introduction.md index f9f6f4093a1ed659c7cd4ed63bea944b4dd40ffe..879389f3f119e873d375b585da4e56f0dcfa5a79 100644 --- a/docs.it4i/anselm/software/intel-suite/introduction.md +++ b/docs.it4i/anselm/software/intel-suite/introduction.md @@ -12,10 +12,10 @@ The Anselm cluster provides following elements of the Intel Parallel Studio XE The Intel compilers version 13.1.3 are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. -```bash - $ module load intel - $ icc -v - $ ifort -v +```console +$ ml intel +$ icc -v +$ ifort -v ``` Read more at the [Intel Compilers](intel-compilers/) page. @@ -24,9 +24,9 @@ Read more at the [Intel Compilers](intel-compilers/) page. The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. Use X display for running the GUI. 
-```bash - $ module load intel - $ idb +```console +$ ml intel +$ idb ``` Read more at the [Intel Debugger](intel-debugger/) page. @@ -35,8 +35,8 @@ Read more at the [Intel Debugger](intel-debugger/) page. Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. -```bash - $ module load mkl +```console +$ ml imkl ``` Read more at the [Intel MKL](intel-mkl/) page. @@ -45,8 +45,8 @@ Read more at the [Intel MKL](intel-mkl/) page. Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX is available, via module ipp. The IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. -```bash - $ module load ipp +```console +$ ml ipp ``` Read more at the [Intel IPP](intel-integrated-performance-primitives/) page. @@ -55,8 +55,8 @@ Read more at the [Intel IPP](intel-integrated-performance-primitives/) page. Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. -```bash - $ module load tbb +```console +$ ml tbb ``` Read more at the [Intel TBB](intel-tbb/) page. diff --git a/docs.it4i/anselm/software/intel-xeon-phi.md b/docs.it4i/anselm/software/intel-xeon-phi.md index 31177009ef077be52ab41e93f3279fd79d651363..f22027b8151eae3f3155a2e7725bbbacebb09fb3 100644 --- a/docs.it4i/anselm/software/intel-xeon-phi.md +++ b/docs.it4i/anselm/software/intel-xeon-phi.md @@ -8,25 +8,25 @@ Intel Xeon Phi can be programmed in several modes. The default mode on Anselm is To get access to a compute node with Intel Xeon Phi accelerator, use the PBS interactive session -```bash +```console $ qsub -I -q qmic -A NONE-0-0 ``` To set up the environment module "Intel" has to be loaded -```bash -$ module load intel/13.5.192 +```console +$ ml intel ``` Information about the hardware can be obtained by running the micinfo program on the host. -```bash +```console $ /usr/bin/micinfo ``` The output of the "micinfo" utility executed on one of the Anselm node is as follows. (note: to get PCIe related details the command has to be run with root privileges) -```bash +```console MicInfo Utility Log Created Mon Jul 22 00:23:50 2013 @@ -92,14 +92,14 @@ The output of the "micinfo" utility executed on one of the Anselm node is as fol To compile a code for Intel Xeon Phi a MPSS stack has to be installed on the machine where compilation is executed. Currently the MPSS stack is only installed on compute nodes equipped with accelerators. -```bash +```console $ qsub -I -q qmic -A NONE-0-0 -$ module load intel/13.5.192 +$ ml intel ``` For debugging purposes it is also recommended to set environment variable "OFFLOAD_REPORT". Value can be set from 0 to 3, where higher number means more debugging information. 
-```bash +```console export OFFLOAD_REPORT=3 ``` @@ -108,8 +108,8 @@ A very basic example of code that employs offload programming technique is shown !!! note This code is sequential and utilizes only single core of the accelerator. -```bash - $ vim source-offload.cpp +```console +$ vim source-offload.cpp #include <iostream> @@ -130,22 +130,22 @@ A very basic example of code that employs offload programming technique is shown To compile a code using Intel compiler run -```bash - $ icc source-offload.cpp -o bin-offload +```console +$ icc source-offload.cpp -o bin-offload ``` To execute the code, run the following command on the host -```bash - ./bin-offload +```console +$ ./bin-offload ``` ### Parallelization in Offload Mode Using OpenMP One way of paralelization a code for Xeon Phi is using OpenMP directives. The following example shows code for parallel vector addition. -```bash - $ vim ./vect-add +```console +$ vim ./vect-add #include <stdio.h> @@ -224,10 +224,9 @@ One way of paralelization a code for Xeon Phi is using OpenMP directives. The fo During the compilation Intel compiler shows which loops have been vectorized in both host and accelerator. This can be enabled with compiler option "-vec-report2". To compile and execute the code run -```bash - $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add - - $ ./vect-add +```console +$ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add +$ ./vect-add ``` Some interesting compiler flags useful not only for code debugging are: @@ -255,8 +254,8 @@ The Automatic Offload may be enabled by either an MKL function call within the c or by setting environment variable -```bash - $ export MKL_MIC_ENABLE=1 +```console +$ export MKL_MIC_ENABLE=1 ``` To get more information about automatic offload please refer to "[Using Intel® MKL Automatic Offload on Intel ® Xeon Phiâ„¢ Coprocessors](http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf)" white paper or [Intel MKL documentation](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). @@ -265,15 +264,15 @@ To get more information about automatic offload please refer to "[Using Intel® At first get an interactive PBS session on a node with MIC accelerator and load "intel" module that automatically loads "mkl" module as well. -```bash - $ qsub -I -q qmic -A OPEN-0-0 -l select=1:ncpus=16 - $ module load intel +```console +$ qsub -I -q qmic -A OPEN-0-0 -l select=1:ncpus=16 +$ module load intel ``` Following example show how to automatically offload an SGEMM (single precision - general matrix multiply) function to MIC coprocessor. The code can be copied to a file and compiled without any necessary modification. -```bash - $ vim sgemm-ao-short.c +```console +$ vim sgemm-ao-short.c #include <stdio.h> #include <stdlib.h> @@ -334,19 +333,19 @@ Following example show how to automatically offload an SGEMM (single precision - To compile a code using Intel compiler use: -```bash - $ icc -mkl sgemm-ao-short.c -o sgemm +```console +$ icc -mkl sgemm-ao-short.c -o sgemm ``` For debugging purposes enable the offload report to see more information about automatic offloading. 
-```bash - $ export OFFLOAD_REPORT=2 +```console +$ export OFFLOAD_REPORT=2 ``` The output of a code should look similar to following listing, where lines starting with [MKL] are generated by offload reporting: -```bash +```console Computing SGEMM on the host Enabling Automatic Offload Automatic Offload enabled: 1 MIC devices present @@ -366,10 +365,9 @@ In the native mode a program is executed directly on Intel Xeon Phi without invo To compile a code user has to be connected to a compute with MIC and load Intel compilers module. To get an interactive session on a compute node with an Intel Xeon Phi and load the module use following commands: -```bash - $ qsub -I -q qmic -A NONE-0-0 - - $ module load intel/13.5.192 +```console +$ qsub -I -q qmic -A NONE-0-0 +$ ml intel ``` !!! note @@ -377,20 +375,20 @@ To compile a code user has to be connected to a compute with MIC and load Intel To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: -```bash - $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host +```console +$ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host ``` To run this code on host, use: -```bash - $ ./vect-add-host +```console +$ ./vect-add-host ``` The second example shows how to compile the same code for Intel Xeon Phi: -```bash - $ icc -mmic -fopenmp vect-add.c -o vect-add-mic +```console +$ icc -mmic -fopenmp vect-add.c -o vect-add-mic ``` ### Execution of the Program in Native Mode on Intel Xeon Phi @@ -399,20 +397,20 @@ The user access to the Intel Xeon Phi is through the SSH. Since user home direct To connect to the accelerator run: -```bash - $ ssh mic0 +```console +$ ssh mic0 ``` If the code is sequential, it can be executed directly: -```bash - mic0 $ ~/path_to_binary/vect-add-seq-mic +```console +mic0 $ ~/path_to_binary/vect-add-seq-mic ``` If the code is parallelized using OpenMP a set of additional libraries is required for execution. To locate these libraries new path has to be added to the LD_LIBRARY_PATH environment variable prior to the execution: -```bash - mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH +```console +mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH ``` !!! note @@ -431,8 +429,8 @@ For your information the list of libraries and their location required for execu Finally, to run the compiled code use: -```bash - $ ~/path_to_binary/vect-add-mic +```console +$ ~/path_to_binary/vect-add-mic ``` ## OpenCL @@ -441,42 +439,42 @@ OpenCL (Open Computing Language) is an open standard for general-purpose paralle On Anselm OpenCL is installed only on compute nodes with MIC accelerator, therefore OpenCL code can be compiled only on these nodes. -```bash - module load opencl-sdk opencl-rt +```console +module load opencl-sdk opencl-rt ``` Always load "opencl-sdk" (providing devel files like headers) and "opencl-rt" (providing dynamic library libOpenCL.so) modules to compile and link OpenCL code. Load "opencl-rt" for running your compiled code. There are two basic examples of OpenCL code in the following directory: -```bash - /apps/intel/opencl-examples/ +```console +/apps/intel/opencl-examples/ ``` First example "CapsBasic" detects OpenCL compatible hardware, here CPU and MIC, and prints basic information about the capabilities of it. 
-```bash - /apps/intel/opencl-examples/CapsBasic/capsbasic +```console +/apps/intel/opencl-examples/CapsBasic/capsbasic ``` To compile and run the example copy it to your home directory, get a PBS interactive session on of the nodes with MIC and run make for compilation. Make files are very basic and shows how the OpenCL code can be compiled on Anselm. -```bash - $ cp /apps/intel/opencl-examples/CapsBasic/* . - $ qsub -I -q qmic -A NONE-0-0 - $ make +```console +$ cp /apps/intel/opencl-examples/CapsBasic/* . +$ qsub -I -q qmic -A NONE-0-0 +$ make ``` The compilation command for this example is: -```bash - $ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ +```console +$ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ ``` After executing the complied binary file, following output should be displayed. -```bash - ./capsbasic +```console +$ ./capsbasic Number of available platforms: 1 Platform names: @@ -506,22 +504,22 @@ After executing the complied binary file, following output should be displayed. The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. -```bash - $ cp -r /apps/intel/opencl-examples/* . - $ qsub -I -q qmic -A NONE-0-0 - $ cd GEMM - $ make +```console +$ cp -r /apps/intel/opencl-examples/* . +$ qsub -I -q qmic -A NONE-0-0 +$ cd GEMM +$ make ``` The compilation command for this example is: -```bash - $ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ +```console +$ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ ``` To see the performance of Intel Xeon Phi performing the DGEMM run the example as follows: -```bash +```console ./gemm -d 1 Platforms (1): [0] Intel(R) OpenCL [Selected] @@ -550,26 +548,26 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as Again an MPI code for Intel Xeon Phi has to be compiled on a compute node with accelerator and MPSS software stack installed. To get to a compute node with accelerator use: -```bash - $ qsub -I -q qmic -A NONE-0-0 +```console +$ qsub -I -q qmic -A NONE-0-0 ``` The only supported implementation of MPI standard for Intel Xeon Phi is Intel MPI. To setup a fully functional development environment a combination of Intel compiler and Intel MPI has to be used. 
On a host load following modules before compilation: -```bash - $ module load intel/13.5.192 impi/4.1.1.036 +```console +$ module load intel ``` To compile an MPI code for host use: -````bash - $ mpiicc -xhost -o mpi-test mpi-test.c - ```bash +```console +$ mpiicc -xhost -o mpi-test mpi-test.c +``` - To compile the same code for Intel Xeon Phi architecture use: +To compile the same code for Intel Xeon Phi architecture use: - ```bash - $ mpiicc -mmic -o mpi-test-mic mpi-test.c +```console +$ mpiicc -mmic -o mpi-test-mic mpi-test.c ``` An example of basic MPI version of "hello-world" example in C language, that can be executed on both host and Xeon Phi is (can be directly copy and pasted to a .c file) @@ -614,13 +612,13 @@ Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: In this case all environment variables are set by modules, so to execute the compiled MPI program on a single node, use: -```bash - $ mpirun -np 4 ./mpi-test +```console +$ mpirun -np 4 ./mpi-test ``` The output should be similar to: -```bash +```console Hello world from process 1 of 4 on host cn207 Hello world from process 3 of 4 on host cn207 Hello world from process 2 of 4 on host cn207 @@ -636,8 +634,8 @@ coprocessor; or 2.) lunch the task using "**mpiexec.hydra**" from a host. Similarly to execution of OpenMP programs in native mode, since the environmental module are not supported on MIC, user has to setup paths to Intel MPI libraries and binaries manually. One time setup can be done by creating a "**.profile**" file in user's home directory. This file sets up the environment on the MIC automatically once user access to the accelerator through the SSH. -```bash - $ vim ~/.profile +```console +$ vim ~/.profile PS1='[u@h W]$ ' export PATH=/usr/bin:/usr/sbin:/bin:/sbin @@ -656,25 +654,25 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta To access a MIC accelerator located on a node that user is currently connected to, use: -```bash - $ ssh mic0 +```console +$ ssh mic0 ``` or in case you need specify a MIC accelerator on a particular node, use: -```bash - $ ssh cn207-mic0 +```console +$ ssh cn207-mic0 ``` To run the MPI code in parallel on multiple core of the accelerator, use: -```bash - $ mpirun -np 4 ./mpi-test-mic +```console +$ mpirun -np 4 ./mpi-test-mic ``` The output should be similar to: -```bash +```console Hello world from process 1 of 4 on host cn207-mic0 Hello world from process 2 of 4 on host cn207-mic0 Hello world from process 3 of 4 on host cn207-mic0 @@ -687,20 +685,20 @@ If the MPI program is launched from host instead of the coprocessor, the environ First step is to tell mpiexec that the MPI should be executed on a local accelerator by setting up the environmental variable "I_MPI_MIC" -```bash - $ export I_MPI_MIC=1 +```console +$ export I_MPI_MIC=1 ``` Now the MPI program can be executed as: -```bash - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic +```console +$ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic ``` or using mpirun -```bash - $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic +```console +$ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic ``` !!!
note @@ -709,7 +707,7 @@ or using mpirun The output should be again similar to: -```bash +```console Hello world from process 1 of 4 on host cn207-mic0 Hello world from process 2 of 4 on host cn207-mic0 Hello world from process 3 of 4 on host cn207-mic0 @@ -721,8 +719,8 @@ The output should be again similar to: A simple test to see if the file is present is to execute: -```bash - $ ssh mic0 ls /bin/pmi_proxy +```console +$ ssh mic0 ls /bin/pmi_proxy /bin/pmi_proxy ``` @@ -730,21 +728,20 @@ A simple test to see if the file is present is to execute: To get access to multiple nodes with MIC accelerator, user has to use PBS to allocate the resources. To start interactive session, that allocates 2 compute nodes = 2 MIC accelerators run qsub command with following parameters: -```bash - $ qsub -I -q qmic -A NONE-0-0 -l select=2:ncpus=16 - - $ module load intel/13.5.192 impi/4.1.1.036 +```console +$ qsub -I -q qmic -A NONE-0-0 -l select=2:ncpus=16 +$ ml intel/13.5.192 impi/4.1.1.036 ``` This command connects user through ssh to one of the nodes immediately. To see the other nodes that have been allocated use: -```bash - $ cat $PBS_NODEFILE +```console +$ cat $PBS_NODEFILE ``` For example: -```bash +```console cn204.bullx cn205.bullx ``` @@ -759,14 +756,14 @@ This output means that the PBS allocated nodes cn204 and cn205, which means that At this point we expect that correct modules are loaded and binary is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC" -```bash - $ export I_MPI_MIC=1 +```console +$ export I_MPI_MIC=1 ``` The launch the MPI program use: -```bash - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ +```console +$ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -genv I_MPI_FABRICS_LIST tcp -genv I_MPI_FABRICS shm:tcp -genv I_MPI_TCP_NETMASK=10.1.0.0/16 @@ -776,8 +773,8 @@ The launch the MPI program use: or using mpirun: -```bash - $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ +```console +$ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -genv I_MPI_FABRICS_LIST tcp -genv I_MPI_FABRICS shm:tcp -genv I_MPI_TCP_NETMASK=10.1.0.0/16 @@ -787,7 +784,7 @@ or using mpirun: In this case four MPI processes are executed on accelerator cn204-mic and six processes are executed on accelerator cn205-mic0. The sample output (sorted after execution) is: -```bash +```console Hello world from process 0 of 10 on host cn204-mic0 Hello world from process 1 of 10 on host cn204-mic0 Hello world from process 2 of 10 on host cn204-mic0 @@ -802,8 +799,8 @@ In this case four MPI processes are executed on accelerator cn204-mic and six pr The same way MPI program can be executed on multiple hosts: -```bash - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ +```console +$ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -genv I_MPI_FABRICS_LIST tcp -genv I_MPI_FABRICS shm:tcp -genv I_MPI_TCP_NETMASK=10.1.0.0/16 @@ -818,8 +815,8 @@ architecture and requires different binary file produced by the Intel compiler t In the previous section we have compiled two binary files, one for hosts "**mpi-test**" and one for MIC accelerators "**mpi-test-mic**". 
These two binaries can be executed at once using mpiexec.hydra: -```bash - $ mpiexec.hydra +```console +$ mpiexec.hydra -genv I_MPI_FABRICS_LIST tcp -genv I_MPI_FABRICS shm:tcp -genv I_MPI_TCP_NETMASK=10.1.0.0/16 @@ -832,7 +829,7 @@ In this example the first two parameters (line 2 and 3) sets up required environ The output of the program is: -```bash +```console Hello world from process 0 of 4 on host cn205 Hello world from process 1 of 4 on host cn205 Hello world from process 2 of 4 on host cn205-mic0 @@ -843,8 +840,8 @@ The execution procedure can be simplified by using the mpirun command with the m An example of a machine file that uses 2 >hosts (**cn205** and **cn206**) and 2 accelerators **(cn205-mic0** and **cn206-mic0**) to run 2 MPI processes on each of them: -```bash - $ cat hosts_file_mix +```console +$ cat hosts_file_mix cn205:2 cn205-mic0:2 cn206:2 @@ -853,14 +850,14 @@ An example of a machine file that uses 2 >hosts (**cn205** and **cn206**) and 2 In addition if a naming convention is set in a way that the name of the binary for host is **"bin_name"** and the name of the binary for the accelerator is **"bin_name-mic"** then by setting up the environment variable **I_MPI_MIC_POSTFIX** to **"-mic"** user do not have to specify the names of booth binaries. In this case mpirun needs just the name of the host binary file (i.e. "mpi-test") and uses the suffix to get a name of the binary for accelerator (i..e. "mpi-test-mic"). -```bash - $ export I_MPI_MIC_POSTFIX=-mic +```console +$ export I_MPI_MIC_POSTFIX=-mic ``` To run the MPI code using mpirun and the machine file "hosts_file_mix" use: -```bash - $ mpirun +```console +$ mpirun -genv I_MPI_FABRICS shm:tcp -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -genv I_MPI_FABRICS_LIST tcp @@ -872,7 +869,7 @@ To run the MPI code using mpirun and the machine file "hosts_file_mix" use: A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is: -```bash +```console Hello world from process 0 of 8 on host cn204 Hello world from process 1 of 8 on host cn204 Hello world from process 2 of 8 on host cn204-mic0 diff --git a/docs.it4i/anselm/software/isv_licenses.md b/docs.it4i/anselm/software/isv_licenses.md index 56270b51feca30fe2ec4f297da6cb0d6ee62d6e7..f26319ec1c0bcbe64bc4ca0ae92975a60572cabd 100644 --- a/docs.it4i/anselm/software/isv_licenses.md +++ b/docs.it4i/anselm/software/isv_licenses.md @@ -15,8 +15,7 @@ If an ISV application was purchased for educational (research) purposes and also ### Web Interface -For each license there is a table, which provides the information about the name, number of available (purchased/licensed), number of used and number of free license features -<https://extranet.it4i.cz/anselm/licenses> +For each license there is a table, which provides the information about the name, number of available (purchased/licensed), number of used and number of free license features <https://extranet.it4i.cz/anselm/licenses> ### Text Interface @@ -34,8 +33,8 @@ The file has a header which serves as a legend. All the info in the legend start Example of the Commercial Matlab license state: -```bash - $ cat /apps/user/licenses/matlab_features_state.txt +```console +$ cat /apps/user/licenses/matlab_features_state.txt # matlab # ------------------------------------------------- # FEATURE TOTAL USED AVAIL @@ -99,8 +98,8 @@ Resource names in PBS Pro are case sensitive. 
Run an interactive PBS job with 1 Matlab EDU license, 1 Distributed Computing Toolbox and 32 Distributed Computing Engines (running on 32 cores): -```bash - $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 -l feature__matlab-edu__MATLAB=1 -l feature__matlab-edu__Distrib_Computing_Toolbox=1 -l feature__matlab-edu__MATLAB_Distrib_Comp_Engine=32 +```console +$ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 -l feature__matlab-edu__MATLAB=1 -l feature__matlab-edu__Distrib_Computing_Toolbox=1 -l feature__matlab-edu__MATLAB_Distrib_Comp_Engine=32 ``` The license is used and accounted only with the real usage of the product. So in this example, the general Matlab is used after Matlab is run by the user and not at the time, when the shell of the interactive job is started. Also the Distributed Computing licenses are used at the time, when the user uses the distributed parallel computation in Matlab (e. g. issues pmode start, matlabpool, etc.). diff --git a/docs.it4i/anselm/software/java.md b/docs.it4i/anselm/software/java.md index ddf032eb4eef469e8c68de98f16965696b153c72..a9de126760592f8fdb983242eb397ebf00c80c42 100644 --- a/docs.it4i/anselm/software/java.md +++ b/docs.it4i/anselm/software/java.md @@ -4,24 +4,24 @@ Java is available on Anselm cluster. Activate java by loading the java module -```bash - $ module load java +```console +$ ml Java ``` Note that the java module must be loaded on the compute nodes as well, in order to run java on compute nodes. Check for java version and path -```bash - $ java -version - $ which java +```console +$ java -version +$ which java ``` With the module loaded, not only the runtime environment (JRE), but also the development environment (JDK) with the compiler is available. -```bash - $ javac -version - $ which javac +```console +$ javac -version +$ which javac ``` Java applications may use MPI for inter-process communication, in conjunction with OpenMPI. Read more on <http://www.open-mpi.org/faq/?category=java>. This functionality is currently not supported on Anselm cluster. In case you require the java interface to MPI, please contact [Anselm support](https://support.it4i.cz/rt/). diff --git a/docs.it4i/anselm/software/mpi/Running_OpenMPI.md b/docs.it4i/anselm/software/mpi/Running_OpenMPI.md index 8e11a3c163bcac6a711e18c4232a98a6acb5a16f..4974eb5b16625faa930a69cded916948257d00a5 100644 --- a/docs.it4i/anselm/software/mpi/Running_OpenMPI.md +++ b/docs.it4i/anselm/software/mpi/Running_OpenMPI.md @@ -11,16 +11,14 @@ The OpenMPI programs may be executed only via the PBS Workload manager, by enter Example: -```bash - $ qsub -q qexp -l select=4:ncpus=16 -I +```console +$ qsub -q qexp -l select=4:ncpus=16 -I qsub: waiting for job 15210.srv11 to start qsub: job 15210.srv11 ready - - $ pwd +$ pwd /home/username - - $ module load openmpi - $ mpiexec -pernode ./helloworld_mpi.x +$ ml OpenMPI +$ mpiexec -pernode ./helloworld_mpi.x Hello world! from rank 0 of 4 on host cn17 Hello world! from rank 1 of 4 on host cn108 Hello world! from rank 2 of 4 on host cn109 @@ -35,11 +33,10 @@ same path on all nodes. This is automatically fulfilled on the /home and /scratc You need to preload the executable, if running on the local scratch /lscratch filesystem -```bash - $ pwd +```console +$ pwd /lscratch/15210.srv11 - - $ mpiexec -pernode --preload-binary ./helloworld_mpi.x +$ mpiexec -pernode --preload-binary ./helloworld_mpi.x Hello world! from rank 0 of 4 on host cn17 Hello world! from rank 1 of 4 on host cn108 Hello world! 
from rank 2 of 4 on host cn109 @@ -57,12 +54,10 @@ The mpiprocs and ompthreads parameters allow for selection of number of running Follow this example to run one MPI process per node, 16 threads per process. -```bash - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I - - $ module load openmpi - - $ mpiexec --bind-to-none ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I +$ ml OpenMPI +$ mpiexec --bind-to-none ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 16 threads per socket, on 4 nodes. @@ -71,12 +66,10 @@ In this example, we demonstrate recommended way to run an MPI application, using Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpiexec. -```bash - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I - - $ module load openmpi - - $ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I +$ ml openmpi +$ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 8 threads per socket, each process and its threads bound to a separate processor socket of the node, on 4 nodes @@ -85,12 +78,10 @@ In this example, we demonstrate recommended way to run an MPI application, using Follow this example to run 16 MPI processes per node, 1 thread per process. Note the options to mpiexec. -```bash - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I - - $ module load openmpi - - $ mpiexec -bycore -bind-to-core ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I +$ ml OpenMPI +$ mpiexec -bycore -bind-to-core ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 16 MPI processes per node, single threaded. Each process is bound to separate processor core, on 4 nodes. @@ -102,19 +93,19 @@ In this example, we demonstrate recommended way to run an MPI application, using In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: -```bash - $ export GOMP_CPU_AFFINITY="0-15" +```console +$ export GOMP_CPU_AFFINITY="0-15" ``` or this one for Intel OpenMP: -```bash +```console $ export KMP_AFFINITY=granularity=fine,compact,1,0 ``` As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: -```bash +```console $ export OMP_PROC_BIND=true $ export OMP_PLACES=cores ``` @@ -129,7 +120,7 @@ MPI process mapping may be specified by a hostfile or rankfile input to the mpie Example hostfile -```bash +```console cn110.bullx cn109.bullx cn108.bullx @@ -138,8 +129,8 @@ Example hostfile Use the hostfile to control process placement -```bash - $ mpiexec -hostfile hostfile ./helloworld_mpi.x +```console +$ mpiexec -hostfile hostfile ./helloworld_mpi.x Hello world! from rank 0 of 4 on host cn110 Hello world! from rank 1 of 4 on host cn109 Hello world! 
from rank 2 of 4 on host cn108 @@ -157,7 +148,7 @@ Exact control of MPI process placement and resource binding is provided by speci Example rankfile -```bash +```console rank 0=cn110.bullx slot=1:0,1 rank 1=cn109.bullx slot=0:* rank 2=cn108.bullx slot=1:1-2 @@ -174,8 +165,8 @@ rank 2 will be bounded to cn108, socket1, core1 and core2 rank 3 will be bounded to cn17, socket0 core1, socket1 core0, core1, core2 rank 4 will be bounded to cn109, all cores on both sockets -```bash - $ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x +```console +$ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x [cn17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . .][B B B . . . . .] (slot list 0:1,1:0-2) [cn110:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . .][B B . . . . . .] (slot list 1:0,1) [cn109:10395] MCW rank 1 bound to socket 0[core 0-7]: [B B B B B B B B][. . . . . . . .] (slot list 0:*) @@ -196,10 +187,10 @@ It is users responsibility to provide correct number of ranks, sockets and cores In all cases, binding and threading may be verified by executing for example: -```bash - $ mpiexec -bysocket -bind-to-socket --report-bindings echo - $ mpiexec -bysocket -bind-to-socket numactl --show - $ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS +```console +$ mpiexec -bysocket -bind-to-socket --report-bindings echo +$ mpiexec -bysocket -bind-to-socket numactl --show +$ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS ``` ## Changes in OpenMPI 1.8 diff --git a/docs.it4i/anselm/software/mpi/mpi.md b/docs.it4i/anselm/software/mpi/mpi.md index bc60afb16ebee9968d942c0e4189f79705118276..4313bf513d5262a4b3eba0f1ef10380142f3a2ef 100644 --- a/docs.it4i/anselm/software/mpi/mpi.md +++ b/docs.it4i/anselm/software/mpi/mpi.md @@ -14,10 +14,8 @@ The Anselm cluster provides several implementations of the MPI library: MPI libraries are activated via the environment modules. -Look up section modulefiles/mpi in module avail - -```bash - $ module avail +```console +$ ml av mpi/ ------------------------- /opt/modules/modulefiles/mpi ------------------------- bullxmpi/bullxmpi-1.2.4.1 mvapich2/1.9-icc impi/4.0.3.008 openmpi/1.6.5-gcc(default) @@ -43,17 +41,17 @@ There are default compilers associated with any particular MPI implementation. T Examples: -```bash - $ module load openmpi +```console +$ ml OpenMPI **or** ml openmpi **for older versions** ``` In this example, we activate the latest openmpi with latest GNU compilers To use openmpi with the intel compiler suite, use -```bash - $ module load intel - $ module load openmpi/1.6.5-icc +```console +$ ml intel +$ ml openmpi/1.6.5-icc ``` In this example, the openmpi 1.6.5 using intel compilers is activated @@ -63,10 +61,10 @@ In this example, the openmpi 1.6.5 using intel compilers is activated !!! 
note After setting up your MPI environment, compile your program using one of the mpi wrappers -```bash - $ mpicc -v - $ mpif77 -v - $ mpif90 -v +```console +$ mpicc -v +$ mpif77 -v +$ mpif90 -v ``` Example program: @@ -101,8 +99,8 @@ Example program: Compile the above example with -```bash - $ mpicc helloworld_mpi.c -o helloworld_mpi.x +```console +$ mpicc helloworld_mpi.c -o helloworld_mpi.x ``` ## Running MPI Programs diff --git a/docs.it4i/anselm/software/mpi/mpi4py-mpi-for-python.md b/docs.it4i/anselm/software/mpi/mpi4py-mpi-for-python.md index 9625ed53e88575101548ddbe48687829ac18414c..eeb15b3d0f71098d8daa128fbe93f20a47a56edd 100644 --- a/docs.it4i/anselm/software/mpi/mpi4py-mpi-for-python.md +++ b/docs.it4i/anselm/software/mpi/mpi4py-mpi-for-python.md @@ -12,9 +12,9 @@ On Anselm MPI4Py is available in standard Python modules. MPI4Py is build for OpenMPI. Before you start with MPI4Py you need to load Python and OpenMPI modules. -```bash - $ module load python - $ module load openmpi +```console +$ ml Python +$ ml OpenMPI ``` ## Execution @@ -27,14 +27,14 @@ You need to import MPI to your python program. Include the following line to the The MPI4Py enabled python programs [execute as any other OpenMPI](Running_OpenMPI/) code.The simpliest way is to run -```bash - $ mpiexec python <script>.py +```console +$ mpiexec python <script>.py ``` For example -```bash - $ mpiexec python hello_world.py +```console +$ mpiexec python hello_world.py ``` ## Examples @@ -82,12 +82,11 @@ For example Execute the above code as: -```bash - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I - - $ module load python openmpi - - $ mpiexec -bycore -bind-to-core python hello_world.py +```console +$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I +$ ml Python +$ ml OpenMPI +$ mpiexec -bycore -bind-to-core python hello_world.py ``` In this example, we run MPI4Py enabled code on 4 nodes, 16 cores per node (total of 64 processes), each python process is bound to a different core. More examples and documentation can be found on [MPI for Python webpage](https://pypi.python.org/pypi/mpi4py). diff --git a/docs.it4i/anselm/software/mpi/running-mpich2.md b/docs.it4i/anselm/software/mpi/running-mpich2.md index 64d3c620fddf82b25339d535fb984067924ef29a..7b37a811802ffe6aa142cad5773cfc20e842b6fd 100644 --- a/docs.it4i/anselm/software/mpi/running-mpich2.md +++ b/docs.it4i/anselm/software/mpi/running-mpich2.md @@ -11,14 +11,12 @@ The MPICH2 programs use mpd daemon or ssh connection to spawn processes, no PBS Example: -```bash - $ qsub -q qexp -l select=4:ncpus=16 -I +```console +$ qsub -q qexp -l select=4:ncpus=16 -I qsub: waiting for job 15210.srv11 to start qsub: job 15210.srv11 ready - - $ module load impi - - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x +$ ml impi +$ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x Hello world! from rank 0 of 4 on host cn17 Hello world! from rank 1 of 4 on host cn108 Hello world! from rank 2 of 4 on host cn109 @@ -30,11 +28,11 @@ Note that the executable helloworld_mpi.x must be available within the same path You need to preload the executable, if running on the local scratch /lscratch filesystem -```bash - $ pwd +```console +$ pwd /lscratch/15210.srv11 - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE cp /home/username/helloworld_mpi.x . - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x +$ mpirun -ppn 1 -hostfile $PBS_NODEFILE cp /home/username/helloworld_mpi.x . +$ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x Hello world! 
from rank 0 of 4 on host cn17 Hello world! from rank 1 of 4 on host cn108 Hello world! from rank 2 of 4 on host cn109 @@ -52,12 +50,10 @@ The mpiprocs and ompthreads parameters allow for selection of number of running Follow this example to run one MPI process per node, 16 threads per process. Note that no options to mpirun are needed -```bash - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I - - $ module load mvapich2 - - $ mpirun ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I +$ ml mvapich2 +$ mpirun ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 16 threads per socket, on 4 nodes. @@ -66,12 +62,10 @@ In this example, we demonstrate recommended way to run an MPI application, using Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpirun for mvapich2. No options are needed for impi. -```bash - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I - - $ module load mvapich2 - - $ mpirun -bind-to numa ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I +$ ml mvapich2 +$ mpirun -bind-to numa ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 8 threads per socket, each process and its threads bound to a separate processor socket of the node, on 4 nodes @@ -80,12 +74,10 @@ In this example, we demonstrate recommended way to run an MPI application, using Follow this example to run 16 MPI processes per node, 1 thread per process. Note the options to mpirun for mvapich2. No options are needed for impi. -```bash - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I - - $ module load mvapich2 - - $ mpirun -bind-to core ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I +$ ml mvapich2 +$ mpirun -bind-to core ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 16 MPI processes per node, single threaded. Each process is bound to separate processor core, on 4 nodes. @@ -97,21 +89,21 @@ In this example, we demonstrate recommended way to run an MPI application, using In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: -```bash - $ export GOMP_CPU_AFFINITY="0-15" +```console +$ export GOMP_CPU_AFFINITY="0-15" ``` or this one for Intel OpenMP: -```bash - $ export KMP_AFFINITY=granularity=fine,compact,1,0 +```console +$ export KMP_AFFINITY=granularity=fine,compact,1,0 ``` As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: -```bash - $ export OMP_PROC_BIND=true - $ export OMP_PLACES=cores +```console +$ export OMP_PROC_BIND=true +$ export OMP_PLACES=cores ``` ## MPICH2 Process Mapping and Binding @@ -124,7 +116,7 @@ Process mapping may be controlled by specifying a machinefile input to the mpiru Example machinefile -```bash +```console cn110.bullx cn109.bullx cn108.bullx @@ -134,8 +126,8 @@ Example machinefile Use the machinefile to control process placement -```bash - $ mpirun -machinefile machinefile helloworld_mpi.x +```console +$ mpirun -machinefile machinefile helloworld_mpi.x Hello world! 
from rank 0 of 5 on host cn110 Hello world! from rank 1 of 5 on host cn109 Hello world! from rank 2 of 5 on host cn108 @@ -153,9 +145,9 @@ The Intel MPI automatically binds each process and its threads to the correspond In all cases, binding and threading may be verified by executing -```bash - $ mpirun -bindto numa numactl --show - $ mpirun -bindto numa echo $OMP_NUM_THREADS +```console +$ mpirun -bindto numa numactl --show +$ mpirun -bindto numa echo $OMP_NUM_THREADS ``` ## Intel MPI on Xeon Phi diff --git a/docs.it4i/anselm/software/numerical-languages/introduction.md b/docs.it4i/anselm/software/numerical-languages/introduction.md index 67493f1f7d099c0c9a8986b2118bff77aa4dd38b..8646fe6fed34038028fdab9dbcde98840d204944 100644 --- a/docs.it4i/anselm/software/numerical-languages/introduction.md +++ b/docs.it4i/anselm/software/numerical-languages/introduction.md @@ -10,9 +10,9 @@ This section contains a collection of high-level interpreted languages, primaril MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. -```bash - $ module load MATLAB/2015b-EDU - $ matlab +```console +$ ml MATLAB/2015b-EDU +$ matlab ``` Read more at the [Matlab page](matlab/). @@ -21,9 +21,9 @@ Read more at the [Matlab page](matlab/). GNU Octave is a high-level interpreted language, primarily intended for numerical computations. The Octave language is quite similar to Matlab so that most programs are easily portable. -```bash - $ module load Octave - $ octave +```console +$ ml Octave +$ octave ``` Read more at the [Octave page](octave/). @@ -32,9 +32,9 @@ Read more at the [Octave page](octave/). The R is an interpreted language and environment for statistical computing and graphics. -```bash - $ module load R - $ R +```console +$ ml R +$ R ``` Read more at the [R page](r/). diff --git a/docs.it4i/anselm/software/numerical-languages/matlab.md b/docs.it4i/anselm/software/numerical-languages/matlab.md index d7c3d907452ca38deea8f07235170ead3114c1eb..ac1b0cc5e6b5728f0079b57b771ec17a219f4d8d 100644 --- a/docs.it4i/anselm/software/numerical-languages/matlab.md +++ b/docs.it4i/anselm/software/numerical-languages/matlab.md @@ -9,14 +9,14 @@ Matlab is available in versions R2015a and R2015b. There are always two variants To load the latest version of Matlab load the module -```bash - $ module load MATLAB +```console +$ ml MATLAB ``` By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use -```bash - $ module avail MATLAB +```console +$ ml av MATLAB ``` If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use Matlab on the compute nodes via PBS Pro scheduler. @@ -27,14 +27,14 @@ Matlab GUI is quite slow using the X forwarding built in the PBS (qsub -X), so u To run Matlab with GUI, use -```bash - $ matlab +```console +$ matlab ``` To run Matlab in text mode, without the Matlab Desktop GUI environment, use -```bash - $ matlab -nodesktop -nosplash +```console +$ matlab -nodesktop -nosplash ``` plots, images, etc... will be still available. @@ -50,7 +50,7 @@ Delete previously used file mpiLibConf.m, we have observed crashes when using In To use Distributed Computing, you first need to setup a parallel profile. 
We have provided the profile for you, you can either import it in MATLAB command line: -```bash +```console >> parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonPBSPro.settings') ans = @@ -71,10 +71,9 @@ With the new mode, MATLAB itself launches the workers via PBS, so you can either Following example shows how to start interactive session with support for Matlab GUI. For more information about GUI based applications on Anselm see [this page](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-system/). -```bash - $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 +```console +$ xhost + +$ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 -l feature__matlab__MATLAB=1 ``` This qsub command example shows how to run Matlab on a single node. @@ -83,9 +82,9 @@ The second part of the command shows how to request all necessary licenses. In t Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: -```bash - r1i0n17$ module load MATLAB/2015b-EDU - r1i0n17$ matlab & +```console +r1i0n17$ ml MATLAB/2015b-EDU +r1i0n17$ matlab & ``` ### Parallel Matlab Batch Job in Local Mode @@ -119,15 +118,15 @@ This script may be submitted directly to the PBS workload manager via the qsub c Submit the jobscript using qsub -```bash - $ qsub ./jobscript +```console +$ qsub ./jobscript ``` ### Parallel Matlab Local Mode Program Example The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. -```bash +```console cluster = parcluster('local') ``` @@ -138,7 +137,7 @@ This script creates scheduler object "cluster" of type "local" that starts worke The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers. -```bash +```console parpool(cluster,16); @@ -150,7 +149,7 @@ The last step is to start matlabpool with "cluster" object and correct number of The complete example showing how to use Distributed Computing Toolbox in local mode is shown here. -```bash +```console cluster = parcluster('local'); cluster @@ -183,7 +182,7 @@ This mode uses PBS scheduler to launch the parallel pool. It uses the SalomonPBS This is an example of m-script using PBS mode: -```bash +```console cluster = parcluster('SalomonPBSPro'); set(cluster, 'SubmitArguments', '-A OPEN-0-0'); set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=16'); @@ -224,7 +223,7 @@ For this method, you need to use SalomonDirect profile, import it using [the sam This is an example of m-script using direct mode: -```bash +```console parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonDirect.settings') cluster = parcluster('SalomonDirect'); set(cluster, 'NumWorkers', 48); diff --git a/docs.it4i/anselm/software/numerical-languages/matlab_1314.md b/docs.it4i/anselm/software/numerical-languages/matlab_1314.md index 8c1012531c67f272907e154addb5f336e636eaf6..41dca05619875b20806beb1a8dde7c255347bd89 100644 --- a/docs.it4i/anselm/software/numerical-languages/matlab_1314.md +++ b/docs.it4i/anselm/software/numerical-languages/matlab_1314.md @@ -12,14 +12,14 @@ Matlab is available in the latest stable version. 
There are always two variants To load the latest version of Matlab load the module -```bash - $ module load matlab +```console +$ ml matlab ``` By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use -```bash - $ module avail matlab +```console +$ ml matlab ``` If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use Matlab on the compute nodes via PBS Pro scheduler. @@ -30,13 +30,13 @@ Matlab GUI is quite slow using the X forwarding built in the PBS (qsub -X), so u To run Matlab with GUI, use -```bash +```console $ matlab ``` To run Matlab in text mode, without the Matlab Desktop GUI environment, use -```bash +```console $ matlab -nodesktop -nosplash ``` @@ -50,11 +50,9 @@ Recommended parallel mode for running parallel Matlab on Anselm is MPIEXEC mode. For the performance reasons Matlab should use system MPI. On Anselm the supported MPI implementation for Matlab is Intel MPI. To switch to system MPI user has to override default Matlab setting by creating new configuration file in its home directory. The path and file name has to be exactly the same as in the following listing: -```bash +```console $ vim ~/matlab/mpiLibConf.m -``` -```bash function [lib, extras] = mpiLibConf %MATLAB MPI Library overloading for Infiniband Networks @@ -78,10 +76,9 @@ System MPI library allows Matlab to communicate through 40 Gbit/s InfiniBand QDR Once this file is in place, user can request resources from PBS. Following example shows how to start interactive session with support for Matlab GUI. For more information about GUI based applications on Anselm see. -```bash - $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=4:ncpus=16:mpiprocs=16 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 +```console +$ xhost + +$ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=4:ncpus=16:mpiprocs=16 -l walltime=00:30:00 -l feature__matlab__MATLAB=1 ``` This qsub command example shows how to run Matlab with 32 workers in following configuration: 2 nodes (use all 16 cores per node) and 16 workers = mpirocs per node (-l select=2:ncpus=16:mpiprocs=16). If user requires to run smaller number of workers per node then the "mpiprocs" parameter has to be changed. @@ -90,9 +87,9 @@ The second part of the command shows how to request all necessary licenses. In t Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: -```bash - cn79$ module load matlab/R2013a-EDU - cn79$ module load impi/4.1.1.036 +```console + cn79$ ml matlab/R2013a-EDU + cn79$ ml impi/4.1.1.036 cn79$ matlab & ``` @@ -128,7 +125,7 @@ This script may be submitted directly to the PBS workload manager via the qsub c Submit the jobscript using qsub -```bash +```console $ qsub ./jobscript ``` @@ -136,7 +133,7 @@ $ qsub ./jobscript The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. -```bash +```console sched = findResource('scheduler', 'type', 'mpiexec'); set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun'); set(sched, 'EnvironmentSetMethod', 'setenv'); @@ -149,7 +146,7 @@ This script creates scheduler object "sched" of type "mpiexec" that starts worke The last step is to start matlabpool with "sched" object and correct number of workers. 
In this case qsub asked for total number of 32 cores, therefore the number of workers is also set to 32. -```bash +```console matlabpool(sched,32); @@ -161,7 +158,7 @@ matlabpool close The complete example showing how to use Distributed Computing Toolbox is show here. -```bash +```console sched = findResource('scheduler', 'type', 'mpiexec'); set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun') set(sched, 'EnvironmentSetMethod', 'setenv') diff --git a/docs.it4i/anselm/software/numerical-languages/octave.md b/docs.it4i/anselm/software/numerical-languages/octave.md index 19142eb0f6b9150df56c553ba395d385c4b92a47..4fbb52979a38da23ec3a9a3c93e456383f99ab22 100644 --- a/docs.it4i/anselm/software/numerical-languages/octave.md +++ b/docs.it4i/anselm/software/numerical-languages/octave.md @@ -6,7 +6,7 @@ GNU Octave is a high-level interpreted language, primarily intended for numerica Two versions of octave are available on Anselm, via module -| Version | module | +| Version | module | | ----------------------------------------------------- | ------------------------- | | Octave 3.8.2, compiled with GCC and Multithreaded MKL | Octave/3.8.2-gimkl-2.11.5 | | Octave 4.0.1, compiled with GCC and Multithreaded MKL | Octave/4.0.1-gimkl-2.11.5 | @@ -14,14 +14,16 @@ Two versions of octave are available on Anselm, via module ## Modules and Execution - $ module load Octave +```console +$ ml Octave +``` The octave on Anselm is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many octave kernels, notably the linear algebra subroutines. Octave runs these heavy calculation kernels without any penalty. By default, octave would parallelize to 16 threads. You may control the threads by setting the OMP_NUM_THREADS environment variable. To run octave interactively, log in with ssh -X parameter for X11 forwarding. Run octave: -```bash - $ octave +```console +$ octave ``` To run octave in batch mode, write an octave script, then write a bash jobscript and execute via the qsub command. By default, octave will use 16 threads when running MKL kernels. @@ -52,8 +54,8 @@ This script may be submitted directly to the PBS workload manager via the qsub c The octave c compiler mkoctfile calls the GNU gcc 4.8.1 for compiling native c code. This is very useful for running native c subroutines in octave environment. -```bash - $ mkoctfile -v +```console +$ mkoctfile -v ``` Octave may use MPI for interprocess communication This functionality is currently not supported on Anselm cluster. In case you require the octave interface to MPI, please contact [Anselm support](https://support.it4i.cz/rt/). @@ -68,11 +70,11 @@ Octave can accelerate BLAS type operations (in particular the Matrix Matrix mult Example -```bash - $ export OFFLOAD_REPORT=2 - $ export MKL_MIC_ENABLE=1 - $ module load octave - $ octave -q +```console +$ export OFFLOAD_REPORT=2 +$ export MKL_MIC_ENABLE=1 +$ ml octave +$ octave -q octave:1> A=rand(10000); B=rand(10000); octave:2> tic; C=A*B; toc [MKL] [MIC --] [AO Function] DGEMM @@ -101,8 +103,8 @@ variable. 
To use Octave on a node with Xeon Phi: -```bash - $ ssh mic0 # login to the MIC card - $ source /apps/tools/octave/3.8.2-mic/bin/octave-env.sh # set up environment variables - $ octave -q /apps/tools/octave/3.8.2-mic/example/test0.m # run an example +```console +$ ssh mic0 # login to the MIC card +$ source /apps/tools/octave/3.8.2-mic/bin/octave-env.sh # set up environment variables +$ octave -q /apps/tools/octave/3.8.2-mic/example/test0.m # run an example ``` diff --git a/docs.it4i/anselm/software/numerical-languages/r.md b/docs.it4i/anselm/software/numerical-languages/r.md index d70ea9026f50ed82ff789a232a21de97b7b472cb..8916ccb7cc21a1e9bf7de6bda24d1a38bdf82263 100644 --- a/docs.it4i/anselm/software/numerical-languages/r.md +++ b/docs.it4i/anselm/software/numerical-languages/r.md @@ -21,8 +21,8 @@ The R version 3.0.1 is available on Anselm, along with GUI interface Rstudio | **R** | R 3.0.1 | R | | **Rstudio** | Rstudio 0.97 | Rstudio | -```bash - $ module load R +```console +$ ml R ``` ## Execution @@ -33,9 +33,9 @@ The R on Anselm is linked to highly optimized MKL mathematical library. This pro To run R interactively, using Rstudio GUI, log in with ssh -X parameter for X11 forwarding. Run rstudio: -```bash - $ module load Rstudio - $ rstudio +```console +$ ml Rstudio +$ rstudio ``` ### Batch Execution @@ -78,14 +78,14 @@ The package parallel provides support for parallel computation, including by for The package is activated this way: -```bash - $ R +```console +$ R > library(parallel) ``` More information and examples may be obtained directly by reading the documentation available in R -```bash +```console > ?parallel > library(help = "parallel") > vignette("parallel") @@ -104,7 +104,7 @@ The forking is the most simple to use. Forking family of functions provide paral Forking example: -```bash +```r library(parallel) #integrand function @@ -138,8 +138,8 @@ Forking example: The above example is the classic parallel example for calculating the number Ï€. Note the **detectCores()** and **mclapply()** functions. Execute the example as: -```bash - $ R --slave --no-save --no-restore -f pi3p.R +```console +$ R --slave --no-save --no-restore -f pi3p.R ``` Every evaluation of the integrad function runs in parallel on different process. @@ -155,9 +155,9 @@ Read more on Rmpi at <http://cran.r-project.org/web/packages/Rmpi/>, reference m When using package Rmpi, both openmpi and R modules must be loaded -```bash - $ module load openmpi - $ module load R +```console +$ ml OpenMPI +$ ml R ``` Rmpi may be used in three basic ways. The static approach is identical to executing any other MPI programm. In addition, there is Rslaves dynamic MPI approach and the mpi.apply approach. In the following section, we will use the number Ï€ integration example, to illustrate all these concepts. @@ -168,7 +168,7 @@ Static Rmpi programs are executed via mpiexec, as any other MPI programs. Number Static Rmpi example: -```cpp +```r library(Rmpi) #integrand function @@ -216,8 +216,8 @@ The above is the static MPI example for calculating the number Ï€. Note the **li Execute the example as: -```bash - $ mpiexec R --slave --no-save --no-restore -f pi3.R +```console +$ mpiexec R --slave --no-save --no-restore -f pi3.R ``` ### Dynamic Rmpi @@ -226,7 +226,7 @@ Dynamic Rmpi programs are executed by calling the R directly. openmpi module mus Dynamic Rmpi example: -```cpp +```r #integrand function f <- function(i,h) { x <- h*(i-0.5) @@ -288,8 +288,8 @@ The above example is the dynamic MPI example for calculating the number Ï€. 
Both Execute the example as: -```bash - $ R --slave --no-save --no-restore -f pi3Rslaves.R +```console +$ R --slave --no-save --no-restore -f pi3Rslaves.R ``` ### mpi.apply Rmpi @@ -303,7 +303,7 @@ Execution is identical to other dynamic Rmpi programs. mpi.apply Rmpi example: -```bash +```r #integrand function f <- function(i,h) { x <- h*(i-0.5) @@ -355,8 +355,8 @@ The above is the mpi.apply MPI example for calculating the number Ï€. Only the s Execute the example as: -```bash - $ R --slave --no-save --no-restore -f pi3parSapply.R +```console +$ R --slave --no-save --no-restore -f pi3parSapply.R ``` ## Combining Parallel and Rmpi diff --git a/docs.it4i/anselm/software/numerical-libraries/fftw.md b/docs.it4i/anselm/software/numerical-libraries/fftw.md index 038e1223a44cde79a37f2f7fe59fab9f7e5a8e8e..7345a811672a725f3916d601d4164e377580b3ab 100644 --- a/docs.it4i/anselm/software/numerical-libraries/fftw.md +++ b/docs.it4i/anselm/software/numerical-libraries/fftw.md @@ -17,8 +17,8 @@ Two versions, **3.3.3** and **2.1.5** of FFTW are available on Anselm, each comp | FFTW2 gcc2.1.5 | OpenMPI | fftw2-mpi/2.1.5-gcc | -lfftw_mpi | | FFTW2 gcc2.1.5 | IntelMPI | fftw2-mpi/2.1.5-gcc | -lfftw_mpi | -```bash - $ module load fftw3 +```console +$ ml fftw3 **or** ml FFTW ``` The module sets up environment variables, required for linking and running FFTW enabled applications. Make sure that the choice of FFTW module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results. @@ -62,11 +62,10 @@ The module sets up environment variables, required for linking and running FFTW Load modules and compile: -```bash - $ module load impi intel - $ module load fftw3-mpi - - $ mpicc testfftw3mpi.c -o testfftw3mpi.x -Wl,-rpath=$LIBRARY_PATH -lfftw3_mpi +```console +$ ml intel +$ ml fftw3-mpi +$ mpicc testfftw3mpi.c -o testfftw3mpi.x -Wl,-rpath=$LIBRARY_PATH -lfftw3_mpi ``` Run the example as [Intel MPI program](../mpi/running-mpich2/). diff --git a/docs.it4i/anselm/software/numerical-libraries/gsl.md b/docs.it4i/anselm/software/numerical-libraries/gsl.md index 6b5308df3dabbbfe12a8763a955562e311eff35a..3299492ddbe6270c70a1ee1fbc4228b4e3ca5c15 100644 --- a/docs.it4i/anselm/software/numerical-libraries/gsl.md +++ b/docs.it4i/anselm/software/numerical-libraries/gsl.md @@ -51,8 +51,8 @@ The GSL 1.16 is available on Anselm, compiled for GNU and Intel compiler. These | gsl/1.16-gcc | gcc 4.8.6 | | gsl/1.16-icc(default) | icc | -```bash - $ module load gsl +```console +$ ml gsl ``` The module sets up environment variables, required for linking and running GSL enabled applications. This particular command loads the default module, which is gsl/1.16-icc @@ -63,19 +63,19 @@ Load an appropriate gsl module. 
Link using **-lgsl** switch to link your code ag ### Compiling and Linking With Intel Compilers -```bash - $ module load intel - $ module load gsl - $ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl +```console +$ ml intel +$ ml gsl +$ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl ``` ### Compiling and Linking With GNU Compilers -```bash - $ module load gcc - $ module load mkl - $ module load gsl/1.16-gcc - $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lgsl +```console +$ ml gcc +$ ml imkl **or** ml mkl +$ ml gsl/1.16-gcc +$ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lgsl ``` ## Example @@ -136,9 +136,10 @@ Following is an example of discrete wavelet transform implemented by GSL: Load modules and compile: -```bash - $ module load intel gsl - icc dwt.c -o dwt.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl +```console +$ ml intel +$ ml gsl +$ icc dwt.c -o dwt.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl ``` In this example, we compile the dwt.c code using the Intel compiler and link it to the MKL and GSL library, note the -mkl and -lgsl options. The library search path is compiled in, so that no modules are necessary to run the code. diff --git a/docs.it4i/anselm/software/numerical-libraries/hdf5.md b/docs.it4i/anselm/software/numerical-libraries/hdf5.md index d9abd72c405ab3ff867203fbe7c9408e9e7c5d7c..13f626264cab05dd93d091b0752d1a4a8df2dcf5 100644 --- a/docs.it4i/anselm/software/numerical-libraries/hdf5.md +++ b/docs.it4i/anselm/software/numerical-libraries/hdf5.md @@ -16,8 +16,9 @@ Versions **1.8.11** and **1.8.13** of HDF5 library are available on Anselm, comp | HDF5 gcc parallel MPI | pthread, OpenMPI 1.6.5, gcc 4.8.1 | hdf5-parallel/1.8.13-gcc | $HDF5_INC $HDF5_SHLIB | Not supported | $HDF5_INC $HDF5_F90_LIB | | HDF5 gcc parallel MPI | pthread, OpenMPI 1.8.1, gcc 4.9.0 | hdf5-parallel/1.8.13-gcc49 | $HDF5_INC $HDF5_SHLIB | Not supported | $HDF5_INC $HDF5_F90_LIB | -```bash - $ module load hdf5-parallel +```console + +$ ml hdf5-parallel ``` The module sets up environment variables, required for linking and running HDF5 enabled applications. Make sure that the choice of HDF5 module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results. @@ -77,11 +78,10 @@ The module sets up environment variables, required for linking and running HDF5 Load modules and compile: -```bash - $ module load intel impi - $ module load hdf5-parallel - - $ mpicc hdf5test.c -o hdf5test.x -Wl,-rpath=$LIBRARY_PATH $HDF5_INC $HDF5_SHLIB +```console +$ ml intel +$ ml hdf5-parallel +$ mpicc hdf5test.c -o hdf5test.x -Wl,-rpath=$LIBRARY_PATH $HDF5_INC $HDF5_SHLIB ``` Run the example as [Intel MPI program](../mpi/running-mpich2/). diff --git a/docs.it4i/anselm/software/numerical-libraries/intel-numerical-libraries.md b/docs.it4i/anselm/software/numerical-libraries/intel-numerical-libraries.md index 8a79b9961d7f158bb369dc65f6ea6e21896b09ac..5f3834ffa84ee0b1fb73d01dfa0aa1a2106566b0 100644 --- a/docs.it4i/anselm/software/numerical-libraries/intel-numerical-libraries.md +++ b/docs.it4i/anselm/software/numerical-libraries/intel-numerical-libraries.md @@ -6,8 +6,8 @@ Intel libraries for high performance in numerical computing Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. 
Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. -```bash - $ module load mkl +```console +$ ml mkl **or** ml imkl ``` Read more at the [Intel MKL](../intel-suite/intel-mkl/) page. @@ -16,8 +16,8 @@ Read more at the [Intel MKL](../intel-suite/intel-mkl/) page. Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX is available, via module ipp. The IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. -```bash - $ module load ipp +```console +$ ml ipp ``` Read more at the [Intel IPP](../intel-suite/intel-integrated-performance-primitives/) page. @@ -26,8 +26,8 @@ Read more at the [Intel IPP](../intel-suite/intel-integrated-performance-primiti Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. -```bash - $ module load tbb +```console +$ ml tbb ``` Read more at the [Intel TBB](../intel-suite/intel-tbb/) page. diff --git a/docs.it4i/anselm/software/numerical-libraries/magma-for-intel-xeon-phi.md b/docs.it4i/anselm/software/numerical-libraries/magma-for-intel-xeon-phi.md index 8ce0b79e0ce63aff1cfea48f72e009ad111a79a1..64c443796b11a378345e9aa93da94af791cf5e5a 100644 --- a/docs.it4i/anselm/software/numerical-libraries/magma-for-intel-xeon-phi.md +++ b/docs.it4i/anselm/software/numerical-libraries/magma-for-intel-xeon-phi.md @@ -6,8 +6,8 @@ Next generation dense algebra library for heterogeneous systems with accelerator To be able to compile and link code with MAGMA library user has to load following module: -```bash - $ module load magma/1.3.0-mic +```console +$ ml magma/1.3.0-mic ``` To make compilation more user friendly module also sets these two environment variables: @@ -20,10 +20,9 @@ To make compilation more user friendly module also sets these two environment va Compilation example: -```bash - $ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall $MAGMA_INC -c testing_dgetrf_mic.cpp -o testing_dgetrf_mic.o - - $ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall -fPIC -Xlinker -zmuldefs -Wall -DNOCHANGE -DHOST testing_dgetrf_mic.o -o testing_dgetrf_mic $MAGMA_LIBS +```console +$ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall $MAGMA_INC -c testing_dgetrf_mic.cpp -o testing_dgetrf_mic.o +$ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall -fPIC -Xlinker -zmuldefs -Wall -DNOCHANGE -DHOST testing_dgetrf_mic.o -o testing_dgetrf_mic $MAGMA_LIBS ``` ### Running MAGMA Code @@ -44,12 +43,10 @@ MAGMA implementation for Intel MIC requires a MAGMA server running on accelerato To test if the MAGMA server runs properly we can run one of examples that are part of the MAGMA installation: -```bash - [user@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic - - [user@cn204 ~]$ export OMP_NUM_THREADS=16 - - [lriha@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic +```console +[user@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic +[user@cn204 ~]$ export OMP_NUM_THREADS=16 +[lriha@cn204 ~]$ 
$MAGMAROOT/testing/testing_dgetrf_mic Usage: /apps/libs/magma-mic/magmamic-1.3.0/testing/testing_dgetrf_mic [options] [-h|--help] M N CPU GFlop/s (sec) MAGMA GFlop/s (sec) ||PA-LU||/(||A||*N) diff --git a/docs.it4i/anselm/software/numerical-libraries/petsc.md b/docs.it4i/anselm/software/numerical-libraries/petsc.md index 528d13ddbcaffdc9f8b0a80bee379b05602317d7..214e4074ae075aec5ce70bfb3705bab3e7600b50 100644 --- a/docs.it4i/anselm/software/numerical-libraries/petsc.md +++ b/docs.it4i/anselm/software/numerical-libraries/petsc.md @@ -18,9 +18,9 @@ PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of bu You can start using PETSc on Anselm by loading the PETSc module. Module names obey this pattern: -```bash - # module load petsc/version-compiler-mpi-blas-variant, e.g. - module load petsc/3.4.4-icc-impi-mkl-opt +```console +$# ml petsc/version-compiler-mpi-blas-variant, e.g. +$ ml petsc/3.4.4-icc-impi-mkl-opt ``` where `variant` is replaced by one of `{dbg, opt, threads-dbg, threads-opt}`. The `opt` variant is compiled without debugging information (no `-g` option) and with aggressive compiler optimizations (`-O3 -xAVX`). This variant is suitable for performance measurements and production runs. In all other cases use the debug (`dbg`) variant, because it contains debugging information, performs validations and self-checks, and provides a clear stack trace and message in case of an error. The other two variants `threads-dbg` and `threads-opt` are `dbg` and `opt`, respectively, built with [OpenMP and pthreads threading support](https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html). diff --git a/docs.it4i/anselm/software/numerical-libraries/trilinos.md b/docs.it4i/anselm/software/numerical-libraries/trilinos.md index 42f8bc0dc4ca5318cca883193e5fc61eb207b9b1..36688e989a9b83b657707d988472109144e02226 100644 --- a/docs.it4i/anselm/software/numerical-libraries/trilinos.md +++ b/docs.it4i/anselm/software/numerical-libraries/trilinos.md @@ -28,22 +28,22 @@ Currently, Trilinos in version 11.2.3 compiled with Intel Compiler is installed First, load the appropriate module: -```bash - $ module load trilinos +```console +$ ml trilinos ``` For the compilation of CMake-aware project, Trilinos provides the FIND_PACKAGE( Trilinos ) capability, which makes it easy to build against Trilinos, including linking against the correct list of libraries. For details, see <http://trilinos.sandia.gov/Finding_Trilinos.txt> For compiling using simple makefiles, Trilinos provides Makefile.export system, which allows users to include important Trilinos variables directly into their makefiles. This can be done simply by inserting the following line into the makefile: -```bash - include Makefile.export.Trilinos +```cpp +include Makefile.export.Trilinos ``` or -```bash - include Makefile.export.<package> +```cpp +include Makefile.export.<package> ``` if you are interested only in a specific Trilinos package. This will give you access to the variables such as Trilinos_CXX_COMPILER, Trilinos_INCLUDE_DIRS, Trilinos_LIBRARY_DIRS etc. For the detailed description and example makefile see <http://trilinos.sandia.gov/Export_Makefile.txt>. 
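To make the Makefile.export workflow concrete, below is a minimal makefile sketch. It assumes only the variables named above plus a Trilinos_LIBRARIES list and a hypothetical source file myapp.cpp; the include path is a placeholder, and you should check the generated Makefile.export.Trilinos itself for the exact variable names and whether the include/library variables already carry the -I/-L flags.

```make
# Minimal sketch -- pull in the variables exported by the installed Trilinos
# (replace the path with the include directory of the loaded trilinos module)
include /path/to/trilinos/include/Makefile.export.Trilinos

# Hypothetical application linked against Trilinos
myapp: myapp.cpp
	$(Trilinos_CXX_COMPILER) myapp.cpp -o myapp \
		$(Trilinos_INCLUDE_DIRS) $(Trilinos_LIBRARY_DIRS) $(Trilinos_LIBRARIES)
```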
diff --git a/docs.it4i/anselm/software/nvidia-cuda.md b/docs.it4i/anselm/software/nvidia-cuda.md index 392811efa72e5275307c29c34a13b462c688827e..6b06d9384302e0e023f807dcb2eb983a11b3b73a 100644 --- a/docs.it4i/anselm/software/nvidia-cuda.md +++ b/docs.it4i/anselm/software/nvidia-cuda.md @@ -6,48 +6,49 @@ Guide to NVIDIA CUDA Programming and GPU Usage The default programming model for GPU accelerators on Anselm is Nvidia CUDA. To set up the environment for CUDA use -```bash - $ module load cuda +```console +$ ml av cuda +$ ml cuda **or** ml CUDA ``` If the user code is hybrid and uses both CUDA and MPI, the MPI environment has to be set up as well. One way to do this is to use the PrgEnv-gnu module, which sets up correct combination of GNU compiler and MPI library. -```bash - $ module load PrgEnv-gnu +```console +$ ml PrgEnv-gnu ``` CUDA code can be compiled directly on login1 or login2 nodes. User does not have to use compute nodes with GPU accelerator for compilation. To compile a CUDA source code, use nvcc compiler. -```bash - $ nvcc --version +```console +$ nvcc --version ``` CUDA Toolkit comes with large number of examples, that can be helpful to start with. To compile and test these examples user should copy them to its home directory -```bash - $ cd ~ - $ mkdir cuda-samples - $ cp -R /apps/nvidia/cuda/6.5.14/samples/* ~/cuda-samples/ +```console +$ cd ~ +$ mkdir cuda-samples +$ cp -R /apps/nvidia/cuda/6.5.14/samples/* ~/cuda-samples/ ``` To compile an examples, change directory to the particular example (here the example used is deviceQuery) and run "make" to start the compilation -```bash - $ cd ~/cuda-samples/1_Utilities/deviceQuery - $ make +```console +$ cd ~/cuda-samples/1_Utilities/deviceQuery +$ make ``` To run the code user can use PBS interactive session to get access to a node from qnvidia queue (note: use your project name with parameter -A in the qsub command) and execute the binary file -```bash - $ qsub -I -q qnvidia -A OPEN-0-0 - $ module load cuda - $ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery +```console +$ qsub -I -q qnvidia -A OPEN-0-0 +$ ml cuda +$ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery ``` Expected output of the deviceQuery example executed on a node with Tesla K20m is -```bash +```console CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 CUDA Capable device(s) @@ -90,8 +91,8 @@ Expected output of the deviceQuery example executed on a node with Tesla K20m is In this section we provide a basic CUDA based vector addition code example. You can directly copy and paste the code to test it. -```bash - $ vim test.cu +```console +$ vim test.cu #define N (2048*2048) #define THREADS_PER_BLOCK 512 @@ -180,16 +181,16 @@ In this section we provide a basic CUDA based vector addition code example. 
You This code can be compiled using following command -```bash - $ nvcc test.cu -o test_cuda +```console +$ nvcc test.cu -o test_cuda ``` To run the code use interactive PBS session to get access to one of the GPU accelerated nodes -```bash - $ qsub -I -q qnvidia -A OPEN-0-0 - $ module load cuda - $ ./test.cuda +```console +$ qsub -I -q qnvidia -A OPEN-0-0 +$ ml cuda +$ ./test.cuda ``` ## CUDA Libraries @@ -287,21 +288,22 @@ SAXPY function multiplies the vector x by the scalar alpha and adds it to the ve To compile the code using NVCC compiler a "-lcublas" compiler flag has to be specified: -```bash - $ module load cuda - $ nvcc -lcublas test_cublas.cu -o test_cublas_nvcc +```console +$ ml cuda +$ nvcc -lcublas test_cublas.cu -o test_cublas_nvcc ``` To compile the same code with GCC: -```bash - $ module load cuda - $ gcc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart +```console +$ ml cuda +$ gcc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart ``` To compile the same code with Intel compiler: -```bash - $ module load cuda intel - $ icc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart +```console +$ ml cuda +$ ml intel +$ icc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart ``` diff --git a/docs.it4i/anselm/software/omics-master/overview.md b/docs.it4i/anselm/software/omics-master/overview.md index 8d3eb3d3ea5368b1b0d09cec9ec8ca7006fbf1c4..d09a0030cf06246720287c6d0ffad4bfd11825a6 100644 --- a/docs.it4i/anselm/software/omics-master/overview.md +++ b/docs.it4i/anselm/software/omics-master/overview.md @@ -175,16 +175,16 @@ resources. We successfully solved the problem of storing data released in BioPAX First of all, we should load ngsPipeline module: -```bash - $ module load ngsPipeline +```console +$ ml ngsPipeline ``` This command will load python/2.7.5 module and all the required modules (hpg-aligner, gatk, etc) If we launch ngsPipeline with ‘-h’, we will get the usage help: -```bash - $ ngsPipeline -h +```console +$ ngsPipeline -h Usage: ngsPipeline.py [-h] -i INPUT -o OUTPUT -p PED --project PROJECT --queue QUEUE [--stages-path STAGES_PATH] [--email EMAIL] [--prefix PREFIX] [-s START] [-e END] --log @@ -211,7 +211,7 @@ If we launch ngsPipeline with ‘-h’, we will get the usage help: Let us see a brief description of the arguments: -```bash +```console -h --help. Show the help. -i, --input. The input data directory. This directory must to have a special structure. We have to create one folder per sample (with the same name). These folders will host the fastq files. These fastq files must have the following pattern “sampleName†+ “_†+ “1 or 2†+ “.fqâ€. 
1 for the first pair (in paired-end sequences), and 2 for the @@ -242,7 +242,7 @@ This is an example usage of NGSpipeline: We have a folder with the following structure in -```bash +```console /apps/bio/omics/1.0/sample_data/ >: /apps/bio/omics/1.0/sample_data @@ -258,7 +258,7 @@ We have a folder with the following structure in The ped file ( file.ped) contains the following info: -```bash +```console #family_ID sample_ID parental_ID maternal_ID sex phenotype FAM sample_A 0 0 1 1 FAM sample_B 0 0 2 2 @@ -266,24 +266,24 @@ The ped file ( file.ped) contains the following info: Now, lets load the NGSPipeline module and copy the sample data to a [scratch directory](../../storage/storage/): -```bash - $ module load ngsPipeline - $ mkdir -p /scratch/$USER/omics/results - $ cp -r /apps/bio/omics/1.0/sample_data /scratch/$USER/omics/ +```console +$ ml ngsPipeline +$ mkdir -p /scratch/$USER/omics/results +$ cp -r /apps/bio/omics/1.0/sample_data /scratch/$USER/omics/ ``` Now, we can launch the pipeline (replace OPEN-0-0 with your Project ID): -```bash - $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped --project OPEN-0-0 --queue qprod +```console +$ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped --project OPEN-0-0 --queue qprod ``` This command submits the processing [jobs to the queue](../../job-submission-and-execution/). If we want to re-launch the pipeline from stage 4 until stage 20 we should use the next command: -```bash - $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped -s 4 -e 20 --project OPEN-0-0 --queue qprod +```console +$ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped -s 4 -e 20 --project OPEN-0-0 --queue qprod ``` ## Details on the Pipeline diff --git a/docs.it4i/anselm/software/openfoam.md b/docs.it4i/anselm/software/openfoam.md index a2c98e3f2d84e11b0e73b3b6c7d9c083422101bb..865f054d326d17591cf623d0ed9d492d342e01ed 100644 --- a/docs.it4i/anselm/software/openfoam.md +++ b/docs.it4i/anselm/software/openfoam.md @@ -31,13 +31,13 @@ openfoam\<VERSION\>-\<COMPILER\>\<openmpiVERSION\>-\<PRECISION\> To check available modules use -```bash - $ module avail +```console +$ ml av ``` In /opt/modules/modulefiles/engineering you can see installed engineering softwares: -```bash +```console ------------------------------------ /opt/modules/modulefiles/engineering ------------------------------------------------------------- ansys/14.5.x matlab/R2013a-COM openfoam/2.2.1-icc-impi4.1.1.036-DP comsol/43b-COM matlab/R2013a-EDU openfoam/2.2.1-icc-openmpi1.6.5-DP @@ -51,10 +51,9 @@ For information how to use modules please [look here](../environment-and-modules To create OpenFOAM environment on ANSELM give the commands: -```bash - $ module load openfoam/2.2.1-icc-openmpi1.6.5-DP - - $ source $FOAM_BASHRC +```console +$ ml openfoam/2.2.1-icc-openmpi1.6.5-DP +$ source $FOAM_BASHRC ``` !!! note @@ -62,28 +61,28 @@ To create OpenFOAM environment on ANSELM give the commands: Create a project directory within the $HOME/OpenFOAM directory named \<USER\>-\<OFversion\> and create a directory named run within it, e.g. 
by typing: -```bash - $ mkdir -p $FOAM_RUN +```console +$ mkdir -p $FOAM_RUN ``` Project directory is now available by typing: -```bash - $ cd /home/<USER>/OpenFOAM/<USER>-<OFversion>/run +```console +$ cd /home/<USER>/OpenFOAM/<USER>-<OFversion>/run ``` \<OFversion\> - for example \<2.2.1\> or -```bash - $ cd $FOAM_RUN +```console +$ cd $FOAM_RUN ``` Copy the tutorial examples directory in the OpenFOAM distribution to the run directory: -```bash - $ cp -r $FOAM_TUTORIALS $FOAM_RUN +```console +$ cp -r $FOAM_TUTORIALS $FOAM_RUN ``` Now you can run the first case for example incompressible laminar flow in a cavity. @@ -108,8 +107,8 @@ Create a Bash script test.sh Job submission -```bash - $ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh +```console +$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh ``` For information about job submission please [look here](../job-submission-and-execution/). @@ -139,8 +138,8 @@ First we must run serial application bockMesh and decomposePar for preparation o Job submission -```bash - $ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh +```console +$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh ``` This job create simple block mesh and domain decomposition. Check your decomposition, and submit parallel computation: @@ -174,38 +173,38 @@ nproc – number of subdomains Job submission -```bash - $ qsub testParallel.pbs +```console +$ qsub testParallel.pbs ``` ## Compile Your Own Solver Initialize OpenFOAM environment before compiling your solver -```bash - $ module load openfoam/2.2.1-icc-openmpi1.6.5-DP - $ source $FOAM_BASHRC - $ cd $FOAM_RUN/ +```console +$ ml openfoam/2.2.1-icc-openmpi1.6.5-DP +$ source $FOAM_BASHRC +$ cd $FOAM_RUN/ ``` Create directory applications/solvers in user directory -```bash - $ mkdir -p applications/solvers - $ cd applications/solvers +```console +$ mkdir -p applications/solvers +$ cd applications/solvers ``` Copy icoFoam solver’s source files -```bash - $ cp -r $FOAM_SOLVERS/incompressible/icoFoam/ My_icoFoam - $ cd My_icoFoam +```console +$ cp -r $FOAM_SOLVERS/incompressible/icoFoam/ My_icoFoam +$ cd My_icoFoam ``` Rename icoFoam.C to My_icoFOAM.C -```bash - $ mv icoFoam.C My_icoFoam.C +```console +$ mv icoFoam.C My_icoFoam.C ``` Edit _files_ file in _Make_ directory: @@ -224,6 +223,6 @@ and change to: In directory My_icoFoam give the compilation command: -```bash - $ wmake +```console +$ wmake ``` diff --git a/docs.it4i/anselm/software/paraview.md b/docs.it4i/anselm/software/paraview.md index 7007369800f88b5c672640ee8c32952ca73d4df7..0fbd7af21a22517171a4046693624cf6b558adcc 100644 --- a/docs.it4i/anselm/software/paraview.md +++ b/docs.it4i/anselm/software/paraview.md @@ -22,22 +22,22 @@ On Anselm, ParaView is to be used in client-server mode. A parallel ParaView ser To launch the server, you must first allocate compute nodes, for example -```bash - $ qsub -I -q qprod -A OPEN-0-0 -l select=2 +```console +$ qsub -I -q qprod -A OPEN-0-0 -l select=2 ``` to launch an interactive session on 2 nodes. Refer to [Resource Allocation and Job Execution](../job-submission-and-execution/) for details. 
After the interactive session is opened, load the ParaView module : -```bash - $ module add paraview +```console +$ module add paraview ``` Now launch the parallel server, with number of nodes times 16 processes: -```bash - $ mpirun -np 32 pvserver --use-offscreen-rendering +```console +$ mpirun -np 32 pvserver --use-offscreen-rendering Waiting for client... Connection URL: cs://cn77:11111 Accepting connection(s): cn77:11111 @@ -49,8 +49,8 @@ Note the that the server is listening on compute node cn77 in this case, we shal Because a direct connection is not allowed to compute nodes on Anselm, you must establish a SSH tunnel to connect to the server. Choose a port number on your PC to be forwarded to ParaView server, for example 12345. If your PC is running Linux, use this command to establish a SSH tunnel: -```bash - ssh -TN -L 12345:cn77:11111 username@anselm.it4i.cz +```console +ssh -TN -L 12345:cn77:11111 username@anselm.it4i.cz ``` replace username with your login and cn77 with the name of compute node your ParaView server is running on (see previous step). If you use PuTTY on Windows, load Anselm connection configuration, t>hen go to Connection-> SSH>->Tunnels to set up the port forwarding. Click Remote radio button. Insert 12345 to Source port textbox. Insert cn77:11111. Click Add button, then Open. @@ -64,8 +64,8 @@ Port : 12345 Click Configure, Save, the configuration is now saved for later use. Now click Connect to connect to the ParaView server. In your terminal where you have interactive session with ParaView server launched, you should see: -```bash - Client connected. +```console +Client connected. ``` You can now use Parallel ParaView. diff --git a/docs.it4i/anselm/software/virtualization.md b/docs.it4i/anselm/software/virtualization.md index a5c7c95aa5f2c1df601606ecc42ed2c8398fb249..109a771b0c5307471a0131e61298eae9e242467f 100644 --- a/docs.it4i/anselm/software/virtualization.md +++ b/docs.it4i/anselm/software/virtualization.md @@ -154,7 +154,7 @@ Create job script according recommended Example job for Windows virtual machine: -```bash +```bat #/bin/sh JOB_DIR=/scratch/$USER/win/${PBS_JOBID} @@ -192,7 +192,7 @@ Job script links application data (win), input data (data) and run script (run.b Example run script (run.bat) for Windows virtual machine: -```bash +```doscon z: cd winappl call application.bat z:data z:output @@ -210,40 +210,37 @@ Virtualization is enabled only on compute nodes, virtualization does not work on Load QEMU environment module: -```bash - $ module add qemu +```console +$ module add qemu ``` Get help -```bash - $ man qemu +```console +$ man qemu ``` Run virtual machine (simple) -```bash - $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -vnc :0 - - $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -vnc :0 +```console +$ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -vnc :0 +$ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -vnc :0 ``` You can access virtual machine by VNC viewer (option -vnc) connecting to IP address of compute node. For VNC you must use VPN network. 
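For illustration only, a client connection over the VPN could then look like this (the node name cn204 is a placeholder for the compute node actually allocated to your job, and the display number must match the -vnc option used above):

```console
$ vncviewer cn204:0
```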
Install virtual machine from ISO file -```bash - $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -cdrom linux-install.iso -boot d -vnc :0 - - $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -cdrom win-install.iso -boot d -vnc :0 +```console +$ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -cdrom linux-install.iso -boot d -vnc :0 +$ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -cdrom win-install.iso -boot d -vnc :0 ``` Run virtual machine using optimized devices, user network back-end with sharing and port forwarding, in snapshot mode -```bash - $ qemu-system-x86_64 -drive file=linux.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 -vnc :0 -snapshot - - $ qemu-system-x86_64 -drive file=win.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 -vnc :0 -snapshot +```console +$ qemu-system-x86_64 -drive file=linux.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 -vnc :0 -snapshot +$ qemu-system-x86_64 -drive file=win.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 -vnc :0 -snapshot ``` Thanks to port forwarding you can access virtual machine via SSH (Linux) or RDP (Windows) connecting to IP address of compute node (and port 2222 for SSH). You must use VPN network). @@ -259,22 +256,22 @@ In default configuration IP network 10.0.2.0/24 is used, host has IP address 10. Simple network setup -```bash - $ qemu-system-x86_64 ... -net nic -net user +```console +$ qemu-system-x86_64 ... -net nic -net user ``` (It is default when no -net options are given.) Simple network setup with sharing and port forwarding (obsolete but simpler syntax, lower performance) -```bash - $ qemu-system-x86_64 ... -net nic -net user,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 +```console +$ qemu-system-x86_64 ... -net nic -net user,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 ``` Optimized network setup with sharing and port forwarding -```bash - $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 +```console +$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 ``` ### Advanced Networking @@ -285,40 +282,40 @@ Sometime your virtual machine needs access to internet (install software, update Load VDE enabled QEMU environment module (unload standard QEMU module first if necessary). -```bash - $ module add qemu/2.1.2-vde2 +```console +$ module add qemu/2.1.2-vde2 ``` Create virtual network switch. -```bash - $ vde_switch -sock /tmp/sw0 -mgmt /tmp/sw0.mgmt -daemon +```console +$ vde_switch -sock /tmp/sw0 -mgmt /tmp/sw0.mgmt -daemon ``` Run SLIRP daemon over SSH tunnel on login node and connect it to virtual network switch. 
-```bash - $ dpipe vde_plug /tmp/sw0 = ssh login1 $VDE2_DIR/bin/slirpvde -s - --dhcp & +```console +$ dpipe vde_plug /tmp/sw0 = ssh login1 $VDE2_DIR/bin/slirpvde -s - --dhcp & ``` Run qemu using vde network back-end, connect to created virtual switch. Basic setup (obsolete syntax) -```bash - $ qemu-system-x86_64 ... -net nic -net vde,sock=/tmp/sw0 +```console +$ qemu-system-x86_64 ... -net nic -net vde,sock=/tmp/sw0 ``` Setup using virtio device (obsolete syntax) -```bash - $ qemu-system-x86_64 ... -net nic,model=virtio -net vde,sock=/tmp/sw0 +```console +$ qemu-system-x86_64 ... -net nic,model=virtio -net vde,sock=/tmp/sw0 ``` Optimized setup -```bash - $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev vde,id=net0,sock=/tmp/sw0 +```console +$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev vde,id=net0,sock=/tmp/sw0 ``` #### TAP Interconnect @@ -329,9 +326,8 @@ Cluster Anselm provides TAP device tap0 for your job. TAP interconnect does not Run qemu with TAP network back-end: -```bash - $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net1 - -netdev tap,id=net1,ifname=tap0,script=no,downscript=no +```console +$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net1 -netdev tap,id=net1,ifname=tap0,script=no,downscript=no ``` Interface tap0 has IP address 192.168.1.1 and network mask 255.255.255.0 (/24). In virtual machine use IP address from range 192.168.1.2-192.168.1.254. For your convenience some ports on tap0 interface are redirected to higher numbered ports, so you as non-privileged user can provide services on these ports. @@ -344,15 +340,17 @@ Redirected ports: You can configure IP address of virtual machine statically or dynamically. For dynamic addressing provide your DHCP server on port 3067 of tap0 interface, you can also provide your DNS server on port 3053 of tap0 interface for example: -```bash - $ dnsmasq --interface tap0 --bind-interfaces -p 3053 --dhcp-alternate-port=3067,68 --dhcp-range=192.168.1.15,192.168.1.32 --dhcp-leasefile=/tmp/dhcp.leasefile +```console +$ dnsmasq --interface tap0 --bind-interfaces -p 3053 --dhcp-alternate-port=3067,68 --dhcp-range=192.168.1.15,192.168.1.32 --dhcp-leasefile=/tmp/dhcp.leasefile ``` You can also provide your SMB services (on ports 3139, 3445) to obtain high performance data sharing. Example smb.conf (not optimized) -```bash +```console +$ cat smb.conf + [global] socket address=192.168.1.1 smb ports = 3445 3139 @@ -387,8 +385,8 @@ Example smb.conf (not optimized) Run SMB services -```bash - smbd -s /tmp/qemu-smb/smb.conf +```console +$ smbd -s /tmp/qemu-smb/smb.conf ``` Virtual machine can of course have more than one network interface controller, virtual machine can use more than one network back-end. So, you can combine for example use network back-end and TAP interconnect. @@ -397,15 +395,15 @@ Virtual machine can of course have more than one network interface controller, v In snapshot mode image is not written, changes are written to temporary file (and discarded after virtual machine exits). **It is strongly recommended mode for running your jobs.** Set TMPDIR environment variable to local scratch directory for placement temporary files. -```bash - $ export TMPDIR=/lscratch/${PBS_JOBID} - $ qemu-system-x86_64 ... -snapshot +```console +$ export TMPDIR=/lscratch/${PBS_JOBID} +$ qemu-system-x86_64 ... -snapshot ``` ### Windows Guests For Windows guests we recommend these options, life will be easier: -```bash - $ qemu-system-x86_64 ... 
-localtime -usb -usbdevice tablet +```console +$ qemu-system-x86_64 ... -localtime -usb -usbdevice tablet ``` diff --git a/docs.it4i/anselm/storage.md b/docs.it4i/anselm/storage.md index 7beb9678fb422baa514d9393af5d94539c8f000d..d4265438e67489452d1748e7149ad01d2c9e1b5d 100644 --- a/docs.it4i/anselm/storage.md +++ b/docs.it4i/anselm/storage.md @@ -31,14 +31,14 @@ There is default stripe configuration for Anselm Lustre filesystems. However, us Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. -```bash +```console $ lfs getstripe dir|filename $ lfs setstripe -s stripe_size -c stripe_count -o stripe_offset dir|filename ``` Example: -```bash +```console $ lfs getstripe /scratch/username/ /scratch/username/ stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 @@ -53,7 +53,7 @@ In this example, we view current stripe setting of the /scratch/username/ direct Use lfs check OSTs to see the number and status of active OSTs for each filesystem on Anselm. Learn more by reading the man page -```bash +```console $ lfs check osts $ man lfs ``` @@ -98,7 +98,7 @@ The architecture of Lustre on Anselm is composed of two metadata servers (MDS) a * 2 groups of 5 disks in RAID5 * 2 hot-spare disks -\###HOME +### HOME The HOME filesystem is mounted in directory /home. Users home directories /home/username reside on this filesystem. Accessible capacity is 320TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 250GB per user. If 250GB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. @@ -127,14 +127,14 @@ Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for t | Default stripe count | 1 | | Number of OSTs | 22 | -\###SCRATCH +### SCRATCH The SCRATCH filesystem is mounted in directory /scratch. Users may freely create subdirectories and files on the filesystem. Accessible capacity is 146TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 100TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem and deny service to other users. If 100TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. !!! note The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory. - >Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files. + Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files. Files on the SCRATCH filesystem that are **not accessed for more than 90 days** will be automatically **deleted**. 
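If you plan to keep large files on the SCRATCH filesystem, the stripe settings described above can be adjusted per directory before the data is written. This is a sketch only; the directory name and the stripe count of 10 are arbitrary illustrations, not tuned recommendations:

```console
$ lfs setstripe -c 10 /scratch/username/big_files
$ lfs getstripe /scratch/username/big_files
```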
@@ -157,13 +157,13 @@ The SCRATCH filesystem is realized as Lustre parallel filesystem and is availabl User quotas on the file systems can be checked and reviewed using following command: -```bash +```console $ lfs quota dir ``` Example for Lustre HOME directory: -```bash +```console $ lfs quota /home Disk quotas for user user001 (uid 1234): Filesystem kbytes quota limit grace files quota limit grace @@ -177,7 +177,7 @@ In this example, we view current quota size limit of 250GB and 300MB currently u Example for Lustre SCRATCH directory: -```bash +```console $ lfs quota /scratch Disk quotas for user user001 (uid 1234): Filesystem kbytes quota limit grace files quota limit grace @@ -191,13 +191,13 @@ In this example, we view current quota size limit of 100TB and 8KB currently use To have a better understanding of where the space is exactly used, you can use following command to find out. -```bash +```console $ du -hs dir ``` Example for your HOME directory: -```bash +```console $ cd /home $ du -hs * .[a-zA-z0-9]* | grep -E "[0-9]*G|[0-9]*M" | sort -hr 258M cuda-samples @@ -211,11 +211,11 @@ This will list all directories which are having MegaBytes or GigaBytes of consum To have a better understanding of previous commands, you can read manpages. -```bash +```console $ man lfs ``` -```bash +```console $ man du ``` @@ -225,7 +225,7 @@ Extended ACLs provide another security mechanism beside the standard POSIX ACLs ACLs on a Lustre file system work exactly like ACLs on any Linux file system. They are manipulated with the standard tools in the standard manner. Below, we create a directory and allow a specific user access. -```bash +```console [vop999@login1.anselm ~]$ umask 027 [vop999@login1.anselm ~]$ mkdir test [vop999@login1.anselm ~]$ ls -ld test @@ -353,40 +353,40 @@ The SSHFS provides a very convenient way to access the CESNET Storage. The stora First, create the mount point -```bash - $ mkdir cesnet +```console +$ mkdir cesnet ``` Mount the storage. Note that you can choose among the ssh.du1.cesnet.cz (Plzen), ssh.du2.cesnet.cz (Jihlava), ssh.du3.cesnet.cz (Brno) Mount tier1_home **(only 5120M !)**: -```bash - $ sshfs username@ssh.du1.cesnet.cz:. cesnet/ +```console +$ sshfs username@ssh.du1.cesnet.cz:. cesnet/ ``` For easy future access from Anselm, install your public key -```bash - $ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys +```console +$ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys ``` Mount tier1_cache_tape for the Storage VO: -```bash - $ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ +```console +$ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ ``` View the archive, copy the files and directories in and out -```bash - $ ls cesnet/ - $ cp -a mydir cesnet/. - $ cp cesnet/myfile . +```console +$ ls cesnet/ +$ cp -a mydir cesnet/. +$ cp cesnet/myfile . ``` Once done, please remember to unmount the storage -```bash - $ fusermount -u cesnet +```console +$ fusermount -u cesnet ``` ### Rsync Access @@ -402,16 +402,16 @@ Rsync finds files that need to be transferred using a "quick check" algorithm (b Transfer large files to/from CESNET storage, assuming membership in the Storage VO -```bash - $ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . +```console +$ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. +$ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . 
``` Transfer large directories to/from CESNET storage, assuming membership in the Storage VO -```bash - $ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . +```console +$ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. +$ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . ``` Transfer rates of about 28 MB/s can be expected. diff --git a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/vnc.md b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/vnc.md index f064b2e6a89dc4b2c8290a0b552eac82ca973941..1778c429ab128fa94fc6e847fd7861755b45bef4 100644 --- a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/vnc.md +++ b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/vnc.md @@ -9,7 +9,7 @@ The recommended clients are [TightVNC](http://www.tightvnc.com) or [TigerVNC](ht !!! note Local VNC password should be set before the first login. Do use a strong password. -```bash +```console [username@login2 ~]$ vncpasswd Password: Verify: @@ -24,16 +24,16 @@ Verify: You can find ports which are already occupied. Here you can see that ports " /usr/bin/Xvnc :79" and " /usr/bin/Xvnc :60" are occupied. -```bash +```console [username@login2 ~]$ ps aux | grep Xvnc -username 5971 0.0 0.0 201072 92564 ? SN Sep22 4:19 /usr/bin/Xvnc :79 -desktop login2:79 (username) -auth /home/gre196/.Xauthority -geometry 1024x768 -rfbwait 30000 -rfbauth /home/username/.vnc/passwd -rfbport 5979 -fp catalogue:/etc/X11/fontpath.d -pn -username 10296 0.0 0.0 131772 21076 pts/29 SN 13:01 0:01 /usr/bin/Xvnc :60 -desktop login2:61 (username) -auth /home/username/.Xauthority -geometry 1600x900 -depth 16 -rfbwait 30000 -rfbauth /home/jir13/.vnc/passwd -rfbport 5960 -fp catalogue:/etc/X11/fontpath.d -pn +username 5971 0.0 0.0 201072 92564 ? SN Sep22 4:19 /usr/bin/Xvnc :79 -desktop login2:79 (username) -auth /home/vop999/.Xauthority -geometry 1024x768 -rfbwait 30000 -rfbauth /home/username/.vnc/passwd -rfbport 5979 -fp catalogue:/etc/X11/fontpath.d -pn +username 10296 0.0 0.0 131772 21076 pts/29 SN 13:01 0:01 /usr/bin/Xvnc :60 -desktop login2:61 (username) -auth /home/vop999/.Xauthority -geometry 1600x900 -depth 16 -rfbwait 30000 -rfbauth /home/vop999/.vnc/passwd -rfbport 5960 -fp catalogue:/etc/X11/fontpath.d -pn ..... ``` Choose free port e.g. 
61 and start your VNC server: -```bash +```console [username@login2 ~]$ vncserver :61 -geometry 1600x900 -depth 16 New 'login2:1 (username)' desktop is login2:1 @@ -44,7 +44,7 @@ Log file is /home/username/.vnc/login2:1.log Check if VNC server is started on the port (in this example 61): -```bash +```console [username@login2 .vnc]$ vncserver -list TigerVNC server sessions: @@ -55,10 +55,10 @@ X DISPLAY # PROCESS ID Another command: -```bash +```console [username@login2 .vnc]$ ps aux | grep Xvnc -username 10296 0.0 0.0 131772 21076 pts/29 SN 13:01 0:01 /usr/bin/Xvnc :61 -desktop login2:61 (username) -auth /home/jir13/.Xauthority -geometry 1600x900 -depth 16 -rfbwait 30000 -rfbauth /home/username/.vnc/passwd -rfbport 5961 -fp catalogue:/etc/X11/fontpath.d -pn +username 10296 0.0 0.0 131772 21076 pts/29 SN 13:01 0:01 /usr/bin/Xvnc :61 -desktop login2:61 (username) -auth /home/vop999/.Xauthority -geometry 1600x900 -depth 16 -rfbwait 30000 -rfbauth /home/username/.vnc/passwd -rfbport 5961 -fp catalogue:/etc/X11/fontpath.d -pn ``` To access the VNC server you have to create a tunnel between the login node using TCP **port 5961** and your machine using a free TCP port (for simplicity the very same, in this case). @@ -70,13 +70,13 @@ To access the VNC server you have to create a tunnel between the login node usin At your machine, create the tunnel: -```bash +```console local $ ssh -TN -f username@login2.cluster-name.it4i.cz -L 5961:localhost:5961 ``` Issue the following command to check the tunnel is established (please note the PID 2022 in the last column, you'll need it for closing the tunnel): -```bash +```console local $ netstat -natp | grep 5961 (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) @@ -86,14 +86,14 @@ tcp6 0 0 ::1:5961 :::* LISTEN Or on Mac OS use this command: -```bash +```console local-mac $ lsof -n -i4TCP:5961 | grep LISTEN ssh 75890 sta545 7u IPv4 0xfb062b5c15a56a3b 0t0 TCP 127.0.0.1:5961 (LISTEN) ``` Connect with the VNC client: -```bash +```console local $ vncviewer 127.0.0.1:5961 ``` @@ -101,7 +101,7 @@ In this example, we connect to VNC server on port 5961, via the ssh tunnel. The You have to destroy the SSH tunnel which is still running at the background after you finish the work. Use the following command (PID 2022 in this case, see the netstat command above): -```bash +```console kill 2022 ``` @@ -113,7 +113,7 @@ Start vncserver using command vncserver described above. Search for the localhost and port number (in this case 127.0.0.1:5961). -```bahs +```console [username@login2 .vnc]$ netstat -tanp | grep Xvnc (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) @@ -160,7 +160,7 @@ Uncheck both options below the slider: If the screen gets locked you have to kill the screensaver. Do not to forget to disable the screensaver then. -```bash +```console [username@login2 .vnc]$ ps aux | grep screen username 1503 0.0 0.0 103244 892 pts/4 S+ 14:37 0:00 grep screen username 24316 0.0 0.0 270564 3528 ? Ss 14:12 0:00 gnome-screensaver @@ -172,7 +172,7 @@ username 24316 0.0 0.0 270564 3528 ? 
Ss 14:12 0:00 gnome-screensa You should kill your VNC server using command: -```bash +```console [username@login2 .vnc]$ vncserver -kill :61 Killing Xvnc process ID 7074 Xvnc process ID 7074 already killed @@ -180,7 +180,7 @@ Xvnc process ID 7074 already killed Or this way: -```bash +```console [username@login2 .vnc]$ pkill vnc ``` @@ -194,19 +194,19 @@ Open a Terminal (Applications -> System Tools -> Terminal). Run all the next com Allow incoming X11 graphics from the compute nodes at the login node: -```bash +```console $ xhost + ``` Get an interactive session on a compute node (for more detailed info [look here](../../../anselm/job-submission-and-execution/)). Use the **-v DISPLAY** option to propagate the DISPLAY on the compute node. In this example, we want a complete node (24 cores in this example) from the production queue: -```bash +```console $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A PROJECT_ID -q qprod -l select=1:ncpus=24 ``` Test that the DISPLAY redirection into your VNC session works, by running a X11 application (e. g. XTerm) on the assigned compute node: -```bash +```console $ xterm ``` diff --git a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/x-window-system.md b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/x-window-system.md index b9c6951295a6b4d96fceb53c6d383464bee6d5c1..961123f511f779edc6e508aec5e6461f5506f06d 100644 --- a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/x-window-system.md +++ b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/x-window-system.md @@ -9,7 +9,7 @@ The X Window system is a principal way to get GUI access to the clusters. The ** In order to display graphical user interface GUI of various software tools, you need to enable the X display forwarding. On Linux and Mac, log in using the -X option tho ssh client: -```bash +```console local $ ssh -X username@cluster-name.it4i.cz ``` @@ -19,13 +19,13 @@ On Windows use the PuTTY client to enable X11 forwarding. In PuTTY menu, go to C To verify the forwarding, type -```bash +```console $ echo $DISPLAY ``` if you receive something like -```bash +```console localhost:10.0 ``` @@ -44,8 +44,8 @@ Mac OS users need to install [XQuartz server](https://www.xquartz.org). There are variety of X servers available for Windows environment. The commercial Xwin32 is very stable and rich featured. The Cygwin environment provides fully featured open-source XWin X server. For simplicity, we recommend open-source X server by the [Xming project](http://sourceforge.net/projects/xming/). For stability and full features we recommend the [XWin](http://x.cygwin.com/) X server by Cygwin -| How to use Xwin | How to use Xming | -| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | +| How to use Xwin | How to use Xming | +|--- | --- | | [Install Cygwin](http://x.cygwin.com/) Find and execute XWin.exe to start the X server on Windows desktop computer.[If no able to forward X11 using PuTTY to CygwinX](#if-no-able-to-forward-x11-using-putty-to-cygwinx) | Use Xlaunch to configure the Xming. Run Xming to start the X server on Windows desktop computer. 
| Read more on [http://www.math.umn.edu/systems_guide/putty_xwin32.html](http://www.math.umn.edu/systems_guide/putty_xwin32.shtml) @@ -57,12 +57,12 @@ Read more on [http://www.math.umn.edu/systems_guide/putty_xwin32.html](http://ww Then launch the application as usual. Use the & to run the application in background. -```bash -$ module load intel (idb and gvim not installed yet) +```console +$ ml intel (idb and gvim not installed yet) $ gvim & ``` -```bash +```console $ xterm ``` @@ -72,7 +72,7 @@ In this example, we activate the intel programing environment tools, then start Allocate the compute nodes using -X option on the qsub command -```bash +```console $ qsub -q qexp -l select=2:ncpus=24 -X -I ``` @@ -80,7 +80,7 @@ In this example, we allocate 2 nodes via qexp queue, interactively. We request X **Better performance** is obtained by logging on the allocated compute node via ssh, using the -X option. -```bash +```console $ ssh -X r24u35n680 ``` @@ -95,13 +95,13 @@ The Gnome 2.28 GUI environment is available on the clusters. We recommend to use To run the remote Gnome session in a window on Linux/OS X computer, you need to install Xephyr. Ubuntu package is xserver-xephyr, on OS X it is part of [XQuartz](http://xquartz.macosforge.org/landing/). First, launch Xephyr on local machine: -```bash +```console local $ Xephyr -ac -screen 1024x768 -br -reset -terminate :1 & ``` This will open a new X window with size 1024 x 768 at DISPLAY :1. Next, ssh to the cluster with DISPLAY environment variable set and launch gnome-session -```bash +```console local $ DISPLAY=:1.0 ssh -XC yourname@cluster-name.it4i.cz -i ~/.ssh/path_to_your_key ... cluster-name MOTD... yourname@login1.cluster-namen.it4i.cz $ gnome-session & @@ -109,7 +109,7 @@ yourname@login1.cluster-namen.it4i.cz $ gnome-session & On older systems where Xephyr is not available, you may also try Xnest instead of Xephyr. Another option is to launch a new X server in a separate console, via: -```bash +```console xinit /usr/bin/ssh -XT -i .ssh/path_to_your_key yourname@cluster-namen.it4i.cz gnome-session -- :1 vt12 ``` @@ -122,7 +122,7 @@ Use Xlaunch to start the Xming server or run the XWin.exe. Select the "One windo Log in to the cluster, using PuTTY. On the cluster, run the gnome-session command. -```bash +```console $ gnome-session & ``` @@ -132,7 +132,7 @@ Use System-Log Out to close the gnome-session ### if No Able to Forward X11 Using PuTTY to CygwinX -```bash +```console [usename@login1.anselm ~]$ gnome-session & [1] 23691 [usename@login1.anselm ~]$ PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused diff --git a/docs.it4i/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md b/docs.it4i/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md index a2a4d429fc06d4943a0ab89df247f410ccdc4bd2..5a952ea24c738ad59acf7b94bed9fe23602b83e9 100644 --- a/docs.it4i/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md +++ b/docs.it4i/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md @@ -4,9 +4,9 @@ After logging in, you can see .ssh/ directory with SSH keys and authorized_keys file: -```bash - $ cd /home/username/ - $ ls -la .ssh/ +```console +$ cd /home/username/ +$ ls -la .ssh/ total 24 drwx------ 2 username username 4096 May 13 15:12 . drwxr-x---22 username username 4096 May 13 07:22 .. 
@@ -21,18 +21,18 @@ After logging in, you can see .ssh/ directory with SSH keys and authorized_keys ## Access Privileges on .ssh Folder -* .ssh directory: 700 (drwx------) -* Authorized_keys, known_hosts and public key (.pub file): 644 (-rw-r--r--) -* Private key (id_rsa/id_rsa.ppk): 600 (-rw-------) - -```bash - cd /home/username/ - chmod 700 .ssh/ - chmod 644 .ssh/authorized_keys - chmod 644 .ssh/id_rsa.pub - chmod 644 .ssh/known_hosts - chmod 600 .ssh/id_rsa - chmod 600 .ssh/id_rsa.ppk +* .ssh directory: `700 (drwx------)` +* Authorized_keys, known_hosts and public key (.pub file): `644 (-rw-r--r--)` +* Private key (id_rsa/id_rsa.ppk): `600 (-rw-------)` + +```console +$ cd /home/username/ +$ chmod 700 .ssh/ +$ chmod 644 .ssh/authorized_keys +$ chmod 644 .ssh/id_rsa.pub +$ chmod 644 .ssh/known_hosts +$ chmod 600 .ssh/id_rsa +$ chmod 600 .ssh/id_rsa.ppk ``` ## Private Key @@ -40,11 +40,11 @@ After logging in, you can see .ssh/ directory with SSH keys and authorized_keys !!! note The path to a private key is usually /home/username/.ssh/ -Private key file in "id_rsa" or `*.ppk` format is used to authenticate with the servers. Private key is present locally on local side and used for example in SSH agent Pageant (for Windows users). The private key should always be kept in a safe place. +Private key file in `id_rsa` or `*.ppk` format is used to authenticate with the servers. Private key is present locally on local side and used for example in SSH agent Pageant (for Windows users). The private key should always be kept in a safe place. An example of private key format: -```bash +```console -----BEGIN RSA PRIVATE KEY----- MIIEpAIBAAKCAQEAqbo7jokygnBpG2wYa5NB45ns6+UKTNLMLHF0BO3zmRtKEElE aGqXfbYwvXlcuRb2d9/Y5dVpCZHV0kbY3NhtVOcEIe+1ROaiU9BEsUAhMNEvgiLV @@ -76,11 +76,11 @@ An example of private key format: ## Public Key -Public key file in "\*.pub" format is used to verify a digital signature. Public key is present on the remote side and allows access to the owner of the matching private key. +Public key file in `*.pub` format is used to verify a digital signature. Public key is present on the remote side and allows access to the owner of the matching private key. An example of public key format: -```bash +```console ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCpujuOiTKCcGkbbBhrk0Hjmezr5QpM0swscXQE7fOZG0oQSURoapd9tjC9eVy5FvZ339jl1WkJkdXSRtjc2G1U5wQh77VE5qJT0ESxQCEw0S+CItWBKqXhC9E7gFY+UyP5YBZcOneh6gGHyCVfK6H215vzKr3x+/WvWl5gZGtbf+zhX6o4RJDRdjZPutYJhEsg/qtMxcCtMjfm/dZTnXeafuebV8nug3RCBUflvRb1XUrJuiX28gsd4xfG/P6L/mNMR8s4kmJEZhlhxpj8Th0iIc+XciVtXuGWQrbddcVRLxAmvkYAPGnVVOQeNj69pqAR/GXaFAhvjYkseEowQao1 username@organization.example.com ``` @@ -88,8 +88,8 @@ ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCpujuOiTKCcGkbbBhrk0Hjmezr5QpM0swscXQE7fOZ First, generate a new keypair of your public and private key: -```bash - local $ ssh-keygen -C 'username@organization.example.com' -f additional_key +```console +local $ ssh-keygen -C 'username@organization.example.com' -f additional_key ``` !!! note @@ -99,12 +99,12 @@ You can insert additional public key into authorized_keys file for authenticatio Example: -```bash - $ cat additional_key.pub > ~/.ssh/authorized_keys +```console +$ cat additional_key.pub > ~/.ssh/authorized_keys ``` In this example, we add an additional public key, stored in file additional_key.pub into the authorized_keys. Next time we log in, we will be able to use the private addtional_key key to log in. 
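+
+Note that the `>` redirection above replaces the contents of authorized_keys. To add the key while keeping the keys already present, append with `>>` instead (a minimal sketch; backing the file up first is a sensible precaution):
+
+```console
+$ cp ~/.ssh/authorized_keys ~/.ssh/authorized_keys.bak
+$ cat additional_key.pub >> ~/.ssh/authorized_keys
+```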
## How to Remove Your Own Key -Removing your key from authorized_keys can be done simply by deleting the corresponding public key which can be identified by a comment at the end of line (eg. _username@organization.example.com_). +Removing your key from authorized_keys can be done simply by deleting the corresponding public key which can be identified by a comment at the end of line (eg. `username@organization.example.com`). diff --git a/docs.it4i/general/obtaining-login-credentials/certificates-faq.md b/docs.it4i/general/obtaining-login-credentials/certificates-faq.md index bf0b5c5acc85d611237908cfecf5f8e73b07afd5..af6fb236b5204d44c067c103a796156f5d0ede8b 100644 --- a/docs.it4i/general/obtaining-login-credentials/certificates-faq.md +++ b/docs.it4i/general/obtaining-login-credentials/certificates-faq.md @@ -57,7 +57,7 @@ It is worth noting that gsissh-term and DART automatically updates their CA cert Lastly, if you need the CA certificates for a personal Globus 5 installation, then you can install the CA certificates from a MyProxy server with the following command. -```bash +```console myproxy-get-trustroots -s myproxy-prace.lrz.de ``` @@ -77,14 +77,14 @@ The following examples are for Unix/Linux operating systems only. To convert from PEM to p12, enter the following command: -```bash +```console openssl pkcs12 -export -in usercert.pem -inkey userkey.pem -out username.p12 ``` To convert from p12 to PEM, type the following _four_ commands: -```bash +```console openssl pkcs12 -in username.p12 -out usercert.pem -clcerts -nokeys openssl pkcs12 -in username.p12 -out userkey.pem -nocerts chmod 444 usercert.pem @@ -93,14 +93,14 @@ To convert from p12 to PEM, type the following _four_ commands: To check your Distinguished Name (DN), enter the following command: -```bash +```console openssl x509 -in usercert.pem -noout -subject -nameopt RFC2253 ``` To check your certificate (e.g., DN, validity, issuer, public key algorithm, etc.), enter the following command: -```bash +```console openssl x509 -in usercert.pem -text -noout ``` @@ -110,7 +110,7 @@ To download openssl if not pre-installed, [please visit](https://www.openssl.org IT4innovations recommends the java based keytool utility to create and manage keystores, which themselves are stores of keys and certificates. For example if you want to convert your pkcs12 formatted key pair into a java keystore you can use the following command. 
-```bash +```console keytool -importkeystore -srckeystore $my_p12_cert -destkeystore $my_keystore -srcstoretype pkcs12 -deststoretype jks -alias $my_nickname -destalias $my_nickname @@ -120,7 +120,7 @@ where $my_p12_cert is the name of your p12 (pkcs12) certificate, $my_keystore is You also can import CA certificates into your java keystore with the tool, e.g.: -```bash +```console keytool -import -trustcacerts -alias $mydomain -file $mydomain.crt -keystore $my_keystore ``` diff --git a/docs.it4i/general/obtaining-login-credentials/obtaining-login-credentials.md b/docs.it4i/general/obtaining-login-credentials/obtaining-login-credentials.md index 5ddfcb18640ac6b8755df7e704dbd48b66e8c12a..b5202bb65bbd85cce248d61da6d9a4c10d0a9a29 100644 --- a/docs.it4i/general/obtaining-login-credentials/obtaining-login-credentials.md +++ b/docs.it4i/general/obtaining-login-credentials/obtaining-login-credentials.md @@ -40,7 +40,7 @@ In order to authorize a Collaborator to utilize the allocated resources, the PI Example (except the subject line which must be in English, you may use Czech or Slovak language for communication with us): -```bash +```console Subject: Authorization to IT4Innovations Dear support, @@ -72,7 +72,7 @@ Once authorized by PI, every person (PI or Collaborator) wishing to access the c Example (except the subject line which must be in English, you may use Czech or Slovak language for communication with us): -```bash +```console Subject: Access to IT4Innovations Dear support, @@ -100,7 +100,7 @@ The clusters are accessed by the [private key](../accessing-the-clusters/shell-a On Linux, use -```bash +```console local $ ssh-keygen -f id_rsa -p ``` @@ -134,8 +134,8 @@ Follow these steps **only** if you can not obtain your certificate in a standard * Go to [COMODO Application for Secure Email Certificate](https://secure.comodo.com/products/frontpage?area=SecureEmailCertificate). * Fill in the form, accept the Subscriber Agreement and submit it by the _Next_ button. - * Type in the e-mail address, which you intend to use for communication with us. - * Don't forget your chosen _Revocation password_. + * Type in the e-mail address, which you intend to use for communication with us. + * Don't forget your chosen _Revocation password_. * You will receive an e-mail with link to collect your certificate. Be sure to open the link in the same browser, in which you submited the application. * Your browser should notify you, that the certificate has been correctly installed in it. Now you will need to save it as a file. * In Firefox navigate to _Options > Advanced > Certificates > View Certificates_. diff --git a/docs.it4i/index.md b/docs.it4i/index.md index 7e97161c12a16c0a8bec4540a77760cebf122063..b7a7bb2a724c74b121a8d0381d65881078b182fa 100644 --- a/docs.it4i/index.md +++ b/docs.it4i/index.md @@ -47,13 +47,13 @@ In this documentation, you will find a number of pages containing examples. We u Cluster command prompt -```bash +```console $ ``` Your local linux host command prompt -```bash +```console local $ ``` diff --git a/docs.it4i/salomon/capacity-computing.md b/docs.it4i/salomon/capacity-computing.md index 702ef7f5220722e447a562f2e1397cb6c79e85f4..39b4c029903b04c067c9f9e2d7e48d13fac3f133 100644 --- a/docs.it4i/salomon/capacity-computing.md +++ b/docs.it4i/salomon/capacity-computing.md @@ -41,7 +41,7 @@ Assume we have 900 input files with name beginning with "file" (e. g. file001, . 
First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example: -```bash +```console $ find . -name 'file*' > tasklist ``` @@ -78,7 +78,7 @@ If huge number of parallel multicore (in means of multinode multithread, e. g. M To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing/#array_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME -J 1-900 jobscript 506493[].isrv5 ``` @@ -87,7 +87,7 @@ In this example, we submit a job array of 900 subjobs. Each subjob will run on f Sometimes for testing purposes, you may need to submit only one-element array. This is not allowed by PBSPro, but there's a workaround: -```bash +```console $ qsub -N JOBNAME -J 9-10:2 jobscript ``` @@ -97,7 +97,7 @@ This will only choose the lower index (9 in this example) for submitting/running Check status of the job array by the qstat command. -```bash +```console $ qstat -a 506493[].isrv5 isrv5: @@ -111,7 +111,7 @@ The status B means that some subjobs are already running. Check status of the first 100 subjobs by the qstat command. -```bash +```console $ qstat -a 12345[1-100].isrv5 isrv5: @@ -129,7 +129,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time Delete the entire job array. Running subjobs will be killed, queueing subjobs will be deleted. -```bash +```console $ qdel 12345[].isrv5 ``` @@ -137,13 +137,13 @@ Deleting large job arrays may take a while. Display status information for all user's jobs, job arrays, and subjobs. -```bash +```console $ qstat -u $USER -t ``` Display status information for all user's subjobs. -```bash +```console $ qstat -u $USER -tJ ``` @@ -158,7 +158,7 @@ GNU parallel is a shell tool for executing jobs in parallel using one or more co For more information and examples see the parallel man page: -```bash +```console $ module add parallel $ man parallel ``` @@ -173,7 +173,7 @@ Assume we have 101 input files with name beginning with "file" (e. g. file001, . First, we create a tasklist file, listing all tasks - all input files in our example: -```bash +```console $ find . -name 'file*' > tasklist ``` @@ -211,7 +211,7 @@ In this example, tasks from tasklist are executed via the GNU parallel. The jobs To submit the job, use the qsub command. The 101 tasks' job of the [example above](capacity-computing/#gp_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME jobscript 12345.dm2 ``` @@ -241,13 +241,13 @@ Assume we have 992 input files with name beginning with "file" (e. g. file001, . First, we create a tasklist file, listing all tasks - all input files in our example: -```bash +```console $ find . -name 'file*' > tasklist ``` Next we create a file, controlling how many tasks will be executed in one subjob -```bash +```console $ seq 32 > numtasks ``` @@ -296,7 +296,7 @@ When deciding this values, think about following guiding rules : To submit the job array, use the qsub -J command. 
The 992 tasks' job of the [example above](capacity-computing/#combined_example) may be submitted like this: -```bash +```console $ qsub -N JOBNAME -J 1-992:32 jobscript 12345[].dm2 ``` @@ -312,7 +312,7 @@ Download the examples in [capacity.zip](capacity.zip), illustrating the above li Unzip the archive in an empty directory on Anselm and follow the instructions in the README file -```bash +```console $ unzip capacity.zip $ cd capacity $ cat README diff --git a/docs.it4i/salomon/environment-and-modules.md b/docs.it4i/salomon/environment-and-modules.md index 9671013566e7621e42b2d0cdf693eed783f13197..5de3931c3d2b060d69a544343836c46caba20509 100644 --- a/docs.it4i/salomon/environment-and-modules.md +++ b/docs.it4i/salomon/environment-and-modules.md @@ -4,7 +4,7 @@ After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions and module loads in the .bashrc file -```bash +```console # ./bashrc # Source global definitions @@ -32,7 +32,7 @@ In order to configure your shell for running particular application on Salomon w Application modules on Salomon cluster are built using [EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). The modules are divided into the following structure: -```bash +```console base: Default module class bio: Bioinformatics, biology and biomedical cae: Computer Aided Engineering (incl. CFD) @@ -63,33 +63,33 @@ The modules may be loaded, unloaded and switched, according to momentary needs. To check available modules use -```bash -$ module avail +```console +$ module avail **or** ml av ``` To load a module, for example the Open MPI module use -```bash -$ module load OpenMPI +```console +$ module load OpenMPI **or** ml OpenMPI ``` loading the Open MPI module will set up paths and environment variables of your active shell such that you are ready to run the Open MPI software To check loaded modules use -```bash -$ module list +```console +$ module list **or** ml ``` To unload a module, for example the Open MPI module use -```bash -$ module unload OpenMPI +```console +$ module unload OpenMPI **or** ml -OpenMPI ``` Learn more on modules by reading the module man page -```bash +```console $ man module ``` diff --git a/docs.it4i/salomon/job-submission-and-execution.md b/docs.it4i/salomon/job-submission-and-execution.md index e7a4c4ff0039815504804e9f5fcb30959e8713e6..dea86065b70048af16b40dc9252525cfc0816de0 100644 --- a/docs.it4i/salomon/job-submission-and-execution.md +++ b/docs.it4i/salomon/job-submission-and-execution.md @@ -16,7 +16,7 @@ When allocating computational resources for the job, please specify Submit the job using the qsub command: -```bash +```console $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript ``` @@ -27,25 +27,25 @@ The qsub submits the job into the queue, in another words the qsub command creat ### Job Submission Examples -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=24,walltime=03:00:00 ./myjob ``` In this example, we allocate 64 nodes, 24 cores per node, for 3 hours. We allocate these resources via the qprod queue, consumed resources will be accounted to the Project identified by Project ID OPEN-0-0. Jobscript myjob will be executed on the first node in the allocation. -```bash +```console $ qsub -q qexp -l select=4:ncpus=24 -I ``` In this example, we allocate 4 nodes, 24 cores per node, for 1 hour. We allocate these resources via the qexp queue. 
The resources will be available interactively -```bash +```console $ qsub -A OPEN-0-0 -q qlong -l select=10:ncpus=24 ./myjob ``` In this example, we allocate 10 nodes, 24 cores per node, for 72 hours. We allocate these resources via the qlong queue. Jobscript myjob will be executed on the first node in the allocation. -```bash +```console $ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=24 ./myjob ``` @@ -57,13 +57,13 @@ To allocate a node with Xeon Phi co-processor, user needs to specify that in sel The absence of specialized queue for accessing the nodes with cards means, that the Phi cards can be utilized in any queue, including qexp for testing/experiments, qlong for longer jobs, qfree after the project resources have been spent, etc. The Phi cards are thus also available to PRACE users. There's no need to ask for permission to utilize the Phi cards in project proposals. -```bash +```console $ qsub -A OPEN-0-0 -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 ./myjob ``` In this example, we allocate 1 node, with 24 cores, with 2 Xeon Phi 7120p cards, running batch job ./myjob. The default time for qprod is used, e. g. 24 hours. -```bash +```console $ qsub -A OPEN-0-0 -I -q qlong -l select=4:ncpus=24:accelerator=True:naccelerators=2 -l walltime=56:00:00 -I ``` @@ -78,13 +78,13 @@ In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores The UV2000 (node uv1) offers 3328GB of RAM and 112 cores, distributed in 14 NUMA nodes. A NUMA node packs 8 cores and approx. 236GB RAM. In the PBS the UV2000 provides 14 chunks, a chunk per NUMA node (see [Resource allocation policy](resources-allocation-policy/)). The jobs on UV2000 are isolated from each other by cpusets, so that a job by one user may not utilize CPU or memory allocated to a job by other user. Always, full chunks are allocated, a job may only use resources of the NUMA nodes allocated to itself. -```bash +```console $ qsub -A OPEN-0-0 -q qfat -l select=14 ./myjob ``` In this example, we allocate all 14 NUMA nodes (corresponds to 14 chunks), 112 cores of the SGI UV2000 node for 72 hours. Jobscript myjob will be executed on the node uv1. -```bash +```console $ qsub -A OPEN-0-0 -q qfat -l select=1:mem=2000GB ./myjob ``` @@ -94,13 +94,13 @@ In this example, we allocate 2000GB of memory on the UV2000 for 72 hours. By req All qsub options may be [saved directly into the jobscript](#example-jobscript-for-mpi-calculation-with-preloaded-inputs). In such a case, no options to qsub are needed. -```bash +```console $ qsub ./myjob ``` By default, the PBS batch system sends an e-mail only when the job is aborted. Disabling mail events completely can be done like this: -```bash +```console $ qsub -m n ``` @@ -113,13 +113,13 @@ $ qsub -m n Specific nodes may be selected using PBS resource attribute host (for hostnames): -```bash +```console qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:host=r24u35n680+1:ncpus=24:host=r24u36n681 -I ``` Specific nodes may be selected using PBS resource attribute cname (for short names in cns[0-1]+ format): -```bash +```console qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:host=cns680+1:ncpus=24:host=cns681 -I ``` @@ -142,7 +142,7 @@ Nodes directly connected to the one InifiBand switch can be allocated using node In this example, we request all 9 nodes directly connected to the same switch using node grouping placement. 
-```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob ``` @@ -155,13 +155,13 @@ Nodes directly connected to the specific InifiBand switch can be selected using In this example, we request all 9 nodes directly connected to r4i1s0sw1 switch. -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24:switch=r4i1s0sw1 ./myjob ``` List of all InifiBand switches: -```bash +```console $ qmgr -c 'print node @a' | grep switch | awk '{print $6}' | sort -u r1i0s0sw0 r1i0s0sw1 @@ -169,12 +169,11 @@ r1i1s0sw0 r1i1s0sw1 r1i2s0sw0 ... -... ``` List of all all nodes directly connected to the specific InifiBand switch: -```bash +```console $ qmgr -c 'p n @d' | grep 'switch = r36sw3' | awk '{print $3}' | sort r36u31n964 r36u32n965 @@ -203,7 +202,7 @@ Nodes located in the same dimension group may be allocated using node grouping o In this example, we allocate 16 nodes in the same [hypercube dimension](7d-enhanced-hypercube/) 1 group. -```bash +```console $ qsub -A OPEN-0-0 -q qprod -l select=16:ncpus=24 -l place=group=ehc_1d -I ``` @@ -211,7 +210,7 @@ For better understanding: List of all groups in dimension 1: -```bash +```console $ qmgr -c 'p n @d' | grep ehc_1d | awk '{print $6}' | sort |uniq -c 18 r1i0 18 r1i1 @@ -222,7 +221,7 @@ $ qmgr -c 'p n @d' | grep ehc_1d | awk '{print $6}' | sort |uniq -c List of all all nodes in specific dimension 1 group: -```bash +```console $ $ qmgr -c 'p n @d' | grep 'ehc_1d = r1i0' | awk '{print $3}' | sort r1i0n0 r1i0n1 @@ -236,7 +235,7 @@ r1i0n11 !!! note Check status of your jobs using the **qstat** and **check-pbs-jobs** commands -```bash +```console $ qstat -a $ qstat -a -u username $ qstat -an -u username @@ -245,7 +244,7 @@ $ qstat -f 12345.isrv5 Example: -```bash +```console $ qstat -a srv11: @@ -261,7 +260,7 @@ In this example user1 and user2 are running jobs named job1, job2 and job3x. The Check status of your jobs using check-pbs-jobs command. Check presence of user's PBS jobs' processes on execution hosts. Display load, processes. Display job standard and error output. Continuously display (tail -f) job standard or error output. -```bash +```console $ check-pbs-jobs --check-all $ check-pbs-jobs --print-load --print-processes $ check-pbs-jobs --print-job-out --print-job-err @@ -271,7 +270,7 @@ $ check-pbs-jobs --jobid JOBID --tailf-job-out Examples: -```bash +```console $ check-pbs-jobs --check-all JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Check session id: OK @@ -282,7 +281,7 @@ r3i6n3: No process In this example we see that job 35141.dm2 currently runs no process on allocated node r3i6n2, which may indicate an execution error. -```bash +```console $ check-pbs-jobs --print-load --print-processes JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Print load @@ -298,7 +297,7 @@ r3i6n2: 99.7 run-task In this example we see that job 35141.dm2 currently runs process run-task on node r3i6n2, using one thread only, while node r3i6n3 is empty, which may indicate an execution error. 
-```bash +```console $ check-pbs-jobs --jobid 35141.dm2 --print-job-out JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Print job standard output: @@ -317,19 +316,19 @@ In this example, we see actual output (some iteration loops) of the job 35141.dm You may release your allocation at any time, using qdel command -```bash +```console $ qdel 12345.isrv5 ``` You may kill a running job by force, using qsig command -```bash +```console $ qsig -s 9 12345.isrv5 ``` Learn more by reading the pbs man page -```bash +```console $ man pbs_professional ``` @@ -345,7 +344,7 @@ The Jobscript is a user made script, controlling sequence of commands for execut !!! note The jobscript or interactive shell is executed on first of the allocated nodes. -```bash +```console $ qsub -q qexp -l select=4:ncpus=24 -N Name0 ./myjob $ qstat -n -u username @@ -362,7 +361,7 @@ In this example, the nodes r21u01n577, r21u02n578, r21u03n579, r21u04n580 were a !!! note The jobscript or interactive shell is by default executed in home directory -```bash +```console $ qsub -q qexp -l select=4:ncpus=24 -I qsub: waiting for job 15210.isrv5 to start qsub: job 15210.isrv5 ready @@ -380,7 +379,7 @@ The allocated nodes are accessible via ssh from login nodes. The nodes may acces Calculations on allocated nodes may be executed remotely via the MPI, ssh, pdsh or clush. You may find out which nodes belong to the allocation by reading the $PBS_NODEFILE file -```bash +```console qsub -q qexp -l select=2:ncpus=24 -I qsub: waiting for job 15210.isrv5 to start qsub: job 15210.isrv5 ready diff --git a/docs.it4i/salomon/network.md b/docs.it4i/salomon/network.md index 2f3f8a09f474c12ffe961781c39ea6fbea260a46..91da0de5ee2114ca159ee722f6b5f7db212a9c0d 100644 --- a/docs.it4i/salomon/network.md +++ b/docs.it4i/salomon/network.md @@ -16,7 +16,7 @@ The network provides **2170MB/s** transfer rates via the TCP connection (single ## Example -```bash +```console $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob $ qstat -n -u username Req'd Req'd Elap @@ -28,14 +28,14 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time In this example, we access the node r4i1n0 by Infiniband network via the ib0 interface. -```bash +```console $ ssh 10.17.35.19 ``` In this example, we get information of the Infiniband network. -```bash +```console $ ifconfig .... inet addr:10.17.35.19.... diff --git a/docs.it4i/salomon/prace.md b/docs.it4i/salomon/prace.md index bcd9c4bd6b73795b026f41abd3e0d3bd5351251e..a3c80fa840dd1ff4ccb0dd17cd1f3d82001bdbdb 100644 --- a/docs.it4i/salomon/prace.md +++ b/docs.it4i/salomon/prace.md @@ -36,14 +36,14 @@ Most of the information needed by PRACE users accessing the Salomon TIER-1 syste Before you start to use any of the services don't forget to create a proxy certificate from your certificate: -```bash - $ grid-proxy-init +```console +$ grid-proxy-init ``` To check whether your proxy certificate is still valid (by default it's valid 12 hours), use: -```bash - $ grid-proxy-info +```console +$ grid-proxy-info ``` To access Salomon cluster, two login nodes running GSI SSH service are available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners). 
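+
+The proxy certificate created above is valid for 12 hours by default. If that is too short for a long transfer or job, a longer-lived proxy can be requested at creation time (a sketch; it assumes your Globus tools support the -valid option, given as hours:minutes):
+
+```console
+$ grid-proxy-init -valid 24:00
+```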
@@ -60,14 +60,14 @@ It is recommended to use the single DNS name salomon-prace.it4i.cz which is dist | login3-prace.salomon.it4i.cz | 2222 | gsissh | login3 | | login4-prace.salomon.it4i.cz | 2222 | gsissh | login4 | -```bash - $ gsissh -p 2222 salomon-prace.it4i.cz +```console +$ gsissh -p 2222 salomon-prace.it4i.cz ``` When logging from other PRACE system, the prace_service script can be used: -```bash - $ gsissh `prace_service -i -s salomon` +```console +$ gsissh `prace_service -i -s salomon` ``` #### Access From Public Internet: @@ -82,27 +82,24 @@ It is recommended to use the single DNS name salomon.it4i.cz which is distribute | login3-prace.salomon.it4i.cz | 2222 | gsissh | login3 | | login4-prace.salomon.it4i.cz | 2222 | gsissh | login4 | -```bash - $ gsissh -p 2222 salomon.it4i.cz +```console +$ gsissh -p 2222 salomon.it4i.cz ``` When logging from other PRACE system, the prace_service script can be used: -```bash - $ gsissh `prace_service -e -s salomon` +```console +$ gsissh `prace_service -e -s salomon` ``` Although the preferred and recommended file transfer mechanism is [using GridFTP](prace/#file-transfers), the GSI SSH implementation on Salomon supports also SCP, so for small files transfer gsiscp can be used: -```bash - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ - - $ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ - - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ - - $ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ +```console +$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ +$ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ ``` ### Access to X11 Applications (VNC) @@ -111,8 +108,8 @@ If the user needs to run X11 based graphical application and does not have a X11 If the user uses GSI SSH based access, then the procedure is similar to the SSH based access ([look here](../general/accessing-the-clusters/graphical-user-interface/x-window-system/)), only the port forwarding must be done using GSI SSH: -```bash - $ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961 +```console +$ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961 ``` ### Access With SSH @@ -138,26 +135,26 @@ There's one control server and three backend servers for striping and/or backup Copy files **to** Salomon by running the following commands on your local machine: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ ``` Copy files **from** Salomon: -```bash - $ globus-url-copy 
gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` ### Access From Public Internet @@ -171,26 +168,26 @@ Or by using prace_service script: Copy files **to** Salomon by running the following commands on your local machine: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ ``` Copy files **from** Salomon: -```bash - $ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Or by using prace_service script: -```bash - $ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +```console +$ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ ``` Generally both shared file systems are available through GridFTP: @@ -222,8 +219,8 @@ All system wide installed software on the cluster is made available to the users PRACE users can use the "prace" module to use the [PRACE Common Production Environment](http://www.prace-ri.eu/prace-common-production-environment/). -```bash - $ module load prace +```console +$ module load prace ``` ### Resource Allocation and Job Execution @@ -251,8 +248,8 @@ Users who have undergone the full local registration procedure (including signin !!! note The **it4ifree** command is a part of it4i.portal.clients package, [located here](https://pypi.python.org/pypi/it4i.portal.clients). -```bash - $ it4ifree +```console +$ it4ifree Password: PID Total Used ...by me Free -------- ------- ------ -------- ------- @@ -262,9 +259,9 @@ Users who have undergone the full local registration procedure (including signin By default file system quota is applied. To check the current status of the quota (separate for HOME and SCRATCH) use -```bash - $ quota - $ lfs quota -u USER_LOGIN /scratch +```console +$ quota +$ lfs quota -u USER_LOGIN /scratch ``` If the quota is insufficient, please contact the [support](prace/#help-and-support) and request an increase. 
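+
+For the GridFTP transfers shown above, throughput can often be improved by requesting several parallel data streams (a sketch; the -p option sets the number of parallel TCP streams and the best value depends on your network):
+
+```console
+$ globus-url-copy -p 4 file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
+```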
diff --git a/docs.it4i/salomon/resources-allocation-policy.md b/docs.it4i/salomon/resources-allocation-policy.md index d705a527d4ed1e0988a4c76575687c23239e41de..296844a23f2b567b11f15ad2d79b630ac81f8fb5 100644 --- a/docs.it4i/salomon/resources-allocation-policy.md +++ b/docs.it4i/salomon/resources-allocation-policy.md @@ -46,13 +46,13 @@ Salomon users may check current queue configuration at <https://extranet.it4i.cz Display the queue status on Salomon: -```bash +```console $ qstat -q ``` The PBS allocation overview may be obtained also using the rspbs command. -```bash +```console $ rspbs Usage: rspbs [options] @@ -122,7 +122,7 @@ The resources that are currently subject to accounting are the core-hours. The c User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. -```bash +```console $ it4ifree Password: PID Total Used ...by me Free diff --git a/docs.it4i/salomon/shell-and-data-access.md b/docs.it4i/salomon/shell-and-data-access.md index 5d00acc957233f192d143143e5620da3326f6e36..c3aad60a094512084e56bd3b3f68f082bda37ee5 100644 --- a/docs.it4i/salomon/shell-and-data-access.md +++ b/docs.it4i/salomon/shell-and-data-access.md @@ -26,13 +26,13 @@ Private key authentication: On **Linux** or **Mac**, use -```bash +```console local $ ssh -i /path/to/id_rsa username@salomon.it4i.cz ``` If you see warning message "UNPROTECTED PRIVATE KEY FILE!", use this command to set lower permissions to private key file. -```bash +```console local $ chmod 600 /path/to/id_rsa ``` @@ -40,7 +40,7 @@ On **Windows**, use [PuTTY ssh client](../general/accessing-the-clusters/shell-a After logging in, you will see the command prompt: -```bash +```console _____ _ / ____| | | | (___ __ _| | ___ _ __ ___ ___ _ __ @@ -75,23 +75,23 @@ The authentication is by the [private key](../general/accessing-the-clusters/she On linux or Mac, use scp or sftp client to transfer the data to Salomon: -```bash +```console local $ scp -i /path/to/id_rsa my-local-file username@salomon.it4i.cz:directory/file ``` -```bash +```console local $ scp -i /path/to/id_rsa -r my-local-dir username@salomon.it4i.cz:directory ``` or -```bash +```console local $ sftp -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz ``` Very convenient way to transfer files in and out of the Salomon computer is via the fuse filesystem [sshfs](http://linux.die.net/man/1/sshfs) -```bash +```console local $ sshfs -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz:. mountpoint ``` @@ -99,7 +99,7 @@ Using sshfs, the users Salomon home directory will be mounted on your local comp Learn more on ssh, scp and sshfs by reading the manpages -```bash +```console $ man ssh $ man scp $ man sshfs @@ -136,7 +136,7 @@ It works by tunneling the connection from Salomon back to users workstation and Pick some unused port on Salomon login node (for example 6000) and establish the port forwarding: -```bash +```console local $ ssh -R 6000:remote.host.com:1234 salomon.it4i.cz ``` @@ -146,7 +146,7 @@ Port forwarding may be done **using PuTTY** as well. On the PuTTY Configuration Port forwarding may be established directly to the remote host. However, this requires that user has ssh access to remote.host.com -```bash +```console $ ssh -L 6000:localhost:1234 remote.host.com ``` @@ -160,7 +160,7 @@ First, establish the remote port forwarding form the login node, as [described a Second, invoke port forwarding from the compute node to the login node. 
Insert following line into your jobscript or interactive shell -```bash +```console $ ssh -TN -f -L 6000:localhost:6000 login1 ``` @@ -175,7 +175,7 @@ Port forwarding is static, each single port is mapped to a particular port on re To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: -```bash +```console local $ ssh -D 1080 localhost ``` @@ -183,7 +183,7 @@ On Windows, install and run the free, open source [Sock Puppet](http://sockspupp Once the proxy server is running, establish ssh port forwarding from Salomon to the proxy server, port 1080, exactly as [described above](#port-forwarding-from-login-nodes). -```bash +```console local $ ssh -R 6000:localhost:1080 salomon.it4i.cz ``` diff --git a/docs.it4i/salomon/software/ansys/ansys-fluent.md b/docs.it4i/salomon/software/ansys/ansys-fluent.md index 33e711b285cc8066604c43ebb7c943dcb1294fb6..27469a1c559355d1347ba3cfd76e303893caeb38 100644 --- a/docs.it4i/salomon/software/ansys/ansys-fluent.md +++ b/docs.it4i/salomon/software/ansys/ansys-fluent.md @@ -44,7 +44,7 @@ Working directory has to be created before sending pbs job into the queue. Input Journal file with definition of the input geometry and boundary conditions and defined process of solution has e.g. the following structure: -```bash +```console /file/read-case aircraft_2m.cas.gz /solve/init init @@ -58,7 +58,7 @@ The appropriate dimension of the problem has to be set by parameter (2d/3d). 1. Fast way to run Fluent from command line -```bash +```console fluent solver_version [FLUENT_options] -i journal_file -pbs ``` @@ -68,7 +68,7 @@ This syntax will start the ANSYS FLUENT job under PBS Professional using the qsu The sample script uses a configuration file called pbs_fluent.conf if no command line arguments are present. This configuration file should be present in the directory from which the jobs are submitted (which is also the directory in which the jobs are executed). The following is an example of what the content of pbs_fluent.conf can be: -```bash +```console input="example_small.flin" case="Small-1.65m.cas" fluent_args="3d -pmyrinet" @@ -145,7 +145,7 @@ It runs the jobs out of the directory from which they are submitted (PBS_O_WORKD Fluent could be run in parallel only under Academic Research license. To do so this ANSYS Academic Research license must be placed before ANSYS CFD license in user preferences. To make this change anslic_admin utility should be run -```bash +```console /ansys_inc/shared_les/licensing/lic_admin/anslic_admin ``` diff --git a/docs.it4i/salomon/software/ansys/ansys.md b/docs.it4i/salomon/software/ansys/ansys.md index f93524a3e580f8a5c83302f8d1cd9997bb68c2be..d7e0f2e1444ddc77dd861a4cce4eef06b4c78a6c 100644 --- a/docs.it4i/salomon/software/ansys/ansys.md +++ b/docs.it4i/salomon/software/ansys/ansys.md @@ -6,8 +6,8 @@ Anselm provides as commercial as academic variants. Academic variants are distin To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL,...) load the module: -```bash - $ module load ansys +```console +$ ml ansys ``` ANSYS supports interactive regime, but due to assumed solution of extremely difficult tasks it is not recommended. 
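+
+To see which ANSYS versions are installed before loading a specific one, the available modules can be listed first (a sketch using the ml shorthand used elsewhere in this documentation; the exact module name and capitalisation may differ):
+
+```console
+$ ml av ansys
+```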
diff --git a/docs.it4i/salomon/software/ansys/licensing.md b/docs.it4i/salomon/software/ansys/licensing.md index 04ff6513349ccede25a0846dd21227251e954732..eac78966d4b5183b2f0052d2ab6aea37f28eccc5 100644 --- a/docs.it4i/salomon/software/ansys/licensing.md +++ b/docs.it4i/salomon/software/ansys/licensing.md @@ -18,6 +18,7 @@ The licence intended to be used for science and research, publications, students * 16.1 * 17.0 +* 18.0 ## License Preferences diff --git a/docs.it4i/salomon/software/ansys/setting-license-preferences.md b/docs.it4i/salomon/software/ansys/setting-license-preferences.md index fe14541d46b1fe4cab38eb7b883c58e40e03dd32..b3f594d14863cde6aaa28f7a5139223d30a7d95b 100644 --- a/docs.it4i/salomon/software/ansys/setting-license-preferences.md +++ b/docs.it4i/salomon/software/ansys/setting-license-preferences.md @@ -6,8 +6,8 @@ Thus you need to configure preferred license order with ANSLIC_ADMIN. Please fol Launch the ANSLIC_ADMIN utility in a graphical environment: -```bash - $ANSYSLIC_DIR/lic_admin/anslic_admin +```console +$ANSYSLIC_DIR/lic_admin/anslic_admin ``` ANSLIC_ADMIN Utility will be run diff --git a/docs.it4i/salomon/software/ansys/workbench.md b/docs.it4i/salomon/software/ansys/workbench.md index 8ed07d789dea69798e68c177ac1612a3e391ec88..1b138ccd09fa64fd6ccbafbcb40ff14b2959bad4 100644 --- a/docs.it4i/salomon/software/ansys/workbench.md +++ b/docs.it4i/salomon/software/ansys/workbench.md @@ -8,7 +8,7 @@ It is possible to run Workbench scripts in batch mode. You need to configure sol Enable Distribute Solution checkbox and enter number of cores (eg. 48 to run on two Salomon nodes). If you want the job to run on more then 1 node, you must also provide a so called MPI appfile. In the Additional Command Line Arguments input field, enter: -```bash +```console -mpifile /path/to/my/job/mpifile.txt ``` diff --git a/docs.it4i/salomon/software/chemistry/nwchem.md b/docs.it4i/salomon/software/chemistry/nwchem.md index a26fc701ee44585dbab1f942685b92d9190adfa5..add429da99d2044e2ddaa64d29350e766c558bc2 100644 --- a/docs.it4i/salomon/software/chemistry/nwchem.md +++ b/docs.it4i/salomon/software/chemistry/nwchem.md @@ -15,8 +15,8 @@ The following versions are currently installed: For a current list of installed versions, execute: -```bash - module avail NWChem +```console +$ ml av NWChem ``` The recommend to use version 6.5. Version 6.3 fails on Salomon nodes with accelerator, because it attempts to communicate over scif0 interface. In 6.5 this is avoided by setting ARMCI_OPENIB_DEVICE=mlx4_0, this setting is included in the module. diff --git a/docs.it4i/salomon/software/chemistry/phono3py.md b/docs.it4i/salomon/software/chemistry/phono3py.md index 3f747d23bc9775f80137c0d6e4f1b4821d97439b..5f366baa1e6acb0cb948cd473a9acb65243691c8 100644 --- a/docs.it4i/salomon/software/chemistry/phono3py.md +++ b/docs.it4i/salomon/software/chemistry/phono3py.md @@ -4,11 +4,14 @@ This GPL software calculates phonon-phonon interactions via the third order force constants. It allows to obtain lattice thermal conductivity, phonon lifetime/linewidth, imaginary part of self energy at the lowest order, joint density of states (JDOS) and weighted-JDOS. For details see Phys. Rev. B 91, 094306 (2015) and <http://atztogo.github.io/phono3py/index.html> -!!! 
note - Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module +Available modules -```bash -$ module load phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 +```console +$ ml av phono3py +``` + +```console +$ ml phono3py ``` ## Example of Calculating Thermal Conductivity of Si Using VASP Code. @@ -17,7 +20,7 @@ $ module load phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 One needs to calculate second order and third order force constants using the diamond structure of silicon stored in [POSCAR](poscar-si) (the same form as in VASP) using single displacement calculations within supercell. -```bash +```console $ cat POSCAR Si 1.0 @@ -39,14 +42,14 @@ Direct ### Generating Displacement Using 2 by 2 by 2 Supercell for Both Second and Third Order Force Constants -```bash +```console $ phono3py -d --dim="2 2 2" -c POSCAR ``` 111 displacements is created stored in disp_fc3.yaml, and the structure input files with this displacements are POSCAR-00XXX, where the XXX=111. -```bash +```console disp_fc3.yaml POSCAR-00008 POSCAR-00017 POSCAR-00026 POSCAR-00035 POSCAR-00044 POSCAR-00053 POSCAR-00062 POSCAR-00071 POSCAR-00080 POSCAR-00089 POSCAR-00098 POSCAR-00107 POSCAR POSCAR-00009 POSCAR-00018 POSCAR-00027 POSCAR-00036 POSCAR-00045 POSCAR-00054 POSCAR-00063 POSCAR-00072 POSCAR-00081 POSCAR-00090 POSCAR-00099 POSCAR-00108 POSCAR-00001 POSCAR-00010 POSCAR-00019 POSCAR-00028 POSCAR-00037 POSCAR-00046 POSCAR-00055 POSCAR-00064 POSCAR-00073 POSCAR-00082 POSCAR-00091 POSCAR-00100 POSCAR-00109 @@ -60,7 +63,7 @@ POSCAR-00007 POSCAR-00016 POSCAR-00025 POSCAR-00034 POSCAR-00043 POSCAR-00052 For each displacement the forces needs to be calculated, i.e. in form of the output file of VASP (vasprun.xml). For a single VASP calculations one needs [KPOINTS](KPOINTS), [POTCAR](POTCAR), [INCAR](INCAR) in your case directory (where you have POSCARS) and those 111 displacements calculations can be generated by [prepare.sh](prepare.sh) script. Then each of the single 111 calculations is submitted [run.sh](run.sh) by [submit.sh](submit.sh). 
-```bash +```console $./prepare.sh $ls disp-00001 disp-00009 disp-00017 disp-00025 disp-00033 disp-00041 disp-00049 disp-00057 disp-00065 disp-00073 disp-00081 disp-00089 disp-00097 disp-00105 INCAR @@ -75,7 +78,7 @@ disp-00008 disp-00016 disp-00024 disp-00032 disp-00040 disp-00048 disp-00056 dis Taylor your run.sh script to fit into your project and other needs and submit all 111 calculations using submit.sh script -```bash +```console $ ./submit.sh ``` @@ -83,13 +86,13 @@ $ ./submit.sh Once all jobs are finished and vasprun.xml is created in each disp-XXXXX directory the collection is done by -```bash +```console $ phono3py --cf3 disp-{00001..00111}/vasprun.xml ``` and `disp_fc2.yaml, FORCES_FC2`, `FORCES_FC3` and disp_fc3.yaml should appear and put into the hdf format by -```bash +```console $ phono3py --dim="2 2 2" -c POSCAR ``` @@ -99,13 +102,13 @@ resulting in `fc2.hdf5` and `fc3.hdf5` The phonon lifetime calculations takes some time, however is independent on grid points, so could be splitted: -```bash +```console $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" --sigma 0.1 --wgp ``` ### Inspecting ir_grid_points.yaml -```bash +```console $ grep grid_point ir_grid_points.yaml num_reduced_ir_grid_points: 35 ir_grid_points: # [address, weight] @@ -148,18 +151,18 @@ ir_grid_points: # [address, weight] one finds which grid points needed to be calculated, for instance using following -```bash +```console $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" -c POSCAR --sigma 0.1 --br --write-gamma --gp="0 1 2 ``` one calculates grid points 0, 1, 2. To automize one can use for instance scripts to submit 5 points in series, see [gofree-cond1.sh](gofree-cond1.sh) -```bash +```console $ qsub gofree-cond1.sh ``` Finally the thermal conductivity result is produced by grouping single conductivity per grid calculations using -```bash +```console $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" --br --read_gamma ``` diff --git a/docs.it4i/salomon/software/compilers.md b/docs.it4i/salomon/software/compilers.md index 8e62965ff71b3afbd4e178c5019a0101597401b5..a49aa8eb4dfa2d832572e8c225b6ceccdd84bc82 100644 --- a/docs.it4i/salomon/software/compilers.md +++ b/docs.it4i/salomon/software/compilers.md @@ -29,25 +29,25 @@ For information about the usage of Intel Compilers and other Intel products, ple The Portland Group Cluster Development Kit (PGI CDK) is available. -```bash - $ module load PGI - $ pgcc -v - $ pgc++ -v - $ pgf77 -v - $ pgf90 -v - $ pgf95 -v - $ pghpf -v +```console +$ module load PGI +$ pgcc -v +$ pgc++ -v +$ pgf77 -v +$ pgf90 -v +$ pgf95 -v +$ pghpf -v ``` The PGI CDK also incudes tools for debugging and profiling. PGDBG OpenMP/MPI debugger and PGPROF OpenMP/MPI profiler are available -```bash - $ module load PGI - $ module load Java - $ pgdbg & - $ pgprof & +```console +$ module load PGI +$ module load Java +$ pgdbg & +$ pgprof & ``` For more information, see the [PGI page](http://www.pgroup.com/products/pgicdk.htm). @@ -58,21 +58,21 @@ For compatibility reasons there are still available the original (old 4.4.7-11) It is strongly recommended to use the up to date version which comes with the module GCC: -```bash - $ module load GCC - $ gcc -v - $ g++ -v - $ gfortran -v +```console +$ module load GCC +$ gcc -v +$ g++ -v +$ gfortran -v ``` With the module loaded two environment variables are predefined. 
One for maximum optimizations on the cluster's architecture, and the other for debugging purposes: -```bash - $ echo $OPTFLAGS - -O3 -march=native +```console +$ echo $OPTFLAGS +-O3 -march=native - $ echo $DEBUGFLAGS - -O0 -g +$ echo $DEBUGFLAGS +-O0 -g ``` For more information about the possibilities of the compilers, please see the man pages. @@ -88,41 +88,41 @@ UPC is supported by two compiler/runtime implementations: To use the GNU UPC compiler and run the compiled binaries use the module gupc -```bash - $ module add gupc - $ gupc -v - $ g++ -v +```console +$ module add gupc +$ gupc -v +$ g++ -v ``` Simple program to test the compiler -```bash - $ cat count.upc - - /* hello.upc - a simple UPC example */ - #include <upc.h> - #include <stdio.h> - - int main() { - if (MYTHREAD == 0) { - printf("Welcome to GNU UPC!!!n"); - } - upc_barrier; - printf(" - Hello from thread %in", MYTHREAD); - return 0; - } +```cpp +$ cat count.upc + +/* hello.upc - a simple UPC example */ +#include <upc.h> +#include <stdio.h> + +int main() { + if (MYTHREAD == 0) { + printf("Welcome to GNU UPC!!!n"); + } + upc_barrier; + printf(" - Hello from thread %in", MYTHREAD); + return 0; +} ``` To compile the example use -```bash - $ gupc -o count.upc.x count.upc +```console +$ gupc -o count.upc.x count.upc ``` To run the example with 5 threads issue -```bash - $ ./count.upc.x -fupc-threads-5 +```console +$ ./count.upc.x -fupc-threads-5 ``` For more information see the man pages. @@ -131,9 +131,9 @@ For more information see the man pages. To use the Berkley UPC compiler and runtime environment to run the binaries use the module bupc -```bash - $ module add BerkeleyUPC/2.16.2-gompi-2015b - $ upcc -version +```console +$ module add BerkeleyUPC/2.16.2-gompi-2015b +$ upcc -version ``` As default UPC network the "smp" is used. This is very quick and easy way for testing/debugging, but limited to one node only. @@ -145,41 +145,41 @@ For production runs, it is recommended to use the native InfiniBand implementati Example UPC code: -```bash - $ cat hello.upc - - /* hello.upc - a simple UPC example */ - #include <upc.h> - #include <stdio.h> - - int main() { - if (MYTHREAD == 0) { - printf("Welcome to Berkeley UPC!!!n"); - } - upc_barrier; - printf(" - Hello from thread %in", MYTHREAD); - return 0; - } +```cpp +$ cat hello.upc + +/* hello.upc - a simple UPC example */ +#include <upc.h> +#include <stdio.h> + +int main() { + if (MYTHREAD == 0) { + printf("Welcome to Berkeley UPC!!!n"); + } + upc_barrier; + printf(" - Hello from thread %in", MYTHREAD); + return 0; +} ``` To compile the example with the "ibv" UPC network use -```bash - $ upcc -network=ibv -o hello.upc.x hello.upc +```console +$ upcc -network=ibv -o hello.upc.x hello.upc ``` To run the example with 5 threads issue -```bash - $ upcrun -n 5 ./hello.upc.x +```console +$ upcrun -n 5 ./hello.upc.x ``` To run the example on two compute nodes using all 48 cores, with 48 threads, issue -```bash - $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=24 - $ module add bupc - $ upcrun -n 48 ./hello.upc.x +```console +$ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=24 +$ module add bupc +$ upcrun -n 48 ./hello.upc.x ``` For more information see the man pages. 
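Returning to the predefined GCC flag variables documented near the top of this compilers section, a usage sketch is shown below (myprog.c is a hypothetical source file of your own; the variable contents match those printed above):

```console
$ ml GCC
$ gcc $OPTFLAGS -o myprog.x myprog.c        # optimized build: -O3 -march=native
$ gcc $DEBUGFLAGS -o myprog_dbg.x myprog.c  # debug build: -O0 -g
```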
diff --git a/docs.it4i/salomon/software/comsol/comsol-multiphysics.md b/docs.it4i/salomon/software/comsol/comsol-multiphysics.md index ca79d5235ae1bd9afb4299a45d5e2d57a79cba24..431294469311b408c9e023c17347cae239037622 100644 --- a/docs.it4i/salomon/software/comsol/comsol-multiphysics.md +++ b/docs.it4i/salomon/software/comsol/comsol-multiphysics.md @@ -22,22 +22,22 @@ On the clusters COMSOL is available in the latest stable version. There are two To load the of COMSOL load the module -```bash -$ module load COMSOL/51-EDU +```console +$ ml COMSOL/51-EDU ``` By default the **EDU variant** will be loaded. If user needs other version or variant, load the particular version. To obtain the list of available versions use -```bash -$ module avail COMSOL +```console +$ ml av COMSOL ``` If user needs to prepare COMSOL jobs in the interactive mode it is recommend to use COMSOL on the compute nodes via PBS Pro scheduler. In order run the COMSOL Desktop GUI on Windows is recommended to use the [Virtual Network Computing (VNC)](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/). -```bash +```console $ xhost + $ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ppn=24 -$ module load COMSOL +$ ml COMSOL $ comsol ``` @@ -76,11 +76,11 @@ COMSOL is the software package for the numerical solution of the partial differe LiveLink for MATLAB is available in both **EDU** and **COM** **variant** of the COMSOL release. On the clusters 1 commercial (**COM**) license and the 5 educational (**EDU**) licenses of LiveLink for MATLAB (please see the [ISV Licenses](../../../anselm/software/isv_licenses/)) are available. Following example shows how to start COMSOL model from MATLAB via LiveLink in the interactive mode. -```bash +```console $ xhost + $ qsub -I -X -A PROJECT_ID -q qexp -l select=1:ppn=24 -$ module load MATLAB -$ module load COMSOL +$ ml MATLAB +$ ml COMSOL $ comsol server MATLAB ``` diff --git a/docs.it4i/salomon/software/debuggers/Introduction.md b/docs.it4i/salomon/software/debuggers/Introduction.md index a5c9cfb60154fbaf13faebaf15a508597b40703f..4ce2fc77b013659f5b128408e4ec5f0e78c9c686 100644 --- a/docs.it4i/salomon/software/debuggers/Introduction.md +++ b/docs.it4i/salomon/software/debuggers/Introduction.md @@ -10,9 +10,9 @@ Intel debugger is no longer available since Parallel Studio version 2015 The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. -```bash - $ module load intel - $ idb +```console +$ ml intel +$ idb ``` Read more at the [Intel Debugger](../intel-suite/intel-debugger/) page. @@ -21,9 +21,9 @@ Read more at the [Intel Debugger](../intel-suite/intel-debugger/) page. Allinea DDT, is a commercial debugger primarily for debugging parallel MPI or OpenMP programs. It also has a support for GPU (CUDA) and Intel Xeon Phi accelerators. DDT provides all the standard debugging features (stack trace, breakpoints, watches, view variables, threads etc.) for every thread running as part of your program, or for every process - even if these processes are distributed across a cluster using an MPI implementation. -```bash - $ module load Forge - $ forge +```console +$ ml Forge +$ forge ``` Read more at the [Allinea DDT](allinea-ddt/) page. @@ -32,9 +32,9 @@ Read more at the [Allinea DDT](allinea-ddt/) page. Allinea Performance Reports characterize the performance of HPC application runs. 
After executing your application through the tool, a synthetic HTML report is generated automatically, containing information about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes. -```bash - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 +```console +$ ml PerformanceReports/6.0 +$ perf-report mpirun -n 64 ./my_application argument01 argument02 ``` Read more at the [Allinea Performance Reports](allinea-performance-reports/) page. @@ -43,9 +43,9 @@ Read more at the [Allinea Performance Reports](allinea-performance-reports/) pag TotalView is a source- and machine-level debugger for multi-process, multi-threaded programs. Its wide range of tools provides ways to analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity. -```bash - $ module load TotalView/8.15.4-6-linux-x86-64 - $ totalview +```console +$ ml TotalView/8.15.4-6-linux-x86-64 +$ totalview ``` Read more at the [Totalview](total-view/) page. @@ -54,8 +54,8 @@ Read more at the [Totalview](total-view/) page. Vampir is a GUI trace analyzer for traces in OTF format. -```bash - $ module load Vampir/8.5.0 +```console + $ ml Vampir/8.5.0 $ vampir ``` diff --git a/docs.it4i/salomon/software/debuggers/aislinn.md b/docs.it4i/salomon/software/debuggers/aislinn.md index e1dee28b8d6d78ef7be2371afb2f8884f2b5f364..89cf7538016c004b1ba9058bcf148bbf0761eb50 100644 --- a/docs.it4i/salomon/software/debuggers/aislinn.md +++ b/docs.it4i/salomon/software/debuggers/aislinn.md @@ -49,13 +49,13 @@ The program does the following: process 0 receives two messages from anyone and To verify this program by Aislinn, we first load Aislinn itself: -```bash -$ module load aislinn +```console +$ ml aislinn ``` Now we compile the program by Aislinn implementation of MPI. There are `mpicc` for C programs and `mpicxx` for C++ programs. Only MPI parts of the verified application has to be recompiled; non-MPI parts may remain untouched. Let us assume that our program is in `test.cpp`. -```bash +```console $ mpicc -g test.cpp -o test ``` @@ -63,7 +63,7 @@ The `-g` flag is not necessary, but it puts more debugging information into the Now we run the Aislinn itself. The argument `-p 3` specifies that we want to verify our program for the case of three MPI processes -```bash +```console $ aislinn -p 3 ./test ==AN== INFO: Aislinn v0.3.0 ==AN== INFO: Found error 'Invalid write' @@ -73,8 +73,8 @@ $ aislinn -p 3 ./test Aislinn found an error and produced HTML report. To view it, we can use any browser, e.g.: -```bash - $ firefox report.html +```console +$ firefox report.html ``` At the beginning of the report there are some basic summaries of the verification. In the second part (depicted in the following picture), the error is described. diff --git a/docs.it4i/salomon/software/debuggers/allinea-ddt.md b/docs.it4i/salomon/software/debuggers/allinea-ddt.md index 41dd4c6e8266e257a425c0e7a8b54330c38ccf04..6e1f046f10fd2d521343a995cb59580440080a73 100644 --- a/docs.it4i/salomon/software/debuggers/allinea-ddt.md +++ b/docs.it4i/salomon/software/debuggers/allinea-ddt.md @@ -24,22 +24,21 @@ In case of debugging on accelerators: Load all necessary modules to compile the code. For example: -```bash - $ module load intel - $ module load impi ... or ... 
module load openmpi/X.X.X-icc +```console +$ ml intel +$ ml impi **or** ml OpenMPI/X.X.X-icc ``` Load the Allinea DDT module: -```bash - $ module load Forge +```console +$ ml Forge ``` Compile the code: -```bash +```console $ mpicc -g -O0 -o test_debug test.c - $ mpif90 -g -O0 -o test_debug test.f ``` @@ -56,22 +55,22 @@ Before debugging, you need to compile your code with theses flags: Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: -```bash - $ ssh -X username@anselm.it4i.cz +```console +$ ssh -X username@anselm.it4i.cz ``` Other options is to access login node using VNC. Please see the detailed information on how to [use graphic user interface on Anselm](/general/accessing-the-clusters/graphical-user-interface/x-window-system/) From the login node an interactive session **with X windows forwarding** (-X option) can be started by following command: -```bash - $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 +```console +$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 ``` Then launch the debugger with the ddt command followed by the name of the executable to debug: -```bash - $ ddt test_debug +```console +$ ddt test_debug ``` A submission window that appears have a prefilled path to the executable to debug. You can select the number of MPI processors and/or OpenMP threads on which to run and press run. Command line arguments to a program can be entered to the "Arguments " box. @@ -80,16 +79,16 @@ A submission window that appears have a prefilled path to the executable to debu To start the debugging directly without the submission window, user can specify the debugging and execution parameters from the command line. For example the number of MPI processes is set by option "-np 4". Skipping the dialog is done by "-start" option. To see the list of the "ddt" command line parameters, run "ddt --help". -```bash - ddt -start -np 4 ./hello_debug_impi +```console +ddt -start -np 4 ./hello_debug_impi ``` ## Documentation Users can find original User Guide after loading the DDT module: -```bash - $DDTPATH/doc/userguide.pdf +```console +$DDTPATH/doc/userguide.pdf ``` [1] Discipline, Magic, Inspiration and Science: Best Practice Debugging with Allinea DDT, Workshop conducted at LLNL by Allinea on May 10, 2013, [link](https://computing.llnl.gov/tutorials/allineaDDT/index.html) diff --git a/docs.it4i/salomon/software/debuggers/allinea-performance-reports.md b/docs.it4i/salomon/software/debuggers/allinea-performance-reports.md index 3d0826e994bb6434b9cd0cd100249393191c03d3..ead91a093c83ba9503f2be7ba702e698d7bca0df 100644 --- a/docs.it4i/salomon/software/debuggers/allinea-performance-reports.md +++ b/docs.it4i/salomon/software/debuggers/allinea-performance-reports.md @@ -12,8 +12,8 @@ Our license is limited to 64 MPI processes. Allinea Performance Reports version 6.0 is available -```bash - $ module load PerformanceReports/6.0 +```console +$ ml PerformanceReports/6.0 ``` The module sets up environment variables, required for using the Allinea Performance Reports. @@ -24,8 +24,8 @@ Use the the perf-report wrapper on your (MPI) program. Instead of [running your MPI program the usual way](../mpi/mpi/), use the the perf report wrapper: -```bash - $ perf-report mpirun ./mympiprog.x +```console +$ perf-report mpirun ./mympiprog.x ``` The mpi program will run as usual. The perf-report creates two additional files, in \*.txt and \*.html format, containing the performance report. 
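For illustration only (the file names below are hypothetical and merely follow the naming pattern shown in the example further down this page), a 24-process run leaves the two report files next to the executable:

```console
$ perf-report mpirun -n 24 ./mympiprog.x
$ ls mympiprog_24p*
mympiprog_24p_2015-10-15_16-56.html  mympiprog_24p_2015-10-15_16-56.txt
```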
Note that demanding MPI codes should be run within [the queue system](../../job-submission-and-execution/). @@ -36,23 +36,24 @@ In this example, we will be profiling the mympiprog.x MPI program, using Allinea First, we allocate some nodes via the express queue: -```bash - $ qsub -q qexp -l select=2:ppn=24:mpiprocs=24:ompthreads=1 -I +```console +$ qsub -q qexp -l select=2:ppn=24:mpiprocs=24:ompthreads=1 -I qsub: waiting for job 262197.dm2 to start qsub: job 262197.dm2 ready ``` Then we load the modules and run the program the usual way: -```bash - $ module load intel impi PerfReports/6.0 - $ mpirun ./mympiprog.x +```console +$ ml intel +$ ml PerfReports/6.0 +$ mpirun ./mympiprog.x ``` Now lets profile the code: -```bash - $ perf-report mpirun ./mympiprog.x +```console +$ perf-report mpirun ./mympiprog.x ``` Performance report files [mympiprog_32p\*.txt](mympiprog_32p_2014-10-15_16-56.txt) and [mympiprog_32p\*.html](mympiprog_32p_2014-10-15_16-56.html) were created. We can see that the code is very efficient on MPI and is CPU bounded. diff --git a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md index 2fdbd18e166d3e553a8ad5719f7945f902cbd73c..192aece7e250dfb9b2938daebe83606a1f002b06 100644 --- a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md @@ -15,14 +15,14 @@ Intel *®* VTuneâ„¢ Amplifier, part of Intel Parallel studio, is a GUI profiling To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on the login nodes, thus direct profiling on login nodes is not possible. By default, the kernel modules ale not loaded on compute nodes neither. In order to have the modules loaded, you need to specify vtune=version PBS resource at job submit. The version is the same as for environment module. For example to use VTune/2016_update1: -```bash - $ qsub -q qexp -A OPEN-0-0 -I -l select=1,vtune=2016_update1 +```console +$ qsub -q qexp -A OPEN-0-0 -I -l select=1,vtune=2016_update1 ``` After that, you can verify the modules sep\*, pax and vtsspp are present in the kernel : -```bash - $ lsmod | grep -e sep -e pax -e vtsspp +```console +$ lsmod | grep -e sep -e pax -e vtsspp vtsspp 362000 0 sep3_15 546657 0 pax 4312 0 @@ -30,14 +30,14 @@ After that, you can verify the modules sep\*, pax and vtsspp are present in the To launch the GUI, first load the module: -```bash - $ module add VTune/2016_update1 +```console +$ module add VTune/2016_update1 ``` and launch the GUI : -```bash - $ amplxe-gui +```console +$ amplxe-gui ``` The GUI will open in new window. Click on "New Project..." to create a new project. After clicking OK, a new window with project properties will appear. At "Application:", select the bath to your binary you want to profile (the binary should be compiled with -g flag). Some additional options such as command line arguments can be selected. At "Managed code profiling mode:" select "Native" (unless you want to profile managed mode .NET/Mono applications). After clicking OK, your project is created. @@ -50,8 +50,8 @@ VTune Amplifier also allows a form of remote analysis. 
In this mode, data for an The command line will look like this: -```bash - /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -app-working-dir /home/sta545/tmp -- /home/sta545/tmp/sgemm +```console +/apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -app-working-dir /home/sta545/tmp -- /home/sta545/tmp/sgemm ``` Copy the line to clipboard and then you can paste it in your jobscript or in command line. After the collection is run, open the GUI once again, click the menu button in the upper right corner, and select "Open > Result...". The GUI will load the results from the run. @@ -75,14 +75,14 @@ You may also use remote analysis to collect data from the MIC and then analyze i Native launch: -```bash - $ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-native:0 -collect advanced-hotspots -- /home/sta545/tmp/vect-add-mic +```console +$ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-native:0 -collect advanced-hotspots -- /home/sta545/tmp/vect-add-mic ``` Host launch: -```bash - $ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-host-launch:0 -collect advanced-hotspots -- /home/sta545/tmp/sgemm +```console +$ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-host-launch:0 -collect advanced-hotspots -- /home/sta545/tmp/sgemm ``` You can obtain this command line by pressing the "Command line..." button on Analysis Type screen. diff --git a/docs.it4i/salomon/software/debuggers/total-view.md b/docs.it4i/salomon/software/debuggers/total-view.md index f4f69278ff59e8f2cd35aad8b5c79bf78a4a0171..0235c845d012f4c0f5245e7ae2c5f8d96b6efe3c 100644 --- a/docs.it4i/salomon/software/debuggers/total-view.md +++ b/docs.it4i/salomon/software/debuggers/total-view.md @@ -6,7 +6,7 @@ TotalView is a GUI-based source code multi-process, multi-thread debugger. On the cluster users can debug OpenMP or MPI code that runs up to 64 parallel processes. These limitation means that: -```bash +```console 1 user can debug up 64 processes, or 32 users can debug 2 processes, etc. ``` @@ -21,23 +21,20 @@ You can check the status of the licenses [here](https://extranet.it4i.cz/rsweb/a Load all necessary modules to compile the code. For example: -```bash - module load intel - - module load impi ... or ... module load OpenMPI/X.X.X-icc +```console + ml intel ``` Load the TotalView module: -```bash - module load TotalView/8.15.4-6-linux-x86-64 +```console + ml TotalView/8.15.4-6-linux-x86-64 ``` Compile the code: -```bash +```console mpicc -g -O0 -o test_debug test.c - mpif90 -g -O0 -o test_debug test.f ``` @@ -54,16 +51,16 @@ Before debugging, you need to compile your code with theses flags: Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: -```bash - ssh -X username@salomon.it4i.cz +```console +ssh -X username@salomon.it4i.cz ``` Other options is to access login node using VNC. Please see the detailed information on how to use graphic user interface on Anselm. 
From the login node an interactive session with X windows forwarding (-X option) can be started by following command: -```bash - qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00 +```console +$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00 ``` Then launch the debugger with the totalview command followed by the name of the executable to debug. @@ -72,8 +69,8 @@ Then launch the debugger with the totalview command followed by the name of the To debug a serial code use: -```bash - totalview test_debug +```console +totalview test_debug ``` ### Debugging a Parallel Code - Option 1 @@ -83,7 +80,7 @@ To debug a parallel code compiled with **OpenMPI** you need to setup your TotalV !!! hint To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file. -```bash +```console proc mpi_auto_run_starter {loaded_id} { set starter_programs {mpirun mpiexec orterun} set executable_name [TV::symbol get $loaded_id full_pathname] @@ -105,23 +102,23 @@ To debug a parallel code compiled with **OpenMPI** you need to setup your TotalV The source code of this function can be also found in -```bash - /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl +```console +$ /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl ``` You can also add only following line to you ~/.tvdrc file instead of the entire function: -```bash -source /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl +```console +$ source /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl ``` You need to do this step only once. See also [OpenMPI FAQ entry](https://www.open-mpi.org/faq/?category=running#run-with-tv) Now you can run the parallel debugger using: -```bash - mpirun -tv -n 5 ./test_debug +```console +$ mpirun -tv -n 5 ./test_debug ``` When following dialog appears click on "Yes" @@ -138,10 +135,10 @@ Other option to start new parallel debugging session from a command line is to l The following example shows how to start debugging session with Intel MPI: -```bash - module load intel/2015b-intel-2015b impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25 TotalView/8.15.4-6-linux-x86-64 - - totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi +```console +$ ml intel +$ ml TotalView/8.15.4-6-linux-x86-64 +$ totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi ``` After running previous command you will see the same window as shown in the screenshot above. diff --git a/docs.it4i/salomon/software/debuggers/valgrind.md b/docs.it4i/salomon/software/debuggers/valgrind.md index 430118785a08bc43e67a4711396f9ac6b63c4afb..188f98502862effe90495934c6288aa64b042318 100644 --- a/docs.it4i/salomon/software/debuggers/valgrind.md +++ b/docs.it4i/salomon/software/debuggers/valgrind.md @@ -47,9 +47,9 @@ For example, lets look at this C code, which has two problems: Now, compile it with Intel compiler: -```bash - $ module add intel - $ icc -g valgrind-example.c -o valgrind-example +```console +$ module add intel +$ icc -g valgrind-example.c -o valgrind-example ``` Now, lets run it with Valgrind. The syntax is: @@ -58,8 +58,8 @@ valgrind [valgrind options] < your program binary > [your program options] If no Valgrind options are specified, Valgrind defaults to running Memcheck tool. Please refer to the Valgrind documentation for a full description of command line options. 
-```bash - $ valgrind ./valgrind-example +```console +$ valgrind ./valgrind-example ==12652== Memcheck, a memory error detector ==12652== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==12652== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info @@ -92,8 +92,8 @@ If no Valgrind options are specified, Valgrind defaults to running Memcheck tool In the output we can see that Valgrind has detected both errors - the off-by-one memory access at line 5 and a memory leak of 40 bytes. If we want a detailed analysis of the memory leak, we need to run Valgrind with --leak-check=full option: -```bash - $ valgrind --leak-check=full ./valgrind-example +```console +$ valgrind --leak-check=full ./valgrind-example ==23856== Memcheck, a memory error detector ==23856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==23856== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info @@ -134,13 +134,13 @@ Now we can see that the memory leak is due to the malloc() at line 6. Although Valgrind is not primarily a parallel debugger, it can be used to debug parallel applications as well. When launching your parallel applications, prepend the valgrind command. For example: -```bash - $ mpirun -np 4 valgrind myapplication +```console +$ mpirun -np 4 valgrind myapplication ``` The default version without MPI support will however report a large number of false errors in the MPI library, such as: -```bash +```console ==30166== Conditional jump or move depends on uninitialised value(s) ==30166== at 0x4C287E8: strlen (mc_replace_strmem.c:282) ==30166== by 0x55443BD: I_MPI_Processor_model_number (init_interface.c:427) @@ -181,16 +181,16 @@ Lets look at this MPI example: There are two errors - use of uninitialized memory and invalid length of the buffer. Lets debug it with valgrind : -```bash - $ module add intel impi - $ mpiicc -g valgrind-example-mpi.c -o valgrind-example-mpi - $ module add Valgrind/3.11.0-intel-2015b - $ mpirun -np 2 -env LD_PRELOAD $EBROOTVALGRIND/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi +```console +$ module add intel impi +$ mpiicc -g valgrind-example-mpi.c -o valgrind-example-mpi +$ module add Valgrind/3.11.0-intel-2015b +$ mpirun -np 2 -env LD_PRELOAD $EBROOTVALGRIND/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi ``` Prints this output : (note that there is output printed for every launched MPI process) -```bash +```console ==31318== Memcheck, a memory error detector ==31318== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==31318== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info diff --git a/docs.it4i/salomon/software/debuggers/vampir.md b/docs.it4i/salomon/software/debuggers/vampir.md index 99053546c14b43c51d5ab7728dfa3824f2016170..852374d229d2c4f4a2e4c612c85d25b1c121faf0 100644 --- a/docs.it4i/salomon/software/debuggers/vampir.md +++ b/docs.it4i/salomon/software/debuggers/vampir.md @@ -6,11 +6,13 @@ Vampir is a commercial trace analysis and visualisation tool. 
It can work with t ## Installed Versions -Version 8.5.0 is currently installed as module Vampir/8.5.0 : +```console +$ ml av Vampir +``` -```bash - $ module load Vampir/8.5.0 - $ vampir & +```console +$ ml Vampir +$ vampir & ``` ## User Manual diff --git a/docs.it4i/salomon/software/intel-suite/intel-advisor.md b/docs.it4i/salomon/software/intel-suite/intel-advisor.md index 427f5c98cfccf29de4870043c08074ac1a246135..688deda17708cc23578fd50dc6063fb7716c5858 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-advisor.md +++ b/docs.it4i/salomon/software/intel-suite/intel-advisor.md @@ -16,8 +16,8 @@ Profiling is possible either directly from the GUI, or from command line. To profile from GUI, launch Advisor: -```bash - $ advixe-gui +```console +$ advixe-gui ``` Then select menu File -> New -> Project. Choose a directory to save project data to. After clicking OK, Project properties window will appear, where you can configure path to your binary, launch arguments, working directory etc. After clicking OK, the project is ready. diff --git a/docs.it4i/salomon/software/intel-suite/intel-compilers.md b/docs.it4i/salomon/software/intel-suite/intel-compilers.md index 63a05bd91e15c04afa6a3cc8d21231ba030437bc..8e2ee714f6e5c61ec8b4e3b4522a3a06fdd11f46 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-compilers.md +++ b/docs.it4i/salomon/software/intel-suite/intel-compilers.md @@ -2,28 +2,28 @@ The Intel compilers in multiple versions are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. -```bash - $ module load intel - $ icc -v - $ ifort -v +```console +$ ml intel +$ icc -v +$ ifort -v ``` The intel compilers provide for vectorization of the code, via the AVX2 instructions and support threading parallelization via OpenMP For maximum performance on the Salomon cluster compute nodes, compile your programs using the AVX2 instructions, with reporting where the vectorization was used. We recommend following compilation options for high performance -```bash - $ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.f mysubroutines.f -o myprog.x +```console +$ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x +$ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.f mysubroutines.f -o myprog.x ``` In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggresive loop optimizations (-O3) and vectorization (-xCORE-AVX2) The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX2 vectorization. Enable the OpenMP parallelization by the **-openmp** compiler switch. 
-```bash - $ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.f mysubroutines.f -o myprog.x +```console +$ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.c mysubroutines.c -o myprog.x +$ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.f mysubroutines.f -o myprog.x ``` Read more at <https://software.intel.com/en-us/intel-cplusplus-compiler-16.0-user-and-reference-guide> diff --git a/docs.it4i/salomon/software/intel-suite/intel-debugger.md b/docs.it4i/salomon/software/intel-suite/intel-debugger.md index d0fef6ab7fbe2e50e8e7f8238585521bb5cb9695..15788c798785390777016856b8ffcc111227c1d2 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-debugger.md +++ b/docs.it4i/salomon/software/intel-suite/intel-debugger.md @@ -6,31 +6,30 @@ IDB is no longer available since Intel Parallel Studio 2015 The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. Use [X display](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/) for running the GUI. -```bash - $ module load intel/2014.06 - $ module load Java - $ idb +```console +$ ml intel +$ ml Java +$ idb ``` The debugger may run in text mode. To debug in text mode, use -```bash - $ idbc +```console +$ idbc ``` To debug on the compute nodes, module intel must be loaded. The GUI on compute nodes may be accessed using the same way as in [the GUI section](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/) Example: -```bash - $ qsub -q qexp -l select=1:ncpus=24 -X -I +```console +$ qsub -q qexp -l select=1:ncpus=24 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19654.srv11 ready - - $ module load intel - $ module load Java - $ icc -O0 -g myprog.c -o myprog.x - $ idb ./myprog.x +$ ml intel +$ ml Java +$ icc -O0 -g myprog.c -o myprog.x +$ idb ./myprog.x ``` In this example, we allocate 1 full compute node, compile program myprog.c with debugging options -O0 -g and run the idb debugger interactively on the myprog.x executable. The GUI access is via X11 port forwarding provided by the PBS workload manager. @@ -43,13 +42,12 @@ In this example, we allocate 1 full compute node, compile program myprog.c with For debugging small number of MPI ranks, you may execute and debug each rank in separate xterm terminal (do not forget the [X display](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/)). Using Intel MPI, this may be done in following way: -```bash - $ qsub -q qexp -l select=2:ncpus=24 -X -I +```console +$ qsub -q qexp -l select=2:ncpus=24 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19655.srv11 ready - - $ module load intel impi - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x +$ ml intel +$ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x ``` In this example, we allocate 2 full compute node, run xterm on each node and start idb debugger in command line mode, debugging two ranks of mympiprog.x application. The xterm will pop up for each rank, with idb prompt ready. 
The example is not limited to use of Intel MPI @@ -58,13 +56,12 @@ In this example, we allocate 2 full compute node, run xterm on each node and sta Run the idb debugger from within the MPI debug option. This will cause the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set break points and step the execution manually. Using Intel MPI: -```bash - $ qsub -q qexp -l select=2:ncpus=24 -X -I +```console +$ qsub -q qexp -l select=2:ncpus=24 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19655.srv11 ready - - $ module load intel impi - $ mpirun -n 48 -idb ./mympiprog.x +$ ml intel +$ mpirun -n 48 -idb ./mympiprog.x ``` ### Debugging Multithreaded Application diff --git a/docs.it4i/salomon/software/intel-suite/intel-inspector.md b/docs.it4i/salomon/software/intel-suite/intel-inspector.md index 6231a65347abc13d442aea0586d6003ac7d3c798..bd298923813d786c7620c751a3c267983bb2a48d 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-inspector.md +++ b/docs.it4i/salomon/software/intel-suite/intel-inspector.md @@ -18,8 +18,8 @@ Debugging is possible either directly from the GUI, or from command line. To debug from GUI, launch Inspector: -```bash - $ inspxe-gui & +```console +$ inspxe-gui & ``` Then select menu File -> New -> Project. Choose a directory to save project data to. After clicking OK, Project properties window will appear, where you can configure path to your binary, launch arguments, working directory etc. After clicking OK, the project is ready. diff --git a/docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md index ead2008dc115bd5b8d7d76a623e9fe22b9161d56..60628eed0744d4305f79f4b77ff2f4de8e11c10d 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md +++ b/docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md @@ -6,8 +6,8 @@ Intel Integrated Performance Primitives, version 9.0.1, compiled for AVX2 vector Check out IPP before implementing own math functions for data processing, it is likely already there. -```bash - $ module load ipp +```console +$ ml ipp ``` The module sets up environment variables, required for linking and running ipp enabled applications. @@ -57,20 +57,18 @@ The module sets up environment variables, required for linking and running ipp e Compile above example, using any compiler and the ipp module. -```bash - $ module load intel - $ module load ipp - - $ icc testipp.c -o testipp.x -lippi -lipps -lippcore +```console +$ ml intel +$ ml ipp +$ icc testipp.c -o testipp.x -lippi -lipps -lippcore ``` You will need the ipp module loaded to run the ipp enabled executable. 
This may be avoided, by compiling library search paths into the executable -```bash - $ module load intel - $ module load ipp - - $ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore +```console +$ ml intel +$ ml ipp +$ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore ``` ## Code Samples and Documentation diff --git a/docs.it4i/salomon/software/intel-suite/intel-mkl.md b/docs.it4i/salomon/software/intel-suite/intel-mkl.md index 322492010827e5dc2cc63d6ccd7cb3452f1a4214..6b54e0890202f817dd42c04eabf886489bd695d0 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-mkl.md +++ b/docs.it4i/salomon/software/intel-suite/intel-mkl.md @@ -17,8 +17,8 @@ For details see the [Intel MKL Reference Manual](http://software.intel.com/sites Intel MKL version 11.2.3.187 is available on the cluster -```bash - $ module load imkl +```console +$ ml imkl ``` The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $CPATH, $LD_LIBRARY_PATH and $MKL_EXAMPLES @@ -40,8 +40,8 @@ Linking Intel MKL libraries may be complex. Intel [mkl link line advisor](http:/ You will need the mkl module loaded to run the mkl enabled executable. This may be avoided, by compiling library search paths into the executable. Include rpath on the compile line: -```bash - $ icc .... -Wl,-rpath=$LIBRARY_PATH ... +```console +$ icc .... -Wl,-rpath=$LIBRARY_PATH ... ``` ### Threading @@ -50,9 +50,9 @@ Advantage in using Intel MKL library is that it brings threaded parallelization For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS -```bash - $ export OMP_NUM_THREADS=24 - $ export KMP_AFFINITY=granularity=fine,compact,1,0 +```console +$ export OMP_NUM_THREADS=24 +$ export KMP_AFFINITY=granularity=fine,compact,1,0 ``` The application will run with 24 threads with affinity optimized for fine grain parallelization. @@ -63,50 +63,45 @@ Number of examples, demonstrating use of the Intel MKL library and its linking i ### Working With Examples -```bash - $ module load intel - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - - $ make sointel64 function=cblas_dgemm +```console +$ ml intel +$ ml imkl +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ make sointel64 function=cblas_dgemm ``` In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL example suite installed on clusters. ### Example: MKL and Intel Compiler -```bash - $ module load intel - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - $ - $ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x - $ ./cblas_dgemmx.x data/cblas_dgemmx.d +```console +$ ml intel +$ ml imkl +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ +$ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x +$ ./cblas_dgemmx.x data/cblas_dgemmx.d ``` In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL with icc -mkl option. 
Using the -mkl option is equivalent to: -```bash - $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 +```console +$ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 ``` In this example, we compile and link the cblas_dgemm example, using LP64 interface to threaded MKL and Intel OMP threads implementation. ### Example: Intel MKL and GNU Compiler -```bash - $ module load GCC - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - - $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm - - $ ./cblas_dgemmx.x data/cblas_dgemmx.d +```console +$ ml GCC +$ ml imkl +$ cp -a $MKL_EXAMPLES/cblas /tmp/ +$ cd /tmp/cblas +$ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm +$ ./cblas_dgemmx.x data/cblas_dgemmx.d ``` In this example, we compile, link and run the cblas_dgemm example, using LP64 interface to threaded MKL and gnu OMP threads implementation. diff --git a/docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md b/docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md index 4b1c9308957a43fafafb8f5c1280c11ba2bf81a1..b22274a0e0a4c32942b15ba90244621eba21aa54 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md +++ b/docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md @@ -17,10 +17,10 @@ Intel Parallel Studio XE The Intel compilers version 131.3 are available, via module iccifort/2013.5.192-GCC-4.8.3. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. -```bash - $ module load intel - $ icc -v - $ ifort -v +```console +$ ml intel +$ icc -v +$ ifort -v ``` Read more at the [Intel Compilers](intel-compilers/) page. @@ -31,9 +31,9 @@ IDB is no longer available since Parallel Studio 2015. The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. -```bash - $ module load intel - $ idb +```console +$ ml intel +$ idb ``` Read more at the [Intel Debugger](intel-debugger/) page. @@ -42,8 +42,8 @@ Read more at the [Intel Debugger](intel-debugger/) page. Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. -```bash - $ module load imkl +```console +$ ml imkl ``` Read more at the [Intel MKL](intel-mkl/) page. @@ -52,8 +52,8 @@ Read more at the [Intel MKL](intel-mkl/) page. Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX is available, via module ipp. The IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. -```bash - $ module load ipp +```console +$ ml ipp ``` Read more at the [Intel IPP](intel-integrated-performance-primitives/) page. 
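The Parallel Studio components listed on this page are provided as separate modules and may be combined within a single session; a minimal sketch using the module commands shown above:

```console
$ ml intel   # compilers and debugger
$ ml imkl    # Math Kernel Library
$ ml ipp     # Integrated Performance Primitives
$ ml         # list the currently loaded modules
```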
@@ -62,8 +62,8 @@ Read more at the [Intel IPP](intel-integrated-performance-primitives/) page. Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. -```bash - $ module load tbb +```console +$ ml tbb ``` Read more at the [Intel TBB](intel-tbb/) page. diff --git a/docs.it4i/salomon/software/intel-suite/intel-tbb.md b/docs.it4i/salomon/software/intel-suite/intel-tbb.md index 94e32f39073b41801f20391b04cc5081f99649f7..59976aa7ef31d2e97e9799ced80578be11a2d8ab 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-tbb.md +++ b/docs.it4i/salomon/software/intel-suite/intel-tbb.md @@ -4,10 +4,10 @@ Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. The tasks are executed by a runtime scheduler and may be offloaded to [MIC accelerator](../intel-xeon-phi/). -Intel TBB version 4.3.5.187 is available on the cluster. +Intel is available on the cluster. -```bash - $ module load tbb +```console +$ ml av tbb ``` The module sets up environment variables, required for linking and running tbb enabled applications. @@ -18,21 +18,21 @@ Link the tbb library, using -ltbb Number of examples, demonstrating use of TBB and its built-in scheduler is available on Anselm, in the $TBB_EXAMPLES directory. -```bash - $ module load intel - $ module load tbb - $ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ - $ cd /tmp/parallel_reduce/primes - $ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb - $ ./primes.x +```console +$ ml intel +$ ml tbb +$ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ +$ cd /tmp/parallel_reduce/primes +$ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb +$ ./primes.x ``` In this example, we compile, link and run the primes example, demonstrating use of parallel task-based reduce in computation of prime numbers. You will need the tbb module loaded to run the tbb enabled executable. This may be avoided, by compiling library search paths into the executable. -```bash - $ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb +```console +$ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb ``` ## Further Reading diff --git a/docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md b/docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md index 5d4513d306d1b9a4bf159c71231c9677cc2b8165..9cae361ca43dccb382bd5b09f5c5a9d270e0414c 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md +++ b/docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md @@ -12,9 +12,9 @@ Currently on Salomon is version 9.1.2.024 available as module itac/9.1.2.024 ITAC can collect traces from applications that are using Intel MPI. 
To generate a trace, simply add -trace option to your mpirun command : -```bash - $ module load itac/9.1.2.024 - $ mpirun -trace myapp +```console +$ ml itac/9.1.2.024 +$ mpirun -trace myapp ``` The trace will be saved in file myapp.stf in the current directory. @@ -23,9 +23,9 @@ The trace will be saved in file myapp.stf in the current directory. To view and analyze the trace, open ITAC GUI in a [graphical environment](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/): -```bash - $ module load itac/9.1.2.024 - $ traceanalyzer +```console +$ ml itac/9.1.2.024 +$ traceanalyzer ``` The GUI will launch and you can open the produced `*`.stf file. diff --git a/docs.it4i/salomon/software/intel-xeon-phi.md b/docs.it4i/salomon/software/intel-xeon-phi.md index 49746016bc8f96222d8c3bc125e7bf5cfea06a71..dbcc5fad47a654d760b6e4fc5476ace34337334f 100644 --- a/docs.it4i/salomon/software/intel-xeon-phi.md +++ b/docs.it4i/salomon/software/intel-xeon-phi.md @@ -2,150 +2,196 @@ ## Guide to Intel Xeon Phi Usage -Intel Xeon Phi can be programmed in several modes. The default mode on Anselm is offload mode, but all modes described in this document are supported. +Intel Xeon Phi accelerator can be programmed in several modes. The default mode on the cluster is offload mode, but all modes described in this document are supported. ## Intel Utilities for Xeon Phi To get access to a compute node with Intel Xeon Phi accelerator, use the PBS interactive session -```bash - $ qsub -I -q qmic -A NONE-0-0 +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 ``` -To set up the environment module "Intel" has to be loaded +To set up the environment module "intel" has to be loaded, without specifying the version, default version is loaded (at time of writing this, it's 2015b) -```bash - $ module load intel/13.5.192 +```console +$ ml intel ``` Information about the hardware can be obtained by running the micinfo program on the host. -```bash - $ /usr/bin/micinfo -``` - -The output of the "micinfo" utility executed on one of the Anselm node is as follows. 
(note: to get PCIe related details the command has to be run with root privileges) - -```bash - MicInfo Utility Log - - Created Mon Jul 22 00:23:50 2013 - - System Info - HOST OS : Linux - OS Version : 2.6.32-279.5.2.bl6.Bull.33.x86_64 - Driver Version : 6720-15 - MPSS Version : 2.1.6720-15 - Host Physical Memory : 98843 MB - - Device No: 0, Device Name: mic0 - - Version - Flash Version : 2.1.03.0386 - SMC Firmware Version : 1.15.4830 - SMC Boot Loader Version : 1.8.4326 - uOS Version : 2.6.38.8-g2593b11 - Device Serial Number : ADKC30102482 - - Board - Vendor ID : 0x8086 - Device ID : 0x2250 - Subsystem ID : 0x2500 - Coprocessor Stepping ID : 3 - PCIe Width : x16 - PCIe Speed : 5 GT/s - PCIe Max payload size : 256 bytes - PCIe Max read req size : 512 bytes - Coprocessor Model : 0x01 - Coprocessor Model Ext : 0x00 - Coprocessor Type : 0x00 - Coprocessor Family : 0x0b - Coprocessor Family Ext : 0x00 - Coprocessor Stepping : B1 - Board SKU : B1PRQ-5110P/5120D - ECC Mode : Enabled - SMC HW Revision : Product 225W Passive CS - - Cores - Total No of Active Cores : 60 - Voltage : 1032000 uV - Frequency : 1052631 kHz - - Thermal - Fan Speed Control : N/A - Fan RPM : N/A - Fan PWM : N/A - Die Temp : 49 C - - GDDR - GDDR Vendor : Elpida - GDDR Version : 0x1 - GDDR Density : 2048 Mb - GDDR Size : 7936 MB - GDDR Technology : GDDR5 - GDDR Speed : 5.000000 GT/s - GDDR Frequency : 2500000 kHz - GDDR Voltage : 1501000 uV +```console +$ /usr/bin/micinfo +``` + +The output of the "micinfo" utility executed on one of the cluster node is as follows. (note: to get PCIe related details the command has to be run with root privileges) + +```console +MicInfo Utility Log +Created Mon Aug 17 13:55:59 2015 + + + System Info + HOST OS : Linux + OS Version : 2.6.32-504.16.2.el6.x86_64 + Driver Version : 3.4.1-1 + MPSS Version : 3.4.1 + Host Physical Memory : 131930 MB + +Device No: 0, Device Name: mic0 + + Version + Flash Version : 2.1.02.0390 + SMC Firmware Version : 1.16.5078 + SMC Boot Loader Version : 1.8.4326 + uOS Version : 2.6.38.8+mpss3.4.1 + Device Serial Number : ADKC44601414 + + Board + Vendor ID : 0x8086 + Device ID : 0x225c + Subsystem ID : 0x7d95 + Coprocessor Stepping ID : 2 + PCIe Width : x16 + PCIe Speed : 5 GT/s + PCIe Max payload size : 256 bytes + PCIe Max read req size : 512 bytes + Coprocessor Model : 0x01 + Coprocessor Model Ext : 0x00 + Coprocessor Type : 0x00 + Coprocessor Family : 0x0b + Coprocessor Family Ext : 0x00 + Coprocessor Stepping : C0 + Board SKU : C0PRQ-7120 P/A/X/D + ECC Mode : Enabled + SMC HW Revision : Product 300W Passive CS + + Cores + Total No of Active Cores : 61 + Voltage : 1007000 uV + Frequency : 1238095 kHz + + Thermal + Fan Speed Control : N/A + Fan RPM : N/A + Fan PWM : N/A + Die Temp : 60 C + + GDDR + GDDR Vendor : Samsung + GDDR Version : 0x6 + GDDR Density : 4096 Mb + GDDR Size : 15872 MB + GDDR Technology : GDDR5 + GDDR Speed : 5.500000 GT/s + GDDR Frequency : 2750000 kHz + GDDR Voltage : 1501000 uV + +Device No: 1, Device Name: mic1 + + Version + Flash Version : 2.1.02.0390 + SMC Firmware Version : 1.16.5078 + SMC Boot Loader Version : 1.8.4326 + uOS Version : 2.6.38.8+mpss3.4.1 + Device Serial Number : ADKC44500454 + + Board + Vendor ID : 0x8086 + Device ID : 0x225c + Subsystem ID : 0x7d95 + Coprocessor Stepping ID : 2 + PCIe Width : x16 + PCIe Speed : 5 GT/s + PCIe Max payload size : 256 bytes + PCIe Max read req size : 512 bytes + Coprocessor Model : 0x01 + Coprocessor Model Ext : 0x00 + Coprocessor Type : 0x00 + Coprocessor Family : 0x0b + Coprocessor Family 
Ext : 0x00 + Coprocessor Stepping : C0 + Board SKU : C0PRQ-7120 P/A/X/D + ECC Mode : Enabled + SMC HW Revision : Product 300W Passive CS + + Cores + Total No of Active Cores : 61 + Voltage : 998000 uV + Frequency : 1238095 kHz + + Thermal + Fan Speed Control : N/A + Fan RPM : N/A + Fan PWM : N/A + Die Temp : 59 C + + GDDR + GDDR Vendor : Samsung + GDDR Version : 0x6 + GDDR Density : 4096 Mb + GDDR Size : 15872 MB + GDDR Technology : GDDR5 + GDDR Speed : 5.500000 GT/s + GDDR Frequency : 2750000 kHz + GDDR Voltage : 1501000 uV ``` ## Offload Mode To compile a code for Intel Xeon Phi a MPSS stack has to be installed on the machine where compilation is executed. Currently the MPSS stack is only installed on compute nodes equipped with accelerators. -```bash - $ qsub -I -q qmic -A NONE-0-0 - $ module load intel/13.5.192 +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +$ ml intel ``` For debugging purposes it is also recommended to set environment variable "OFFLOAD_REPORT". Value can be set from 0 to 3, where higher number means more debugging information. -```bash - export OFFLOAD_REPORT=3 +```console +export OFFLOAD_REPORT=3 ``` -A very basic example of code that employs offload programming technique is shown in the next listing. +A very basic example of code that employs offload programming technique is shown in the next listing. Please note that this code is sequential and utilizes only single core of the accelerator. -!!! note - This code is sequential and utilizes only single core of the accelerator. - -```bash - $ vim source-offload.cpp +```console +$ cat source-offload.cpp - #include <iostream> +#include <iostream> - int main(int argc, char* argv[]) - { - const int niter = 100000; - double result = 0; +int main(int argc, char* argv[]) +{ + const int niter = 100000; + double result = 0; - #pragma offload target(mic) - for (int i = 0; i < niter; ++i) { - const double t = (i + 0.5) / niter; - result += 4.0 / (t * t + 1.0); - } - result /= niter; - std::cout << "Pi ~ " << result << 'n'; + #pragma offload target(mic) + for (int i = 0; i < niter; ++i) { + const double t = (i + 0.5) / niter; + result += 4.0 / (t * t + 1.0); } + result /= niter; + std::cout << "Pi ~ " << result << '\n'; +} ``` To compile a code using Intel compiler run -```bash - $ icc source-offload.cpp -o bin-offload +```console +$ icc source-offload.cpp -o bin-offload ``` To execute the code, run the following command on the host -```bash - ./bin-offload +```console +$ ./bin-offload ``` ### Parallelization in Offload Mode Using OpenMP One way of paralelization a code for Xeon Phi is using OpenMP directives. The following example shows code for parallel vector addition. -```bash - $ vim ./vect-add +```console +$ cat ./vect-add #include <stdio.h> @@ -224,10 +270,9 @@ One way of paralelization a code for Xeon Phi is using OpenMP directives. The fo During the compilation Intel compiler shows which loops have been vectorized in both host and accelerator. This can be enabled with compiler option "-vec-report2". 
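For orientation, the pattern used by such a code combines an offload region with an OpenMP parallel loop and explicit in/out clauses for the data. A minimal sketch of an offloaded vector addition (hypothetical file name and vector length; not the full vect-add.c listing) may look like this:

```console
$ cat offload-omp-sketch.c

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1000000;                 /* hypothetical vector length */
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    double *c = malloc(n * sizeof(double));

    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* in/out clauses copy the arrays to and from the coprocessor,
       the OpenMP directive spreads the loop over the MIC cores */
    #pragma offload target(mic) in(a, b : length(n)) out(c : length(n))
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    printf("c[%d] = %f\n", n - 1, c[n - 1]);
    free(a); free(b); free(c);
    return 0;
}
```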
To compile and execute the code run -```bash - $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add - - $ ./vect-add +```console +$ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add +$ ./vect-add ``` Some interesting compiler flags useful not only for code debugging are: @@ -244,18 +289,19 @@ Some interesting compiler flags useful not only for code debugging are: Intel MKL includes an Automatic Offload (AO) feature that enables computationally intensive MKL functions called in user code to benefit from attached Intel Xeon Phi coprocessors automatically and transparently. -Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. Complete list of controls is listed [here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). +!!! note + Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. Complete list of controls is listed [here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). The Automatic Offload may be enabled by either an MKL function call within the code: ```cpp - mkl_mic_enable(); +mkl_mic_enable(); ``` or by setting environment variable -```bash - $ export MKL_MIC_ENABLE=1 +```console +$ export MKL_MIC_ENABLE=1 ``` To get more information about automatic offload please refer to "[Using Intel® MKL Automatic Offload on Intel ® Xeon Phiâ„¢ Coprocessors](http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf)" white paper or [Intel MKL documentation](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). @@ -264,68 +310,68 @@ To get more information about automatic offload please refer to "[Using Intel® At first get an interactive PBS session on a node with MIC accelerator and load "intel" module that automatically loads "mkl" module as well. -```bash - $ qsub -I -q qmic -A OPEN-0-0 -l select=1:ncpus=16 - $ module load intel +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +$ ml intel ``` -Following example show how to automatically offload an SGEMM (single precision - general matrix multiply) function to MIC coprocessor. The code can be copied to a file and compiled without any necessary modification. - -```bash - $ vim sgemm-ao-short.c +The code can be copied to a file and compiled without any necessary modification. 
- #include <stdio.h> - #include <stdlib.h> - #include <malloc.h> - #include <stdint.h> +```console +$ vim sgemm-ao-short.c - #include "mkl.h" +#include <stdio.h> +#include <stdlib.h> +#include <malloc.h> +#include <stdint.h> - int main(int argc, char **argv) - { - float *A, *B, *C; /* Matrices */ +#include "mkl.h" - MKL_INT N = 2560; /* Matrix dimensions */ - MKL_INT LD = N; /* Leading dimension */ - int matrix_bytes; /* Matrix size in bytes */ - int matrix_elements; /* Matrix size in elements */ +int main(int argc, char **argv) +{ + float *A, *B, *C; /* Matrices */ - float alpha = 1.0, beta = 1.0; /* Scaling factors */ - char transa = 'N', transb = 'N'; /* Transposition options */ + MKL_INT N = 2560; /* Matrix dimensions */ + MKL_INT LD = N; /* Leading dimension */ + int matrix_bytes; /* Matrix size in bytes */ + int matrix_elements; /* Matrix size in elements */ - int i, j; /* Counters */ + float alpha = 1.0, beta = 1.0; /* Scaling factors */ + char transa = 'N', transb = 'N'; /* Transposition options */ - matrix_elements = N * N; - matrix_bytes = sizeof(float) * matrix_elements; + int i, j; /* Counters */ - /* Allocate the matrices */ - A = malloc(matrix_bytes); B = malloc(matrix_bytes); C = malloc(matrix_bytes); + matrix_elements = N * N; + matrix_bytes = sizeof(float) * matrix_elements; - /* Initialize the matrices */ - for (i = 0; i < matrix_elements; i++) { - A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; - } + /* Allocate the matrices */ + A = malloc(matrix_bytes); B = malloc(matrix_bytes); C = malloc(matrix_bytes); - printf("Computing SGEMM on the hostn"); - sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); + /* Initialize the matrices */ + for (i = 0; i < matrix_elements; i++) { + A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; + } - printf("Enabling Automatic Offloadn"); - /* Alternatively, set environment variable MKL_MIC_ENABLE=1 */ - mkl_mic_enable(); + printf("Computing SGEMM on the host\n"); + sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); - int ndevices = mkl_mic_get_device_count(); /* Number of MIC devices */ - printf("Automatic Offload enabled: %d MIC devices presentn", ndevices); + printf("Enabling Automatic Offload\n"); + /* Alternatively, set environment variable MKL_MIC_ENABLE=1 */ + mkl_mic_enable(); + + int ndevices = mkl_mic_get_device_count(); /* Number of MIC devices */ + printf("Automatic Offload enabled: %d MIC devices present\n", ndevices); - printf("Computing SGEMM with automatic workdivisionn"); - sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); + printf("Computing SGEMM with automatic workdivision\n"); + sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); - /* Free the matrix memory */ - free(A); free(B); free(C); + /* Free the matrix memory */ + free(A); free(B); free(C); - printf("Donen"); + printf("Done\n"); - return 0; - } + return 0; +} ``` !!! note @@ -333,31 +379,74 @@ Following example show how to automatically offload an SGEMM (single precision - To compile a code using Intel compiler use: -```bash - $ icc -mkl sgemm-ao-short.c -o sgemm +```console +$ icc -mkl sgemm-ao-short.c -o sgemm ``` For debugging purposes enable the offload report to see more information about automatic offloading. 
-```bash - $ export OFFLOAD_REPORT=2 +```console +$ export OFFLOAD_REPORT=2 ``` The output of a code should look similar to following listing, where lines starting with [MKL] are generated by offload reporting: -```bash - Computing SGEMM on the host - Enabling Automatic Offload - Automatic Offload enabled: 1 MIC devices present - Computing SGEMM with automatic workdivision - [MKL] [MIC --] [AO Function] SGEMM - [MKL] [MIC --] [AO SGEMM Workdivision] 0.00 1.00 - [MKL] [MIC 00] [AO SGEMM CPU Time] 0.463351 seconds - [MKL] [MIC 00] [AO SGEMM MIC Time] 0.179608 seconds - [MKL] [MIC 00] [AO SGEMM CPU->MIC Data] 52428800 bytes - [MKL] [MIC 00] [AO SGEMM MIC->CPU Data] 26214400 bytes - Done -``` +```console +[user@r31u03n799 ~]$ ./sgemm +Computing SGEMM on the host +Enabling Automatic Offload +Automatic Offload enabled: 2 MIC devices present +Computing SGEMM with automatic workdivision +[MKL] [MIC --] [AO Function] SGEMM +[MKL] [MIC --] [AO SGEMM Workdivision] 0.44 0.28 0.28 +[MKL] [MIC 00] [AO SGEMM CPU Time] 0.252427 seconds +[MKL] [MIC 00] [AO SGEMM MIC Time] 0.091001 seconds +[MKL] [MIC 00] [AO SGEMM CPU->MIC Data] 34078720 bytes +[MKL] [MIC 00] [AO SGEMM MIC->CPU Data] 7864320 bytes +[MKL] [MIC 01] [AO SGEMM CPU Time] 0.252427 seconds +[MKL] [MIC 01] [AO SGEMM MIC Time] 0.094758 seconds +[MKL] [MIC 01] [AO SGEMM CPU->MIC Data] 34078720 bytes +[MKL] [MIC 01] [AO SGEMM MIC->CPU Data] 7864320 bytes +Done +``` + +!!! note "" + Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. Complete list of controls is listed [here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). + +### Automatic offload example #2 + +In this example, we will demonstrate automatic offload control via an environment vatiable MKL_MIC_ENABLE. The function DGEMM will be offloaded. + +At first get an interactive PBS session on a node with MIC accelerator. + +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +``` + +Once in, we enable the offload and run the Octave software. In octave, we generate two large random matrices and let them multiply together. + +```console +$ export MKL_MIC_ENABLE=1 +$ export OFFLOAD_REPORT=2 +$ ml Octave/3.8.2-intel-2015b +$ octave -q +octave:1> A=rand(10000); +octave:2> B=rand(10000); +octave:3> C=A*B; +[MKL] [MIC --] [AO Function] DGEMM +[MKL] [MIC --] [AO DGEMM Workdivision] 0.14 0.43 0.43 +[MKL] [MIC 00] [AO DGEMM CPU Time] 3.814714 seconds +[MKL] [MIC 00] [AO DGEMM MIC Time] 2.781595 seconds +[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 1145600000 bytes +[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 1382400000 bytes +[MKL] [MIC 01] [AO DGEMM CPU Time] 3.814714 seconds +[MKL] [MIC 01] [AO DGEMM MIC Time] 2.843016 seconds +[MKL] [MIC 01] [AO DGEMM CPU->MIC Data] 1145600000 bytes +[MKL] [MIC 01] [AO DGEMM MIC->CPU Data] 1382400000 bytes +octave:4> exit +``` + +On the example above we observe, that the DGEMM function workload was split over CPU, MIC 0 and MIC 1, in the ratio 0.14 0.43 0.43. The matrix multiplication was done on the CPU, accelerated by two Xeon Phi accelerators. ## Native Mode @@ -365,10 +454,9 @@ In the native mode a program is executed directly on Intel Xeon Phi without invo To compile a code user has to be connected to a compute with MIC and load Intel compilers module. 
To get an interactive session on a compute node with an Intel Xeon Phi and load the module use following commands: -```bash - $ qsub -I -q qmic -A NONE-0-0 - - $ module load intel/13.5.192 +```console +$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +$ ml intel ``` !!! note @@ -376,105 +464,108 @@ To compile a code user has to be connected to a compute with MIC and load Intel To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: -```bash - $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host +```console +$ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host ``` To run this code on host, use: -```bash - $ ./vect-add-host +```console +$ ./vect-add-host ``` The second example shows how to compile the same code for Intel Xeon Phi: -```bash - $ icc -mmic -fopenmp vect-add.c -o vect-add-mic +```console +$ icc -mmic -fopenmp vect-add.c -o vect-add-mic ``` ### Execution of the Program in Native Mode on Intel Xeon Phi The user access to the Intel Xeon Phi is through the SSH. Since user home directories are mounted using NFS on the accelerator, users do not have to copy binary files or libraries between the host and accelerator. +Get the PATH of MIC enabled libraries for currently used Intel Compiler (here was icc/2015.3.187-GNU-5.1.0-2.25 used): + +```console +$ echo $MIC_LD_LIBRARY_PATH +/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic +``` + To connect to the accelerator run: -```bash - $ ssh mic0 +```console +$ ssh mic0 ``` If the code is sequential, it can be executed directly: -```bash - mic0 $ ~/path_to_binary/vect-add-seq-mic +```console +mic0 $ ~/path_to_binary/vect-add-seq-mic ``` If the code is parallelized using OpenMP a set of additional libraries is required for execution. To locate these libraries new path has to be added to the LD_LIBRARY_PATH environment variable prior to the execution: -```bash - mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH +```console +mic0 $ export LD_LIBRARY_PATH=/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH ``` !!! note - The path exported contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. + Please note that the path exported in the previous example contains path to a specific compiler (here the version is 2015.3.187-GNU-5.1.0-2.25). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: !!! 
note - /apps/intel/composer_xe_2013.5.192/compiler/lib/mic + /apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic - - libiomp5.so - - libimf.so - - libsvml.so - - libirng.so - - libintlc.so.5 + libiomp5.so + libimf.so + libsvml.so + libirng.so + libintlc.so.5 Finally, to run the compiled code use: -```bash - $ ~/path_to_binary/vect-add-mic -``` - ## OpenCL OpenCL (Open Computing Language) is an open standard for general-purpose parallel programming for diverse mix of multi-core CPUs, GPU coprocessors, and other parallel processors. OpenCL provides a flexible execution model and uniform programming environment for software developers to write portable code for systems running on both the CPU and graphics processors or accelerators like the Intel® Xeon Phi. -On Anselm OpenCL is installed only on compute nodes with MIC accelerator, therefore OpenCL code can be compiled only on these nodes. +On Salomon OpenCL is installed only on compute nodes with MIC accelerator, therefore OpenCL code can be compiled only on these nodes. -```bash - module load opencl-sdk opencl-rt +```console +module load opencl-sdk opencl-rt ``` Always load "opencl-sdk" (providing devel files like headers) and "opencl-rt" (providing dynamic library libOpenCL.so) modules to compile and link OpenCL code. Load "opencl-rt" for running your compiled code. There are two basic examples of OpenCL code in the following directory: -```bash - /apps/intel/opencl-examples/ +```console +/apps/intel/opencl-examples/ ``` First example "CapsBasic" detects OpenCL compatible hardware, here CPU and MIC, and prints basic information about the capabilities of it. -```bash - /apps/intel/opencl-examples/CapsBasic/capsbasic +```console +/apps/intel/opencl-examples/CapsBasic/capsbasic ``` -To compile and run the example copy it to your home directory, get a PBS interactive session on of the nodes with MIC and run make for compilation. Make files are very basic and shows how the OpenCL code can be compiled on Anselm. +To compile and run the example copy it to your home directory, get a PBS interactive session on of the nodes with MIC and run make for compilation. Make files are very basic and shows how the OpenCL code can be compiled on Salomon. -```bash - $ cp /apps/intel/opencl-examples/CapsBasic/* . - $ qsub -I -q qmic -A NONE-0-0 - $ make +```console +$ cp /apps/intel/opencl-examples/CapsBasic/* . +$ qsub -I -q qmic -A NONE-0-0 +$ make ``` The compilation command for this example is: -```bash - $ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ +```console +$ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ ``` After executing the complied binary file, following output should be displayed. -```bash +```console ./capsbasic Number of available platforms: 1 @@ -505,22 +596,22 @@ After executing the complied binary file, following output should be displayed. The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. -```bash - $ cp -r /apps/intel/opencl-examples/* . - $ qsub -I -q qmic -A NONE-0-0 - $ cd GEMM - $ make +```console +$ cp -r /apps/intel/opencl-examples/* . 
+$ qsub -I -q qmic -A NONE-0-0
+$ cd GEMM
+$ make
```

The compilation command for this example is:

-```bash
- $ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/
+```console
+$ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/
```

To see the performance of Intel Xeon Phi performing the DGEMM run the example as follows:

-```bash
+```console
./gemm -d 1
Platforms (1):
[0] Intel(R) OpenCL [Selected]
@@ -547,28 +638,48 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as

### Environment Setup and Compilation

+To achieve the best MPI performance, always use the following setup for Intel MPI on Xeon Phi accelerated nodes:
+
+```console
+$ export I_MPI_FABRICS=shm:dapl
+$ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1
+```
+
+This ensures that MPI inside a node will use SHMEM communication, that IB SCIF will be used between the host and the Phi, and that a CCL-Direct proxy will be used between different nodes or between Phis on different nodes.
+
+!!! note
+    Other FABRICS like tcp or ofa may be used (even combined with shm), but there is a severe loss of performance (by an order of magnitude).
+    Usage of a single DAPL PROVIDER (e.g. I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u) will cause failure of Host<->Phi and/or Phi<->Phi communication.
+    Usage of the I_MPI_DAPL_PROVIDER_LIST on a non-accelerated node will cause failure of any MPI communication, since those nodes do not have a SCIF device and there is no CCL-Direct proxy running.
+
Again, an MPI code for Intel Xeon Phi has to be compiled on a compute node with an accelerator and the MPSS software stack installed. To get to a compute node with an accelerator use:

-```bash
- $ qsub -I -q qmic -A NONE-0-0
+```console
+$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
```

The only supported implementation of the MPI standard for Intel Xeon Phi is Intel MPI. To set up a fully functional development environment, a combination of the Intel compiler and Intel MPI has to be used.
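The MPI test program used in the compile commands below simply prints the rank, the communicator size, and the host name of each process. A minimal sketch of such a source (only an illustration along the lines of mpi-test.c, not the exact listing) is:

```cpp
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char hostname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(hostname, &len);

    /* matches the "Hello world from process X of Y on host Z" outputs shown below */
    printf("Hello world from process %d of %d on host %s\n", rank, size, hostname);

    MPI_Finalize();
    return 0;
}
```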
On a host load following modules before compilation: -```bash - $ module load intel/13.5.192 impi/4.1.1.036 +```console +$ module load intel ``` To compile an MPI code for host use: -```bash - $ mpiicc -xhost -o mpi-test mpi-test.c +```console +$ mpiicc -xhost -o mpi-test mpi-test.c ``` To compile the same code for Intel Xeon Phi architecture use: -```bash - $ mpiicc -mmic -o mpi-test-mic mpi-test.c +```console +$ mpiicc -mmic -o mpi-test-mic mpi-test.c +``` + +Or, if you are using Fortran : + +```console +$ mpiifort -mmic -o mpi-test-mic mpi-test.f90 ``` An example of basic MPI version of "hello-world" example in C language, that can be executed on both host and Xeon Phi is (can be directly copy and pasted to a .c file) @@ -613,17 +724,17 @@ Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: In this case all environment variables are set by modules, so to execute the compiled MPI program on a single node, use: -```bash - $ mpirun -np 4 ./mpi-test +```console +$ mpirun -np 4 ./mpi-test ``` The output should be similar to: -```bash - Hello world from process 1 of 4 on host cn207 - Hello world from process 3 of 4 on host cn207 - Hello world from process 2 of 4 on host cn207 - Hello world from process 0 of 4 on host cn207 +```console +Hello world from process 1 of 4 on host r38u31n1000 +Hello world from process 3 of 4 on host r38u31n1000 +Hello world from process 2 of 4 on host r38u31n1000 +Hello world from process 0 of 4 on host r38u31n1000 ``` ### Coprocessor-Only Model @@ -635,18 +746,27 @@ coprocessor; or 2.) lunch the task using "**mpiexec.hydra**" from a host. Similarly to execution of OpenMP programs in native mode, since the environmental module are not supported on MIC, user has to setup paths to Intel MPI libraries and binaries manually. One time setup can be done by creating a "**.profile**" file in user's home directory. This file sets up the environment on the MIC automatically once user access to the accelerator through the SSH. -```bash - $ vim ~/.profile +At first get the LD_LIBRARY_PATH for currenty used Intel Compiler and Intel MPI: + +```console +$ echo $MIC_LD_LIBRARY_PATH +/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic/ +``` + +Use it in your ~/.profile: + +```console +$ cat ~/.profile - PS1='[u@h W]$ ' - export PATH=/usr/bin:/usr/sbin:/bin:/sbin +PS1='[\u@\h \W]\$ ' +export PATH=/usr/bin:/usr/sbin:/bin:/sbin - #OpenMP - export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH +#IMPI +export PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25/mic/bin/:$PATH + +#OpenMP (ICC, IFORT), IMKL and IMPI +export LD_LIBRARY_PATH=/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH - #Intel MPI - export LD_LIBRARY_PATH=/apps/intel/impi/4.1.1.036/mic/lib/:$LD_LIBRARY_PATH - export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH ``` !!! 
note @@ -655,29 +775,29 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta To access a MIC accelerator located on a node that user is currently connected to, use: -```bash - $ ssh mic0 +```console +$ ssh mic0 ``` or in case you need specify a MIC accelerator on a particular node, use: -```bash - $ ssh cn207-mic0 +```console +$ ssh r38u31n1000-mic0 ``` To run the MPI code in parallel on multiple core of the accelerator, use: -```bash - $ mpirun -np 4 ./mpi-test-mic +```console +$ mpirun -np 4 ./mpi-test-mic ``` The output should be similar to: -```bash - Hello world from process 1 of 4 on host cn207-mic0 - Hello world from process 2 of 4 on host cn207-mic0 - Hello world from process 3 of 4 on host cn207-mic0 - Hello world from process 0 of 4 on host cn207-mic0 +```console +Hello world from process 1 of 4 on host r38u31n1000-mic0 +Hello world from process 2 of 4 on host r38u31n1000-mic0 +Hello world from process 3 of 4 on host r38u31n1000-mic0 +Hello world from process 0 of 4 on host r38u31n1000-mic0 ``` #### Execution on Host @@ -686,20 +806,20 @@ If the MPI program is launched from host instead of the coprocessor, the environ First step is to tell mpiexec that the MPI should be executed on a local accelerator by setting up the environmental variable "I_MPI_MIC" -```bash - $ export I_MPI_MIC=1 +```console +$ export I_MPI_MIC=1 ``` Now the MPI program can be executed as: -```bash - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic +```console +$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic ``` or using mpirun -```bash - $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic +```console +$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic ``` !!! note @@ -708,11 +828,11 @@ or using mpirun The output should be again similar to: -```bash - Hello world from process 1 of 4 on host cn207-mic0 - Hello world from process 2 of 4 on host cn207-mic0 - Hello world from process 3 of 4 on host cn207-mic0 - Hello world from process 0 of 4 on host cn207-mic0 +```console +Hello world from process 1 of 4 on host r38u31n1000-mic0 +Hello world from process 2 of 4 on host r38u31n1000-mic0 +Hello world from process 3 of 4 on host r38u31n1000-mic0 +Hello world from process 0 of 4 on host r38u31n1000-mic0 ``` !!! hint @@ -720,166 +840,151 @@ The output should be again similar to: A simple test to see if the file is present is to execute: -```bash - $ ssh mic0 ls /bin/pmi_proxy - /bin/pmi_proxy +```console +$ ssh mic0 ls /bin/pmi_proxy + /bin/pmi_proxy ``` #### Execution on Host - MPI Processes Distributed Over Multiple Accelerators on Multiple Nodes To get access to multiple nodes with MIC accelerator, user has to use PBS to allocate the resources. To start interactive session, that allocates 2 compute nodes = 2 MIC accelerators run qsub command with following parameters: -```bash - $ qsub -I -q qmic -A NONE-0-0 -l select=2:ncpus=16 - - $ module load intel/13.5.192 impi/4.1.1.036 +```console +$ qsub -I -q qprod -l select=2:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 +$ module load intel impi ``` This command connects user through ssh to one of the nodes immediately. 
To see the other nodes that have been allocated use:

-```bash
- $ cat $PBS_NODEFILE
+```console
+$ cat $PBS_NODEFILE
```

For example:

-```bash
- cn204.bullx
- cn205.bullx
+```console
+r25u25n710.ib0.smc.salomon.it4i.cz
+r25u26n711.ib0.smc.salomon.it4i.cz
```

-This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators.
+This output means that PBS allocated the nodes r25u25n710 and r25u26n711, which means that the user has direct access to the "**r25u25n710-mic0**" and "**r25u26n711-mic0**" accelerators.

!!! note
    At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh:
-    - to connect to the second node : `$ ssh cn205`
-    - to connect to the accelerator on the first node from the first node: `$ ssh cn204-mic0` or `$ ssh mic0`
-    - to connect to the accelerator on the second node from the first node: `$ ssh cn205-mic0`
+    - to connect to the second node: `$ ssh r25u26n711`
+    - to connect to the accelerator on the first node from the first node: `$ ssh r25u25n710-mic0` or `$ ssh mic0`
+    - to connect to the accelerator on the second node from the first node: `$ ssh r25u26n711-mic0`

-At this point we expect that correct modules are loaded and binary is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC"
+At this point we expect that the correct modules are loaded and the binary is compiled. For parallel execution the mpiexec.hydra is used. Again, the first step is to tell mpiexec that MPI can be executed on MIC accelerators by setting up the environment variable "I_MPI_MIC"; do not forget to have the correct FABRICS and PROVIDER defined.

-```bash
- $ export I_MPI_MIC=1
+```console
+$ export I_MPI_MIC=1
+$ export I_MPI_FABRICS=shm:dapl
+$ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1
```

To launch the MPI program use:

-```bash
- $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -host cn204-mic0 -n 4 ~/mpi-test-mic
- : -host cn205-mic0 -n 6 ~/mpi-test-mic
+```console
+$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -host r25u25n710-mic0 -n 4 ~/mpi-test-mic \
+: -host r25u26n711-mic0 -n 6 ~/mpi-test-mic
```

or using mpirun:

-```bash
- $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -host cn204-mic0 -n 4 ~/mpi-test-mic
- : -host cn205-mic0 -n 6 ~/mpi-test-mic
+```console
+$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -host r25u25n710-mic0 -n 4 ~/mpi-test-mic \
+: -host r25u26n711-mic0 -n 6 ~/mpi-test-mic
```

In this case four MPI processes are executed on accelerator r25u25n710-mic0 and six processes are executed on accelerator r25u26n711-mic0.
The sample output (sorted after execution) is:

-```bash
- Hello world from process 0 of 10 on host cn204-mic0
- Hello world from process 1 of 10 on host cn204-mic0
- Hello world from process 2 of 10 on host cn204-mic0
- Hello world from process 3 of 10 on host cn204-mic0
- Hello world from process 4 of 10 on host cn205-mic0
- Hello world from process 5 of 10 on host cn205-mic0
- Hello world from process 6 of 10 on host cn205-mic0
- Hello world from process 7 of 10 on host cn205-mic0
- Hello world from process 8 of 10 on host cn205-mic0
- Hello world from process 9 of 10 on host cn205-mic0
+```console
+Hello world from process 0 of 10 on host r25u25n710-mic0
+Hello world from process 1 of 10 on host r25u25n710-mic0
+Hello world from process 2 of 10 on host r25u25n710-mic0
+Hello world from process 3 of 10 on host r25u25n710-mic0
+Hello world from process 4 of 10 on host r25u26n711-mic0
+Hello world from process 5 of 10 on host r25u26n711-mic0
+Hello world from process 6 of 10 on host r25u26n711-mic0
+Hello world from process 7 of 10 on host r25u26n711-mic0
+Hello world from process 8 of 10 on host r25u26n711-mic0
+Hello world from process 9 of 10 on host r25u26n711-mic0
```

In the same way, an MPI program can be executed on multiple hosts:

-```bash
- $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -host cn204 -n 4 ~/mpi-test
- : -host cn205 -n 6 ~/mpi-test
+```console
+$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -host r25u25n710 -n 4 ~/mpi-test \
+: -host r25u26n711 -n 6 ~/mpi-test
```

-\###Symmetric model
+### Symmetric Model

In the symmetric mode MPI programs are executed on both the host computer(s) and the MIC accelerator(s). Since the MIC has a different architecture and requires a different binary file produced by the Intel compiler, two different files have to be compiled before the MPI program is executed. In the previous section we have compiled two binary files, one for hosts "**mpi-test**" and one for MIC accelerators "**mpi-test-mic**". These two binaries can be executed at once using mpiexec.hydra:

-```bash
- $ mpiexec.hydra
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -host cn205 -n 2 ~/mpi-test
- : -host cn205-mic0 -n 2 ~/mpi-test-mic
+```console
+$ mpirun \
+ -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -host r38u32n1001 -n 2 ~/mpi-test \
+: -host r38u32n1001-mic0 -n 2 ~/mpi-test-mic
```

-In this example the first two parameters (line 2 and 3) sets up required environment variables for execution. The third line specifies binary that is executed on host (here cn205) and the last line specifies the binary that is execute on the accelerator (here cn205-mic0).
+In this example the second line sets up the required environment variable for execution, the third line specifies the binary that is executed on the host (here r38u32n1001), and the last line specifies the binary that is executed on the accelerator (here r38u32n1001-mic0).
The output of the program is:

-```bash
- Hello world from process 0 of 4 on host cn205
- Hello world from process 1 of 4 on host cn205
- Hello world from process 2 of 4 on host cn205-mic0
- Hello world from process 3 of 4 on host cn205-mic0
+```console
+Hello world from process 0 of 4 on host r38u32n1001
+Hello world from process 1 of 4 on host r38u32n1001
+Hello world from process 2 of 4 on host r38u32n1001-mic0
+Hello world from process 3 of 4 on host r38u32n1001-mic0
```

The execution procedure can be simplified by using the mpirun command with the machine file as a parameter. The machine file contains a list of all nodes and accelerators that should be used to execute MPI processes.

-An example of a machine file that uses 2 >hosts (**cn205** and **cn206**) and 2 accelerators **(cn205-mic0** and **cn206-mic0**) to run 2 MPI processes on each of them:
+An example of a machine file that uses 2 hosts (**r38u32n1001** and **r38u33n1002**) and 2 accelerators (**r38u32n1001-mic0** and **r38u33n1002-mic0**) to run 2 MPI processes on each of them:

-```bash
- $ cat hosts_file_mix
- cn205:2
- cn205-mic0:2
- cn206:2
- cn206-mic0:2
+```console
+$ cat hosts_file_mix
+r38u32n1001:2
+r38u32n1001-mic0:2
+r38u33n1002:2
+r38u33n1002-mic0:2
```

In addition, if a naming convention is set in a way that the name of the binary for the host is **"bin_name"** and the name of the binary for the accelerator is **"bin_name-mic"**, then by setting the environment variable **I_MPI_MIC_POSTFIX** to **"-mic"** the user does not have to specify the names of both binaries. In this case mpirun needs just the name of the host binary file (i.e. "mpi-test") and uses the suffix to get the name of the binary for the accelerator (i.e. "mpi-test-mic").

-```bash
- $ export I_MPI_MIC_POSTFIX=-mic
+```console
+$ export I_MPI_MIC_POSTFIX=-mic
```

To run the MPI code using mpirun and the machine file "hosts_file_mix" use:

-```bash
- $ mpirun
- -genv I_MPI_FABRICS shm:tcp
- -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
- -genv I_MPI_FABRICS_LIST tcp
- -genv I_MPI_FABRICS shm:tcp
- -genv I_MPI_TCP_NETMASK=10.1.0.0/16
- -machinefile hosts_file_mix
- ~/mpi-test
+```console
+$ mpirun \
+ -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH \
+ -machinefile hosts_file_mix \
+ ~/mpi-test
```

A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is:

-```bash
- Hello world from process 0 of 8 on host cn204
- Hello world from process 1 of 8 on host cn204
- Hello world from process 2 of 8 on host cn204-mic0
- Hello world from process 3 of 8 on host cn204-mic0
- Hello world from process 4 of 8 on host cn205
- Hello world from process 5 of 8 on host cn205
- Hello world from process 6 of 8 on host cn205-mic0
- Hello world from process 7 of 8 on host cn205-mic0
+```console
+Hello world from process 0 of 8 on host r38u31n1000
+Hello world from process 1 of 8 on host r38u31n1000
+Hello world from process 2 of 8 on host r38u31n1000-mic0
+Hello world from process 3 of 8 on host r38u31n1000-mic0
+Hello world from process 4 of 8 on host r38u32n1001
+Hello world from process 5 of 8 on host r38u32n1001
+Hello world from process 6 of 8 on host r38u32n1001-mic0
+Hello world from process 7 of 8 on host r38u32n1001-mic0
```

!!!
note diff --git a/docs.it4i/salomon/software/java.md b/docs.it4i/salomon/software/java.md index 703e53fc1093cf28aeb5c80b985174784e54ad90..83c3738c0802e612ba84c25868771c44fa51a1ab 100644 --- a/docs.it4i/salomon/software/java.md +++ b/docs.it4i/salomon/software/java.md @@ -2,24 +2,24 @@ Java is available on the cluster. Activate java by loading the Java module -```bash - $ module load Java +```console +$ ml Java ``` Note that the Java module must be loaded on the compute nodes as well, in order to run java on compute nodes. Check for java version and path -```bash - $ java -version - $ which java +```console +$ java -version +$ which java ``` With the module loaded, not only the runtime environment (JRE), but also the development environment (JDK) with the compiler is available. -```bash - $ javac -version - $ which javac +```console +$ javac -version +$ which javac ``` Java applications may use MPI for inter-process communication, in conjunction with Open MPI. Read more on <http://www.open-mpi.org/faq/?category=java>. This functionality is currently not supported on Anselm cluster. In case you require the java interface to MPI, please contact [cluster support](https://support.it4i.cz/rt/). diff --git a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md index 9aa54f09aa07ccde2daa1bfc5c6ff4daeab2b78b..e2633236ac6624c7a41ed56496bacb9795158901 100644 --- a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md +++ b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md @@ -10,16 +10,14 @@ Use the mpiexec to run the OpenMPI code. Example: -```bash - $ qsub -q qexp -l select=4:ncpus=24 -I +```console +$ qsub -q qexp -l select=4:ncpus=24 -I qsub: waiting for job 15210.isrv5 to start qsub: job 15210.isrv5 ready - - $ pwd +$ pwd /home/username - - $ module load OpenMPI - $ mpiexec -pernode ./helloworld_mpi.x +$ ml OpenMPI +$ mpiexec -pernode ./helloworld_mpi.x Hello world! from rank 0 of 4 on host r1i0n17 Hello world! from rank 1 of 4 on host r1i0n5 Hello world! from rank 2 of 4 on host r1i0n6 @@ -33,11 +31,10 @@ Note that the executable helloworld_mpi.x must be available within the same path You need to preload the executable, if running on the local ramdisk /tmp filesystem -```bash - $ pwd +```console +$ pwd /tmp/pbs.15210.isrv5 - - $ mpiexec -pernode --preload-binary ./helloworld_mpi.x +$ mpiexec -pernode --preload-binary ./helloworld_mpi.x Hello world! from rank 0 of 4 on host r1i0n17 Hello world! from rank 1 of 4 on host r1i0n5 Hello world! from rank 2 of 4 on host r1i0n6 @@ -54,12 +51,10 @@ The mpiprocs and ompthreads parameters allow for selection of number of running Follow this example to run one MPI process per node, 24 threads per process. -```bash - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=1:ompthreads=24 -I - - $ module load OpenMPI - - $ mpiexec --bind-to-none ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=1:ompthreads=24 -I +$ ml OpenMPI +$ mpiexec --bind-to-none ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 24 threads per socket, on 4 nodes. @@ -68,12 +63,10 @@ In this example, we demonstrate recommended way to run an MPI application, using Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpiexec. 
-```bash - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=2:ompthreads=12 -I - - $ module load OpenMPI - - $ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=2:ompthreads=12 -I +$ ml OpenMPI +$ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 12 threads per socket, each process and its threads bound to a separate processor socket of the node, on 4 nodes @@ -82,12 +75,10 @@ In this example, we demonstrate recommended way to run an MPI application, using Follow this example to run 24 MPI processes per node, 1 thread per process. Note the options to mpiexec. -```bash - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I - - $ module load OpenMPI - - $ mpiexec -bycore -bind-to-core ./helloworld_mpi.x +```console +$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I +$ ml OpenMPI +$ mpiexec -bycore -bind-to-core ./helloworld_mpi.x ``` In this example, we demonstrate recommended way to run an MPI application, using 24 MPI processes per node, single threaded. Each process is bound to separate processor core, on 4 nodes. @@ -99,21 +90,21 @@ In this example, we demonstrate recommended way to run an MPI application, using In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: -```bash - $ export GOMP_CPU_AFFINITY="0-23" +```console +$ export GOMP_CPU_AFFINITY="0-23" ``` or this one for Intel OpenMP: -```bash - $ export KMP_AFFINITY=granularity=fine,compact,1,0 +```console +$ export KMP_AFFINITY=granularity=fine,compact,1,0 ``` As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: -```bash - $ export OMP_PROC_BIND=true - $ export OMP_PLACES=cores +```console +$ export OMP_PROC_BIND=true +$ export OMP_PLACES=cores ``` ## OpenMPI Process Mapping and Binding @@ -126,7 +117,7 @@ MPI process mapping may be specified by a hostfile or rankfile input to the mpie Example hostfile -```bash +```console r1i0n17.smc.salomon.it4i.cz r1i0n5.smc.salomon.it4i.cz r1i0n6.smc.salomon.it4i.cz @@ -135,8 +126,8 @@ Example hostfile Use the hostfile to control process placement -```bash - $ mpiexec -hostfile hostfile ./helloworld_mpi.x +```console +$ mpiexec -hostfile hostfile ./helloworld_mpi.x Hello world! from rank 0 of 4 on host r1i0n17 Hello world! from rank 1 of 4 on host r1i0n5 Hello world! from rank 2 of 4 on host r1i0n6 @@ -153,7 +144,7 @@ Appropriate binding may boost performance of your application. Example rankfile -```bash +```console rank 0=r1i0n7.smc.salomon.it4i.cz slot=1:0,1 rank 1=r1i0n6.smc.salomon.it4i.cz slot=0:* rank 2=r1i0n5.smc.salomon.it4i.cz slot=1:1-2 @@ -170,7 +161,7 @@ rank 2 will be bounded to r1i0n5, socket1, core1 and core2 rank 3 will be bounded to r1i0n17, socket0 core1, socket1 core0, core1, core2 rank 4 will be bounded to r1i0n6, all cores on both sockets -```bash +```console $ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x [r1i0n17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . . . . . .][B B B . . . . . . . . .] (slot list 0:1,1:0-2) [r1i0n7:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . . . . . .][B B . . . . . . . . . .] 
(slot list 1:0,1) @@ -192,10 +183,10 @@ It is users responsibility to provide correct number of ranks, sockets and cores In all cases, binding and threading may be verified by executing for example: -```bash - $ mpiexec -bysocket -bind-to-socket --report-bindings echo - $ mpiexec -bysocket -bind-to-socket numactl --show - $ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS +```console +$ mpiexec -bysocket -bind-to-socket --report-bindings echo +$ mpiexec -bysocket -bind-to-socket numactl --show +$ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS ``` ## Changes in OpenMPI 1.8 diff --git a/docs.it4i/salomon/software/mpi/mpi.md b/docs.it4i/salomon/software/mpi/mpi.md index 411d54ddabae7b32ef32f894f2cc466e93eeb866..99f8745aca779ad71a3ab5322499aa9e8bc9fd25 100644 --- a/docs.it4i/salomon/software/mpi/mpi.md +++ b/docs.it4i/salomon/software/mpi/mpi.md @@ -15,8 +15,8 @@ MPI libraries are activated via the environment modules. Look up section modulefiles/mpi in module avail -```bash - $ module avail +```console +$ ml av ------------------------------ /apps/modules/mpi ------------------------------- impi/4.1.1.036-iccifort-2013.5.192 impi/4.1.1.036-iccifort-2013.5.192-GCC-4.8.3 @@ -35,16 +35,16 @@ There are default compilers associated with any particular MPI implementation. T Examples: -```bash - $ module load gompi/2015b +```console +$ ml gompi/2015b ``` In this example, we activate the latest OpenMPI with latest GNU compilers (OpenMPI 1.8.6 and GCC 5.1). Please see more information about toolchains in section [Environment and Modules](../../environment-and-modules/) . To use OpenMPI with the intel compiler suite, use -```bash - $ module load iompi/2015.03 +```console +$ ml iompi/2015.03 ``` In this example, the openmpi 1.8.6 using intel compilers is activated. It's used "iompi" toolchain. @@ -53,17 +53,17 @@ In this example, the openmpi 1.8.6 using intel compilers is activated. It's used After setting up your MPI environment, compile your program using one of the mpi wrappers -```bash - $ mpicc -v - $ mpif77 -v - $ mpif90 -v +```console +$ mpicc -v +$ mpif77 -v +$ mpif90 -v ``` When using Intel MPI, use the following MPI wrappers: -```bash - $ mpicc - $ mpiifort +```console +$ mpicc +$ mpiifort ``` Wrappers mpif90, mpif77 that are provided by Intel MPI are designed for gcc and gfortran. You might be able to compile MPI code by them even with Intel compilers, but you might run into problems (for example, native MIC compilation with -mmic does not work with mpif90). @@ -100,8 +100,8 @@ Example program: Compile the above example with -```bash - $ mpicc helloworld_mpi.c -o helloworld_mpi.x +```console +$ mpicc helloworld_mpi.c -o helloworld_mpi.x ``` ## Running MPI Programs diff --git a/docs.it4i/salomon/software/mpi/mpi4py-mpi-for-python.md b/docs.it4i/salomon/software/mpi/mpi4py-mpi-for-python.md index 160478b6ed3c4dbfaf7226759fab0fd8fb9ddc67..8b2a12823aee3f9ce87e8b1be3c26a4dea8d5e4e 100644 --- a/docs.it4i/salomon/software/mpi/mpi4py-mpi-for-python.md +++ b/docs.it4i/salomon/software/mpi/mpi4py-mpi-for-python.md @@ -14,28 +14,28 @@ On Anselm MPI4Py is available in standard Python modules. MPI4Py is build for OpenMPI. Before you start with MPI4Py you need to load Python and OpenMPI modules. You can use toolchain, that loads Python and OpenMPI at once. -```bash - $ module load Python/2.7.9-foss-2015g +```console +$ ml Python/2.7.9-foss-2015g ``` ## Execution You need to import MPI to your python program. 
Include the following line to the python script: -```bash +```console from mpi4py import MPI ``` The MPI4Py enabled python programs [execute as any other OpenMPI](Running_OpenMPI/) code.The simpliest way is to run -```bash - $ mpiexec python <script>.py +```console +$ mpiexec python <script>.py ``` For example -```bash - $ mpiexec python hello_world.py +```console +$ mpiexec python hello_world.py ``` ## Examples @@ -83,12 +83,10 @@ For example Execute the above code as: -```bash - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I - - $ module load Python/2.7.9-foss-2015g - - $ mpiexec --map-by core --bind-to core python hello_world.py +```console +$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I +$ ml Python/2.7.9-foss-2015g + $ mpiexec --map-by core --bind-to core python hello_world.py ``` In this example, we run MPI4Py enabled code on 4 nodes, 24 cores per node (total of 96 processes), each python process is bound to a different core. More examples and documentation can be found on [MPI for Python webpage](https://pypi.python.org/pypi/mpi4py). diff --git a/docs.it4i/salomon/software/numerical-languages/introduction.md b/docs.it4i/salomon/software/numerical-languages/introduction.md index 50f083a91c52acc731fcbd0abe849904df757221..13ba67071a136612568b6772104f0c8c5430ba40 100644 --- a/docs.it4i/salomon/software/numerical-languages/introduction.md +++ b/docs.it4i/salomon/software/numerical-languages/introduction.md @@ -10,9 +10,9 @@ This section contains a collection of high-level interpreted languages, primaril MATLAB®^ is a high-level language and interactive environment for numerical computation, visualization, and programming. -```bash - $ module load MATLAB - $ matlab +```console +$ ml MATLAB +$ matlab ``` Read more at the [Matlab page](matlab/). @@ -21,9 +21,9 @@ Read more at the [Matlab page](matlab/). GNU Octave is a high-level interpreted language, primarily intended for numerical computations. The Octave language is quite similar to Matlab so that most programs are easily portable. -```bash - $ module load Octave - $ octave +```console +$ ml Octave +$ octave ``` Read more at the [Octave page](octave/). @@ -32,9 +32,9 @@ Read more at the [Octave page](octave/). The R is an interpreted language and environment for statistical computing and graphics. -```bash - $ module load R - $ R +```console +$ ml R +$ R ``` Read more at the [R page](r/). diff --git a/docs.it4i/salomon/software/numerical-languages/matlab.md b/docs.it4i/salomon/software/numerical-languages/matlab.md index aec28baaedbec6491cfe8ba14a7442368dbdec17..e08bf9099ee9d5175a8579afe2fc9d6d32b1aa8f 100644 --- a/docs.it4i/salomon/software/numerical-languages/matlab.md +++ b/docs.it4i/salomon/software/numerical-languages/matlab.md @@ -9,14 +9,14 @@ Matlab is available in versions R2015a and R2015b. There are always two variants To load the latest version of Matlab load the module -```bash - $ module load MATLAB +```console +$ ml MATLAB ``` By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use -```bash - $ module avail MATLAB +```console +$ module avail MATLAB ``` If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use Matlab on the compute nodes via PBS Pro scheduler. 
@@ -27,14 +27,14 @@ Matlab GUI is quite slow using the X forwarding built in the PBS (qsub -X), so u To run Matlab with GUI, use -```bash - $ matlab +```console +$ matlab ``` To run Matlab in text mode, without the Matlab Desktop GUI environment, use -```bash - $ matlab -nodesktop -nosplash +```console +$ matlab -nodesktop -nosplash ``` plots, images, etc... will be still available. @@ -49,7 +49,7 @@ Delete previously used file mpiLibConf.m, we have observed crashes when using In To use Distributed Computing, you first need to setup a parallel profile. We have provided the profile for you, you can either import it in MATLAB command line: -```bash +```console > parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonPBSPro.settings') ans = @@ -67,10 +67,9 @@ With the new mode, MATLAB itself launches the workers via PBS, so you can either Following example shows how to start interactive session with support for Matlab GUI. For more information about GUI based applications on Anselm see [this page](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/). -```bash - $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 +```console +$ xhost + +$ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 -l feature__matlab__MATLAB=1 ``` This qsub command example shows how to run Matlab on a single node. @@ -79,8 +78,8 @@ The second part of the command shows how to request all necessary licenses. In t Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: -```bash - r1i0n17$ module load MATLAB/2015a-EDU +```console + r1i0n17$ ml MATLAB/2015a-EDU r1i0n17$ matlab & ``` @@ -115,15 +114,15 @@ This script may be submitted directly to the PBS workload manager via the qsub c Submit the jobscript using qsub -```bash - $ qsub ./jobscript +```console +$ qsub ./jobscript ``` ### Parallel Matlab Local Mode Program Example The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. -```bash +```console cluster = parcluster('local') ``` @@ -134,7 +133,7 @@ This script creates scheduler object "cluster" of type "local" that starts worke The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers. -```bash +```console parpool(cluster,24); @@ -146,7 +145,7 @@ The last step is to start matlabpool with "cluster" object and correct number of The complete example showing how to use Distributed Computing Toolbox in local mode is shown here. -```bash +```console cluster = parcluster('local'); cluster @@ -179,7 +178,7 @@ This mode uses PBS scheduler to launch the parallel pool. 
It uses the SalomonPBS This is an example of m-script using PBS mode: -```bash +```console cluster = parcluster('SalomonPBSPro'); set(cluster, 'SubmitArguments', '-A OPEN-0-0'); set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=24'); @@ -220,7 +219,7 @@ For this method, you need to use SalomonDirect profile, import it using [the sam This is an example of m-script using direct mode: -```bash +```console parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonDirect.settings') cluster = parcluster('SalomonDirect'); set(cluster, 'NumWorkers', 48); diff --git a/docs.it4i/salomon/software/numerical-languages/octave.md b/docs.it4i/salomon/software/numerical-languages/octave.md index 6461bc4cc003b806d0f75320d58d5c9009ab5b8b..5c679dd1b87e587965d802f2845997b755254fa2 100644 --- a/docs.it4i/salomon/software/numerical-languages/octave.md +++ b/docs.it4i/salomon/software/numerical-languages/octave.md @@ -8,16 +8,16 @@ Two versions of octave are available on the cluster, via module | ---------- | ------------ | ------ | | **Stable** | Octave 3.8.2 | Octave | -```bash - $ module load Octave +```console +$ ml Octave ``` The octave on the cluster is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many octave kernels, notably the linear algebra subroutines. Octave runs these heavy calculation kernels without any penalty. By default, octave would parallelize to 24 threads. You may control the threads by setting the OMP_NUM_THREADS environment variable. To run octave interactively, log in with ssh -X parameter for X11 forwarding. Run octave: -```bash - $ octave +```console +$ octave ``` To run octave in batch mode, write an octave script, then write a bash jobscript and execute via the qsub command. By default, octave will use 16 threads when running MKL kernels. @@ -49,8 +49,8 @@ This script may be submitted directly to the PBS workload manager via the qsub c The octave c compiler mkoctfile calls the GNU gcc 4.8.1 for compiling native c code. This is very useful for running native c subroutines in octave environment. -```bash - $ mkoctfile -v +```console +$ mkoctfile -v ``` Octave may use MPI for interprocess communication This functionality is currently not supported on the cluster cluster. In case you require the octave interface to MPI, please contact our [cluster support](https://support.it4i.cz/rt/). diff --git a/docs.it4i/salomon/software/numerical-languages/r.md b/docs.it4i/salomon/software/numerical-languages/r.md index 6a01926e1b69bdd97d695d19b7a056419408acde..a3511b3795a499c27c7ba62529e4141776870409 100644 --- a/docs.it4i/salomon/software/numerical-languages/r.md +++ b/docs.it4i/salomon/software/numerical-languages/r.md @@ -21,8 +21,8 @@ The R version 3.1.1 is available on the cluster, along with GUI interface Rstudi | **R** | R 3.1.1 | R/3.1.1-intel-2015b | | **Rstudio** | Rstudio 0.98.1103 | Rstudio | -```bash - $ module load R +```console +$ ml R ``` ## Execution @@ -33,9 +33,9 @@ The R on Anselm is linked to highly optimized MKL mathematical library. This pro To run R interactively, using Rstudio GUI, log in with ssh -X parameter for X11 forwarding. 
Run rstudio: -```bash - $ module load Rstudio - $ rstudio +```console +$ ml Rstudio +$ rstudio ``` ### Batch Execution @@ -45,25 +45,25 @@ To run R in batch mode, write an R script, then write a bash jobscript and execu Example jobscript: ```bash - #!/bin/bash +#!/bin/bash - # change to local scratch directory - cd /lscratch/$PBS_JOBID || exit +# change to local scratch directory +cd /lscratch/$PBS_JOBID || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/rscript.R . +# copy input file to scratch +cp $PBS_O_WORKDIR/rscript.R . - # load R module - module load R +# load R module +module load R - # execute the calculation - R CMD BATCH rscript.R routput.out +# execute the calculation +R CMD BATCH rscript.R routput.out - # copy output file to home - cp routput.out $PBS_O_WORKDIR/. +# copy output file to home +cp routput.out $PBS_O_WORKDIR/. - #exit - exit +#exit +exit ``` This script may be submitted directly to the PBS workload manager via the qsub command. The inputs are in rscript.R file, outputs in routput.out file. See the single node jobscript example in the [Job execution section](../../job-submission-and-execution/). @@ -78,17 +78,17 @@ The package parallel provides support for parallel computation, including by for The package is activated this way: -```bash - $ R - > library(parallel) +```console +$ R +> library(parallel) ``` More information and examples may be obtained directly by reading the documentation available in R -```bash - > ?parallel - > library(help = "parallel") - > vignette("parallel") +```console +> ?parallel +> library(help = "parallel") +> vignette("parallel") ``` Download the package [parallell](package-parallel-vignette.pdf) vignette. @@ -103,41 +103,41 @@ The forking is the most simple to use. Forking family of functions provide paral Forking example: ```cpp - library(parallel) +library(parallel) - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) - } +#integrand function +f <- function(i,h) { +x <- h*(i-0.5) +return (4/(1 + x*x)) +} - #initialize - size <- detectCores() +#initialize +size <- detectCores() - while (TRUE) - { - #read number of intervals - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) +while (TRUE) +{ + #read number of intervals + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - if(n<=0) break + if(n<=0) break - #run the calculation - n <- max(n,size) - h <- 1.0/n + #run the calculation + n <- max(n,size) + h <- 1.0/n - i <- seq(1,n); - pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size))); + i <- seq(1,n); + pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size))); - #print results - cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } + #print results + cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) +} ``` The above example is the classic parallel example for calculating the number Ï€. Note the **detectCores()** and **mclapply()** functions. Execute the example as: -```bash - $ R --slave --no-save --no-restore -f pi3p.R +```console +$ R --slave --no-save --no-restore -f pi3p.R ``` Every evaluation of the integrad function runs in parallel on different process. @@ -152,9 +152,9 @@ Read more on Rmpi at <http://cran.r-project.org/web/packages/Rmpi/>, reference m When using package Rmpi, both openmpi and R modules must be loaded -```bash - $ module load OpenMPI - $ module load R +```console +$ ml OpenMPI +$ ml R ``` Rmpi may be used in three basic ways. 
The static approach is identical to executing any other MPI programm. In addition, there is Rslaves dynamic MPI approach and the mpi.apply approach. In the following section, we will use the number Ï€ integration example, to illustrate all these concepts. @@ -211,8 +211,8 @@ Static Rmpi example: The above is the static MPI example for calculating the number Ï€. Note the **library(Rmpi)** and **mpi.comm.dup()** function calls. Execute the example as: -```bash - $ mpirun R --slave --no-save --no-restore -f pi3.R +```console +$ mpirun R --slave --no-save --no-restore -f pi3.R ``` ### Dynamic Rmpi @@ -283,8 +283,8 @@ The above example is the dynamic MPI example for calculating the number Ï€. Both Execute the example as: -```bash - $ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R +```console +$ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R ``` Note that this method uses MPI_Comm_spawn (Dynamic process feature of MPI-2) to start the slave processes - the master process needs to be launched with MPI. In general, Dynamic processes are not well supported among MPI implementations, some issues might arise. Also, environment variables are not propagated to spawned processes, so they will not see paths from modules. @@ -300,59 +300,59 @@ Execution is identical to other dynamic Rmpi programs. mpi.apply Rmpi example: ```cpp - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) - } - - #the worker function - workerpi <- function(rank,size,n) - { - #run the calculation - n <- max(n,size) - h <- 1.0/n - - i <- seq(rank,n,size); - mypi <- h*sum(sapply(i,f,h)); - - return(mypi) - } - - #main - library(Rmpi) - - cat("Enter the number of slaves: ") - fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) - - mpi.spawn.Rslaves(nslaves=ns) - mpi.bcast.Robj2slave(f) - mpi.bcast.Robj2slave(workerpi) - - while (TRUE) - { - #read number of intervals - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - if(n<=0) break - - #run workerpi - i=seq(1,2*ns) - pi3=sum(mpi.parSapply(i,workerpi,2*ns,n)) - - #print results - cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } - - mpi.quit() +#integrand function +f <- function(i,h) { +x <- h*(i-0.5) +return (4/(1 + x*x)) +} + +#the worker function +workerpi <- function(rank,size,n) +{ + #run the calculation + n <- max(n,size) + h <- 1.0/n + + i <- seq(rank,n,size); + mypi <- h*sum(sapply(i,f,h)); + + return(mypi) +} + +#main +library(Rmpi) + +cat("Enter the number of slaves: ") +fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) + +mpi.spawn.Rslaves(nslaves=ns) +mpi.bcast.Robj2slave(f) +mpi.bcast.Robj2slave(workerpi) + +while (TRUE) +{ + #read number of intervals + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) + if(n<=0) break + + #run workerpi + i=seq(1,2*ns) + pi3=sum(mpi.parSapply(i,workerpi,2*ns,n)) + + #print results + cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) +} + +mpi.quit() ``` The above is the mpi.apply MPI example for calculating the number Ï€. Only the slave processes carry out the calculation. Note the **mpi.parSapply()**, function call. The package parallel [example](r/#package-parallel) [above](r/#package-parallel) may be trivially adapted (for much better performance) to this structure using the mclapply() in place of mpi.parSapply(). 
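One possible shape of that adaptation is sketched here: each MPI rank keeps its block of intervals, but evaluates it with mclapply() across the local cores. This is only an illustration under the assumptions of the examples above (f defined as before; workerpi_hybrid is our hypothetical name); the example executed below remains the unmodified mpi.parSapply version.

```r
# sketch: hybrid worker - MPI ranks split the intervals, local cores share each rank's block
workerpi_hybrid <- function(rank,size,n)
{
  library(parallel)

  #run the calculation
  n <- max(n,size)
  h <- 1.0/n

  #intervals assigned to this MPI rank
  i <- seq(rank,n,size)

  #evaluate this rank's intervals on all local cores
  mypi <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=detectCores())))

  return(mypi)
}
```

In the main program, such a function would be broadcast to the slaves with mpi.bcast.Robj2slave(workerpi_hybrid) and used in place of workerpi in the mpi.parSapply() call.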
Execute the example as: -```bash - $ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R +```console +$ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R ``` ## Combining Parallel and Rmpi @@ -366,30 +366,30 @@ The R parallel jobs are executed via the PBS queue system exactly as any other p Example jobscript for [static Rmpi](r/#static-rmpi) parallel R execution, running 1 process per core: ```bash - #!/bin/bash - #PBS -q qprod - #PBS -N Rjob - #PBS -l select=100:ncpus=24:mpiprocs=24:ompthreads=1 +#!/bin/bash +#PBS -q qprod +#PBS -N Rjob +#PBS -l select=100:ncpus=24:mpiprocs=24:ompthreads=1 - # change to scratch directory - SCRDIR=/scratch/work/user/$USER/myjob - cd $SCRDIR || exit +# change to scratch directory +SCRDIR=/scratch/work/user/$USER/myjob +cd $SCRDIR || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/rscript.R . +# copy input file to scratch +cp $PBS_O_WORKDIR/rscript.R . - # load R and openmpi module - module load R - module load OpenMPI +# load R and openmpi module +module load R +module load OpenMPI - # execute the calculation - mpiexec -bycore -bind-to-core R --slave --no-save --no-restore -f rscript.R +# execute the calculation +mpiexec -bycore -bind-to-core R --slave --no-save --no-restore -f rscript.R - # copy output file to home - cp routput.out $PBS_O_WORKDIR/. +# copy output file to home +cp routput.out $PBS_O_WORKDIR/. - #exit - exit +#exit +exit ``` For more information about jobscripts and MPI execution refer to the [Job submission](../../job-submission-and-execution/) and general [MPI](../mpi/mpi/) sections. @@ -398,8 +398,8 @@ For more information about jobscripts and MPI execution refer to the [Job submis By leveraging MKL, R can accelerate certain computations, most notably linear algebra operations on the Xeon Phi accelerator by using Automated Offload. To use MKL Automated Offload, you need to first set this environment variable before R execution: -```bash - $ export MKL_MIC_ENABLE=1 +```console +$ export MKL_MIC_ENABLE=1 ``` [Read more about automatic offload](../intel-xeon-phi/) diff --git a/docs.it4i/salomon/storage.md b/docs.it4i/salomon/storage.md index d83dbc119e5a9803b947a8d508a36aba0f265870..b0e401cde014a3decb5fc4c7199796735d923cf8 100644 --- a/docs.it4i/salomon/storage.md +++ b/docs.it4i/salomon/storage.md @@ -65,14 +65,14 @@ There is default stripe configuration for Salomon Lustre file systems. However, Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. -```bash +```console $ lfs getstripe dir | filename $ lfs setstripe -s stripe_size -c stripe_count -o stripe_offset dir | filename ``` Example: -```bash +```console $ lfs getstripe /scratch/work/user/username /scratch/work/user/username stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 @@ -87,7 +87,7 @@ In this example, we view current stripe setting of the /scratch/username/ direct Use lfs check OSTs to see the number and status of active OSTs for each file system on Salomon. 
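A related quick check is lfs df, which reports the capacity (and, with -i, the inode usage) of every OST and MDT behind a mount point; the path below is illustrative:

```console
$ lfs df -h /scratch
$ lfs df -i /scratch
```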
Learn more by reading the man page -```bash +```console $ lfs check osts $ man lfs ``` @@ -112,13 +112,13 @@ Read more on <http://wiki.lustre.org/manual/LustreManual20_HTML/ManagingStriping User quotas on the Lustre file systems (SCRATCH) can be checked and reviewed using following command: -```bash +```console $ lfs quota dir ``` Example for Lustre SCRATCH directory: -```bash +```console $ lfs quota /scratch Disk quotas for user user001 (uid 1234): Filesystem kbytes quota limit grace files quota limit grace @@ -132,14 +132,14 @@ In this example, we view current quota size limit of 100TB and 8KB currently use HOME directory is mounted via NFS, so a different command must be used to obtain quota information: -```bash - $ quota +```console +$ quota ``` Example output: -```bash - $ quota +```console +$ quota Disk quotas for user vop999 (uid 1025): Filesystem blocks quota limit grace files quota limit grace home-nfs-ib.salomon.it4i.cz:/home @@ -148,13 +148,13 @@ Example output: To have a better understanding of where the space is exactly used, you can use following command to find out. -```bash +```console $ du -hs dir ``` Example for your HOME directory: -```bash +```console $ cd /home $ du -hs * .[a-zA-z0-9]* | grep -E "[0-9]*G|[0-9]*M" | sort -hr 258M cuda-samples @@ -168,11 +168,11 @@ This will list all directories which are having MegaBytes or GigaBytes of consum To have a better understanding of previous commands, you can read manpages. -```bash +```console $ man lfs ``` -```bash +```console $ man du ``` @@ -182,7 +182,7 @@ Extended ACLs provide another security mechanism beside the standard POSIX ACLs ACLs on a Lustre file system work exactly like ACLs on any Linux file system. They are manipulated with the standard tools in the standard manner. Below, we create a directory and allow a specific user access. -```bash +```console [vop999@login1.salomon ~]$ umask 027 [vop999@login1.salomon ~]$ mkdir test [vop999@login1.salomon ~]$ ls -ld test @@ -356,40 +356,40 @@ The SSHFS provides a very convenient way to access the CESNET Storage. The stora First, create the mount point -```bash - $ mkdir cesnet +```console +$ mkdir cesnet ``` Mount the storage. Note that you can choose among the ssh.du1.cesnet.cz (Plzen), ssh.du2.cesnet.cz (Jihlava), ssh.du3.cesnet.cz (Brno) Mount tier1_home **(only 5120M !)**: -```bash - $ sshfs username@ssh.du1.cesnet.cz:. cesnet/ +```console +$ sshfs username@ssh.du1.cesnet.cz:. cesnet/ ``` For easy future access from Anselm, install your public key -```bash - $ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys +```console +$ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys ``` Mount tier1_cache_tape for the Storage VO: -```bash - $ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ +```console +$ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ ``` View the archive, copy the files and directories in and out -```bash - $ ls cesnet/ - $ cp -a mydir cesnet/. - $ cp cesnet/myfile . +```console +$ ls cesnet/ +$ cp -a mydir cesnet/. +$ cp cesnet/myfile . ``` Once done, please remember to unmount the storage -```bash - $ fusermount -u cesnet +```console +$ fusermount -u cesnet ``` ### Rsync Access @@ -405,16 +405,16 @@ More about Rsync at [here](https://du.cesnet.cz/en/navody/rsync/start#pro_bezne_ Transfer large files to/from CESNET storage, assuming membership in the Storage VO -```bash - $ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. 
- $ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . +```console +$ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. +$ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . ``` Transfer large directories to/from CESNET storage, assuming membership in the Storage VO -```bash - $ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . +```console +$ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. +$ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . ``` Transfer rates of about 28 MB/s can be expected. diff --git a/docs.it4i/software/bioinformatics.md b/docs.it4i/software/bioinformatics.md index 76991fe7810ea45fdf7a77ed1cd03adf20a79152..91de9ca9cce57d66c005ee919a0444a852660fac 100644 --- a/docs.it4i/software/bioinformatics.md +++ b/docs.it4i/software/bioinformatics.md @@ -6,7 +6,7 @@ In addition to the many applications available through modules (deployed through ## Starting the Environment -```bash +```console mmokrejs@login2~$ /apps/gentoo/startprefix ``` @@ -14,7 +14,7 @@ mmokrejs@login2~$ /apps/gentoo/startprefix Create a template file which can be used and an argument to qsub command. Notably, the 'PBS -S' line specifies full PATH to the Bourne shell of the Gentoo Linux environment. -```bash +```console mmokrejs@login2~$ cat myjob.pbs #PBS -S /apps/gentoo/bin/sh #PBS -l nodes=1:ppn=16,walltime=12:00:00 @@ -37,14 +37,14 @@ $ qstat ## Reading Manual Pages for Installed Applications -```bash +```console mmokrejs@login2~$ man -M /apps/gentoo/usr/share/man bwa mmokrejs@login2~$ man -M /apps/gentoo/usr/share/man samtools ``` ## Listing of Bioinformatics Applications -```bash +```console mmokrejs@login2~$ grep biology /scratch/mmokrejs/gentoo_rap/installed.txt sci-biology/ANGLE-bin-20080813-r1 sci-biology/AlignGraph-9999 @@ -172,7 +172,7 @@ sci-biology/velvetk-20120606 sci-biology/zmsort-110625 ``` -```bash +```console mmokrejs@login2~$ grep sci-libs /scratch/mmokrejs/gentoo_rap/installed.txt sci-libs/amd-2.3.1 sci-libs/blas-reference-20151113-r1 @@ -228,7 +228,7 @@ sci-libs/umfpack-5.6.2 Gentoo Linux is a allows compilation of its applications from source code while using compiler and optimize flags set to user's wish. This facilitates creation of optimized binaries for the host platform. Users maybe also use several versions of gcc, python and other tools. -```bash +```console mmokrejs@login2~$ gcc-config -l mmokrejs@login2~$ java-config -L mmokrejs@login2~$ eselect diff --git a/docs.it4i/software/lmod.md b/docs.it4i/software/lmod.md index 3ddd5cc1d1951de11047ea7cdfca91198d11aa19..8bb5e5498b01af8ec656cb5022f66111391a6197 100644 --- a/docs.it4i/software/lmod.md +++ b/docs.it4i/software/lmod.md @@ -18,7 +18,7 @@ Detailed documentation on Lmod is available at [here](http://lmod.readthedocs.io Create folder or file `.lmod` into your home folder. Logout and login. New Lmod enviroment will be active now. -```bash +```console $ mkdir ~/.lmod $ logout Connection to login4.salomon.it4i.cz closed. @@ -65,7 +65,7 @@ Below you will find more details and examples. To get an overview of the currently loaded modules, use module list or ml (without specifying extra arguments). 
-```bash +```console $ ml Currently Loaded Modules: 1) EasyBuild/3.0.0 (S) 2) lmod/7.2.2 @@ -80,7 +80,7 @@ Currently Loaded Modules: To get an overview of all available modules, you can use ml avail or simply ml av: -```bash +```console $ ml av ---------------------------------------- /apps/modules/compiler ---------------------------------------------- GCC/5.2.0 GCCcore/6.2.0 (D) icc/2013.5.192 ifort/2013.5.192 LLVM/3.9.0-intel-2017.00 (D) @@ -104,7 +104,7 @@ In the current module naming scheme, each module name consists of two parts: If you just provide a software name, for example gcc, it prints on overview of all available modules for GCC. -```bash +```console $ ml spider gcc --------------------------------------------------------------------------------- GCC: @@ -147,7 +147,7 @@ $ ml spider gcc If you use spider on a full module name like GCC/6.2.0-2.27 it will tell on which cluster(s) that module available: -```bash +```console $ module spider GCC/6.2.0-2.27 -------------------------------------------------------------------------------------------------------------- GCC: GCC/6.2.0-2.27 @@ -169,7 +169,7 @@ This tells you what the module contains and a URL to the homepage of the softwar To check which modules are available for a particular software package, you can provide the software name to ml av. For example, to check which versions of git are available: -```bash +```console $ ml av git -------------------------------------- /apps/modules/tools ---------------------------------------- @@ -187,7 +187,7 @@ Use "module keyword key1 key2 ..." to search for all possible modules matching a Lmod does a partial match on the module name, so sometimes you need to use / to indicate the end of the software name you are interested in: -```bash +```console $ ml av GCC/ ------------------------------------------ /apps/modules/compiler ------------------------------------------- @@ -204,7 +204,7 @@ Use "module keyword key1 key2 ..." to search for all possible modules matching a To see how a module would change the environment, use ml show: -```bash +```console $ ml show Python/3.5.2 help([[Python is a programming language that lets you work more quickly and integrate your systems more effectively. - Homepage: http://python.org/]]) @@ -240,7 +240,7 @@ If you're not sure what all of this means: don't worry, you don't have to know, The effectively apply the changes to the environment that are specified by a module, use ml and specify the name of the module. For example, to set up your environment to use intel: -```bash +```console $ ml intel/2017.00 $ ml Currently Loaded Modules: @@ -275,7 +275,7 @@ In addition, only **one single version** of each software package can be loaded To revert the changes to the environment that were made by a particular module, you can use ml -<modname>. 
For example: -```bash +```console $ ml Currently Loaded Modules: 1) EasyBuild/3.0.0 (S) 2) lmod/7.2.2 @@ -299,7 +299,7 @@ $ which gcc To reset your environment back to a clean state, you can use ml purge or ml purge --force: -```bash +```console $ ml Currently Loaded Modules: 1) EasyBuild/3.0.0 (S) 2) lmod/7.2.2 3) GCCcore/6.2.0 4) binutils/2.27-GCCcore-6.2.0 (H) @@ -323,25 +323,25 @@ If you have a set of modules that you need to load often, you can save these in First, load all the modules you need, for example: -```bash -ml intel/2017.00 Python/3.5.2-intel-2017.00 +```console +$ ml intel/2017.00 Python/3.5.2-intel-2017.00 ``` Now store them in a collection using ml save: -```bash +```console $ ml save my-collection ``` Later, for example in a job script, you can reload all these modules with ml restore: -```bash +```console $ ml restore my-collection ``` With ml savelist can you gets a list of all saved collections: -```bash +```console $ ml savelist Named collection list: 1) my-collection diff --git a/docs.it4i/software/orca.md b/docs.it4i/software/orca.md index 8fcfd69bfb44f9f978b18d8b8ac4e82a71653f36..3f62415459eceea55e4268d3bd2ca301748e0ce2 100644 --- a/docs.it4i/software/orca.md +++ b/docs.it4i/software/orca.md @@ -6,13 +6,13 @@ ORCA is a flexible, efficient and easy-to-use general purpose tool for quantum c The following module command makes the latest version of orca available to your session -```bash +```console $ module load ORCA/3_0_3-linux_x86-64 ``` ### Dependency -```bash +```console $ module list Currently Loaded Modulefiles: 1) /opt/modules/modulefiles/oscar-modules/1.0.3(default) @@ -46,7 +46,7 @@ Create a file called orca_serial.inp that contains the following orca commands Create a Sun Grid Engine submission file called submit_serial.sh that looks like this -```bash +```console !/bin/bash module load ORCA/3_0_3-linux_x86-64 @@ -55,7 +55,7 @@ orca orca_serial.inp Submit the job to the queue with the command -```bash +```console $ qsub -q qexp -I -l select=1 qsub: waiting for job 196821.isrv5 to start qsub: job 196821.isrv5 ready diff --git a/docs.it4i/software/singularity.md b/docs.it4i/software/singularity.md new file mode 100644 index 0000000000000000000000000000000000000000..39618e32c735f1ef1dd02447014015518f51e342 --- /dev/null +++ b/docs.it4i/software/singularity.md @@ -0,0 +1,128 @@ +[Singularity](http://singularity.lbl.gov/) enables users to have full control of their environment. A non-privileged user can "swap out" the operating system on the host for one they control. So if the host system is running RHEL6 but your application runs in Ubuntu/RHEL7, you can create an Ubuntu/RHEL7 image, install your applications into that image, copy the image to another host, and run your application on that host in it’s native Ubuntu/RHEL7 environment. + +Singularity also allows you to leverage the resources of whatever host you are on. This includes HPC interconnects, resource managers, file systems, GPUs and/or accelerators, etc. Singularity does this by enabling several key facets: + +* Encapsulation of the environment +* Containers are image based +* No user contextual changes or root escalation allowed +* No root owned daemon processes + +## Using Docker Images + +Singularity can import, bootstrap, and even run Docker images directly from [Docker Hub](https://hub.docker.com/). 
You can easily run RHEL7 like this: + +```console +[hrb33@r33u01n865 ~]$ cat /etc/redhat-release +CentOS release 6.7 (Final) +[hrb33@r33u01n865 ~]$ ml Singularity +[hrb33@r33u01n865 ~]$ singularity shell docker://centos:latest +library/centos:latest +Downloading layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 +Downloading layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 +Downloading layer: sha256:45a2e645736c4c66ef34acce2407ded21f7a9b231199d3b92d6c9776df264729 +Downloading layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 +Singularity: Invoking an interactive shell within container... + +Singularity.centos:latest> cat /etc/redhat-release +CentOS Linux release 7.3.1611 (Core) +``` + +## Creating Own Image from Docker Image + +```console +hrb33@hrb33-toshiba:/$ cd /tmp/ +hrb33@hrb33-toshiba:/tmp$ sudo singularity create /tmp/c7.img +[sudo] password for hrb33: +Creating a new image with a maximum size of 768MiB... +Executing image create helper +Formatting image with ext3 file system +Done. +hrb33@hrb33-toshiba:/tmp$ sudo singularity import c7.img docker://centos:latest +library/centos:latest +Downloading layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 +Downloading layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 +Downloading layer: sha256:45a2e645736c4c66ef34acce2407ded21f7a9b231199d3b92d6c9776df264729 +Downloading layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 +Adding Docker CMD as Singularity runscript... +Bootstrap initialization +No bootstrap definition passed, updating container +Executing Prebootstrap module +Executing Postbootstrap module +Done. +hrb33@hrb33-toshiba:/tmp$ sudo singularity shell --writable c7.img +Singularity: Invoking an interactive shell within container... + +Singularity.c7.img> mkdir /apps /scratch +Singularity.c7.img> exit +hrb33@hrb33-toshiba:/tmp$ rsync -av c7.img hrb33@login4.salomon:/home/hrb33/c7.img +sending incremental file list +c7.img + +sent 805,503,090 bytes received 34 bytes 9,205,749.99 bytes/sec +total size is 805,306,399 speedup is 1.00 + +``` + +Accessing /HOME and /SCRATCH Within Container + +```console +hrb33@hrb33-toshiba:/tmp$ ssh hrb33@login4.salomon + + _____ _ + / ____| | | + | (___ __ _| | ___ _ __ ___ ___ _ __ + \___ \ / _` | |/ _ \| '_ ` _ \ / _ \| '_ \ + ____) | (_| | | (_) | | | | | | (_) | | | | + |_____/ \__,_|_|\___/|_| |_| |_|\___/|_| |_| + + http://www.it4i.cz/?lang=en + + +Last login: Fri Feb 10 14:38:36 2017 from 10.0.131.12 +[hrb33@login4.salomon ~]$ ml Singularity +[hrb33@login4.salomon ~]$ singularity shell --bind /scratch --bind /apps --writable c7.img +Singularity: Invoking an interactive shell within container... 
+ +Singularity.c7.img> ls /apps/ -l +total 68 +drwx------ 4 root root 29 Sep 29 10:28 SCS +drwxrwxr-x 301 2757 2796 8192 Feb 16 10:58 all +drwxrwxr-x 3 2757 2796 19 Jul 9 2015 base +drwxrwxr-x 16 2757 2796 4096 Nov 24 21:47 bio +drwxrwxr-x 10 2757 2796 116 Apr 8 2016 cae +drwxrwxr-x 18 2757 2796 4096 Jan 17 09:49 chem +drwxrwxr-x 11 2757 2796 122 Dec 7 09:25 compiler +drwxrwxr-x 7 2757 2796 73 Jun 29 2016 data +drwxr-xr-x 7 2757 2796 88 Jan 8 2016 debugger +drwxrwxr-x 38 2757 2796 4096 Feb 16 13:37 devel +drwxrwxr-x 9 2757 2796 130 Jan 9 08:40 easybuild +drwxr-xr-x 11 3900 4011 4096 Feb 15 09:50 gentoo +drwxr-xr-x 10 3900 4011 4096 Feb 10 17:01 gentoo_uv +drwxrwxr-x 5 2757 2796 39 Jan 18 2016 geo +drwxr-xr-x 18 2757 2796 4096 Sep 6 16:03 intel2017 +drwxrwxr-x 20 2757 2796 4096 Nov 28 08:50 lang +drwxrwxr-x 31 2757 2796 4096 Dec 7 07:48 lib +drwxrwxr-x 4 2757 2796 32 Nov 9 09:19 licenses +drwxrwxr-x 17 2757 2796 4096 Nov 15 09:24 math +drwxr-xr-x 22 2757 2796 4096 Jan 19 13:15 modules +drwxrwxr-x 8 2757 2796 82 Apr 18 2016 mpi +drwxrwxr-x 13 2757 2796 4096 Oct 24 09:08 numlib +drwxrwxr-x 10 2757 2796 108 Feb 3 11:01 perf +drwxrwxr-x 5 2757 2796 41 Jan 17 09:49 phys +drwxrwxr-x 2 2757 2796 6 Feb 3 11:01 prace +drwxr-xr-x 4 root root 36 Jun 18 2015 sw +drwxrwxr-x 5 2757 2796 49 Feb 15 2016 system +drwxr-xr-x 3 root root 19 Dec 4 2015 test +drwxrwxr-x 13 2757 2796 138 May 31 2016 toolchain +drwxrwxr-x 39 2757 2796 4096 Feb 3 11:27 tools +drwxr-xr-x 4 root root 31 Aug 11 2015 user +drwxrwxr-x 21 2757 2796 4096 Jan 5 18:56 uv +drwxrwxr-x 40 2757 2796 4096 Feb 3 11:01 vis +Singularity.c7.img> ls /scratch/ -l +total 32 +drwx------ 3 root root 4096 Aug 15 2016 backup +drwxr-x--- 2 root root 4096 Dec 5 10:34 sys +drwxrwxrwt 154 root root 20480 Feb 14 14:03 temp +drwxr-xr-x 4 root root 4096 Jan 25 10:48 work +Singularity.c7.img> +``` diff --git a/mkdocs.yml b/mkdocs.yml index 0d147055f0d5ff9ede7e7b8684393927d02a7514..6bdf66c26619f7af572e14d1d543129e1fd1da32 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -41,7 +41,7 @@ pages: - Compute Nodes: salomon/compute-nodes.md - Network: - InfiniBand Network: salomon/network.md - - IB Single-plane Topology: salomon/ib-single-plane-topology.md + - IB Single-Plane Topology: salomon/ib-single-plane-topology.md - 7D Enhanced Hypercube: salomon/7d-enhanced-hypercube.md - Storage: salomon/storage.md - PRACE User Support: salomon/prace.md @@ -63,6 +63,7 @@ pages: - 'Software': - Lmod Environment: software/lmod.md - Modules Matrix: modules-matrix.md + - Singularity Container: software/singularity.md - Salomon Software: - Available Modules: modules-salomon.md - Available Modules on UV: modules-salomon-uv.md