diff --git a/docs.it4i/anselm-cluster-documentation/capacity-computing.md b/docs.it4i/anselm-cluster-documentation/capacity-computing.md index cab8b9c19c39ed2b1828768b484769dfc30a0255..59d58906020f9d73c15c259916c421a5e47b048b 100644 --- a/docs.it4i/anselm-cluster-documentation/capacity-computing.md +++ b/docs.it4i/anselm-cluster-documentation/capacity-computing.md @@ -6,12 +6,12 @@ In many cases, it is useful to submit huge (>100+) number of computational jobs However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array** -!!! Note "Note" - Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. +!!! note + Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. -- Use [Job arrays](capacity-computing/#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs -- Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs -- Combine [GNU parallel with Job arrays](capacity-computing/#job-arrays-and-gnu-parallel) when running huge number of single core jobs +* Use [Job arrays](capacity-computing/#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs +* Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs +* Combine [GNU parallel with Job arrays](capacity-computing/#job-arrays-and-gnu-parallel) when running huge number of single core jobs ## Policy @@ -20,14 +20,14 @@ However, executing huge number of jobs via the PBS queue may strain the system. ## Job Arrays -!!! Note "Note" - Huge number of jobs may be easily submitted and managed as a job array. +!!! note + Huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: -- each subjob has a unique index, $PBS_ARRAY_INDEX -- job Identifiers of subjobs only differ by their indices -- the state of subjobs can differ (R,Q,...etc.) +* each subjob has a unique index, $PBS_ARRAY_INDEX +* job Identifiers of subjobs only differ by their indices +* the state of subjobs can differ (R,Q,...etc.) All subjobs within a job array have the same scheduling priority and schedule as independent jobs. Entire job array is submitted through a single qsub command and may be managed by qdel, qalter, qhold, qrls and qsig commands as a single job. @@ -149,8 +149,8 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/) ## GNU Parallel -!!! Note "Note" - Use GNU parallel to run many single core tasks on one node. +!!! note + Use GNU parallel to run many single core tasks on one node. GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. 
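As a minimal illustration of the idea (the module name, the tasklist file and myprog.x are placeholders, not prescribed by this page), GNU parallel can keep all 16 cores of a node busy with single core tasks read from a task list:

```bash
# Illustrative sketch only: run one single core task per line of tasklist,
# keeping 16 tasks running at a time on one Anselm node.
$ module load parallel
$ parallel --jobs 16 ./myprog.x {} :::: tasklist
```

Each line of tasklist is handed to one instance of myprog.x; as soon as one task finishes, the next line is started, until the list is exhausted.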
@@ -216,17 +216,18 @@ $ qsub -N JOBNAME jobscript
In this example, we submit a job of 101 tasks. 16 input files will be processed in parallel. The 101 tasks on 16 cores are assumed to complete in less than 2 hours.
-Please note the #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue.
+!!! hint
+    Use #PBS directives at the beginning of the jobscript file; don't forget to set your valid PROJECT_ID and desired queue.
## Job Arrays and GNU Parallel
-!!! Note "Note"
-    Combine the Job arrays and GNU parallel for best throughput of single core jobs
+!!! note
+    Combine the Job arrays and GNU parallel for best throughput of single core jobs
While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs.
-!!! Note "Note"
-    Every subjob in an array runs GNU parallel to utilize all cores on the node
+!!! note
+    Every subjob in an array runs GNU parallel to utilize all cores on the node
### GNU Parallel, Shared jobscript
@@ -280,8 +281,8 @@ cp output $PBS_O_WORKDIR/$TASK.out
In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once an task is finished, new task starts, until the number of tasks in numtasks file is reached.
-!!! Note "Note"
-    Select subjob walltime and number of tasks per subjob carefully
+!!! note
+    Select subjob walltime and number of tasks per subjob carefully
When deciding this values, think about following guiding rules:
@@ -300,7 +301,8 @@ $ qsub -N JOBNAME -J 1-992:32 jobscript
In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**, this must be the same as the number sent to numtasks file. Each subjob will run on full node and process 16 input files in parallel, 32 in total per subjob. Every subjob is assumed to complete in less than 2 hours.
-Please note the #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue.
+!!! hint
+    Use #PBS directives at the beginning of the jobscript file; don't forget to set your valid PROJECT_ID and desired queue.
## Examples diff --git a/docs.it4i/anselm-cluster-documentation/compute-nodes.md b/docs.it4i/anselm-cluster-documentation/compute-nodes.md index 6cd3c18f75ce9886bf315073d0ec6adeba51768c..440201ed024b87d101ea4a28ffd9ad393df83e37 100644 --- a/docs.it4i/anselm-cluster-documentation/compute-nodes.md +++ b/docs.it4i/anselm-cluster-documentation/compute-nodes.md @@ -6,46 +6,46 @@ Anselm is cluster of x86-64 Intel based nodes built on Bull Extreme Computing bu ### Compute Nodes Without Accelerator -- 180 nodes -- 2880 cores in total -- two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node -- 64 GB of physical memory per node -- one 500GB SATA 2,5” 7,2 krpm HDD per node -- bullx B510 blade servers -- cn[1-180] +* 180 nodes +* 2880 cores in total +* two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node +* 64 GB of physical memory per node +* one 500GB SATA 2,5” 7,2 krpm HDD per node +* bullx B510 blade servers +* cn[1-180] ### Compute Nodes With GPU Accelerator -- 23 nodes -- 368 cores in total -- two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node -- 96 GB of physical memory per node -- one 500GB SATA 2,5” 7,2 krpm HDD per node -- GPU accelerator 1x NVIDIA Tesla Kepler K20 per node -- bullx B515 blade servers -- cn[181-203] +* 23 nodes +* 368 cores in total +* two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node +* 96 GB of physical memory per node +* one 500GB SATA 2,5” 7,2 krpm HDD per node +* GPU accelerator 1x NVIDIA Tesla Kepler K20 per node +* bullx B515 blade servers +* cn[181-203] ### Compute Nodes With MIC Accelerator -- 4 nodes -- 64 cores in total -- two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node -- 96 GB of physical memory per node -- one 500GB SATA 2,5” 7,2 krpm HDD per node -- MIC accelerator 1x Intel Phi 5110P per node -- bullx B515 blade servers -- cn[204-207] +* 4 nodes +* 64 cores in total +* two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node +* 96 GB of physical memory per node +* one 500GB SATA 2,5” 7,2 krpm HDD per node +* MIC accelerator 1x Intel Phi 5110P per node +* bullx B515 blade servers +* cn[204-207] ### Fat Compute Nodes -- 2 nodes -- 32 cores in total -- 2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node -- 512 GB of physical memory per node -- two 300GB SAS 3,5”15krpm HDD (RAID1) per node -- two 100GB SLC SSD per node -- bullx R423-E3 servers -- cn[208-209] +* 2 nodes +* 32 cores in total +* 2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node +* 512 GB of physical memory per node +* two 300GB SAS 3,5”15krpm HDD (RAID1) per node +* two 100GB SLC SSD per node +* bullx R423-E3 servers +* cn[208-209]  **Figure Anselm bullx B510 servers** @@ -65,23 +65,23 @@ Anselm is equipped with Intel Sandy Bridge processors Intel Xeon E5-2665 (nodes ### Intel Sandy Bridge E5-2665 Processor -- eight-core -- speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology -- peak performance: 19.2 GFLOP/s per core -- caches: - - L2: 256 KB per core - - L3: 20 MB per processor -- memory bandwidth at the level of the processor: 51.2 GB/s +* eight-core +* speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology +* peak performance: 19.2 GFLOP/s per core +* caches: + * L2: 256 KB per core + * L3: 20 MB per processor +* memory bandwidth at the level of the processor: 51.2 GB/s ### Intel Sandy Bridge E5-2470 Processor -- eight-core -- speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology -- peak performance: 18.4 GFLOP/s per core -- caches: - - L2: 256 KB per core - - L3: 20 
MB per processor -- memory bandwidth at the level of the processor: 38.4 GB/s +* eight-core +* speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology +* peak performance: 18.4 GFLOP/s per core +* caches: + * L2: 256 KB per core + * L3: 20 MB per processor +* memory bandwidth at the level of the processor: 38.4 GB/s Nodes equipped with Intel Xeon E5-2665 CPU have set PBS resource attribute cpu_freq = 24, nodes equipped with Intel Xeon E5-2470 CPU have set PBS resource attribute cpu_freq = 23. @@ -101,30 +101,30 @@ Intel Turbo Boost Technology is used by default, you can disable it for all nod ### Compute Node Without Accelerator -- 2 sockets -- Memory Controllers are integrated into processors. - - 8 DDR3 DIMMs per node - - 4 DDR3 DIMMs per CPU - - 1 DDR3 DIMMs per channel - - Data rate support: up to 1600MT/s -- Populated memory: 8 x 8 GB DDR3 DIMM 1600 MHz +* 2 sockets +* Memory Controllers are integrated into processors. + * 8 DDR3 DIMMs per node + * 4 DDR3 DIMMs per CPU + * 1 DDR3 DIMMs per channel + * Data rate support: up to 1600MT/s +* Populated memory: 8 x 8 GB DDR3 DIMM 1600 MHz ### Compute Node With GPU or MIC Accelerator -- 2 sockets -- Memory Controllers are integrated into processors. - - 6 DDR3 DIMMs per node - - 3 DDR3 DIMMs per CPU - - 1 DDR3 DIMMs per channel - - Data rate support: up to 1600MT/s -- Populated memory: 6 x 16 GB DDR3 DIMM 1600 MHz +* 2 sockets +* Memory Controllers are integrated into processors. + * 6 DDR3 DIMMs per node + * 3 DDR3 DIMMs per CPU + * 1 DDR3 DIMMs per channel + * Data rate support: up to 1600MT/s +* Populated memory: 6 x 16 GB DDR3 DIMM 1600 MHz ### Fat Compute Node -- 2 sockets -- Memory Controllers are integrated into processors. - - 16 DDR3 DIMMs per node - - 8 DDR3 DIMMs per CPU - - 2 DDR3 DIMMs per channel - - Data rate support: up to 1600MT/s -- Populated memory: 16 x 32 GB DDR3 DIMM 1600 MHz +* 2 sockets +* Memory Controllers are integrated into processors. + * 16 DDR3 DIMMs per node + * 8 DDR3 DIMMs per CPU + * 2 DDR3 DIMMs per channel + * Data rate support: up to 1600MT/s +* Populated memory: 16 x 32 GB DDR3 DIMM 1600 MHz diff --git a/docs.it4i/anselm-cluster-documentation/environment-and-modules.md b/docs.it4i/anselm-cluster-documentation/environment-and-modules.md index 7d3283538a7cf40865734fb98548682f09fd8876..1439c6733c35f3440df643da8e83e1c6308726c7 100644 --- a/docs.it4i/anselm-cluster-documentation/environment-and-modules.md +++ b/docs.it4i/anselm-cluster-documentation/environment-and-modules.md @@ -23,15 +23,15 @@ then fi ``` -!!! Note "Note" - Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Conside utilization of SSH session interactivity for such commands as stated in the previous example. +!!! note + Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Conside utilization of SSH session interactivity for such commands as stated in the previous example. ### Application Modules In order to configure your shell for running particular application on Anselm we use Module package interface. -!!! Note "Note" - The modules set up the application paths, library paths and environment variables for running particular application. +!!! note + The modules set up the application paths, library paths and environment variables for running particular application. 
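As a short, illustrative reminder of the basic workflow (the intel module is only an example name, not prescribed here), the typical sequence is to list, load and verify modules:

```bash
# Illustrative sketch of the usual module workflow on a login or compute node.
$ module avail            # list modules available on the cluster
$ module load intel       # set up paths and variables for the chosen application
$ module list             # verify which modules are currently loaded
```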
We have also second modules repository. This modules repository is created using tool called EasyBuild. On Salomon cluster, all modules will be build by this tool. If you want to use software from this modules repository, please follow instructions in section [Application Modules Path Expansion](environment-and-modules/#EasyBuild). diff --git a/docs.it4i/anselm-cluster-documentation/hardware-overview.md b/docs.it4i/anselm-cluster-documentation/hardware-overview.md index 84e05272bd3765b79b12fb66abf793992b148819..b477688da52a57619221a9bcc22397e0e7769191 100644 --- a/docs.it4i/anselm-cluster-documentation/hardware-overview.md +++ b/docs.it4i/anselm-cluster-documentation/hardware-overview.md @@ -12,10 +12,10 @@ The cluster compute nodes cn[1-207] are organized within 13 chassis. There are four types of compute nodes: -- 180 compute nodes without the accelerator -- 23 compute nodes with GPU accelerator - equipped with NVIDIA Tesla Kepler K20 -- 4 compute nodes with MIC accelerator - equipped with Intel Xeon Phi 5110P -- 2 fat nodes - equipped with 512 GB RAM and two 100 GB SSD drives +* 180 compute nodes without the accelerator +* 23 compute nodes with GPU accelerator - equipped with NVIDIA Tesla Kepler K20 +* 4 compute nodes with MIC accelerator - equipped with Intel Xeon Phi 5110P +* 2 fat nodes - equipped with 512 GB RAM and two 100 GB SSD drives [More about Compute nodes](compute-nodes/). diff --git a/docs.it4i/anselm-cluster-documentation/job-priority.md b/docs.it4i/anselm-cluster-documentation/job-priority.md index fbd57f0f79a196b0f979378ea932b43bf99ea2a4..2eebe9d54719878015d18604abc33d7be16db422 100644 --- a/docs.it4i/anselm-cluster-documentation/job-priority.md +++ b/docs.it4i/anselm-cluster-documentation/job-priority.md @@ -35,8 +35,8 @@ usage<sub>Total</sub> is total usage by all users, by all projects. Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut in half periodically, at the interval 168 hours (one week). Jobs queued in queue qexp are not calculated to project's usage. -!!! Note "Note" - Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>. +!!! note + Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>. Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job. @@ -64,7 +64,7 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu It means, that jobs with lower execution priority can be run before jobs with higher execution priority. -!!! Note "Note" - It is **very beneficial to specify the walltime** when submitting jobs. +!!! note + It is **very beneficial to specify the walltime** when submitting jobs. Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority. diff --git a/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md b/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md index 8ab52e9bc20154637d5ac62c9d0169949279baa5..2f76f9280b0751802d69d11587aee54f41f611b7 100644 --- a/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md +++ b/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md @@ -11,7 +11,7 @@ When allocating computational resources for the job, please specify 5. Project ID 6. Jobscript or interactive switch -!!! Note "Note" +!!! 
note Use the **qsub** command to submit your job to a queue for allocation of the computational resources. Submit the job using the qsub command: @@ -132,7 +132,7 @@ Although this example is somewhat artificial, it demonstrates the flexibility of ## Job Management -!!! Note "Note" +!!! note Check status of your jobs using the **qstat** and **check-pbs-jobs** commands ```bash @@ -213,7 +213,7 @@ Run loop 3 In this example, we see actual output (some iteration loops) of the job 35141.dm2 -!!! Note "Note" +!!! note Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel**, **qsig** or **qalter** commands You may release your allocation at any time, using qdel command @@ -238,12 +238,12 @@ $ man pbs_professional ### Jobscript -!!! Note "Note" +!!! note Prepare the jobscript to run batch jobs in the PBS queue system The Jobscript is a user made script, controlling sequence of commands for executing the calculation. It is often written in bash, other scripts may be used as well. The jobscript is supplied to PBS **qsub** command as an argument and executed by the PBS Professional workload manager. -!!! Note "Note" +!!! note The jobscript or interactive shell is executed on first of the allocated nodes. ```bash @@ -273,7 +273,7 @@ $ pwd In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory. -!!! Note "Note" +!!! note All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. The allocated nodes are accessible via ssh from login nodes. The nodes may access each other via ssh as well. @@ -305,7 +305,7 @@ In this example, the hostname program is executed via pdsh from the interactive ### Example Jobscript for MPI Calculation -!!! Note "Note" +!!! note Production jobs must use the /scratch directory for I/O The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. @@ -337,12 +337,12 @@ exit In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes. -!!! Note "Note" +!!! note Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. -!!! Note "Note" +!!! note Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. Example jobscript for an MPI job with preloaded inputs and executables, options for qsub are stored within the script : @@ -375,7 +375,7 @@ sections. ### Example Jobscript for Single Node Calculation -!!! Note "Note" +!!! note Local scratch directory is often useful for single node jobs. 
Local scratch will be deleted immediately after the job ends. Example jobscript for single node calculation, using [local scratch](storage/) on the node: diff --git a/docs.it4i/anselm-cluster-documentation/network.md b/docs.it4i/anselm-cluster-documentation/network.md index 307931a5d350f697be4679358a7e1c74e57a7cb5..a682f44ff119881d0f0e1ad1695afdd8046b5ec0 100644 --- a/docs.it4i/anselm-cluster-documentation/network.md +++ b/docs.it4i/anselm-cluster-documentation/network.md @@ -8,8 +8,8 @@ All compute and login nodes of Anselm are interconnected by a high-bandwidth, lo The compute nodes may be accessed via the InfiniBand network using ib0 network interface, in address range 10.2.1.1-209. The MPI may be used to establish native InfiniBand connection among the nodes. -!!! Note "Note" - The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via native InfiniBand protocol. +!!! note + The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via native InfiniBand protocol. The Fat tree topology ensures that peak transfer rates are achieved between any two nodes, independent of network traffic exchanged among other nodes concurrently. diff --git a/docs.it4i/anselm-cluster-documentation/prace.md b/docs.it4i/anselm-cluster-documentation/prace.md index 4a7417fde25a63baac0aaaf17adc40d0d650fa9a..1754d8e28202c6c553e9607846f4ab664a600bbd 100644 --- a/docs.it4i/anselm-cluster-documentation/prace.md +++ b/docs.it4i/anselm-cluster-documentation/prace.md @@ -28,11 +28,11 @@ The user will need a valid certificate and to be present in the PRACE LDAP (plea Most of the information needed by PRACE users accessing the Anselm TIER-1 system can be found here: -- [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs) -- [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ) -- [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) -- [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) -- [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer) +* [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs) +* [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ) +* [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) +* [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) +* [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer) Before you start to use any of the services don't forget to create a proxy certificate from your certificate: @@ -233,9 +233,12 @@ The resources that are currently subject to accounting are the core hours. The c PRACE users should check their project accounting using the [PRACE Accounting Tool (DART)](http://www.prace-ri.eu/accounting-report-tool/). -Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received local password may check at any time, how many core-hours have been consumed by themselves and their projects using the command "it4ifree". Please note that you need to know your user password to use the command and that the displayed core hours are "system core hours" which differ from PRACE "standardized core hours". 
+Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received a local password may check at any time how many core-hours have been consumed by themselves and their projects using the command "it4ifree".
-!!! Note "Note"
+!!! note
+    You need to know your user password to use the command. Displayed core hours are "system core hours" which differ from PRACE "standardized core hours".
+
+!!! hint
    The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
```bash
diff --git a/docs.it4i/anselm-cluster-documentation/remote-visualization.md b/docs.it4i/anselm-cluster-documentation/remote-visualization.md index 929f6930a42b253a763e43af1cfe2da9add18614..fc01d18790668c5a4eee13dbc0c8090f8cd3d89c 100644
--- a/docs.it4i/anselm-cluster-documentation/remote-visualization.md
+++ b/docs.it4i/anselm-cluster-documentation/remote-visualization.md
@@ -41,7 +41,7 @@ Please [follow the documentation](shell-and-data-access/).
To have the OpenGL acceleration, **24 bit color depth must be used**. Otherwise only the geometry (desktop size) definition is needed.
-!!! Hint
+!!! hint
    At first VNC server run you need to define a password.
This example defines desktop with dimensions 1200x700 pixels and 24 bit color depth.
@@ -138,7 +138,7 @@ qviz**. The queue has following properties:
Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 16 GB of RAM and 1/4 of the GPU capacity.
-!!! Note
+!!! note
    If more GPU power or RAM is required, it is recommended to allocate one whole node per user, so that all 16 cores, whole RAM and whole GPU is exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum.
To access the visualization node, follow these steps:
@@ -192,7 +192,7 @@ $ module load virtualgl/2.4
$ vglrun glxgears
```
-Please note, that if you want to run an OpenGL application which is vailable through modules, you need at first load the respective module. . g. to run the **Mentat** OpenGL application from **MARC** software ackage use:
+If you want to run an OpenGL application which is available through modules, you need to first load the respective module. E.g. to run the **Mentat** OpenGL application from the **MARC** software package use:
```bash
$ module load marc/2013.1
diff --git a/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md b/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md index 27340a1f8d0d1b36c4feba9233be4df667e6a08e..37a8f71f127a81669472f9a5b7fa5df910193fde 100644
--- a/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md
+++ b/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md
@@ -6,21 +6,21 @@ To run a [job](../introduction/), [computational resources](../introduction/) fo
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share](job-priority/) at Anselm ensures that individual users may consume approximately equal amount of resources per week. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources.
Following queues are available to Anselm users: -- **qexp**, the Express queue -- **qprod**, the Production queue -- **qlong**, the Long queue, regula -- **qnvidia**, **qmic**, **qfat**, the Dedicated queues -- **qfree**, the Free resource utilization queue +* **qexp**, the Express queue +* **qprod**, the Production queue +* **qlong**, the Long queue, regula +* **qnvidia**, **qmic**, **qfat**, the Dedicated queues +* **qfree**, the Free resource utilization queue -!!! Note "Note" - Check the queue status at <https://extranet.it4i.cz/anselm/> +!!! note + Check the queue status at <https://extranet.it4i.cz/anselm/> Read more on the [Resource AllocationPolicy](resources-allocation-policy/) page. ## Job Submission and Execution -!!! Note "Note" - Use the **qsub** command to submit your jobs. +!!! note + Use the **qsub** command to submit your jobs. The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is entire node, 16 cores**, with exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** @@ -28,8 +28,8 @@ Read more on the [Job submission and execution](job-submission-and-execution/) p ## Capacity Computing -!!! Note "Note" - Use Job arrays when running huge number of jobs. +!!! note + Use Job arrays when running huge number of jobs. Use GNU Parallel and/or Job arrays when running (many) single core jobs. diff --git a/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md b/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md index 9ed9bb7cbabfec0328b3da68e67cd27daba12477..ba4dde0614f2159082eaf6983a867af9b66d4ab1 100644 --- a/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md +++ b/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md @@ -4,7 +4,7 @@ The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The Fair-share at Anselm ensures that individual users may consume approximately equal amount of resources per week. Detailed information in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following table provides the queue partitioning overview: -!!! Note "Note" +!!! note Check the queue status at <https://extranet.it4i.cz/anselm/> | queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime | @@ -15,16 +15,16 @@ The resources are allocated to the job in a fair-share fashion, subject to const | qnvidia, qmic, qfat | yes | 0 | 23 total qnvidia4 total qmic2 total qfat | 16 | 200 | yes | 24/48 h | | qfree | yes | none required | 178 w/o accelerator | 16 | -1024 | no | 12 h | -!!! Note "Note" +!!! note **The qfree queue is not free of charge**. [Normal accounting](#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue. 
**The qexp queue is equipped with the nodes not having the very same CPU clock speed.** Should you need the very same CPU speed, you have to select the proper nodes during the PSB job submission. -- **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512GB RAM (cn208-209). This enables to test and tune also accelerated code or code with higher RAM requirements. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour. -- **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerator are included. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours. -- **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 x 48 h). -- **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to access the Nvidia accelerated nodes, the qmic to access MIC nodes and qfat the Fat nodes. It is required that active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node are allocated. The queues run with very high priority, the jobs will be scheduled before the jobs coming from the qexp queue. An PI needs explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated to her/his Project. -- **qfree**, The Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for persmission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours. +* **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. 
There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512GB RAM (cn208-209). This allows testing and tuning of accelerated code or code with higher RAM requirements. The nodes may be allocated on a per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour.
+* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerator are included. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
+* **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times the standard qprod time - 3 x 48 h).
+* **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to access the Nvidia accelerated nodes, the qmic to access MIC nodes and qfat the Fat nodes. It is required that active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node are allocated. The queues run with very high priority, the jobs will be scheduled before the jobs coming from the qexp queue. A PI needs to explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated to her/his Project.
+* **qfree**, The Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request permission for qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
### Notes
@@ -113,7 +113,7 @@ The resources that are currently subject to accounting are the core-hours. The c
### Check Consumed Resources
-!!! Note "Note"
+!!! note
    The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes.
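To make the queue descriptions above concrete, a hypothetical submission to the production and free queues could look as follows (OPEN-0-0 and ./jobscript are placeholders, not values taken from this page):

```bash
# Hypothetical examples only -- substitute your own project ID and jobscript.
# Whole nodes (16 cores each) are allocated in qprod and qfree.
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16,walltime=24:00:00 ./jobscript
$ qsub -A OPEN-0-0 -q qfree -l select=4:ncpus=16,walltime=12:00:00 ./jobscript
```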
diff --git a/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md b/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md index 2f3aad9bb4abc2cbab7ee681bfc030437c72e02b..38fbda64678ccf7c7a406163ce8f27be753ac67a 100644 --- a/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md +++ b/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md @@ -53,7 +53,7 @@ Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com Example to the cluster login: -!!! Note "Note" +!!! note The environment is **not** shared between login nodes, except for [shared filesystems](storage/#shared-filesystems). ## Data Transfer @@ -69,14 +69,14 @@ Data in and out of the system may be transferred by the [scp](http://en.wikipedi The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) -!!! Note "Note" +!!! note Data transfer rates up to **160MB/s** can be achieved with scp or sftp. 1TB may be transferred in 1:50h. To achieve 160MB/s transfer rates, the end user must be connected by 10G line all the way to IT4Innovations and use computer with fast processor for the transfer. Using Gigabit ethernet connection, up to 110MB/s may be expected. Fast cipher (aes128-ctr) should be used. -!!! Note "Note" +!!! note If you experience degraded data transfer performance, consult your local network provider. On linux or Mac, use scp or sftp client to transfer the data to Anselm: @@ -126,7 +126,7 @@ Outgoing connections, from Anselm Cluster login nodes to the outside world, are | 443 | https | | 9418 | git | -!!! Note "Note" +!!! note Please use **ssh port forwarding** and proxy servers to connect from Anselm to all other remote ports. Outgoing connections, from Anselm Cluster compute nodes are restricted to the internal network. Direct connections form compute nodes to outside world are cut. @@ -135,7 +135,7 @@ Outgoing connections, from Anselm Cluster compute nodes are restricted to the in ### Port Forwarding From Login Nodes -!!! Note "Note" +!!! note Port forwarding allows an application running on Anselm to connect to arbitrary remote host and port. It works by tunneling the connection from Anselm back to users workstation and forwarding from the workstation to the remote host. @@ -177,7 +177,7 @@ In this example, we assume that port forwarding from login1:6000 to remote.host. Port forwarding is static, each single port is mapped to a particular port on remote host. Connection to other remote host, requires new forward. -!!! Note "Note" +!!! note Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: @@ -198,9 +198,9 @@ Now, configure the applications proxy settings to **localhost:6000**. Use port f ## Graphical User Interface -- The [X Window system](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/) is a principal way to get GUI access to the clusters. 
-- The [Virtual Network Computing](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer). +* The [X Window system](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/) is a principal way to get GUI access to the clusters. +* The [Virtual Network Computing](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer). ## VPN Access -- Access to IT4Innovations internal resources via [VPN](../get-started-with-it4innovations/accessing-the-clusters/vpn-access/). +* Access to IT4Innovations internal resources via [VPN](../get-started-with-it4innovations/accessing-the-clusters/vpn-access/). diff --git a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md index 7f1bdcd9f66f4ce224966eb3fba5a7941abb4531..e8827e17c777055dbf9d916d5fd55174be129bee 100644 --- a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md +++ b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md @@ -32,8 +32,8 @@ Compilation parameters are default: Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. -!!! Note "Note" - The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. +!!! note + The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH file system](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch file system. diff --git a/docs.it4i/anselm-cluster-documentation/software/chemistry/nwchem.md b/docs.it4i/anselm-cluster-documentation/software/chemistry/nwchem.md index d22c987f2f4e794def52c33eb6950b825eac6708..569f20771197f93a74559037809e38bb606449f0 100644 --- a/docs.it4i/anselm-cluster-documentation/software/chemistry/nwchem.md +++ b/docs.it4i/anselm-cluster-documentation/software/chemistry/nwchem.md @@ -12,10 +12,10 @@ NWChem aims to provide its users with computational chemistry tools that are sca The following versions are currently installed: -- 6.1.1, not recommended, problems have been observed with this version -- 6.3-rev2-patch1, current release with QMD patch applied. 
Compiled with Intel compilers, MKL and Intel MPI -- 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI and NWChem provided BLAS instead of MKL. This version is expected to be slower -- 6.3-rev2-patch1-venus, this version contains only libraries for VENUS interface linking. Does not provide standalone NWChem executable +* 6.1.1, not recommended, problems have been observed with this version +* 6.3-rev2-patch1, current release with QMD patch applied. Compiled with Intel compilers, MKL and Intel MPI +* 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI and NWChem provided BLAS instead of MKL. This version is expected to be slower +* 6.3-rev2-patch1-venus, this version contains only libraries for VENUS interface linking. Does not provide standalone NWChem executable For a current list of installed versions, execute: @@ -40,5 +40,5 @@ NWChem is compiled for parallel MPI execution. Normal procedure for MPI jobs app Please refer to [the documentation](http://www.nwchem-sw.org/index.php/Release62:Top-level) and in the input file set the following directives : -- MEMORY : controls the amount of memory NWChem will use -- SCRATCH_DIR : set this to a directory in [SCRATCH file system](../../storage/storage/#scratch) (or run the calculation completely in a scratch directory). For certain calculations, it might be advisable to reduce I/O by forcing "direct" mode, e.g.. "scf direct" +* MEMORY : controls the amount of memory NWChem will use +* SCRATCH_DIR : set this to a directory in [SCRATCH file system](../../storage/storage/#scratch) (or run the calculation completely in a scratch directory). For certain calculations, it might be advisable to reduce I/O by forcing "direct" mode, e.g.. "scf direct" diff --git a/docs.it4i/anselm-cluster-documentation/software/compilers.md b/docs.it4i/anselm-cluster-documentation/software/compilers.md index 67c0bd30edcc631860ec8d853e0905729f8e5108..deb6c1122ee897e1eeb2a7df4a44158ac53ace58 100644 --- a/docs.it4i/anselm-cluster-documentation/software/compilers.md +++ b/docs.it4i/anselm-cluster-documentation/software/compilers.md @@ -4,11 +4,11 @@ Currently there are several compilers for different programming languages available on the Anselm cluster: -- C/C++ -- Fortran 77/90/95 -- Unified Parallel C -- Java -- NVIDIA CUDA +* C/C++ +* Fortran 77/90/95 +* Unified Parallel C +* Java +* NVIDIA CUDA The C/C++ and Fortran compilers are divided into two main groups GNU and Intel. @@ -45,8 +45,8 @@ For more information about the possibilities of the compilers, please see the ma UPC is supported by two compiler/runtime implementations: -- GNU - SMP/multi-threading support only -- Berkley - multi-node support as well as SMP/multi-threading support +* GNU - SMP/multi-threading support only +* Berkley - multi-node support as well as SMP/multi-threading support ### GNU UPC Compiler @@ -102,7 +102,10 @@ To use the Berkley UPC compiler and runtime environment to run the binaries use As default UPC network the "smp" is used. This is very quick and easy way for testing/debugging, but limited to one node only. -For production runs, it is recommended to use the native Infiband implementation of UPC network "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. Please note, that **the selection of the network is done at the compile time** and not at runtime (as expected)! +For production runs, it is recommended to use the native Infiband implementation of UPC network "ibv". 
For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. + +!!! warning + Selection of the network is done at the compile time and not at runtime (as expected)! Example UPC code: diff --git a/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md b/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md index befce6a433d0f0f35a7429deb1b7e6b11311b335..5dd2b59ef6413cd27c2b47f5740c9affbadf995b 100644 --- a/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md +++ b/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md @@ -6,11 +6,11 @@ standard engineering problems COMSOL provides add-on products such as electrical, mechanical, fluid flow, and chemical applications. -- [Structural Mechanics Module](http://www.comsol.com/structural-mechanics-module), -- [Heat Transfer Module](http://www.comsol.com/heat-transfer-module), -- [CFD Module](http://www.comsol.com/cfd-module), -- [Acoustics Module](http://www.comsol.com/acoustics-module), -- and [many others](http://www.comsol.com/products) +* [Structural Mechanics Module](http://www.comsol.com/structural-mechanics-module), +* [Heat Transfer Module](http://www.comsol.com/heat-transfer-module), +* [CFD Module](http://www.comsol.com/cfd-module), +* [Acoustics Module](http://www.comsol.com/acoustics-module), +* and [many others](http://www.comsol.com/products) COMSOL also allows an interface support for equation-based modelling of partial differential equations. @@ -18,19 +18,19 @@ COMSOL also allows an interface support for equation-based modelling of partial On the Anselm cluster COMSOL is available in the latest stable version. There are two variants of the release: -- **Non commercial** or so called **EDU variant**, which can be used for research and educational purposes. -- **Commercial** or so called **COM variant**, which can used also for commercial activities. **COM variant** has only subset of features compared to the **EDU variant** available. More about licensing will be posted here soon. +* **Non commercial** or so called **EDU variant**, which can be used for research and educational purposes. +* **Commercial** or so called **COM variant**, which can used also for commercial activities. **COM variant** has only subset of features compared to the **EDU variant** available. More about licensing will be posted here soon. To load the of COMSOL load the module ```bash - $ module load comsol + $ module load comsol ``` By default the **EDU variant** will be loaded. If user needs other version or variant, load the particular version. To obtain the list of available versions use ```bash - $ module avail comsol + $ module avail comsol ``` If user needs to prepare COMSOL jobs in the interactive mode it is recommend to use COMSOL on the compute nodes via PBS Pro scheduler. In order run the COMSOL Desktop GUI on Windows is recommended to use the Virtual Network Computing (VNC). diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md index 7d2fe37aaba8aab17a4086d6c657faaf6d5d65a3..bd5ece9bd34415d42838ce75defb300695738338 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md @@ -10,13 +10,13 @@ Allinea MAP is a profiler for C/C++/Fortran HPC codes. 
It is designed for profil
On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel processes. In case of debugging GPU or Xeon Phi accelerated codes the limit is 8 accelerators. These limitation means that:
-- 1 user can debug up 64 processes, or
-- 32 users can debug 2 processes, etc.
+* 1 user can debug up to 64 processes, or
+* 32 users can debug 2 processes, etc.
In case of debugging on accelerators:
-- 1 user can debug on up to 8 accelerators, or
-- 8 users can debug on single accelerator.
+* 1 user can debug on up to 8 accelerators, or
+* 8 users can debug on a single accelerator.
## Compiling Code to Run With DDT
@@ -47,7 +47,7 @@ $ mpif90 -g -O0 -o test_debug test.f
Before debugging, you need to compile your code with theses flags:
-!!! Note
+!!! note
- **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers.
- **O0** : Suppress all optimizations.
diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md index fdc57fb37aec104af98ab2a030210c387ef4277f..ad8d74d773f621ffad8ce0af1ca3bb5000e7ece3 100644
--- a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md
+++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md
@@ -20,8 +20,8 @@ The module sets up environment variables, required for using the Allinea Perform
## Usage
-!!! Note "Note"
-    Use the the perf-report wrapper on your (MPI) program.
+!!! note
+    Use the perf-report wrapper on your (MPI) program.
Instead of [running your MPI program the usual way](../mpi/), use the the perf report wrapper:
diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md index 23849f609b3b96db56c1f93da53f0f350cc9b1e9..799b10bad52639bf995ed19ae609c1ef2e42c503 100644
--- a/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md
+++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md
@@ -4,9 +4,9 @@ CUBE is a graphical performance report explorer for displaying data from Score-P and Scalasca (and other compatible tools). The name comes from the fact that it displays performance data in a three-dimensions :
-- **performance metric**, where a number of metrics are available, such as communication time or cache misses,
-- **call path**, which contains the call tree of your program
-- **system resource**, which contains system's nodes, processes and threads, depending on the parallel programming model.
+* **performance metric**, where a number of metrics are available, such as communication time or cache misses,
+* **call path**, which contains the call tree of your program
+* **system resource**, which contains system's nodes, processes and threads, depending on the parallel programming model.
Each dimension is organized in a tree, for example the time performance metric is divided into Execution time and Overhead time, call path dimension is organized by files and routines in your source code etc.
@@ -20,15 +20,15 @@ Each node in the tree is colored by severity (the color scheme is displayed at t Currently, there are two versions of CUBE 4.2.3 available as [modules](../../environment-and-modules/): -- cube/4.2.3-gcc, compiled with GCC -- cube/4.2.3-icc, compiled with Intel compiler +* cube/4.2.3-gcc, compiled with GCC +* cube/4.2.3-icc, compiled with Intel compiler ## Usage CUBE is a graphical application. Refer to Graphical User Interface documentation for a list of methods to launch graphical applications on Anselm. -!!! Note - Analyzing large data sets can consume large amount of CPU and RAM. Do not perform large analysis on login nodes. +!!! note + Analyzing large data sets can consume large amount of CPU and RAM. Do not perform large analysis on login nodes. After loading the appropriate module, simply launch cube command, or alternatively you can use scalasca -examine command to launch the GUI. Note that for Scalasca datasets, if you do not analyze the data with scalasca -examine before to opening them with CUBE, not all performance data will be available. diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md index c408fd4fe08815c59c5e5beafecd99fe4a1d6b9a..d9b878254aab451e1ec5b8c9ffe7efb1a3dca363 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md @@ -192,8 +192,8 @@ Can be used as a sensor for ksysguard GUI, which is currently not installed on A In a similar fashion to PAPI, PCM provides a C++ API to access the performance counter from within your application. Refer to the [Doxygen documentation](http://intel-pcm-api-documentation.github.io/classPCM.html) for details of the API. -!!! Note - Due to security limitations, using PCM API to monitor your applications is currently not possible on Anselm. (The application must be run as root user) +!!! note + Due to security limitations, using PCM API to monitor your applications is currently not possible on Anselm. (The application must be run as root user) Sample program using the API : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md index e9bae568d427dcda6f11dd0a728533e13d194d17..3c3f1a8af340e402b2e8797a8fbb802ae2b5257d 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md @@ -4,11 +4,11 @@ Intel VTune Amplifier, part of Intel Parallel studio, is a GUI profiling tool designed for Intel processors. It offers a graphical performance analysis of single core and multithreaded applications. A highlight of the features: -- Hotspot analysis -- Locks and waits analysis -- Low level specific counters, such as branch analysis and memory +* Hotspot analysis +* Locks and waits analysis +* Low level specific counters, such as branch analysis and memory bandwidth -- Power usage analysis - frequency and sleep states. +* Power usage analysis - frequency and sleep states.  @@ -26,8 +26,8 @@ and launch the GUI : $ amplxe-gui ``` -!!! Note - To profile an application with VTune Amplifier, special kernel modules need to be loaded. 
The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on using GUI applications. +!!! note + To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on using GUI applications. The GUI will open in new window. Click on "_New Project..._" to create a new project. After clicking _OK_, a new window with project properties will appear. At "_Application:_", select the bath to your binary you want to profile (the binary should be compiled with -g flag). Some additional options such as command line arguments can be selected. At "_Managed code profiling mode:_" select "_Native_" (unless you want to profile managed mode .NET/Mono applications). After clicking _OK_, your project is created. @@ -47,8 +47,8 @@ Copy the line to clipboard and then you can paste it in your jobscript or in com ## Xeon Phi -!!! Note - This section is outdated. It will be updated with new information soon. +!!! note + This section is outdated. It will be updated with new information soon. It is possible to analyze both native and offload Xeon Phi applications. For offload mode, just specify the path to the binary. For native mode, you need to specify in project properties: @@ -58,8 +58,8 @@ Application parameters: mic0 source ~/.profile && /path/to/your/bin Note that we include source ~/.profile in the command to setup environment paths [as described here](../intel-xeon-phi/). -!!! Note - If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. +!!! note + If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md index 689bdf611508229df8611022283972632ff84fd9..ee7d63fe69c9440d50da4200bdca431687ab9cec 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md @@ -76,15 +76,15 @@ Prints information about the memory architecture of the current CPU. PAPI provides two kinds of events: -- **Preset events** is a set of predefined common CPU events, standardized across platforms. -- **Native events **is a set of all events supported by the current hardware. This is a larger set of features than preset. For other components than CPU, only native events are usually available. +* **Preset events** is a set of predefined common CPU events, standardized across platforms. +* **Native events **is a set of all events supported by the current hardware. This is a larger set of features than preset. For other components than CPU, only native events are usually available. To use PAPI in your application, you need to link the appropriate include file. 
-- papi.h for C -- f77papi.h for Fortran 77 -- f90papi.h for Fortran 90 -- fpapi.h for Fortran with preprocessor +* papi.h for C +* f77papi.h for Fortran 77 +* f90papi.h for Fortran 90 +* fpapi.h for Fortran with preprocessor The include path is automatically added by papi module to $INCLUDE. @@ -190,8 +190,8 @@ Now the compiler won't remove the multiplication loop. (However it is still not ### Intel Xeon Phi -!!! Note "Note" - PAPI currently supports only a subset of counters on the Intel Xeon Phi processor compared to Intel Xeon, for example the floating point operations counter is missing. +!!! note + PAPI currently supports only a subset of counters on the Intel Xeon Phi processor compared to Intel Xeon, for example the floating point operations counter is missing. To use PAPI in [Intel Xeon Phi](../intel-xeon-phi/) native applications, you need to load module with " -mic" suffix, for example " papi/5.3.2-mic" : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md index 45c0768e7cae1ed4e5256e461b2b29f40aa86bb5..fe807dc4994b7934dca927fd3e4ef0a14f6249a3 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md @@ -10,8 +10,8 @@ Scalasca supports profiling of MPI, OpenMP and hybrid MPI+OpenMP applications. There are currently two versions of Scalasca 2.0 [modules](../../environment-and-modules/) installed on Anselm: -- scalasca2/2.0-gcc-openmpi, for usage with [GNU Compiler](../compilers/) and [OpenMPI](../mpi/Running_OpenMPI/), -- scalasca2/2.0-icc-impi, for usage with [Intel Compiler](../compilers.html) and [Intel MPI](../mpi/running-mpich2/). +* scalasca2/2.0-gcc-openmpi, for usage with [GNU Compiler](../compilers/) and [OpenMPI](../mpi/Running_OpenMPI/), +* scalasca2/2.0-icc-impi, for usage with [Intel Compiler](../compilers.html) and [Intel MPI](../mpi/running-mpich2/). ## Usage @@ -39,11 +39,11 @@ An example : Some notable Scalasca options are: -- **-t Enable trace data collection. By default, only summary data are collected.** -- **-e <directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by name of the executable and launch configuration.** +* **-t Enable trace data collection. By default, only summary data are collected.** +* **-e <directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by name of the executable and launch configuration.** -!!! Note - Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage/storage/). +!!! note + Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage/storage/). ### Analysis of Reports diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/score-p.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/score-p.md index 4f1296679c56b6d65a0edb873196c5c0bb537519..f0d0c33b8e48afa24e51d6540d53705dfa1e477a 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/score-p.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/score-p.md @@ -10,8 +10,8 @@ Score-P can be used as an instrumentation tool for [Scalasca](scalasca/). 
There are currently two versions of Score-P version 1.2.6 [modules](../../environment-and-modules/) installed on Anselm : -- scorep/1.2.3-gcc-openmpi, for usage with [GNU Compiler](../compilers/) and [OpenMPI](../mpi/Running_OpenMPI/) -- scorep/1.2.3-icc-impi, for usage with [Intel Compiler](../compilers.html)> and [Intel MPI](../mpi/running-mpich2/)>. +* scorep/1.2.3-gcc-openmpi, for usage with [GNU Compiler](../compilers/) and [OpenMPI](../mpi/Running_OpenMPI/) +* scorep/1.2.3-icc-impi, for usage with [Intel Compiler](../compilers.html)> and [Intel MPI](../mpi/running-mpich2/)>. ## Instrumentation diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md index ca08c5ea8f6b45f048e42a95f9f119f05dc35ef2..2265a89b6e4b51024f36fcabb0b537426931ca60 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md @@ -57,7 +57,7 @@ Compile the code: Before debugging, you need to compile your code with theses flags: -!!! Note +!!! note - **-g** : Generates extra debugging information usable by GDB. **-g3** includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. - **-O0** : Suppress all optimizations. @@ -91,8 +91,8 @@ To debug a serial code use: To debug a parallel code compiled with **OpenMPI** you need to setup your TotalView environment: -!!! Note - **Please note:** To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file: +!!! hint + To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your `~/.tvdrc` file: ```bash proc mpi_auto_run_starter {loaded_id} { @@ -120,8 +120,8 @@ The source code of this function can be also found in /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl ``` -!!! Note - You can also add only following line to you ~/.tvdrc file instead of the entire function: +!!! note + You can also add only following line to you ~/.tvdrc file instead of the entire function: **source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl** You need to do this step only once. diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/valgrind.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/valgrind.md index 8332377a295ac21a6175918d893043143dd6c669..bfcfc9a86aeb60b88cf8a06ce45fd741bd34768d 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/valgrind.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/valgrind.md @@ -10,19 +10,19 @@ Valgind is an extremely useful tool for debugging memory errors such as [off-by- The main tools available in Valgrind are : -- **Memcheck**, the original, must used and default tool. Verifies memory access in you program and can detect use of unitialized memory, out of bounds memory access, memory leaks, double free, etc. -- **Massif**, a heap profiler. -- **Hellgrind** and **DRD** can detect race conditions in multi-threaded applications. -- **Cachegrind**, a cache profiler. -- **Callgrind**, a callgraph analyzer. -- For a full list and detailed documentation, please refer to the [official Valgrind documentation](http://valgrind.org/docs/). +* **Memcheck**, the original, must used and default tool. 
Verifies memory access in your program and can detect use of uninitialized memory, out of bounds memory access, memory leaks, double free, etc. +* **Massif**, a heap profiler. +* **Helgrind** and **DRD** can detect race conditions in multi-threaded applications. +* **Cachegrind**, a cache profiler. +* **Callgrind**, a callgraph analyzer. +* For a full list and detailed documentation, please refer to the [official Valgrind documentation](http://valgrind.org/docs/). ## Installed Versions There are two versions of Valgrind available on Anselm. -- Version 3.6.0, installed by operating system vendor in /usr/bin/valgrind. This version is available by default, without the need to load any module. This version however does not provide additional MPI support. -- Version 3.9.0 with support for Intel MPI, available in [module](../../environment-and-modules/) valgrind/3.9.0-impi. After loading the module, this version replaces the default valgrind. +* Version 3.6.0, installed by operating system vendor in /usr/bin/valgrind. This version is available by default, without the need to load any module. This version however does not provide additional MPI support. +* Version 3.9.0 with support for Intel MPI, available in [module](../../environment-and-modules/) valgrind/3.9.0-impi. After loading the module, this version replaces the default valgrind. ## Usage diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-compilers.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-compilers.md index 50b8b005e4f65c2ba9eb51cd8bd21fc398979f76..df0b0a8a124a2f70d11a2e2adc4eb3d17cf227a0 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-compilers.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-compilers.md @@ -32,5 +32,5 @@ Read more at <http://software.intel.com/sites/products/documentation/doclib/stdx Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon will use Haswell architecture. >The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors, should also run on the new Haswell nodes. >To get optimal performance out of the Haswell processors a program should make use of the special AVX2 instructions for this processor. One can do this by recompiling codes with the compiler flags >designated to invoke these instructions. For the Intel compiler suite, there are two ways of doing this: -- Using compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge nodes. -- Using compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. So this binary will run both on Sandy Bridge and Haswell processors. During runtime it will be decided which path to follow, dependent on which processor you are running on. In general this will result in larger binaries. +* Using compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge nodes. +* Using compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature specific auto-dispatch code paths for Intel® processors, if there is a performance benefit.
So this binary will run both on Sandy Bridge and Haswell processors. During runtime it will be decided which path to follow, dependent on which processor you are running on. In general this will result in larger binaries. diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md index 7d874b4d9534379cd8e1c62fcf1069f4406df721..b92f8d05f62d9305f9624e592d388cf2744b5081 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md @@ -4,8 +4,8 @@ Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX vector instructions is available, via module ipp. The IPP is a very rich library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax, as well as cryptographic functions, linear algebra functions and many more. -!!! Note "Note" - Check out IPP before implementing own math functions for data processing, it is likely already there. +!!! note + Check out IPP before implementing own math functions for data processing, it is likely already there. ```bash $ module load ipp diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md index d887b4e595f8d6876dd56091c418e439e98f8aca..dcd4f6c7ea441ca6fed50a0da94ba9dfc974b1ae 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md @@ -4,14 +4,14 @@ Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL provides these basic math kernels: -- BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations. -- The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations. -- ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). -- Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions. -- Vector Math Library (VML) routines for optimized mathematical operations on vectors. -- Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions. -- Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search. -- Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver. +* BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations. 
+* The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations. +* ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). +* Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions. +* Vector Math Library (VML) routines for optimized mathematical operations on vectors. +* Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions. +* Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search. +* Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver. For details see the [Intel MKL Reference Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm). @@ -23,8 +23,8 @@ Intel MKL version 13.5.192 is available on Anselm The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES -!!! Note "Note" - The MKL library may be linked using any compiler. With intel compiler use -mkl option to link default threaded MKL. +!!! note + The MKL library may be linked using any compiler. With intel compiler use -mkl option to link default threaded MKL. ### Interfaces @@ -47,8 +47,8 @@ You will need the mkl module loaded to run the mkl enabled executable. This may ### Threading -!!! Note "Note" - Advantage in using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel. +!!! note + Advantage in using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel. For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md index 24c6380f725105b68f25154d9175c82ecd99269e..3c2495ba8c0592df6556ab7c41c078dd3cedf5af 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md @@ -13,8 +13,8 @@ Intel TBB version 4.1 is available on Anselm The module sets up environment variables, required for linking and running tbb enabled applications. -!!! Note "Note" - Link the tbb library, using -ltbb +!!! 
note + Link the tbb library, using -ltbb ## Examples diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md index 5c0a71af18ba839622f6ae0d5eef8c9ec62ac285..0390dff9411db764274c0e8e8caaed9293aed1c6 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md @@ -103,7 +103,10 @@ For debugging purposes it is also recommended to set environment variable "OFFLO export OFFLOAD_REPORT=3 ``` -A very basic example of code that employs offload programming technique is shown in the next listing. Please note that this code is sequential and utilizes only single core of the accelerator. +A very basic example of code that employs offload programming technique is shown in the next listing. + +!!! note + This code is sequential and utilizes only single core of the accelerator. ```bash $ vim source-offload.cpp @@ -229,8 +232,8 @@ During the compilation Intel compiler shows which loops have been vectorized in Some interesting compiler flags useful not only for code debugging are: -!!! Note "Note" - Debugging +!!! note + Debugging openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic level @@ -326,8 +329,8 @@ Following example show how to automatically offload an SGEMM (single precision - } ``` -!!! Note "Note" - Please note: This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** +!!! note + This example is simplified version of an example from MKL. The expanded version can be found here: `$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c`. To compile a code using Intel compiler use: @@ -369,8 +372,8 @@ To compile a code user has to be connected to a compute with MIC and load Intel $ module load intel/13.5.192 ``` -!!! Note "Note" - Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. +!!! note + Particular version of the Intel module is specified. This information is used later to specify the correct library paths. To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: @@ -412,13 +415,13 @@ If the code is parallelized using OpenMP a set of additional libraries is requir mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH ``` -!!! Note "Note" - Please note that the path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. +!!! note + The path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: -!!! Note "Note" - /apps/intel/composer_xe_2013.5.192/compiler/lib/mic +!!! 
note + /apps/intel/composer_xe_2013.5.192/compiler/lib/mic - libiomp5.so - libimf.so @@ -498,8 +501,8 @@ After executing the complied binary file, following output should be displayed. ... ``` -!!! Note "Note" - More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> +!!! note + More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the same procedure to download the example to your directory and compile it. @@ -538,8 +541,8 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as ... ``` -!!! Note "Note" - Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. +!!! warning + GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. ## MPI @@ -600,8 +603,8 @@ An example of basic MPI version of "hello-world" example in C language, that can Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: -!!! Note "Note" - **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) +!!! note + **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) **Coprocessor-only model** - all MPI ranks reside only on the coprocessors. @@ -647,10 +650,9 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH ``` -!!! Note "Note" - Please note: - \- this file sets up both environmental variable for both MPI and OpenMP libraries. - \- this file sets up the paths to a particular version of Intel MPI library and particular version of an Intel compiler. These versions have to match with loaded modules. +!!! note + - this file sets up environment variables for both MPI and OpenMP libraries. + - this file sets up the paths to a particular version of Intel MPI library and particular version of an Intel compiler. These versions have to match with loaded modules. To access a MIC accelerator located on a node that user is currently connected to, use: @@ -701,10 +703,9 @@ or using mpirun $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic ``` -!!! Note "Note" - Please note: - \- the full path to the binary has to specified (here: "**>~/mpi-test-mic**") - \- the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code +!!! note + - the full path to the binary has to be specified (here: `~/mpi-test-mic`) + - the `LD_LIBRARY_PATH` has to match with Intel MPI module used to compile the MPI code The output should be again similar to: @@ -715,8 +716,10 @@ The output should be again similar to: Hello world from process 0 of 4 on host cn207-mic0 ``` -!!! Note "Note" - Please note that the **"mpiexec.hydra"** requires a file the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: +!!! note + `mpiexec.hydra` requires a file on the MIC filesystem. If the file is missing please contact the system administrators. 
+ +A simple test to see if the file is present is to execute: ```bash $ ssh mic0 ls /bin/pmi_proxy @@ -748,12 +751,11 @@ For example: This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn205-mic0**" accelerators. -!!! Note "Note" - Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: - - - to connect to the second node : ** $ ssh cn205** - - to connect to the accelerator on the first node from the first node: **$ ssh cn204-mic0** or **$ ssh mic0** - - to connect to the accelerator on the second node from the first node: **$ ssh cn205-mic0** +!!! note + At this point the user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: + - to connect to the second node : `$ ssh cn205` + - to connect to the accelerator on the first node from the first node: `$ ssh cn204-mic0` or `$ ssh mic0` + - to connect to the accelerator on the second node from the first node: `$ ssh cn205-mic0` At this point we expect that correct modules are loaded and binary is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC" @@ -871,7 +873,7 @@ To run the MPI code using mpirun and the machine file "hosts_file_mix" use: A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is: ```bash - Hello world from process 0 of 8 on host cn204 + Hello world from process 0 of 8 on host cn204 Hello world from process 1 of 8 on host cn204 Hello world from process 2 of 8 on host cn204-mic0 Hello world from process 3 of 8 on host cn204-mic0 @@ -881,21 +883,21 @@ A possible output of the MPI "hello-world" example executed on two hosts and two Hello world from process 7 of 8 on host cn205-mic0 ``` -!!! Note "Note" - Please note: At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only. +!!! note + At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only. **Using the PBS automatically generated node-files** PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are generated: -!!! Note "Note" - **Host only node-file:** - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file: - /lscratch/${PBS_JOBID}/nodefile-mic Host and MIC node-file: - /lscratch/${PBS_JOBID}/nodefile-mix +!!! note + **Host only node-file:** - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file: - /lscratch/${PBS_JOBID}/nodefile-mic Host and MIC node-file: - /lscratch/${PBS_JOBID}/nodefile-mix +Each host or accelerator is listed only once per file. User has to specify how many jobs should be executed per node using `-n` parameter of the mpirun command. ## Optimization diff --git a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md index 577478f9cb8475449ac0f6dfd4215a073609e2e5..d2512ae42bd0466f4daface9d4449d2fd0177d5c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md +++ b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md @@ -10,8 +10,8 @@ If an ISV application was purchased for educational (research) purposes and also ## Overview of the Licenses Usage -!!! 
Note "Note" - The overview is generated every minute and is accessible from web or command line interface. +!!! note + The overview is generated every minute and is accessible from web or command line interface. ### Web Interface @@ -61,21 +61,21 @@ The general format of the name is `feature__APP__FEATURE`. Names of applications (APP): -- ansys -- comsol -- comsol-edu -- matlab -- matlab-edu +* ansys +* comsol +* comsol-edu +* matlab +* matlab-edu To get the FEATUREs of a license take a look into the corresponding state file ([see above](isv_licenses/#Licence)), or use: **Application and List of provided features** -- **ansys** $ grep -v "#" /apps/user/licenses/ansys_features_state.txt | cut -f1 -d' ' -- **comsol** $ grep -v "#" /apps/user/licenses/comsol_features_state.txt | cut -f1 -d' ' -- **comsol-ed** $ grep -v "#" /apps/user/licenses/comsol-edu_features_state.txt | cut -f1 -d' ' -- **matlab** $ grep -v "#" /apps/user/licenses/matlab_features_state.txt | cut -f1 -d' ' -- **matlab-edu** $ grep -v "#" /apps/user/licenses/matlab-edu_features_state.txt | cut -f1 -d' ' +* **ansys** $ grep -v "#" /apps/user/licenses/ansys_features_state.txt | cut -f1 -d' ' +* **comsol** $ grep -v "#" /apps/user/licenses/comsol_features_state.txt | cut -f1 -d' ' +* **comsol-ed** $ grep -v "#" /apps/user/licenses/comsol-edu_features_state.txt | cut -f1 -d' ' +* **matlab** $ grep -v "#" /apps/user/licenses/matlab_features_state.txt | cut -f1 -d' ' +* **matlab-edu** $ grep -v "#" /apps/user/licenses/matlab-edu_features_state.txt | cut -f1 -d' ' Example of PBS Pro resource name, based on APP and FEATURE name: diff --git a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md index 259d6b094ea03ac911243bf3fcba411ee9bf555c..8ecfd71d5e62637b80eebd54f1cc32dedb818f5e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md +++ b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md @@ -6,11 +6,11 @@ Running virtual machines on compute nodes There are situations when Anselm's environment is not suitable for user needs. -- Application requires different operating system (e.g Windows), application is not available for Linux -- Application requires different versions of base system libraries and tools -- Application requires specific setup (installation, configuration) of complex software stack -- Application requires privileged access to operating system -- ... and combinations of above cases +* Application requires different operating system (e.g Windows), application is not available for Linux +* Application requires different versions of base system libraries and tools +* Application requires specific setup (installation, configuration) of complex software stack +* Application requires privileged access to operating system +* ... and combinations of above cases We offer solution for these cases - **virtualization**. Anselm's environment gives the possibility to run virtual machines on compute nodes. Users can create their own images of operating system with specific software stack and run instances of these images as virtual machines on compute nodes. Run of virtual machines is provided by standard mechanism of [Resource Allocation and Job Execution](../../resource-allocation-and-job-execution/introduction/). 
@@ -26,10 +26,10 @@ Virtualization has also some drawbacks, it is not so easy to setup efficient sol Solution described in chapter [HOWTO](virtualization/#howto) is suitable for single node tasks, does not introduce virtual machine clustering. -!!! Note - Please consider virtualization as last resort solution for your needs. +!!! note + Please consider virtualization as last resort solution for your needs. -!!! Warning +!!! warning Please consult use of virtualization with IT4Innovation's support. For running Windows application (when source code and Linux native application are not available) consider use of Wine, Windows compatibility layer. Many Windows applications can be run using Wine with less effort and better performance than when using virtualization. @@ -38,8 +38,8 @@ For running Windows application (when source code and Linux native application a IT4Innovations does not provide any licenses for operating systems and software of virtual machines. Users are ( in accordance with [Acceptable use policy document](http://www.it4i.cz/acceptable-use-policy.pdf)) fully responsible for licensing all software running in virtual machines on Anselm. Be aware of complex conditions of licensing software in virtual environments. -!!! Note "Note" - Users are responsible for licensing OS e.g. MS Windows and all software running in their virtual machines. +!!! note + Users are responsible for licensing OS e.g. MS Windows and all software running in their virtual machines. ## Howto @@ -65,13 +65,13 @@ You can either use your existing image or create new image from scratch. QEMU currently supports these image types or formats: -- raw -- cloop -- cow -- qcow -- qcow2 -- vmdk - VMware 3 & 4, or 6 image format, for exchanging images with that product -- vdi - VirtualBox 1.1 compatible image format, for exchanging images with VirtualBox. +* raw +* cloop +* cow +* qcow +* qcow2 +* vmdk - VMware 3 & 4, or 6 image format, for exchanging images with that product +* vdi - VirtualBox 1.1 compatible image format, for exchanging images with VirtualBox. You can convert your existing image using qemu-img convert command. Supported formats of this command are: blkdebug blkverify bochs cloop cow dmg file ftp ftps host_cdrom host_device host_floppy http https nbd parallels qcow qcow2 qed raw sheepdog tftp vdi vhdx vmdk vpc vvfat. @@ -97,10 +97,10 @@ Your image should run some kind of operating system startup script. Startup scri We recommend, that startup script -- maps Job Directory from host (from compute node) -- runs script (we call it "run script") from Job Directory and waits for application's exit - - for management purposes if run script does not exist wait for some time period (few minutes) -- shutdowns/quits OS +* maps Job Directory from host (from compute node) +* runs script (we call it "run script") from Job Directory and waits for application's exit + * for management purposes if run script does not exist wait for some time period (few minutes) +* shutdowns/quits OS For Windows operating systems we suggest using Local Group Policy Startup script, for Linux operating systems rc.local, runlevel init script or similar service. @@ -248,8 +248,8 @@ Run virtual machine using optimized devices, user network back-end with sharing Thanks to port forwarding you can access virtual machine via SSH (Linux) or RDP (Windows) connecting to IP address of compute node (and port 2222 for SSH). You must use VPN network). -!!! 
Note "Note" - Keep in mind, that if you use virtio devices, you must have virtio drivers installed on your virtual machine. +!!! note + Keep in mind, that if you use virtio devices, you must have virtio drivers installed on your virtual machine. ### Networking and Data Sharing @@ -338,9 +338,9 @@ Interface tap0 has IP address 192.168.1.1 and network mask 255.255.255.0 (/24). Redirected ports: -- DNS udp/53->udp/3053, tcp/53->tcp3053 -- DHCP udp/67->udp3067 -- SMB tcp/139->tcp3139, tcp/445->tcp3445). +* DNS udp/53->udp/3053, tcp/53->tcp3053 +* DHCP udp/67->udp3067 +* SMB tcp/139->tcp3139, tcp/445->tcp3445). You can configure IP address of virtual machine statically or dynamically. For dynamic addressing provide your DHCP server on port 3067 of tap0 interface, you can also provide your DNS server on port 3053 of tap0 interface for example: diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md b/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md index 7d954569b14c633272ee4ee793f0a62703f7827c..8e11a3c163bcac6a711e18c4232a98a6acb5a16f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md @@ -6,8 +6,8 @@ The OpenMPI programs may be executed only via the PBS Workload manager, by enter ### Basic Usage -!!! Note - Use the mpiexec to run the OpenMPI code. +!!! note + Use the mpiexec to run the OpenMPI code. Example: @@ -27,8 +27,8 @@ Example: Hello world! from rank 3 of 4 on host cn110 ``` -!!! Note - Please be aware, that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted behaviour (unless you want to run hybrid code with just one MPI and 16 OpenMP tasks per node). In normal MPI programs **omit the -pernode directive** to run up to 16 MPI tasks per each node. +!!! note + Please be aware, that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted behaviour (unless you want to run hybrid code with just one MPI and 16 OpenMP tasks per node). In normal MPI programs **omit the -pernode directive** to run up to 16 MPI tasks per each node. In this example, we allocate 4 nodes via the express queue interactively. We set up the openmpi environment and interactively run the helloworld_mpi.x program. Note that the executable helloworld_mpi.x must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem. @@ -48,8 +48,8 @@ You need to preload the executable, if running on the local scratch /lscratch fi In this example, we assume the executable helloworld_mpi.x is present on compute node cn17 on local scratch. We call the mpiexec whith the **--preload-binary** argument (valid for openmpi). The mpiexec will copy the executable from cn17 to the /lscratch/15210.srv11 directory on cn108, cn109 and cn110 and execute the program. -!!! Note - MPI process mapping may be controlled by PBS parameters. +!!! note + MPI process mapping may be controlled by PBS parameters. The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. @@ -97,8 +97,8 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity -!!! Note - Important! Bind every OpenMP thread to a core! +!!! note + Important! Bind every OpenMP thread to a core! 
In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: @@ -152,8 +152,8 @@ In this example, we see that ranks have been mapped on nodes according to the or Exact control of MPI process placement and resource binding is provided by specifying a rankfile -!!! Note - Appropriate binding may boost performance of your application. +!!! note + Appropriate binding may boost performance of your application. Example rankfile diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md index dd92c4e68922dfbbc267de5dbc088096b9fb919b..f164792863fdfb0b5fd83b41f5a8efd9328b301a 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md @@ -60,8 +60,8 @@ In this example, the openmpi 1.6.5 using intel compilers is activated ## Compiling MPI Programs -!!! Note "Note" - After setting up your MPI environment, compile your program using one of the mpi wrappers +!!! note + After setting up your MPI environment, compile your program using one of the mpi wrappers ```bash $ mpicc -v @@ -107,8 +107,8 @@ Compile the above example with ## Running MPI Programs -!!! Note "Note" - The MPI program executable must be compatible with the loaded MPI module. +!!! note + The MPI program executable must be compatible with the loaded MPI module. Always compile and execute using the very same MPI module. It is strongly discouraged to mix mpi implementations. Linking an application with one MPI implementation and running mpirun/mpiexec form other implementation may result in unexpected errors. @@ -119,8 +119,8 @@ The MPI program executable must be available within the same path on all nodes. Optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern. -!!! Note "Note" - Consider these ways to run an MPI program: +!!! note + Consider these ways to run an MPI program: 1. One MPI process per node, 16 threads per process 2. Two MPI processes per node, 8 threads per process @@ -130,8 +130,8 @@ Optimal way to run an MPI program depends on its memory requirements, memory acc **Two MPI** processes per node, using 8 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 8 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and NUMA effect overheads. -!!! Note "Note" - Important! Bind every OpenMP thread to a core! +!!! note + Important! Bind every OpenMP thread to a core! In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. 
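As a quick illustration of the binding advice above, here is a sketch of the environment setup for a hybrid run with one MPI process and 16 OpenMP threads per node. The affinity values are examples only and should be adapted to your mpiprocs/ompthreads choice; helloworld_mpi.x is the example binary used throughout this page, and -pernode is the OpenMPI form shown earlier:

```bash
# one MPI process per node, 16 OpenMP threads per process
$ export OMP_NUM_THREADS=16

# GCC OpenMP runtime: pin the threads to cores 0-15
$ export GOMP_CPU_AFFINITY="0-15"

# Intel OpenMP runtime: equivalent pinning
$ export KMP_AFFINITY=granularity=fine,compact

$ mpiexec -pernode ./helloworld_mpi.x
```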
diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md index 0d5f59454eccdfc59e69e49914d8327e6d140227..1a8972c390a62ba9e29cb67e3993b7b8c1ea412f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md @@ -6,8 +6,8 @@ The MPICH2 programs use mpd daemon or ssh connection to spawn processes, no PBS ### Basic Usage -!!! Note "Note" - Use the mpirun to execute the MPICH2 code. +!!! note + Use the mpirun to execute the MPICH2 code. Example: @@ -43,8 +43,8 @@ You need to preload the executable, if running on the local scratch /lscratch fi In this example, we assume the executable helloworld_mpi.x is present on shared home directory. We run the cp command via mpirun, copying the executable from shared home to local scratch . Second mpirun will execute the binary in the /lscratch/15210.srv11 directory on nodes cn17, cn108, cn109 and cn110, one process per node. -!!! Note "Note" - MPI process mapping may be controlled by PBS parameters. +!!! note + MPI process mapping may be controlled by PBS parameters. The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. @@ -92,8 +92,8 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity -!!! Note "Note" - Important! Bind every OpenMP thread to a core! +!!! note + Important! Bind every OpenMP thread to a core! In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md index 602b51b4b908628360eb428e5b5850d65151388c..d693a1872e3cf23badce337d4715ee679b2f00e8 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md @@ -4,8 +4,8 @@ Matlab is available in versions R2015a and R2015b. There are always two variants of the release: -- Non commercial or so called EDU variant, which can be used for common research and educational purposes. -- Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available. +* Non commercial or so called EDU variant, which can be used for common research and educational purposes. +* Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available. To load the latest version of Matlab load the module @@ -41,8 +41,8 @@ plots, images, etc... will be still available. ## Running Parallel Matlab Using Distributed Computing Toolbox / Engine -!!! Note "Note" - Distributed toolbox is available only for the EDU variant +!!! note + Distributed toolbox is available only for the EDU variant The MPIEXEC mode available in previous versions is no longer available in MATLAB 2015. 
Also, the programming interface has changed. Refer to [Release Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1). @@ -64,8 +64,8 @@ Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click I With the new mode, MATLAB itself launches the workers via PBS, so you can either use interactive mode or a batch mode on one node, but the actual parallel processing will be done in a separate job started by MATLAB itself. Alternatively, you can use "local" mode to run parallel code on just a single node. -!!! Note "Note" - The profile is confusingly named Salomon, but you can use it also on Anselm. +!!! note + The profile is confusingly named Salomon, but you can use it also on Anselm. ### Parallel Matlab Interactive Session @@ -133,8 +133,8 @@ The last part of the configuration is done directly in the user Matlab script be This script creates scheduler object "cluster" of type "local" that starts workers locally. -!!! Note "Note" - Please note: Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling parpool(sched, ...) function. +!!! note + Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling parpool(sched, ...) function. The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers. @@ -217,7 +217,8 @@ You can start this script using batch mode the same way as in Local mode example This method is a "hack" invented by us to emulate the mpiexec functionality found in previous MATLAB versions. We leverage the MATLAB Generic Scheduler interface, but instead of submitting the workers to PBS, we launch the workers directly within the running job, thus we avoid the issues with master script and workers running in separate jobs (issues with license not available, waiting for the worker's job to spawn etc.) -Please note that this method is experimental. +!!! warning + This method is experimental. For this method, you need to use SalomonDirect profile, import it using [the same way as SalomonPBSPro](matlab/#running-parallel-matlab-using-distributed-computing-toolbox---engine) diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md index d10c114ce7b0fb1f16f70b1310155a86f419dcb0..f9cf95feb5013a3843458fb22fd2f8eaa6e9f5e9 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md @@ -2,13 +2,13 @@ ## Introduction -!!! Note "Note" - This document relates to the old versions R2013 and R2014. For MATLAB 2015, please use [this documentation instead](matlab/). +!!! note + This document relates to the old versions R2013 and R2014. For MATLAB 2015, please use [this documentation instead](matlab/). Matlab is available in the latest stable version. There are always two variants of the release: -- Non commercial or so called EDU variant, which can be used for common research and educational purposes. -- Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available. +* Non commercial or so called EDU variant, which can be used for common research and educational purposes. 
+* Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available. To load the latest version of Matlab load the module @@ -71,7 +71,7 @@ extras = {}; System MPI library allows Matlab to communicate through 40 Gbit/s InfiniBand QDR interconnect instead of slower 1 Gbit Ethernet network. -!!! Note "Note" +!!! note The path to MPI library in "mpiLibConf.m" has to match with version of loaded Intel MPI module. In this example the version 4.1.1.036 of Intel MPI is used by Matlab and therefore module impi/4.1.1.036 has to be loaded prior to starting Matlab. ### Parallel Matlab Interactive Session @@ -144,7 +144,7 @@ set(sched, 'EnvironmentSetMethod', 'setenv'); This script creates scheduler object "sched" of type "mpiexec" that starts workers using mpirun tool. To use correct version of mpirun, the second line specifies the path to correct version of system Intel MPI library. -!!! Note "Note" +!!! note Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling matlabpool(sched, ...) function. The last step is to start matlabpool with "sched" object and correct number of workers. In this case qsub asked for total number of 32 cores, therefore the number of workers is also set to 32. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md index e043f45a4e347ef6114955ef420c87b702278704..038de8aade954aa089d5e2878ef733861fde8fea 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md @@ -90,14 +90,14 @@ In this example, the calculation was automatically divided among the CPU cores a A version of [native](../intel-xeon-phi/#section-4) Octave is compiled for Xeon Phi accelerators. Some limitations apply for this version: -- Only command line support. GUI, graph plotting etc. is not supported. -- Command history in interactive mode is not supported. +* Only command line support. GUI, graph plotting etc. is not supported. +* Command history in interactive mode is not supported. Octave is linked with parallel Intel MKL, so it best suited for batch processing of tasks that utilize BLAS, LAPACK and FFT operations. By default, number of threads is set to 120, you can control this with > OMP_NUM_THREADS environment variable. -!!! Note "Note" - Calculations that do not employ parallelism (either by using parallel MKL e.g. via matrix operations, fork() function, [parallel package](http://octave.sourceforge.net/parallel/) or other mechanism) will actually run slower than on host CPU. +!!! note + Calculations that do not employ parallelism (either by using parallel MKL e.g. via matrix operations, fork() function, [parallel package](http://octave.sourceforge.net/parallel/) or other mechanism) will actually run slower than on host CPU. 
To use Octave on a node with Xeon Phi: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md index 48ac36cabf4ed594540a5bbfe6239cdfabacd25a..56426eb06591ed490d317fa14a65ddfd2bc4290f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md @@ -95,8 +95,8 @@ Download the package [parallell](package-parallel-vignette.pdf) vignette. The forking is the most simple to use. Forking family of functions provide parallelized, drop in replacement for the serial apply() family of functions. -!!! Note "Note" - Forking via package parallel provides functionality similar to OpenMP construct +!!! note + Forking via package parallel provides functionality similar to OpenMP construct omp parallel for @@ -146,8 +146,8 @@ Every evaluation of the integrad function runs in parallel on different process. ## Package Rmpi -!!! Note "Note" - package Rmpi provides an interface (wrapper) to MPI APIs. +!!! note + package Rmpi provides an interface (wrapper) to MPI APIs. It also provides interactive R slave environment. On Anselm, Rmpi provides interface to the [OpenMPI](../mpi-1/Running_OpenMPI/). @@ -296,8 +296,8 @@ Execute the example as: mpi.apply is a specific way of executing Dynamic Rmpi programs. -!!! Note "Note" - mpi.apply() family of functions provide MPI parallelized, drop in replacement for the serial apply() family of functions. +!!! note + mpi.apply() family of functions provide MPI parallelized, drop in replacement for the serial apply() family of functions. Execution is identical to other dynamic Rmpi programs. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md index 238d1cb9ad29a458d846ba0922b2f243ca711386..35ffe1775963e9580d860914e3d1f0c50343d8cb 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md @@ -22,8 +22,8 @@ Versions **1.8.11** and **1.8.13** of HDF5 library are available on Anselm, comp The module sets up environment variables, required for linking and running HDF5 enabled applications. Make sure that the choice of HDF5 module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results. -!!! Note "Note" - Be aware, that GCC version of **HDF5 1.8.11** has serious performance issues, since it's compiled with -O0 optimization flag. This version is provided only for testing of code compiled only by GCC and IS NOT recommended for production computations. For more information, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811> +!!! note + Be aware, that GCC version of **HDF5 1.8.11** has serious performance issues, since it's compiled with -O0 optimization flag. This version is provided only for testing of code compiled only by GCC and IS NOT recommended for production computations. For more information, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811> All GCC versions of **HDF5 1.8.13** are not affected by the bug, are compiled with -O3 optimizations and are recommended for production computations. 
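To make the module/MPI pairing concrete, a hedged sketch of one possible build-and-run sequence follows; the module names are assumptions (check `module avail hdf5` for the variants actually installed), hdf5_test.c is a hypothetical source file, and h5pcc is the parallel compiler wrapper distributed with HDF5 itself:

```bash
# load an MPI stack and an HDF5 build compiled against that same stack
$ module load impi
$ module load hdf5/1.8.13       # illustrative name - verify with: module avail hdf5

# compile with the HDF5 parallel wrapper and run on 4 ranks
$ h5pcc -O2 -o hdf5_test hdf5_test.c
$ mpirun -np 4 ./hdf5_test
```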
diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md index 6a91d61483a4e55e3a02d0c3da1deb5548a0c13e..c4b1c262b007a7b34eac140fa2e31a65d9513512 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md @@ -12,11 +12,11 @@ To be able to compile and link code with MAGMA library user has to load followin To make compilation more user friendly module also sets these two environment variables: -!!! Note "Note" - MAGMA_INC - contains paths to the MAGMA header files (to be used for compilation step) +!!! note + MAGMA_INC - contains paths to the MAGMA header files (to be used for compilation step) -!!! Note "Note" - MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking step). +!!! note + MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking step). Compilation example: @@ -30,17 +30,17 @@ Compilation example: MAGMA implementation for Intel MIC requires a MAGMA server running on accelerator prior to executing the user application. The server can be started and stopped using following scripts: -!!! Note "Note" - To start MAGMA server use: - **$MAGMAROOT/start_magma_server** +!!! note + To start MAGMA server use: + **$MAGMAROOT/start_magma_server** -!!! Note "Note" - To stop the server use: - **$MAGMAROOT/stop_magma_server** +!!! note + To stop the server use: + **$MAGMAROOT/stop_magma_server** -!!! Note "Note" - For deeper understanding how the MAGMA server is started, see the following script: - **$MAGMAROOT/launch_anselm_from_mic.sh** +!!! note + For deeper understanding how the MAGMA server is started, see the following script: + **$MAGMAROOT/launch_anselm_from_mic.sh** To test if the MAGMA server runs properly we can run one of examples that are part of the MAGMA installation: @@ -66,13 +66,11 @@ To test if the MAGMA server runs properly we can run one of examples that are pa 10304 10304 --- ( --- ) 500.70 ( 1.46) --- ``` -!!! Note "Note" - Please note: MAGMA contains several benchmarks and examples that can be found in: - **$MAGMAROOT/testing/** +!!! hint + MAGMA contains several benchmarks and examples in `$MAGMAROOT/testing/` -!!! Note "Note" - MAGMA relies on the performance of all CPU cores as well as on the performance of the accelerator. Therefore on Anselm number of CPU OpenMP threads has to be set to 16: - **export OMP_NUM_THREADS=16** +!!! note + MAGMA relies on the performance of all CPU cores as well as on the performance of the accelerator. Therefore on Anselm number of CPU OpenMP threads has to be set to 16 with `export OMP_NUM_THREADS=16`. See more details at [MAGMA home page](http://icl.cs.utk.edu/magma/). 
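A hedged sketch of the server workflow described above; the module name and test binary are illustrative, while the start/stop scripts are the ones listed in the notes:

```bash
# Illustrative MAGMA-on-MIC session on an accelerated node
module load magma                        # exact module name may differ
export OMP_NUM_THREADS=16                # use all 16 host CPU cores, as recommended
$MAGMAROOT/start_magma_server            # start the MAGMA server on the accelerator
$MAGMAROOT/testing/testing_dgetrf_mic    # run one of the bundled tests (binary name is a guess)
$MAGMAROOT/stop_magma_server             # stop the server when finished
```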
diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/petsc.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/petsc.md index 5ea0936f7140691ade232c63c65a752018f96537..6d0b8fb58fae24e98cf4fe1f682e119890a12d67 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/petsc.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/petsc.md @@ -8,11 +8,11 @@ PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of bu ## Resources -- [project webpage](http://www.mcs.anl.gov/petsc/) -- [documentation](http://www.mcs.anl.gov/petsc/documentation/) - - [PETSc Users Manual (PDF)](http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) - - [index of all manual pages](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html) -- PRACE Video Tutorial [part1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part5](http://www.youtube.com/watch?v=iXkbLEBFjlM) +* [project webpage](http://www.mcs.anl.gov/petsc/) +* [documentation](http://www.mcs.anl.gov/petsc/documentation/) + * [PETSc Users Manual (PDF)](http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) + * [index of all manual pages](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html) +* PRACE Video Tutorial [part1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part5](http://www.youtube.com/watch?v=iXkbLEBFjlM) ## Modules @@ -36,25 +36,25 @@ All these libraries can be used also alone, without PETSc. 
Their static or share ### Libraries Linked to PETSc on Anselm (As of 11 April 2015) -- dense linear algebra - - [Elemental](http://libelemental.org/) -- sparse linear system solvers - - [Intel MKL Pardiso](https://software.intel.com/en-us/node/470282) - - [MUMPS](http://mumps.enseeiht.fr/) - - [PaStiX](http://pastix.gforge.inria.fr/) - - [SuiteSparse](http://faculty.cse.tamu.edu/davis/suitesparse.html) - - [SuperLU](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu) - - [SuperLU_Dist](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu_dist) -- input/output - - [ExodusII](http://sourceforge.net/projects/exodusii/) - - [HDF5](http://www.hdfgroup.org/HDF5/) - - [NetCDF](http://www.unidata.ucar.edu/software/netcdf/) -- partitioning - - [Chaco](http://www.cs.sandia.gov/CRF/chac.html) - - [METIS](http://glaros.dtc.umn.edu/gkhome/metis/metis/overview) - - [ParMETIS](http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview) - - [PT-Scotch](http://www.labri.fr/perso/pelegrin/scotch/) -- preconditioners & multigrid - - [Hypre](http://www.nersc.gov/users/software/programming-libraries/math-libraries/petsc/) - - [Trilinos ML](http://trilinos.sandia.gov/packages/ml/) - - [SPAI - Sparse Approximate Inverse](https://bitbucket.org/petsc/pkg-spai) +* dense linear algebra + * [Elemental](http://libelemental.org/) +* sparse linear system solvers + * [Intel MKL Pardiso](https://software.intel.com/en-us/node/470282) + * [MUMPS](http://mumps.enseeiht.fr/) + * [PaStiX](http://pastix.gforge.inria.fr/) + * [SuiteSparse](http://faculty.cse.tamu.edu/davis/suitesparse.html) + * [SuperLU](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu) + * [SuperLU_Dist](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu_dist) +* input/output + * [ExodusII](http://sourceforge.net/projects/exodusii/) + * [HDF5](http://www.hdfgroup.org/HDF5/) + * [NetCDF](http://www.unidata.ucar.edu/software/netcdf/) +* partitioning + * [Chaco](http://www.cs.sandia.gov/CRF/chac.html) + * [METIS](http://glaros.dtc.umn.edu/gkhome/metis/metis/overview) + * [ParMETIS](http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview) + * [PT-Scotch](http://www.labri.fr/perso/pelegrin/scotch/) +* preconditioners & multigrid + * [Hypre](http://www.nersc.gov/users/software/programming-libraries/math-libraries/petsc/) + * [Trilinos ML](http://trilinos.sandia.gov/packages/ml/) + * [SPAI - Sparse Approximate Inverse](https://bitbucket.org/petsc/pkg-spai) diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/trilinos.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/trilinos.md index 0fc553cd6e44ae92774deb9515fb216b9b79abc3..dbf7f01a5ec323eef138a163a2a89034d814065a 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/trilinos.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/trilinos.md @@ -10,13 +10,13 @@ Trilinos is a collection of software packages for the numerical solution of larg Current Trilinos installation on ANSELM contains (among others) the following main packages -- **Epetra** - core linear algebra package containing classes for manipulation with serial and distributed vectors, matrices, and graphs. Dense linear solvers are supported via interface to BLAS and LAPACK (Intel MKL on ANSELM). Its extension **EpetraExt** contains e.g. methods for matrix-matrix multiplication. -- **Tpetra** - next-generation linear algebra package. Supports 64-bit indexing and arbitrary data type using C++ templates. 
-- **Belos** - library of various iterative solvers (CG, block CG, GMRES, block GMRES etc.). -- **Amesos** - interface to direct sparse solvers. -- **Anasazi** - framework for large-scale eigenvalue algorithms. -- **IFPACK** - distributed algebraic preconditioner (includes e.g. incomplete LU factorization) -- **Teuchos** - common tools packages. This package contains classes for memory management, output, performance monitoring, BLAS and LAPACK wrappers etc. +* **Epetra** - core linear algebra package containing classes for manipulation with serial and distributed vectors, matrices, and graphs. Dense linear solvers are supported via interface to BLAS and LAPACK (Intel MKL on ANSELM). Its extension **EpetraExt** contains e.g. methods for matrix-matrix multiplication. +* **Tpetra** - next-generation linear algebra package. Supports 64-bit indexing and arbitrary data type using C++ templates. +* **Belos** - library of various iterative solvers (CG, block CG, GMRES, block GMRES etc.). +* **Amesos** - interface to direct sparse solvers. +* **Anasazi** - framework for large-scale eigenvalue algorithms. +* **IFPACK** - distributed algebraic preconditioner (includes e.g. incomplete LU factorization) +* **Teuchos** - common tools packages. This package contains classes for memory management, output, performance monitoring, BLAS and LAPACK wrappers etc. For the full list of Trilinos packages, descriptions of their capabilities, and user manuals see [http://trilinos.sandia.gov.](http://trilinos.sandia.gov) diff --git a/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md b/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md index 493eb91a3b8da7594a78c6a1186d7398f7cd0806..375d3732c504cd6d56d945aec1ce69711137efec 100644 --- a/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md +++ b/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md @@ -280,10 +280,9 @@ SAXPY function multiplies the vector x by the scalar alpha and adds it to the ve } ``` -!!! Note "Note" - Please note: cuBLAS has its own function for data transfers between CPU and GPU memory: - - - [cublasSetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector) - transfers data from CPU to GPU memory +!!! note + cuBLAS has its own function for data transfers between CPU and GPU memory: + - [cublasSetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector) - transfers data from CPU to GPU memory - [cublasGetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector) - transfers data from GPU to CPU memory To compile the code using NVCC compiler a "-lcublas" compiler flag has to be specified: diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md b/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md index 4db4854d169586a1850f0c28b858babc721e5fa9..c968fb56351fc78cfbcfb0576ccc0ef8063a898c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md +++ b/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md @@ -95,13 +95,13 @@ BAM is the binary representation of SAM and keeps exactly the same information a Some features -- Quality control - - reads with N errors - - reads with multiple mappings - - strand bias - - paired-end insert -- Filtering: by number of errors, number of hits - - Comparator: stats, intersection, ... 
+* Quality control + * reads with N errors + * reads with multiple mappings + * strand bias + * paired-end insert +* Filtering: by number of errors, number of hits + * Comparator: stats, intersection, ... ** Input: ** BAM file. @@ -290,47 +290,47 @@ If we want to re-launch the pipeline from stage 4 until stage 20 we should use t The pipeline calls the following tools -- [fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), quality control tool for high throughput sequence data. -- [gatk](https://www.broadinstitute.org/gatk/), The Genome Analysis Toolkit or GATK is a software package developed at +* [fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), quality control tool for high throughput sequence data. +* [gatk](https://www.broadinstitute.org/gatk/), The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. -- [hpg-aligner](https://github.com/opencb-hpg/hpg-aligner), HPG Aligner has been designed to align short and long reads with high sensitivity, therefore any number of mismatches or indels are allowed. HPG Aligner implements and combines two well known algorithms: _Burrows-Wheeler Transform_ (BWT) to speed-up mapping high-quality reads, and _Smith-Waterman_> (SW) to increase sensitivity when reads cannot be mapped using BWT. -- [hpg-fastq](http://docs.bioinfo.cipf.es/projects/fastqhpc/wiki), a quality control tool for high throughput sequence data. -- [hpg-variant](http://docs.bioinfo.cipf.es/projects/hpg-variant/wiki), The HPG Variant suite is an ambitious project aimed to provide a complete suite of tools to work with genomic variation data, from VCF tools to variant profiling or genomic statistics. It is being implemented using High Performance Computing technologies to provide the best performance possible. -- [picard](http://picard.sourceforge.net/), Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (HTSJDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported. -- [samtools](http://samtools.sourceforge.net/samtools-c.shtml), SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. -- [snpEff](http://snpeff.sourceforge.net/), Genetic variant annotation and effect prediction toolbox. +* [hpg-aligner](https://github.com/opencb-hpg/hpg-aligner), HPG Aligner has been designed to align short and long reads with high sensitivity, therefore any number of mismatches or indels are allowed. HPG Aligner implements and combines two well known algorithms: _Burrows-Wheeler Transform_ (BWT) to speed-up mapping high-quality reads, and _Smith-Waterman_> (SW) to increase sensitivity when reads cannot be mapped using BWT. +* [hpg-fastq](http://docs.bioinfo.cipf.es/projects/fastqhpc/wiki), a quality control tool for high throughput sequence data. 
+* [hpg-variant](http://docs.bioinfo.cipf.es/projects/hpg-variant/wiki), The HPG Variant suite is an ambitious project aimed to provide a complete suite of tools to work with genomic variation data, from VCF tools to variant profiling or genomic statistics. It is being implemented using High Performance Computing technologies to provide the best performance possible. +* [picard](http://picard.sourceforge.net/), Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (HTSJDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported. +* [samtools](http://samtools.sourceforge.net/samtools-c.shtml), SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. +* [snpEff](http://snpeff.sourceforge.net/), Genetic variant annotation and effect prediction toolbox. This listing show which tools are used in each step of the pipeline -- stage-00: fastqc -- stage-01: hpg_fastq -- stage-02: fastqc -- stage-03: hpg_aligner and samtools -- stage-04: samtools -- stage-05: samtools -- stage-06: fastqc -- stage-07: picard -- stage-08: fastqc -- stage-09: picard -- stage-10: gatk -- stage-11: gatk -- stage-12: gatk -- stage-13: gatk -- stage-14: gatk -- stage-15: gatk -- stage-16: samtools -- stage-17: samtools -- stage-18: fastqc -- stage-19: gatk -- stage-20: gatk -- stage-21: gatk -- stage-22: gatk -- stage-23: gatk -- stage-24: hpg-variant -- stage-25: hpg-variant -- stage-26: snpEff -- stage-27: snpEff -- stage-28: hpg-variant +* stage-00: fastqc +* stage-01: hpg_fastq +* stage-02: fastqc +* stage-03: hpg_aligner and samtools +* stage-04: samtools +* stage-05: samtools +* stage-06: fastqc +* stage-07: picard +* stage-08: fastqc +* stage-09: picard +* stage-10: gatk +* stage-11: gatk +* stage-12: gatk +* stage-13: gatk +* stage-14: gatk +* stage-15: gatk +* stage-16: samtools +* stage-17: samtools +* stage-18: fastqc +* stage-19: gatk +* stage-20: gatk +* stage-21: gatk +* stage-22: gatk +* stage-23: gatk +* stage-24: hpg-variant +* stage-25: hpg-variant +* stage-26: snpEff +* stage-27: snpEff +* stage-28: hpg-variant ## Interpretation diff --git a/docs.it4i/anselm-cluster-documentation/software/openfoam.md b/docs.it4i/anselm-cluster-documentation/software/openfoam.md index d1b22d535ee900269a3f2d233a0dd4644e4ee297..e3509febc4636fb0b4058f1cac6ee510db3f5f14 100644 --- a/docs.it4i/anselm-cluster-documentation/software/openfoam.md +++ b/docs.it4i/anselm-cluster-documentation/software/openfoam.md @@ -22,10 +22,10 @@ Naming convection of the installed versions is following: openfoam\<VERSION\>-\<COMPILER\>\<openmpiVERSION\>-\<PRECISION\> -- \<VERSION\> - version of openfoam -- \<COMPILER\> - version of used compiler -- \<openmpiVERSION\> - version of used openmpi/impi -- \<PRECISION\> - DP/SP – double/single precision +* \<VERSION\> - version of openfoam +* \<COMPILER\> - version of used compiler +* \<openmpiVERSION\> - version of used openmpi/impi +* \<PRECISION\> - DP/SP – double/single precision ### Available OpenFOAM Modules @@ -57,8 +57,8 @@ To create OpenFOAM environment on ANSELM give the commands: $ source $FOAM_BASHRC ``` -!!! Note "Note" - Please load correct module with your requirements “compiler - GCC/ICC, precision - DP/SP”. +!!! note + Please load correct module with your requirements “compiler - GCC/ICC, precision - DP/SP”. 
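A hedged example of the module naming convention and environment setup described above (the exact version string is illustrative; list the installed builds with `module avail openfoam`):

```bash
# Illustrative: load a double-precision ICC build of OpenFOAM and set up its environment
module load openfoam/2.2.1-icc-impi4.1.1.036-DP   # <VERSION>-<COMPILER><openmpiVERSION>-<PRECISION>
source $FOAM_BASHRC                               # create the OpenFOAM environment
```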
Create a project directory within the $HOME/OpenFOAM directory named \<USER\>-\<OFversion\> and create a directory named run within it, e.g. by typing: @@ -120,8 +120,8 @@ Run the second case for example external incompressible turbulent flow - case - First we must run serial application bockMesh and decomposePar for preparation of parallel computation. -!!! Note "Note" - Create a Bash scrip test.sh: +!!! note + Create a Bash script test.sh: ```bash #!/bin/bash @@ -145,8 +145,8 @@ Job submission This job create simple block mesh and domain decomposition. Check your decomposition, and submit parallel computation: -!!! Note "Note" - Create a PBS script testParallel.pbs: +!!! note + Create a PBS script testParallel.pbs: ```bash #!/bin/bash diff --git a/docs.it4i/anselm-cluster-documentation/storage.md index 7c3b9ef7404c00248a0ef970b68ad7931663328b..436e6141781c3d7d207e7a30718be58db8995316 100644 --- a/docs.it4i/anselm-cluster-documentation/storage.md +++ b/docs.it4i/anselm-cluster-documentation/storage.md @@ -26,8 +26,8 @@ There is default stripe configuration for Anselm Lustre filesystems. However, us 2. stripe_count the number of OSTs to stripe across; default is 1 for Anselm Lustre filesystems one can specify -1 to use all OSTs in the filesystem. 3. stripe_offset The index of the OST where the first stripe is to be placed; default is -1 which results in random selection; using a non-default value is NOT recommended. -!!! Note "Note" - Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. +!!! note + Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. @@ -60,15 +60,15 @@ $ man lfs ### Hints on Lustre Stripping -!!! Note "Note" - Increase the stripe_count for parallel I/O to the same file. +!!! note + Increase the stripe_count for parallel I/O to the same file. When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the stripe_count is set to a larger value. The stripe count sets the number of OSTs the file will be written to. By default, the stripe count is set to 1. While this default setting provides for efficient access of metadata (for example to support the ls -l command), large files should use stripe counts of greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file. Another good practice is to make the stripe count be an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs. For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes. -!!! Note "Note" - Using a large stripe size can improve performance when accessing very large files +!!! note + Using a large stripe size can improve performance when accessing very large files Large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern.
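A hedged illustration of the lfs commands mentioned above (the directory path is a placeholder; on newer Lustre releases the stripe size option is spelled `-S`):

```bash
# Illustrative: inspect and adjust striping on a scratch working directory
lfs getstripe /scratch/username/workdir             # show the current stripe settings
lfs setstripe -c 4 -s 1m /scratch/username/workdir  # stripe new files over 4 OSTs, 1 MB stripe size
```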
The choice of stripe size has no effect on a single-stripe file. @@ -80,30 +80,30 @@ The architecture of Lustre on Anselm is composed of two metadata servers (MDS) Configuration of the storages -- HOME Lustre object storage - - One disk array NetApp E5400 - - 22 OSTs - - 227 2TB NL-SAS 7.2krpm disks - - 22 groups of 10 disks in RAID6 (8+2) - - 7 hot-spare disks -- SCRATCH Lustre object storage - - Two disk arrays NetApp E5400 - - 10 OSTs - - 106 2TB NL-SAS 7.2krpm disks - - 10 groups of 10 disks in RAID6 (8+2) - - 6 hot-spare disks -- Lustre metadata storage - - One disk array NetApp E2600 - - 12 300GB SAS 15krpm disks - - 2 groups of 5 disks in RAID5 - - 2 hot-spare disks +* HOME Lustre object storage + * One disk array NetApp E5400 + * 22 OSTs + * 227 2TB NL-SAS 7.2krpm disks + * 22 groups of 10 disks in RAID6 (8+2) + * 7 hot-spare disks +* SCRATCH Lustre object storage + * Two disk arrays NetApp E5400 + * 10 OSTs + * 106 2TB NL-SAS 7.2krpm disks + * 10 groups of 10 disks in RAID6 (8+2) + * 6 hot-spare disks +* Lustre metadata storage + * One disk array NetApp E2600 + * 12 300GB SAS 15krpm disks + * 2 groups of 5 disks in RAID5 + * 2 hot-spare disks \###HOME The HOME filesystem is mounted in directory /home. Users home directories /home/username reside on this filesystem. Accessible capacity is 320TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 250GB per user. If 250GB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note "Note" - The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects. +!!! note + The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects. The HOME filesystem should not be used to archive data of past Projects or other unrelated data. @@ -114,8 +114,8 @@ The filesystem is backed up, such that it can be restored in case of catasthropi The HOME filesystem is realized as Lustre parallel filesystem and is available on all login and computational nodes. Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for the HOME filesystem. -!!! Note "Note" - Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. +!!! note + Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | HOME filesystem | | | -------------------- | ------ | @@ -131,8 +131,8 @@ Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for t The SCRATCH filesystem is mounted in directory /scratch. Users may freely create subdirectories and files on the filesystem. Accessible capacity is 146TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 100TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem and deny service to other users. If 100TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note "Note" - The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory. +!!! 
note + The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory. >Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files. @@ -140,8 +140,8 @@ The SCRATCH filesystem is mounted in directory /scratch. Users may freely create The SCRATCH filesystem is realized as Lustre parallel filesystem and is available from all login and computational nodes. Default stripe size is 1MB, stripe count is 1. There are 10 OSTs dedicated for the SCRATCH filesystem. -!!! Note "Note" - Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. +!!! note + Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | SCRATCH filesystem | | | -------------------- | -------- | @@ -260,8 +260,8 @@ Default ACL mechanism can be used to replace setuid/setgid permissions on direct ### Local Scratch -!!! Note "Note" - Every computational node is equipped with 330GB local scratch disk. +!!! note + Every computational node is equipped with 330GB local scratch disk. Use local scratch in case you need to access large amount of small files during your calculation. @@ -269,8 +269,8 @@ The local scratch disk is mounted as /lscratch and is accessible to user at /lsc The local scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs that access large number of small files within the calculation must use the local scratch filesystem as their working directory. This is required for performance reasons, as frequent access to number of small files may overload the metadata servers (MDS) of the Lustre filesystem. -!!! Note "Note" - The local scratch directory /lscratch/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. +!!! note + The local scratch directory /lscratch/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | local SCRATCH filesystem | | | ------------------------ | -------------------- | @@ -284,15 +284,15 @@ The local scratch filesystem is intended for temporary scratch data generated d Every computational node is equipped with filesystem realized in memory, so called RAM disk. -!!! Note "Note" - Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. +!!! note + Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. The local RAM disk filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. Size of RAM disk filesystem is limited. Be very careful, use of RAM disk filesystem is at the expense of operational memory. 
It is not recommended to allocate large amount of memory and use large amount of data in RAM disk filesystem at the same time. -!!! Note "Note" - The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. +!!! note + The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | RAM disk | | | ----------- | ------------------------------------------------------------------------------------------------------- | @@ -320,8 +320,8 @@ Each node is equipped with local /tmp directory of few GB capacity. The /tmp dir Do not use shared filesystems at IT4Innovations as a backup for large amount of data or long-term archiving purposes. -!!! Note "Note" - The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/). +!!! note + The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/). The CESNET Storage service can be used for research purposes, mainly by academic staff and students of research institutions in the Czech Republic. @@ -339,15 +339,15 @@ The procedure to obtain the CESNET access is quick and trouble-free. ### Understanding CESNET Storage -!!! Note "Note" - It is very important to understand the CESNET storage before uploading data. Please read <https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first. +!!! note + It is very important to understand the CESNET storage before uploading data. Please read <https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first. Once registered for CESNET Storage, you may [access the storage](https://du.cesnet.cz/en/navody/faq/start) in number of ways. We recommend the SSHFS and RSYNC methods. ### SSHFS Access -!!! Note "Note" - SSHFS: The storage will be mounted like a local hard drive +!!! note + SSHFS: The storage will be mounted like a local hard drive The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it was a local removable hard drive. Files can be than copied in and out in a usual fashion. @@ -391,8 +391,8 @@ Once done, please remember to unmount the storage ### Rsync Access -!!! Note "Note" - Rsync provides delta transfer for best performance, can resume interrupted transfers +!!! note + Rsync provides delta transfer for best performance, can resume interrupted transfers Rsync is a fast and extraordinarily versatile file copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use. 
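A hypothetical transfer in the spirit of the description above (the hostname and remote path are placeholders; see the CESNET guides linked earlier for the real endpoints):

```bash
# Illustrative: resumable, delta-based upload of a data set to the CESNET storage
rsync -av --progress --partial mydata/ username@ssh.du1.cesnet.cz:backups/mydata/
```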
diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md index 5ed9f564ccc3d206b2d6e3e929f35448ef2ddc1e..7d243fc01535188dfc754a950c09ed78204146b9 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md @@ -6,7 +6,7 @@ The recommended clients are [TightVNC](http://www.tightvnc.com) or [TigerVNC](ht ## Create VNC Password -!!! Note "Note" +!!! note Local VNC password should be set before the first login. Do use a strong password. ```bash @@ -17,7 +17,7 @@ Verify: ## Start Vncserver -!!! Note "Note" +!!! note To access VNC a local vncserver must be started first and also a tunnel using SSH port forwarding must be established. [See below](vnc.md#linux-example-of-creating-a-tunnel) for the details on SSH tunnels. In this example we use port 61. @@ -63,7 +63,7 @@ username 10296 0.0 0.0 131772 21076 pts/29 SN 13:01 0:01 /usr/bin/Xvn To access the VNC server you have to create a tunnel between the login node using TCP **port 5961** and your machine using a free TCP port (for simplicity the very same, in this case). -!!! Note "Note" +!!! note The tunnel must point to the same login node where you launched the VNC server, eg. login2. If you use just cluster-name.it4i.cz, the tunnel might point to a different node due to DNS round robin. ## Linux/Mac OS Example of Creating a Tunnel diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md index 94daef1f97113bcd0a536b3419731fb5ef26859b..9c1d75b807e8c1e7fa62da076875749d695ef045 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md @@ -52,7 +52,7 @@ Read more on [http://www.math.umn.edu/systems_guide/putty_xwin32.html](http://ww ## Running GUI Enabled Applications -!!! Note "Note" +!!! note Make sure that X forwarding is activated and the X server is running. Then launch the application as usual. Use the & to run the application in background. diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md index ed4dd2c5e9023776f1042e6590f8bc24b1759d3c..75aca80cb0cbffea7d3d22a883e7ee14ae2d8cae 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md @@ -2,7 +2,7 @@ The IT4Innovations clusters are accessed by SSH protocol via login nodes. -!!! Note "Note" +!!! note Read more on [Accessing the Salomon Cluster](../../salomon/shell-and-data-access.md) or [Accessing the Anselm Cluster](../../anselm-cluster-documentation/shell-and-data-access.md) pages. 
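A minimal login example matching the description above (the hostname and key path are illustrative):

```bash
# Illustrative: SSH to a cluster login node using your private key
ssh -i ~/.ssh/id_rsa username@salomon.it4i.cz
```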
## PuTTY diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md index 0ab1e6ef705f39e4437397c58d86d3c150fd0f63..e4faa9fc2bddc63c4d708483e5960e52c13145ec 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md @@ -4,7 +4,7 @@ We recommned you to download "**A Windows installer for everything except PuTTYtel**" with **Pageant** (SSH authentication agent) and **PuTTYgen** (PuTTY key generator) which is available [here](http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html). -!!! Note "Note" +!!! note After installation you can proceed directly to private keys authentication using ["Putty"](putty#putty). "Change Password for Existing Private Key" is optional. @@ -15,43 +15,43 @@ We recommned you to download "**A Windows installer for everything except PuTTYt ## PuTTY - How to Connect to the IT4Innovations Cluster -- Run PuTTY -- Enter Host name and Save session fields with [Login address](../../../salomon/shell-and-data-access.md) and browse Connection - SSH - Auth menu. The _Host Name_ input may be in the format **"username@clustername.it4i.cz"** so you don't have to type your login each time.In this example we will connect to the Salomon cluster using **"salomon.it4i.cz"**. +* Run PuTTY +* Enter Host name and Save session fields with [Login address](../../../salomon/shell-and-data-access.md) and browse Connection - SSH - Auth menu. The _Host Name_ input may be in the format **"username@clustername.it4i.cz"** so you don't have to type your login each time.In this example we will connect to the Salomon cluster using **"salomon.it4i.cz"**.  -- Category - Connection - SSH - Auth: +* Category - Connection - SSH - Auth: Select Attempt authentication using Pageant. Select Allow agent forwarding. Browse and select your [private key](ssh-keys/) file.  -- Return to Session page and Save selected configuration with _Save_ button. +* Return to Session page and Save selected configuration with _Save_ button.  -- Now you can log in using _Open_ button. +* Now you can log in using _Open_ button.  -- Enter your username if the _Host Name_ input is not in the format "username@salomon.it4i.cz". -- Enter passphrase for selected [private key](ssh-keys/) file if Pageant **SSH authentication agent is not used.** +* Enter your username if the _Host Name_ input is not in the format "username@salomon.it4i.cz". +* Enter passphrase for selected [private key](ssh-keys/) file if Pageant **SSH authentication agent is not used.** ## Another PuTTY Settings -- Category - Windows - Translation - Remote character set and select **UTF-8**. -- Category - Terminal - Features and select **Disable application keypad mode** (enable numpad) -- Save your configuration on Session page in to Default Settings with _Save_ button. +* Category - Windows - Translation - Remote character set and select **UTF-8**. +* Category - Terminal - Features and select **Disable application keypad mode** (enable numpad) +* Save your configuration on Session page in to Default Settings with _Save_ button. ## Pageant SSH Agent Pageant holds your private key in memory without needing to retype a passphrase on every login. -- Run Pageant. 
-- On Pageant Key List press _Add key_ and select your private key (id_rsa.ppk). -- Enter your passphrase. -- Now you have your private key in memory without needing to retype a passphrase on every login. +* Run Pageant. +* On Pageant Key List press _Add key_ and select your private key (id_rsa.ppk). +* Enter your passphrase. +* Now you have your private key in memory without needing to retype a passphrase on every login.  @@ -63,11 +63,11 @@ PuTTYgen is the PuTTY key generator. You can load in an existing private key and You can change the password of your SSH key with "PuTTY Key Generator". Make sure to backup the key. -- Load your [private key](../shell-access-and-data-transfer/ssh-keys/) file with _Load_ button. -- Enter your current passphrase. -- Change key passphrase. -- Confirm key passphrase. -- Save your private key with _Save private key_ button. +* Load your [private key](../shell-access-and-data-transfer/ssh-keys/) file with _Load_ button. +* Enter your current passphrase. +* Change key passphrase. +* Confirm key passphrase. +* Save your private key with _Save private key_ button.  @@ -75,33 +75,33 @@ You can change the password of your SSH key with "PuTTY Key Generator". Make sur You can generate an additional public/private key pair and insert public key into authorized_keys file for authentication with your own private key. -- Start with _Generate_ button. +* Start with _Generate_ button.  -- Generate some randomness. +* Generate some randomness.  -- Wait. +* Wait.  -- Enter a _comment_ for your key using format 'username@organization.example.com'. +* Enter a _comment_ for your key using format 'username@organization.example.com'. Enter key passphrase. Confirm key passphrase. Save your new private key in "_.ppk" format with _Save private key\* button.  -- Save the public key with _Save public key_ button. +* Save the public key with _Save public key_ button. You can copy public key out of the â€Public key for pasting into authorized_keys file’ box.  -- Export private key in OpenSSH format "id_rsa" using Conversion - Export OpenSSH key +* Export private key in OpenSSH format "id_rsa" using Conversion - Export OpenSSH key  -- Now you can insert additional public key into authorized_keys file for authentication with your own private key. +* Now you can insert additional public key into authorized_keys file for authentication with your own private key. You must log in using ssh key received after registration. Then proceed to [How to add your own key](../shell-access-and-data-transfer/ssh-keys/). diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md index ec5b7ffb4c6e7264d9cef6a8666e40943b04e9ee..85ef36b73ec669306fcc3753509d7a612a619813 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md @@ -16,14 +16,14 @@ After logging in, you can see .ssh/ directory with SSH keys and authorized_keys -rw-r--r-- 1 username username 392 May 21 2014 id_rsa.pub ``` -!!! Hint +!!! hint Private keys in .ssh directory are without passphrase and allow you to connect within the cluster. 
## Access Privileges on .ssh Folder -- .ssh directory: 700 (drwx------) -- Authorized_keys, known_hosts and public key (.pub file): 644 (-rw-r--r--) -- Private key (id_rsa/id_rsa.ppk): 600 (-rw-------) +* .ssh directory: 700 (drwx------) +* Authorized_keys, known_hosts and public key (.pub file): 644 (-rw-r--r--) +* Private key (id_rsa/id_rsa.ppk): 600 (-rw-------) ```bash cd /home/username/ @@ -37,7 +37,7 @@ After logging in, you can see .ssh/ directory with SSH keys and authorized_keys ## Private Key -!!! Note "Note" +!!! note The path to a private key is usually /home/username/.ssh/ Private key file in "id_rsa" or `*.ppk` format is used to authenticate with the servers. Private key is present locally on local side and used for example in SSH agent Pageant (for Windows users). The private key should always be kept in a safe place. @@ -92,7 +92,7 @@ First, generate a new keypair of your public and private key: local $ ssh-keygen -C 'username@organization.example.com' -f additional_key ``` -!!! Note "Note" +!!! note Please, enter **strong** **passphrase** for securing your private key. You can insert additional public key into authorized_keys file for authentication with your own private key. Additional records in authorized_keys file must be delimited by new line. Users are not advised to remove the default public key from authorized_keys file. diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.md index f6c526e40b8dce0f55b6e44264476658734f8a22..0de78fa9f87ddffc9f5ec75dd0e4292a0c9b3fa9 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.md @@ -6,12 +6,12 @@ AnyConnect users on Windows 8.1 will receive a "Failed to initialize connection ## Workaround -- Close the Cisco AnyConnect Window and the taskbar mini-icon -- Right click vpnui.exe in the 'Cisco AnyConnect Secure Mobility Client' folder. (C:Program Files (x86)CiscoCisco AnyConnect Secure Mobility Client) -- Click on the 'Run compatibility troubleshooter' button -- Choose 'Try recommended settings' -- The wizard suggests Windows 8 compatibility. -- Click 'Test Program'. This will open the program. -- Close +* Close the Cisco AnyConnect Window and the taskbar mini-icon +* Right click vpnui.exe in the 'Cisco AnyConnect Secure Mobility Client' folder. (C:Program Files (x86)CiscoCisco AnyConnect Secure Mobility Client) +* Click on the 'Run compatibility troubleshooter' button +* Choose 'Try recommended settings' +* The wizard suggests Windows 8 compatibility. +* Click 'Test Program'. This will open the program. +* Close  diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn-access.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn-access.md index 6c1e908d908e3124248255831e85f105b203a76d..d33b2e20b89686147f0b850c31c17748e762d32c 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn-access.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn-access.md @@ -4,12 +4,12 @@ For using resources and licenses which are located at IT4Innovations local network, it is necessary to VPN connect to this network. 
We use Cisco AnyConnect Secure Mobility Client, which is supported on the following operating systems: -- Windows XP -- Windows Vista -- Windows 7 -- Windows 8 -- Linux -- MacOS +* Windows XP +* Windows Vista +* Windows 7 +* Windows 8 +* Linux +* MacOS It is impossible to connect to VPN from other operating systems. diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md index d1b5cdb19072e04b39f2de6d58d87ec17fdff777..e0a21d4700a16011afb67d8ef1b19983f6ef8db2 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md @@ -2,19 +2,19 @@ ## Accessing IT4Innovations Internal Resources via VPN -!!! Note "Note" +!!! note **Failed to initialize connection subsystem Win 8.1 - 02-10-15 MS patch** Workaround can be found at [vpn-connection-fail-in-win-8.1](../../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.html) For using resources and licenses which are located at IT4Innovations local network, it is necessary to VPN connect to this network. We use Cisco AnyConnect Secure Mobility Client, which is supported on the following operating systems: -- Windows XP -- Windows Vista -- Windows 7 -- Windows 8 -- Linux -- MacOS +* Windows XP +* Windows Vista +* Windows 7 +* Windows 8 +* Linux +* MacOS It is impossible to connect to VPN from other operating systems. diff --git a/docs.it4i/get-started-with-it4innovations/obtaining-login-credentials/certificates-faq.md b/docs.it4i/get-started-with-it4innovations/obtaining-login-credentials/certificates-faq.md index b4d4ceda0b0265fef832d8e5298f462fb4074acb..c0d9c23231a434c803f8718ab853dfa0c54b8392 100644 --- a/docs.it4i/get-started-with-it4innovations/obtaining-login-credentials/certificates-faq.md +++ b/docs.it4i/get-started-with-it4innovations/obtaining-login-credentials/certificates-faq.md @@ -8,10 +8,10 @@ IT4Innovations employs X.509 certificates for secure communication (e. g. creden There are different kinds of certificates, each with a different scope of use. We mention here: -- User (Private) certificates -- Certificate Authority (CA) certificates -- Host certificates -- Service certificates +* User (Private) certificates +* Certificate Authority (CA) certificates +* Host certificates +* Service certificates However, users need only manage User and CA certificates. Note that your user certificate is protected by an associated private key, and this **private key must never be disclosed**. diff --git a/docs.it4i/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.md b/docs.it4i/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.md index bc3a4c49b270d9c9f979d2c043dbd68025a05fff..7213ec31c1fc03b7cd770cb35c4cd77070ecdbb8 100644 --- a/docs.it4i/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.md +++ b/docs.it4i/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.md @@ -24,8 +24,8 @@ This is a preferred way of granting access to project resources. Please, use thi Log in to the [IT4I Extranet portal](https://extranet.it4i.cz) using IT4I credentials and go to the **Projects** section. -- **Users:** Please, submit your requests for becoming a project member. 
-- **Primary Investigators:** Please, approve or deny users' requests in the same section. +* **Users:** Please, submit your requests for becoming a project member. +* **Primary Investigators:** Please, approve or deny users' requests in the same section. ## Authorization by E-Mail (An Alternative Approach) @@ -120,7 +120,7 @@ We accept personal certificates issued by any widely respected certification aut Certificate generation process is well-described here: -- [How to generate a personal TCS certificate in Mozilla Firefox web browser (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/moz-cert-gen) +* [How to generate a personal TCS certificate in Mozilla Firefox web browser (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/moz-cert-gen) A FAQ about certificates can be found here: [Certificates FAQ](certificates-faq/). @@ -128,19 +128,19 @@ A FAQ about certificates can be found here: [Certificates FAQ](certificates-faq/ Follow these steps **only** if you can not obtain your certificate in a standard way. In case you choose this procedure, please attach a **scan of photo ID** (personal ID or passport or drivers license) when applying for [login credentials](obtaining-login-credentials/#the-login-credentials). -- Go to [CAcert](www.cacert.org). - - If there's a security warning, just acknowledge it. -- Click _Join_. -- Fill in the form and submit it by the _Next_ button. - - Type in the e-mail address which you use for communication with us. - - Don't forget your chosen _Pass Phrase_. -- You will receive an e-mail verification link. Follow it. -- After verifying, go to the CAcert's homepage and login using _Password Login_. -- Go to _Client Certificates_ _New_. -- Tick _Add_ for your e-mail address and click the _Next_ button. -- Click the _Create Certificate Request_ button. -- You'll be redirected to a page from where you can download/install your certificate. - - Simultaneously you'll get an e-mail with a link to the certificate. +* Go to [CAcert](www.cacert.org). + * If there's a security warning, just acknowledge it. +* Click _Join_. +* Fill in the form and submit it by the _Next_ button. + * Type in the e-mail address which you use for communication with us. + * Don't forget your chosen _Pass Phrase_. +* You will receive an e-mail verification link. Follow it. +* After verifying, go to the CAcert's homepage and login using _Password Login_. +* Go to _Client Certificates_ _New_. +* Tick _Add_ for your e-mail address and click the _Next_ button. +* Click the _Create Certificate Request_ button. +* You'll be redirected to a page from where you can download/install your certificate. + * Simultaneously you'll get an e-mail with a link to the certificate. 
## Installation of the Certificate Into Your Mail Client @@ -148,13 +148,13 @@ The procedure is similar to the following guides: MS Outlook 2010 -- [How to Remove, Import, and Export Digital certificates](http://support.microsoft.com/kb/179380) -- [Importing a PKCS #12 certificate (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/outl-cert-imp) +* [How to Remove, Import, and Export Digital certificates](http://support.microsoft.com/kb/179380) +* [Importing a PKCS #12 certificate (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/outl-cert-imp) Mozilla Thudnerbird -- [Installing an SMIME certificate](http://kb.mozillazine.org/Installing_an_SMIME_certificate) -- [Importing a PKCS #12 certificate (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/moz-cert-imp) +* [Installing an SMIME certificate](http://kb.mozillazine.org/Installing_an_SMIME_certificate) +* [Importing a PKCS #12 certificate (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/moz-cert-imp) ## End of User Account Lifecycle @@ -162,8 +162,8 @@ User accounts are supported by membership in active Project(s) or by affiliation User will get 3 automatically generated warning e-mail messages of the pending removal:. -- First message will be sent 3 months before the removal -- Second message will be sent 1 month before the removal -- Third message will be sent 1 week before the removal. +* First message will be sent 3 months before the removal +* Second message will be sent 1 month before the removal +* Third message will be sent 1 week before the removal. The messages will inform about the projected removal date and will challenge the user to migrate her/his data diff --git a/docs.it4i/index.md b/docs.it4i/index.md index 5041509bcaadee45eb218c3eb45f61542d4ee2c5..958b9bc77f31faec1b88ba25ea38eeea25d28233 100644 --- a/docs.it4i/index.md +++ b/docs.it4i/index.md @@ -17,29 +17,29 @@ Use your IT4Innotations username and password to log in to the [support](http:// ## Required Proficiency -!!! Note "Note" +!!! note You need basic proficiency in Linux environment. In order to use the system for your calculations, you need basic proficiency in Linux environment. To gain the proficiency, we recommend you reading the [introduction to Linux](http://www.tldp.org/LDP/intro-linux/html/) operating system environment and installing a Linux distribution on your personal computer. A good choice might be the [CentOS](http://www.centos.org/) distribution, as it is similar to systems on the clusters at IT4Innovations. It's easy to install and use. In fact, any distribution would do. -!!! Note "Note" +!!! note Learn how to parallelize your code! In many cases, you will run your own code on the cluster. In order to fully exploit the cluster, you will need to carefully consider how to utilize all the cores available on the node and how to use multiple nodes at the same time. You need to **parallelize** your code. Proficieny in MPI, OpenMP, CUDA, UPC or GPI2 programming may be gained via the [training provided by IT4Innovations.](http://prace.it4i.cz) ## Terminology Frequently Used on These Pages -- **node:** a computer, interconnected by network to other computers - Computational nodes are powerful computers, designed and dedicated for executing demanding scientific computations. -- **core:** processor core, a unit of processor, executing computations -- **corehours:** wall clock hours of processor core time - Each node is equipped with **X** processor cores, provides **X** corehours per 1 wall clock hour. 
-- **job:** a calculation running on the supercomputer - The job allocates and utilizes resources of the supercomputer for certain time. -- **HPC:** High Performance Computing -- **HPC (computational) resources:** corehours, storage capacity, software licences -- **code:** a program -- **primary investigator (PI):** a person responsible for execution of computational project and utilization of computational resources allocated to that project -- **collaborator:** a person participating on execution of computational project and utilization of computational resources allocated to that project -- **project:** a computational project under investigation by the PI - The project is identified by the project ID. The computational resources are allocated and charged per project. -- **jobscript:** a script to be executed by the PBS Professional workload manager +* **node:** a computer, interconnected by network to other computers - Computational nodes are powerful computers, designed and dedicated for executing demanding scientific computations. +* **core:** processor core, a unit of processor, executing computations +* **corehours:** wall clock hours of processor core time - Each node is equipped with **X** processor cores, provides **X** corehours per 1 wall clock hour. +* **job:** a calculation running on the supercomputer - The job allocates and utilizes resources of the supercomputer for certain time. +* **HPC:** High Performance Computing +* **HPC (computational) resources:** corehours, storage capacity, software licences +* **code:** a program +* **primary investigator (PI):** a person responsible for execution of computational project and utilization of computational resources allocated to that project +* **collaborator:** a person participating on execution of computational project and utilization of computational resources allocated to that project +* **project:** a computational project under investigation by the PI - The project is identified by the project ID. The computational resources are allocated and charged per project. +* **jobscript:** a script to be executed by the PBS Professional workload manager ## Conventions diff --git a/docs.it4i/pbspro.md b/docs.it4i/pbspro.md index 9dd4ccdab63753ad2fe475a90a857e9c50cdd4df..e89ddfe72d54ff6b0e3fce2ab53f47bb2c6bbac5 100644 --- a/docs.it4i/pbspro.md +++ b/docs.it4i/pbspro.md @@ -1,4 +1,4 @@ -- [PBS Pro Programmer's Guide](http://www.pbsworks.com/pdfs/PBSProgramGuide13.0.pdf) -- [PBS Pro Quick Start Guide](http://www.pbsworks.com/pdfs/PBSQuickStartGuide13.0.pdf) -- [PBS Pro Reference Guide](http://www.pbsworks.com/pdfs/PBSReferenceGuide13.0.pdf) -- [PBS Pro User's Guide](http://www.pbsworks.com/pdfs/PBSUserGuide13.0.pdf) +* [PBS Pro Programmer's Guide](http://www.pbsworks.com/pdfs/PBSProgramGuide13.0.pdf) +* [PBS Pro Quick Start Guide](http://www.pbsworks.com/pdfs/PBSQuickStartGuide13.0.pdf) +* [PBS Pro Reference Guide](http://www.pbsworks.com/pdfs/PBSReferenceGuide13.0.pdf) +* [PBS Pro User's Guide](http://www.pbsworks.com/pdfs/PBSUserGuide13.0.pdf) diff --git a/docs.it4i/salomon/capacity-computing.md b/docs.it4i/salomon/capacity-computing.md index 90dfc25cdfdec3146e81208fbaf7730d5826a54e..c5ae6b385bbe260340d5e69257f0d3d0854ee40a 100644 --- a/docs.it4i/salomon/capacity-computing.md +++ b/docs.it4i/salomon/capacity-computing.md @@ -6,12 +6,12 @@ In many cases, it is useful to submit huge (100+) number of computational jobs i However, executing huge number of jobs via the PBS queue may strain the system. 
This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1500 per job array** -!!! Note "Note" -    Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. +!!! note +    Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. -- Use [Job arrays](capacity-computing.md#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs -- Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs -- Combine [GNU parallel with Job arrays](capacity-computing/#job-arrays-and-gnu-parallel) when running huge number of single core jobs +* Use [Job arrays](capacity-computing.md#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs +* Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs +* Combine [GNU parallel with Job arrays](capacity-computing/#job-arrays-and-gnu-parallel) when running huge number of single core jobs ## Policy @@ -20,14 +20,14 @@ However, executing huge number of jobs via the PBS queue may strain the system. ## Job Arrays -!!! Note "Note" -    Huge number of jobs may be easily submitted and managed as a job array. +!!! note +    Huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: -- each subjob has a unique index, $PBS_ARRAY_INDEX -- job Identifiers of subjobs only differ by their indices -- the state of subjobs can differ (R,Q,...etc.) +* each subjob has a unique index, $PBS_ARRAY_INDEX +* job Identifiers of subjobs only differ by their indices +* the state of subjobs can differ (R,Q,...etc.) All subjobs within a job array have the same scheduling priority and schedule as independent jobs. Entire job array is submitted through a single qsub command and may be managed by qdel, qalter, qhold, qrls and qsig commands as a single job. @@ -151,8 +151,8 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/) ## GNU Parallel -!!! Note "Note" -    Use GNU parallel to run many single core tasks on one node. +!!! note +    Use GNU parallel to run many single core tasks on one node. GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. @@ -218,17 +218,18 @@ $ qsub -N JOBNAME jobscript In this example, we submit a job of 101 tasks. 24 input files will be processed in parallel. The 101 tasks on 24 cores are assumed to complete in less than 2 hours. -Please note the #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue. +!!! note +    Use #PBS directives at the beginning of the jobscript file and don't forget to set your valid PROJECT_ID and desired queue. ## Job Arrays and GNU Parallel -!!!
Note "Note" -    Combine the Job arrays and GNU parallel for best throughput of single core jobs +!!! note +    Combine the Job arrays and GNU parallel for best throughput of single core jobs While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs. -!!! Note "Note" -    Every subjob in an array runs GNU parallel to utilize all cores on the node +!!! note +    Every subjob in an array runs GNU parallel to utilize all cores on the node ### GNU Parallel, Shared jobscript @@ -282,8 +283,8 @@ cp output $PBS_O_WORKDIR/$TASK.out In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once an task is finished, new task starts, until the number of tasks in numtasks file is reached. -!!! Note "Note" -    Select subjob walltime and number of tasks per subjob carefully +!!! note +    Select subjob walltime and number of tasks per subjob carefully When deciding this values, think about following guiding rules : @@ -302,7 +303,8 @@ $ qsub -N JOBNAME -J 1-992:32 jobscript In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**48**, this must be the same as the number sent to numtasks file. Each subjob will run on full node and process 24 input files in parallel, 48 in total per subjob. Every subjob is assumed to complete in less than 2 hours. -Please note the #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue. +!!! note +    Use #PBS directives at the beginning of the jobscript file and don't forget to set your valid PROJECT_ID and desired queue.
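For illustration, a minimal jobscript header using these #PBS directives could look like the sketch below. The project ID, queue and resource values are placeholders, sized here for the 24-core Salomon nodes; replace them with your own.

```bash
#!/bin/bash
#PBS -A PROJECT_ID                           # placeholder: your valid project ID
#PBS -q qprod                                # placeholder: the desired queue
#PBS -N JOBNAME                              # job name as shown by qstat
#PBS -l select=1:ncpus=24,walltime=02:00:00  # one full node, 2 hours of walltime

# the actual work of the jobscript follows here
```

The array range itself (e.g. `-J 1-992:48`) is still passed on the qsub command line, as in the examples above.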
## Examples diff --git a/docs.it4i/salomon/compute-nodes.md b/docs.it4i/salomon/compute-nodes.md index ddcc4dc559013716f1e6eb30e85a011fb6940ebe..83bca5c4045e93800fe922accb9882508f0aa0b1 100644 --- a/docs.it4i/salomon/compute-nodes.md +++ b/docs.it4i/salomon/compute-nodes.md @@ -9,22 +9,22 @@ Compute nodes with MIC accelerator **contains two Intel Xeon Phi 7120P accelerat ### Compute Nodes Without Accelerator -- codename "grafton" -- 576 nodes -- 13 824 cores in total -- two Intel Xeon E5-2680v3, 12-core, 2.5 GHz processors per node -- 128 GB of physical memory per node +* codename "grafton" +* 576 nodes +* 13 824 cores in total +* two Intel Xeon E5-2680v3, 12-core, 2.5 GHz processors per node +* 128 GB of physical memory per node  ### Compute Nodes With MIC Accelerator -- codename "perrin" -- 432 nodes -- 10 368 cores in total -- two Intel Xeon E5-2680v3, 12-core, 2.5 GHz processors per node -- 128 GB of physical memory per node -- MIC accelerator 2 x Intel Xeon Phi 7120P per node, 61-cores, 16 GB per accelerator +* codename "perrin" +* 432 nodes +* 10 368 cores in total +* two Intel Xeon E5-2680v3, 12-core, 2.5 GHz processors per node +* 128 GB of physical memory per node +* MIC accelerator 2 x Intel Xeon Phi 7120P per node, 61-cores, 16 GB per accelerator  @@ -34,12 +34,12 @@ Compute nodes with MIC accelerator **contains two Intel Xeon Phi 7120P accelerat ### Uv 2000 -- codename "UV2000" -- 1 node -- 112 cores in total -- 14 x Intel Xeon E5-4627v2, 8-core, 3.3 GHz processors, in 14 NUMA nodes -- 3328 GB of physical memory per node -- 1 x NVIDIA GM200 (GeForce GTX TITAN X), 12 GB RAM +* codename "UV2000" +* 1 node +* 112 cores in total +* 14 x Intel Xeon E5-4627v2, 8-core, 3.3 GHz processors, in 14 NUMA nodes +* 3328 GB of physical memory per node +* 1 x NVIDIA GM200 (GeForce GTX TITAN X), 12 GB RAM  @@ -57,22 +57,22 @@ Salomon is equipped with Intel Xeon processors Intel Xeon E5-2680v3. Processors ### Intel Xeon E5-2680v3 Processor -- 12-core -- speed: 2.5 GHz, up to 3.3 GHz using Turbo Boost Technology -- peak performance: 19.2 GFLOP/s per core -- caches: - - Intel® Smart Cache: 30 MB -- memory bandwidth at the level of the processor: 68 GB/s +* 12-core +* speed: 2.5 GHz, up to 3.3 GHz using Turbo Boost Technology +* peak performance: 19.2 GFLOP/s per core +* caches: + * Intel® Smart Cache: 30 MB +* memory bandwidth at the level of the processor: 68 GB/s ### MIC Accelerator Intel Xeon Phi 7120P Processor -- 61-core -- speed: 1.238 +* 61-core +* speed: 1.238 GHz, up to 1.333 GHz using Turbo Boost Technology -- peak performance: 18.4 GFLOP/s per core -- caches: - - L2: 30.5 MB -- memory bandwidth at the level of the processor: 352 GB/s +* peak performance: 18.4 GFLOP/s per core +* caches: + * L2: 30.5 MB +* memory bandwidth at the level of the processor: 352 GB/s ## Memory Architecture @@ -80,28 +80,28 @@ Memory is equally distributed across all CPUs and cores for optimal performance. ### Compute Node Without Accelerator -- 2 sockets -- Memory Controllers are integrated into processors. - - 8 DDR4 DIMMs per node - - 4 DDR4 DIMMs per CPU - - 1 DDR4 DIMMs per channel -- Populated memory: 8 x 16 GB DDR4 DIMM >2133 MHz +* 2 sockets +* Memory Controllers are integrated into processors. + * 8 DDR4 DIMMs per node + * 4 DDR4 DIMMs per CPU + * 1 DDR4 DIMMs per channel +* Populated memory: 8 x 16 GB DDR4 DIMM >2133 MHz ### Compute Node With MIC Accelerator 2 sockets Memory Controllers are integrated into processors. 
-- 8 DDR4 DIMMs per node -- 4 DDR4 DIMMs per CPU -- 1 DDR4 DIMMs per channel +* 8 DDR4 DIMMs per node +* 4 DDR4 DIMMs per CPU +* 1 DDR4 DIMMs per channel Populated memory: 8 x 16 GB DDR4 DIMM 2133 MHz MIC Accelerator Intel Xeon Phi 7120P Processor -- 2 sockets -- Memory Controllers are are connected via an +* 2 sockets +* Memory Controllers are connected via an Interprocessor Network (IPN) ring. -    - 16 GDDR5 DIMMs per node -    - 8 GDDR5 DIMMs per CPU -    - 2 GDDR5 DIMMs per channel +    * 16 GDDR5 DIMMs per node +    * 8 GDDR5 DIMMs per CPU +    * 2 GDDR5 DIMMs per channel diff --git a/docs.it4i/salomon/environment-and-modules.md b/docs.it4i/salomon/environment-and-modules.md index 0a452049bd61f9913449ee45b0c951873777cec9..a9a6def4dfeb499e8daf6ad3cd8fc8bd707d7d91 100644 --- a/docs.it4i/salomon/environment-and-modules.md +++ b/docs.it4i/salomon/environment-and-modules.md @@ -23,8 +23,8 @@ then fi ``` -!!! Note "Note" -    Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session interactivity for such commands as stated in the previous example. +!!! note +    Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session interactivity for such commands as stated in the previous example. ### Application Modules @@ -56,8 +56,8 @@ Application modules on Salomon cluster are built using [EasyBuild](http://hpcuge vis: Visualization, plotting, documentation and typesetting ``` -!!! Note "Note" -    The modules set up the application paths, library paths and environment variables for running particular application. +!!! note +    The modules set up the application paths, library paths and environment variables for running particular application. The modules may be loaded, unloaded and switched, according to momentary needs. @@ -107,9 +107,9 @@ The EasyBuild framework prepares the build environment for the different toolcha Recent releases of EasyBuild include out-of-the-box toolchain support for: -- various compilers, including GCC, Intel, Clang, CUDA -- common MPI libraries, such as Intel MPI, MPICH, MVAPICH2, Open MPI -- various numerical libraries, including ATLAS, Intel MKL, OpenBLAS, ScaLAPACK, FFTW +* various compilers, including GCC, Intel, Clang, CUDA +* common MPI libraries, such as Intel MPI, MPICH, MVAPICH2, Open MPI +* various numerical libraries, including ATLAS, Intel MKL, OpenBLAS, ScaLAPACK, FFTW On Salomon, we have currently following toolchains installed: diff --git a/docs.it4i/salomon/ib-single-plane-topology.md b/docs.it4i/salomon/ib-single-plane-topology.md index 7ba5d80bc336e41ee5823fde72ea5807f0bd40b2..9456b83b37aaa5ae54b9a97b49e3019eabc09bda 100644 --- a/docs.it4i/salomon/ib-single-plane-topology.md +++ b/docs.it4i/salomon/ib-single-plane-topology.md @@ -4,9 +4,9 @@ A complete M-Cell assembly consists of four compute racks.
Each rack contains 4 The SGI ICE X IB Premium Blade provides the first level of interconnection via dual 36-port Mellanox FDR InfiniBand ASIC switch with connections as follows: -- 9 ports from each switch chip connect to the unified backplane, to connect the 18 compute node slots -- 3 ports on each chip provide connectivity between the chips -- 24 ports from each switch chip connect to the external bulkhead, for a total of 48 +* 9 ports from each switch chip connect to the unified backplane, to connect the 18 compute node slots +* 3 ports on each chip provide connectivity between the chips +* 24 ports from each switch chip connect to the external bulkhead, for a total of 48 ### IB Single-Plane Topology - ICEX M-Cell @@ -22,9 +22,9 @@ Each of the 3 inter-connected D racks are equivalent to one half of M-Cell rack. As shown in a diagram  -- Racks 21, 22, 23, 24, 25, 26 are equivalent to one M-Cell rack. -- Racks 27, 28, 29, 30, 31, 32 are equivalent to one M-Cell rack. -- Racks 33, 34, 35, 36, 37, 38 are equivalent to one M-Cell rack. +* Racks 21, 22, 23, 24, 25, 26 are equivalent to one M-Cell rack. +* Racks 27, 28, 29, 30, 31, 32 are equivalent to one M-Cell rack. +* Racks 33, 34, 35, 36, 37, 38 are equivalent to one M-Cell rack. [IB single-plane topology - Accelerated nodes.pdf](<../src/IB single-plane topology - Accelerated nodes.pdf>) diff --git a/docs.it4i/salomon/job-priority.md b/docs.it4i/salomon/job-priority.md index d13c8f5f4c3f8d98d3cc418ec2695f99a54ce9bc..bb762398529c1a919a5f4b36442ed06522616b44 100644 --- a/docs.it4i/salomon/job-priority.md +++ b/docs.it4i/salomon/job-priority.md @@ -36,8 +36,8 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut # Jobs Queued in Queue qexp Are Not Calculated to Project's Usage. -!!! Note "Note" - Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/rsweb/salomon/projects>. +!!! note + Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/rsweb/salomon/projects>. Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job. @@ -65,8 +65,8 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu It means, that jobs with lower execution priority can be run before jobs with higher execution priority. -!!! Note "Note" - It is **very beneficial to specify the walltime** when submitting jobs. +!!! note + It is **very beneficial to specify the walltime** when submitting jobs. Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority. diff --git a/docs.it4i/salomon/job-submission-and-execution.md b/docs.it4i/salomon/job-submission-and-execution.md index 96f8d21875ace8c7e2b51a647f8d8707661af175..0865e9c21b44c7b755e810d9a81da902de452183 100644 --- a/docs.it4i/salomon/job-submission-and-execution.md +++ b/docs.it4i/salomon/job-submission-and-execution.md @@ -11,8 +11,8 @@ When allocating computational resources for the job, please specify 5. Project ID 6. Jobscript or interactive switch -!!! Note "Note" - Use the **qsub** command to submit your job to a queue for allocation of the computational resources. +!!! note + Use the **qsub** command to submit your job to a queue for allocation of the computational resources. 
Submit the job using the qsub command: @@ -22,8 +22,8 @@ $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] job The qsub submits the job into the queue, in another words the qsub command creates a request to the PBS Job manager for allocation of specified resources. The resources will be allocated when available, subject to above described policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** -!!! Note "Note" -    PBS statement nodes (qsub -l nodes=nodespec) is not supported on Salomon cluster. +!!! note +    PBS statement nodes (qsub -l nodes=nodespec) is not supported on Salomon cluster. ### Job Submission Examples @@ -71,8 +71,8 @@ In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores ### UV2000 SMP -!!! Note "Note" -    14 NUMA nodes available on UV2000 +!!! note +    14 NUMA nodes available on UV2000 Per NUMA node allocation. Jobs are isolated by cpusets. @@ -108,8 +108,8 @@ $ qsub -m n ### Placement by Name -!!! Note "Note" -    Not useful for ordinary computing, suitable for node testing/bechmarking and management tasks. +!!! note +    Not useful for ordinary computing, suitable for node testing/benchmarking and management tasks. Specific nodes may be selected using PBS resource attribute host (for hostnames): @@ -135,8 +135,8 @@ For communication intensive jobs it is possible to set stricter requirement - to Nodes directly connected to the same InifiBand switch can communicate most efficiently. Using the same switch prevents hops in the network and provides for unbiased, most efficient network communication. There are 9 nodes directly connected to every InifiBand switch. -!!! Note "Note" -    We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run job efficiently. +!!! note +    We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run job efficiently. Nodes directly connected to the one InifiBand switch can be allocated using node grouping on PBS resource attribute switch. @@ -148,8 +148,8 @@ $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob ### Placement by Specific InifiBand Switch -!!! Note "Note" -    Not useful for ordinary computing, suitable for testing and management tasks. +!!! note +    Not useful for ordinary computing, suitable for testing and management tasks. Nodes directly connected to the specific InifiBand switch can be selected using the PBS resource attribute _switch_. @@ -233,8 +233,8 @@ r1i0n11 ## Job Management -!!! Note "Note" -    Check status of your jobs using the **qstat** and **check-pbs-jobs** commands +!!! note +    Check status of your jobs using the **qstat** and **check-pbs-jobs** commands ```bash $ qstat -a @@ -312,8 +312,8 @@ Run loop 3 In this example, we see actual output (some iteration loops) of the job 35141.dm2 -!!! Note "Note" -    Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel,** **qsig** or **qalter** commands +!!! note +    Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel,** **qsig** or **qalter** commands You may release your allocation at any time, using qdel command @@ -337,13 +337,13 @@ $ man pbs_professional ### Jobscript -!!!
note + Prepare the jobscript to run batch jobs in the PBS queue system The Jobscript is a user made script, controlling sequence of commands for executing the calculation. It is often written in bash, other scripts may be used as well. The jobscript is supplied to PBS **qsub** command as an argument and executed by the PBS Professional workload manager. -!!! Note "Note" - The jobscript or interactive shell is executed on first of the allocated nodes. +!!! note + The jobscript or interactive shell is executed on first of the allocated nodes. ```bash $ qsub -q qexp -l select=4:ncpus=24 -N Name0 ./myjob @@ -359,8 +359,8 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time In this example, the nodes r21u01n577, r21u02n578, r21u03n579, r21u04n580 were allocated for 1 hour via the qexp queue. The jobscript myjob will be executed on the node r21u01n577, while the nodes r21u02n578, r21u03n579, r21u04n580 are available for use as well. -!!! Note "Note" - The jobscript or interactive shell is by default executed in home directory +!!! note + The jobscript or interactive shell is by default executed in home directory ```bash $ qsub -q qexp -l select=4:ncpus=24 -I @@ -373,8 +373,8 @@ $ pwd In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory. -!!! Note "Note" - All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. +!!! note + All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. The allocated nodes are accessible via ssh from login nodes. The nodes may access each other via ssh as well. @@ -405,8 +405,8 @@ In this example, the hostname program is executed via pdsh from the interactive ### Example Jobscript for MPI Calculation -!!! Note "Note" - Production jobs must use the /scratch directory for I/O +!!! note + Production jobs must use the /scratch directory for I/O The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. @@ -437,13 +437,13 @@ exit In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes. -!!! Note "Note" - Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. +!!! note + Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. -!!! Note "Note" - Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. +!!! note + Store the qsub options within the jobscript. 
Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. ### Example Jobscript for MPI Calculation With Preloaded Inputs @@ -476,8 +476,8 @@ HTML commented section #2 (examples need to be reworked) ### Example Jobscript for Single Node Calculation -!!! Note "Note" - Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. Be very careful, use of RAM disk filesystem is at the expense of operational memory. +!!! note + Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. Be very careful, use of RAM disk filesystem is at the expense of operational memory. Example jobscript for single node calculation, using [local scratch](storage/) on the node: diff --git a/docs.it4i/salomon/prace.md b/docs.it4i/salomon/prace.md index 5dfd1dfa47ce4254001b33a9d990855d10ea9875..76b9186711bd7477e714a2621d60eb46d4ef290d 100644 --- a/docs.it4i/salomon/prace.md +++ b/docs.it4i/salomon/prace.md @@ -28,11 +28,11 @@ The user will need a valid certificate and to be present in the PRACE LDAP (plea Most of the information needed by PRACE users accessing the Salomon TIER-1 system can be found here: -- [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs) -- [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ) -- [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) -- [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) -- [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer) +* [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs) +* [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ) +* [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) +* [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) +* [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer) Before you start to use any of the services don't forget to create a proxy certificate from your certificate: @@ -202,7 +202,8 @@ Generally both shared file systems are available through GridFTP: More information about the shared file systems is available [here](storage/). -Please note, that for PRACE users a "prace" directory is used also on the SCRATCH file system. +!!! hint + `prace` directory is used for PRACE users on the SCRATCH file system. | Data type | Default path | | ---------------------------- | ------------------------------- | @@ -245,10 +246,10 @@ The resources that are currently subject to accounting are the core hours. The c PRACE users should check their project accounting using the [PRACE Accounting Tool (DART)](http://www.prace-ri.eu/accounting-report-tool/). -Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received local password may check at any time, how many core-hours have been consumed by themselves and their projects using the command "it4ifree". Please note that you need to know your user password to use the command and that the displayed core hours are "system core hours" which differ from PRACE "standardized core hours". 
+Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received a local password may check at any time how many core-hours have been consumed by themselves and their projects using the command "it4ifree". You need to know your user password to use the command; note that the displayed core hours are "system core hours", which differ from PRACE "standardized core hours". -!!! Note "Note" -    The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> +!!! note +    The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> ```bash $ it4ifree diff --git a/docs.it4i/salomon/resource-allocation-and-job-execution.md b/docs.it4i/salomon/resource-allocation-and-job-execution.md index fa47d4a4203249ad77764deb590ba5fbedc99659..940e43a91b0e389a2758eae8ac3d51ff1e9f2f08 100644 --- a/docs.it4i/salomon/resource-allocation-and-job-execution.md +++ b/docs.it4i/salomon/resource-allocation-and-job-execution.md @@ -6,22 +6,22 @@ To run a [job](job-submission-and-execution/), [computational resources](resourc The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share](job-priority/) at Salomon ensures that individual users may consume approximately equal amount of resources per week. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following queues are available to Anselm users: -- **qexp**, the Express queue -- **qprod**, the Production queue -- **qlong**, the Long queue -- **qmpp**, the Massively parallel queue -- **qfat**, the queue to access SMP UV2000 machine -- **qfree**, the Free resource utilization queue +* **qexp**, the Express queue +* **qprod**, the Production queue +* **qlong**, the Long queue +* **qmpp**, the Massively parallel queue +* **qfat**, the queue to access SMP UV2000 machine +* **qfree**, the Free resource utilization queue -!!! Note "Note" -    Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> +!!! note +    Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> Read more on the [Resource Allocation Policy](resources-allocation-policy/) page. ## Job Submission and Execution -!!! Note "Note" -    Use the **qsub** command to submit your jobs. +!!! note +    Use the **qsub** command to submit your jobs. The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is entire node, 24 cores**, with exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** diff --git a/docs.it4i/salomon/resources-allocation-policy.md b/docs.it4i/salomon/resources-allocation-policy.md index 8f77c70f2eaa9f26629e5418c3b39c661427cf5a..ab5f32a4f3a6327d7cb262ac06deff646783a8db 100644 --- a/docs.it4i/salomon/resources-allocation-policy.md +++ b/docs.it4i/salomon/resources-allocation-policy.md @@ -4,8 +4,8 @@ The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project.
The fair-share at Anselm ensures that individual users may consume approximately equal amount of resources per week. Detailed information in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following table provides the queue partitioning overview: -!!! Note "Note" -    Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> +!!! note +    Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> | queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime | | ------------------------------- | -------------- | ----------------- | ------------------------------------------------------------- | --------- | -------- | ------------- | --------- | @@ -17,19 +17,19 @@ The resources are allocated to the job in a fair-share fashion, subject to const | **qfree** Free resource queue | yes | none required | 752 nodes, max 86 per job | 24 | -1024 | no | 12 / 12h | | **qviz** Visualization queue | yes | none required | 2 (with NVIDIA Quadro K5000) | 4 | 150 | no | 1 / 8h | -!!! Note "Note" -    **The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy/#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue. +!!! note +    **The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy/#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project has exhausted all its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default. Usage of qfree after exhaustion of DD projects' computational resources is allowed upon request for this queue. -- **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour. -- **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, however only 86 per job. Full nodes, 24 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours. -- **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 336 nodes without acceleration may be accessed via the qlong queue. Full nodes, 24 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 \* 48 h) -- **qmpp**, the massively parallel queue. This queue is intended for massively parallel runs.
It is required that active project with nonzero remaining resources is specified to enter the qmpp. All nodes may be accessed via the qmpp queue. Full nodes, 24 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qmpp is 4 hours. An PI needs explicitly ask support for authorization to enter the queue for all users associated to her/his Project. -- **qfat**, the UV2000 queue. This queue is dedicated to access the fat SGI UV2000 SMP machine. The machine (uv1) has 112 Intel IvyBridge cores at 3.3GHz and 3.25TB RAM. An PI needs explicitly ask support for authorization to enter the queue for all users associated to her/his Project. -- **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for persmission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 24 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours. -- **qviz**, the Visualization queue: Intended for pre-/post-processing using OpenGL accelerated graphics. Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 73 GB of RAM and 1/7 of the GPU capacity (default "chunk"). If more GPU power or RAM is required, it is recommended to allocate more chunks (with 4 cores each) up to one whole node per user, so that all 28 cores, 512 GB RAM and whole GPU is exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum. +* **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour. +* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, however only 86 per job. Full nodes, 24 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours. +* **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 336 nodes without acceleration may be accessed via the qlong queue. Full nodes, 24 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 \* 48 h) +* **qmpp**, the massively parallel queue. This queue is intended for massively parallel runs. 
It is required that active project with nonzero remaining resources is specified to enter the qmpp. All nodes may be accessed via the qmpp queue. Full nodes, 24 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qmpp is 4 hours. A PI needs to explicitly ask support for authorization to enter the queue for all users associated with her/his Project. +* **qfat**, the UV2000 queue. This queue is dedicated to access the fat SGI UV2000 SMP machine. The machine (uv1) has 112 Intel IvyBridge cores at 3.3GHz and 3.25TB RAM. A PI needs to explicitly ask support for authorization to enter the queue for all users associated with her/his Project. +* **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for permission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 24 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours. +* **qviz**, the Visualization queue: Intended for pre-/post-processing using OpenGL accelerated graphics. Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 73 GB of RAM and 1/7 of the GPU capacity (default "chunk"). If more GPU power or RAM is required, it is recommended to allocate more chunks (with 4 cores each) up to one whole node per user, so that all 28 cores, 512 GB RAM and whole GPU is exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum. -!!! Note "Note" -    To access node with Xeon Phi co-processor user needs to specify that in [job submission select statement](job-submission-and-execution/). +!!! note +    To access a node with a Xeon Phi co-processor, the user needs to specify that in the [job submission select statement](job-submission-and-execution/). ### Notes @@ -41,8 +41,8 @@ Salomon users may check current queue configuration at <https://extranet.it4i.cz ### Queue Status -!!! Note "Note" -    Check the status of jobs, queues and compute nodes at [https://extranet.it4i.cz/rsweb/salomon/](https://extranet.it4i.cz/rsweb/salomon) +!!! note +    Check the status of jobs, queues and compute nodes at [https://extranet.it4i.cz/rsweb/salomon/](https://extranet.it4i.cz/rsweb/salomon)  @@ -119,8 +119,8 @@ The resources that are currently subject to accounting are the core-hours. The c ### Check Consumed Resources -!!! Note "Note" -    The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> +!!! note +    The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes.
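As a sketch only (assuming pip and network access are available; on the cluster login nodes the command is already installed), the package can be obtained from PyPI and the command run without arguments:

```bash
# install the it4i.portal.clients package from PyPI into the user environment
$ pip install --user it4i.portal.clients

# report the core-hours consumed by you and your projects
$ it4ifree
```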
diff --git a/docs.it4i/salomon/shell-and-data-access.md b/docs.it4i/salomon/shell-and-data-access.md index 06f79c205296b7ada5818d2e4993ff3d4dc843fc..fef1df3bca4bf1da985055af4bb386667c7a076e 100644 --- a/docs.it4i/salomon/shell-and-data-access.md +++ b/docs.it4i/salomon/shell-and-data-access.md @@ -4,8 +4,8 @@ The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, login3 and login4 at address salomon.it4i.cz. The login nodes may be addressed specifically, by prepending the login node name to the address. -!!! Note "Note" - The alias salomon.it4i.cz is currently not available through VPN connection. Please use loginX.salomon.it4i.cz when connected to VPN. +!!! note + The alias salomon.it4i.cz is currently not available through VPN connection. Please use loginX.salomon.it4i.cz when connected to VPN. | Login address | Port | Protocol | Login node | | ---------------------- | ---- | -------- | ------------------------------------- | @@ -17,10 +17,10 @@ The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) -!!! Note "Note" - Please verify SSH fingerprints during the first logon. They are identical on all login nodes: - f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) - 70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA) +!!! note + Please verify SSH fingerprints during the first logon. They are identical on all login nodes: + f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) + 70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA) Private key authentication: @@ -56,8 +56,8 @@ Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com [username@login2.salomon ~]$ ``` -!!! Note "Note" - The environment is **not** shared between login nodes, except for [shared filesystems](storage/). +!!! note + The environment is **not** shared between login nodes, except for [shared filesystems](storage/). ## Data Transfer @@ -120,8 +120,8 @@ Outgoing connections, from Salomon Cluster login nodes to the outside world, are | 443 | https | | 9418 | git | -!!! Note "Note" - Please use **ssh port forwarding** and proxy servers to connect from Salomon to all other remote ports. +!!! note + Please use **ssh port forwarding** and proxy servers to connect from Salomon to all other remote ports. Outgoing connections, from Salomon Cluster compute nodes are restricted to the internal network. Direct connections form compute nodes to outside world are cut. @@ -129,8 +129,8 @@ Outgoing connections, from Salomon Cluster compute nodes are restricted to the i ### Port Forwarding From Login Nodes -!!! Note "Note" - Port forwarding allows an application running on Salomon to connect to arbitrary remote host and port. +!!! note + Port forwarding allows an application running on Salomon to connect to arbitrary remote host and port. It works by tunneling the connection from Salomon back to users workstation and forwarding from the workstation to the remote host. @@ -170,8 +170,8 @@ In this example, we assume that port forwarding from login1:6000 to remote.host. Port forwarding is static, each single port is mapped to a particular port on remote host. Connection to other remote host, requires new forward. -!!! Note "Note" - Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. +!!! 
note + Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: @@ -191,9 +191,9 @@ Now, configure the applications proxy settings to **localhost:6000**. Use port f ## Graphical User Interface -- The [X Window system](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/) is a principal way to get GUI access to the clusters. -- The [Virtual Network Computing](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer). +* The [X Window system](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/) is a principal way to get GUI access to the clusters. +* The [Virtual Network Computing](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer). ## VPN Access -- Access to IT4Innovations internal resources via [VPN](../get-started-with-it4innovations/accessing-the-clusters/vpn-access/). +* Access to IT4Innovations internal resources via [VPN](../get-started-with-it4innovations/accessing-the-clusters/vpn-access/). diff --git a/docs.it4i/salomon/software/ansys/licensing.md b/docs.it4i/salomon/software/ansys/licensing.md index ba4405f1a2ca525a338f2aadc09fc893a6ff1958..8709ba86478c7bb933fb2cd586cd9eaa3c8729bc 100644 --- a/docs.it4i/salomon/software/ansys/licensing.md +++ b/docs.it4i/salomon/software/ansys/licensing.md @@ -2,9 +2,9 @@ ## ANSYS Licence Can Be Used By: -- all persons in the carrying out of the CE IT4Innovations Project (In addition to the primary licensee, which is VSB - Technical University of Ostrava, users are CE IT4Innovations third parties - CE IT4Innovations project partners, particularly the University of Ostrava, the Brno University of Technology - Faculty of Informatics, the Silesian University in Opava, Institute of Geonics AS CR.) -- all persons who have a valid license -- students of the Technical University +* all persons in the carrying out of the CE IT4Innovations Project (In addition to the primary licensee, which is VSB - Technical University of Ostrava, users are CE IT4Innovations third parties - CE IT4Innovations project partners, particularly the University of Ostrava, the Brno University of Technology - Faculty of Informatics, the Silesian University in Opava, Institute of Geonics AS CR.) 
+* all persons who have a valid license +* students of the Technical University ## ANSYS Academic Research @@ -16,8 +16,8 @@ The licence intended to be used for science and research, publications, students ## Available Versions -- 16.1 -- 17.0 +* 16.1 +* 17.0 ## License Preferences diff --git a/docs.it4i/salomon/software/chemistry/molpro.md b/docs.it4i/salomon/software/chemistry/molpro.md index 308ed266f8da52511be04f46da0a07355b4cb1ab..eb0ffb2db199699a64c6aa853417068fc0d773d9 100644 --- a/docs.it4i/salomon/software/chemistry/molpro.md +++ b/docs.it4i/salomon/software/chemistry/molpro.md @@ -32,8 +32,8 @@ Compilation parameters are default: Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. -!!! Note "Note" - The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. +!!! note + The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH filesystem](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch filesystem. diff --git a/docs.it4i/salomon/software/chemistry/nwchem.md b/docs.it4i/salomon/software/chemistry/nwchem.md index 5ed6e3ccf9041476adaf12a722b91076950b7967..be4e95f060601b302b8a4a9a677672256001ba5e 100644 --- a/docs.it4i/salomon/software/chemistry/nwchem.md +++ b/docs.it4i/salomon/software/chemistry/nwchem.md @@ -12,8 +12,8 @@ NWChem aims to provide its users with computational chemistry tools that are sca The following versions are currently installed: -- NWChem/6.3.revision2-2013-10-17-Python-2.7.8, current release. Compiled with Intel compilers, MKL and Intel MPI -- NWChem/6.5.revision26243-intel-2015b-2014-09-10-Python-2.7.8 +* NWChem/6.3.revision2-2013-10-17-Python-2.7.8, current release. Compiled with Intel compilers, MKL and Intel MPI +* NWChem/6.5.revision26243-intel-2015b-2014-09-10-Python-2.7.8 For a current list of installed versions, execute: @@ -41,5 +41,5 @@ The recommend to use version 6.5. Version 6.3 fails on Salomon nodes with accele Please refer to [the documentation](http://www.nwchem-sw.org/index.php/Release62:Top-level) and in the input file set the following directives : -- MEMORY : controls the amount of memory NWChem will use -- SCRATCH_DIR : set this to a directory in [SCRATCH filesystem](../../storage/storage/) (or run the calculation completely in a scratch directory). For certain calculations, it might be advisable to reduce I/O by forcing "direct" mode, eg. "scf direct" +* MEMORY : controls the amount of memory NWChem will use +* SCRATCH_DIR : set this to a directory in [SCRATCH filesystem](../../storage/storage/) (or run the calculation completely in a scratch directory). 
For certain calculations, it might be advisable to reduce I/O by forcing "direct" mode, eg. "scf direct" diff --git a/docs.it4i/salomon/software/chemistry/phono3py.md b/docs.it4i/salomon/software/chemistry/phono3py.md index 35a5d1313797af3cde15ea042a39327ca00d66f7..b1f575f6758847682894d66598b48a04d725a237 100644 --- a/docs.it4i/salomon/software/chemistry/phono3py.md +++ b/docs.it4i/salomon/software/chemistry/phono3py.md @@ -4,8 +4,8 @@ This GPL software calculates phonon-phonon interactions via the third order force constants. It allows to obtain lattice thermal conductivity, phonon lifetime/linewidth, imaginary part of self energy at the lowest order, joint density of states (JDOS) and weighted-JDOS. For details see Phys. Rev. B 91, 094306 (2015) and <http://atztogo.github.io/phono3py/index.html> -!!! Note "Note" -    Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module +!!! note +    Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module ```bash $ module load phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 @@ -109,41 +109,41 @@ $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" --sigma 0.1 --wgp $ grep grid_point ir_grid_points.yaml num_reduced_ir_grid_points: 35 ir_grid_points: # [address, weight] -- grid_point: 0 -- grid_point: 1 -- grid_point: 2 -- grid_point: 3 -- grid_point: 4 -- grid_point: 10 -- grid_point: 11 -- grid_point: 12 -- grid_point: 13 -- grid_point: 20 -- grid_point: 21 -- grid_point: 22 -- grid_point: 30 -- grid_point: 31 -- grid_point: 40 -- grid_point: 91 -- grid_point: 92 -- grid_point: 93 -- grid_point: 94 -- grid_point: 101 -- grid_point: 102 -- grid_point: 103 -- grid_point: 111 -- grid_point: 112 -- grid_point: 121 -- grid_point: 182 -- grid_point: 183 -- grid_point: 184 -- grid_point: 192 -- grid_point: 193 -- grid_point: 202 -- grid_point: 273 -- grid_point: 274 -- grid_point: 283 -- grid_point: 364 +- grid_point: 0 +- grid_point: 1 +- grid_point: 2 +- grid_point: 3 +- grid_point: 4 +- grid_point: 10 +- grid_point: 11 +- grid_point: 12 +- grid_point: 13 +- grid_point: 20 +- grid_point: 21 +- grid_point: 22 +- grid_point: 30 +- grid_point: 31 +- grid_point: 40 +- grid_point: 91 +- grid_point: 92 +- grid_point: 93 +- grid_point: 94 +- grid_point: 101 +- grid_point: 102 +- grid_point: 103 +- grid_point: 111 +- grid_point: 112 +- grid_point: 121 +- grid_point: 182 +- grid_point: 183 +- grid_point: 184 +- grid_point: 192 +- grid_point: 193 +- grid_point: 202 +- grid_point: 273 +- grid_point: 274 +- grid_point: 283 +- grid_point: 364 ``` one finds which grid points needed to be calculated, for instance using following diff --git a/docs.it4i/salomon/software/compilers.md b/docs.it4i/salomon/software/compilers.md index 5f9a9ccbb74efe1c624ea26a5e003717c102a26b..da785d173bb9cf361c2f5d062b3d269b0e530293 100644 --- a/docs.it4i/salomon/software/compilers.md +++ b/docs.it4i/salomon/software/compilers.md @@ -4,22 +4,22 @@ Available compilers, including GNU, INTEL and UPC compilers There are several compilers for different programming languages available on the cluster: -- C/C++ -- Fortran 77/90/95/HPF -- Unified Parallel C -- Java +* C/C++ +* Fortran 77/90/95/HPF +* Unified Parallel C +* Java The C/C++ and Fortran compilers are provided by: Opensource: -- GNU GCC -- Clang/LLVM +* GNU GCC +* Clang/LLVM Commercial licenses: -- Intel -- PGI +* Intel +* PGI ## Intel Compilers @@ -81,8 +81,8 @@ For more information about the possibilities of the compilers, please see the ma UPC is supported by two compiler/runtime implementations: -- GNU - SMP/multi-threading support only -- Berkley -
multi-node support as well as SMP/multi-threading support +* GNU - SMP/multi-threading support only +* Berkley - multi-node support as well as SMP/multi-threading support ### GNU UPC Compiler @@ -138,7 +138,10 @@ To use the Berkley UPC compiler and runtime environment to run the binaries use As default UPC network the "smp" is used. This is very quick and easy way for testing/debugging, but limited to one node only. -For production runs, it is recommended to use the native InfiniBand implementation of UPC network "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. Please note, that the selection of the network is done at the compile time and not at runtime (as expected)! +For production runs, it is recommended to use the native InfiniBand implementation of UPC network "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. + +!!! warning +    Selection of the network is done at compile time, not at runtime (as one might expect)! Example UPC code: diff --git a/docs.it4i/salomon/software/comsol/comsol-multiphysics.md b/docs.it4i/salomon/software/comsol/comsol-multiphysics.md index d3c84a193a723d9042ba788ef687cde5290992be..fd40c1e4aefe6acfc79aff06425ebf5ee7594fe5 100644 --- a/docs.it4i/salomon/software/comsol/comsol-multiphysics.md +++ b/docs.it4i/salomon/software/comsol/comsol-multiphysics.md @@ -4,11 +4,11 @@ [COMSOL](http://www.comsol.com) is a powerful environment for modelling and solving various engineering and scientific problems based on partial differential equations. COMSOL is designed to solve coupled or multiphysics phenomena. For many standard engineering problems COMSOL provides add-on products such as electrical, mechanical, fluid flow, and chemical applications. -- [Structural Mechanics Module](http://www.comsol.com/structural-mechanics-module), -- [Heat Transfer Module](http://www.comsol.com/heat-transfer-module), -- [CFD Module](http://www.comsol.com/cfd-module), -- [Acoustics Module](http://www.comsol.com/acoustics-module), -- and [many others](http://www.comsol.com/products) +* [Structural Mechanics Module](http://www.comsol.com/structural-mechanics-module), +* [Heat Transfer Module](http://www.comsol.com/heat-transfer-module), +* [CFD Module](http://www.comsol.com/cfd-module), +* [Acoustics Module](http://www.comsol.com/acoustics-module), +* and [many others](http://www.comsol.com/products) COMSOL also allows an interface support for equation-based modelling of partial differential equations. @@ -16,9 +16,9 @@ COMSOL also allows an interface support for equation-based modelling of partial On the clusters COMSOL is available in the latest stable version. There are two variants of the release: -- **Non commercial** or so called >**EDU variant**>, which can be used for research and educational purposes. +* **Non commercial** or so called **EDU variant**, which can be used for research and educational purposes. -- **Commercial** or so called **COM variant**, which can used also for commercial activities. **COM variant** has only subset of features compared to the **EDU variant** available. More about licensing will be posted here soon. +* **Commercial** or so called **COM variant**, which can also be used for commercial activities. **COM variant** has only a subset of features compared to the **EDU variant** available. More about licensing will be posted here soon.
To load the of COMSOL load the module diff --git a/docs.it4i/salomon/software/comsol/licensing-and-available-versions.md b/docs.it4i/salomon/software/comsol/licensing-and-available-versions.md index 41972882c7d3154e6474953e91aaf250a3b2b91b..6e9d290c73257580869df79364c7cca8d6ae72e5 100644 --- a/docs.it4i/salomon/software/comsol/licensing-and-available-versions.md +++ b/docs.it4i/salomon/software/comsol/licensing-and-available-versions.md @@ -2,9 +2,9 @@ ## Comsol Licence Can Be Used By: -- all persons in the carrying out of the CE IT4Innovations Project (In addition to the primary licensee, which is VSB - Technical University of Ostrava, users are CE IT4Innovations third parties - CE IT4Innovations project partners, particularly the University of Ostrava, the Brno University of Technology - Faculty of Informatics, the Silesian University in Opava, Institute of Geonics AS CR.) -- all persons who have a valid license -- students of the Technical University +* all persons in the carrying out of the CE IT4Innovations Project (In addition to the primary licensee, which is VSB - Technical University of Ostrava, users are CE IT4Innovations third parties - CE IT4Innovations project partners, particularly the University of Ostrava, the Brno University of Technology - Faculty of Informatics, the Silesian University in Opava, Institute of Geonics AS CR.) +* all persons who have a valid license +* students of the Technical University ## Comsol EDU Network Licence @@ -16,4 +16,4 @@ The licence intended to be used for science and research, publications, students ## Available Versions -- ver. 51 +* ver. 51 diff --git a/docs.it4i/salomon/software/debuggers/aislinn.md b/docs.it4i/salomon/software/debuggers/aislinn.md index cf3c57b62db4975663eefb539a14436c55886773..7db8ebc34343b03af0449b4790f0cd880ced5c6a 100644 --- a/docs.it4i/salomon/software/debuggers/aislinn.md +++ b/docs.it4i/salomon/software/debuggers/aislinn.md @@ -1,12 +1,12 @@ # Aislinn -- Aislinn is a dynamic verifier for MPI programs. For a fixed input it covers all possible runs with respect to nondeterminism introduced by MPI. It allows to detect bugs (for sure) that occurs very rare in normal runs. -- Aislinn detects problems like invalid memory accesses, deadlocks, misuse of MPI, and resource leaks. -- Aislinn is open-source software; you can use it without any licensing limitations. -- Web page of the project: <http://verif.cs.vsb.cz/aislinn/> +* Aislinn is a dynamic verifier for MPI programs. For a fixed input it covers all possible runs with respect to nondeterminism introduced by MPI. It allows one to detect, with certainty, bugs that occur only very rarely in normal runs. +* Aislinn detects problems like invalid memory accesses, deadlocks, misuse of MPI, and resource leaks. +* Aislinn is open-source software; you can use it without any licensing limitations. +* Web page of the project: <http://verif.cs.vsb.cz/aislinn/> -!!! Note "Note" -    Aislinn is software developed at IT4Innovations and some parts are still considered experimental. If you have any questions or experienced any problems, please contact the author: <mailto:stanislav.bohm@vsb.cz>. +!!! note +    Aislinn is software developed at IT4Innovations and some parts are still considered experimental. If you have any questions or experience any problems, please contact the author: <mailto:stanislav.bohm@vsb.cz>. ### Usage @@ -83,20 +83,20 @@ At the beginning of the report there are some basic summaries of the verificatio It shows us: -- Error occurs in process 0 in test.cpp on line 16.
-- Stdout and stderr streams are empty. (The program does not write anything).
-- The last part shows MPI calls for each process that occurs in the invalid run. The more detailed information about each call can be obtained by mouse cursor.
+* Error occurs in process 0 in test.cpp on line 16.
+* Stdout and stderr streams are empty (the program does not write anything).
+* The last part shows MPI calls for each process that occur in the invalid run. More detailed information about each call can be obtained by hovering the mouse cursor over it.
 
 ### Limitations
 
 Since the verification is a non-trivial process there are some of limitations.
 
-- The verified process has to terminate in all runs, i.e. we cannot answer the halting problem.
-- The verification is a computationally and memory demanding process. We put an effort to make it efficient and it is an important point for further research. However covering all runs will be always more demanding than techniques that examines only a single run. The good practise is to start with small instances and when it is feasible, make them bigger. The Aislinn is good to find bugs that are hard to find because they occur very rarely (only in a rare scheduling). Such bugs often do not need big instances.
-- Aislinn expects that your program is a "standard MPI" program, i.e. processes communicate only through MPI, the verified program does not interacts with the system in some unusual ways (e.g. opening sockets).
+* The verified process has to terminate in all runs, i.e. we cannot answer the halting problem.
+* The verification is a computationally and memory demanding process. We have put effort into making it efficient, and this is an important point for further research. However, covering all runs will always be more demanding than techniques that examine only a single run. Good practice is to start with small instances and, when feasible, make them bigger. Aislinn is good at finding bugs that are hard to find because they occur very rarely (only in a rare scheduling). Such bugs often do not need big instances.
+* Aislinn expects that your program is a "standard MPI" program, i.e. processes communicate only through MPI and the verified program does not interact with the system in unusual ways (e.g. opening sockets).
 
 There are also some limitations bounded to the current version and they will be removed in the future:
 
-- All files containing MPI calls have to be recompiled by MPI implementation provided by Aislinn. The files that does not contain MPI calls, they do not have to recompiled. Aislinn MPI implementation supports many commonly used calls from MPI-2 and MPI-3 related to point-to-point communication, collective communication, and communicator management. Unfortunately, MPI-IO and one-side communication is not implemented yet.
-- Each MPI can use only one thread (if you use OpenMP, set OMP_NUM_THREADS to 1).
-- There are some limitations for using files, but if the program just reads inputs and writes results, it is ok.
+* All files containing MPI calls have to be recompiled by the MPI implementation provided by Aislinn. Files that do not contain MPI calls do not have to be recompiled. The Aislinn MPI implementation supports many commonly used calls from MPI-2 and MPI-3 related to point-to-point communication, collective communication, and communicator management. Unfortunately, MPI-IO and one-sided communication are not implemented yet.
+* Each MPI process can use only one thread (if you use OpenMP, set OMP_NUM_THREADS to 1).
+* There are some limitations for using files, but if the program just reads inputs and writes results, it is ok. diff --git a/docs.it4i/salomon/software/debuggers/allinea-ddt.md b/docs.it4i/salomon/software/debuggers/allinea-ddt.md index 94890c0568db7fc65df273c068b3356c2c23b44e..3315d6deecb54892b0dfa059ca86f949a7385ca5 100644 --- a/docs.it4i/salomon/software/debuggers/allinea-ddt.md +++ b/docs.it4i/salomon/software/debuggers/allinea-ddt.md @@ -10,13 +10,13 @@ Allinea MAP is a profiler for C/C++/Fortran HPC codes. It is designed for profil On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel processes. In case of debugging GPU or Xeon Phi accelerated codes the limit is 8 accelerators. These limitation means that: -- 1 user can debug up 64 processes, or -- 32 users can debug 2 processes, etc. +* 1 user can debug up 64 processes, or +* 32 users can debug 2 processes, etc. In case of debugging on accelerators: -- 1 user can debug on up to 8 accelerators, or -- 8 users can debug on single accelerator. +* 1 user can debug on up to 8 accelerators, or +* 8 users can debug on single accelerator. ## Compiling Code to Run With DDT @@ -47,8 +47,8 @@ $ mpif90 -g -O0 -o test_debug test.f Before debugging, you need to compile your code with theses flags: -!!! Note "Note" - \- **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. +!!! note + \- **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. - - **O0** : Suppress all optimizations. diff --git a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md index 918ffca1ae77f0c22bbce07cf4ca3f4a95058684..224dca9612556aeb83c14fe58554782b55af297b 100644 --- a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md @@ -4,10 +4,10 @@ Intel *®* VTune™ Amplifier, part of Intel Parallel studio, is a GUI profiling tool designed for Intel processors. It offers a graphical performance analysis of single core and multithreaded applications. A highlight of the features: -- Hotspot analysis -- Locks and waits analysis -- Low level specific counters, such as branch analysis and memory bandwidth -- Power usage analysis - frequency and sleep states. +* Hotspot analysis +* Locks and waits analysis +* Low level specific counters, such as branch analysis and memory bandwidth +* Power usage analysis - frequency and sleep states.  @@ -68,8 +68,8 @@ This mode is useful for native Xeon Phi applications launched directly on the ca This mode is useful for applications that are launched from the host and use offload, OpenCL or mpirun. In *Analysis Target* window, select *Intel Xeon Phi coprocessor (native)*, choose path to the binaryand MIC card to run on. -!!! Note "Note" - If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. +!!! note + If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. 
You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later :
diff --git a/docs.it4i/salomon/software/debuggers/total-view.md b/docs.it4i/salomon/software/debuggers/total-view.md
index 7f1cd15dfdc270cde7b3c92244ab04842ab066c5..17a2d42344ffa0ccc1c34ec4c369bfcca8341e79 100644
--- a/docs.it4i/salomon/software/debuggers/total-view.md
+++ b/docs.it4i/salomon/software/debuggers/total-view.md
@@ -45,8 +45,8 @@ Compile the code:
 
 Before debugging, you need to compile your code with theses flags:
 
-!!! Note "Note"
-    **-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers.
+!!! note
+    **-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers.
 
 **-O0** : Suppress all optimizations.
 
@@ -80,8 +80,8 @@ To debug a serial code use:
 
 To debug a parallel code compiled with **OpenMPI** you need to setup your TotalView environment:
 
-!!! Note "Note"
-    **Please note:** To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file:
+!!! hint
+    To be able to run the parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code, you have to add the following function to your **~/.tvdrc** file.
 
 ```bash
 proc mpi_auto_run_starter {loaded_id} {
diff --git a/docs.it4i/salomon/software/debuggers/valgrind.md b/docs.it4i/salomon/software/debuggers/valgrind.md
index a5d52269cc0e835f77752fc0ed8be3d3afe40b24..af97d2b617e4af5f9b1db30fbfbcad4650575289 100644
--- a/docs.it4i/salomon/software/debuggers/valgrind.md
+++ b/docs.it4i/salomon/software/debuggers/valgrind.md
@@ -8,20 +8,20 @@ Valgind is an extremely useful tool for debugging memory errors such as [off-by-
 
 The main tools available in Valgrind are :
 
-- **Memcheck**, the original, must used and default tool. Verifies memory access in you program and can detect use of unitialized memory, out of bounds memory access, memory leaks, double free, etc.
-- **Massif**, a heap profiler.
-- **Hellgrind** and **DRD** can detect race conditions in multi-threaded applications.
-- **Cachegrind**, a cache profiler.
-- **Callgrind**, a callgraph analyzer.
-- For a full list and detailed documentation, please refer to the [official Valgrind documentation](http://valgrind.org/docs/).
+* **Memcheck**, the original, most used and default tool. Verifies memory access in your program and can detect use of uninitialized memory, out-of-bounds memory access, memory leaks, double free, etc.
+* **Massif**, a heap profiler.
+* **Helgrind** and **DRD** can detect race conditions in multi-threaded applications.
+* **Cachegrind**, a cache profiler.
+* **Callgrind**, a callgraph analyzer.
+* For a full list and detailed documentation, please refer to the [official Valgrind documentation](http://valgrind.org/docs/).
 
 ## Installed Versions
 
 There are two versions of Valgrind available on the cluster.
 
-- Version 3.8.1, installed by operating system vendor in /usr/bin/valgrind. This version is available by default, without the need to load any module. This version however does not provide additional MPI support.
Also, it does not support AVX2 instructions, debugging of an AVX2-enabled executable with this version will fail
-- Version 3.11.0 built by ICC with support for Intel MPI, available in module Valgrind/3.11.0-intel-2015b. After loading the module, this version replaces the default valgrind.
-- Version 3.11.0 built by GCC with support for Open MPI, module Valgrind/3.11.0-foss-2015b
+* Version 3.8.1, installed by the operating system vendor in /usr/bin/valgrind. This version is available by default, without the need to load any module. However, this version does not provide additional MPI support. Also, it does not support AVX2 instructions; debugging an AVX2-enabled executable with this version will fail.
+* Version 3.11.0 built by ICC with support for Intel MPI, available in module Valgrind/3.11.0-intel-2015b. After loading the module, this version replaces the default valgrind.
+* Version 3.11.0 built by GCC with support for Open MPI, module Valgrind/3.11.0-foss-2015b
 
 ## Usage
 
diff --git a/docs.it4i/salomon/software/intel-suite/intel-compilers.md b/docs.it4i/salomon/software/intel-suite/intel-compilers.md
index 43b816102543988f8b462d2ab0e3d0c408b971f5..1a122dbae163406643dc6c3d5fde62a397cff3af 100644
--- a/docs.it4i/salomon/software/intel-suite/intel-compilers.md
+++ b/docs.it4i/salomon/software/intel-suite/intel-compilers.md
@@ -32,5 +32,5 @@ Read more at <https://software.intel.com/en-us/intel-cplusplus-compiler-16.0-use
 
 Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon compute nodes are equipped with Haswell based architecture. The UV1 SMP compute server has Ivy Bridge CPUs, which are equivalent to Sandy Bridge (only smaller manufacturing technology). The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors, should also run on the new Haswell nodes. To get optimal performance out of the Haswell processors a program should make use of the special AVX2 instructions for this processor. One can do this by recompiling codes with the compiler flags designated to invoke these instructions. For the Intel compiler suite, there are two ways of doing this:
 
-- Using compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge/Ivy Bridge nodes.
-- Using compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. So this binary will run both on Sandy Bridge/Ivy Bridge and Haswell processors. During runtime it will be decided which path to follow, dependent on which processor you are running on. In general this will result in larger binaries.
+* Using the compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge/Ivy Bridge nodes.
+* Using the compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature-specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. So this binary will run both on Sandy Bridge/Ivy Bridge and Haswell processors. During runtime it will be decided which path to follow, dependent on which processor you are running on. In general this will result in larger binaries.
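For illustration, a minimal sketch of the two approaches (the source file names, compiler driver and optimization level are placeholders; only the -x/-ax flags come from the text above):

```bash
# Haswell-only binary - uses AVX2, will not run on Sandy Bridge/Ivy Bridge nodes
$ icc -O2 -xCORE-AVX2 -o mycode.x mycode.c

# Multi-path binary - runs on Sandy Bridge/Ivy Bridge and Haswell;
# the best code path is selected at runtime (larger binary)
$ ifort -O2 -xAVX -axCORE-AVX2 -o mycode.x mycode.f90
```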
diff --git a/docs.it4i/salomon/software/intel-suite/intel-mkl.md b/docs.it4i/salomon/software/intel-suite/intel-mkl.md index 83f109e7d25d1eeb3bc3876fcf79750db6897cbc..ca0cdcc619554869f74f13d9d25895b28530c330 100644 --- a/docs.it4i/salomon/software/intel-suite/intel-mkl.md +++ b/docs.it4i/salomon/software/intel-suite/intel-mkl.md @@ -4,14 +4,14 @@ Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL provides these basic math kernels: -- BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations. -- The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations. -- ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). -- Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions. -- Vector Math Library (VML) routines for optimized mathematical operations on vectors. -- Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions. -- Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search. -- Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver. +* BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations. +* The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations. +* ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). +* Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions. +* Vector Math Library (VML) routines for optimized mathematical operations on vectors. +* Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions. +* Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search. +* Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver. For details see the [Intel MKL Reference Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm). 
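As a quick, hedged illustration of how these kernels are typically reached (the module name and source file below are placeholders; see the linking options later on this page and in the Reference Manual for the authoritative settings), the Intel compiler can pull in MKL via its -mkl flag:

```bash
$ module load imkl                              # assumed module name
$ icc -O2 -mkl=parallel mycode.c -o mycode.x    # threaded MKL (OpenMP threading layer)
$ icc -O2 -mkl=sequential mycode.c -o mycode.x  # sequential MKL (no internal threading)
```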
diff --git a/docs.it4i/salomon/software/intel-xeon-phi.md b/docs.it4i/salomon/software/intel-xeon-phi.md index ecddeea4e09bed949585a740cc028017608eaadc..65457058ce668c1ee7a92f1d19bbceec427dc051 100644 --- a/docs.it4i/salomon/software/intel-xeon-phi.md +++ b/docs.it4i/salomon/software/intel-xeon-phi.md @@ -103,7 +103,10 @@ For debugging purposes it is also recommended to set environment variable "OFFLO export OFFLOAD_REPORT=3 ``` -A very basic example of code that employs offload programming technique is shown in the next listing. Please note that this code is sequential and utilizes only single core of the accelerator. +A very basic example of code that employs offload programming technique is shown in the next listing. + +!!! note + This code is sequential and utilizes only single core of the accelerator. ```bash $ vim source-offload.cpp @@ -229,8 +232,8 @@ During the compilation Intel compiler shows which loops have been vectorized in Some interesting compiler flags useful not only for code debugging are: -!!! Note "Note" - Debugging +!!! note + Debugging openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic level @@ -325,8 +328,8 @@ Following example show how to automatically offload an SGEMM (single precision - } ``` -!!! Note "Note" - Please note: This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** +!!! note + This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** To compile a code using Intel compiler use: @@ -368,8 +371,8 @@ To compile a code user has to be connected to a compute with MIC and load Intel $ module load intel/13.5.192 ``` -!!! Note "Note" - Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. +!!! note + Particular version of the Intel module is specified. This information is used later to specify the correct library paths. To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: @@ -411,13 +414,13 @@ If the code is parallelized using OpenMP a set of additional libraries is requir mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH ``` -!!! Note "Note" - Please note that the path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. +!!! note + The path exported contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: -!!! Note "Note" - /apps/intel/composer_xe_2013.5.192/compiler/lib/mic +!!! note + /apps/intel/composer_xe_2013.5.192/compiler/lib/mic - libiomp5.so - libimf.so @@ -497,8 +500,8 @@ After executing the complied binary file, following output should be displayed. ... ``` -!!! 
Note "Note" - More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> +!!! note + More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. @@ -537,8 +540,8 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as ... ``` -!!! Note "Note" - Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. +!!! hint + GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. ## MPI @@ -599,7 +602,7 @@ An example of basic MPI version of "hello-world" example in C language, that can Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: -!!! Note "Note" +!!! note **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) **Coprocessor-only model** - all MPI ranks reside only on the coprocessors. @@ -646,9 +649,7 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH ``` -!!! Note "Note" - Please note: - +!!! note - this file sets up both environmental variable for both MPI and OpenMP libraries. - this file sets up the paths to a particular version of Intel MPI library and particular version of an Intel compiler. These versions have to match with loaded modules. @@ -701,10 +702,9 @@ or using mpirun $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic ``` -!!! Note "Note" - Please note: - \- the full path to the binary has to specified (here: "**>~/mpi-test-mic**") - \- the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code +!!! note + - the full path to the binary has to specified (here: "**>~/mpi-test-mic**") + - the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code The output should be again similar to: @@ -715,8 +715,10 @@ The output should be again similar to: Hello world from process 0 of 4 on host cn207-mic0 ``` -!!! Note "Note" - Please note that the **"mpiexec.hydra"** requires a file the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: +!!! hint + **"mpiexec.hydra"** requires a file the MIC filesystem. If the file is missing please contact the system administrators. + +A simple test to see if the file is present is to execute: ```bash $ ssh mic0 ls /bin/pmi_proxy @@ -748,12 +750,11 @@ For example: This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators. -!!! Note "Note" - Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: - - - to connect to the second node : ** $ ssh cn205** - - to connect to the accelerator on the first node from the first node: **$ ssh cn204-mic0** or **$ ssh mic0** - - to connect to the accelerator on the second node from the first node: **$ ssh cn205-mic0** +!!! 
note
+    At this point the user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh:
+    - to connect to the second node: `$ ssh cn205`
+    - to connect to the accelerator on the first node from the first node: `$ ssh cn204-mic0` or `$ ssh mic0`
+    - to connect to the accelerator on the second node from the first node: `$ ssh cn205-mic0`
 
 At this point we expect that correct modules are loaded and binary is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC"
 
@@ -871,7 +872,7 @@ To run the MPI code using mpirun and the machine file "hosts_file_mix" use:
 
 A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is:
 
 ```bash
-    Hello world from process 0 of 8 on host cn204
+    Hello world from process 0 of 8 on host cn204
     Hello world from process 1 of 8 on host cn204
     Hello world from process 2 of 8 on host cn204-mic0
     Hello world from process 3 of 8 on host cn204-mic0
@@ -881,21 +882,21 @@ A possible output of the MPI "hello-world" example executed on two hosts and two
     Hello world from process 7 of 8 on host cn205-mic0
 ```
 
-!!! Note "Note"
-    Please note: At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only.
+!!! note
+    At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only.
 
 **Using the PBS automatically generated node-files**
 
 PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated:
 
-!!! Note "Note"
-    **Host only node-file:**
+!!! note
+    **Host only node-file:**
 
  - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file:
 - /lscratch/${PBS_JOBID}/nodefile-mic Host and MIC node-file:
 - /lscratch/${PBS_JOBID}/nodefile-mix
 
-Please note each host or accelerator is listed only per files. User has to specify how many jobs should be executed per node using "-n" parameter of the mpirun command.
+Each host or accelerator is listed only once per file. The user has to specify how many processes should be executed per node using the "-n" parameter of the mpirun command.
 
 ## Optimization
 
diff --git a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md
index da78ee38db2fca6229aae19400cf10aec121e4b5..0af557ecf054b258b83ff3b0f1046a2e7d932e54 100644
--- a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md
+++ b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md
@@ -94,8 +94,8 @@ In this example, we demonstrate recommended way to run an MPI application, using
 
 ### OpenMP Thread Affinity
 
-!!! Note "Note"
-    Important! Bind every OpenMP thread to a core!
+!!! note
+    Important! Bind every OpenMP thread to a core!
 
 In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores.
You might want to avoid this by setting these environment variable for GCC OpenMP:
diff --git a/docs.it4i/salomon/software/mpi/mpi.md b/docs.it4i/salomon/software/mpi/mpi.md
index 3f89096cbf2c3b4dfc29daee1728f214c7e75169..411d54ddabae7b32ef32f894f2cc466e93eeb866 100644
--- a/docs.it4i/salomon/software/mpi/mpi.md
+++ b/docs.it4i/salomon/software/mpi/mpi.md
@@ -126,8 +126,8 @@ Consider these ways to run an MPI program:
 
 **Two MPI** processes per node, using 12 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 12 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and numa effect overheads.
 
-!!! Note "Note"
-    Important! Bind every OpenMP thread to a core!
+!!! note
+    Important! Bind every OpenMP thread to a core!
 
 In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables.
 
diff --git a/docs.it4i/salomon/software/numerical-languages/matlab.md b/docs.it4i/salomon/software/numerical-languages/matlab.md
index 95f0e3dde69ad160c495b8b4e5c9cc6dbe0effb0..8cfbdf31afc0155eee6b84a64f43eb2bf2f35fef 100644
--- a/docs.it4i/salomon/software/numerical-languages/matlab.md
+++ b/docs.it4i/salomon/software/numerical-languages/matlab.md
@@ -4,8 +4,8 @@
 
 Matlab is available in versions R2015a and R2015b. There are always two variants of the release:
 
-- Non commercial or so called EDU variant, which can be used for common research and educational purposes.
-- Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available.
+* Non commercial or so-called EDU variant, which can be used for common research and educational purposes.
+* Commercial or so-called COM variant, which can also be used for commercial activities. The licenses for the commercial variant are much more expensive, so usually the commercial variant has only a subset of features compared to the EDU variant.
 
 To load the latest version of Matlab load the module
 
@@ -129,7 +129,8 @@ The last part of the configuration is done directly in the user Matlab script be
 
 This script creates scheduler object "cluster" of type "local" that starts workers locally.
 
-Please note: Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling parpool(sched, ...) function.
+!!! hint
+    Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling the parpool(sched, ...) function.
 
 The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers.
 
@@ -212,7 +213,8 @@ You can start this script using batch mode the same way as in Local mode example
 
 This method is a "hack" invented by us to emulate the mpiexec functionality found in previous MATLAB versions. 
We leverage the MATLAB Generic Scheduler interface, but instead of submitting the workers to PBS, we launch the workers directly within the running job, thus we avoid the issues with master script and workers running in separate jobs (issues with license not available, waiting for the worker's job to spawn etc.) -Please note that this method is experimental. +!!! warning + This method is experimental. For this method, you need to use SalomonDirect profile, import it using [the same way as SalomonPBSPro](matlab.md#running-parallel-matlab-using-distributed-computing-toolbox---engine) diff --git a/docs.it4i/salomon/software/numerical-languages/octave.md b/docs.it4i/salomon/software/numerical-languages/octave.md index a9a82dfc0e88d777754465e602ec9a18cf40b188..eda9196ae8972e946d32361067e3bd43c4762721 100644 --- a/docs.it4i/salomon/software/numerical-languages/octave.md +++ b/docs.it4i/salomon/software/numerical-languages/octave.md @@ -9,7 +9,7 @@ Two versions of octave are available on the cluster, via module | **Stable** | Octave 3.8.2 | Octave | ```bash - $ module load Octave + $ module load Octave ``` The octave on the cluster is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many octave kernels, notably the linear algebra subroutines. Octave runs these heavy calculation kernels without any penalty. By default, octave would parallelize to 24 threads. You may control the threads by setting the OMP_NUM_THREADS environment variable. @@ -50,7 +50,7 @@ This script may be submitted directly to the PBS workload manager via the qsub c The octave c compiler mkoctfile calls the GNU gcc 4.8.1 for compiling native c code. This is very useful for running native c subroutines in octave environment. ```bash - $ mkoctfile -v + $ mkoctfile -v ``` Octave may use MPI for interprocess communication This functionality is currently not supported on the cluster cluster. In case you require the octave interface to MPI, please contact our [cluster support](https://support.it4i.cz/rt/). diff --git a/docs.it4i/salomon/software/numerical-languages/r.md b/docs.it4i/salomon/software/numerical-languages/r.md index 9afa31655aa34f07ff217c5ece8f6de298e691e2..138e4da07151f4e9e802ef447c8ad7bdad7ec190 100644 --- a/docs.it4i/salomon/software/numerical-languages/r.md +++ b/docs.it4i/salomon/software/numerical-languages/r.md @@ -96,7 +96,7 @@ Download the package [parallell](package-parallel-vignette.pdf) vignette. The forking is the most simple to use. Forking family of functions provide parallelized, drop in replacement for the serial apply() family of functions. !!! warning - Forking via package parallel provides functionality similar to OpenMP construct omp parallel for + Forking via package parallel provides functionality similar to OpenMP construct omp parallel for Only cores of single node can be utilized this way! 
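To make the forking idea concrete, here is a minimal, hedged sketch (the workload, function and core count are illustrative only; mc.cores should not exceed the cores of a single node):

```bash
# run a trivial parallel map on one node using R's parallel package (fork-based mclapply)
$ Rscript -e 'library(parallel); print(unlist(mclapply(1:24, function(i) i^2, mc.cores = 24)))'
```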
diff --git a/docs.it4i/salomon/storage.md b/docs.it4i/salomon/storage.md index 27cbead8d13496015adeb99aad9425115370c5fe..0edbcd5db799a8517bbe8ffff128f9c552d70832 100644 --- a/docs.it4i/salomon/storage.md +++ b/docs.it4i/salomon/storage.md @@ -34,15 +34,15 @@ The architecture of Lustre on Salomon is composed of two metadata servers (MDS) Configuration of the SCRATCH Lustre storage -- SCRATCH Lustre object storage - - Disk array SFA12KX - - 540 x 4 TB SAS 7.2krpm disk - - 54 x OST of 10 disks in RAID6 (8+2) - - 15 x hot-spare disk - - 4 x 400 GB SSD cache -- SCRATCH Lustre metadata storage - - Disk array EF3015 - - 12 x 600 GB SAS 15 krpm disk +* SCRATCH Lustre object storage + * Disk array SFA12KX + * 540 x 4 TB SAS 7.2krpm disk + * 54 x OST of 10 disks in RAID6 (8+2) + * 15 x hot-spare disk + * 4 x 400 GB SSD cache +* SCRATCH Lustre metadata storage + * Disk array EF3015 + * 12 x 600 GB SAS 15 krpm disk ### Understanding the Lustre File Systems @@ -60,7 +60,7 @@ There is default stripe configuration for Salomon Lustre file systems. However, 2. stripe_count the number of OSTs to stripe across; default is 1 for Salomon Lustre file systems one can specify -1 to use all OSTs in the file system. 3. stripe_offset The index of the OST where the first stripe is to be placed; default is -1 which results in random selection; using a non-default value is NOT recommended. -!!! Note "Note" +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. @@ -94,14 +94,14 @@ $ man lfs ### Hints on Lustre Stripping -!!! Note "Note" +!!! note Increase the stripe_count for parallel I/O to the same file. When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the stripe_count is set to a larger value. The stripe count sets the number of OSTs the file will be written to. By default, the stripe count is set to 1. While this default setting provides for efficient access of metadata (for example to support the ls -l command), large files should use stripe counts of greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file. Another good practice is to make the stripe count be an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs. For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes. -!!! Note "Note" +!!! note Using a large stripe size can improve performance when accessing very large files Large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. The choice of stripe size has no effect on a single-stripe file. @@ -219,7 +219,7 @@ Default ACL mechanism can be used to replace setuid/setgid permissions on direct Users home directories /home/username reside on HOME file system. Accessible capacity is 0.5 PB, shared among all users. Individual users are restricted by file system usage quotas, set to 250 GB per user. 
If 250 GB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note "Note" +!!! note The HOME file system is intended for preparation, evaluation, processing and storage of data generated by active Projects. The HOME should not be used to archive data of past Projects or other unrelated data. @@ -240,14 +240,14 @@ The workspace is backed up, such that it can be restored in case of catasthropic The WORK workspace resides on SCRATCH file system. Users may create subdirectories and files in directories **/scratch/work/user/username** and **/scratch/work/project/projectid. **The /scratch/work/user/username is private to user, much like the home directory. The /scratch/work/project/projectid is accessible to all users involved in project projectid. -!!! Note "Note" +!!! note The WORK workspace is intended to store users project data as well as for high performance access to input and output files. All project data should be removed once the project is finished. The data on the WORK workspace are not backed up. Files on the WORK file system are **persistent** (not automatically deleted) throughout duration of the project. The WORK workspace is hosted on SCRATCH file system. The SCRATCH is realized as Lustre parallel file system and is available from all login and computational nodes. Default stripe size is 1 MB, stripe count is 1. There are 54 OSTs dedicated for the SCRATCH file system. -!!! Note "Note" +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | WORK workspace | | @@ -265,7 +265,7 @@ The WORK workspace is hosted on SCRATCH file system. The SCRATCH is realized as The TEMP workspace resides on SCRATCH file system. The TEMP workspace accesspoint is /scratch/temp. Users may freely create subdirectories and files on the workspace. Accessible capacity is 1.6 PB, shared among all users on TEMP and WORK. Individual users are restricted by file system usage quotas, set to 100 TB per user. The purpose of this quota is to prevent runaway programs from filling the entire file system and deny service to other users. >If 100 TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note "Note" +!!! note The TEMP workspace is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the TEMP workspace as their working directory. Users are advised to save the necessary data from the TEMP workspace to HOME or WORK after the calculations and clean up the scratch files. @@ -274,7 +274,7 @@ The TEMP workspace resides on SCRATCH file system. The TEMP workspace accesspoin The TEMP workspace is hosted on SCRATCH file system. The SCRATCH is realized as Lustre parallel file system and is available from all login and computational nodes. Default stripe size is 1 MB, stripe count is 1. There are 54 OSTs dedicated for the SCRATCH file system. -!!! Note "Note" +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | TEMP workspace | | @@ -292,14 +292,14 @@ The TEMP workspace is hosted on SCRATCH file system. The SCRATCH is realized as Every computational node is equipped with file system realized in memory, so called RAM disk. -!!! Note "Note" +!!! 
note Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk file system is at the expense of operational memory. The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. The local RAM disk file system is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. Size of RAM disk file system is limited. Be very careful, use of RAM disk file system is at the expense of operational memory. It is not recommended to allocate large amount of memory and use large amount of data in RAM disk file system at the same time. -!!! Note +!!! note The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | RAM disk | | @@ -323,7 +323,7 @@ The local RAM disk file system is intended for temporary scratch data generated Do not use shared file systems at IT4Innovations as a backup for large amount of data or long-term archiving purposes. -!!! Note "Note" +!!! note The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/). The CESNET Storage service can be used for research purposes, mainly by academic staff and students of research institutions in the Czech Republic. @@ -342,15 +342,15 @@ The procedure to obtain the CESNET access is quick and trouble-free. ### Understanding CESNET Storage -!!! Note "Note" +!!! note It is very important to understand the CESNET storage before uploading data. [Please read](<https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first>) Once registered for CESNET Storage, you may [access the storage](https://du.cesnet.cz/en/navody/faq/start) in number of ways. We recommend the SSHFS and RSYNC methods. ### SSHFS Access -!!! Note "Note" - SSHFS: The storage will be mounted like a local hard drive +!!! note + SSHFS: The storage will be mounted like a local hard drive The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it was a local removable hard drive. Files can be than copied in and out in a usual fashion. @@ -394,8 +394,8 @@ Once done, please remember to unmount the storage ### Rsync Access -!!! Note "Note" - Rsync provides delta transfer for best performance, can resume interrupted transfers +!!! note + Rsync provides delta transfer for best performance, can resume interrupted transfers Rsync is a fast and extraordinarily versatile file copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use. diff --git a/docs.it4i/software/lmod.md b/docs.it4i/software/lmod.md index 5ba63f7e03762e356a0d74cfb4eb4826682314a6..00e70819ce8c8bac76f0d14d42b39c10afcd0a67 100644 --- a/docs.it4i/software/lmod.md +++ b/docs.it4i/software/lmod.md @@ -108,7 +108,7 @@ $ ml spider gcc ``` !!! tip - spider is case-insensitive. + spider is case-insensitive. 
If you use spider on a full module name like GCC/6.2.0-2.27 it will tell on which cluster(s) that module available:
 
@@ -148,7 +148,7 @@ Use "module keyword key1 key2 ..." to search for all possible modules matching a
 ```
 
 !!! tip
-    the specified software name is treated case-insensitively.
+    the specified software name is treated case-insensitively.
 
 Lmod does a partial match on the module name, so sometimes you need to use / to indicate the end of the software name you are interested in:
 
@@ -196,7 +196,7 @@ setenv("EBEXTSLISTPYTHON","setuptools-20.1.1,pip-8.0.2,nose-1.3.7")
 ```
 
 !!! tip
-    Note that both the direct changes to the environment as well as other modules that will be loaded are shown.
+    Note that both the direct changes to the environment as well as other modules that will be loaded are shown.
 
 If you're not sure what all of this means: don't worry, you don't have to know; just try loading the module as try using the software.
 
@@ -224,12 +224,12 @@ Currently Loaded Modules:
 ```
 
 !!! tip
-    Note that even though we only loaded a single module, the output of ml shows that a whole bunch of modules were loaded, which are required dependencies for intel/2017.00.
+    Note that even though we only loaded a single module, the output of ml shows that a whole bunch of modules were loaded, which are required dependencies for intel/2017.00.
 
 ## Conflicting Modules
 
 !!! warning
-    It is important to note that **only modules that are compatible with each other can be loaded together. In particular, modules must be installed either with the same toolchain as the modules that** are already loaded, or with a compatible (sub)toolchain.
+    It is important to note that **only modules that are compatible with each other can be loaded together**. In particular, modules must be installed either with the same toolchain as the modules that are already loaded, or with a compatible (sub)toolchain.
 
 For example, once you have loaded one or more modules that were installed with the intel/2017.00 toolchain, all other modules that you load should have been installed with the same toolchain.
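As a sketch of staying within one toolchain (the HDF5 module names below follow the usual software/version-toolchain naming convention but are illustrative, not a list of what is actually installed):

```bash
$ ml intel/2017.00                 # loads the Intel toolchain and its dependencies
$ ml HDF5/1.8.17-intel-2017.00     # fine: built with the same toolchain
$ ml HDF5/1.8.17-foss-2016a        # avoid: built with a different, incompatible toolchain
```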