From 45410dfae77fd2a7930d940e1ac0d32ad071be32 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Hrb=C3=A1=C4=8D?= <david@hrbac.cz> Date: Fri, 27 Jan 2017 08:47:12 +0100 Subject: [PATCH] Lowercase --- .../capacity-computing.md | 16 +++++----- .../environment-and-modules.md | 4 +-- .../job-priority.md | 4 +-- .../job-submission-and-execution.md | 20 ++++++------ .../anselm-cluster-documentation/network.md | 2 +- .../anselm-cluster-documentation/prace.md | 4 +-- .../remote-visualization.md | 4 +-- .../resource-allocation-and-job-execution.md | 6 ++-- .../resources-allocation-policy.md | 6 ++-- .../shell-and-data-access.md | 12 +++---- .../software/chemistry/molpro.md | 2 +- .../software/compilers.md | 2 +- .../software/debuggers/allinea-ddt.md | 2 +- .../debuggers/allinea-performance-reports.md | 2 +- .../software/debuggers/cube.md | 2 +- .../intel-performance-counter-monitor.md | 2 +- .../debuggers/intel-vtune-amplifier.md | 6 ++-- .../software/debuggers/papi.md | 2 +- .../software/debuggers/scalasca.md | 2 +- .../software/debuggers/total-view.md | 6 ++-- ...intel-integrated-performance-primitives.md | 2 +- .../software/intel-suite/intel-mkl.md | 4 +-- .../software/intel-suite/intel-tbb.md | 2 +- .../software/intel-xeon-phi.md | 30 ++++++++--------- .../software/isv_licenses.md | 2 +- .../software/kvirtualization.md | 8 ++--- .../software/mpi/Running_OpenMPI.md | 10 +++--- .../software/mpi/mpi.md | 8 ++--- .../software/mpi/running-mpich2.md | 6 ++-- .../software/numerical-languages/matlab.md | 8 ++--- .../numerical-languages/matlab_1314.md | 6 ++-- .../software/numerical-languages/octave.md | 2 +- .../software/numerical-languages/r.md | 6 ++-- .../software/numerical-libraries/hdf5.md | 2 +- .../magma-for-intel-xeon-phi.md | 14 ++++---- .../software/nvidia-cuda.md | 2 +- .../software/openfoam.md | 6 ++-- .../anselm-cluster-documentation/storage.md | 30 ++++++++--------- .../graphical-user-interface/vnc.md | 6 ++-- .../x-window-system.md | 2 +- .../accessing-the-clusters/introduction.md | 2 +- .../shell-access-and-data-transfer/putty.md | 2 +- .../ssh-keys.md | 6 ++-- .../accessing-the-clusters/vpn1-access.md | 2 +- docs.it4i/index.md | 4 +-- docs.it4i/salomon/capacity-computing.md | 16 +++++----- docs.it4i/salomon/environment-and-modules.md | 4 +-- docs.it4i/salomon/job-priority.md | 4 +-- .../salomon/job-submission-and-execution.md | 32 +++++++++---------- docs.it4i/salomon/prace.md | 4 +-- .../resource-allocation-and-job-execution.md | 4 +-- .../salomon/resources-allocation-policy.md | 10 +++--- docs.it4i/salomon/shell-and-data-access.md | 12 +++---- .../salomon/software/chemistry/molpro.md | 2 +- .../salomon/software/chemistry/phono3py.md | 2 +- docs.it4i/salomon/software/compilers.md | 2 +- .../salomon/software/debuggers/aislinn.md | 2 +- .../salomon/software/debuggers/allinea-ddt.md | 2 +- .../debuggers/intel-vtune-amplifier.md | 2 +- .../salomon/software/debuggers/total-view.md | 4 +-- docs.it4i/salomon/software/intel-xeon-phi.md | 30 ++++++++--------- .../salomon/software/mpi/Running_OpenMPI.md | 2 +- docs.it4i/salomon/software/mpi/mpi.md | 2 +- .../software/numerical-languages/matlab.md | 4 +-- docs.it4i/salomon/storage.md | 28 ++++++++-------- 65 files changed, 223 insertions(+), 223 deletions(-) diff --git a/docs.it4i/anselm-cluster-documentation/capacity-computing.md b/docs.it4i/anselm-cluster-documentation/capacity-computing.md index 3180cd447..13e578aaa 100644 --- a/docs.it4i/anselm-cluster-documentation/capacity-computing.md +++ 
b/docs.it4i/anselm-cluster-documentation/capacity-computing.md @@ -6,7 +6,7 @@ In many cases, it is useful to submit huge (>100+) number of computational jobs However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array** -!!! Note +!!! note Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. - Use [Job arrays](capacity-computing/#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs @@ -20,7 +20,7 @@ However, executing huge number of jobs via the PBS queue may strain the system. ## Job Arrays -!!! Note +!!! note Huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: @@ -149,7 +149,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/) ## GNU Parallel -!!! Note +!!! note Use GNU parallel to run many single core tasks on one node. GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. @@ -216,17 +216,17 @@ $ qsub -N JOBNAME jobscript In this example, we submit a job of 101 tasks. 16 input files will be processed in parallel. The 101 tasks on 16 cores are assumed to complete in less than 2 hours. -!!! Hint +!!! hint Use #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue. ## Job Arrays and GNU Parallel -!!! Note +!!! note Combine the Job arrays and GNU parallel for best throughput of single core jobs While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs. -!!! Note +!!! note Every subjob in an array runs GNU parallel to utilize all cores on the node ### GNU Parallel, Shared jobscript @@ -281,7 +281,7 @@ cp output $PBS_O_WORKDIR/$TASK.out In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once an task is finished, new task starts, until the number of tasks in numtasks file is reached. -!!! Note +!!! note Select subjob walltime and number of tasks per subjob carefully When deciding this values, think about following guiding rules: @@ -301,7 +301,7 @@ $ qsub -N JOBNAME -J 1-992:32 jobscript In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**, this must be the same as the number sent to numtasks file. 
Each subjob will run on full node and process 16 input files in parallel, 32 in total per subjob. Every subjob is assumed to complete in less than 2 hours. -!!! Hint +!!! hint Use #PBS directives in the beginning of the jobscript file, don't forget to set your valid PROJECT_ID and desired queue. ## Examples diff --git a/docs.it4i/anselm-cluster-documentation/environment-and-modules.md b/docs.it4i/anselm-cluster-documentation/environment-and-modules.md index 2506efb2e..1439c6733 100644 --- a/docs.it4i/anselm-cluster-documentation/environment-and-modules.md +++ b/docs.it4i/anselm-cluster-documentation/environment-and-modules.md @@ -23,14 +23,14 @@ then fi ``` -!!! Note +!!! note Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Consider utilization of SSH session interactivity for such commands as stated in the previous example. ### Application Modules In order to configure your shell for running particular application on Anselm we use Module package interface. -!!! Note +!!! note The modules set up the application paths, library paths and environment variables for running particular application. We also have a second modules repository. This modules repository is created using a tool called EasyBuild. On the Salomon cluster, all modules will be built by this tool. If you want to use software from this modules repository, please follow instructions in section [Application Modules Path Expansion](environment-and-modules/#EasyBuild). diff --git a/docs.it4i/anselm-cluster-documentation/job-priority.md b/docs.it4i/anselm-cluster-documentation/job-priority.md index 8d72dde77..2eebe9d54 100644 --- a/docs.it4i/anselm-cluster-documentation/job-priority.md +++ b/docs.it4i/anselm-cluster-documentation/job-priority.md @@ -35,7 +35,7 @@ usage<sub>Total</sub> is total usage by all users, by all projects. Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut in half periodically, at the interval 168 hours (one week). Jobs queued in queue qexp are not calculated to project's usage. -!!! Note +!!! note Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>. Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job. @@ -64,7 +64,7 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu It means that jobs with lower execution priority can be run before jobs with higher execution priority. -!!! Note +!!! note It is **very beneficial to specify the walltime** when submitting jobs. Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority. diff --git a/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md b/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md index ebb2b7cd5..2f76f9280 100644 --- a/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md +++ b/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md @@ -11,7 +11,7 @@ When allocating computational resources for the job, please specify 5. Project ID 6. Jobscript or interactive switch -!!! Note +!!! note Use the **qsub** command to submit your job to a queue for allocation of the computational resources.
Submit the job using the qsub command: @@ -132,7 +132,7 @@ Although this example is somewhat artificial, it demonstrates the flexibility of ## Job Management -!!! Note +!!! note Check status of your jobs using the **qstat** and **check-pbs-jobs** commands ```bash @@ -213,7 +213,7 @@ Run loop 3 In this example, we see actual output (some iteration loops) of the job 35141.dm2 -!!! Note +!!! note Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel**, **qsig** or **qalter** commands You may release your allocation at any time, using qdel command @@ -238,12 +238,12 @@ $ man pbs_professional ### Jobscript -!!! Note +!!! note Prepare the jobscript to run batch jobs in the PBS queue system The Jobscript is a user made script, controlling sequence of commands for executing the calculation. It is often written in bash, other scripts may be used as well. The jobscript is supplied to PBS **qsub** command as an argument and executed by the PBS Professional workload manager. -!!! Note +!!! note The jobscript or interactive shell is executed on first of the allocated nodes. ```bash @@ -273,7 +273,7 @@ $ pwd In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory. -!!! Note +!!! note All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. The allocated nodes are accessible via ssh from login nodes. The nodes may access each other via ssh as well. @@ -305,7 +305,7 @@ In this example, the hostname program is executed via pdsh from the interactive ### Example Jobscript for MPI Calculation -!!! Note +!!! note Production jobs must use the /scratch directory for I/O The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. @@ -337,12 +337,12 @@ exit In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes. -!!! Note +!!! note Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. -!!! Note +!!! note Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. Example jobscript for an MPI job with preloaded inputs and executables, options for qsub are stored within the script : @@ -375,7 +375,7 @@ sections. ### Example Jobscript for Single Node Calculation -!!! Note +!!! note Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. 
Example jobscript for single node calculation, using [local scratch](storage/) on the node: diff --git a/docs.it4i/anselm-cluster-documentation/network.md b/docs.it4i/anselm-cluster-documentation/network.md index 9a4a341c3..a682f44ff 100644 --- a/docs.it4i/anselm-cluster-documentation/network.md +++ b/docs.it4i/anselm-cluster-documentation/network.md @@ -8,7 +8,7 @@ All compute and login nodes of Anselm are interconnected by a high-bandwidth, lo The compute nodes may be accessed via the InfiniBand network using ib0 network interface, in address range 10.2.1.1-209. The MPI may be used to establish native InfiniBand connection among the nodes. -!!! Note +!!! note The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via native InfiniBand protocol. The Fat tree topology ensures that peak transfer rates are achieved between any two nodes, independent of network traffic exchanged among other nodes concurrently. diff --git a/docs.it4i/anselm-cluster-documentation/prace.md b/docs.it4i/anselm-cluster-documentation/prace.md index 9904b34c9..1240fc018 100644 --- a/docs.it4i/anselm-cluster-documentation/prace.md +++ b/docs.it4i/anselm-cluster-documentation/prace.md @@ -235,10 +235,10 @@ PRACE users should check their project accounting using the [PRACE Accounting To Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received local password may check at any time, how many core-hours have been consumed by themselves and their projects using the command "it4ifree". -!!! Note +!!! note You need to know your user password to use the command. Displayed core hours are "system core hours" which differ from PRACE "standardized core hours". -!!! Hint +!!! hint The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> ```bash diff --git a/docs.it4i/anselm-cluster-documentation/remote-visualization.md b/docs.it4i/anselm-cluster-documentation/remote-visualization.md index b448ef682..fc01d1879 100644 --- a/docs.it4i/anselm-cluster-documentation/remote-visualization.md +++ b/docs.it4i/anselm-cluster-documentation/remote-visualization.md @@ -41,7 +41,7 @@ Please [follow the documentation](shell-and-data-access/). To have the OpenGL acceleration, **24 bit color depth must be used**. Otherwise only the geometry (desktop size) definition is needed. -!!! Hint +!!! hint At first VNC server run you need to define a password. This example defines desktop with dimensions 1200x700 pixels and 24 bit color depth. @@ -138,7 +138,7 @@ qviz**. The queue has following properties: Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 16 GB of RAM and 1/4 of the GPU capacity. -!!! Note +!!! note If more GPU power or RAM is required, it is recommended to allocate one whole node per user, so that all 16 cores, whole RAM and whole GPU is exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum. 
To access the visualization node, follow these steps: diff --git a/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md b/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md index 767e5bcdd..23e5ba2c6 100644 --- a/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md +++ b/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md @@ -12,14 +12,14 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qnvidia**, **qmic**, **qfat**, the Dedicated queues - **qfree**, the Free resource utilization queue -!!! Note +!!! note Check the queue status at <https://extranet.it4i.cz/anselm/> Read more on the [Resource Allocation Policy](resources-allocation-policy/) page. ## Job Submission and Execution -!!! Note +!!! note Use the **qsub** command to submit your jobs. The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is entire node, 16 cores**, with exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** @@ -28,7 +28,7 @@ Read more on the [Job submission and execution](job-submission-and-execution/) p ## Capacity Computing -!!! Note +!!! note Use Job arrays when running huge number of jobs. Use GNU Parallel and/or Job arrays when running (many) single core jobs. diff --git a/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md b/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md index eab7a56ad..b54f94b13 100644 --- a/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md +++ b/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md @@ -4,7 +4,7 @@ The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The Fair-share at Anselm ensures that individual users may consume approximately equal amount of resources per week. Detailed information in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following table provides the queue partitioning overview: -!!! Note +!!! note Check the queue status at <https://extranet.it4i.cz/anselm/> | queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime | @@ -15,7 +15,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const | qnvidia, qmic, qfat | yes | 0 | 23 total qnvidia, 4 total qmic, 2 total qfat | 16 | 200 | yes | 24/48 h | | qfree | yes | none required | 178 w/o accelerator | 16 | -1024 | no | 12 h | -!!! Note +!!! note **The qfree queue is not free of charge**. [Normal accounting](#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project has exhausted all its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue.
**The qexp queue is equipped with the nodes not having the very same CPU clock speed.** Should you need the very same CPU speed, you have to select the proper nodes during the PBS job submission. @@ -113,7 +113,7 @@ The resources that are currently subject to accounting are the core-hours. The c ### Check Consumed Resources -!!! Note +!!! note The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. diff --git a/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md b/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md index d830fa79d..d4cdbf854 100644 --- a/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md +++ b/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md @@ -53,7 +53,7 @@ Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com Example to the cluster login: -!!! Note +!!! note The environment is **not** shared between login nodes, except for [shared filesystems](storage/#shared-filesystems). ## Data Transfer @@ -69,14 +69,14 @@ Data in and out of the system may be transferred by the [scp](http://en.wikipedi The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) -!!! Note +!!! note Data transfer rates up to **160MB/s** can be achieved with scp or sftp. 1TB may be transferred in 1:50h. To achieve 160MB/s transfer rates, the end user must be connected by 10G line all the way to IT4Innovations and use computer with fast processor for the transfer. Using Gigabit ethernet connection, up to 110MB/s may be expected. Fast cipher (aes128-ctr) should be used. -!!! Note +!!! note If you experience degraded data transfer performance, consult your local network provider. On Linux or Mac, use scp or sftp client to transfer the data to Anselm: @@ -126,7 +126,7 @@ Outgoing connections, from Anselm Cluster login nodes to the outside world, are | 443 | https | | 9418 | git | -!!! Note +!!! note Please use **ssh port forwarding** and proxy servers to connect from Anselm to all other remote ports. Outgoing connections, from Anselm Cluster compute nodes are restricted to the internal network. Direct connections from compute nodes to the outside world are cut. @@ -135,7 +135,7 @@ Outgoing connections, from Anselm Cluster compute nodes are restricted to the in ### Port Forwarding From Login Nodes -!!! Note +!!! note Port forwarding allows an application running on Anselm to connect to arbitrary remote host and port. It works by tunneling the connection from Anselm back to the user's workstation and forwarding from the workstation to the remote host. @@ -177,7 +177,7 @@ In this example, we assume that port forwarding from login1:6000 to remote.host. Port forwarding is static, each single port is mapped to a particular port on remote host. Connection to other remote host, requires new forward. -!!! Note +!!! note Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, the sshd daemon provides the functionality.
To establish SOCKS proxy server listening on port 1080 run: diff --git a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md index 7dc65a08d..e8827e17c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md +++ b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md @@ -32,7 +32,7 @@ Compilation parameters are default: Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. -!!! Note +!!! note The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH file system](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch file system. diff --git a/docs.it4i/anselm-cluster-documentation/software/compilers.md b/docs.it4i/anselm-cluster-documentation/software/compilers.md index 86f354ba1..829b840b1 100644 --- a/docs.it4i/anselm-cluster-documentation/software/compilers.md +++ b/docs.it4i/anselm-cluster-documentation/software/compilers.md @@ -104,7 +104,7 @@ As default UPC network the "smp" is used. This is very quick and easy way for te For production runs, it is recommended to use the native InfiniBand implementation of UPC network "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. -!!! Warning +!!! warning Selection of the network is done at the compile time and not at runtime (as expected)! Example UPC code: diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md index 7d2fe37aa..bdd6b6390 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md @@ -47,7 +47,7 @@ $ mpif90 -g -O0 -o test_debug test.f Before debugging, you need to compile your code with these flags: -!!! Note +!!! note - **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. - **O0** : Suppress all optimizations. diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md index e27f426d6..ad8d74d77 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md @@ -20,7 +20,7 @@ The module sets up environment variables, required for using the Allinea Perform ## Usage -!!! Note +!!! note Use the perf-report wrapper on your (MPI) program.
Instead of [running your MPI program the usual way](../mpi/), use the perf-report wrapper: diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md index 78ad34845..98280c4b3 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md @@ -27,7 +27,7 @@ Currently, there are two versions of CUBE 4.2.3 available as [modules](../../env CUBE is a graphical application. Refer to Graphical User Interface documentation for a list of methods to launch graphical applications on Anselm. -!!! Note +!!! note Analyzing large data sets can consume large amount of CPU and RAM. Do not perform large analysis on login nodes. After loading the appropriate module, simply launch cube command, or alternatively you can use scalasca -examine command to launch the GUI. Note that for Scalasca datasets, if you do not analyze the data with scalasca -examine before opening them with CUBE, not all performance data will be available. diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md index ea4e99fed..d9b878254 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md @@ -192,7 +192,7 @@ Can be used as a sensor for ksysguard GUI, which is currently not installed on A In a similar fashion to PAPI, PCM provides a C++ API to access the performance counter from within your application. Refer to the [Doxygen documentation](http://intel-pcm-api-documentation.github.io/classPCM.html) for details of the API. -!!! Note +!!! note Due to security limitations, using PCM API to monitor your applications is currently not possible on Anselm. (The application must be run as root user) Sample program using the API : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md index d1d65bb38..b8f8cb8f3 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md @@ -26,7 +26,7 @@ and launch the GUI : $ amplxe-gui ``` -!!! Note +!!! note To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on using GUI applications. The GUI will open in new window. Click on "_New Project..._" to create a new project. After clicking _OK_, a new window with project properties will appear. At "_Application:_", select the path to the binary you want to profile (the binary should be compiled with -g flag). Some additional options such as command line arguments can be selected. At "_Managed code profiling mode:_" select "_Native_" (unless you want to profile managed mode .NET/Mono applications). After clicking _OK_, your project is created. @@ -47,7 +47,7 @@ Copy the line to clipboard and then you can paste it in your jobscript or in com ## Xeon Phi -!!! Note +!!! note This section is outdated. It will be updated with new information soon.
It is possible to analyze both native and offload Xeon Phi applications. For offload mode, just specify the path to the binary. For native mode, you need to specify in project properties: @@ -58,7 +58,7 @@ Application parameters: mic0 source ~/.profile && /path/to/your/bin Note that we include source ~/.profile in the command to setup environment paths [as described here](../intel-xeon-phi/). -!!! Note +!!! note If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md index 28542810b..150abec75 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md @@ -190,7 +190,7 @@ Now the compiler won't remove the multiplication loop. (However it is still not ### Intel Xeon Phi -!!! Note +!!! note PAPI currently supports only a subset of counters on the Intel Xeon Phi processor compared to Intel Xeon, for example the floating point operations counter is missing. To use PAPI in [Intel Xeon Phi](../intel-xeon-phi/) native applications, you need to load module with " -mic" suffix, for example " papi/5.3.2-mic" : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md index fa784f688..3202123d3 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md @@ -42,7 +42,7 @@ Some notable Scalasca options are: - **-t Enable trace data collection. By default, only summary data are collected.** - **-e <directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by name of the executable and launch configuration.** -!!! Note +!!! note Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage/storage/). ### Analysis of Reports diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md index fa8343300..2265a89b6 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md @@ -57,7 +57,7 @@ Compile the code: Before debugging, you need to compile your code with theses flags: -!!! Note +!!! note - **-g** : Generates extra debugging information usable by GDB. **-g3** includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. - **-O0** : Suppress all optimizations. @@ -91,7 +91,7 @@ To debug a serial code use: To debug a parallel code compiled with **OpenMPI** you need to setup your TotalView environment: -!!! Hint +!!! 
hint To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your `~/.tvdrc` file: ```bash @@ -120,7 +120,7 @@ The source code of this function can be also found in /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl ``` -!!! Note +!!! note You can also add only following line to you ~/.tvdrc file instead of the entire function: **source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl** diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md index 2cc7fc404..b92f8d05f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md @@ -4,7 +4,7 @@ Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX vector instructions is available, via module ipp. The IPP is a very rich library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax, as well as cryptographic functions, linear algebra functions and many more. -!!! Note +!!! note Check out IPP before implementing own math functions for data processing, it is likely already there. ```bash diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md index a1c64a4bc..a86f6667f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md @@ -23,7 +23,7 @@ Intel MKL version 13.5.192 is available on Anselm The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES -!!! Note +!!! note The MKL library may be linked using any compiler. With intel compiler use -mkl option to link default threaded MKL. ### Interfaces @@ -47,7 +47,7 @@ You will need the mkl module loaded to run the mkl enabled executable. This may ### Threading -!!! Note +!!! note Advantage in using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel. For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md index 1dfd536d2..3c2495ba8 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md @@ -13,7 +13,7 @@ Intel TBB version 4.1 is available on Anselm The module sets up environment variables, required for linking and running tbb enabled applications. -!!! Note +!!! 
note Link the tbb library, using -ltbb ## Examples diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md index 75424f383..0390dff94 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md @@ -105,7 +105,7 @@ export OFFLOAD_REPORT=3 A very basic example of code that employs offload programming technique is shown in the next listing. -!!! Note +!!! note This code is sequential and utilizes only single core of the accelerator. ```bash @@ -232,7 +232,7 @@ During the compilation Intel compiler shows which loops have been vectorized in Some interesting compiler flags useful not only for code debugging are: -!!! Note +!!! note Debugging openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level @@ -329,7 +329,7 @@ Following example show how to automatically offload an SGEMM (single precision - } ``` -!!! Note +!!! note This example is simplified version of an example from MKL. The expanded version can be found here: `$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c`. To compile a code using Intel compiler use: @@ -372,7 +372,7 @@ To compile a code user has to be connected to a compute with MIC and load Intel $ module load intel/13.5.192 ``` -!!! Note +!!! note Particular version of the Intel module is specified. This information is used later to specify the correct library paths. To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: @@ -415,12 +415,12 @@ If the code is parallelized using OpenMP a set of additional libraries is requir mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH ``` -!!! Note +!!! note The path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: -!!! Note +!!! note /apps/intel/composer_xe_2013.5.192/compiler/lib/mic - libiomp5.so @@ -501,7 +501,7 @@ After executing the complied binary file, following output should be displayed. ... ``` -!!! Note +!!! note More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. @@ -541,7 +541,7 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as ... ``` -!!! Warning +!!! warning GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. ## MPI @@ -603,7 +603,7 @@ An example of basic MPI version of "hello-world" example in C language, that can Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: -!!! Note +!!! note **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) 
**Coprocessor-only model** - all MPI ranks reside only on the coprocessors. @@ -650,7 +650,7 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH ``` -!!! Note +!!! note - this file sets up both environmental variable for both MPI and OpenMP libraries. - this file sets up the paths to a particular version of Intel MPI library and particular version of an Intel compiler. These versions have to match with loaded modules. @@ -703,7 +703,7 @@ or using mpirun $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic ``` -!!! Note +!!! note - the full path to the binary has to specified (here: `>~/mpi-test-mic`) - the `LD_LIBRARY_PATH` has to match with Intel MPI module used to compile the MPI code @@ -716,7 +716,7 @@ The output should be again similar to: Hello world from process 0 of 4 on host cn207-mic0 ``` -!!! Note +!!! note `mpiexec.hydra` requires a file the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: @@ -751,7 +751,7 @@ For example: This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators. -!!! Note +!!! note At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: - to connect to the second node : `$ ssh cn205` - to connect to the accelerator on the first node from the first node: `$ ssh cn204-mic0` or `$ ssh mic0` @@ -883,14 +883,14 @@ A possible output of the MPI "hello-world" example executed on two hosts and two Hello world from process 7 of 8 on host cn205-mic0 ``` -!!! Note +!!! note At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only. **Using the PBS automatically generated node-files** PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated: -!!! Note +!!! note **Host only node-file:** - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file: diff --git a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md index d930dfc50..883ceea45 100644 --- a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md +++ b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md @@ -10,7 +10,7 @@ If an ISV application was purchased for educational (research) purposes and also ## Overview of the Licenses Usage -!!! Note +!!! note The overview is generated every minute and is accessible from web or command line interface. ### Web Interface diff --git a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md index ea20bafc3..e3dcaaf3d 100644 --- a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md +++ b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md @@ -26,10 +26,10 @@ Virtualization has also some drawbacks, it is not so easy to setup efficient sol Solution described in chapter [HOWTO](virtualization/#howto) is suitable for single node tasks, does not introduce virtual machine clustering. -!!! Note +!!! note Please consider virtualization as last resort solution for your needs. -!!! Warning +!!! warning Please consult use of virtualization with IT4Innovation's support. 
For running Windows application (when source code and Linux native application are not available) consider use of Wine, Windows compatibility layer. Many Windows applications can be run using Wine with less effort and better performance than when using virtualization. @@ -38,7 +38,7 @@ For running Windows application (when source code and Linux native application a IT4Innovations does not provide any licenses for operating systems and software of virtual machines. Users are ( in accordance with [Acceptable use policy document](http://www.it4i.cz/acceptable-use-policy.pdf)) fully responsible for licensing all software running in virtual machines on Anselm. Be aware of complex conditions of licensing software in virtual environments. -!!! Note +!!! note Users are responsible for licensing OS e.g. MS Windows and all software running in their virtual machines. ## Howto @@ -248,7 +248,7 @@ Run virtual machine using optimized devices, user network back-end with sharing Thanks to port forwarding you can access virtual machine via SSH (Linux) or RDP (Windows) connecting to IP address of compute node (and port 2222 for SSH). You must use VPN network). -!!! Note +!!! note Keep in mind, that if you use virtio devices, you must have virtio drivers installed on your virtual machine. ### Networking and Data Sharing diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md b/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md index 37d9eab16..8e11a3c16 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md @@ -6,7 +6,7 @@ The OpenMPI programs may be executed only via the PBS Workload manager, by enter ### Basic Usage -!!! Note +!!! note Use the mpiexec to run the OpenMPI code. Example: @@ -27,7 +27,7 @@ Example: Hello world! from rank 3 of 4 on host cn110 ``` -!!! Note +!!! note Please be aware, that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted behaviour (unless you want to run hybrid code with just one MPI and 16 OpenMP tasks per node). In normal MPI programs **omit the -pernode directive** to run up to 16 MPI tasks per each node. In this example, we allocate 4 nodes via the express queue interactively. We set up the openmpi environment and interactively run the helloworld_mpi.x program. Note that the executable helloworld_mpi.x must be available within the @@ -48,7 +48,7 @@ You need to preload the executable, if running on the local scratch /lscratch fi In this example, we assume the executable helloworld_mpi.x is present on compute node cn17 on local scratch. We call the mpiexec whith the **--preload-binary** argument (valid for openmpi). The mpiexec will copy the executable from cn17 to the /lscratch/15210.srv11 directory on cn108, cn109 and cn110 and execute the program. -!!! Note +!!! note MPI process mapping may be controlled by PBS parameters. The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. @@ -97,7 +97,7 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity -!!! Note +!!! note Important! Bind every OpenMP thread to a core! In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. 
You might want to avoid this by setting these environment variable for GCC OpenMP: @@ -152,7 +152,7 @@ In this example, we see that ranks have been mapped on nodes according to the or Exact control of MPI process placement and resource binding is provided by specifying a rankfile -!!! Note +!!! note Appropriate binding may boost performance of your application. Example rankfile diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md index 80f695b06..f16479286 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md @@ -60,7 +60,7 @@ In this example, the openmpi 1.6.5 using intel compilers is activated ## Compiling MPI Programs -!!! Note +!!! note After setting up your MPI environment, compile your program using one of the mpi wrappers ```bash @@ -107,7 +107,7 @@ Compile the above example with ## Running MPI Programs -!!! Note +!!! note The MPI program executable must be compatible with the loaded MPI module. Always compile and execute using the very same MPI module. @@ -119,7 +119,7 @@ The MPI program executable must be available within the same path on all nodes. Optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern. -!!! Note +!!! note Consider these ways to run an MPI program: 1. One MPI process per node, 16 threads per process @@ -130,7 +130,7 @@ Optimal way to run an MPI program depends on its memory requirements, memory acc **Two MPI** processes per node, using 8 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 8 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and NUMA effect overheads. -!!! Note +!!! note Important! Bind every OpenMP thread to a core! In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md index b8ec1b2d4..1a8972c39 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md @@ -6,7 +6,7 @@ The MPICH2 programs use mpd daemon or ssh connection to spawn processes, no PBS ### Basic Usage -!!! Note +!!! note Use the mpirun to execute the MPICH2 code. Example: @@ -43,7 +43,7 @@ You need to preload the executable, if running on the local scratch /lscratch fi In this example, we assume the executable helloworld_mpi.x is present on shared home directory. We run the cp command via mpirun, copying the executable from shared home to local scratch . Second mpirun will execute the binary in the /lscratch/15210.srv11 directory on nodes cn17, cn108, cn109 and cn110, one process per node. -!!! Note +!!! note MPI process mapping may be controlled by PBS parameters. The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. 
@@ -92,7 +92,7 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity -!!! Note +!!! note Important! Bind every OpenMP thread to a core! In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md index c46ea8160..0f2ea4921 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md @@ -41,7 +41,7 @@ plots, images, etc... will be still available. ## Running Parallel Matlab Using Distributed Computing Toolbox / Engine -!!! Note +!!! note Distributed toolbox is available only for the EDU variant The MPIEXEC mode available in previous versions is no longer available in MATLAB 2015. Also, the programming interface has changed. Refer to [Release Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1). @@ -64,7 +64,7 @@ Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click I With the new mode, MATLAB itself launches the workers via PBS, so you can either use interactive mode or a batch mode on one node, but the actual parallel processing will be done in a separate job started by MATLAB itself. Alternatively, you can use "local" mode to run parallel code on just a single node. -!!! Note +!!! note The profile is confusingly named Salomon, but you can use it also on Anselm. ### Parallel Matlab Interactive Session @@ -133,7 +133,7 @@ The last part of the configuration is done directly in the user Matlab script be This script creates scheduler object "cluster" of type "local" that starts workers locally. -!!! Note +!!! note Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling parpool(sched, ...) function. The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers. @@ -217,7 +217,7 @@ You can start this script using batch mode the same way as in Local mode example This method is a "hack" invented by us to emulate the mpiexec functionality found in previous MATLAB versions. We leverage the MATLAB Generic Scheduler interface, but instead of submitting the workers to PBS, we launch the workers directly within the running job, thus we avoid the issues with master script and workers running in separate jobs (issues with license not available, waiting for the worker's job to spawn etc.) -!!! Warning +!!! warning This method is experimental. For this method, you need to use SalomonDirect profile, import it using [the same way as SalomonPBSPro](matlab/#running-parallel-matlab-using-distributed-computing-toolbox---engine) diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md index 9ab8ff3f2..d7a2373b5 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md @@ -2,7 +2,7 @@ ## Introduction -!!! Note +!!! note This document relates to the old versions R2013 and R2014. 
For MATLAB 2015, please use [this documentation instead](matlab/). Matlab is available in the latest stable version. There are always two variants of the release: @@ -71,7 +71,7 @@ extras = {}; System MPI library allows Matlab to communicate through 40 Gbit/s InfiniBand QDR interconnect instead of slower 1 Gbit Ethernet network. -!!! Note +!!! note The path to MPI library in "mpiLibConf.m" has to match with version of loaded Intel MPI module. In this example the version 4.1.1.036 of Intel MPI is used by Matlab and therefore module impi/4.1.1.036 has to be loaded prior to starting Matlab. ### Parallel Matlab Interactive Session @@ -144,7 +144,7 @@ set(sched, 'EnvironmentSetMethod', 'setenv'); This script creates scheduler object "sched" of type "mpiexec" that starts workers using mpirun tool. To use correct version of mpirun, the second line specifies the path to correct version of system Intel MPI library. -!!! Note +!!! note Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling matlabpool(sched, ...) function. The last step is to start matlabpool with "sched" object and correct number of workers. In this case qsub asked for total number of 32 cores, therefore the number of workers is also set to 32. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md index 76dcc3b30..306cc1a17 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md @@ -96,7 +96,7 @@ A version of [native](../intel-xeon-phi/#section-4) Octave is compiled for Xeon Octave is linked with parallel Intel MKL, so it best suited for batch processing of tasks that utilize BLAS, LAPACK and FFT operations. By default, number of threads is set to 120, you can control this with > OMP_NUM_THREADS environment variable. -!!! Note +!!! note Calculations that do not employ parallelism (either by using parallel MKL e.g. via matrix operations, fork() function, [parallel package](http://octave.sourceforge.net/parallel/) or other mechanism) will actually run slower than on host CPU. To use Octave on a node with Xeon Phi: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md index 39a8dbf1c..56426eb06 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md @@ -95,7 +95,7 @@ Download the package [parallell](package-parallel-vignette.pdf) vignette. The forking is the most simple to use. Forking family of functions provide parallelized, drop in replacement for the serial apply() family of functions. -!!! Note +!!! note Forking via package parallel provides functionality similar to OpenMP construct omp parallel for @@ -146,7 +146,7 @@ Every evaluation of the integrad function runs in parallel on different process. ## Package Rmpi -!!! Note +!!! note package Rmpi provides an interface (wrapper) to MPI APIs. It also provides interactive R slave environment. On Anselm, Rmpi provides interface to the [OpenMPI](../mpi-1/Running_OpenMPI/). @@ -296,7 +296,7 @@ Execute the example as: mpi.apply is a specific way of executing Dynamic Rmpi programs. -!!! Note +!!! 
note mpi.apply() family of functions provide MPI parallelized, drop in replacement for the serial apply() family of functions. Execution is identical to other dynamic Rmpi programs. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md index c222f768b..35ffe1775 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md @@ -22,7 +22,7 @@ Versions **1.8.11** and **1.8.13** of HDF5 library are available on Anselm, comp The module sets up environment variables, required for linking and running HDF5 enabled applications. Make sure that the choice of HDF5 module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results. -!!! Note +!!! note Be aware, that GCC version of **HDF5 1.8.11** has serious performance issues, since it's compiled with -O0 optimization flag. This version is provided only for testing of code compiled only by GCC and IS NOT recommended for production computations. For more information, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811> All GCC versions of **HDF5 1.8.13** are not affected by the bug, are compiled with -O3 optimizations and are recommended for production computations. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md index 8637723c6..c4b1c262b 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md @@ -12,10 +12,10 @@ To be able to compile and link code with MAGMA library user has to load followin To make compilation more user friendly module also sets these two environment variables: -!!! Note +!!! note MAGMA_INC - contains paths to the MAGMA header files (to be used for compilation step) -!!! Note +!!! note MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking step). Compilation example: @@ -30,15 +30,15 @@ Compilation example: MAGMA implementation for Intel MIC requires a MAGMA server running on accelerator prior to executing the user application. The server can be started and stopped using following scripts: -!!! Note +!!! note To start MAGMA server use: **$MAGMAROOT/start_magma_server** -!!! Note +!!! note To stop the server use: **$MAGMAROOT/stop_magma_server** -!!! Note +!!! note For deeper understanding how the MAGMA server is started, see the following script: **$MAGMAROOT/launch_anselm_from_mic.sh** @@ -66,10 +66,10 @@ To test if the MAGMA server runs properly we can run one of examples that are pa 10304 10304 --- ( --- ) 500.70 ( 1.46) --- ``` -!!! Hint +!!! hint MAGMA contains several benchmarks and examples in `$MAGMAROOT/testing/` -!!! Note +!!! note MAGMA relies on the performance of all CPU cores as well as on the performance of the accelerator. Therefore on Anselm number of CPU OpenMP threads has to be set to 16 with `export OMP_NUM_THREADS=16`. See more details at [MAGMA home page](http://icl.cs.utk.edu/magma/). 
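Putting the above together, a complete MAGMA session on a compute node with MIC might look as follows. This is a minimal sketch only: the module name and the benchmark binary are illustrative, pick any example from `$MAGMAROOT/testing/`.

```bash
$ module load magma                        # illustrative; check "module avail magma" for the exact name
$ export OMP_NUM_THREADS=16                # MAGMA uses all 16 host CPU cores
$ $MAGMAROOT/start_magma_server            # start the MAGMA server on the accelerator first
$ $MAGMAROOT/testing/testing_dgetrf_mic    # illustrative benchmark from the testing directory
$ $MAGMAROOT/stop_magma_server             # stop the server once finished
```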
diff --git a/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md b/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md index 062f0a69b..375d3732c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md +++ b/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md @@ -280,7 +280,7 @@ SAXPY function multiplies the vector x by the scalar alpha and adds it to the ve } ``` -!!! Note +!!! note cuBLAS has its own function for data transfers between CPU and GPU memory: - [cublasSetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector) - transfers data from CPU to GPU memory - [cublasGetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector) - transfers data from GPU to CPU memory diff --git a/docs.it4i/anselm-cluster-documentation/software/openfoam.md b/docs.it4i/anselm-cluster-documentation/software/openfoam.md index d52c0d9e7..33dd04165 100644 --- a/docs.it4i/anselm-cluster-documentation/software/openfoam.md +++ b/docs.it4i/anselm-cluster-documentation/software/openfoam.md @@ -57,7 +57,7 @@ To create OpenFOAM environment on ANSELM give the commands: $ source $FOAM_BASHRC ``` -!!! Note +!!! note Please load correct module with your requirements “compiler - GCC/ICC, precision - DP/SPâ€. Create a project directory within the $HOME/OpenFOAM directory named \<USER\>-\<OFversion\> and create a directory named run within it, e.g. by typing: @@ -120,7 +120,7 @@ Run the second case for example external incompressible turbulent flow - case - First we must run serial application bockMesh and decomposePar for preparation of parallel computation. -!!! Note +!!! note Create a Bash scrip test.sh: ```bash @@ -145,7 +145,7 @@ Job submission This job create simple block mesh and domain decomposition. Check your decomposition, and submit parallel computation: -!!! Note +!!! note Create a PBS script testParallel.pbs: ```bash diff --git a/docs.it4i/anselm-cluster-documentation/storage.md b/docs.it4i/anselm-cluster-documentation/storage.md index 264fe8e05..9d8f8385e 100644 --- a/docs.it4i/anselm-cluster-documentation/storage.md +++ b/docs.it4i/anselm-cluster-documentation/storage.md @@ -26,7 +26,7 @@ There is default stripe configuration for Anselm Lustre filesystems. However, us 2. stripe_count the number of OSTs to stripe across; default is 1 for Anselm Lustre filesystems one can specify -1 to use all OSTs in the filesystem. 3. stripe_offset The index of the OST where the first stripe is to be placed; default is -1 which results in random selection; using a non-default value is NOT recommended. -!!! Note +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. @@ -60,14 +60,14 @@ $ man lfs ### Hints on Lustre Stripping -!!! Note +!!! note Increase the stripe_count for parallel I/O to the same file. When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the stripe_count is set to a larger value. The stripe count sets the number of OSTs the file will be written to. By default, the stripe count is set to 1. 
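A larger stripe count is set per directory (newly created files inherit it) with the lfs setstripe command; a minimal sketch, with an illustrative path and count:

```bash
$ lfs setstripe -c 8 /scratch/$USER/shared_output   # stripe new files in this directory over 8 OSTs
$ lfs getstripe /scratch/$USER/shared_output        # verify the stripe count and size
```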
While this default setting provides for efficient access of metadata (for example to support the ls -l command), large files should use stripe counts of greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file. Another good practice is to make the stripe count be an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs. For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes. -!!! Note +!!! note Using a large stripe size can improve performance when accessing very large files Large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. The choice of stripe size has no effect on a single-stripe file. @@ -102,7 +102,7 @@ The architecture of Lustre on Anselm is composed of two metadata servers (MDS) The HOME filesystem is mounted in directory /home. Users home directories /home/username reside on this filesystem. Accessible capacity is 320TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 250GB per user. If 250GB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note +!!! note The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects. The HOME filesystem should not be used to archive data of past Projects or other unrelated data. @@ -114,7 +114,7 @@ The filesystem is backed up, such that it can be restored in case of catasthropi The HOME filesystem is realized as Lustre parallel filesystem and is available on all login and computational nodes. Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for the HOME filesystem. -!!! Note +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | HOME filesystem | | @@ -131,7 +131,7 @@ Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for t The SCRATCH filesystem is mounted in directory /scratch. Users may freely create subdirectories and files on the filesystem. Accessible capacity is 146TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 100TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem and deny service to other users. If 100TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note +!!! note The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory. >Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files. @@ -140,7 +140,7 @@ The SCRATCH filesystem is mounted in directory /scratch. Users may freely create The SCRATCH filesystem is realized as Lustre parallel filesystem and is available from all login and computational nodes. Default stripe size is 1MB, stripe count is 1. 
There are 10 OSTs dedicated for the SCRATCH filesystem. -!!! Note +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | SCRATCH filesystem | | @@ -260,7 +260,7 @@ Default ACL mechanism can be used to replace setuid/setgid permissions on direct ### Local Scratch -!!! Note +!!! note Every computational node is equipped with 330GB local scratch disk. Use local scratch in case you need to access large amount of small files during your calculation. @@ -269,7 +269,7 @@ The local scratch disk is mounted as /lscratch and is accessible to user at /lsc The local scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs that access large number of small files within the calculation must use the local scratch filesystem as their working directory. This is required for performance reasons, as frequent access to number of small files may overload the metadata servers (MDS) of the Lustre filesystem. -!!! Note +!!! note The local scratch directory /lscratch/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | local SCRATCH filesystem | | @@ -284,14 +284,14 @@ The local scratch filesystem is intended for temporary scratch data generated d Every computational node is equipped with filesystem realized in memory, so called RAM disk. -!!! Note +!!! note Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. The local RAM disk filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. Size of RAM disk filesystem is limited. Be very careful, use of RAM disk filesystem is at the expense of operational memory. It is not recommended to allocate large amount of memory and use large amount of data in RAM disk filesystem at the same time. -!!! Note +!!! note The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | RAM disk | | @@ -320,7 +320,7 @@ Each node is equipped with local /tmp directory of few GB capacity. The /tmp dir Do not use shared filesystems at IT4Innovations as a backup for large amount of data or long-term archiving purposes. -!!! Note +!!! note The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/). The CESNET Storage service can be used for research purposes, mainly by academic staff and students of research institutions in the Czech Republic. @@ -339,14 +339,14 @@ The procedure to obtain the CESNET access is quick and trouble-free. ### Understanding CESNET Storage -!!! Note +!!! note It is very important to understand the CESNET storage before uploading data. Please read <https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first. Once registered for CESNET Storage, you may [access the storage](https://du.cesnet.cz/en/navody/faq/start) in number of ways. We recommend the SSHFS and RSYNC methods. ### SSHFS Access -!!! 
Note +!!! note SSHFS: The storage will be mounted like a local hard drive The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it was a local removable hard drive. Files can be than copied in and out in a usual fashion. @@ -391,7 +391,7 @@ Once done, please remember to unmount the storage ### Rsync Access -!!! Note +!!! note Rsync provides delta transfer for best performance, can resume interrupted transfers Rsync is a fast and extraordinarily versatile file copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use. diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md index 0947d1963..7d243fc01 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md @@ -6,7 +6,7 @@ The recommended clients are [TightVNC](http://www.tightvnc.com) or [TigerVNC](ht ## Create VNC Password -!!! Note +!!! note Local VNC password should be set before the first login. Do use a strong password. ```bash @@ -17,7 +17,7 @@ Verify: ## Start Vncserver -!!! Note +!!! note To access VNC a local vncserver must be started first and also a tunnel using SSH port forwarding must be established. [See below](vnc.md#linux-example-of-creating-a-tunnel) for the details on SSH tunnels. In this example we use port 61. @@ -63,7 +63,7 @@ username 10296 0.0 0.0 131772 21076 pts/29 SN 13:01 0:01 /usr/bin/Xvn To access the VNC server you have to create a tunnel between the login node using TCP **port 5961** and your machine using a free TCP port (for simplicity the very same, in this case). -!!! Note +!!! note The tunnel must point to the same login node where you launched the VNC server, eg. login2. If you use just cluster-name.it4i.cz, the tunnel might point to a different node due to DNS round robin. ## Linux/Mac OS Example of Creating a Tunnel diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md index 1882a7515..9c1d75b80 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md @@ -52,7 +52,7 @@ Read more on [http://www.math.umn.edu/systems_guide/putty_xwin32.html](http://ww ## Running GUI Enabled Applications -!!! Note +!!! note Make sure that X forwarding is activated and the X server is running. Then launch the application as usual. Use the & to run the application in background. 
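For instance, assuming X forwarding was requested at login, a minimal sketch (the address is illustrative):

```bash
local $ ssh -X username@salomon.it4i.cz   # -X enables X11 forwarding for this session
$ xterm &                                 # any GUI application; & keeps the shell free for further commands
```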
diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md index 6f806156f..75aca80cb 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md @@ -2,7 +2,7 @@ The IT4Innovations clusters are accessed by SSH protocol via login nodes. -!!! Note +!!! note Read more on [Accessing the Salomon Cluster](../../salomon/shell-and-data-access.md) or [Accessing the Anselm Cluster](../../anselm-cluster-documentation/shell-and-data-access.md) pages. ## PuTTY diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md index 517076b07..63f10e882 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md @@ -4,7 +4,7 @@ We recommned you to download "**A Windows installer for everything except PuTTYtel**" with **Pageant** (SSH authentication agent) and **PuTTYgen** (PuTTY key generator) which is available [here](http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html). -!!! Note +!!! note After installation you can proceed directly to private keys authentication using ["Putty"](putty#putty). "Change Password for Existing Private Key" is optional. diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md index 4fbb8aabf..ce59d33f2 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md @@ -16,7 +16,7 @@ After logging in, you can see .ssh/ directory with SSH keys and authorized_keys -rw-r--r-- 1 username username 392 May 21 2014 id_rsa.pub ``` -!!! Hint +!!! hint Private keys in .ssh directory are without passphrase and allow you to connect within the cluster. ## Access Privileges on .ssh Folder @@ -37,7 +37,7 @@ After logging in, you can see .ssh/ directory with SSH keys and authorized_keys ## Private Key -!!! Note +!!! note The path to a private key is usually /home/username/.ssh/ Private key file in "id_rsa" or `*.ppk` format is used to authenticate with the servers. Private key is present locally on local side and used for example in SSH agent Pageant (for Windows users). The private key should always be kept in a safe place. @@ -92,7 +92,7 @@ First, generate a new keypair of your public and private key: local $ ssh-keygen -C 'username@organization.example.com' -f additional_key ``` -!!! Note +!!! note Please, enter **strong** **passphrase** for securing your private key. You can insert additional public key into authorized_keys file for authentication with your own private key. Additional records in authorized_keys file must be delimited by new line. Users are not advised to remove the default public key from authorized_keys file. 
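Appending the new public key to authorized_keys can be done directly from your workstation; a minimal sketch, the cluster address is illustrative:

```bash
# append the new public key as a separate line of authorized_keys on the cluster
local $ cat additional_key.pub | ssh username@salomon.it4i.cz 'cat >> ~/.ssh/authorized_keys'
```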
diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md index 376f97240..3f8c1d955 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md @@ -2,7 +2,7 @@ ## Accessing IT4Innovations Internal Resources via VPN -!!! Note +!!! note **Failed to initialize connection subsystem Win 8.1 - 02-10-15 MS patch** Workaround can be found at [vpn-connection-fail-in-win-8.1](../../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.html) diff --git a/docs.it4i/index.md b/docs.it4i/index.md index 1ed5efa20..d6857876f 100644 --- a/docs.it4i/index.md +++ b/docs.it4i/index.md @@ -17,12 +17,12 @@ Use your IT4Innotations username and password to log in to the [support](http:// ## Required Proficiency -!!! Note +!!! note You need basic proficiency in Linux environment. In order to use the system for your calculations, you need basic proficiency in Linux environment. To gain the proficiency, we recommend you reading the [introduction to Linux](http://www.tldp.org/LDP/intro-linux/html/) operating system environment and installing a Linux distribution on your personal computer. A good choice might be the [CentOS](http://www.centos.org/) distribution, as it is similar to systems on the clusters at IT4Innovations. It's easy to install and use. In fact, any distribution would do. -!!! Note +!!! note Learn how to parallelize your code! In many cases, you will run your own code on the cluster. In order to fully exploit the cluster, you will need to carefully consider how to utilize all the cores available on the node and how to use multiple nodes at the same time. You need to **parallelize** your code. Proficieny in MPI, OpenMP, CUDA, UPC or GPI2 programming may be gained via the [training provided by IT4Innovations.](http://prace.it4i.cz) diff --git a/docs.it4i/salomon/capacity-computing.md b/docs.it4i/salomon/capacity-computing.md index 7c0404228..b72eb6fad 100644 --- a/docs.it4i/salomon/capacity-computing.md +++ b/docs.it4i/salomon/capacity-computing.md @@ -6,7 +6,7 @@ In many cases, it is useful to submit huge (100+) number of computational jobs i However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1500 per job array** -!!! Note +!!! note Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. - Use [Job arrays](capacity-computing.md#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs @@ -20,7 +20,7 @@ However, executing huge number of jobs via the PBS queue may strain the system. ## Job Arrays -!!! Note +!!! note Huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: @@ -151,7 +151,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/) ## GNU Parallel -!!! 
Note +!!! note Use GNU parallel to run many single core tasks on one node. GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. @@ -218,17 +218,17 @@ $ qsub -N JOBNAME jobscript In this example, we submit a job of 101 tasks. 24 input files will be processed in parallel. The 101 tasks on 24 cores are assumed to complete in less than 2 hours. -!!! Note +!!! note Use #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue. ## Job Arrays and GNU Parallel -!!! Note +!!! note Combine the Job arrays and GNU parallel for best throughput of single core jobs While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs. -!!! Note +!!! note Every subjob in an array runs GNU parallel to utilize all cores on the node ### GNU Parallel, Shared jobscript @@ -283,7 +283,7 @@ cp output $PBS_O_WORKDIR/$TASK.out In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once an task is finished, new task starts, until the number of tasks in numtasks file is reached. -!!! Note +!!! note Select subjob walltime and number of tasks per subjob carefully When deciding this values, think about following guiding rules : @@ -303,7 +303,7 @@ $ qsub -N JOBNAME -J 1-992:32 jobscript In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**48**, this must be the same as the number sent to numtasks file. Each subjob will run on full node and process 24 input files in parallel, 48 in total per subjob. Every subjob is assumed to complete in less than 2 hours. -!!! Note +!!! note Use #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue. ## Examples diff --git a/docs.it4i/salomon/environment-and-modules.md b/docs.it4i/salomon/environment-and-modules.md index f94fa017b..852b13ffb 100644 --- a/docs.it4i/salomon/environment-and-modules.md +++ b/docs.it4i/salomon/environment-and-modules.md @@ -23,7 +23,7 @@ then fi ``` -!!! Note +!!! note Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session interactivity for such commands as stated in the previous example. ### Application Modules @@ -56,7 +56,7 @@ Application modules on Salomon cluster are built using [EasyBuild](http://hpcuge vis: Visualization, plotting, documentation and typesetting ``` -!!! Note +!!! note The modules set up the application paths, library paths and environment variables for running particular application. The modules may be loaded, unloaded and switched, according to momentary needs. 
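A typical module session then looks as follows; the module names are illustrative, use module avail to see what is actually installed:

```bash
$ module avail                        # list available application modules
$ module load GCC                     # set up paths and environment for a compiler (illustrative name)
$ module list                         # show what is currently loaded
$ module swap GCC icc                 # switch to another module (illustrative names)
$ module unload icc                   # clean the environment again
```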
diff --git a/docs.it4i/salomon/job-priority.md b/docs.it4i/salomon/job-priority.md index 3f2693588..bb7623985 100644 --- a/docs.it4i/salomon/job-priority.md +++ b/docs.it4i/salomon/job-priority.md @@ -36,7 +36,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut # Jobs Queued in Queue qexp Are Not Calculated to Project's Usage. -!!! Note +!!! note Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/rsweb/salomon/projects>. Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job. @@ -65,7 +65,7 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu It means, that jobs with lower execution priority can be run before jobs with higher execution priority. -!!! Note +!!! note It is **very beneficial to specify the walltime** when submitting jobs. Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority. diff --git a/docs.it4i/salomon/job-submission-and-execution.md b/docs.it4i/salomon/job-submission-and-execution.md index 01d187007..0865e9c21 100644 --- a/docs.it4i/salomon/job-submission-and-execution.md +++ b/docs.it4i/salomon/job-submission-and-execution.md @@ -11,7 +11,7 @@ When allocating computational resources for the job, please specify 5. Project ID 6. Jobscript or interactive switch -!!! Note +!!! note Use the **qsub** command to submit your job to a queue for allocation of the computational resources. Submit the job using the qsub command: @@ -22,7 +22,7 @@ $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] job The qsub submits the job into the queue, in another words the qsub command creates a request to the PBS Job manager for allocation of specified resources. The resources will be allocated when available, subject to above described policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** -!!! Note +!!! note PBS statement nodes (qsub -l nodes=nodespec) is not supported on Salomon cluster. ### Job Submission Examples @@ -71,7 +71,7 @@ In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores ### UV2000 SMP -!!! Note +!!! note 14 NUMA nodes available on UV2000 Per NUMA node allocation. Jobs are isolated by cpusets. @@ -108,7 +108,7 @@ $ qsub -m n ### Placement by Name -!!! Note +!!! note Not useful for ordinary computing, suitable for node testing/bechmarking and management tasks. Specific nodes may be selected using PBS resource attribute host (for hostnames): @@ -135,7 +135,7 @@ For communication intensive jobs it is possible to set stricter requirement - to Nodes directly connected to the same InifiBand switch can communicate most efficiently. Using the same switch prevents hops in the network and provides for unbiased, most efficient network communication. There are 9 nodes directly connected to every InifiBand switch. -!!! Note +!!! note We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run job efficiently. Nodes directly connected to the one InifiBand switch can be allocated using node grouping on PBS resource attribute switch. @@ -148,7 +148,7 @@ $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob ### Placement by Specific InifiBand Switch -!!! Note +!!! 
note Not useful for ordinary computing, suitable for testing and management tasks. Nodes directly connected to the specific InifiBand switch can be selected using the PBS resource attribute _switch_. @@ -233,7 +233,7 @@ r1i0n11 ## Job Management -!!! Note +!!! note Check status of your jobs using the **qstat** and **check-pbs-jobs** commands ```bash @@ -312,7 +312,7 @@ Run loop 3 In this example, we see actual output (some iteration loops) of the job 35141.dm2 -!!! Note +!!! note Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel,** **qsig** or **qalter** commands You may release your allocation at any time, using qdel command @@ -337,12 +337,12 @@ $ man pbs_professional ### Jobscript -!!! Note +!!! note Prepare the jobscript to run batch jobs in the PBS queue system The Jobscript is a user made script, controlling sequence of commands for executing the calculation. It is often written in bash, other scripts may be used as well. The jobscript is supplied to PBS **qsub** command as an argument and executed by the PBS Professional workload manager. -!!! Note +!!! note The jobscript or interactive shell is executed on first of the allocated nodes. ```bash @@ -359,7 +359,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time In this example, the nodes r21u01n577, r21u02n578, r21u03n579, r21u04n580 were allocated for 1 hour via the qexp queue. The jobscript myjob will be executed on the node r21u01n577, while the nodes r21u02n578, r21u03n579, r21u04n580 are available for use as well. -!!! Note +!!! note The jobscript or interactive shell is by default executed in home directory ```bash @@ -373,7 +373,7 @@ $ pwd In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory. -!!! Note +!!! note All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. The allocated nodes are accessible via ssh from login nodes. The nodes may access each other via ssh as well. @@ -405,7 +405,7 @@ In this example, the hostname program is executed via pdsh from the interactive ### Example Jobscript for MPI Calculation -!!! Note +!!! note Production jobs must use the /scratch directory for I/O The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. @@ -437,12 +437,12 @@ exit In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes. -!!! Note +!!! note Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. -!!! Note +!!! note Store the qsub options within the jobscript. 
Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. ### Example Jobscript for MPI Calculation With Preloaded Inputs @@ -476,7 +476,7 @@ HTML commented section #2 (examples need to be reworked) ### Example Jobscript for Single Node Calculation -!!! Note +!!! note Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. Be very careful, use of RAM disk filesystem is at the expense of operational memory. Example jobscript for single node calculation, using [local scratch](storage/) on the node: diff --git a/docs.it4i/salomon/prace.md b/docs.it4i/salomon/prace.md index eb90adea5..f50e9f8d8 100644 --- a/docs.it4i/salomon/prace.md +++ b/docs.it4i/salomon/prace.md @@ -202,7 +202,7 @@ Generally both shared file systems are available through GridFTP: More information about the shared file systems is available [here](storage/). -!!! Hint +!!! hint `prace` directory is used for PRACE users on the SCRATCH file system. | Data type | Default path | @@ -248,7 +248,7 @@ PRACE users should check their project accounting using the [PRACE Accounting To Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received local password may check at any time, how many core-hours have been consumed by themselves and their projects using the command "it4ifree". You need to know your user password to use the command and that the displayed core hours are "system core hours" which differ from PRACE "standardized core hours". -!!! Note +!!! note The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> ```bash diff --git a/docs.it4i/salomon/resource-allocation-and-job-execution.md b/docs.it4i/salomon/resource-allocation-and-job-execution.md index 7f8c1e3dc..4452501c5 100644 --- a/docs.it4i/salomon/resource-allocation-and-job-execution.md +++ b/docs.it4i/salomon/resource-allocation-and-job-execution.md @@ -13,14 +13,14 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qfat**, the queue to access SMP UV2000 machine - **qfree**, the Free resource utilization queue -!!! Note +!!! note Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> Read more on the [Resource Allocation Policy](resources-allocation-policy/) page. ## Job Submission and Execution -!!! Note +!!! note Use the **qsub** command to submit your jobs. The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is entire node, 24 cores**, with exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** diff --git a/docs.it4i/salomon/resources-allocation-policy.md b/docs.it4i/salomon/resources-allocation-policy.md index 1ceaf4846..a9f54e99f 100644 --- a/docs.it4i/salomon/resources-allocation-policy.md +++ b/docs.it4i/salomon/resources-allocation-policy.md @@ -4,7 +4,7 @@ The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The fair-share at Anselm ensures that individual users may consume approximately equal amount of resources per week. 
Detailed information in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following table provides the queue partitioning overview: -!!! Note +!!! note Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> | queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime | @@ -17,7 +17,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const | **qfree** Free resource queue | yes | none required | 752 nodes, max 86 per job | 24 | -1024 | no | 12 / 12h | | **qviz** Visualization queue | yes | none required | 2 (with NVIDIA Quadro K5000) | 4 | 150 | no | 1 / 8h | -!!! Note +!!! note **The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy/#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue. - **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour. @@ -28,7 +28,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for persmission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 24 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours. - **qviz**, the Visualization queue: Intended for pre-/post-processing using OpenGL accelerated graphics. Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 73 GB of RAM and 1/7 of the GPU capacity (default "chunk"). If more GPU power or RAM is required, it is recommended to allocate more chunks (with 4 cores each) up to one whole node per user, so that all 28 cores, 512 GB RAM and whole GPU is exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum. -!!! Note +!!! note To access node with Xeon Phi co-processor user needs to specify that in [job submission select statement](job-submission-and-execution/). ### Notes @@ -41,7 +41,7 @@ Salomon users may check current queue configuration at <https://extranet.it4i.cz ### Queue Status -!!! Note +!!! 
note Check the status of jobs, queues and compute nodes at [https://extranet.it4i.cz/rsweb/salomon/](https://extranet.it4i.cz/rsweb/salomon)  @@ -119,7 +119,7 @@ The resources that are currently subject to accounting are the core-hours. The c ### Check Consumed Resources -!!! Note +!!! note The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. diff --git a/docs.it4i/salomon/shell-and-data-access.md b/docs.it4i/salomon/shell-and-data-access.md index 26333a6f9..44fc731b1 100644 --- a/docs.it4i/salomon/shell-and-data-access.md +++ b/docs.it4i/salomon/shell-and-data-access.md @@ -4,7 +4,7 @@ The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, login3 and login4 at address salomon.it4i.cz. The login nodes may be addressed specifically, by prepending the login node name to the address. -!!! Note +!!! note The alias salomon.it4i.cz is currently not available through VPN connection. Please use loginX.salomon.it4i.cz when connected to VPN. | Login address | Port | Protocol | Login node | @@ -17,7 +17,7 @@ The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) -!!! Note +!!! note Please verify SSH fingerprints during the first logon. They are identical on all login nodes: f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) 70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA) @@ -56,7 +56,7 @@ Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com [username@login2.salomon ~]$ ``` -!!! Note +!!! note The environment is **not** shared between login nodes, except for [shared filesystems](storage/). ## Data Transfer @@ -120,7 +120,7 @@ Outgoing connections, from Salomon Cluster login nodes to the outside world, are | 443 | https | | 9418 | git | -!!! Note +!!! note Please use **ssh port forwarding** and proxy servers to connect from Salomon to all other remote ports. Outgoing connections, from Salomon Cluster compute nodes are restricted to the internal network. Direct connections form compute nodes to outside world are cut. @@ -129,7 +129,7 @@ Outgoing connections, from Salomon Cluster compute nodes are restricted to the i ### Port Forwarding From Login Nodes -!!! Note +!!! note Port forwarding allows an application running on Salomon to connect to arbitrary remote host and port. It works by tunneling the connection from Salomon back to users workstation and forwarding from the workstation to the remote host. @@ -170,7 +170,7 @@ In this example, we assume that port forwarding from login1:6000 to remote.host. Port forwarding is static, each single port is mapped to a particular port on remote host. Connection to other remote host, requires new forward. -!!! Note +!!! note Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. 
To establish SOCKS proxy server listening on port 1080 run: diff --git a/docs.it4i/salomon/software/chemistry/molpro.md b/docs.it4i/salomon/software/chemistry/molpro.md index bf01d750f..eb0ffb2db 100644 --- a/docs.it4i/salomon/software/chemistry/molpro.md +++ b/docs.it4i/salomon/software/chemistry/molpro.md @@ -32,7 +32,7 @@ Compilation parameters are default: Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. -!!! Note +!!! note The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH filesystem](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch filesystem. diff --git a/docs.it4i/salomon/software/chemistry/phono3py.md b/docs.it4i/salomon/software/chemistry/phono3py.md index b453bb6fd..b0a136614 100644 --- a/docs.it4i/salomon/software/chemistry/phono3py.md +++ b/docs.it4i/salomon/software/chemistry/phono3py.md @@ -4,7 +4,7 @@ This GPL software calculates phonon-phonon interactions via the third order force constants. It allows to obtain lattice thermal conductivity, phonon lifetime/linewidth, imaginary part of self energy at the lowest order, joint density of states (JDOS) and weighted-JDOS. For details see Phys. Rev. B 91, 094306 (2015) and <http://atztogo.github.io/phono3py/index.html> -!!! Note +!!! note Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module ```bash diff --git a/docs.it4i/salomon/software/compilers.md b/docs.it4i/salomon/software/compilers.md index d493d62f0..04ade837b 100644 --- a/docs.it4i/salomon/software/compilers.md +++ b/docs.it4i/salomon/software/compilers.md @@ -140,7 +140,7 @@ As default UPC network the "smp" is used. This is very quick and easy way for te For production runs, it is recommended to use the native InfiniBand implementation of UPC network "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. -!!! Warning +!!! warning Selection of the network is done at the compile time and not at runtime (as expected)! Example UPC code: diff --git a/docs.it4i/salomon/software/debuggers/aislinn.md b/docs.it4i/salomon/software/debuggers/aislinn.md index 29bb3c4e8..e4625c002 100644 --- a/docs.it4i/salomon/software/debuggers/aislinn.md +++ b/docs.it4i/salomon/software/debuggers/aislinn.md @@ -5,7 +5,7 @@ - Aislinn is open-source software; you can use it without any licensing limitations. - Web page of the project: <http://verif.cs.vsb.cz/aislinn/> -!!! Note +!!! note Aislinn is software developed at IT4Innovations and some parts are still considered experimental. If you have any questions or experienced any problems, please contact the author: <mailto:stanislav.bohm@vsb.cz>. 
### Usage diff --git a/docs.it4i/salomon/software/debuggers/allinea-ddt.md b/docs.it4i/salomon/software/debuggers/allinea-ddt.md index 3cd22b2c5..42c0b4aa0 100644 --- a/docs.it4i/salomon/software/debuggers/allinea-ddt.md +++ b/docs.it4i/salomon/software/debuggers/allinea-ddt.md @@ -47,7 +47,7 @@ $ mpif90 -g -O0 -o test_debug test.f Before debugging, you need to compile your code with theses flags: -!!! Note +!!! note \- **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. - - **O0** : Suppress all optimizations. diff --git a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md index d774d7e4a..e1bb453e7 100644 --- a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md @@ -68,7 +68,7 @@ This mode is useful for native Xeon Phi applications launched directly on the ca This mode is useful for applications that are launched from the host and use offload, OpenCL or mpirun. In *Analysis Target* window, select *Intel Xeon Phi coprocessor (native)*, choose path to the binaryand MIC card to run on. -!!! Note +!!! note If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later : diff --git a/docs.it4i/salomon/software/debuggers/total-view.md b/docs.it4i/salomon/software/debuggers/total-view.md index 450efd1d0..17a2d4234 100644 --- a/docs.it4i/salomon/software/debuggers/total-view.md +++ b/docs.it4i/salomon/software/debuggers/total-view.md @@ -45,7 +45,7 @@ Compile the code: Before debugging, you need to compile your code with theses flags: -!!! Note +!!! note **-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. **-O0** : Suppress all optimizations. @@ -80,7 +80,7 @@ To debug a serial code use: To debug a parallel code compiled with **OpenMPI** you need to setup your TotalView environment: -!!! Hint +!!! hint To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file. ```bash diff --git a/docs.it4i/salomon/software/intel-xeon-phi.md b/docs.it4i/salomon/software/intel-xeon-phi.md index 4dfc251a4..65457058c 100644 --- a/docs.it4i/salomon/software/intel-xeon-phi.md +++ b/docs.it4i/salomon/software/intel-xeon-phi.md @@ -105,7 +105,7 @@ For debugging purposes it is also recommended to set environment variable "OFFLO A very basic example of code that employs offload programming technique is shown in the next listing. -!!! Note +!!! note This code is sequential and utilizes only single core of the accelerator. ```bash @@ -232,7 +232,7 @@ During the compilation Intel compiler shows which loops have been vectorized in Some interesting compiler flags useful not only for code debugging are: -!!! Note +!!! 
note Debugging openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic level @@ -328,7 +328,7 @@ Following example show how to automatically offload an SGEMM (single precision - } ``` -!!! Note +!!! note This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** To compile a code using Intel compiler use: @@ -371,7 +371,7 @@ To compile a code user has to be connected to a compute with MIC and load Intel $ module load intel/13.5.192 ``` -!!! Note +!!! note Particular version of the Intel module is specified. This information is used later to specify the correct library paths. To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: @@ -414,12 +414,12 @@ If the code is parallelized using OpenMP a set of additional libraries is requir mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH ``` -!!! Note +!!! note The path exported contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: -!!! Note +!!! note /apps/intel/composer_xe_2013.5.192/compiler/lib/mic - libiomp5.so @@ -500,7 +500,7 @@ After executing the complied binary file, following output should be displayed. ... ``` -!!! Note +!!! note More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. @@ -540,7 +540,7 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as ... ``` -!!! Hint +!!! hint GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. ## MPI @@ -602,7 +602,7 @@ An example of basic MPI version of "hello-world" example in C language, that can Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: -!!! Note +!!! note **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) **Coprocessor-only model** - all MPI ranks reside only on the coprocessors. @@ -649,7 +649,7 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH ``` -!!! Note +!!! note - this file sets up both environmental variable for both MPI and OpenMP libraries. - this file sets up the paths to a particular version of Intel MPI library and particular version of an Intel compiler. These versions have to match with loaded modules. @@ -702,7 +702,7 @@ or using mpirun $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic ``` -!!! Note +!!! 
note - the full path to the binary has to specified (here: "**>~/mpi-test-mic**") - the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code @@ -715,7 +715,7 @@ The output should be again similar to: Hello world from process 0 of 4 on host cn207-mic0 ``` -!!! Hint +!!! hint **"mpiexec.hydra"** requires a file the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: @@ -750,7 +750,7 @@ For example: This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators. -!!! Note +!!! note At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: - to connect to the second node : `$ ssh cn205` - to connect to the accelerator on the first node from the first node: `$ ssh cn204-mic0` or `$ ssh mic0` @@ -882,14 +882,14 @@ A possible output of the MPI "hello-world" example executed on two hosts and two Hello world from process 7 of 8 on host cn205-mic0 ``` -!!! Note +!!! note At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only. **Using the PBS automatically generated node-files** PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated: -!!! Note +!!! note **Host only node-file:** - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file: diff --git a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md index e66ab9fc8..0af557ecf 100644 --- a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md +++ b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md @@ -94,7 +94,7 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity -!!! Note +!!! note Important! Bind every OpenMP thread to a core! In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: diff --git a/docs.it4i/salomon/software/mpi/mpi.md b/docs.it4i/salomon/software/mpi/mpi.md index 4b2b721ff..411d54dda 100644 --- a/docs.it4i/salomon/software/mpi/mpi.md +++ b/docs.it4i/salomon/software/mpi/mpi.md @@ -126,7 +126,7 @@ Consider these ways to run an MPI program: **Two MPI** processes per node, using 12 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 12 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and numa effect overheads. -!!! Note +!!! note Important! Bind every OpenMP thread to a core! In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. 
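A minimal sketch of the pinning, set before the MPI launch; the affinity strings are illustrative and should match your processes-per-node layout:

```bash
$ export KMP_AFFINITY=granularity=fine,compact   # Intel OpenMP runtime: pin threads to cores
$ export GOMP_CPU_AFFINITY="0-23"                # GNU OpenMP runtime: explicit core list (24-core node)
$ mpirun -n 2 ./mympiprog.x                      # illustrative hybrid run, 2 ranks with OpenMP threads each
```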
diff --git a/docs.it4i/salomon/software/numerical-languages/matlab.md b/docs.it4i/salomon/software/numerical-languages/matlab.md index b9f7bc5a3..6b9463cd5 100644 --- a/docs.it4i/salomon/software/numerical-languages/matlab.md +++ b/docs.it4i/salomon/software/numerical-languages/matlab.md @@ -129,7 +129,7 @@ The last part of the configuration is done directly in the user Matlab script be This script creates a scheduler object "cluster" of type "local" that starts workers locally. -!!! Hint +!!! hint Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling the parpool(sched, ...) function. The last step is to start matlabpool with the "cluster" object and the correct number of workers. We have 24 cores per node, so we start 24 workers. @@ -213,7 +213,7 @@ You can start this script using batch mode the same way as in Local mode example This method is a "hack" invented by us to emulate the mpiexec functionality found in previous MATLAB versions. We leverage the MATLAB Generic Scheduler interface, but instead of submitting the workers to PBS, we launch the workers directly within the running job, thus we avoid the issues with the master script and workers running in separate jobs (issues with the license not being available, waiting for the worker's job to spawn, etc.). -!!! Warning +!!! warning This method is experimental. For this method, you need to use the SalomonDirect profile, import it in [the same way as SalomonPBSPro](matlab.md#running-parallel-matlab-using-distributed-computing-toolbox---engine) diff --git a/docs.it4i/salomon/storage.md b/docs.it4i/salomon/storage.md index e2750682c..e20921b13 100644 --- a/docs.it4i/salomon/storage.md +++ b/docs.it4i/salomon/storage.md @@ -60,7 +60,7 @@ There is default stripe configuration for Salomon Lustre file systems. However, 2. stripe_count the number of OSTs to stripe across; default is 1 for Salomon Lustre file systems; one can specify -1 to use all OSTs in the file system. 3. stripe_offset The index of the OST where the first stripe is to be placed; default is -1 which results in random selection; using a non-default value is NOT recommended. -!!! Note +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. Use the lfs getstripe command for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance. The correct stripe setting depends on your needs and file access patterns. @@ -94,14 +94,14 @@ $ man lfs ### Hints on Lustre Striping -!!! Note +!!! note Increase the stripe_count for parallel I/O to the same file. When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the stripe_count is set to a larger value. The stripe count sets the number of OSTs the file will be written to. By default, the stripe count is set to 1. While this default setting provides for efficient access of metadata (for example to support the ls -l command), large files should use stripe counts greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file. Another good practice is to make the stripe count an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs.
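As a hedged illustration of the `lfs` commands mentioned above (the directory name `parallel_output` is a hypothetical example, not taken from this documentation), checking and raising the stripe count for a directory used for parallel writes might look like this:

```
$ lfs getstripe /scratch/work/user/username/parallel_output          # show the current stripe settings
$ lfs setstripe -c 16 /scratch/work/user/username/parallel_output    # new files created here are striped over 16 OSTs
```

Note that on a directory `lfs setstripe` only affects files created afterwards; existing files keep their original layout.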
For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes. -!!! Note +!!! note Using a large stripe size can improve performance when accessing very large files. A large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. The choice of stripe size has no effect on a single-stripe file. @@ -219,7 +219,7 @@ Default ACL mechanism can be used to replace setuid/setgid permissions on direct Users' home directories /home/username reside on the HOME file system. Accessible capacity is 0.5 PB, shared among all users. Individual users are restricted by file system usage quotas, set to 250 GB per user. If 250 GB should prove insufficient for a particular user, please contact [support](https://support.it4i.cz/rt); the quota may be lifted upon request. -!!! Note +!!! note The HOME file system is intended for preparation, evaluation, processing and storage of data generated by active Projects. The HOME should not be used to archive data of past Projects or other unrelated data. @@ -240,14 +240,14 @@ The workspace is backed up, such that it can be restored in case of catasthropic The WORK workspace resides on the SCRATCH file system. Users may create subdirectories and files in the directories **/scratch/work/user/username** and **/scratch/work/project/projectid**. The /scratch/work/user/username is private to the user, much like the home directory. The /scratch/work/project/projectid is accessible to all users involved in project projectid. -!!! Note +!!! note The WORK workspace is intended to store users' project data as well as for high performance access to input and output files. All project data should be removed once the project is finished. The data on the WORK workspace are not backed up. Files on the WORK file system are **persistent** (not automatically deleted) throughout the duration of the project. The WORK workspace is hosted on the SCRATCH file system. The SCRATCH is realized as a Lustre parallel file system and is available from all login and computational nodes. Default stripe size is 1 MB, stripe count is 1. There are 54 OSTs dedicated to the SCRATCH file system. -!!! Note +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | WORK workspace | | @@ -265,7 +265,7 @@ The WORK workspace is hosted on SCRATCH file system. The SCRATCH is realized as The TEMP workspace resides on the SCRATCH file system. The TEMP workspace access point is /scratch/temp. Users may freely create subdirectories and files on the workspace. Accessible capacity is 1.6 PB, shared among all users on TEMP and WORK. Individual users are restricted by file system usage quotas, set to 100 TB per user. The purpose of this quota is to prevent runaway programs from filling the entire file system and denying service to other users. If 100 TB should prove insufficient for a particular user, please contact [support](https://support.it4i.cz/rt); the quota may be lifted upon request. -!!! Note +!!! note The TEMP workspace is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the TEMP workspace as their working directory. Users are advised to save the necessary data from the TEMP workspace to HOME or WORK after the calculations and clean up the scratch files.
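As a minimal sketch of this workflow (the program name `myprog.x`, the input/output file names and the per-user subdirectory are hypothetical placeholders, not prescribed here), a jobscript might stage data to TEMP, compute there and save the results before cleaning up:

```
#!/bin/bash
# Hypothetical sketch: work in the TEMP workspace, then save results and clean up.
SCRDIR=/scratch/temp/$USER/$PBS_JOBID
mkdir -p $SCRDIR
cd $SCRDIR || exit 1

cp $PBS_O_WORKDIR/input.dat .                   # stage input from the submit directory
$PBS_O_WORKDIR/myprog.x input.dat > output.dat  # I/O intensive work runs on TEMP

cp output.dat $PBS_O_WORKDIR/                   # save results back to HOME or WORK
cd $PBS_O_WORKDIR && rm -rf $SCRDIR             # clean up the scratch files
```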
@@ -274,7 +274,7 @@ The TEMP workspace resides on SCRATCH file system. The TEMP workspace accesspoin The TEMP workspace is hosted on the SCRATCH file system. The SCRATCH is realized as a Lustre parallel file system and is available from all login and computational nodes. Default stripe size is 1 MB, stripe count is 1. There are 54 OSTs dedicated to the SCRATCH file system. -!!! Note +!!! note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | TEMP workspace | | @@ -292,14 +292,14 @@ The TEMP workspace is hosted on SCRATCH file system. The SCRATCH is realized as Every computational node is equipped with a file system realized in memory, the so-called RAM disk. -!!! Note +!!! note Use the RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of the RAM disk file system is at the expense of operational memory. The local RAM disk is mounted as /ramdisk and is accessible to the user in the /ramdisk/$PBS_JOBID directory. The local RAM disk file system is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. The size of the RAM disk file system is limited. Be very careful, use of the RAM disk file system is at the expense of operational memory. It is not recommended to allocate a large amount of memory and use a large amount of data in the RAM disk file system at the same time. -!!! Note +!!! note The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation ends. Users should take care to save the output data from within the jobscript. | RAM disk | | @@ -323,7 +323,7 @@ The local RAM disk file system is intended for temporary scratch data generated Do not use shared file systems at IT4Innovations as a backup for large amounts of data or for long-term archiving purposes. -!!! Note +!!! note IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use the [CESNET Storage service](https://du.cesnet.cz/). The CESNET Storage service can be used for research purposes, mainly by academic staff and students of research institutions in the Czech Republic. @@ -342,14 +342,14 @@ The procedure to obtain the CESNET access is quick and trouble-free. ### Understanding CESNET Storage -!!! Note +!!! note It is very important to understand the CESNET storage before uploading data. [Please read](https://du.cesnet.cz/en/navody/home-migrace-plzen/start) first. Once registered for CESNET Storage, you may [access the storage](https://du.cesnet.cz/en/navody/faq/start) in a number of ways. We recommend the SSHFS and RSYNC methods. ### SSHFS Access -!!! Note +!!! note SSHFS: The storage will be mounted like a local hard drive The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it were a local removable hard drive. Files can then be copied in and out in the usual fashion. @@ -394,7 +394,7 @@ Once done, please remember to unmount the storage ### Rsync Access -!!! Note +!!! note Rsync provides delta transfer for best performance and can resume interrupted transfers. Rsync is a fast and extraordinarily versatile file copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination.
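As a non-authoritative sketch only (the host name `cesnet-storage-host` and the remote path are placeholders; the real endpoints are listed in the CESNET guides linked above), a typical rsync transfer over SSH might look like this:

```
$ rsync -av --progress mydata/ username@cesnet-storage-host:archive/mydata/   # push local data to the storage
$ rsync -av --progress username@cesnet-storage-host:archive/mydata/ mydata/   # pull it back later in the same way
```

Only the differences between source and destination are sent, so repeating the command after changes or an interruption transfers comparatively little data.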
Rsync is widely used for backups and mirroring and as an improved copy command for everyday use. -- GitLab