From 201844d651fc2e9774a655ccaea01def6167160c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?David=20Hrb=C3=A1=C4=8D?= <david@hrbac.cz> Date: Thu, 26 Jan 2017 20:50:46 +0100 Subject: [PATCH] Note clean-up --- .../capacity-computing.md | 12 +++---- .../environment-and-modules.md | 4 +-- .../job-priority.md | 4 +-- .../job-submission-and-execution.md | 20 ++++++------ .../anselm-cluster-documentation/network.md | 2 +- .../anselm-cluster-documentation/prace.md | 2 +- .../resource-allocation-and-job-execution.md | 6 ++-- .../resources-allocation-policy.md | 6 ++-- .../shell-and-data-access.md | 12 +++---- .../software/chemistry/molpro.md | 2 +- .../debuggers/allinea-performance-reports.md | 2 +- .../software/debuggers/papi.md | 2 +- ...intel-integrated-performance-primitives.md | 2 +- .../software/intel-suite/intel-mkl.md | 4 +-- .../software/intel-suite/intel-tbb.md | 2 +- .../software/intel-xeon-phi.md | 28 ++++++++-------- .../software/isv_licenses.md | 2 +- .../software/kvirtualization.md | 4 +-- .../software/mpi/mpi.md | 8 ++--- .../software/mpi/running-mpich2.md | 6 ++-- .../software/numerical-languages/matlab.md | 6 ++-- .../numerical-languages/matlab_1314.md | 6 ++-- .../software/numerical-languages/octave.md | 2 +- .../software/numerical-languages/r.md | 6 ++-- .../software/numerical-libraries/hdf5.md | 2 +- .../magma-for-intel-xeon-phi.md | 14 ++++---- .../software/nvidia-cuda.md | 2 +- .../software/openfoam.md | 6 ++-- .../anselm-cluster-documentation/storage.md | 30 ++++++++--------- .../graphical-user-interface/vnc.md | 6 ++-- .../x-window-system.md | 2 +- .../accessing-the-clusters/introduction.md | 2 +- .../shell-access-and-data-transfer/putty.md | 2 +- .../ssh-keys.md | 4 +-- .../accessing-the-clusters/vpn1-access.md | 2 +- docs.it4i/index.md | 4 +-- docs.it4i/salomon/capacity-computing.md | 12 +++---- docs.it4i/salomon/environment-and-modules.md | 4 +-- docs.it4i/salomon/job-priority.md | 4 +-- .../salomon/job-submission-and-execution.md | 32 +++++++++---------- docs.it4i/salomon/prace.md | 2 +- .../resource-allocation-and-job-execution.md | 4 +-- .../salomon/resources-allocation-policy.md | 10 +++--- docs.it4i/salomon/shell-and-data-access.md | 12 +++---- .../salomon/software/chemistry/molpro.md | 2 +- .../salomon/software/chemistry/phono3py.md | 2 +- .../salomon/software/debuggers/aislinn.md | 2 +- .../salomon/software/debuggers/allinea-ddt.md | 2 +- .../debuggers/intel-vtune-amplifier.md | 2 +- .../salomon/software/debuggers/total-view.md | 4 +-- docs.it4i/salomon/software/intel-xeon-phi.md | 28 ++++++++-------- .../salomon/software/mpi/Running_OpenMPI.md | 2 +- docs.it4i/salomon/software/mpi/mpi.md | 2 +- docs.it4i/salomon/storage.md | 26 +++++++-------- 54 files changed, 189 insertions(+), 189 deletions(-) diff --git a/docs.it4i/anselm-cluster-documentation/capacity-computing.md b/docs.it4i/anselm-cluster-documentation/capacity-computing.md index cab8b9c1..e76bdfc3 100644 --- a/docs.it4i/anselm-cluster-documentation/capacity-computing.md +++ b/docs.it4i/anselm-cluster-documentation/capacity-computing.md @@ -6,7 +6,7 @@ In many cases, it is useful to submit huge (>100+) number of computational jobs However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array** -!!! Note "Note" +!!! 
Note Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. - Use [Job arrays](capacity-computing/#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs @@ -20,7 +20,7 @@ However, executing huge number of jobs via the PBS queue may strain the system. ## Job Arrays -!!! Note "Note" +!!! Note Huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: @@ -149,7 +149,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/) ## GNU Parallel -!!! Note "Note" +!!! Note Use GNU parallel to run many single core tasks on one node. GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. @@ -220,12 +220,12 @@ Please note the #PBS directives in the beginning of the jobscript file, dont' fo ## Job Arrays and GNU Parallel -!!! Note "Note" +!!! Note Combine the Job arrays and GNU parallel for best throughput of single core jobs While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs. -!!! Note "Note" +!!! Note Every subjob in an array runs GNU parallel to utilize all cores on the node ### GNU Parallel, Shared jobscript @@ -280,7 +280,7 @@ cp output $PBS_O_WORKDIR/$TASK.out In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once an task is finished, new task starts, until the number of tasks in numtasks file is reached. -!!! Note "Note" +!!! Note Select subjob walltime and number of tasks per subjob carefully When deciding this values, think about following guiding rules: diff --git a/docs.it4i/anselm-cluster-documentation/environment-and-modules.md b/docs.it4i/anselm-cluster-documentation/environment-and-modules.md index 7d328353..c674a36c 100644 --- a/docs.it4i/anselm-cluster-documentation/environment-and-modules.md +++ b/docs.it4i/anselm-cluster-documentation/environment-and-modules.md @@ -23,14 +23,14 @@ then fi ``` -!!! Note "Note" +!!! Note Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Conside utilization of SSH session interactivity for such commands as stated in the previous example. ### Application Modules In order to configure your shell for running particular application on Anselm we use Module package interface. -!!! Note "Note" +!!! Note The modules set up the application paths, library paths and environment variables for running particular application. 
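For illustration only (module names and versions vary; `intel/13.5.192` is the version referenced elsewhere in this documentation), a typical module session looks like this:

```bash
$ module avail                 # list all modules available on Anselm
$ module load intel/13.5.192   # set up paths and environment variables for the Intel suite
$ module list                  # show the currently loaded modules
$ module unload intel          # remove the module from the environment again
```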
We have also second modules repository. This modules repository is created using tool called EasyBuild. On Salomon cluster, all modules will be build by this tool. If you want to use software from this modules repository, please follow instructions in section [Application Modules Path Expansion](environment-and-modules/#EasyBuild). diff --git a/docs.it4i/anselm-cluster-documentation/job-priority.md b/docs.it4i/anselm-cluster-documentation/job-priority.md index fbd57f0f..02e86ada 100644 --- a/docs.it4i/anselm-cluster-documentation/job-priority.md +++ b/docs.it4i/anselm-cluster-documentation/job-priority.md @@ -35,7 +35,7 @@ usage<sub>Total</sub> is total usage by all users, by all projects. Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut in half periodically, at the interval 168 hours (one week). Jobs queued in queue qexp are not calculated to project's usage. -!!! Note "Note" +!!! Note Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>. Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job. @@ -64,7 +64,7 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu It means, that jobs with lower execution priority can be run before jobs with higher execution priority. -!!! Note "Note" +!!! Note It is **very beneficial to specify the walltime** when submitting jobs. Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority. diff --git a/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md b/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md index 8ab52e9b..ebb2b7cd 100644 --- a/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md +++ b/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md @@ -11,7 +11,7 @@ When allocating computational resources for the job, please specify 5. Project ID 6. Jobscript or interactive switch -!!! Note "Note" +!!! Note Use the **qsub** command to submit your job to a queue for allocation of the computational resources. Submit the job using the qsub command: @@ -132,7 +132,7 @@ Although this example is somewhat artificial, it demonstrates the flexibility of ## Job Management -!!! Note "Note" +!!! Note Check status of your jobs using the **qstat** and **check-pbs-jobs** commands ```bash @@ -213,7 +213,7 @@ Run loop 3 In this example, we see actual output (some iteration loops) of the job 35141.dm2 -!!! Note "Note" +!!! Note Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel**, **qsig** or **qalter** commands You may release your allocation at any time, using qdel command @@ -238,12 +238,12 @@ $ man pbs_professional ### Jobscript -!!! Note "Note" +!!! Note Prepare the jobscript to run batch jobs in the PBS queue system The Jobscript is a user made script, controlling sequence of commands for executing the calculation. It is often written in bash, other scripts may be used as well. The jobscript is supplied to PBS **qsub** command as an argument and executed by the PBS Professional workload manager. -!!! Note "Note" +!!! Note The jobscript or interactive shell is executed on first of the allocated nodes. ```bash @@ -273,7 +273,7 @@ $ pwd In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory. -!!! 
Note "Note" +!!! Note All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. The allocated nodes are accessible via ssh from login nodes. The nodes may access each other via ssh as well. @@ -305,7 +305,7 @@ In this example, the hostname program is executed via pdsh from the interactive ### Example Jobscript for MPI Calculation -!!! Note "Note" +!!! Note Production jobs must use the /scratch directory for I/O The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. @@ -337,12 +337,12 @@ exit In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes. -!!! Note "Note" +!!! Note Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. -!!! Note "Note" +!!! Note Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. Example jobscript for an MPI job with preloaded inputs and executables, options for qsub are stored within the script : @@ -375,7 +375,7 @@ sections. ### Example Jobscript for Single Node Calculation -!!! Note "Note" +!!! Note Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. Example jobscript for single node calculation, using [local scratch](storage/) on the node: diff --git a/docs.it4i/anselm-cluster-documentation/network.md b/docs.it4i/anselm-cluster-documentation/network.md index 307931a5..c0226db1 100644 --- a/docs.it4i/anselm-cluster-documentation/network.md +++ b/docs.it4i/anselm-cluster-documentation/network.md @@ -8,7 +8,7 @@ All compute and login nodes of Anselm are interconnected by a high-bandwidth, lo The compute nodes may be accessed via the InfiniBand network using ib0 network interface, in address range 10.2.1.1-209. The MPI may be used to establish native InfiniBand connection among the nodes. -!!! Note "Note" +!!! Note The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via native InfiniBand protocol. The Fat tree topology ensures that peak transfer rates are achieved between any two nodes, independent of network traffic exchanged among other nodes concurrently. 
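As an illustrative sketch only (the node count and the 10.2.1.x address below are examples, not a prescription), the InfiniBand interface can be inspected from within a job and other allocated nodes reached over their ib0 addresses:

```bash
$ qsub -q qexp -l select=2:ncpus=16 -I   # interactive job on two nodes, see Job submission
$ ip addr show ib0                       # this node's InfiniBand address, within 10.2.1.1-209
$ ssh 10.2.1.110 hostname                # reach another allocated node via its ib0 address
```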
diff --git a/docs.it4i/anselm-cluster-documentation/prace.md b/docs.it4i/anselm-cluster-documentation/prace.md index 4a7417fd..579339bb 100644 --- a/docs.it4i/anselm-cluster-documentation/prace.md +++ b/docs.it4i/anselm-cluster-documentation/prace.md @@ -235,7 +235,7 @@ PRACE users should check their project accounting using the [PRACE Accounting To Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received local password may check at any time, how many core-hours have been consumed by themselves and their projects using the command "it4ifree". Please note that you need to know your user password to use the command and that the displayed core hours are "system core hours" which differ from PRACE "standardized core hours". -!!! Note "Note" +!!! Note The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> ```bash diff --git a/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md b/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md index 27340a1f..ae02e4b3 100644 --- a/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md +++ b/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md @@ -12,14 +12,14 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qnvidia**, **qmic**, **qfat**, the Dedicated queues - **qfree**, the Free resource utilization queue -!!! Note "Note" +!!! Note Check the queue status at <https://extranet.it4i.cz/anselm/> Read more on the [Resource AllocationPolicy](resources-allocation-policy/) page. ## Job Submission and Execution -!!! Note "Note" +!!! Note Use the **qsub** command to submit your jobs. The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is entire node, 16 cores**, with exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** @@ -28,7 +28,7 @@ Read more on the [Job submission and execution](job-submission-and-execution/) p ## Capacity Computing -!!! Note "Note" +!!! Note Use Job arrays when running huge number of jobs. Use GNU Parallel and/or Job arrays when running (many) single core jobs. diff --git a/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md b/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md index 9ed9bb7c..eab7a56a 100644 --- a/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md +++ b/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md @@ -4,7 +4,7 @@ The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The Fair-share at Anselm ensures that individual users may consume approximately equal amount of resources per week. Detailed information in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following table provides the queue partitioning overview: -!!! Note "Note" +!!! 
Note Check the queue status at <https://extranet.it4i.cz/anselm/> | queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime | @@ -15,7 +15,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const | qnvidia, qmic, qfat | yes | 0 | 23 total qnvidia4 total qmic2 total qfat | 16 | 200 | yes | 24/48 h | | qfree | yes | none required | 178 w/o accelerator | 16 | -1024 | no | 12 h | -!!! Note "Note" +!!! Note **The qfree queue is not free of charge**. [Normal accounting](#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue. **The qexp queue is equipped with the nodes not having the very same CPU clock speed.** Should you need the very same CPU speed, you have to select the proper nodes during the PSB job submission. @@ -113,7 +113,7 @@ The resources that are currently subject to accounting are the core-hours. The c ### Check Consumed Resources -!!! Note "Note" +!!! Note The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. diff --git a/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md b/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md index 2f3aad9b..d830fa79 100644 --- a/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md +++ b/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md @@ -53,7 +53,7 @@ Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com Example to the cluster login: -!!! Note "Note" +!!! Note The environment is **not** shared between login nodes, except for [shared filesystems](storage/#shared-filesystems). ## Data Transfer @@ -69,14 +69,14 @@ Data in and out of the system may be transferred by the [scp](http://en.wikipedi The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) -!!! Note "Note" +!!! Note Data transfer rates up to **160MB/s** can be achieved with scp or sftp. 1TB may be transferred in 1:50h. To achieve 160MB/s transfer rates, the end user must be connected by 10G line all the way to IT4Innovations and use computer with fast processor for the transfer. Using Gigabit ethernet connection, up to 110MB/s may be expected. Fast cipher (aes128-ctr) should be used. -!!! Note "Note" +!!! Note If you experience degraded data transfer performance, consult your local network provider. On linux or Mac, use scp or sftp client to transfer the data to Anselm: @@ -126,7 +126,7 @@ Outgoing connections, from Anselm Cluster login nodes to the outside world, are | 443 | https | | 9418 | git | -!!! Note "Note" +!!! Note Please use **ssh port forwarding** and proxy servers to connect from Anselm to all other remote ports. Outgoing connections, from Anselm Cluster compute nodes are restricted to the internal network. Direct connections form compute nodes to outside world are cut. @@ -135,7 +135,7 @@ Outgoing connections, from Anselm Cluster compute nodes are restricted to the in ### Port Forwarding From Login Nodes -!!! Note "Note" +!!! 
Note Port forwarding allows an application running on Anselm to connect to arbitrary remote host and port. It works by tunneling the connection from Anselm back to users workstation and forwarding from the workstation to the remote host. @@ -177,7 +177,7 @@ In this example, we assume that port forwarding from login1:6000 to remote.host. Port forwarding is static, each single port is mapped to a particular port on remote host. Connection to other remote host, requires new forward. -!!! Note "Note" +!!! Note Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: diff --git a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md index 7f1bdcd9..0918ca19 100644 --- a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md +++ b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md @@ -32,7 +32,7 @@ Compilation parameters are default: Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. -!!! Note "Note" +!!! Note The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH file system](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch file system. diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md index fdc57fb3..1a5df6b0 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md @@ -20,7 +20,7 @@ The module sets up environment variables, required for using the Allinea Perform ## Usage -!!! Note "Note" +!!! Note Use the the perf-report wrapper on your (MPI) program. Instead of [running your MPI program the usual way](../mpi/), use the the perf report wrapper: diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md index 689bdf61..0671850e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md @@ -190,7 +190,7 @@ Now the compiler won't remove the multiplication loop. (However it is still not ### Intel Xeon Phi -!!! Note "Note" +!!! Note PAPI currently supports only a subset of counters on the Intel Xeon Phi processor compared to Intel Xeon, for example the floating point operations counter is missing. 
To use PAPI in [Intel Xeon Phi](../intel-xeon-phi/) native applications, you need to load module with " -mic" suffix, for example " papi/5.3.2-mic" : diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md index 7d874b4d..5fef3b2c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md @@ -4,7 +4,7 @@ Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX vector instructions is available, via module ipp. The IPP is a very rich library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax, as well as cryptographic functions, linear algebra functions and many more. -!!! Note "Note" +!!! Note Check out IPP before implementing own math functions for data processing, it is likely already there. ```bash diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md index d887b4e5..d8faf804 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md @@ -23,7 +23,7 @@ Intel MKL version 13.5.192 is available on Anselm The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES -!!! Note "Note" +!!! Note The MKL library may be linked using any compiler. With intel compiler use -mkl option to link default threaded MKL. ### Interfaces @@ -47,7 +47,7 @@ You will need the mkl module loaded to run the mkl enabled executable. This may ### Threading -!!! Note "Note" +!!! Note Advantage in using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel. For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md index 24c6380f..4546ac07 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md @@ -13,7 +13,7 @@ Intel TBB version 4.1 is available on Anselm The module sets up environment variables, required for linking and running tbb enabled applications. -!!! Note "Note" +!!! 
Note Link the tbb library, using -ltbb ## Examples diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md index 5c0a71af..479af5a2 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md @@ -229,7 +229,7 @@ During the compilation Intel compiler shows which loops have been vectorized in Some interesting compiler flags useful not only for code debugging are: -!!! Note "Note" +!!! Note Debugging openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level @@ -326,7 +326,7 @@ Following example show how to automatically offload an SGEMM (single precision - } ``` -!!! Note "Note" +!!! Note Please note: This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** To compile a code using Intel compiler use: @@ -369,7 +369,7 @@ To compile a code user has to be connected to a compute with MIC and load Intel $ module load intel/13.5.192 ``` -!!! Note "Note" +!!! Note Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: @@ -412,12 +412,12 @@ If the code is parallelized using OpenMP a set of additional libraries is requir mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH ``` -!!! Note "Note" +!!! Note Please note that the path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: -!!! Note "Note" +!!! Note /apps/intel/composer_xe_2013.5.192/compiler/lib/mic - libiomp5.so @@ -498,7 +498,7 @@ After executing the complied binary file, following output should be displayed. ... ``` -!!! Note "Note" +!!! Note More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. @@ -538,7 +538,7 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as ... ``` -!!! Note "Note" +!!! Note Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. ## MPI @@ -600,7 +600,7 @@ An example of basic MPI version of "hello-world" example in C language, that can Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: -!!! Note "Note" +!!! Note **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) **Coprocessor-only model** - all MPI ranks reside only on the coprocessors. 
@@ -647,7 +647,7 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH ``` -!!! Note "Note" +!!! Note Please note: \- this file sets up both environmental variable for both MPI and OpenMP libraries. \- this file sets up the paths to a particular version of Intel MPI library and particular version of an Intel compiler. These versions have to match with loaded modules. @@ -701,7 +701,7 @@ or using mpirun $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic ``` -!!! Note "Note" +!!! Note Please note: \- the full path to the binary has to specified (here: "**>~/mpi-test-mic**") \- the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code @@ -715,7 +715,7 @@ The output should be again similar to: Hello world from process 0 of 4 on host cn207-mic0 ``` -!!! Note "Note" +!!! Note Please note that the **"mpiexec.hydra"** requires a file the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: ```bash @@ -748,7 +748,7 @@ For example: This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators. -!!! Note "Note" +!!! Note Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: - to connect to the second node : ** $ ssh cn205** @@ -881,14 +881,14 @@ A possible output of the MPI "hello-world" example executed on two hosts and two Hello world from process 7 of 8 on host cn205-mic0 ``` -!!! Note "Note" +!!! Note Please note: At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only. **Using the PBS automatically generated node-files** PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated: -!!! Note "Note" +!!! Note **Host only node-file:** - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file: diff --git a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md index 577478f9..7d7dc89b 100644 --- a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md +++ b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md @@ -10,7 +10,7 @@ If an ISV application was purchased for educational (research) purposes and also ## Overview of the Licenses Usage -!!! Note "Note" +!!! Note The overview is generated every minute and is accessible from web or command line interface. ### Web Interface diff --git a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md index 259d6b09..31e371f2 100644 --- a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md +++ b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md @@ -38,7 +38,7 @@ For running Windows application (when source code and Linux native application a IT4Innovations does not provide any licenses for operating systems and software of virtual machines. Users are ( in accordance with [Acceptable use policy document](http://www.it4i.cz/acceptable-use-policy.pdf)) fully responsible for licensing all software running in virtual machines on Anselm. Be aware of complex conditions of licensing software in virtual environments. -!!! 
Note "Note" +!!! Note Users are responsible for licensing OS e.g. MS Windows and all software running in their virtual machines. ## Howto @@ -248,7 +248,7 @@ Run virtual machine using optimized devices, user network back-end with sharing Thanks to port forwarding you can access virtual machine via SSH (Linux) or RDP (Windows) connecting to IP address of compute node (and port 2222 for SSH). You must use VPN network). -!!! Note "Note" +!!! Note Keep in mind, that if you use virtio devices, you must have virtio drivers installed on your virtual machine. ### Networking and Data Sharing diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md index dd92c4e6..b2290555 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md @@ -60,7 +60,7 @@ In this example, the openmpi 1.6.5 using intel compilers is activated ## Compiling MPI Programs -!!! Note "Note" +!!! Note After setting up your MPI environment, compile your program using one of the mpi wrappers ```bash @@ -107,7 +107,7 @@ Compile the above example with ## Running MPI Programs -!!! Note "Note" +!!! Note The MPI program executable must be compatible with the loaded MPI module. Always compile and execute using the very same MPI module. @@ -119,7 +119,7 @@ The MPI program executable must be available within the same path on all nodes. Optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern. -!!! Note "Note" +!!! Note Consider these ways to run an MPI program: 1. One MPI process per node, 16 threads per process @@ -130,7 +130,7 @@ Optimal way to run an MPI program depends on its memory requirements, memory acc **Two MPI** processes per node, using 8 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 8 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and NUMA effect overheads. -!!! Note "Note" +!!! Note Important! Bind every OpenMP thread to a core! In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md index 0d5f5945..9fe89641 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md @@ -6,7 +6,7 @@ The MPICH2 programs use mpd daemon or ssh connection to spawn processes, no PBS ### Basic Usage -!!! Note "Note" +!!! Note Use the mpirun to execute the MPICH2 code. Example: @@ -43,7 +43,7 @@ You need to preload the executable, if running on the local scratch /lscratch fi In this example, we assume the executable helloworld_mpi.x is present on shared home directory. We run the cp command via mpirun, copying the executable from shared home to local scratch . Second mpirun will execute the binary in the /lscratch/15210.srv11 directory on nodes cn17, cn108, cn109 and cn110, one process per node. -!!! Note "Note" +!!! 
Note MPI process mapping may be controlled by PBS parameters. The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. @@ -92,7 +92,7 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity -!!! Note "Note" +!!! Note Important! Bind every OpenMP thread to a core! In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md index 602b51b4..af36d6e9 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md @@ -41,7 +41,7 @@ plots, images, etc... will be still available. ## Running Parallel Matlab Using Distributed Computing Toolbox / Engine -!!! Note "Note" +!!! Note Distributed toolbox is available only for the EDU variant The MPIEXEC mode available in previous versions is no longer available in MATLAB 2015. Also, the programming interface has changed. Refer to [Release Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1). @@ -64,7 +64,7 @@ Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click I With the new mode, MATLAB itself launches the workers via PBS, so you can either use interactive mode or a batch mode on one node, but the actual parallel processing will be done in a separate job started by MATLAB itself. Alternatively, you can use "local" mode to run parallel code on just a single node. -!!! Note "Note" +!!! Note The profile is confusingly named Salomon, but you can use it also on Anselm. ### Parallel Matlab Interactive Session @@ -133,7 +133,7 @@ The last part of the configuration is done directly in the user Matlab script be This script creates scheduler object "cluster" of type "local" that starts workers locally. -!!! Note "Note" +!!! Note Please note: Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling parpool(sched, ...) function. The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md index d10c114c..c69a9eea 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md @@ -2,7 +2,7 @@ ## Introduction -!!! Note "Note" +!!! Note This document relates to the old versions R2013 and R2014. For MATLAB 2015, please use [this documentation instead](matlab/). Matlab is available in the latest stable version. There are always two variants of the release: @@ -71,7 +71,7 @@ extras = {}; System MPI library allows Matlab to communicate through 40 Gbit/s InfiniBand QDR interconnect instead of slower 1 Gbit Ethernet network. -!!! Note "Note" +!!! Note The path to MPI library in "mpiLibConf.m" has to match with version of loaded Intel MPI module. 
In this example the version 4.1.1.036 of Intel MPI is used by Matlab and therefore module impi/4.1.1.036 has to be loaded prior to starting Matlab. ### Parallel Matlab Interactive Session @@ -144,7 +144,7 @@ set(sched, 'EnvironmentSetMethod', 'setenv'); This script creates scheduler object "sched" of type "mpiexec" that starts workers using mpirun tool. To use correct version of mpirun, the second line specifies the path to correct version of system Intel MPI library. -!!! Note "Note" +!!! Note Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling matlabpool(sched, ...) function. The last step is to start matlabpool with "sched" object and correct number of workers. In this case qsub asked for total number of 32 cores, therefore the number of workers is also set to 32. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md index e043f45a..fa6a0378 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md @@ -96,7 +96,7 @@ A version of [native](../intel-xeon-phi/#section-4) Octave is compiled for Xeon Octave is linked with parallel Intel MKL, so it best suited for batch processing of tasks that utilize BLAS, LAPACK and FFT operations. By default, number of threads is set to 120, you can control this with > OMP_NUM_THREADS environment variable. -!!! Note "Note" +!!! Note Calculations that do not employ parallelism (either by using parallel MKL e.g. via matrix operations, fork() function, [parallel package](http://octave.sourceforge.net/parallel/) or other mechanism) will actually run slower than on host CPU. To use Octave on a node with Xeon Phi: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md index 48ac36ca..c9993016 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md @@ -95,7 +95,7 @@ Download the package [parallell](package-parallel-vignette.pdf) vignette. The forking is the most simple to use. Forking family of functions provide parallelized, drop in replacement for the serial apply() family of functions. -!!! Note "Note" +!!! Note Forking via package parallel provides functionality similar to OpenMP construct omp parallel for @@ -146,7 +146,7 @@ Every evaluation of the integrad function runs in parallel on different process. ## Package Rmpi -!!! Note "Note" +!!! Note package Rmpi provides an interface (wrapper) to MPI APIs. It also provides interactive R slave environment. On Anselm, Rmpi provides interface to the [OpenMPI](../mpi-1/Running_OpenMPI/). @@ -296,7 +296,7 @@ Execute the example as: mpi.apply is a specific way of executing Dynamic Rmpi programs. -!!! Note "Note" +!!! Note mpi.apply() family of functions provide MPI parallelized, drop in replacement for the serial apply() family of functions. Execution is identical to other dynamic Rmpi programs. 
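A jobscript for an mpi.apply based program might look like the following sketch; the script name my_mpi_apply.R and the project ID are placeholders, and the select values are only an example:

```bash
#!/bin/bash
#PBS -q qprod
#PBS -N Rjob
#PBS -A PROJECT_ID
#PBS -l select=2:ncpus=16:mpiprocs=16:ompthreads=1

# change to the submit directory and set up R with MPI support
cd $PBS_O_WORKDIR
module load R openmpi

# dynamic Rmpi programs spawn their own workers, so start a single R master process
mpirun -np 1 R --slave --no-save --no-restore -f my_mpi_apply.R
```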
diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md index 238d1cb9..42e05e01 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md @@ -22,7 +22,7 @@ Versions **1.8.11** and **1.8.13** of HDF5 library are available on Anselm, comp The module sets up environment variables, required for linking and running HDF5 enabled applications. Make sure that the choice of HDF5 module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results. -!!! Note "Note" +!!! Note Be aware, that GCC version of **HDF5 1.8.11** has serious performance issues, since it's compiled with -O0 optimization flag. This version is provided only for testing of code compiled only by GCC and IS NOT recommended for production computations. For more information, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811> All GCC versions of **HDF5 1.8.13** are not affected by the bug, are compiled with -O3 optimizations and are recommended for production computations. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md index 6a91d614..600bd8ae 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md @@ -12,10 +12,10 @@ To be able to compile and link code with MAGMA library user has to load followin To make compilation more user friendly module also sets these two environment variables: -!!! Note "Note" +!!! Note MAGMA_INC - contains paths to the MAGMA header files (to be used for compilation step) -!!! Note "Note" +!!! Note MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking step). Compilation example: @@ -30,15 +30,15 @@ Compilation example: MAGMA implementation for Intel MIC requires a MAGMA server running on accelerator prior to executing the user application. The server can be started and stopped using following scripts: -!!! Note "Note" +!!! Note To start MAGMA server use: **$MAGMAROOT/start_magma_server** -!!! Note "Note" +!!! Note To stop the server use: **$MAGMAROOT/stop_magma_server** -!!! Note "Note" +!!! Note For deeper understanding how the MAGMA server is started, see the following script: **$MAGMAROOT/launch_anselm_from_mic.sh** @@ -66,11 +66,11 @@ To test if the MAGMA server runs properly we can run one of examples that are pa 10304 10304 --- ( --- ) 500.70 ( 1.46) --- ``` -!!! Note "Note" +!!! Note Please note: MAGMA contains several benchmarks and examples that can be found in: **$MAGMAROOT/testing/** -!!! Note "Note" +!!! Note MAGMA relies on the performance of all CPU cores as well as on the performance of the accelerator. 
Therefore on Anselm number of CPU OpenMP threads has to be set to 16: **export OMP_NUM_THREADS=16** diff --git a/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md b/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md index 493eb91a..a57f8c7a 100644 --- a/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md +++ b/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md @@ -280,7 +280,7 @@ SAXPY function multiplies the vector x by the scalar alpha and adds it to the ve } ``` -!!! Note "Note" +!!! Note Please note: cuBLAS has its own function for data transfers between CPU and GPU memory: - [cublasSetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector) - transfers data from CPU to GPU memory diff --git a/docs.it4i/anselm-cluster-documentation/software/openfoam.md b/docs.it4i/anselm-cluster-documentation/software/openfoam.md index d1b22d53..350340fb 100644 --- a/docs.it4i/anselm-cluster-documentation/software/openfoam.md +++ b/docs.it4i/anselm-cluster-documentation/software/openfoam.md @@ -57,7 +57,7 @@ To create OpenFOAM environment on ANSELM give the commands: $ source $FOAM_BASHRC ``` -!!! Note "Note" +!!! Note Please load correct module with your requirements “compiler - GCC/ICC, precision - DP/SPâ€. Create a project directory within the $HOME/OpenFOAM directory named \<USER\>-\<OFversion\> and create a directory named run within it, e.g. by typing: @@ -120,7 +120,7 @@ Run the second case for example external incompressible turbulent flow - case - First we must run serial application bockMesh and decomposePar for preparation of parallel computation. -!!! Note "Note" +!!! Note Create a Bash scrip test.sh: ```bash @@ -145,7 +145,7 @@ Job submission This job create simple block mesh and domain decomposition. Check your decomposition, and submit parallel computation: -!!! Note "Note" +!!! Note Create a PBS script testParallel.pbs: ```bash diff --git a/docs.it4i/anselm-cluster-documentation/storage.md b/docs.it4i/anselm-cluster-documentation/storage.md index 7c3b9ef7..67a08d87 100644 --- a/docs.it4i/anselm-cluster-documentation/storage.md +++ b/docs.it4i/anselm-cluster-documentation/storage.md @@ -26,7 +26,7 @@ There is default stripe configuration for Anselm Lustre filesystems. However, us 2. stripe_count the number of OSTs to stripe across; default is 1 for Anselm Lustre filesystems one can specify -1 to use all OSTs in the filesystem. 3. stripe_offset The index of the OST where the first stripe is to be placed; default is -1 which results in random selection; using a non-default value is NOT recommended. -!!! Note "Note" +!!! Note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. @@ -60,14 +60,14 @@ $ man lfs ### Hints on Lustre Stripping -!!! Note "Note" +!!! Note Increase the stripe_count for parallel I/O to the same file. When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the stripe_count is set to a larger value. The stripe count sets the number of OSTs the file will be written to. By default, the stripe count is set to 1. 
While this default setting provides for efficient access of metadata (for example to support the ls -l command), large files should use stripe counts of greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file. Another good practice is to make the stripe count be an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs. For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes. -!!! Note "Note" +!!! Note Using a large stripe size can improve performance when accessing very large files Large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. The choice of stripe size has no effect on a single-stripe file. @@ -102,7 +102,7 @@ The architecture of Lustre on Anselm is composed of two metadata servers (MDS) The HOME filesystem is mounted in directory /home. Users home directories /home/username reside on this filesystem. Accessible capacity is 320TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 250GB per user. If 250GB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note "Note" +!!! Note The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects. The HOME filesystem should not be used to archive data of past Projects or other unrelated data. @@ -114,7 +114,7 @@ The filesystem is backed up, such that it can be restored in case of catasthropi The HOME filesystem is realized as Lustre parallel filesystem and is available on all login and computational nodes. Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for the HOME filesystem. -!!! Note "Note" +!!! Note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | HOME filesystem | | @@ -131,7 +131,7 @@ Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for t The SCRATCH filesystem is mounted in directory /scratch. Users may freely create subdirectories and files on the filesystem. Accessible capacity is 146TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 100TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem and deny service to other users. If 100TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note "Note" +!!! Note The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory. >Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files. @@ -140,7 +140,7 @@ The SCRATCH filesystem is mounted in directory /scratch. Users may freely create The SCRATCH filesystem is realized as Lustre parallel filesystem and is available from all login and computational nodes. 
Default stripe size is 1MB, stripe count is 1. There are 10 OSTs dedicated for the SCRATCH filesystem. -!!! Note "Note" +!!! Note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | SCRATCH filesystem | | @@ -260,7 +260,7 @@ Default ACL mechanism can be used to replace setuid/setgid permissions on direct ### Local Scratch -!!! Note "Note" +!!! Note Every computational node is equipped with 330GB local scratch disk. Use local scratch in case you need to access large amount of small files during your calculation. @@ -269,7 +269,7 @@ The local scratch disk is mounted as /lscratch and is accessible to user at /lsc The local scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs that access large number of small files within the calculation must use the local scratch filesystem as their working directory. This is required for performance reasons, as frequent access to number of small files may overload the metadata servers (MDS) of the Lustre filesystem. -!!! Note "Note" +!!! Note The local scratch directory /lscratch/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | local SCRATCH filesystem | | @@ -284,14 +284,14 @@ The local scratch filesystem is intended for temporary scratch data generated d Every computational node is equipped with filesystem realized in memory, so called RAM disk. -!!! Note "Note" +!!! Note Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. The local RAM disk filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. Size of RAM disk filesystem is limited. Be very careful, use of RAM disk filesystem is at the expense of operational memory. It is not recommended to allocate large amount of memory and use large amount of data in RAM disk filesystem at the same time. -!!! Note "Note" +!!! Note The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | RAM disk | | @@ -320,7 +320,7 @@ Each node is equipped with local /tmp directory of few GB capacity. The /tmp dir Do not use shared filesystems at IT4Innovations as a backup for large amount of data or long-term archiving purposes. -!!! Note "Note" +!!! Note The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/). The CESNET Storage service can be used for research purposes, mainly by academic staff and students of research institutions in the Czech Republic. @@ -339,14 +339,14 @@ The procedure to obtain the CESNET access is quick and trouble-free. ### Understanding CESNET Storage -!!! Note "Note" +!!! Note It is very important to understand the CESNET storage before uploading data. Please read <https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first. 
Once registered for CESNET Storage, you may [access the storage](https://du.cesnet.cz/en/navody/faq/start) in number of ways. We recommend the SSHFS and RSYNC methods. ### SSHFS Access -!!! Note "Note" +!!! Note SSHFS: The storage will be mounted like a local hard drive The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it was a local removable hard drive. Files can be than copied in and out in a usual fashion. @@ -391,7 +391,7 @@ Once done, please remember to unmount the storage ### Rsync Access -!!! Note "Note" +!!! Note Rsync provides delta transfer for best performance, can resume interrupted transfers Rsync is a fast and extraordinarily versatile file copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use. diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md index 5ed9f564..0947d196 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md @@ -6,7 +6,7 @@ The recommended clients are [TightVNC](http://www.tightvnc.com) or [TigerVNC](ht ## Create VNC Password -!!! Note "Note" +!!! Note Local VNC password should be set before the first login. Do use a strong password. ```bash @@ -17,7 +17,7 @@ Verify: ## Start Vncserver -!!! Note "Note" +!!! Note To access VNC a local vncserver must be started first and also a tunnel using SSH port forwarding must be established. [See below](vnc.md#linux-example-of-creating-a-tunnel) for the details on SSH tunnels. In this example we use port 61. @@ -63,7 +63,7 @@ username 10296 0.0 0.0 131772 21076 pts/29 SN 13:01 0:01 /usr/bin/Xvn To access the VNC server you have to create a tunnel between the login node using TCP **port 5961** and your machine using a free TCP port (for simplicity the very same, in this case). -!!! Note "Note" +!!! Note The tunnel must point to the same login node where you launched the VNC server, eg. login2. If you use just cluster-name.it4i.cz, the tunnel might point to a different node due to DNS round robin. ## Linux/Mac OS Example of Creating a Tunnel diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md index 94daef1f..1882a751 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md @@ -52,7 +52,7 @@ Read more on [http://www.math.umn.edu/systems_guide/putty_xwin32.html](http://ww ## Running GUI Enabled Applications -!!! Note "Note" +!!! Note Make sure that X forwarding is activated and the X server is running. Then launch the application as usual. Use the & to run the application in background. 
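As a minimal, illustrative session (the login address and `xterm` are placeholders for whichever cluster and GUI application you actually use), the X-forwarded workflow described above looks like this:

```bash
# on your workstation: log in with X11 forwarding enabled
local $ ssh -X username@cluster-name.it4i.cz

# on the login node: confirm the display is forwarded, then start the GUI program
$ echo $DISPLAY
$ xterm &
```

The trailing & returns the prompt immediately, so the shell stays usable while the GUI application keeps running.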
diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md index ed4dd2c5..6f806156 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/introduction.md @@ -2,7 +2,7 @@ The IT4Innovations clusters are accessed by SSH protocol via login nodes. -!!! Note "Note" +!!! Note Read more on [Accessing the Salomon Cluster](../../salomon/shell-and-data-access.md) or [Accessing the Anselm Cluster](../../anselm-cluster-documentation/shell-and-data-access.md) pages. ## PuTTY diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md index 0ab1e6ef..517076b0 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md @@ -4,7 +4,7 @@ We recommned you to download "**A Windows installer for everything except PuTTYtel**" with **Pageant** (SSH authentication agent) and **PuTTYgen** (PuTTY key generator) which is available [here](http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html). -!!! Note "Note" +!!! Note After installation you can proceed directly to private keys authentication using ["Putty"](putty#putty). "Change Password for Existing Private Key" is optional. diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md index ec5b7ffb..4fbb8aab 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md @@ -37,7 +37,7 @@ After logging in, you can see .ssh/ directory with SSH keys and authorized_keys ## Private Key -!!! Note "Note" +!!! Note The path to a private key is usually /home/username/.ssh/ Private key file in "id_rsa" or `*.ppk` format is used to authenticate with the servers. Private key is present locally on local side and used for example in SSH agent Pageant (for Windows users). The private key should always be kept in a safe place. @@ -92,7 +92,7 @@ First, generate a new keypair of your public and private key: local $ ssh-keygen -C 'username@organization.example.com' -f additional_key ``` -!!! Note "Note" +!!! Note Please, enter **strong** **passphrase** for securing your private key. You can insert additional public key into authorized_keys file for authentication with your own private key. Additional records in authorized_keys file must be delimited by new line. Users are not advised to remove the default public key from authorized_keys file. 
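As a hedged illustration of the step above — assuming the key pair was generated as additional_key / additional_key.pub on your workstation and that cluster-name.it4i.cz stands in for the actual login address — the new public key can be appended to authorized_keys as its own line:

```bash
# append the new public key to authorized_keys on the cluster (one key per line)
local $ cat additional_key.pub | ssh username@cluster-name.it4i.cz 'cat >> ~/.ssh/authorized_keys'

# verify that both the default key and the new key are present
local $ ssh username@cluster-name.it4i.cz 'cat ~/.ssh/authorized_keys'
```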
diff --git a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md index d1b5cdb1..376f9724 100644 --- a/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md +++ b/docs.it4i/get-started-with-it4innovations/accessing-the-clusters/vpn1-access.md @@ -2,7 +2,7 @@ ## Accessing IT4Innovations Internal Resources via VPN -!!! Note "Note" +!!! Note **Failed to initialize connection subsystem Win 8.1 - 02-10-15 MS patch** Workaround can be found at [vpn-connection-fail-in-win-8.1](../../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.html) diff --git a/docs.it4i/index.md b/docs.it4i/index.md index 5041509b..1ed5efa2 100644 --- a/docs.it4i/index.md +++ b/docs.it4i/index.md @@ -17,12 +17,12 @@ Use your IT4Innotations username and password to log in to the [support](http:// ## Required Proficiency -!!! Note "Note" +!!! Note You need basic proficiency in Linux environment. In order to use the system for your calculations, you need basic proficiency in Linux environment. To gain the proficiency, we recommend you reading the [introduction to Linux](http://www.tldp.org/LDP/intro-linux/html/) operating system environment and installing a Linux distribution on your personal computer. A good choice might be the [CentOS](http://www.centos.org/) distribution, as it is similar to systems on the clusters at IT4Innovations. It's easy to install and use. In fact, any distribution would do. -!!! Note "Note" +!!! Note Learn how to parallelize your code! In many cases, you will run your own code on the cluster. In order to fully exploit the cluster, you will need to carefully consider how to utilize all the cores available on the node and how to use multiple nodes at the same time. You need to **parallelize** your code. Proficieny in MPI, OpenMP, CUDA, UPC or GPI2 programming may be gained via the [training provided by IT4Innovations.](http://prace.it4i.cz) diff --git a/docs.it4i/salomon/capacity-computing.md b/docs.it4i/salomon/capacity-computing.md index 90dfc25c..d79e4834 100644 --- a/docs.it4i/salomon/capacity-computing.md +++ b/docs.it4i/salomon/capacity-computing.md @@ -6,7 +6,7 @@ In many cases, it is useful to submit huge (100+) number of computational jobs i However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1500 per job array** -!!! Note "Note" +!!! Note Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. - Use [Job arrays](capacity-computing.md#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs @@ -20,7 +20,7 @@ However, executing huge number of jobs via the PBS queue may strain the system. ## Job Arrays -!!! Note "Note" +!!! Note Huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. 
The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: @@ -151,7 +151,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/) ## GNU Parallel -!!! Note "Note" +!!! Note Use GNU parallel to run many single core tasks on one node. GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. @@ -222,12 +222,12 @@ Please note the #PBS directives in the beginning of the jobscript file, dont' fo ## Job Arrays and GNU Parallel -!!! Note "Note" +!!! Note Combine the Job arrays and GNU parallel for best throughput of single core jobs While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs. -!!! Note "Note" +!!! Note Every subjob in an array runs GNU parallel to utilize all cores on the node ### GNU Parallel, Shared jobscript @@ -282,7 +282,7 @@ cp output $PBS_O_WORKDIR/$TASK.out In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once an task is finished, new task starts, until the number of tasks in numtasks file is reached. -!!! Note "Note" +!!! Note Select subjob walltime and number of tasks per subjob carefully When deciding this values, think about following guiding rules : diff --git a/docs.it4i/salomon/environment-and-modules.md b/docs.it4i/salomon/environment-and-modules.md index 0a452049..c1adc49d 100644 --- a/docs.it4i/salomon/environment-and-modules.md +++ b/docs.it4i/salomon/environment-and-modules.md @@ -23,7 +23,7 @@ then fi ``` -!!! Note "Note" +!!! Note Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session interactivity for such commands as stated in the previous example. ### Application Modules @@ -56,7 +56,7 @@ Application modules on Salomon cluster are built using [EasyBuild](http://hpcuge vis: Visualization, plotting, documentation and typesetting ``` -!!! Note "Note" +!!! Note The modules set up the application paths, library paths and environment variables for running particular application. The modules may be loaded, unloaded and switched, according to momentary needs. diff --git a/docs.it4i/salomon/job-priority.md b/docs.it4i/salomon/job-priority.md index d13c8f5f..090d6ff3 100644 --- a/docs.it4i/salomon/job-priority.md +++ b/docs.it4i/salomon/job-priority.md @@ -36,7 +36,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut # Jobs Queued in Queue qexp Are Not Calculated to Project's Usage. -!!! Note "Note" +!!! Note Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/rsweb/salomon/projects>. Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job. 
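As an illustrative check (the job ID is a placeholder), the attribute can be read from the full job listing printed by qstat:

```bash
# list your jobs, then print the calculated fair-share priority of one of them
$ qstat -u $USER
$ qstat -f JOB_ID | grep -i fairshare
```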
@@ -65,7 +65,7 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu It means, that jobs with lower execution priority can be run before jobs with higher execution priority. -!!! Note "Note" +!!! Note It is **very beneficial to specify the walltime** when submitting jobs. Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority. diff --git a/docs.it4i/salomon/job-submission-and-execution.md b/docs.it4i/salomon/job-submission-and-execution.md index 96f8d218..b5d83ad1 100644 --- a/docs.it4i/salomon/job-submission-and-execution.md +++ b/docs.it4i/salomon/job-submission-and-execution.md @@ -11,7 +11,7 @@ When allocating computational resources for the job, please specify 5. Project ID 6. Jobscript or interactive switch -!!! Note "Note" +!!! Note Use the **qsub** command to submit your job to a queue for allocation of the computational resources. Submit the job using the qsub command: @@ -22,7 +22,7 @@ $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] job The qsub submits the job into the queue, in another words the qsub command creates a request to the PBS Job manager for allocation of specified resources. The resources will be allocated when available, subject to above described policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** -!!! Note "Note" +!!! Note PBS statement nodes (qsub -l nodes=nodespec) is not supported on Salomon cluster. ### Job Submission Examples @@ -71,7 +71,7 @@ In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores ### UV2000 SMP -!!! Note "Note" +!!! Note 14 NUMA nodes available on UV2000 Per NUMA node allocation. Jobs are isolated by cpusets. @@ -108,7 +108,7 @@ $ qsub -m n ### Placement by Name -!!! Note "Note" +!!! Note Not useful for ordinary computing, suitable for node testing/bechmarking and management tasks. Specific nodes may be selected using PBS resource attribute host (for hostnames): @@ -135,7 +135,7 @@ For communication intensive jobs it is possible to set stricter requirement - to Nodes directly connected to the same InifiBand switch can communicate most efficiently. Using the same switch prevents hops in the network and provides for unbiased, most efficient network communication. There are 9 nodes directly connected to every InifiBand switch. -!!! Note "Note" +!!! Note We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run job efficiently. Nodes directly connected to the one InifiBand switch can be allocated using node grouping on PBS resource attribute switch. @@ -148,7 +148,7 @@ $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob ### Placement by Specific InifiBand Switch -!!! Note "Note" +!!! Note Not useful for ordinary computing, suitable for testing and management tasks. Nodes directly connected to the specific InifiBand switch can be selected using the PBS resource attribute _switch_. @@ -233,7 +233,7 @@ r1i0n11 ## Job Management -!!! Note "Note" +!!! Note Check status of your jobs using the **qstat** and **check-pbs-jobs** commands ```bash @@ -312,7 +312,7 @@ Run loop 3 In this example, we see actual output (some iteration loops) of the job 35141.dm2 -!!! Note "Note" +!!! 
Note Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel,** **qsig** or **qalter** commands You may release your allocation at any time, using qdel command @@ -337,12 +337,12 @@ $ man pbs_professional ### Jobscript -!!! Note "Note" +!!! Note Prepare the jobscript to run batch jobs in the PBS queue system The Jobscript is a user made script, controlling sequence of commands for executing the calculation. It is often written in bash, other scripts may be used as well. The jobscript is supplied to PBS **qsub** command as an argument and executed by the PBS Professional workload manager. -!!! Note "Note" +!!! Note The jobscript or interactive shell is executed on first of the allocated nodes. ```bash @@ -359,7 +359,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time In this example, the nodes r21u01n577, r21u02n578, r21u03n579, r21u04n580 were allocated for 1 hour via the qexp queue. The jobscript myjob will be executed on the node r21u01n577, while the nodes r21u02n578, r21u03n579, r21u04n580 are available for use as well. -!!! Note "Note" +!!! Note The jobscript or interactive shell is by default executed in home directory ```bash @@ -373,7 +373,7 @@ $ pwd In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory. -!!! Note "Note" +!!! Note All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. The allocated nodes are accessible via ssh from login nodes. The nodes may access each other via ssh as well. @@ -405,7 +405,7 @@ In this example, the hostname program is executed via pdsh from the interactive ### Example Jobscript for MPI Calculation -!!! Note "Note" +!!! Note Production jobs must use the /scratch directory for I/O The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. @@ -437,12 +437,12 @@ exit In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes. -!!! Note "Note" +!!! Note Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. -!!! Note "Note" +!!! Note Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. ### Example Jobscript for MPI Calculation With Preloaded Inputs @@ -476,7 +476,7 @@ HTML commented section #2 (examples need to be reworked) ### Example Jobscript for Single Node Calculation -!!! Note "Note" +!!! Note Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. 
Be very careful, use of RAM disk filesystem is at the expense of operational memory. Example jobscript for single node calculation, using [local scratch](storage/) on the node: diff --git a/docs.it4i/salomon/prace.md b/docs.it4i/salomon/prace.md index 5dfd1dfa..1684281f 100644 --- a/docs.it4i/salomon/prace.md +++ b/docs.it4i/salomon/prace.md @@ -247,7 +247,7 @@ PRACE users should check their project accounting using the [PRACE Accounting To Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received local password may check at any time, how many core-hours have been consumed by themselves and their projects using the command "it4ifree". Please note that you need to know your user password to use the command and that the displayed core hours are "system core hours" which differ from PRACE "standardized core hours". -!!! Note "Note" +!!! Note The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> ```bash diff --git a/docs.it4i/salomon/resource-allocation-and-job-execution.md b/docs.it4i/salomon/resource-allocation-and-job-execution.md index fa47d4a4..489e9de6 100644 --- a/docs.it4i/salomon/resource-allocation-and-job-execution.md +++ b/docs.it4i/salomon/resource-allocation-and-job-execution.md @@ -13,14 +13,14 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qfat**, the queue to access SMP UV2000 machine - **qfree**, the Free resource utilization queue -!!! Note "Note" +!!! Note Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> Read more on the [Resource Allocation Policy](resources-allocation-policy/) page. ## Job Submission and Execution -!!! Note "Note" +!!! Note Use the **qsub** command to submit your jobs. The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is entire node, 24 cores**, with exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** diff --git a/docs.it4i/salomon/resources-allocation-policy.md b/docs.it4i/salomon/resources-allocation-policy.md index 8f77c70f..5d97c4bd 100644 --- a/docs.it4i/salomon/resources-allocation-policy.md +++ b/docs.it4i/salomon/resources-allocation-policy.md @@ -4,7 +4,7 @@ The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The fair-share at Anselm ensures that individual users may consume approximately equal amount of resources per week. Detailed information in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following table provides the queue partitioning overview: -!!! Note "Note" +!!! 
Note Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> | queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime | @@ -17,7 +17,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const | **qfree** Free resource queue | yes | none required | 752 nodes, max 86 per job | 24 | -1024 | no | 12 / 12h | | **qviz** Visualization queue | yes | none required | 2 (with NVIDIA Quadro K5000) | 4 | 150 | no | 1 / 8h | -!!! Note "Note" +!!! Note **The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy/#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue. - **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour. @@ -28,7 +28,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for persmission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 24 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours. - **qviz**, the Visualization queue: Intended for pre-/post-processing using OpenGL accelerated graphics. Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 73 GB of RAM and 1/7 of the GPU capacity (default "chunk"). If more GPU power or RAM is required, it is recommended to allocate more chunks (with 4 cores each) up to one whole node per user, so that all 28 cores, 512 GB RAM and whole GPU is exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum. -!!! Note "Note" +!!! Note To access node with Xeon Phi co-processor user needs to specify that in [job submission select statement](job-submission-and-execution/). ### Notes @@ -41,7 +41,7 @@ Salomon users may check current queue configuration at <https://extranet.it4i.cz ### Queue Status -!!! Note "Note" +!!! Note Check the status of jobs, queues and compute nodes at [https://extranet.it4i.cz/rsweb/salomon/](https://extranet.it4i.cz/rsweb/salomon)  @@ -119,7 +119,7 @@ The resources that are currently subject to accounting are the core-hours. The c ### Check Consumed Resources -!!! Note "Note" +!!! 
Note The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. diff --git a/docs.it4i/salomon/shell-and-data-access.md b/docs.it4i/salomon/shell-and-data-access.md index 06f79c20..4b01e65d 100644 --- a/docs.it4i/salomon/shell-and-data-access.md +++ b/docs.it4i/salomon/shell-and-data-access.md @@ -4,7 +4,7 @@ The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, login3 and login4 at address salomon.it4i.cz. The login nodes may be addressed specifically, by prepending the login node name to the address. -!!! Note "Note" +!!! Note The alias salomon.it4i.cz is currently not available through VPN connection. Please use loginX.salomon.it4i.cz when connected to VPN. | Login address | Port | Protocol | Login node | @@ -17,7 +17,7 @@ The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) -!!! Note "Note" +!!! Note Please verify SSH fingerprints during the first logon. They are identical on all login nodes: f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) 70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA) @@ -56,7 +56,7 @@ Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com [username@login2.salomon ~]$ ``` -!!! Note "Note" +!!! Note The environment is **not** shared between login nodes, except for [shared filesystems](storage/). ## Data Transfer @@ -120,7 +120,7 @@ Outgoing connections, from Salomon Cluster login nodes to the outside world, are | 443 | https | | 9418 | git | -!!! Note "Note" +!!! Note Please use **ssh port forwarding** and proxy servers to connect from Salomon to all other remote ports. Outgoing connections, from Salomon Cluster compute nodes are restricted to the internal network. Direct connections form compute nodes to outside world are cut. @@ -129,7 +129,7 @@ Outgoing connections, from Salomon Cluster compute nodes are restricted to the i ### Port Forwarding From Login Nodes -!!! Note "Note" +!!! Note Port forwarding allows an application running on Salomon to connect to arbitrary remote host and port. It works by tunneling the connection from Salomon back to users workstation and forwarding from the workstation to the remote host. @@ -170,7 +170,7 @@ In this example, we assume that port forwarding from login1:6000 to remote.host. Port forwarding is static, each single port is mapped to a particular port on remote host. Connection to other remote host, requires new forward. -!!! Note "Note" +!!! Note Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: diff --git a/docs.it4i/salomon/software/chemistry/molpro.md b/docs.it4i/salomon/software/chemistry/molpro.md index 308ed266..ca925876 100644 --- a/docs.it4i/salomon/software/chemistry/molpro.md +++ b/docs.it4i/salomon/software/chemistry/molpro.md @@ -32,7 +32,7 @@ Compilation parameters are default: Molpro is compiled for parallel execution using MPI and OpenMP. 
By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. -!!! Note "Note" +!!! Note The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH filesystem](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch filesystem. diff --git a/docs.it4i/salomon/software/chemistry/phono3py.md b/docs.it4i/salomon/software/chemistry/phono3py.md index 35a5d131..d6807311 100644 --- a/docs.it4i/salomon/software/chemistry/phono3py.md +++ b/docs.it4i/salomon/software/chemistry/phono3py.md @@ -4,7 +4,7 @@ This GPL software calculates phonon-phonon interactions via the third order force constants. It allows to obtain lattice thermal conductivity, phonon lifetime/linewidth, imaginary part of self energy at the lowest order, joint density of states (JDOS) and weighted-JDOS. For details see Phys. Rev. B 91, 094306 (2015) and <http://atztogo.github.io/phono3py/index.html> -!!! Note "Note" +!!! Note Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module ```bash diff --git a/docs.it4i/salomon/software/debuggers/aislinn.md b/docs.it4i/salomon/software/debuggers/aislinn.md index cf3c57b6..c881c078 100644 --- a/docs.it4i/salomon/software/debuggers/aislinn.md +++ b/docs.it4i/salomon/software/debuggers/aislinn.md @@ -5,7 +5,7 @@ - Aislinn is open-source software; you can use it without any licensing limitations. - Web page of the project: <http://verif.cs.vsb.cz/aislinn/> -!!! Note "Note" +!!! Note Aislinn is software developed at IT4Innovations and some parts are still considered experimental. If you have any questions or experienced any problems, please contact the author: <mailto:stanislav.bohm@vsb.cz>. ### Usage diff --git a/docs.it4i/salomon/software/debuggers/allinea-ddt.md b/docs.it4i/salomon/software/debuggers/allinea-ddt.md index 94890c05..bde30948 100644 --- a/docs.it4i/salomon/software/debuggers/allinea-ddt.md +++ b/docs.it4i/salomon/software/debuggers/allinea-ddt.md @@ -47,7 +47,7 @@ $ mpif90 -g -O0 -o test_debug test.f Before debugging, you need to compile your code with theses flags: -!!! Note "Note" +!!! Note \- **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. - - **O0** : Suppress all optimizations. diff --git a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md index 918ffca1..14f54e72 100644 --- a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md @@ -68,7 +68,7 @@ This mode is useful for native Xeon Phi applications launched directly on the ca This mode is useful for applications that are launched from the host and use offload, OpenCL or mpirun. 
In *Analysis Target* window, select *Intel Xeon Phi coprocessor (native)*, choose path to the binaryand MIC card to run on. -!!! Note "Note" +!!! Note If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later : diff --git a/docs.it4i/salomon/software/debuggers/total-view.md b/docs.it4i/salomon/software/debuggers/total-view.md index 7f1cd15d..29d21130 100644 --- a/docs.it4i/salomon/software/debuggers/total-view.md +++ b/docs.it4i/salomon/software/debuggers/total-view.md @@ -45,7 +45,7 @@ Compile the code: Before debugging, you need to compile your code with theses flags: -!!! Note "Note" +!!! Note **-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. **-O0** : Suppress all optimizations. @@ -80,7 +80,7 @@ To debug a serial code use: To debug a parallel code compiled with **OpenMPI** you need to setup your TotalView environment: -!!! Note "Note" +!!! Note **Please note:** To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file: ```bash diff --git a/docs.it4i/salomon/software/intel-xeon-phi.md b/docs.it4i/salomon/software/intel-xeon-phi.md index ecddeea4..4f648d15 100644 --- a/docs.it4i/salomon/software/intel-xeon-phi.md +++ b/docs.it4i/salomon/software/intel-xeon-phi.md @@ -229,7 +229,7 @@ During the compilation Intel compiler shows which loops have been vectorized in Some interesting compiler flags useful not only for code debugging are: -!!! Note "Note" +!!! Note Debugging openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic level @@ -325,7 +325,7 @@ Following example show how to automatically offload an SGEMM (single precision - } ``` -!!! Note "Note" +!!! Note Please note: This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** To compile a code using Intel compiler use: @@ -368,7 +368,7 @@ To compile a code user has to be connected to a compute with MIC and load Intel $ module load intel/13.5.192 ``` -!!! Note "Note" +!!! Note Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: @@ -411,12 +411,12 @@ If the code is parallelized using OpenMP a set of additional libraries is requir mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH ``` -!!! Note "Note" +!!! Note Please note that the path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. 
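The following sketch shows one way to double-check that match; the module and path names follow the 2013.5.192 example used throughout this section and must be adjusted to whatever Intel compiler module you actually loaded when compiling on the host:

```bash
# on the host: confirm which Intel compiler module the binary was built with
$ module list 2>&1 | grep -i intel

# on the coprocessor: export the library path of the same compiler version and
# verify that one of the runtime libraries listed below is really present
mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH
mic0 $ ls /apps/intel/composer_xe_2013.5.192/compiler/lib/mic/libiomp5.so
```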
For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: -!!! Note "Note" +!!! Note /apps/intel/composer_xe_2013.5.192/compiler/lib/mic - libiomp5.so @@ -497,7 +497,7 @@ After executing the complied binary file, following output should be displayed. ... ``` -!!! Note "Note" +!!! Note More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. @@ -537,7 +537,7 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as ... ``` -!!! Note "Note" +!!! Note Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. ## MPI @@ -599,7 +599,7 @@ An example of basic MPI version of "hello-world" example in C language, that can Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: -!!! Note "Note" +!!! Note **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) **Coprocessor-only model** - all MPI ranks reside only on the coprocessors. @@ -646,7 +646,7 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH ``` -!!! Note "Note" +!!! Note Please note: - this file sets up both environmental variable for both MPI and OpenMP libraries. @@ -701,7 +701,7 @@ or using mpirun $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic ``` -!!! Note "Note" +!!! Note Please note: \- the full path to the binary has to specified (here: "**>~/mpi-test-mic**") \- the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code @@ -715,7 +715,7 @@ The output should be again similar to: Hello world from process 0 of 4 on host cn207-mic0 ``` -!!! Note "Note" +!!! Note Please note that the **"mpiexec.hydra"** requires a file the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: ```bash @@ -748,7 +748,7 @@ For example: This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators. -!!! Note "Note" +!!! Note Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: - to connect to the second node : ** $ ssh cn205** @@ -881,14 +881,14 @@ A possible output of the MPI "hello-world" example executed on two hosts and two Hello world from process 7 of 8 on host cn205-mic0 ``` -!!! Note "Note" +!!! Note Please note: At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only. **Using the PBS automatically generated node-files** PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated: -!!! Note "Note" +!!! 
Note **Host only node-file:** - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file: diff --git a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md index da78ee38..4c742a61 100644 --- a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md +++ b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md @@ -94,7 +94,7 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity -!!! Note "Note" +!!! Note Important! Bind every OpenMP thread to a core! In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: diff --git a/docs.it4i/salomon/software/mpi/mpi.md b/docs.it4i/salomon/software/mpi/mpi.md index 3f89096c..2428b60d 100644 --- a/docs.it4i/salomon/software/mpi/mpi.md +++ b/docs.it4i/salomon/software/mpi/mpi.md @@ -126,7 +126,7 @@ Consider these ways to run an MPI program: **Two MPI** processes per node, using 12 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 12 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and numa effect overheads. -!!! Note "Note" +!!! Note Important! Bind every OpenMP thread to a core! In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. diff --git a/docs.it4i/salomon/storage.md b/docs.it4i/salomon/storage.md index 27cbead8..cd42d086 100644 --- a/docs.it4i/salomon/storage.md +++ b/docs.it4i/salomon/storage.md @@ -60,7 +60,7 @@ There is default stripe configuration for Salomon Lustre file systems. However, 2. stripe_count the number of OSTs to stripe across; default is 1 for Salomon Lustre file systems one can specify -1 to use all OSTs in the file system. 3. stripe_offset The index of the OST where the first stripe is to be placed; default is -1 which results in random selection; using a non-default value is NOT recommended. -!!! Note "Note" +!!! Note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. @@ -94,14 +94,14 @@ $ man lfs ### Hints on Lustre Stripping -!!! Note "Note" +!!! Note Increase the stripe_count for parallel I/O to the same file. When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the stripe_count is set to a larger value. The stripe count sets the number of OSTs the file will be written to. By default, the stripe count is set to 1. While this default setting provides for efficient access of metadata (for example to support the ls -l command), large files should use stripe counts of greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file. 
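For example — the directory and file size are purely illustrative — the rule of thumb can be applied with the lfs commands mentioned earlier, before the parallel write starts:

```bash
# inspect the current striping of a scratch directory
$ lfs getstripe /scratch/work/user/username/mydir

# expecting a ~20 GB shared output file: stripe new files in this directory across 20 OSTs
$ lfs setstripe -c 20 /scratch/work/user/username/mydir
```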
Another good practice is to make the stripe count be an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs. For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes. -!!! Note "Note" +!!! Note Using a large stripe size can improve performance when accessing very large files Large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. The choice of stripe size has no effect on a single-stripe file. @@ -219,7 +219,7 @@ Default ACL mechanism can be used to replace setuid/setgid permissions on direct Users home directories /home/username reside on HOME file system. Accessible capacity is 0.5 PB, shared among all users. Individual users are restricted by file system usage quotas, set to 250 GB per user. If 250 GB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note "Note" +!!! Note The HOME file system is intended for preparation, evaluation, processing and storage of data generated by active Projects. The HOME should not be used to archive data of past Projects or other unrelated data. @@ -240,14 +240,14 @@ The workspace is backed up, such that it can be restored in case of catasthropic The WORK workspace resides on SCRATCH file system. Users may create subdirectories and files in directories **/scratch/work/user/username** and **/scratch/work/project/projectid. **The /scratch/work/user/username is private to user, much like the home directory. The /scratch/work/project/projectid is accessible to all users involved in project projectid. -!!! Note "Note" +!!! Note The WORK workspace is intended to store users project data as well as for high performance access to input and output files. All project data should be removed once the project is finished. The data on the WORK workspace are not backed up. Files on the WORK file system are **persistent** (not automatically deleted) throughout duration of the project. The WORK workspace is hosted on SCRATCH file system. The SCRATCH is realized as Lustre parallel file system and is available from all login and computational nodes. Default stripe size is 1 MB, stripe count is 1. There are 54 OSTs dedicated for the SCRATCH file system. -!!! Note "Note" +!!! Note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | WORK workspace | | @@ -265,7 +265,7 @@ The WORK workspace is hosted on SCRATCH file system. The SCRATCH is realized as The TEMP workspace resides on SCRATCH file system. The TEMP workspace accesspoint is /scratch/temp. Users may freely create subdirectories and files on the workspace. Accessible capacity is 1.6 PB, shared among all users on TEMP and WORK. Individual users are restricted by file system usage quotas, set to 100 TB per user. The purpose of this quota is to prevent runaway programs from filling the entire file system and deny service to other users. >If 100 TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. -!!! Note "Note" +!!! Note The TEMP workspace is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. 
All I/O intensive jobs must use the TEMP workspace as their working directory. Users are advised to save the necessary data from the TEMP workspace to HOME or WORK after the calculations and clean up the scratch files. @@ -274,7 +274,7 @@ The TEMP workspace resides on SCRATCH file system. The TEMP workspace accesspoin The TEMP workspace is hosted on SCRATCH file system. The SCRATCH is realized as Lustre parallel file system and is available from all login and computational nodes. Default stripe size is 1 MB, stripe count is 1. There are 54 OSTs dedicated for the SCRATCH file system. -!!! Note "Note" +!!! Note Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | TEMP workspace | | @@ -292,7 +292,7 @@ The TEMP workspace is hosted on SCRATCH file system. The SCRATCH is realized as Every computational node is equipped with file system realized in memory, so called RAM disk. -!!! Note "Note" +!!! Note Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk file system is at the expense of operational memory. The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. @@ -323,7 +323,7 @@ The local RAM disk file system is intended for temporary scratch data generated Do not use shared file systems at IT4Innovations as a backup for large amount of data or long-term archiving purposes. -!!! Note "Note" +!!! Note The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/). The CESNET Storage service can be used for research purposes, mainly by academic staff and students of research institutions in the Czech Republic. @@ -342,14 +342,14 @@ The procedure to obtain the CESNET access is quick and trouble-free. ### Understanding CESNET Storage -!!! Note "Note" +!!! Note It is very important to understand the CESNET storage before uploading data. [Please read](<https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first>) Once registered for CESNET Storage, you may [access the storage](https://du.cesnet.cz/en/navody/faq/start) in number of ways. We recommend the SSHFS and RSYNC methods. ### SSHFS Access -!!! Note "Note" +!!! Note SSHFS: The storage will be mounted like a local hard drive The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it was a local removable hard drive. Files can be than copied in and out in a usual fashion. @@ -394,7 +394,7 @@ Once done, please remember to unmount the storage ### Rsync Access -!!! Note "Note" +!!! Note Rsync provides delta transfer for best performance, can resume interrupted transfers Rsync is a fast and extraordinarily versatile file copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use. -- GitLab