diff --git a/docs.it4i/anselm-cluster-documentation/capacity-computing.md b/docs.it4i/anselm-cluster-documentation/capacity-computing.md index 306474076df83ea3255390348fe63be8640ce4bf..3180cd447e0fdc07924fa4742acc204639cd14df 100644 --- a/docs.it4i/anselm-cluster-documentation/capacity-computing.md +++ b/docs.it4i/anselm-cluster-documentation/capacity-computing.md @@ -7,7 +7,7 @@ In many cases, it is useful to submit huge (>100+) number of computational jobs However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array** !!! Note - Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. + Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. - Use [Job arrays](capacity-computing/#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs - Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs @@ -21,7 +21,7 @@ However, executing huge number of jobs via the PBS queue may strain the system. ## Job Arrays !!! Note - Huge number of jobs may be easily submitted and managed as a job array. + Huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: @@ -150,7 +150,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/) ## GNU Parallel !!! Note - Use GNU parallel to run many single core tasks on one node. + Use GNU parallel to run many single core tasks on one node. GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. diff --git a/docs.it4i/anselm-cluster-documentation/environment-and-modules.md b/docs.it4i/anselm-cluster-documentation/environment-and-modules.md index c674a36cb635e380612074899cd8cbc8c44d6f2d..2506efb2ed80a23e9b4d70a90c054d50edd7668a 100644 --- a/docs.it4i/anselm-cluster-documentation/environment-and-modules.md +++ b/docs.it4i/anselm-cluster-documentation/environment-and-modules.md @@ -24,14 +24,14 @@ fi ``` !!! Note - Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Conside utilization of SSH session interactivity for such commands as stated in the previous example. + Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Conside utilization of SSH session interactivity for such commands as stated in the previous example. ### Application Modules In order to configure your shell for running particular application on Anselm we use Module package interface. !!! 
Note - The modules set up the application paths, library paths and environment variables for running particular application. + The modules set up the application paths, library paths and environment variables for running particular application. We have also second modules repository. This modules repository is created using tool called EasyBuild. On Salomon cluster, all modules will be build by this tool. If you want to use software from this modules repository, please follow instructions in section [Application Modules Path Expansion](environment-and-modules/#EasyBuild). diff --git a/docs.it4i/anselm-cluster-documentation/job-priority.md b/docs.it4i/anselm-cluster-documentation/job-priority.md index 02e86ada55d938d04a09779282da5b22acbeb757..8d72dde770a65cee530919bd705eccff341b83d8 100644 --- a/docs.it4i/anselm-cluster-documentation/job-priority.md +++ b/docs.it4i/anselm-cluster-documentation/job-priority.md @@ -36,7 +36,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut Jobs queued in queue qexp are not calculated to project's usage. !!! Note - Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>. + Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>. Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job. @@ -65,6 +65,6 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu It means, that jobs with lower execution priority can be run before jobs with higher execution priority. !!! Note - It is **very beneficial to specify the walltime** when submitting jobs. + It is **very beneficial to specify the walltime** when submitting jobs. Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority. diff --git a/docs.it4i/anselm-cluster-documentation/network.md b/docs.it4i/anselm-cluster-documentation/network.md index c0226db15c4d3823e761b507cd647065a612bb98..9a4a341c375b7adb0768e07f9ba6c8eba65b0be2 100644 --- a/docs.it4i/anselm-cluster-documentation/network.md +++ b/docs.it4i/anselm-cluster-documentation/network.md @@ -9,7 +9,7 @@ All compute and login nodes of Anselm are interconnected by a high-bandwidth, lo The compute nodes may be accessed via the InfiniBand network using ib0 network interface, in address range 10.2.1.1-209. The MPI may be used to establish native InfiniBand connection among the nodes. !!! Note - The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via native InfiniBand protocol. + The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via native InfiniBand protocol. The Fat tree topology ensures that peak transfer rates are achieved between any two nodes, independent of network traffic exchanged among other nodes concurrently. 
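For illustration, the ib0 interface of an allocated compute node can be inspected and used for IP-over-InfiniBand transfers directly; the target address below is only an example from the documented 10.2.1.1-209 range:

```bash
# show the IP-over-InfiniBand interface on the compute node you are logged on to
$ ip addr show ib0

# copy data to another allocated compute node over the InfiniBand network
# (10.2.1.110 is an illustrative ib0 address of one of your allocated nodes)
$ scp input.dat 10.2.1.110:/scratch/$USER/
```

MPI libraries use the native InfiniBand protocol (up to 3600 MB/s) automatically; the TCP path shown above peaks at about 2170 MB/s per stream.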
diff --git a/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md b/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md index ae02e4b34a8fa6c8ffc31989d6116803dae6e1fd..767e5bcdd13cc8a960139f29090f1810f06dcf7d 100644 --- a/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md +++ b/docs.it4i/anselm-cluster-documentation/resource-allocation-and-job-execution.md @@ -13,14 +13,14 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qfree**, the Free resource utilization queue !!! Note - Check the queue status at <https://extranet.it4i.cz/anselm/> + Check the queue status at <https://extranet.it4i.cz/anselm/> Read more on the [Resource AllocationPolicy](resources-allocation-policy/) page. ## Job Submission and Execution !!! Note - Use the **qsub** command to submit your jobs. + Use the **qsub** command to submit your jobs. The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is entire node, 16 cores**, with exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** @@ -29,7 +29,7 @@ Read more on the [Job submission and execution](job-submission-and-execution/) p ## Capacity Computing !!! Note - Use Job arrays when running huge number of jobs. + Use Job arrays when running huge number of jobs. Use GNU Parallel and/or Job arrays when running (many) single core jobs. diff --git a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md index 0918ca1926ccdb0df6e5b4a743ba2afffa109a6f..7dc65a08d41ffa049719004830032087d900f802 100644 --- a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md +++ b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md @@ -33,7 +33,7 @@ Compilation parameters are default: Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. !!! Note - The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. + The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH file system](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch file system. 
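A minimal Molpro jobscript along these lines might look as follows; the queue, module name and input file are illustrative, while the select statement and the -d option follow the recommendations above:

```bash
#!/bin/bash
#PBS -q qprod
#PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1

cd $PBS_O_WORKDIR
module load molpro

# keep Molpro temporary data on the fast SCRATCH file system
mkdir -p /scratch/$USER/$PBS_JOBID
molpro -d /scratch/$USER/$PBS_JOBID my_input.inp
```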
diff --git a/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md b/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md index befce6a433d0f0f35a7429deb1b7e6b11311b335..9a21ead00b22e1bab44717d59dbcc706f5f1f0dc 100644 --- a/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md +++ b/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md @@ -24,13 +24,13 @@ On the Anselm cluster COMSOL is available in the latest stable version. There ar To load the of COMSOL load the module ```bash - $ module load comsol + $ module load comsol ``` By default the **EDU variant** will be loaded. If user needs other version or variant, load the particular version. To obtain the list of available versions use ```bash - $ module avail comsol + $ module avail comsol ``` If user needs to prepare COMSOL jobs in the interactive mode it is recommend to use COMSOL on the compute nodes via PBS Pro scheduler. In order run the COMSOL Desktop GUI on Windows is recommended to use the Virtual Network Computing (VNC). diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md index 1a5df6b08568883969a5f8f61f3e73afb1632312..e27f426d6162135531095e4125b4943a91b19dcb 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md @@ -21,7 +21,7 @@ The module sets up environment variables, required for using the Allinea Perform ## Usage !!! Note - Use the the perf-report wrapper on your (MPI) program. + Use the the perf-report wrapper on your (MPI) program. Instead of [running your MPI program the usual way](../mpi/), use the the perf report wrapper: diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md index 23849f609b3b96db56c1f93da53f0f350cc9b1e9..78ad34845731a6e1f28a14ed1c618b585a351b8f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md @@ -28,7 +28,7 @@ Currently, there are two versions of CUBE 4.2.3 available as [modules](../../env CUBE is a graphical application. Refer to Graphical User Interface documentation for a list of methods to launch graphical applications on Anselm. !!! Note - Analyzing large data sets can consume large amount of CPU and RAM. Do not perform large analysis on login nodes. + Analyzing large data sets can consume large amount of CPU and RAM. Do not perform large analysis on login nodes. After loading the appropriate module, simply launch cube command, or alternatively you can use scalasca -examine command to launch the GUI. Note that for Scalasca datasets, if you do not analyze the data with scalasca -examine before to opening them with CUBE, not all performance data will be available. 
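For example, assuming a Score-P/Scalasca experiment directory is present (the directory name below is illustrative), the analysis could be opened as follows:

```bash
# load CUBE (and Scalasca, if examining Scalasca data); list the exact
# module names with `module avail cube scalasca`
$ module load cube

# post-process a Scalasca experiment and open it in the GUI
$ scalasca -examine scorep_myprog_16_sum

# or open an already post-processed profile directly
$ cube scorep_myprog_16_sum/profile.cubex
```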
diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md index c408fd4fe08815c59c5e5beafecd99fe4a1d6b9a..ea4e99fed23e4166185debb712950b9a32634944 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md @@ -193,7 +193,7 @@ Can be used as a sensor for ksysguard GUI, which is currently not installed on A In a similar fashion to PAPI, PCM provides a C++ API to access the performance counter from within your application. Refer to the [Doxygen documentation](http://intel-pcm-api-documentation.github.io/classPCM.html) for details of the API. !!! Note - Due to security limitations, using PCM API to monitor your applications is currently not possible on Anselm. (The application must be run as root user) + Due to security limitations, using PCM API to monitor your applications is currently not possible on Anselm. (The application must be run as root user) Sample program using the API : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md index e9bae568d427dcda6f11dd0a728533e13d194d17..d1d65bb3807d01c9001021947b3430a14be8b4d6 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md @@ -27,7 +27,7 @@ and launch the GUI : ``` !!! Note - To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on using GUI applications. + To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on using GUI applications. The GUI will open in new window. Click on "_New Project..._" to create a new project. After clicking _OK_, a new window with project properties will appear. At "_Application:_", select the bath to your binary you want to profile (the binary should be compiled with -g flag). Some additional options such as command line arguments can be selected. At "_Managed code profiling mode:_" select "_Native_" (unless you want to profile managed mode .NET/Mono applications). After clicking _OK_, your project is created. @@ -48,7 +48,7 @@ Copy the line to clipboard and then you can paste it in your jobscript or in com ## Xeon Phi !!! Note - This section is outdated. It will be updated with new information soon. + This section is outdated. It will be updated with new information soon. It is possible to analyze both native and offload Xeon Phi applications. For offload mode, just specify the path to the binary. For native mode, you need to specify in project properties: @@ -59,7 +59,7 @@ Application parameters: mic0 source ~/.profile && /path/to/your/bin Note that we include source ~/.profile in the command to setup environment paths [as described here](../intel-xeon-phi/). !!! 
Note - If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. + If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md index 0671850ee5d14d3442abfef77d965a564128a17d..28542810b15715e935cddd83b339d32d1e8d0710 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md @@ -191,7 +191,7 @@ Now the compiler won't remove the multiplication loop. (However it is still not ### Intel Xeon Phi !!! Note - PAPI currently supports only a subset of counters on the Intel Xeon Phi processor compared to Intel Xeon, for example the floating point operations counter is missing. + PAPI currently supports only a subset of counters on the Intel Xeon Phi processor compared to Intel Xeon, for example the floating point operations counter is missing. To use PAPI in [Intel Xeon Phi](../intel-xeon-phi/) native applications, you need to load module with " -mic" suffix, for example " papi/5.3.2-mic" : diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md index 45c0768e7cae1ed4e5256e461b2b29f40aa86bb5..fa784f688c0033b2fae2a7510dafbb136e47764b 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md @@ -43,7 +43,7 @@ Some notable Scalasca options are: - **-e <directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by name of the executable and launch configuration.** !!! Note - Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage/storage/). + Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage/storage/). ### Analysis of Reports diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md index 1389d347704845c9fbcb5d5f9a479b29790275df..fa834330019b4da29b88945463b7c5c6069138be 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md @@ -121,7 +121,7 @@ The source code of this function can be also found in ``` !!! Note - You can also add only following line to you ~/.tvdrc file instead of the entire function: + You can also add only following line to you ~/.tvdrc file instead of the entire function: **source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl** You need to do this step only once. 
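The same one-time setup can be done from the command line:

```bash
# append the OpenMPI TotalView startup script to your ~/.tvdrc (run once)
$ echo 'source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl' >> ~/.tvdrc
```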
diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md index 5fef3b2c2d9428eb6e3f193824c6357d942c9644..2cc7fc404c17ac220d50e9b8ccf7770a6faf41af 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md @@ -5,7 +5,7 @@ Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX vector instructions is available, via module ipp. The IPP is a very rich library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax, as well as cryptographic functions, linear algebra functions and many more. !!! Note - Check out IPP before implementing own math functions for data processing, it is likely already there. + Check out IPP before implementing own math functions for data processing, it is likely already there. ```bash $ module load ipp diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md index d8faf804e84a2c7eca28817e6d583db3369a6e25..a1c64a4bc0411ce73a333a39870ab32cff1f1f09 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md @@ -24,7 +24,7 @@ Intel MKL version 13.5.192 is available on Anselm The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES !!! Note - The MKL library may be linked using any compiler. With intel compiler use -mkl option to link default threaded MKL. + The MKL library may be linked using any compiler. With intel compiler use -mkl option to link default threaded MKL. ### Interfaces @@ -48,7 +48,7 @@ You will need the mkl module loaded to run the mkl enabled executable. This may ### Threading !!! Note - Advantage in using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel. + Advantage in using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel. For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md index 4546ac077d4031f552c9b973e00536179b34e4f4..1dfd536d2f15412610ccfd304c54effd5f0f0c91 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md @@ -14,7 +14,7 @@ Intel TBB version 4.1 is available on Anselm The module sets up environment variables, required for linking and running tbb enabled applications. !!! 
Note - Link the tbb library, using -ltbb + Link the tbb library, using -ltbb ## Examples diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md index 7f478b558b62699e8eece30865070558b9d0af7c..75424f383be454902c8d938248f8fbaeffbe007b 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md @@ -233,7 +233,7 @@ During the compilation Intel compiler shows which loops have been vectorized in Some interesting compiler flags useful not only for code debugging are: !!! Note - Debugging + Debugging openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic level @@ -421,7 +421,7 @@ If the code is parallelized using OpenMP a set of additional libraries is requir For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: !!! Note - /apps/intel/composer_xe_2013.5.192/compiler/lib/mic + /apps/intel/composer_xe_2013.5.192/compiler/lib/mic - libiomp5.so - libimf.so @@ -502,7 +502,7 @@ After executing the complied binary file, following output should be displayed. ``` !!! Note - More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> + More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. @@ -604,7 +604,7 @@ An example of basic MPI version of "hello-world" example in C language, that can Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: !!! Note - **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) + **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) **Coprocessor-only model** - all MPI ranks reside only on the coprocessors. @@ -873,7 +873,7 @@ To run the MPI code using mpirun and the machine file "hosts_file_mix" use: A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is: ```bash - Hello world from process 0 of 8 on host cn204 + Hello world from process 0 of 8 on host cn204 Hello world from process 1 of 8 on host cn204 Hello world from process 2 of 8 on host cn204-mic0 Hello world from process 3 of 8 on host cn204-mic0 @@ -891,7 +891,7 @@ A possible output of the MPI "hello-world" example executed on two hosts and two PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated: !!! 
Note - **Host only node-file:** + **Host only node-file:** - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file: - /lscratch/${PBS_JOBID}/nodefile-mic Host and MIC node-file: diff --git a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md index 7d7dc89b89b39c346664feef868528a9f9bc4217..d930dfc505a3564652d512ce922b43ca4a2377a2 100644 --- a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md +++ b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md @@ -11,7 +11,7 @@ If an ISV application was purchased for educational (research) purposes and also ## Overview of the Licenses Usage !!! Note - The overview is generated every minute and is accessible from web or command line interface. + The overview is generated every minute and is accessible from web or command line interface. ### Web Interface diff --git a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md index 31e371f292832d03640595a0fb922d5763e79ed1..ea20bafc351bc975e6c4f20c1cb649989a16de63 100644 --- a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md +++ b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md @@ -27,7 +27,7 @@ Virtualization has also some drawbacks, it is not so easy to setup efficient sol Solution described in chapter [HOWTO](virtualization/#howto) is suitable for single node tasks, does not introduce virtual machine clustering. !!! Note - Please consider virtualization as last resort solution for your needs. + Please consider virtualization as last resort solution for your needs. !!! Warning Please consult use of virtualization with IT4Innovation's support. @@ -39,7 +39,7 @@ For running Windows application (when source code and Linux native application a IT4Innovations does not provide any licenses for operating systems and software of virtual machines. Users are ( in accordance with [Acceptable use policy document](http://www.it4i.cz/acceptable-use-policy.pdf)) fully responsible for licensing all software running in virtual machines on Anselm. Be aware of complex conditions of licensing software in virtual environments. !!! Note - Users are responsible for licensing OS e.g. MS Windows and all software running in their virtual machines. + Users are responsible for licensing OS e.g. MS Windows and all software running in their virtual machines. ## Howto @@ -249,7 +249,7 @@ Run virtual machine using optimized devices, user network back-end with sharing Thanks to port forwarding you can access virtual machine via SSH (Linux) or RDP (Windows) connecting to IP address of compute node (and port 2222 for SSH). You must use VPN network). !!! Note - Keep in mind, that if you use virtio devices, you must have virtio drivers installed on your virtual machine. + Keep in mind, that if you use virtio devices, you must have virtio drivers installed on your virtual machine. ### Networking and Data Sharing diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md b/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md index 7d954569b14c633272ee4ee793f0a62703f7827c..37d9eab16554063b453727f148ed81ab1d63a2ad 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md @@ -7,7 +7,7 @@ The OpenMPI programs may be executed only via the PBS Workload manager, by enter ### Basic Usage !!! 
Note - Use the mpiexec to run the OpenMPI code. + Use the mpiexec to run the OpenMPI code. Example: @@ -28,7 +28,7 @@ Example: ``` !!! Note - Please be aware, that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted behaviour (unless you want to run hybrid code with just one MPI and 16 OpenMP tasks per node). In normal MPI programs **omit the -pernode directive** to run up to 16 MPI tasks per each node. + Please be aware, that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted behaviour (unless you want to run hybrid code with just one MPI and 16 OpenMP tasks per node). In normal MPI programs **omit the -pernode directive** to run up to 16 MPI tasks per each node. In this example, we allocate 4 nodes via the express queue interactively. We set up the openmpi environment and interactively run the helloworld_mpi.x program. Note that the executable helloworld_mpi.x must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem. @@ -49,7 +49,7 @@ You need to preload the executable, if running on the local scratch /lscratch fi In this example, we assume the executable helloworld_mpi.x is present on compute node cn17 on local scratch. We call the mpiexec whith the **--preload-binary** argument (valid for openmpi). The mpiexec will copy the executable from cn17 to the /lscratch/15210.srv11 directory on cn108, cn109 and cn110 and execute the program. !!! Note - MPI process mapping may be controlled by PBS parameters. + MPI process mapping may be controlled by PBS parameters. The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. @@ -98,7 +98,7 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity !!! Note - Important! Bind every OpenMP thread to a core! + Important! Bind every OpenMP thread to a core! In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: @@ -153,7 +153,7 @@ In this example, we see that ranks have been mapped on nodes according to the or Exact control of MPI process placement and resource binding is provided by specifying a rankfile !!! Note - Appropriate binding may boost performance of your application. + Appropriate binding may boost performance of your application. Example rankfile diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md index b2290555f209d8b65a7d6a3e04bb802bf2ae8ff2..80f695b068f50dc8def7fa90a2eed5510577f20c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md @@ -61,7 +61,7 @@ In this example, the openmpi 1.6.5 using intel compilers is activated ## Compiling MPI Programs !!! Note - After setting up your MPI environment, compile your program using one of the mpi wrappers + After setting up your MPI environment, compile your program using one of the mpi wrappers ```bash $ mpicc -v @@ -108,7 +108,7 @@ Compile the above example with ## Running MPI Programs !!! Note - The MPI program executable must be compatible with the loaded MPI module. 
+ The MPI program executable must be compatible with the loaded MPI module. Always compile and execute using the very same MPI module. It is strongly discouraged to mix mpi implementations. Linking an application with one MPI implementation and running mpirun/mpiexec form other implementation may result in unexpected errors. @@ -120,7 +120,7 @@ The MPI program executable must be available within the same path on all nodes. Optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern. !!! Note - Consider these ways to run an MPI program: + Consider these ways to run an MPI program: 1. One MPI process per node, 16 threads per process 2. Two MPI processes per node, 8 threads per process @@ -131,7 +131,7 @@ Optimal way to run an MPI program depends on its memory requirements, memory acc **Two MPI** processes per node, using 8 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 8 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and NUMA effect overheads. !!! Note - Important! Bind every OpenMP thread to a core! + Important! Bind every OpenMP thread to a core! In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md index 9fe89641117e55704c029a26043311b6aec3a96a..b8ec1b2d46a4c03a4fd7896ed2bc6074fa30ff16 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md @@ -7,7 +7,7 @@ The MPICH2 programs use mpd daemon or ssh connection to spawn processes, no PBS ### Basic Usage !!! Note - Use the mpirun to execute the MPICH2 code. + Use the mpirun to execute the MPICH2 code. Example: @@ -44,7 +44,7 @@ You need to preload the executable, if running on the local scratch /lscratch fi In this example, we assume the executable helloworld_mpi.x is present on shared home directory. We run the cp command via mpirun, copying the executable from shared home to local scratch . Second mpirun will execute the binary in the /lscratch/15210.srv11 directory on nodes cn17, cn108, cn109 and cn110, one process per node. !!! Note - MPI process mapping may be controlled by PBS parameters. + MPI process mapping may be controlled by PBS parameters. The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. @@ -93,7 +93,7 @@ In this example, we demonstrate recommended way to run an MPI application, using ### OpenMP Thread Affinity !!! Note - Important! Bind every OpenMP thread to a core! + Important! Bind every OpenMP thread to a core! In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. 
You might want to avoid this by setting these environment variable for GCC OpenMP: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md index 1f933d82f633a98cf16394bdf10e608d2f2c1b8f..c46ea816087b5a51c4ea86c3fcc1ddd8d2205237 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md @@ -42,7 +42,7 @@ plots, images, etc... will be still available. ## Running Parallel Matlab Using Distributed Computing Toolbox / Engine !!! Note - Distributed toolbox is available only for the EDU variant + Distributed toolbox is available only for the EDU variant The MPIEXEC mode available in previous versions is no longer available in MATLAB 2015. Also, the programming interface has changed. Refer to [Release Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1). @@ -65,7 +65,7 @@ Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click I With the new mode, MATLAB itself launches the workers via PBS, so you can either use interactive mode or a batch mode on one node, but the actual parallel processing will be done in a separate job started by MATLAB itself. Alternatively, you can use "local" mode to run parallel code on just a single node. !!! Note - The profile is confusingly named Salomon, but you can use it also on Anselm. + The profile is confusingly named Salomon, but you can use it also on Anselm. ### Parallel Matlab Interactive Session diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md index c69a9eea9debc508b84dbe0b502e205272451eea..9ab8ff3f2ad8cc51921b39937921a55d61296cd5 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab_1314.md @@ -3,7 +3,7 @@ ## Introduction !!! Note - This document relates to the old versions R2013 and R2014. For MATLAB 2015, please use [this documentation instead](matlab/). + This document relates to the old versions R2013 and R2014. For MATLAB 2015, please use [this documentation instead](matlab/). Matlab is available in the latest stable version. There are always two variants of the release: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md index fa6a0378ddc36a1b085d0e24f625fd40030679cb..76dcc3b30e21467f1c162c103a489078d59e8d1e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md @@ -97,7 +97,7 @@ Octave is linked with parallel Intel MKL, so it best suited for batch processing variable. !!! Note - Calculations that do not employ parallelism (either by using parallel MKL e.g. via matrix operations, fork() function, [parallel package](http://octave.sourceforge.net/parallel/) or other mechanism) will actually run slower than on host CPU. + Calculations that do not employ parallelism (either by using parallel MKL e.g. via matrix operations, fork() function, [parallel package](http://octave.sourceforge.net/parallel/) or other mechanism) will actually run slower than on host CPU. 
To use Octave on a node with Xeon Phi: diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md index c99930167290cd2b2707a6d07a0f61a1ef89ad85..39a8dbf1c88553a776d84a88b891522eb520b597 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md @@ -96,7 +96,7 @@ Download the package [parallell](package-parallel-vignette.pdf) vignette. The forking is the most simple to use. Forking family of functions provide parallelized, drop in replacement for the serial apply() family of functions. !!! Note - Forking via package parallel provides functionality similar to OpenMP construct + Forking via package parallel provides functionality similar to OpenMP construct omp parallel for @@ -147,7 +147,7 @@ Every evaluation of the integrad function runs in parallel on different process. ## Package Rmpi !!! Note - package Rmpi provides an interface (wrapper) to MPI APIs. + package Rmpi provides an interface (wrapper) to MPI APIs. It also provides interactive R slave environment. On Anselm, Rmpi provides interface to the [OpenMPI](../mpi-1/Running_OpenMPI/). @@ -297,7 +297,7 @@ Execute the example as: mpi.apply is a specific way of executing Dynamic Rmpi programs. !!! Note - mpi.apply() family of functions provide MPI parallelized, drop in replacement for the serial apply() family of functions. + mpi.apply() family of functions provide MPI parallelized, drop in replacement for the serial apply() family of functions. Execution is identical to other dynamic Rmpi programs. diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md index 42e05e01e70a551f1eb7f60ef7d66203320d8c81..c222f768b20aea1e3a5e99223390d44a9785f75b 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md @@ -23,7 +23,7 @@ Versions **1.8.11** and **1.8.13** of HDF5 library are available on Anselm, comp The module sets up environment variables, required for linking and running HDF5 enabled applications. Make sure that the choice of HDF5 module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results. !!! Note - Be aware, that GCC version of **HDF5 1.8.11** has serious performance issues, since it's compiled with -O0 optimization flag. This version is provided only for testing of code compiled only by GCC and IS NOT recommended for production computations. For more information, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811> + Be aware, that GCC version of **HDF5 1.8.11** has serious performance issues, since it's compiled with -O0 optimization flag. This version is provided only for testing of code compiled only by GCC and IS NOT recommended for production computations. For more information, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811> All GCC versions of **HDF5 1.8.13** are not affected by the bug, are compiled with -O3 optimizations and are recommended for production computations. 
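As a sketch only (the module names and environment variable names below are illustrative; check `module avail hdf5` and `module show` for the actual ones), building a parallel HDF5 code against a build matching the loaded MPI could look like:

```bash
# list the available HDF5 builds and pick one consistent with your MPI library
$ module avail hdf5

# illustrative choice: an HDF5 1.8.13 build matching Intel MPI
$ module load impi hdf5-parallel/1.8.13

# compile with the MPI wrapper; include and library paths are taken from the
# environment variables exported by the hdf5 module (variable names may differ)
$ mpicc -o h5_write h5_write.c -I$HDF5_INC -L$HDF5_LIB -lhdf5
```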
diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md index ef2ee580914212b99eda25128ba0dc4063994b5b..8637723c60198687b6fd5a4434f1a03a6e63ebd1 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md @@ -13,10 +13,10 @@ To be able to compile and link code with MAGMA library user has to load followin To make compilation more user friendly module also sets these two environment variables: !!! Note - MAGMA_INC - contains paths to the MAGMA header files (to be used for compilation step) + MAGMA_INC - contains paths to the MAGMA header files (to be used for compilation step) !!! Note - MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking step). + MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking step). Compilation example: @@ -31,16 +31,16 @@ Compilation example: MAGMA implementation for Intel MIC requires a MAGMA server running on accelerator prior to executing the user application. The server can be started and stopped using following scripts: !!! Note - To start MAGMA server use: - **$MAGMAROOT/start_magma_server** + To start MAGMA server use: + **$MAGMAROOT/start_magma_server** !!! Note - To stop the server use: - **$MAGMAROOT/stop_magma_server** + To stop the server use: + **$MAGMAROOT/stop_magma_server** !!! Note - For deeper understanding how the MAGMA server is started, see the following script: - **$MAGMAROOT/launch_anselm_from_mic.sh** + For deeper understanding how the MAGMA server is started, see the following script: + **$MAGMAROOT/launch_anselm_from_mic.sh** To test if the MAGMA server runs properly we can run one of examples that are part of the MAGMA installation: diff --git a/docs.it4i/anselm-cluster-documentation/software/openfoam.md b/docs.it4i/anselm-cluster-documentation/software/openfoam.md index 350340fbc31d1099001f44e5b69fb1de1a64bd4d..d52c0d9e792ebc52bfae271c24cab3c1eb964b30 100644 --- a/docs.it4i/anselm-cluster-documentation/software/openfoam.md +++ b/docs.it4i/anselm-cluster-documentation/software/openfoam.md @@ -58,7 +58,7 @@ To create OpenFOAM environment on ANSELM give the commands: ``` !!! Note - Please load correct module with your requirements “compiler - GCC/ICC, precision - DP/SPâ€. + Please load correct module with your requirements “compiler - GCC/ICC, precision - DP/SPâ€. Create a project directory within the $HOME/OpenFOAM directory named \<USER\>-\<OFversion\> and create a directory named run within it, e.g. by typing: @@ -121,7 +121,7 @@ Run the second case for example external incompressible turbulent flow - case - First we must run serial application bockMesh and decomposePar for preparation of parallel computation. !!! Note - Create a Bash scrip test.sh: + Create a Bash scrip test.sh: ```bash #!/bin/bash @@ -146,7 +146,7 @@ Job submission This job create simple block mesh and domain decomposition. Check your decomposition, and submit parallel computation: !!! 
Note - Create a PBS script testParallel.pbs: + Create a PBS script testParallel.pbs: ```bash #!/bin/bash diff --git a/docs.it4i/anselm-cluster-documentation/storage.md b/docs.it4i/anselm-cluster-documentation/storage.md index 67a08d875a9c13e397413bfc37b5e277be69178d..264fe8e05df890eafbbac20280a8d0f81da2255b 100644 --- a/docs.it4i/anselm-cluster-documentation/storage.md +++ b/docs.it4i/anselm-cluster-documentation/storage.md @@ -27,7 +27,7 @@ There is default stripe configuration for Anselm Lustre filesystems. However, us 3. stripe_offset The index of the OST where the first stripe is to be placed; default is -1 which results in random selection; using a non-default value is NOT recommended. !!! Note - Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. + Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. @@ -61,14 +61,14 @@ $ man lfs ### Hints on Lustre Stripping !!! Note - Increase the stripe_count for parallel I/O to the same file. + Increase the stripe_count for parallel I/O to the same file. When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the stripe_count is set to a larger value. The stripe count sets the number of OSTs the file will be written to. By default, the stripe count is set to 1. While this default setting provides for efficient access of metadata (for example to support the ls -l command), large files should use stripe counts of greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file. Another good practice is to make the stripe count be an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs. For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes. !!! Note - Using a large stripe size can improve performance when accessing very large files + Using a large stripe size can improve performance when accessing very large files Large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. The choice of stripe size has no effect on a single-stripe file. @@ -103,7 +103,7 @@ The architecture of Lustre on Anselm is composed of two metadata servers (MDS) The HOME filesystem is mounted in directory /home. Users home directories /home/username reside on this filesystem. Accessible capacity is 320TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 250GB per user. If 250GB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. !!! Note - The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects. + The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects. 
The HOME filesystem should not be used to archive data of past Projects or other unrelated data. @@ -115,7 +115,7 @@ The HOME filesystem is realized as Lustre parallel filesystem and is available o Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for the HOME filesystem. !!! Note - Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. + Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | HOME filesystem | | | -------------------- | ------ | @@ -132,7 +132,7 @@ Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for t The SCRATCH filesystem is mounted in directory /scratch. Users may freely create subdirectories and files on the filesystem. Accessible capacity is 146TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 100TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem and deny service to other users. If 100TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. !!! Note - The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory. + The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory. >Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files. @@ -141,7 +141,7 @@ The SCRATCH filesystem is mounted in directory /scratch. Users may freely create The SCRATCH filesystem is realized as Lustre parallel filesystem and is available from all login and computational nodes. Default stripe size is 1MB, stripe count is 1. There are 10 OSTs dedicated for the SCRATCH filesystem. !!! Note - Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. + Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. | SCRATCH filesystem | | | -------------------- | -------- | @@ -261,7 +261,7 @@ Default ACL mechanism can be used to replace setuid/setgid permissions on direct ### Local Scratch !!! Note - Every computational node is equipped with 330GB local scratch disk. + Every computational node is equipped with 330GB local scratch disk. Use local scratch in case you need to access large amount of small files during your calculation. @@ -270,7 +270,7 @@ The local scratch disk is mounted as /lscratch and is accessible to user at /lsc The local scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs that access large number of small files within the calculation must use the local scratch filesystem as their working directory. This is required for performance reasons, as frequent access to number of small files may overload the metadata servers (MDS) of the Lustre filesystem. !!! 
Note - The local scratch directory /lscratch/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. + The local scratch directory /lscratch/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | local SCRATCH filesystem | | | ------------------------ | -------------------- | @@ -285,14 +285,14 @@ The local scratch filesystem is intended for temporary scratch data generated d Every computational node is equipped with filesystem realized in memory, so called RAM disk. !!! Note - Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. + Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. The local RAM disk filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. Size of RAM disk filesystem is limited. Be very careful, use of RAM disk filesystem is at the expense of operational memory. It is not recommended to allocate large amount of memory and use large amount of data in RAM disk filesystem at the same time. !!! Note - The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. + The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. | RAM disk | | | ----------- | ------------------------------------------------------------------------------------------------------- | @@ -321,7 +321,7 @@ Each node is equipped with local /tmp directory of few GB capacity. The /tmp dir Do not use shared filesystems at IT4Innovations as a backup for large amount of data or long-term archiving purposes. !!! Note - The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/). + The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/). The CESNET Storage service can be used for research purposes, mainly by academic staff and students of research institutions in the Czech Republic. @@ -340,14 +340,14 @@ The procedure to obtain the CESNET access is quick and trouble-free. ### Understanding CESNET Storage !!! Note - It is very important to understand the CESNET storage before uploading data. Please read <https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first. + It is very important to understand the CESNET storage before uploading data. Please read <https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first. Once registered for CESNET Storage, you may [access the storage](https://du.cesnet.cz/en/navody/faq/start) in number of ways. We recommend the SSHFS and RSYNC methods. ### SSHFS Access !!! 
Note - SSHFS: The storage will be mounted like a local hard drive + SSHFS: The storage will be mounted like a local hard drive The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it was a local removable hard drive. Files can be than copied in and out in a usual fashion. @@ -392,7 +392,7 @@ Once done, please remember to unmount the storage ### Rsync Access !!! Note - Rsync provides delta transfer for best performance, can resume interrupted transfers + Rsync provides delta transfer for best performance, can resume interrupted transfers Rsync is a fast and extraordinarily versatile file copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use. diff --git a/docs.it4i/salomon/capacity-computing.md b/docs.it4i/salomon/capacity-computing.md index b8c0cd66e7739a73a5fd5332f54eab694b4bb6ea..7c0404228e7710c9ce1f8e944e76020e016e394b 100644 --- a/docs.it4i/salomon/capacity-computing.md +++ b/docs.it4i/salomon/capacity-computing.md @@ -7,7 +7,7 @@ In many cases, it is useful to submit huge (100+) number of computational jobs i However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1500 per job array** !!! Note - Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. + Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. - Use [Job arrays](capacity-computing.md#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs - Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs @@ -21,7 +21,7 @@ However, executing huge number of jobs via the PBS queue may strain the system. ## Job Arrays !!! Note - Huge number of jobs may be easily submitted and managed as a job array. + Huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: @@ -152,7 +152,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/) ## GNU Parallel !!! Note - Use GNU parallel to run many single core tasks on one node. + Use GNU parallel to run many single core tasks on one node. GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. @@ -224,12 +224,12 @@ In this example, we submit a job of 101 tasks. 24 input files will be processed ## Job Arrays and GNU Parallel !!! 
Note
-    Combine the Job arrays and GNU parallel for best throughput of single core jobs 
+    Combine the Job arrays and GNU parallel for best throughput of single core jobs

While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on a single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs.

!!! Note
-    Every subjob in an array runs GNU parallel to utilize all cores on the node 
+    Every subjob in an array runs GNU parallel to utilize all cores on the node

### GNU Parallel, Shared jobscript

@@ -284,7 +284,7 @@ cp output $PBS_O_WORKDIR/$TASK.out

In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once a task is finished, a new task starts, until the number of tasks in the numtasks file is reached.

!!! Note
-    Select subjob walltime and number of tasks per subjob carefully 
+    Select subjob walltime and number of tasks per subjob carefully

When deciding these values, think about the following guiding rules:

diff --git a/docs.it4i/salomon/environment-and-modules.md b/docs.it4i/salomon/environment-and-modules.md
index c1adc49d2af400553a84bc2df9b4c6de625dee06..f94fa017b6668f39978ae856f286c217d7d2e138 100644
--- a/docs.it4i/salomon/environment-and-modules.md
+++ b/docs.it4i/salomon/environment-and-modules.md
@@ -24,7 +24,7 @@ fi
```

!!! Note
-    Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session interactivity for such commands as stated in the previous example. 
+    Do not run commands outputting to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session interactivity for such commands as stated in the previous example.

### Application Modules

@@ -57,7 +57,7 @@ Application modules on Salomon cluster are built using [EasyBuild](http://hpcuge
```

!!! Note
-    The modules set up the application paths, library paths and environment variables for running particular application. 
+    The modules set up the application paths, library paths and environment variables for running particular application.

The modules may be loaded, unloaded and switched, according to momentary needs.

diff --git a/docs.it4i/salomon/job-priority.md b/docs.it4i/salomon/job-priority.md
index 090d6ff31be97364b89139fbce9f6de3b52e0914..3f2693588f49fff75359e4b0f70f5eb92e7bfd23 100644
--- a/docs.it4i/salomon/job-priority.md
+++ b/docs.it4i/salomon/job-priority.md
@@ -37,7 +37,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut

# Jobs Queued in Queue qexp Are Not Calculated to Project's Usage.

!!! Note
-    Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/rsweb/salomon/projects>. 
+    Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/rsweb/salomon/projects>.

Calculated fair-share priority can also be seen as the Resource_List.fairshare attribute of a job.
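For quick reference, the fair-share priority of one of your own jobs can be read from the full job listing; the sketch below is illustrative only (the job ID 123456 and the printed value are placeholders, not output captured on the cluster):

```bash
# Illustrative sketch: query the full attribute list of a queued or running job
# and filter out the fair-share priority; job ID and value are placeholders.
$ qstat -f 123456 | grep fairshare
    Resource_List.fairshare = 10000
```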
@@ -66,7 +66,7 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu

It means that jobs with lower execution priority can be run before jobs with higher execution priority.

!!! Note
-    It is **very beneficial to specify the walltime** when submitting jobs. 
+    It is **very beneficial to specify the walltime** when submitting jobs.

Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority.

diff --git a/docs.it4i/salomon/job-submission-and-execution.md b/docs.it4i/salomon/job-submission-and-execution.md
index b5d83ad1460c4c1493133640ef6fad314628dc32..01d1870072bf6489b9cf7ac9f397d8bee66da7e4 100644
--- a/docs.it4i/salomon/job-submission-and-execution.md
+++ b/docs.it4i/salomon/job-submission-and-execution.md
@@ -12,7 +12,7 @@ When allocating computational resources for the job, please specify

6. Jobscript or interactive switch

!!! Note
-    Use the **qsub** command to submit your job to a queue for allocation of the computational resources. 
+    Use the **qsub** command to submit your job to a queue for allocation of the computational resources.

Submit the job using the qsub command:

@@ -23,7 +23,7 @@ $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] job

The qsub submits the job into the queue, in other words the qsub command creates a request to the PBS Job manager for allocation of specified resources. The resources will be allocated when available, subject to above described policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.**

!!! Note
-    PBS statement nodes (qsub -l nodes=nodespec) is not supported on Salomon cluster. 
+    PBS statement nodes (qsub -l nodes=nodespec) is not supported on Salomon cluster.

### Job Submission Examples

@@ -72,7 +72,7 @@ In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores

### UV2000 SMP

!!! Note
-    14 NUMA nodes available on UV2000 
+    14 NUMA nodes available on UV2000

Per NUMA node allocation. Jobs are isolated by cpusets.

@@ -109,7 +109,7 @@ $ qsub -m n

### Placement by Name

!!! Note
-    Not useful for ordinary computing, suitable for node testing/benchmarking and management tasks. 
+    Not useful for ordinary computing, suitable for node testing/benchmarking and management tasks.

Specific nodes may be selected using PBS resource attribute host (for hostnames):

@@ -136,7 +136,7 @@ For communication intensive jobs it is possible to set stricter requirement - to

Nodes directly connected to the same InfiniBand switch can communicate most efficiently. Using the same switch prevents hops in the network and provides for unbiased, most efficient network communication. There are 9 nodes directly connected to every InfiniBand switch.

!!! Note
-    We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run job efficiently. 
+    We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run job efficiently.

Nodes directly connected to the one InfiniBand switch can be allocated using node grouping on PBS resource attribute switch.

@@ -149,7 +149,7 @@ $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob

### Placement by Specific InfiniBand Switch

!!!
Note - Not useful for ordinary computing, suitable for testing and management tasks. + Not useful for ordinary computing, suitable for testing and management tasks. Nodes directly connected to the specific InifiBand switch can be selected using the PBS resource attribute _switch_. @@ -234,7 +234,7 @@ r1i0n11 ## Job Management !!! Note - Check status of your jobs using the **qstat** and **check-pbs-jobs** commands + Check status of your jobs using the **qstat** and **check-pbs-jobs** commands ```bash $ qstat -a @@ -313,7 +313,7 @@ Run loop 3 In this example, we see actual output (some iteration loops) of the job 35141.dm2 !!! Note - Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel,** **qsig** or **qalter** commands + Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel,** **qsig** or **qalter** commands You may release your allocation at any time, using qdel command @@ -338,12 +338,12 @@ $ man pbs_professional ### Jobscript !!! Note - Prepare the jobscript to run batch jobs in the PBS queue system + Prepare the jobscript to run batch jobs in the PBS queue system The Jobscript is a user made script, controlling sequence of commands for executing the calculation. It is often written in bash, other scripts may be used as well. The jobscript is supplied to PBS **qsub** command as an argument and executed by the PBS Professional workload manager. !!! Note - The jobscript or interactive shell is executed on first of the allocated nodes. + The jobscript or interactive shell is executed on first of the allocated nodes. ```bash $ qsub -q qexp -l select=4:ncpus=24 -N Name0 ./myjob @@ -360,7 +360,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time In this example, the nodes r21u01n577, r21u02n578, r21u03n579, r21u04n580 were allocated for 1 hour via the qexp queue. The jobscript myjob will be executed on the node r21u01n577, while the nodes r21u02n578, r21u03n579, r21u04n580 are available for use as well. !!! Note - The jobscript or interactive shell is by default executed in home directory + The jobscript or interactive shell is by default executed in home directory ```bash $ qsub -q qexp -l select=4:ncpus=24 -I @@ -374,7 +374,7 @@ $ pwd In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory. !!! Note - All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. + All nodes within the allocation may be accessed via ssh. Unallocated nodes are not accessible to user. The allocated nodes are accessible via ssh from login nodes. The nodes may access each other via ssh as well. @@ -406,7 +406,7 @@ In this example, the hostname program is executed via pdsh from the interactive ### Example Jobscript for MPI Calculation !!! Note - Production jobs must use the /scratch directory for I/O + Production jobs must use the /scratch directory for I/O The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. @@ -438,12 +438,12 @@ exit In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. 
The mympiprog.x is executed as one process per node, on all allocated nodes. !!! Note - Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. + Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. !!! Note - Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. + Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. ### Example Jobscript for MPI Calculation With Preloaded Inputs @@ -477,7 +477,7 @@ HTML commented section #2 (examples need to be reworked) ### Example Jobscript for Single Node Calculation !!! Note - Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. Be very careful, use of RAM disk filesystem is at the expense of operational memory. + Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. Be very careful, use of RAM disk filesystem is at the expense of operational memory. Example jobscript for single node calculation, using [local scratch](storage/) on the node: diff --git a/docs.it4i/salomon/prace.md b/docs.it4i/salomon/prace.md index 4c0f22f746830d77a1fc1a05ac599ac868f20c52..eb90adea5116e03ddc4c10223dd5394a9bc45044 100644 --- a/docs.it4i/salomon/prace.md +++ b/docs.it4i/salomon/prace.md @@ -249,7 +249,7 @@ PRACE users should check their project accounting using the [PRACE Accounting To Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received local password may check at any time, how many core-hours have been consumed by themselves and their projects using the command "it4ifree". You need to know your user password to use the command and that the displayed core hours are "system core hours" which differ from PRACE "standardized core hours". !!! Note - The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> + The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> ```bash $ it4ifree diff --git a/docs.it4i/salomon/resource-allocation-and-job-execution.md b/docs.it4i/salomon/resource-allocation-and-job-execution.md index 489e9de698de0eec0b036684254d1269448d1c75..7f8c1e3dc29223510b04a681196ec9d692a48c73 100644 --- a/docs.it4i/salomon/resource-allocation-and-job-execution.md +++ b/docs.it4i/salomon/resource-allocation-and-job-execution.md @@ -14,14 +14,14 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qfree**, the Free resource utilization queue !!! Note - Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> + Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> Read more on the [Resource Allocation Policy](resources-allocation-policy/) page. ## Job Submission and Execution !!! 
Note - Use the **qsub** command to submit your jobs. + Use the **qsub** command to submit your jobs. The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is entire node, 24 cores**, with exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated the jobscript or interactive shell is executed on first of the allocated nodes.** diff --git a/docs.it4i/salomon/resources-allocation-policy.md b/docs.it4i/salomon/resources-allocation-policy.md index 5d97c4bdf47074f50a074d08c46cf1db620656fb..1ceaf4846755c1e4ce977ead17a7e677fd6b1c2a 100644 --- a/docs.it4i/salomon/resources-allocation-policy.md +++ b/docs.it4i/salomon/resources-allocation-policy.md @@ -5,7 +5,7 @@ The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The fair-share at Anselm ensures that individual users may consume approximately equal amount of resources per week. Detailed information in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following table provides the queue partitioning overview: !!! Note - Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> + Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> | queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime | | ------------------------------- | -------------- | ----------------- | ------------------------------------------------------------- | --------- | -------- | ------------- | --------- | @@ -18,7 +18,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const | **qviz** Visualization queue | yes | none required | 2 (with NVIDIA Quadro K5000) | 4 | 150 | no | 1 / 8h | !!! Note - **The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy/#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue. + **The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy/#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue. - **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour. - **qprod**, the Production queue: This queue is intended for normal production runs. 
It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, however only 86 per job. Full nodes, 24 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours. @@ -29,7 +29,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const - **qviz**, the Visualization queue: Intended for pre-/post-processing using OpenGL accelerated graphics. Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 73 GB of RAM and 1/7 of the GPU capacity (default "chunk"). If more GPU power or RAM is required, it is recommended to allocate more chunks (with 4 cores each) up to one whole node per user, so that all 28 cores, 512 GB RAM and whole GPU is exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum. !!! Note - To access node with Xeon Phi co-processor user needs to specify that in [job submission select statement](job-submission-and-execution/). + To access node with Xeon Phi co-processor user needs to specify that in [job submission select statement](job-submission-and-execution/). ### Notes @@ -42,7 +42,7 @@ Salomon users may check current queue configuration at <https://extranet.it4i.cz ### Queue Status !!! Note - Check the status of jobs, queues and compute nodes at [https://extranet.it4i.cz/rsweb/salomon/](https://extranet.it4i.cz/rsweb/salomon) + Check the status of jobs, queues and compute nodes at [https://extranet.it4i.cz/rsweb/salomon/](https://extranet.it4i.cz/rsweb/salomon)  @@ -120,7 +120,7 @@ The resources that are currently subject to accounting are the core-hours. The c ### Check Consumed Resources !!! Note - The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> + The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. diff --git a/docs.it4i/salomon/shell-and-data-access.md b/docs.it4i/salomon/shell-and-data-access.md index 4b01e65ddc3c048291cee2007db2d3836e2f055b..26333a6f9868eaa83b31cabcf2de383ddfa6dd00 100644 --- a/docs.it4i/salomon/shell-and-data-access.md +++ b/docs.it4i/salomon/shell-and-data-access.md @@ -5,7 +5,7 @@ The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, login3 and login4 at address salomon.it4i.cz. The login nodes may be addressed specifically, by prepending the login node name to the address. !!! Note - The alias salomon.it4i.cz is currently not available through VPN connection. Please use loginX.salomon.it4i.cz when connected to VPN. + The alias salomon.it4i.cz is currently not available through VPN connection. Please use loginX.salomon.it4i.cz when connected to VPN. | Login address | Port | Protocol | Login node | | ---------------------- | ---- | -------- | ------------------------------------- | @@ -18,9 +18,9 @@ The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) !!! 
Note - Please verify SSH fingerprints during the first logon. They are identical on all login nodes: - f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) - 70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA) + Please verify SSH fingerprints during the first logon. They are identical on all login nodes: + f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) + 70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA) Private key authentication: @@ -57,7 +57,7 @@ Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com ``` !!! Note - The environment is **not** shared between login nodes, except for [shared filesystems](storage/). + The environment is **not** shared between login nodes, except for [shared filesystems](storage/). ## Data Transfer @@ -121,7 +121,7 @@ Outgoing connections, from Salomon Cluster login nodes to the outside world, are | 9418 | git | !!! Note - Please use **ssh port forwarding** and proxy servers to connect from Salomon to all other remote ports. + Please use **ssh port forwarding** and proxy servers to connect from Salomon to all other remote ports. Outgoing connections, from Salomon Cluster compute nodes are restricted to the internal network. Direct connections form compute nodes to outside world are cut. @@ -130,7 +130,7 @@ Outgoing connections, from Salomon Cluster compute nodes are restricted to the i ### Port Forwarding From Login Nodes !!! Note - Port forwarding allows an application running on Salomon to connect to arbitrary remote host and port. + Port forwarding allows an application running on Salomon to connect to arbitrary remote host and port. It works by tunneling the connection from Salomon back to users workstation and forwarding from the workstation to the remote host. @@ -171,7 +171,7 @@ In this example, we assume that port forwarding from login1:6000 to remote.host. Port forwarding is static, each single port is mapped to a particular port on remote host. Connection to other remote host, requires new forward. !!! Note - Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. + Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: diff --git a/docs.it4i/salomon/software/chemistry/molpro.md b/docs.it4i/salomon/software/chemistry/molpro.md index ca9258766d6ab0e51c31ef50b02717c976ae4cc7..bf01d750fad5580b159741953b8ea778096ae98a 100644 --- a/docs.it4i/salomon/software/chemistry/molpro.md +++ b/docs.it4i/salomon/software/chemistry/molpro.md @@ -33,7 +33,7 @@ Compilation parameters are default: Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. !!! Note - The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. 
+ The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH filesystem](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch filesystem. diff --git a/docs.it4i/salomon/software/chemistry/phono3py.md b/docs.it4i/salomon/software/chemistry/phono3py.md index d680731167baaa9ccd70d6cafa63a781101f2681..b453bb6fdd380842b8343ba61a0e93c31ebc8b2e 100644 --- a/docs.it4i/salomon/software/chemistry/phono3py.md +++ b/docs.it4i/salomon/software/chemistry/phono3py.md @@ -5,7 +5,7 @@ This GPL software calculates phonon-phonon interactions via the third order force constants. It allows to obtain lattice thermal conductivity, phonon lifetime/linewidth, imaginary part of self energy at the lowest order, joint density of states (JDOS) and weighted-JDOS. For details see Phys. Rev. B 91, 094306 (2015) and <http://atztogo.github.io/phono3py/index.html> !!! Note - Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module + Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module ```bash $ module load phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 diff --git a/docs.it4i/salomon/software/debuggers/aislinn.md b/docs.it4i/salomon/software/debuggers/aislinn.md index c881c078c7b351badd3c06512eb9f2fdf846d739..29bb3c4e8f3e664856ab81b7a6c5d45ad31b17d8 100644 --- a/docs.it4i/salomon/software/debuggers/aislinn.md +++ b/docs.it4i/salomon/software/debuggers/aislinn.md @@ -6,7 +6,7 @@ - Web page of the project: <http://verif.cs.vsb.cz/aislinn/> !!! Note - Aislinn is software developed at IT4Innovations and some parts are still considered experimental. If you have any questions or experienced any problems, please contact the author: <mailto:stanislav.bohm@vsb.cz>. + Aislinn is software developed at IT4Innovations and some parts are still considered experimental. If you have any questions or experienced any problems, please contact the author: <mailto:stanislav.bohm@vsb.cz>. ### Usage diff --git a/docs.it4i/salomon/software/debuggers/allinea-ddt.md b/docs.it4i/salomon/software/debuggers/allinea-ddt.md index bde30948226667d5723067710523566ccd6538ca..3cd22b2c5feeb02925b51ecb9451f902e248da51 100644 --- a/docs.it4i/salomon/software/debuggers/allinea-ddt.md +++ b/docs.it4i/salomon/software/debuggers/allinea-ddt.md @@ -48,7 +48,7 @@ $ mpif90 -g -O0 -o test_debug test.f Before debugging, you need to compile your code with theses flags: !!! Note - \- **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. + \- **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. - - **O0** : Suppress all optimizations. 
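As a small illustration of the two flags described above (the file name test.c and output name test_debug are placeholders, not files shipped with this documentation), a serial C code could be compiled for debugging with either compiler family:

```bash
# Placeholders only: test.c / test_debug are example names.
$ gcc -g -O0 -o test_debug test.c    # GNU compiler: extra debug info, optimizations off
$ icc -g -O0 -o test_debug test.c    # Intel compiler: the same flags apply
```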
diff --git a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md
index 14f54e72e7a0cce19406f236a647d89cc1246d52..d774d7e4a14ecbede2a6c85850a784803126cfd2 100644
--- a/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md
+++ b/docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md
@@ -69,7 +69,7 @@ This mode is useful for native Xeon Phi applications launched directly on the ca

This mode is useful for applications that are launched from the host and use offload, OpenCL or mpirun. In *Analysis Target* window, select *Intel Xeon Phi coprocessor (native)*, choose the path to the binary and the MIC card to run on.

!!! Note
-    If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. 
+    If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card.

You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later:

diff --git a/docs.it4i/salomon/software/debuggers/total-view.md b/docs.it4i/salomon/software/debuggers/total-view.md
index 508350571a558fc7b564a6800574d85b9447a917..450efd1d0c7bb90c16db0851e6bd10dd838e5ad9 100644
--- a/docs.it4i/salomon/software/debuggers/total-view.md
+++ b/docs.it4i/salomon/software/debuggers/total-view.md
@@ -46,7 +46,7 @@ Compile the code:

Before debugging, you need to compile your code with these flags:

!!! Note
-    **-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. 
+    **-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers.

**-O0** : Suppress all optimizations.

diff --git a/docs.it4i/salomon/software/intel-xeon-phi.md b/docs.it4i/salomon/software/intel-xeon-phi.md
index 9fbdb31eb1736420a6ea254928c8db9676941fea..4dfc251a49328535a3d336030691400bc49323f7 100644
--- a/docs.it4i/salomon/software/intel-xeon-phi.md
+++ b/docs.it4i/salomon/software/intel-xeon-phi.md
@@ -233,7 +233,7 @@ During the compilation Intel compiler shows which loops have been vectorized in

Some interesting compiler flags useful not only for code debugging are:

!!! Note
-    Debugging 
+    Debugging

openmp_report[0|1|2] - controls the OpenMP parallelizer diagnostic level
vec-report[0|1|2] - controls the compiler based vectorization diagnostic level

@@ -420,7 +420,7 @@ If the code is parallelized using OpenMP a set of additional libraries is requir

For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is:

!!! Note
-    /apps/intel/composer_xe_2013.5.192/compiler/lib/mic 
+    /apps/intel/composer_xe_2013.5.192/compiler/lib/mic

- libiomp5.so
- libimf.so

@@ -501,7 +501,7 @@ After executing the compiled binary file, following output should be displayed.

```
!!!
Note
-    More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> 
+    More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/>

The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the same procedure to download the example to your directory and compile it.

@@ -872,7 +872,7 @@ To run the MPI code using mpirun and the machine file "hosts_file_mix" use:

A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is:

```bash
-    Hello world from process 0 of 8 on host cn204 
+    Hello world from process 0 of 8 on host cn204
    Hello world from process 1 of 8 on host cn204
    Hello world from process 2 of 8 on host cn204-mic0
    Hello world from process 3 of 8 on host cn204-mic0
@@ -890,7 +890,7 @@ A possible output of the MPI "hello-world" example executed on two hosts and two

PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are generated:

!!! Note
-    **Host only node-file:** 
+    **Host only node-file:**

- /lscratch/${PBS_JOBID}/nodefile-cn
MIC only node-file:
- /lscratch/${PBS_JOBID}/nodefile-mic
Host and MIC node-file:
diff --git a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md
index 4c742a6191de85dba2003ec6d02594e895fc225b..e66ab9fc860af4e0958edb30e16ac8d2c3400239 100644
--- a/docs.it4i/salomon/software/mpi/Running_OpenMPI.md
+++ b/docs.it4i/salomon/software/mpi/Running_OpenMPI.md
@@ -95,7 +95,7 @@ In this example, we demonstrate recommended way to run an MPI application, using

### OpenMP Thread Affinity

!!! Note
-    Important! Bind every OpenMP thread to a core! 
+    Important! Bind every OpenMP thread to a core!

In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variables for GCC OpenMP:

diff --git a/docs.it4i/salomon/software/mpi/mpi.md b/docs.it4i/salomon/software/mpi/mpi.md
index 2428b60d24fd31d9e160c671be9926204fe339f8..4b2b721ff4dba995ad382f3603b8e52e61660345 100644
--- a/docs.it4i/salomon/software/mpi/mpi.md
+++ b/docs.it4i/salomon/software/mpi/mpi.md
@@ -127,7 +127,7 @@ Consider these ways to run an MPI program:

**Two MPI** processes per node, using 12 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 12 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and numa effect overheads.

!!! Note
-    Important! Bind every OpenMP thread to a core! 
+    Important! Bind every OpenMP thread to a core!

In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables.
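For illustration, thread pinning for such a hybrid run could look like the sketch below; the affinity strings, the thread count and the mympiprog.x binary are examples matching the two-processes-per-node scenario above, not site-mandated settings:

```bash
# Intel OpenMP runtime: report and pin threads, one thread per core
$ export KMP_AFFINITY=verbose,granularity=fine,compact,1,0
# GCC OpenMP runtime: explicit core list (24 cores per node assumed)
$ export GOMP_CPU_AFFINITY="0-23"
# 12 threads per MPI process, 2 processes on the node (example values)
$ export OMP_NUM_THREADS=12
$ mpirun -n 2 ./mympiprog.x
```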
diff --git a/docs.it4i/salomon/software/numerical-languages/octave.md b/docs.it4i/salomon/software/numerical-languages/octave.md
index a9a82dfc0e88d777754465e602ec9a18cf40b188..eda9196ae8972e946d32361067e3bd43c4762721 100644
--- a/docs.it4i/salomon/software/numerical-languages/octave.md
+++ b/docs.it4i/salomon/software/numerical-languages/octave.md
@@ -9,7 +9,7 @@ Two versions of octave are available on the cluster, via module

| **Stable** | Octave 3.8.2 | Octave |

```bash
-    $ module load Octave 
+    $ module load Octave
```

The octave on the cluster is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many octave kernels, notably the linear algebra subroutines. Octave runs these heavy calculation kernels without any penalty. By default, octave would parallelize to 24 threads. You may control the threads by setting the OMP_NUM_THREADS environment variable.

@@ -50,7 +50,7 @@ This script may be submitted directly to the PBS workload manager via the qsub c

The octave c compiler mkoctfile calls the GNU gcc 4.8.1 for compiling native c code. This is very useful for running native c subroutines in octave environment.

```bash
-    $ mkoctfile -v 
+    $ mkoctfile -v
```

Octave may use MPI for interprocess communication. This functionality is currently not supported on the cluster. In case you require the octave interface to MPI, please contact our [cluster support](https://support.it4i.cz/rt/).

diff --git a/docs.it4i/salomon/software/numerical-languages/r.md b/docs.it4i/salomon/software/numerical-languages/r.md
index 9afa31655aa34f07ff217c5ece8f6de298e691e2..138e4da07151f4e9e802ef447c8ad7bdad7ec190 100644
--- a/docs.it4i/salomon/software/numerical-languages/r.md
+++ b/docs.it4i/salomon/software/numerical-languages/r.md
@@ -96,7 +96,7 @@ Download the package [parallell](package-parallel-vignette.pdf) vignette.

The forking is the most simple to use. Forking family of functions provide parallelized, drop-in replacement for the serial apply() family of functions.

!!! warning
-    Forking via package parallel provides functionality similar to OpenMP construct omp parallel for 
+    Forking via package parallel provides functionality similar to OpenMP construct omp parallel for

Only cores of single node can be utilized this way!

diff --git a/docs.it4i/salomon/storage.md b/docs.it4i/salomon/storage.md
index cd42d086625f26020199732afd5eb3e377a4edaf..e2750682ce983c1e957ca93505a54332fb68ddab 100644
--- a/docs.it4i/salomon/storage.md
+++ b/docs.it4i/salomon/storage.md
@@ -350,7 +350,7 @@ Once registered for CESNET Storage, you may [access the storage](https://du.cesn

### SSHFS Access

!!! Note
-    SSHFS: The storage will be mounted like a local hard drive 
+    SSHFS: The storage will be mounted like a local hard drive

The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it was a local removable hard drive. Files can be then copied in and out in a usual fashion.

@@ -395,7 +395,7 @@ Once done, please remember to unmount the storage

### Rsync Access

!!! Note
-    Rsync provides delta transfer for best performance, can resume interrupted transfers 
+    Rsync provides delta transfer for best performance, can resume interrupted transfers

Rsync is a fast and extraordinarily versatile file copying tool.
It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use. diff --git a/docs.it4i/software/lmod.md b/docs.it4i/software/lmod.md index 5ba63f7e03762e356a0d74cfb4eb4826682314a6..00e70819ce8c8bac76f0d14d42b39c10afcd0a67 100644 --- a/docs.it4i/software/lmod.md +++ b/docs.it4i/software/lmod.md @@ -108,7 +108,7 @@ $ ml spider gcc ``` !!! tip - spider is case-insensitive. + spider is case-insensitive. If you use spider on a full module name like GCC/6.2.0-2.27 it will tell on which cluster(s) that module available: @@ -148,7 +148,7 @@ Use "module keyword key1 key2 ..." to search for all possible modules matching a ``` !!! tip - the specified software name is treated case-insensitively. + the specified software name is treated case-insensitively. Lmod does a partial match on the module name, so sometimes you need to use / to indicate the end of the software name you are interested in: @@ -196,7 +196,7 @@ setenv("EBEXTSLISTPYTHON","setuptools-20.1.1,pip-8.0.2,nose-1.3.7") ``` !!! tip - Note that both the direct changes to the environment as well as other modules that will be loaded are shown. + Note that both the direct changes to the environment as well as other modules that will be loaded are shown. If you're not sure what all of this means: don't worry, you don't have to know; just try loading the module as try using the software. @@ -224,12 +224,12 @@ Currently Loaded Modules: ``` !!! tip - Note that even though we only loaded a single module, the output of ml shows that a whole bunch of modules were loaded, which are required dependencies for intel/2017.00. + Note that even though we only loaded a single module, the output of ml shows that a whole bunch of modules were loaded, which are required dependencies for intel/2017.00. ## Conflicting Modules !!! warning - It is important to note that **only modules that are compatible with each other can be loaded together. In particular, modules must be installed either with the same toolchain as the modules that** are already loaded, or with a compatible (sub)toolchain. + It is important to note that **only modules that are compatible with each other can be loaded together. In particular, modules must be installed either with the same toolchain as the modules that** are already loaded, or with a compatible (sub)toolchain. For example, once you have loaded one or more modules that were installed with the intel/2017.00 toolchain, all other modules that you load should have been installed with the same toolchain.
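As a sketch of keeping a session's toolchain consistent (the module name is taken from the example above; the purge-before-switching habit is a suggestion, not a site requirement):

```bash
# List currently loaded modules
$ ml
# Start from a clean environment before switching to a different toolchain
$ ml purge
$ ml intel/2017.00
# Any software loaded afterwards should be built with the same (sub)toolchain
```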