From c161162183f67caf2e0a4316ea206e17abd52d31 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Luk=C3=A1=C5=A1=20Krup=C4=8D=C3=ADk?= <lukas.krupcik@vsb.cz>
Date: Wed, 31 Aug 2016 12:30:32 +0200
Subject: [PATCH] repair external and internal links

---
 .../software/gpi2.md                          | 13 +-
 .../software/intel-xeon-phi.md                | 85 +++++++++++--------
 .../storage/cesnet-data-storage.md            | 12 ++-
 .../storage/storage.md                        | 38 ++++++---
 4 files changed, 90 insertions(+), 58 deletions(-)

diff --git a/docs.it4i/anselm-cluster-documentation/software/gpi2.md b/docs.it4i/anselm-cluster-documentation/software/gpi2.md
index ca1950dc8..7de6e0281 100644
--- a/docs.it4i/anselm-cluster-documentation/software/gpi2.md
+++ b/docs.it4i/anselm-cluster-documentation/software/gpi2.md
@@ -21,7 +21,8 @@ The module sets up environment variables, required for linking and running GPI-2
 
 Linking
 -------
->Link with -lGPI2 -libverbs
+!!! Note "Note"
+    Link with -lGPI2 -libverbs
 
 Load the gpi2 module. Link using **-lGPI2** and **-libverbs** switches to link your code against GPI-2. The GPI-2 requires the OFED infinband communication library ibverbs.
 
@@ -43,9 +44,9 @@ Load the gpi2 module. Link using **-lGPI2** and **-libverbs** switches to link
 
 Running the GPI-2 codes
 -----------------------
-gaspi_run
->gaspi_run starts the GPI-2 application
+!!! Note "Note"
+    gaspi_run starts the GPI-2 application
 
 The gaspi_run utility is used to start and run GPI-2 applications:
 
@@ -79,7 +80,8 @@ machinefle:
 
 This machinefile will run 4 GPI-2 processes, 2 on node cn79 o 2 on node cn80.
 
->Use the **mpiprocs** to control how many GPI-2 processes will run per node
+!!! Note "Note"
+    Use the **mpiprocs** to control how many GPI-2 processes will run per node
 
 Example:
 
@@ -91,7 +93,8 @@ This example will produce $PBS_NODEFILE with 16 entries per node.
 
 ### gaspi_logger
 
->gaspi_logger views the output form GPI-2 application ranks
+!!! Note "Note"
+    gaspi_logger views the output from GPI-2 application ranks
 
 The gaspi_logger utility is used to view the output from all nodes except the master node (rank 0). The gaspi_logger is started, on another session, on the master node - the node where the gaspi_run is executed. The output of the application, when called with gaspi_printf(), will be redirected to the gaspi_logger. Other I/O routines (e.g. printf) will not.
 
diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md
index e5ef60196..feb70b901 100644
--- a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md
+++ b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md
@@ -230,12 +230,14 @@ During the compilation Intel compiler shows which loops have been vectorized in
 
 Some interesting compiler flags useful not only for code debugging are:
 
->Debugging
-  openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level
-  vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic level
+!!! Note "Note"
+    Debugging
 
->Performance ooptimization
-  xhost - FOR HOST ONLY - to generate AVX (Advanced Vector Extensions) instructions.
+    openmp_report[0|1|2] - controls the OpenMP parallelizer diagnostic level
+    vec-report[0|1|2] - controls the compiler based vectorization diagnostic level
+
+    Performance optimization
+    xhost - FOR HOST ONLY - to generate AVX (Advanced Vector Extensions) instructions.
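+
+For illustration, a host-side build of the "vect-add.c" example used later in this chapter might combine the switches above roughly as follows (a sketch only; flag spellings and module versions can differ between compiler releases):
+
+```bash
+$ module load intel
+# -vec-report2 reports which loops were (not) vectorized; -xHost generates AVX code for the host CPU
+$ icc -fopenmp -vec-report2 -xHost vect-add.c -o vect-add-host
+```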
Automatic Offload using Intel MKL Library ----------------------------------------- @@ -325,7 +327,8 @@ Following example show how to automatically offload an SGEMM (single precision - } ``` ->Please note: This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** +!!! Note "Note" + Please note: This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** To compile a code using Intel compiler use: @@ -367,7 +370,8 @@ To compile a code user has to be connected to a compute with MIC and load Intel $ module load intel/13.5.192 ``` ->Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. +!!! Note "Note" + Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: @@ -409,17 +413,19 @@ If the code is parallelized using OpenMP a set of additional libraries is requir mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH ``` ->Please note that the path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. +!!! Note "Note" + Please note that the path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: ->/apps/intel/composer_xe_2013.5.192/compiler/lib/mic +!!! Note "Note" + /apps/intel/composer_xe_2013.5.192/compiler/lib/mic ->libiomp5.so -libimf.so -libsvml.so -libirng.so -libintlc.so.5 + libiomp5.so + libimf.so + libsvml.so + libirng.so + libintlc.so.5 Finally, to run the compiled code use: @@ -494,7 +500,8 @@ After executing the complied binary file, following output should be displayed. ... ``` ->More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> +!!! Note "Note" + More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. @@ -533,7 +540,8 @@ To see the performance of Intel Xeon Phi performing the DGEMM run the example as ... ``` ->Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. +!!! Note "Note" + Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. 
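+
+As a side note to the Automatic Offload mechanism described earlier in this chapter, offloading of eligible MKL calls can also be switched on per run through environment variables, without recompiling (the values shown are illustrative and the binary name is only a placeholder):
+
+```bash
+$ export MKL_MIC_ENABLE=1    # let MKL offload suitable BLAS/LAPACK calls to the MIC
+$ export OFFLOAD_REPORT=2    # print a report of how each call was split between host and MIC
+$ ./sgemm-ao-example
+```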
 MPI
 -----------------
@@ -595,11 +603,12 @@ An example of basic MPI version of "hello-world" example in C language, that can
 
 Intel MPI for the Xeon Phi coprocessors offers different MPI programming models:
 
->**Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.)
+!!! Note "Note"
+    **Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.)
 
->**Coprocessor-only model** - all MPI ranks reside only on the coprocessors.
+    **Coprocessor-only model** - all MPI ranks reside only on the coprocessors.
 
->**Symmetric model** - the MPI ranks reside on both the host and the coprocessor. Most general MPI case.
+    **Symmetric model** - the MPI ranks reside on both the host and the coprocessor. Most general MPI case.
 
 ###Host-only model
 
@@ -641,9 +650,10 @@ Similarly to execution of OpenMP programs in native mode, since the environmenta
   export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH
 ```
 
->Please note:
- - this file sets up both environmental variable for both MPI and OpenMP libraries.
- - this file sets up the paths to a particular version of Intel MPI library and particular version of an Intel compiler. These versions have to match with loaded modules.
+!!! Note "Note"
+    Please note:
+    - this file sets up the environmental variables for both the MPI and OpenMP libraries.
+    - this file sets up the paths to a particular version of the Intel MPI library and a particular version of the Intel compiler. These versions have to match with the loaded modules.
 
 To access a MIC accelerator located on a node that user is currently connected to, use:
 
@@ -694,9 +704,10 @@ or using mpirun
   $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic
 ```
 
->Please note:
- - the full path to the binary has to specified (here: "**>~/mpi-test-mic**")
- - the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code
+!!! Note "Note"
+    Please note:
+    - the full path to the binary has to be specified (here: "**~/mpi-test-mic**")
+    - the LD_LIBRARY_PATH has to match the Intel MPI module used to compile the MPI code
 
 The output should be again similar to:
 
@@ -707,7 +718,8 @@ The output should be again similar to:
   Hello world from process 0 of 4 on host cn207-mic0
 ```
 
->>Please note that the **"mpiexec.hydra"** requires a file the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute:
+!!! Note "Note"
+    Please note that the **"mpiexec.hydra"** requires a file on the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute:
 
 ```bash
   $ ssh mic0 ls /bin/pmi_proxy
@@ -739,10 +751,11 @@ For example:
 
 This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators.
 
->Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh:
-- to connect to the second node : ** $ ssh cn205**
-- to connect to the accelerator on the first node from the first node: **$ ssh cn204-mic0** or **$ ssh mic0**
-- to connect to the accelerator on the second node from the first node: **$ ssh cn205-mic0**
+!!! Note "Note"
+    Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh:
+    - to connect to the second node: **$ ssh cn205**
+    - to connect to the accelerator on the first node from the first node: **$ ssh cn204-mic0** or **$ ssh mic0**
+    - to connect to the accelerator on the second node from the first node: **$ ssh cn205-mic0**
 
 At this point we expect that correct modules are loaded and binary is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC"
 
@@ -869,16 +882,18 @@ A possible output of the MPI "hello-world" example executed on two hosts and two
   Hello world from process 7 of 8 on host cn205-mic0
 ```
 
->Please note: At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only.
+!!! Note "Note"
+    Please note: At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only.
 
 **Using the PBS automatically generated node-files**
 
 PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated:
 
->**Host only node-file:**
- - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file:
- - /lscratch/${PBS_JOBID}/nodefile-mic Host and MIC node-file:
- - /lscratch/${PBS_JOBID}/nodefile-mix
+!!! Note "Note"
+    **Host only node-file:** /lscratch/${PBS_JOBID}/nodefile-cn
+    **MIC only node-file:** /lscratch/${PBS_JOBID}/nodefile-mic
+    **Host and MIC node-file:** /lscratch/${PBS_JOBID}/nodefile-mix
+
 
 Please note each host or accelerator is listed only once per file. User has to specify how many jobs should be executed per node using "-n" parameter of the mpirun command.
 
diff --git a/docs.it4i/anselm-cluster-documentation/storage/cesnet-data-storage.md b/docs.it4i/anselm-cluster-documentation/storage/cesnet-data-storage.md
index e7e2c0293..c4764a918 100644
--- a/docs.it4i/anselm-cluster-documentation/storage/cesnet-data-storage.md
+++ b/docs.it4i/anselm-cluster-documentation/storage/cesnet-data-storage.md
@@ -5,7 +5,8 @@ Introduction
 ------------
 Do not use shared filesystems at IT4Innovations as a backup for large amount of data or long-term archiving purposes.
 
->The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/).
+!!! Note "Note"
+    The IT4Innovations does not provide storage capacity for data archiving. Academic staff and students of research institutions in the Czech Republic can use [CESNET Storage service](https://du.cesnet.cz/).
 
 The CESNET Storage service can be used for research purposes, mainly by academic staff and students of research institutions in the Czech Republic.
 
@@ -24,13 +25,15 @@ CESNET storage access
 ### Understanding Cesnet storage
 
->It is very important to understand the Cesnet storage before uploading data. Please read <https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first.
+!!! Note "Note"
+    It is very important to understand the Cesnet storage before uploading data. Please read <https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first.
 
 Once registered for CESNET Storage, you may [access the storage](https://du.cesnet.cz/en/navody/faq/start) in number of ways. We recommend the SSHFS and RSYNC methods.
 
 ### SSHFS Access
 
->SSHFS: The storage will be mounted like a local hard drive
+!!! Note "Note"
+    SSHFS: The storage will be mounted like a local hard drive
 
 The SSHFS provides a very convenient way to access the CESNET Storage. The storage will be mounted onto a local directory, exposing the vast CESNET Storage as if it was a local removable harddrive. Files can be than copied in and out in a usual fashion.
 
@@ -74,7 +77,8 @@ Once done, please remember to unmount the storage
 
 ### Rsync access
 
->Rsync provides delta transfer for best performance, can resume interrupted transfers
+!!! Note "Note"
+    Rsync provides delta transfer for best performance and can resume interrupted transfers
 
 Rsync is a fast and extraordinarily versatile file copying tool. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
 
diff --git a/docs.it4i/anselm-cluster-documentation/storage/storage.md b/docs.it4i/anselm-cluster-documentation/storage/storage.md
index 6fa8f0330..eb76ccc24 100644
--- a/docs.it4i/anselm-cluster-documentation/storage/storage.md
+++ b/docs.it4i/anselm-cluster-documentation/storage/storage.md
@@ -29,7 +29,8 @@ There is default stripe configuration for Anselm Lustre filesystems. However, us
 2. stripe_count the number of OSTs to stripe across; default is 1 for Anselm Lustre filesystems one can specify -1 to use all OSTs in the filesystem.
 3. stripe_offset The index of the OST where the first stripe is to be placed; default is -1 which results in random selection; using a non-default value is NOT recommended.
 
->Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience.
+!!! Note "Note"
+    Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience.
 
 Use the lfs getstripe for getting the stripe parameters. Use the lfs setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns.
 
@@ -62,13 +63,15 @@ $ man lfs
 
 ### Hints on Lustre Stripping
 
->Increase the stripe_count for parallel I/O to the same file.
+!!! Note "Note"
+    Increase the stripe_count for parallel I/O to the same file.
 
 When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the stripe_count is set to a larger value. The stripe count sets the number of OSTs the file will be written to. By default, the stripe count is set to 1. While this default setting provides for efficient access of metadata (for example to support the ls -l command), large files should use stripe counts of greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file.
 
 Another good practice is to make the stripe count be an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs. For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes.
 
->Using a large stripe size can improve performance when accessing very large files
+!!! Note "Note"
+    Using a large stripe size can improve performance when accessing very large files
 
 Large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. The choice of stripe size has no effect on a single-stripe file.
 
@@ -102,7 +105,8 @@ The architecture of Lustre on Anselm is composed of two metadata servers (MDS)
 
 The HOME filesystem is mounted in directory /home. Users home directories /home/username reside on this filesystem. Accessible capacity is 320TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 250GB per user. If 250GB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request.
 
->The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects.
+!!! Note "Note"
+    The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects.
 
 The HOME filesystem should not be used to archive data of past Projects or other unrelated data.
 
@@ -113,7 +117,8 @@ The filesystem is backed up, such that it can be restored in case of catasthrop
 
 The HOME filesystem is realized as Lustre parallel filesystem and is available on all login and computational nodes. Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for the HOME filesystem.
 
->Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience.
+!!! Note "Note"
+    Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience.
 
 |HOME filesystem||
 |---|---|
@@ -129,15 +134,17 @@ Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for t
 
 The SCRATCH filesystem is mounted in directory /scratch. Users may freely create subdirectories and files on the filesystem. Accessible capacity is 146TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 100TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem and deny service to other users. If 100TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request.
 
->The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory.
+!!! Note "Note"
+    The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory.
 
->Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files.
+    Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files.
 
->Files on the SCRATCH filesystem that are **not accessed for more than 90 days** will be automatically **deleted**.
+    Files on the SCRATCH filesystem that are **not accessed for more than 90 days** will be automatically **deleted**.
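+
+For example, before starting a large parallel write into a dedicated job directory on /scratch, the stripe settings can be checked and adjusted with the lfs commands introduced earlier (the directory path below is only an example):
+
+```bash
+$ lfs getstripe /scratch/username/myjob        # show the current stripe count and stripe size
+$ lfs setstripe -c 10 /scratch/username/myjob  # new files created here will be striped across 10 OSTs
+```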
The SCRATCH filesystem is realized as Lustre parallel filesystem and is available from all login and computational nodes. Default stripe size is 1MB, stripe count is 1. There are 10 OSTs dedicated for the SCRATCH filesystem. ->Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. +!!! Note "Note" + Setting stripe size and stripe count correctly for your needs may significantly impact the I/O performance you experience. |SCRATCH filesystem|| |---|---| @@ -257,7 +264,8 @@ Local Filesystems ### Local Scratch ->Every computational node is equipped with 330GB local scratch disk. +!!! Note "Note" + Every computational node is equipped with 330GB local scratch disk. Use local scratch in case you need to access large amount of small files during your calculation. @@ -265,7 +273,8 @@ The local scratch disk is mounted as /lscratch and is accessible to user at /lsc The local scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs that access large number of small files within the calculation must use the local scratch filesystem as their working directory. This is required for performance reasons, as frequent access to number of small files may overload the metadata servers (MDS) of the Lustre filesystem. ->The local scratch directory /lscratch/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. +!!! Note "Note" + The local scratch directory /lscratch/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. |local SCRATCH filesystem|| |---|---| @@ -279,14 +288,15 @@ The local scratch filesystem is intended for temporary scratch data generated Every computational node is equipped with filesystem realized in memory, so called RAM disk. ->Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of -operational memory. +!!! Note "Note" + Use RAM disk in case you need really fast access to your data of limited size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. The local RAM disk filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. Size of RAM disk filesystem is limited. Be very careful, use of RAM disk filesystem is at the expense of operational memory. It is not recommended to allocate large amount of memory and use large amount of data in RAM disk filesystem at the same time. ->The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. +!!! Note "Note" + The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. |RAM disk|| |---|---| -- GitLab