diff --git a/docs.it4i/anselm-cluster-documentation/software/ansys.md b/docs.it4i/anselm-cluster-documentation/software/ansys.md deleted file mode 100644 index e5130c0709a9ea0b0dccb4808bcf53472da91af8..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/ansys.md +++ /dev/null @@ -1,33 +0,0 @@ -Overview of ANSYS Products -========================== - -[SVS FEM](http://www.svsfem.cz/)** as **[ANSYS -Channel partner](http://www.ansys.com/)** for Czech -Republic provided all ANSYS licenses for ANSELM cluster and supports of -all ANSYS Products (Multiphysics, Mechanical, MAPDL, CFX, Fluent, -Maxwell, LS-DYNA...) to IT staff and ANSYS users. If you are challenging -to problem of ANSYS functionality contact -please [hotline@svsfem.cz](mailto:hotline@svsfem.cz?subject=Ostrava%20-%20ANSELM) - -Anselm provides as commercial as academic variants. Academic variants -are distinguished by "**Academic...**" word in the name of  license or -by two letter preposition "**aa_**" in the license feature name. Change -of license is realized on command line respectively directly in user's -pbs file (see individual products). [ - More about -licensing -here](ansys/licensing.html) - -To load the latest version of any ANSYS product (Mechanical, Fluent, -CFX, MAPDL,...) load the module: - - $ module load ansys - -ANSYS supports interactive regime, but due to assumed solution of -extremely difficult tasks it is not recommended. - -If user needs to work in interactive regime we recommend to configure -the RSM service on the client machine which allows to forward the -solution to the Anselm directly from the client's Workbench project -(see ANSYS RSM service). - diff --git a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-cfx.md b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-cfx.md index 5d50cda135ee04d574eb928dbb7b2aedf1a87013..681bcea98c5bfdd5ab1ebe382870c51e24bc2ec8 100644 --- a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-cfx.md +++ b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-cfx.md @@ -1,24 +1,12 @@ -ANSYS CFX +ANSYS CFX ========= -[ANSYS -CFX](http://www.ansys.com/Products/Simulation+Technology/Fluid+Dynamics/Fluid+Dynamics+Products/ANSYS+CFX) -software is a high-performance, general purpose fluid dynamics program -that has been applied to solve wide-ranging fluid flow problems for over -20 years. At the heart of ANSYS CFX is its advanced solver technology, -the key to achieving reliable and accurate solutions quickly and -robustly. The modern, highly parallelized solver is the foundation for -an abundant choice of physical models to capture virtually any type of -phenomena related to fluid flow. The solver and its many physical models -are wrapped in a modern, intuitive, and flexible GUI and user -environment, with extensive capabilities for customization and -automation using session files, scripting and a powerful expression -language. +[ANSYS CFX](http://www.ansys.com/Products/Simulation+Technology/Fluid+Dynamics/Fluid+Dynamics+Products/ANSYS+CFX) +software is a high-performance, general purpose fluid dynamics program that has been applied to solve wide-ranging fluid flow problems for over 20 years. At the heart of ANSYS CFX is its advanced solver technology, the key to achieving reliable and accurate solutions quickly and robustly. The modern, highly parallelized solver is the foundation for an abundant choice of physical models to capture virtually any type of phenomena related to fluid flow. 
The solver and its many physical models are wrapped in a modern, intuitive, and flexible GUI and user environment, with extensive capabilities for customization and automation using session files, scripting and a powerful expression language.
-To run ANSYS CFX in batch mode you can utilize/modify the default
-cfx.pbs script and execute it via the qsub command.
+To run ANSYS CFX in batch mode, you can use or modify the default cfx.pbs script and submit it via the qsub command.
-`
+```bash
#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -q qprod
@@ -59,29 +47,11 @@ echo Machines: $hl
#-def input.def includes the input of CFX analysis in DEF format
#-P the name of preferred license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial))
/ansys_inc/v145/CFX/bin/cfx5solve -def input.def -size 4 -size-ni 4x -part-large -start-method "Platform MPI Distributed Parallel" -par-dist $hl -P aa_r
-`
+```
-Header of the pbs file (above) is common and description can be find
-on [this
-site](../../resource-allocation-and-job-execution/job-submission-and-execution.html).
-SVS FEM recommends to utilize sources by keywords: nodes, ppn. These
-keywords allows to address directly the number of nodes (computers) and
-cores (ppn) which will be utilized in the job. Also the rest of code
-assumes such structure of allocated resources.
+The header of the PBS file (above) is common; its description can be found on [this site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). SVS FEM recommends requesting resources with the nodes and ppn keywords. These keywords directly set the number of nodes (computers) and cores per node (ppn) used by the job; the rest of the script assumes this structure of allocated resources.
-Working directory has to be created before sending pbs job into the
-queue. Input file should be in working directory or full path to input
-file has to be specified. >Input file has to be defined by common
-CFX def file which is attached to the cfx solver via parameter
--def
-
-License** should be selected by parameter -P (Big letter **P**).
-Licensed products are the following: aa_r
-(ANSYS **Academic Research), ane3fl (ANSYS
-Multiphysics)-**Commercial.
-[ More
- about licensing
-here](licensing.html)
-
-
+The working directory has to be created before the PBS job is submitted to the queue. The input file should be in the working directory, or the full path to the input file has to be specified. The input has to be a standard CFX .def file, which is passed to the CFX solver via the -def parameter.
+The **license** is selected by the -P parameter (capital **P**). Licensed products are the following: aa_r (ANSYS **Academic** Research) and ane3fl (ANSYS Multiphysics, **commercial**).
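For example, to request the commercial Multiphysics license instead of the academic one, only the -P value on the solver line of cfx.pbs changes (an illustrative variant of the command shown above; all other options stay the same):

```bash
# Illustrative only: same cfx5solve invocation as in cfx.pbs above, but with the
# commercial ane3fl (ANSYS Multiphysics) license feature instead of the academic aa_r
/ansys_inc/v145/CFX/bin/cfx5solve -def input.def -size 4 -size-ni 4x -part-large \
    -start-method "Platform MPI Distributed Parallel" -par-dist $hl -P ane3fl
```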
+[More about licensing here](licensing.html) \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-fluent.md b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-fluent.md index ab675d3381dc6377e63df7b7a5db21ff7db72aff..63c40c1ecf3889715162c0385b15c5593e450232 100644 --- a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-fluent.md +++ b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-fluent.md @@ -1,24 +1,14 @@ -ANSYS Fluent +ANSYS Fluent ============ -[ANSYS -Fluent](http://www.ansys.com/Products/Simulation+Technology/Fluid+Dynamics/Fluid+Dynamics+Products/ANSYS+Fluent) -software contains the broad physical modeling capabilities needed to -model flow, turbulence, heat transfer, and reactions for industrial -applications ranging from air flow over an aircraft wing to combustion -in a furnace, from bubble columns to oil platforms, from blood flow to -semiconductor manufacturing, and from clean room design to wastewater -treatment plants. Special models that give the software the ability to -model in-cylinder combustion, aeroacoustics, turbomachinery, and -multiphase systems have served to broaden its reach. +[ANSYS Fluent](http://www.ansys.com/Products/Simulation+Technology/Fluid+Dynamics/Fluid+Dynamics+Products/ANSYS+Fluent) +software contains the broad physical modeling capabilities needed to model flow, turbulence, heat transfer, and reactions for industrial applications ranging from air flow over an aircraft wing to combustion in a furnace, from bubble columns to oil platforms, from blood flow to semiconductor manufacturing, and from clean room design to wastewater treatment plants. Special models that give the software the ability to model in-cylinder combustion, aeroacoustics, turbomachinery, and multiphase systems have served to broaden its reach. 1. Common way to run Fluent over pbs file ------------------------------------------------------ +To run ANSYS Fluent in batch mode you can utilize/modify the default fluent.pbs script and execute it via the qsub command. -To run ANSYS Fluent in batch mode you can utilize/modify the -default fluent.pbs script and execute it via the qsub command. - -` +```bash #!/bin/bash #PBS -S /bin/bash #PBS -l nodes=2:ppn=16 @@ -47,27 +37,15 @@ module load ansys NCORES=`wc -l $PBS_NODEFILE |awk '{print $1}'` /ansys_inc/v145/fluent/bin/fluent 3d -t$NCORES -cnf=$PBS_NODEFILE -g -i fluent.jou -` - -Header of the pbs file (above) is common and description can be find -on [this -site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). -[SVS FEM](http://www.svsfem.cz) recommends to utilize -sources by keywords: nodes, ppn. These keywords allows to address -directly the number of nodes (computers) and cores (ppn) which will be -utilized in the job. Also the rest of code assumes such structure of -allocated resources. - -Working directory has to be created before sending pbs job into the -queue. Input file should be in working directory or full path to input -file has to be specified. Input file has to be defined by common Fluent -journal file which is attached to the Fluent solver via parameter -i -fluent.jou - -Journal file with definition of the input geometry and boundary -conditions and defined process of solution has e.g. the following -structure: +``` + +Header of the pbs file (above) is common and description can be find on [this site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). 
[SVS FEM](http://www.svsfem.cz) recommends to utilize sources by keywords: nodes, ppn. These keywords allows to address directly the number of nodes (computers) and cores (ppn) which will be utilized in the job. Also the rest of code assumes such structure of allocated resources. + +Working directory has to be created before sending pbs job into the queue. Input file should be in working directory or full path to input file has to be specified. Input file has to be defined by common Fluent journal file which is attached to the Fluent solver via parameter -i fluent.jou + +Journal file with definition of the input geometry and boundary conditions and defined process of solution has e.g. the following structure: +```bash /file/read-case aircraft_2m.cas.gz /solve/init init @@ -75,77 +53,47 @@ structure: 10 /file/write-case-dat aircraft_2m-solution /exit yes +``` -The appropriate dimension of the problem has to be set by -parameter (2d/3d). +The appropriate dimension of the problem has to be set by parameter (2d/3d). 2. Fast way to run Fluent from command line -------------------------------------------------------- -` +```bash fluent solver_version [FLUENT_options] -i journal_file -pbs -` - -This syntax will start the ANSYS FLUENT job under PBS Professional using -the qsub command in a batch manner. When -resources are available, PBS Professional will start the job and return -a job ID, usually in the form of -*job_ID.hostname*. This job ID can then be used -to query, control, or stop the job using standard PBS Professional -commands, such as qstat or -qdel. The job will be run out of the current -working directory, and all output will be written to the file -fluent.o> -*job_ID*.     +``` + +This syntax will start the ANSYS FLUENT job under PBS Professional using the qsub command in a batch manner. When resources are available, PBS Professional will start the job and return a job ID, usually in the form of *job_ID.hostname*. This job ID can then be used to query, control, or stop the job using standard PBS Professional commands, such as qstat or qdel. The job will be run out of the current working directory, and all output will be written to the file fluent.o *job_ID*. 3. Running Fluent via user's config file ---------------------------------------- +The sample script uses a configuration file called pbs_fluent.conf  if no command line arguments are present. This configuration file should be present in the directory from which the jobs are submitted (which is also the directory in which the jobs are executed). The following is an example of what the content of pbs_fluent.conf can be: -The sample script uses a configuration file called -pbs_fluent.conf  if no command line arguments -are present. This configuration file should be present in the directory -from which the jobs are submitted (which is also the directory in which -the jobs are executed). The following is an example of what the content -of pbs_fluent.conf can be: - -` +```bash input="example_small.flin" case="Small-1.65m.cas" fluent_args="3d -pmyrinet" outfile="fluent_test.out" mpp="true" -` +``` The following is an explanation of the parameters: - input is the name of the input -file. +input is the name of the input file. - case is the name of the -.cas file that the input file will utilize. +case is the name of the .cas file that the input file will utilize. - fluent_args are extra ANSYS FLUENT -arguments. As shown in the previous example, you can specify the -interconnect by using the -p interconnect -command. 
The available interconnects include -ethernet (the default), -myrinet, class="monospace"> -infiniband, vendor, -altix>, and -crayx. The MPI is selected automatically, based -on the specified interconnect. +fluent_args are extra ANSYS FLUENT arguments. As shown in the previous example, you can specify the interconnect by using the -p interconnect command. The available interconnects include ethernet (the default), myrinet, infiniband, vendor, +altix, and crayx. The MPI is selected automatically, based on the specified interconnect. - outfile is the name of the file to which -the standard output will be sent. +outfile is the name of the file to which the standard output will be sent. - mpp="true" will tell the job script to -execute the job across multiple processors.         + mpp="true" will tell the job script to execute the job across multiple processors. -To run ANSYS Fluent in batch mode with user's config file you can -utilize/modify the following script and execute it via the qsub -command. +To run ANSYS Fluent in batch mode with user's config file you can utilize/modify the following script and execute it via the qsub command. -` +```bash #!/bin/sh #PBS -l nodes=2:ppn=4 #PBS -1 qprod @@ -153,7 +101,7 @@ command. #PBS -A XX-YY-ZZ cd $PBS_O_WORKDIR - + #We assume that if they didn’t specify arguments then they should use the #config file if [ "xx${input}${case}${mpp}${fluent_args}zz" = "xxzz" ]; then if [ -f pbs_fluent.conf ]; then @@ -163,7 +111,7 @@ command. printf "and no configuration file found. Exiting n" fi fi - + #Augment the ANSYS FLUENT command line arguments case "$mpp" in true) @@ -189,25 +137,20 @@ command. Case: $case Output: $outfile Fluent arguments: $fluent_args" - + #run the solver /ansys_inc/v145/fluent/bin/fluent $fluent_args > $outfile -` +``` -It runs the jobs out of the directory from which they are -submitted (PBS_O_WORKDIR). +It runs the jobs out of the directory from which they are submitted (PBS_O_WORKDIR). 4. Running Fluent in parralel ----------------------------- +Fluent could be run in parallel only under Academic Research license. To do so this ANSYS Academic Research license must be placed before ANSYS CFD license in user preferences. To make this change anslic_admin utility should be run -Fluent could be run in parallel only under Academic Research license. To -do so this ANSYS Academic Research license must be placed before ANSYS -CFD license in user preferences. To make this change anslic_admin -utility should be run - -` +```bash /ansys_inc/shared_les/licensing/lic_admin/anslic_admin -` +``` ANSLIC_ADMIN Utility will be run @@ -217,12 +160,6 @@ ANSLIC_ADMIN Utility will be run  - - -ANSYS Academic Research license should be moved up to the top of the -list. - - - - +ANSYS Academic Research license should be moved up to the top of the list. 
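As a closing illustration of the command-line method from section 2 above, a 3D Fluent run on 16 cores driven by a journal file might be started as follows (the journal file name and core count are only examples):

```bash
# Illustrative only: start the 3D solver on 16 cores without the GUI (-g),
# read commands from fluent.jou and let Fluent submit itself through PBS Professional (-pbs)
fluent 3d -t16 -g -i fluent.jou -pbs
```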
+ \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-ls-dyna.md b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-ls-dyna.md index 397aa94adba08eb00f01dec704c3031739e8d110..4949412e14c729b8ff4b9d3b030f75a49f06cfba 100644 --- a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-ls-dyna.md +++ b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-ls-dyna.md @@ -1,28 +1,11 @@ -ANSYS LS-DYNA +ANSYS LS-DYNA ============= -[ANSYS -LS-DYNA](http://www.ansys.com/Products/Simulation+Technology/Structural+Mechanics/Explicit+Dynamics/ANSYS+LS-DYNA) -software provides convenient and easy-to-use access to the -technology-rich, time-tested explicit solver without the need to contend -with the complex input requirements of this sophisticated program. -Introduced in 1996, ANSYS LS-DYNA capabilities have helped customers in -numerous industries to resolve highly intricate design -issues. >ANSYS Mechanical users have been able take advantage of -complex explicit solutions for a long time utilizing the traditional -ANSYS Parametric Design Language (APDL) environment. >These -explicit capabilities are available to ANSYS Workbench users as well. -The Workbench platform is a powerful, comprehensive, easy-to-use -environment for engineering simulation. CAD import from all sources, -geometry cleanup, automatic meshing, solution, parametric optimization, -result visualization and comprehensive report generation are all -available within a single fully interactive modern graphical user -environment. - -To run ANSYS LS-DYNA in batch mode you can utilize/modify the -default ansysdyna.pbs script and execute it via the qsub command. - -` +**[ANSYSLS-DYNA](http://www.ansys.com/Products/Simulation+Technology/Structural+Mechanics/Explicit+Dynamics/ANSYS+LS-DYNA)** software provides convenient and easy-to-use access to the technology-rich, time-tested explicit solver without the need to contend with the complex input requirements of this sophisticated program. Introduced in 1996, ANSYS LS-DYNA capabilities have helped customers in numerous industries to resolve highly intricate design issues. ANSYS Mechanical users have been able take advantage of complex explicit solutions for a long time utilizing the traditional ANSYS Parametric Design Language (APDL) environment. These explicit capabilities are available to ANSYS Workbench users as well. The Workbench platform is a powerful, comprehensive, easy-to-use environment for engineering simulation. CAD import from all sources, geometry cleanup, automatic meshing, solution, parametric optimization, result visualization and comprehensive report generation are all available within a single fully interactive modern graphical user environment. + +To run ANSYS LS-DYNA in batch mode you can utilize/modify the default ansysdyna.pbs script and execute it via the qsub command. + +```bash #!/bin/bash #PBS -l nodes=2:ppn=16 #PBS -q qprod @@ -66,21 +49,11 @@ done echo Machines: $hl /ansys_inc/v145/ansys/bin/ansys145 -dis -lsdynampp i=input.k -machines $hl -` - -Header of the pbs file (above) is common and description can be -find on [this -site](../../resource-allocation-and-job-execution/job-submission-and-execution.html)>. -[SVS FEM](http://www.svsfem.cz) recommends to utilize -sources by keywords: nodes, ppn. These keywords allows to address -directly the number of nodes (computers) and cores (ppn) which will be -utilized in the job. Also the rest of code assumes such structure of -allocated resources. 
- -Working directory has to be created before sending pbs job into the -queue. Input file should be in working directory or full path to input -file has to be specified. Input file has to be defined by common LS-DYNA -.**k** file which is attached to the ansys solver via parameter i= +``` + +Header of the pbs file (above) is common and description can be find on [this site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). [SVS FEM](http://www.svsfem.cz) recommends to utilize sources by keywords: nodes, ppn. These keywords allows to address directly the number of nodes (computers) and cores (ppn) which will be utilized in the job. Also the rest of code assumes such structure of allocated resources. + +Working directory has to be created before sending pbs job into the queue. Input file should be in working directory or full path to input file has to be specified. Input file has to be defined by common LS-DYNA .**k** file which is attached to the ansys solver via parameter i=  diff --git a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-mechanical-apdl.md b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-mechanical-apdl.md index ac6357c2f9a6df62253546816739944985f0270e..a84a1bf995c645688c64159c8562b2595ee6361e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-mechanical-apdl.md +++ b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys-mechanical-apdl.md @@ -1,19 +1,12 @@ -ANSYS MAPDL +ANSYS MAPDL =========== -**[ANSYS -Multiphysics](http://www.ansys.com/Products/Simulation+Technology/Structural+Mechanics/ANSYS+Multiphysics)** -software offers a comprehensive product solution for both multiphysics -and single-physics analysis. The product includes structural, thermal, -fluid and both high- and low-frequency electromagnetic analysis. The -product also contains solutions for both direct and sequentially coupled -physics problems including direct coupled-field elements and the ANSYS -multi-field solver. +**[ANSYS Multiphysics](http://www.ansys.com/Products/Simulation+Technology/Structural+Mechanics/ANSYS+Multiphysics)** +software offers a comprehensive product solution for both multiphysics and single-physics analysis. The product includes structural, thermal, fluid and both high- and low-frequency electromagnetic analysis. The product also contains solutions for both direct and sequentially coupled physics problems including direct coupled-field elements and the ANSYS multi-field solver. -To run ANSYS MAPDL in batch mode you can utilize/modify the -default mapdl.pbs script and execute it via the qsub command. +To run ANSYS MAPDL in batch mode you can utilize/modify the default mapdl.pbs script and execute it via the qsub command. -` +```bash #!/bin/bash #PBS -l nodes=2:ppn=16 #PBS -q qprod @@ -52,30 +45,15 @@ done echo Machines: $hl #-i input.dat includes the input of analysis in APDL format -#-o file.out is output file from ansys where all text outputs will be redirected +#-o file.out is output file from ansys where all text outputs will be redirected #-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN) /ansys_inc/v145/ansys/bin/ansys145 -b -dis -p aa_r -i input.dat -o file.out -machines $hl -dir $WORK_DIR -` +``` -Header of the pbs file (above) is common and description can be find on -[this -site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). 
-[SVS FEM](http://www.svsfem.cz) recommends to utilize
-sources by keywords: nodes, ppn. These keywords allows to address
-directly the number of nodes (computers) and cores (ppn) which will be
-utilized in the job. Also the rest of code assumes such structure of
-allocated resources.
+The header of the PBS file (above) is common; its description can be found on [this site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). [SVS FEM](http://www.svsfem.cz) recommends requesting resources with the nodes and ppn keywords. These keywords directly set the number of nodes (computers) and cores per node (ppn) used by the job; the rest of the script assumes this structure of allocated resources.
-Working directory has to be created before sending pbs job into the
-queue. Input file should be in working directory or full path to input
-file has to be specified. Input file has to be defined by common APDL
-file which is attached to the ansys solver via parameter -i
+The working directory has to be created before the PBS job is submitted to the queue. The input file should be in the working directory, or the full path to the input file has to be specified. The input has to be a standard APDL file, which is passed to the ANSYS solver via the -i parameter.
-License** should be selected by parameter -p. Licensed products are
-the following: aa_r (ANSYS **Academic Research), ane3fl (ANSYS
-Multiphysics)-**Commercial**, aa_r_dy (ANSYS **Academic
-AUTODYN)>
-[ More
- about licensing
-here](licensing.html)
+The **license** is selected by the -p parameter. Licensed products are the following: aa_r (ANSYS **Academic** Research), ane3fl (ANSYS Multiphysics, **commercial**) and aa_r_dy (ANSYS **Academic** AUTODYN).
+[More about licensing here](licensing.html)
diff --git a/docs.it4i/anselm-cluster-documentation/software/ansys/ansys.md b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys.md
new file mode 100644
index 0000000000000000000000000000000000000000..53e175f0fe1ade524602d72648f550e5c35a6308
--- /dev/null
+++ b/docs.it4i/anselm-cluster-documentation/software/ansys/ansys.md
@@ -0,0 +1,17 @@
+Overview of ANSYS Products
+==========================
+
+**[SVS FEM](http://www.svsfem.cz/)**, the **[ANSYS Channel partner](http://www.ansys.com/)** for the Czech Republic, provided all ANSYS licenses for the Anselm cluster and supports all ANSYS products (Multiphysics, Mechanical, MAPDL, CFX, Fluent, Maxwell, LS-DYNA...) for IT staff and ANSYS users. If you run into problems with ANSYS functionality, please contact [hotline@svsfem.cz](mailto:hotline@svsfem.cz?subject=Ostrava%20-%20ANSELM).
+
+Anselm provides both commercial and academic license variants. Academic variants are distinguished by the word "**Academic...**" in the license name, or by the two-letter prefix "**aa_**" in the license feature name. The license is changed on the command line or directly in the user's PBS file (see the individual products). [More about licensing here](ansys/licensing.html)
+
+To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL,...) load the module:
+
+```bash
+ $ module load ansys
+```
+
+ANSYS supports an interactive regime, but because the tasks solved here are typically extremely demanding, interactive use is not recommended.
+
+If you need to work interactively, we recommend configuring the RSM service on the client machine, which allows forwarding the solution to Anselm directly from the client's Workbench project (see ANSYS RSM service).
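If several ANSYS releases are installed side by side, the module system can be queried to list them and to load a specific build instead of the default (the version string below is only an example; use one reported by module avail):

```bash
$ module avail ansys       # list the installed ANSYS modules
$ module load ansys/14.5   # example version string; pick one from the list above
```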
+ diff --git a/docs.it4i/anselm-cluster-documentation/software/ansys/ls-dyna.md b/docs.it4i/anselm-cluster-documentation/software/ansys/ls-dyna.md index c2a86aa8574d0e7491af73766adce0e82c56bcd5..e2eb2c12ac41b5df6997743034f410608c8ea34e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/ansys/ls-dyna.md +++ b/docs.it4i/anselm-cluster-documentation/software/ansys/ls-dyna.md @@ -1,31 +1,13 @@ -LS-DYNA +LS-DYNA ======= -[LS-DYNA](http://www.lstc.com/) is a multi-purpose, -explicit and implicit finite element program used to analyze the -nonlinear dynamic response of structures. Its fully automated contact -analysis capability, a wide range of constitutive models to simulate a -whole range of engineering materials (steels, composites, foams, -concrete, etc.), error-checking features and the high scalability have -enabled users worldwide to solve successfully many complex -problems. >Additionally LS-DYNA is extensively used to simulate -impacts on structures from drop tests, underwater shock, explosions or -high-velocity impacts. Explosive forming, process engineering, accident -reconstruction, vehicle dynamics, thermal brake disc analysis or nuclear -safety are further areas in the broad range of possible applications. In -leading-edge research LS-DYNA is used to investigate the behaviour of -materials like composites, ceramics, concrete, or wood. Moreover, it is -used in biomechanics, human modelling, molecular structures, casting, -forging, or virtual testing. +[LS-DYNA](http://www.lstc.com/) is a multi-purpose, explicit and implicit finite element program used to analyze the nonlinear dynamic response of structures. Its fully automated contact analysis capability, a wide range of constitutive models to simulate a whole range of engineering materials (steels, composites, foams, concrete, etc.), error-checking features and the high scalability have enabled users worldwide to solve successfully many complex problems. Additionally LS-DYNA is extensively used to simulate impacts on structures from drop tests, underwater shock, explosions or high-velocity impacts. Explosive forming, process engineering, accident reconstruction, vehicle dynamics, thermal brake disc analysis or nuclear safety are further areas in the broad range of possible applications. In leading-edge research LS-DYNA is used to investigate the behaviour of materials like composites, ceramics, concrete, or wood. Moreover, it is used in biomechanics, human modelling, molecular structures, casting, forging, or virtual testing. -Anselm provides **1 commercial license of LS-DYNA without HPC** -support now. +Anselm provides **1 commercial license of LS-DYNA without HPC** support now. -To run LS-DYNA in batch mode you can utilize/modify the -default lsdyna.pbs script and execute it via the qsub -command. +To run LS-DYNA in batch mode you can utilize/modify the default lsdyna.pbs script and execute it via the qsub command. -` +```bash #!/bin/bash #PBS -l nodes=1:ppn=16 #PBS -q qprod @@ -47,19 +29,8 @@ echo Directory is `pwd` module load lsdyna /apps/engineering/lsdyna/lsdyna700s i=input.k -` +``` -Header of the pbs file (above) is common and description can be find -on [this -site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). -[SVS FEM](http://www.svsfem.cz) recommends to utilize -sources by keywords: nodes, ppn. These keywords allows to address -directly the number of nodes (computers) and cores (ppn) which will be -utilized in the job. 
Also the rest of code assumes such structure of -allocated resources. - -Working directory has to be created before sending pbs job into the -queue. Input file should be in working directory or full path to input -file has to be specified. Input file has to be defined by common LS-DYNA -.k** file which is attached to the LS-DYNA solver via parameter i= +Header of the pbs file (above) is common and description can be find on [this site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). [SVS FEM](http://www.svsfem.cz) recommends to utilize sources by keywords: nodes, ppn. These keywords allows to address directly the number of nodes (computers) and cores (ppn) which will be utilized in the job. Also the rest of code assumes such structure of allocated resources. +Working directory has to be created before sending pbs job into the queue. Input file should be in working directory or full path to input file has to be specified. Input file has to be defined by common LS-DYNA **.k** file which is attached to the LS-DYNA solver via parameter i= \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md index ff0be71faecd1c1b6374e9b2d92fbf44bece4eda..55f052e49092353a6e81806f679d25b82a0726aa 100644 --- a/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md +++ b/docs.it4i/anselm-cluster-documentation/software/chemistry/molpro.md @@ -1,73 +1,45 @@ -Molpro +Molpro ====== -Molpro is a complete system of ab initio programs for molecular -electronic structure calculations. +Molpro is a complete system of ab initio programs for molecular electronic structure calculations. About Molpro ------------ - -Molpro is a software package used for accurate ab-initio quantum -chemistry calculations. More information can be found at the [official -webpage](http://www.molpro.net/). +Molpro is a software package used for accurate ab-initio quantum chemistry calculations. More information can be found at the [official webpage](http://www.molpro.net/). License ------- +Molpro software package is available only to users that have a valid license. Please contact support to enable access to Molpro if you have a valid license appropriate for running on our cluster (eg. academic research group licence, parallel execution). -Molpro software package is available only to users that have a valid -license. Please contact support to enable access to Molpro if you have a -valid license appropriate for running on our cluster (eg. >academic -research group licence, parallel execution). - -To run Molpro, you need to have a valid license token present in -" $HOME/.molpro/token". You can -download the token from [Molpro -website](https://www.molpro.net/licensee/?portal=licensee). +To run Molpro, you need to have a valid license token present in " $HOME/.molpro/token". You can download the token from [Molpro website](https://www.molpro.net/licensee/?portal=licensee). Installed version ----------------- +Currently on Anselm is installed version 2010.1, patch level 45, parallel version compiled with Intel compilers and Intel MPI. -Currently on Anselm is installed version 2010.1, patch level 45, -parallel version compiled with Intel compilers and Intel MPI. 
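A quick way to check what is available and to load this build is the module system (the module name is an assumption here; verify it with module avail):

```bash
$ module avail molpro   # list the installed Molpro builds
$ module load molpro    # load the default build described above
```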
- -Compilation parameters are default : - - |Parameter|Value| - ------------------------------------------- |---|---|------------------- - |max number of atoms|200| - |max number of valence orbitals|300| - |max number of basis functions|4095| - |max number of states per symmmetry|20| - |max number of state symmetries|16| - |max number of records|200| - |max number of primitives|maxbfn x [2]| +Compilation parameters are default: - +|Parameter|Value| +|---|---| +|max number of atoms|200| +|max number of valence orbitals|300| +|max number of basis functions|4095| +|max number of states per symmmetry|20| +|max number of state symmetries|16| +|max number of records|200| +|max number of primitives|maxbfn x [2]| Running -------- +------ +Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details. + +>The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS. -Molpro is compiled for parallel execution using MPI and OpenMP. By -default, Molpro reads the number of allocated nodes from PBS and -launches a data server on one node. On the remaining allocated nodes, -compute processes are launched, one process per node, each with 16 -threads. You can modify this behavior by using -n, -t and helper-server -options. Please refer to the [Molpro -documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) -for more details. - -The OpenMP parallelization in Molpro is limited and has been observed to -produce limited scaling. We therefore recommend to use MPI -parallelization only. This can be achieved by passing option -mpiprocs=16:ompthreads=1 to PBS. - -You are advised to use the -d option to point to a directory in [SCRATCH -filesystem](../../storage.html). Molpro can produce a -large amount of temporary data during its run, and it is important that -these are placed in the fast scratch filesystem. +You are advised to use the -d option to point to a directory in [SCRATCH filesystem](../../storage.html). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch filesystem. ### Example jobscript +```bash #PBS -A IT4I-0-0 #PBS -q qprod #PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1 @@ -87,5 +59,5 @@ these are placed in the fast scratch filesystem. 
molpro -d /scratch/$USER/$PBS_JOBID caffeine_opt_diis.com # delete scratch directory - rm -rf /scratch/$USER/$PBS_JOBID - + rm -rf /scratch/$USER/$PBS_JOBID +``` diff --git a/docs.it4i/anselm-cluster-documentation/software/chemistry/nwchem.md b/docs.it4i/anselm-cluster-documentation/software/chemistry/nwchem.md index d52644e00840ab4ec5be1f3d856e2e2b82a83d45..9869c37e9ba98bedcde5a2016c4f3ce654d75e39 100644 --- a/docs.it4i/anselm-cluster-documentation/software/chemistry/nwchem.md +++ b/docs.it4i/anselm-cluster-documentation/software/chemistry/nwchem.md @@ -1,66 +1,47 @@ -NWChem +NWChem ====== -High-Performance Computational Chemistry +##High-Performance Computational Chemistry Introduction ------------------------- - -NWChem aims to provide its users with computational chemistry -tools that are scalable both in their ability to treat large scientific -computational chemistry problems efficiently, and in their use of -available parallel computing resources from high-performance parallel -supercomputers to conventional workstation clusters. +NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters. [Homepage](http://www.nwchem-sw.org/index.php/Main_Page) Installed versions ------------------ -The following versions are currently installed : - -- 6.1.1, not recommended, problems have been observed with this - version - -- 6.3-rev2-patch1, current release with QMD patch applied. Compiled - with Intel compilers, MKL and Intel MPI +The following versions are currently installed: -- 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI - and NWChem provided BLAS instead of MKL. This version is expected to - be slower +- 6.1.1, not recommended, problems have been observed with this version +- 6.3-rev2-patch1, current release with QMD patch applied. Compiled with Intel compilers, MKL and Intel MPI +- 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI and NWChem provided BLAS instead of MKL. This version is expected to be slower +- 6.3-rev2-patch1-venus, this version contains only libraries for VENUS interface linking. Does not provide standalone NWChem executable -- 6.3-rev2-patch1-venus, this version contains only libraries for - VENUS interface linking. Does not provide standalone NWChem - executable - -For a current list of installed versions, execute : +For a current list of installed versions, execute: +```bash module avail nwchem +``` Running ------- +NWChem is compiled for parallel MPI execution. Normal procedure for MPI jobs applies. Sample jobscript: -NWChem is compiled for parallel MPI execution. Normal procedure for MPI -jobs applies. 
Sample jobscript : - +```bash #PBS -A IT4I-0-0 #PBS -q qprod #PBS -l select=1:ncpus=16 module add nwchem/6.3-rev2-patch1 mpirun -np 16 nwchem h2o.nw +``` Options -------------------- +Please refer to [the documentation](http://www.nwchem-sw.org/index.php/Release62:Top-level) and in the input file set the following directives : -Please refer to [the -documentation](http://www.nwchem-sw.org/index.php/Release62:Top-level) and -in the input file set the following directives : - -- >MEMORY : controls the amount of memory NWChem will use -- >SCRATCH_DIR : set this to a directory in [SCRATCH - filesystem](../../storage.html#scratch) (or run the - calculation completely in a scratch directory). For certain - calculations, it might be advisable to reduce I/O by forcing - "direct" mode, eg. "scf direct" +- MEMORY : controls the amount of memory NWChem will use +- SCRATCH_DIR : set this to a directory in [SCRATCH filesystem](../../storage.html#scratch) (or run the calculation completely in a scratch directory). For certain calculations, it might be advisable to reduce I/O by forcing "direct" mode, eg. "scf direct" diff --git a/docs.it4i/anselm-cluster-documentation/software/compilers.md b/docs.it4i/anselm-cluster-documentation/software/compilers.md index 7450d0e1457ceb3e6d5230031834c6284a68c1e5..2aaf9714afcbf74f65fa3260d34bcb4e15389cec 100644 --- a/docs.it4i/anselm-cluster-documentation/software/compilers.md +++ b/docs.it4i/anselm-cluster-documentation/software/compilers.md @@ -1,12 +1,9 @@ -Compilers +Compilers ========= -Available compilers, including GNU, INTEL and UPC compilers +##Available compilers, including GNU, INTEL and UPC compilers - - -Currently there are several compilers for different programming -languages available on the Anselm cluster: +Currently there are several compilers for different programming languages available on the Anselm cluster: - C/C++ - Fortran 77/90/95 @@ -14,66 +11,58 @@ languages available on the Anselm cluster: - Java - nVidia CUDA - - -The C/C++ and Fortran compilers are divided into two main groups GNU and -Intel. +The C/C++ and Fortran compilers are divided into two main groups GNU and Intel. Intel Compilers --------------- - -For information about the usage of Intel Compilers and other Intel -products, please read the [Intel Parallel -studio](intel-suite.html) page. +For information about the usage of Intel Compilers and other Intel products, please read the [Intel Parallel studio](intel-suite.html) page. GNU C/C++ and Fortran Compilers ------------------------------- +For compatibility reasons there are still available the original (old 4.4.6-4) versions of GNU compilers as part of the OS. These are accessible in the search path by default. -For compatibility reasons there are still available the original (old -4.4.6-4) versions of GNU compilers as part of the OS. These are -accessible in the search path by default. - -It is strongly recommended to use the up to date version (4.8.1) which -comes with the module gcc: +It is strongly recommended to use the up to date version (4.8.1) which comes with the module gcc: +```bash $ module load gcc $ gcc -v $ g++ -v $ gfortran -v +``` -With the module loaded two environment variables are predefined. One for -maximum optimizations on the Anselm cluster architecture, and the other -for debugging purposes: +With the module loaded two environment variables are predefined. 
One for maximum optimizations on the Anselm cluster architecture, and the other for debugging purposes: +```bash $ echo $OPTFLAGS -O3 -march=corei7-avx $ echo $DEBUGFLAGS -O0 -g +``` -For more informations about the possibilities of the compilers, please -see the man pages. +For more informations about the possibilities of the compilers, please see the man pages. Unified Parallel C ------------------ - -UPC is supported by two compiler/runtime implementations: + UPC is supported by two compiler/runtime implementations: - GNU - SMP/multi-threading support only - Berkley - multi-node support as well as SMP/multi-threading support ### GNU UPC Compiler -To use the GNU UPC compiler and run the compiled binaries use the module -gupc +To use the GNU UPC compiler and run the compiled binaries use the module gupc +```bash $ module add gupc $ gupc -v $ g++ -v +``` Simple program to test the compiler - $ cat count.upc +```bash + $ cat count.upc /* hello.upc - a simple UPC example */ #include <upc.h> @@ -86,38 +75,40 @@ Simple program to test the compiler  upc_barrier;  printf(" - Hello from thread %in", MYTHREAD);  return 0; - } + } +``` To compile the example use +```bash $ gupc -o count.upc.x count.upc +``` To run the example with 5 threads issue +```bash $ ./count.upc.x -fupc-threads-5 +``` For more informations see the man pages. ### Berkley UPC Compiler -To use the Berkley UPC compiler and runtime environment to run the -binaries use the module bupc +To use the Berkley UPC compiler and runtime environment to run the binaries use the module bupc +```bash $ module add bupc $ upcc -version +``` -As default UPC network the "smp" is used. This is very quick and easy -way for testing/debugging, but limited to one node only. +As default UPC network the "smp" is used. This is very quick and easy way for testing/debugging, but limited to one node only. -For production runs, it is recommended to use the native Infiband -implementation of UPC network "ibv". For testing/debugging using -multiple nodes, the "mpi" UPC network is recommended. Please note, that -the selection of the network is done at the compile time** and not at -runtime (as expected)! +For production runs, it is recommended to use the native Infiband implementation of UPC network "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. Please note, that **the selection of the network is done at the compile time** and not at runtime (as expected)! Example UPC code: - $ cat hello.upc +```bash + $ cat hello.upc /* hello.upc - a simple UPC example */ #include <upc.h> @@ -130,34 +121,35 @@ Example UPC code:  upc_barrier;  printf(" - Hello from thread %in", MYTHREAD);  return 0; - } + } +``` To compile the example with the "ibv" UPC network use - $ upcc -network=ibv -o hello.upc.x hello.upc +```bash + $ upcc -network=ibv -o hello.upc.x hello.upc +``` To run the example with 5 threads issue +```bash $ upcrun -n 5 ./hello.upc.x +``` -To run the example on two compute nodes using all 32 cores, with 32 -threads, issue +To run the example on two compute nodes using all 32 cores, with 32 threads, issue - $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 +```bash + $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 $ module add bupc $ upcrun -n 32 ./hello.upc.x +``` - For more informations see the man pages. +For more informations see the man pages. Java ---- - -For information how to use Java (runtime and/or compiler), please read -the [Java page](java.html). 
+For information how to use Java (runtime and/or compiler), please read the [Java page](java.html). nVidia CUDA ----------- - -For information how to work with nVidia CUDA, please read the [nVidia -CUDA page](nvidia-cuda.html). - +For information how to work with nVidia CUDA, please read the [nVidia CUDA page](nvidia-cuda.html). \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md b/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md new file mode 100644 index 0000000000000000000000000000000000000000..fba0656f32dfbf42190d9c2fb0c8e576ecd4f518 --- /dev/null +++ b/docs.it4i/anselm-cluster-documentation/software/comsol-multiphysics.md @@ -0,0 +1,121 @@ +COMSOL Multiphysics® +==================== + +Introduction +------------------------- +[COMSOL](http://www.comsol.com) is a powerful environment for modelling and solving various engineering and scientific problems based on partial differential equations. COMSOL is designed to solve coupled or multiphysics phenomena. For many +standard engineering problems COMSOL provides add-on products such as electrical, mechanical, fluid flow, and chemical +applications. + +- [Structural Mechanics Module](http://www.comsol.com/structural-mechanics-module), +- [Heat Transfer Module](http://www.comsol.com/heat-transfer-module), +- [CFD Module](http://www.comsol.com/cfd-module), +- [Acoustics Module](http://www.comsol.com/acoustics-module), +- and [many others](http://www.comsol.com/products) + +COMSOL also allows an interface support for equation-based modelling of partial differential equations. + +Execution +---------------------- +On the Anselm cluster COMSOL is available in the latest stable version. There are two variants of the release: + +- **Non commercial** or so called **EDU variant**, which can be used for research and educational purposes. +- **Commercial** or so called **COM variant**, which can used also for commercial activities. **COM variant** has only subset of features compared to the **EDU variant** available. More about licensing will be posted here soon. + +To load the of COMSOL load the module + +```bash + $ module load comsol +``` + +By default the **EDU variant** will be loaded. If user needs other version or variant, load the particular version. To obtain the list of available versions use + +```bash + $ module avail comsol +``` + +If user needs to prepare COMSOL jobs in the interactive mode it is recommend to use COMSOL on the compute nodes via PBS Pro scheduler. In order run the COMSOL Desktop GUI on Windows is recommended to use the [Virtual Network Computing (VNC)](https://docs.it4i.cz/anselm-cluster-documentation/software/comsol/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). + +```bash + $ xhost + + $ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ncpus=16 + $ module load comsol + $ comsol +``` + +To run COMSOL in batch mode, without the COMSOL Desktop GUI environment, user can utilized the default (comsol.pbs) job script and execute it via the qsub command. 
+ +```bash +#!/bin/bash +#PBS -l select=3:ncpus=16 +#PBS -q qprod +#PBS -N JOB_NAME +#PBS -A PROJECT_ID + +cd /scratch/$USER/ || exit + +echo Time is `date` +echo Directory is `pwd` +echo '**PBS_NODEFILE***START*******' +cat $PBS_NODEFILE +echo '**PBS_NODEFILE***END*********' + +text_nodes < cat $PBS_NODEFILE + +module load comsol +# module load comsol/43b-COM + +ntask=$(wc -l $PBS_NODEFILE) + +comsol -nn ${ntask} batch -configuration /tmp –mpiarg –rmk –mpiarg pbs -tmpdir /scratch/$USER/ -inputfile name_input_f.mph -outputfile name_output_f.mph -batchlog name_log_f.log +``` + +Working directory has to be created before sending the (comsol.pbs) job script into the queue. Input file (name_input_f.mph) has to be in working directory or full path to input file has to be specified. The appropriate path to the temp directory of the job has to be set by command option (-tmpdir). + +LiveLink™* *for MATLAB®^ +------------------------- +COMSOL is the software package for the numerical solution of the partial differential equations. LiveLink for MATLAB allows connection to the COMSOL**®** API (Application Programming Interface) with the benefits of the programming language and computing environment of the MATLAB. + +LiveLink for MATLAB is available in both **EDU** and **COM** **variant** of the COMSOL release. On Anselm 1 commercial (**COM**) license and the 5 educational (**EDU**) licenses of LiveLink for MATLAB (please see the [ISV Licenses](../isv_licenses.html)) are available. +Following example shows how to start COMSOL model from MATLAB via LiveLink in the interactive mode. + +```bash +$ xhost + +$ qsub -I -X -A PROJECT_ID -q qexp -l select=1:ncpus=16 +$ module load matlab +$ module load comsol +$ comsol server matlab +``` + +At the first time to launch the LiveLink for MATLAB (client-MATLAB/server-COMSOL connection) the login and password is requested and this information is not requested again. + +To run LiveLink for MATLAB in batch mode with (comsol_matlab.pbs) job script you can utilize/modify the following script and execute it via the qsub command. + +```bash +#!/bin/bash +#PBS -l select=3:ncpus=16 +#PBS -q qprod +#PBS -N JOB_NAME +#PBS -A PROJECT_ID + +cd /scratch/$USER || exit + +echo Time is `date` +echo Directory is `pwd` +echo '**PBS_NODEFILE***START*******' +cat $PBS_NODEFILE +echo '**PBS_NODEFILE***END*********' + +text_nodes < cat $PBS_NODEFILE + +module load matlab +module load comsol/43b-EDU + +ntask=$(wc -l $PBS_NODEFILE) + +comsol -nn ${ntask} server -configuration /tmp -mpiarg -rmk -mpiarg pbs -tmpdir /scratch/$USER & +cd /apps/engineering/comsol/comsol43b/mli +matlab -nodesktop -nosplash -r "mphstart; addpath /scratch/$USER; test_job" +``` + +This example shows how to run Livelink for MATLAB with following configuration: 3 nodes and 16 cores per node. Working directory has to be created before submitting (comsol_matlab.pbs) job script into the queue. Input file (test_job.m) has to be in working directory or full path to input file has to be specified. The Matlab command option (-r ”mphstart”) created a connection with a COMSOL server using the default port number. 
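Assuming the two job scripts are saved under the names used in this text, both are submitted to the scheduler in the usual way:

```bash
$ qsub comsol.pbs          # plain COMSOL batch run
$ qsub comsol_matlab.pbs   # LiveLink for MATLAB run
```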
\ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/comsol/comsol-multiphysics.md b/docs.it4i/anselm-cluster-documentation/software/comsol/comsol-multiphysics.md deleted file mode 100644 index 25f1c4d0fd1cf538124f230a2297a279820118ce..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/comsol/comsol-multiphysics.md +++ /dev/null @@ -1,204 +0,0 @@ -COMSOL Multiphysics® -==================== - - - -Introduction - -------------------------- - -[COMSOL](http://www.comsol.com) -is a powerful environment for modelling and solving various engineering -and scientific problems based on partial differential equations. COMSOL -is designed to solve coupled or multiphysics phenomena. For many -standard engineering problems COMSOL provides add-on products such as -electrical, mechanical, fluid flow, and chemical -applications. - -- >[Structural Mechanics - Module](http://www.comsol.com/structural-mechanics-module), - - -- >[Heat Transfer - Module](http://www.comsol.com/heat-transfer-module), - - -- >[CFD - Module](http://www.comsol.com/cfd-module), - - -- >[Acoustics - Module](http://www.comsol.com/acoustics-module), - - -- >and [many - others](http://www.comsol.com/products) - -COMSOL also allows an -interface support for -equation-based modelling of -partial differential -equations. - -Execution - ----------------------- - -On the Anselm cluster COMSOL is available in the latest -stable version. There are two variants of the release: - -- >**Non commercial** or so - called >**EDU - variant**>, which can be used for research - and educational purposes. - -- >**Commercial** or so called - >**COM variant**, - which can used also for commercial activities. - >**COM variant** - has only subset of features compared to the - >**EDU - variant**> available. - - More about - licensing will be posted here - soon. - - -To load the of COMSOL load the module - -` -$ module load comsol -` - -By default the **EDU -variant**> will be loaded. If user needs other -version or variant, load the particular version. To obtain the list of -available versions use - -` -$ module avail comsol -` - -If user needs to prepare COMSOL jobs in the interactive mode -it is recommend to use COMSOL on the compute nodes via PBS Pro -scheduler. In order run the COMSOL Desktop GUI on Windows is recommended -to use the [Virtual Network Computing -(VNC)](https://docs.it4i.cz/anselm-cluster-documentation/software/comsol/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). - -` -$ xhost + -$ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ncpus=16 -$ module load comsol -$ comsol -` - -To run COMSOL in batch mode, without the COMSOL Desktop GUI -environment, user can utilized the default (comsol.pbs) job script and -execute it via the qsub command. - -` -#!/bin/bash -#PBS -l select=3:ncpus=16 -#PBS -q qprod -#PBS -N JOB_NAME -#PBS -A PROJECT_ID - -cd /scratch/$USER/ || exit - -echo Time is `date` -echo Directory is `pwd` -echo '**PBS_NODEFILE***START*******' -cat $PBS_NODEFILE -echo '**PBS_NODEFILE***END*********' - -text_nodes < cat $PBS_NODEFILE - -module load comsol -# module load comsol/43b-COM - -ntask=$(wc -l $PBS_NODEFILE) - -comsol -nn ${ntask} batch -configuration /tmp –mpiarg –rmk –mpiarg pbs -tmpdir /scratch/$USER/ -inputfile name_input_f.mph -outputfile name_output_f.mph -batchlog name_log_f.log -` - -Working directory has to be created before sending the -(comsol.pbs) job script into the queue. 
Input file (name_input_f.mph) -has to be in working directory or full path to input file has to be -specified. The appropriate path to the temp directory of the job has to -be set by command option (-tmpdir). - -LiveLink™* *for MATLAB®^ -------------------------- - -COMSOL is the software package for the numerical solution of -the partial differential equations. LiveLink for MATLAB allows -connection to the -COMSOL>><span><span><span><span>**®**</span>^ -API (Application Programming Interface) with the benefits of the -programming language and computing environment of the MATLAB. - -LiveLink for MATLAB is available in both -**EDU** and -**COM** -**variant** of the -COMSOL release. On Anselm 1 commercial -(>**COM**) license -and the 5 educational -(>**EDU**) licenses -of LiveLink for MATLAB (please see the [ISV -Licenses](../isv_licenses.html)) are available. -Following example shows how to start COMSOL model from MATLAB via -LiveLink in the interactive mode. - -` -$ xhost + -$ qsub -I -X -A PROJECT_ID -q qexp -l select=1:ncpus=16 -$ module load matlab -$ module load comsol -$ comsol server matlab -` - -At the first time to launch the LiveLink for MATLAB -(client-MATLAB/server-COMSOL connection) the login and password is -requested and this information is not requested again. - -To run LiveLink for MATLAB in batch mode with -(comsol_matlab.pbs) job script you can utilize/modify the following -script and execute it via the qsub command. - -` -#!/bin/bash -#PBS -l select=3:ncpus=16 -#PBS -q qprod -#PBS -N JOB_NAME -#PBS -A PROJECT_ID - -cd /scratch/$USER || exit - -echo Time is `date` -echo Directory is `pwd` -echo '**PBS_NODEFILE***START*******' -cat $PBS_NODEFILE -echo '**PBS_NODEFILE***END*********' - -text_nodes < cat $PBS_NODEFILE - -module load matlab -module load comsol/43b-EDU - -ntask=$(wc -l $PBS_NODEFILE) - -comsol -nn ${ntask} server -configuration /tmp -mpiarg -rmk -mpiarg pbs -tmpdir /scratch/$USER & -cd /apps/engineering/comsol/comsol43b/mli -matlab -nodesktop -nosplash -r "mphstart; addpath /scratch/$USER; test_job" -` - -This example shows how to run Livelink for MATLAB with following -configuration: 3 nodes and 16 cores per node. Working directory has to -be created before submitting (comsol_matlab.pbs) job script into the -queue. Input file (test_job.m) has to be in working directory or full -path to input file has to be specified. The Matlab command option (-r -”mphstart”) created a connection with a COMSOL server using the default -port number. - diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers.md b/docs.it4i/anselm-cluster-documentation/software/debuggers.md deleted file mode 100644 index f24688bd8fb51a85628a030f8f76379432820952..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers.md +++ /dev/null @@ -1,89 +0,0 @@ -Debuggers and profilers summary -=============================== - - - -Introduction ------------- - -We provide state of the art programms and tools to develop, profile and -debug HPC codes at IT4Innovations. -On these pages, we provide an overview of the profiling and debugging -tools available on Anslem at IT4I. - -Intel debugger --------------- - -The intel debugger version 13.0 is available, via module intel. The -debugger works for applications compiled with C and C++ compiler and the -ifort fortran 77/90/95 compiler. The debugger provides java GUI -environment. 
Use [X -display](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -for running the GUI. - - $ module load intel - $ idb - -Read more at the [Intel -Debugger](intel-suite/intel-debugger.html) page. - -Allinea Forge (DDT/MAP) ------------------------ - -Allinea DDT, is a commercial debugger primarily for debugging parallel -MPI or OpenMP programs. It also has a support for GPU (CUDA) and Intel -Xeon Phi accelerators. DDT provides all the standard debugging features -(stack trace, breakpoints, watches, view variables, threads etc.) for -every thread running as part of your program, or for every process - -even if these processes are distributed across a cluster using an MPI -implementation. - - $ module load Forge - $ forge - -Read more at the [Allinea -DDT](debuggers/allinea-ddt.html) page. - -Allinea Performance Reports ---------------------------- - -Allinea Performance Reports characterize the performance of HPC -application runs. After executing your application through the tool, a -synthetic HTML report is generated automatically, containing information -about several metrics along with clear behavior statements and hints to -help you improve the efficiency of your runs. Our license is limited to -64 MPI processes. - - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 - -Read more at the [Allinea Performance -Reports](debuggers/allinea-performance-reports.html) -page. - -RougeWave Totalview -------------------- - -TotalView is a source- and machine-level debugger for multi-process, -multi-threaded programs. Its wide range of tools provides ways to -analyze, organize, and test programs, making it easy to isolate and -identify problems in individual threads and processes in programs of -great complexity. - - $ module load totalview - $ totalview - -Read more at the [Totalview](debuggers/total-view.html) -page. - -Vampir trace analyzer ---------------------- - -Vampir is a GUI trace analyzer for traces in OTF format. - - $ module load Vampir/8.5.0 - $ vampir - -Read more at -the [Vampir](../../salomon/software/debuggers/vampir.html) page. - diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md index 237f66e9b50348ea499370f53427860f0c779afb..e929ddf3b08d4996f0a246b31b1ac16a750fd09c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-ddt.md @@ -1,129 +1,97 @@ -Allinea Forge (DDT,MAP) +Allinea Forge (DDT,MAP) ======================= - - Allinea Forge consist of two tools - debugger DDT and profiler MAP. -Allinea DDT, is a commercial debugger primarily for debugging parallel -MPI or OpenMP programs. It also has a support for GPU (CUDA) and Intel -Xeon Phi accelerators. DDT provides all the standard debugging features -(stack trace, breakpoints, watches, view variables, threads etc.) for -every thread running as part of your program, or for every process - -even if these processes are distributed across a cluster using an MPI -implementation. +Allinea DDT, is a commercial debugger primarily for debugging parallel MPI or OpenMP programs. It also has a support for GPU (CUDA) and Intel Xeon Phi accelerators. DDT provides all the standard debugging features (stack trace, breakpoints, watches, view variables, threads etc.) 
for every thread running as part of your program, or for every process - even if these processes are distributed across a cluster using an MPI implementation. -Allinea MAP is a profiler for C/C++/Fortran HPC codes. It is designed -for profiling parallel code, which uses pthreads, OpenMP or MPI. +Allinea MAP is a profiler for C/C++/Fortran HPC codes. It is designed for profiling parallel code, which uses pthreads, OpenMP or MPI. License and Limitations for Anselm Users ---------------------------------------- - -On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel -processes. In case of debugging GPU or Xeon Phi accelerated codes the -limit is 8 accelerators. These limitation means that: +On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel processes. In case of debugging GPU or Xeon Phi accelerated codes the limit is 8 accelerators. These limitation means that: - 1 user can debug up 64 processes, or - 32 users can debug 2 processes, etc. In case of debugging on accelerators: -- 1 user can debug on up to 8 accelerators, or -- 8 users can debug on single accelerator. +- 1 user can debug on up to 8 accelerators, or +- 8 users can debug on single accelerator. Compiling Code to run with DDT ------------------------------ ### Modules -Load all necessary modules to compile the code. For example: +Load all necessary modules to compile the code. For example: +```bash $ module load intel - $ module load impi ... or ... module load openmpi/X.X.X-icc + $ module load impi ... or ... module load openmpi/X.X.X-icc +``` Load the Allinea DDT module: +```bash $ module load Forge +``` Compile the code: -` +```bash $ mpicc -g -O0 -o test_debug test.c $ mpif90 -g -O0 -o test_debug test.f -` - - +``` ### Compiler flags Before debugging, you need to compile your code with theses flags: --g** : Generates extra debugging information usable by GDB. -g3** -includes even more debugging information. This option is available for +>- **g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. --O0** : Suppress all optimizations.** - - +>- **O0** : Suppress all optimizations. Starting a Job with DDT ----------------------- +Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: -Be sure to log in with an X window -forwarding enabled. This could mean using the -X in the ssh:  - - $ ssh -X username@anselm.it4i.cz +```bash + $ ssh -X username@anselm.it4i.cz +``` -Other options is to access login node using VNC. Please see the detailed -information on how to [use graphic user interface on -Anselm](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -. +Other options is to access login node using VNC. 
Please see the detailed information on how to [use graphic user interface on Anselm](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -From the login node an interactive session **with X windows forwarding** -(-X option) can be started by following command: +From the login node an interactive session **with X windows forwarding** (-X option) can be started by following command: - $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 +```bash + $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 +``` -Then launch the debugger with the ddt command followed by the name of -the executable to debug: +Then launch the debugger with the ddt command followed by the name of the executable to debug: +```bash $ ddt test_debug +``` -A submission window that appears have -a prefilled path to the executable to debug. You can select the number -of MPI processors and/or OpenMP threads on which to run and press run. -Command line arguments to a program can be entered to the -"Arguments " -box. +A submission window that appears have a prefilled path to the executable to debug. You can select the number of MPI processors and/or OpenMP threads on which to run and press run. Command line arguments to a program can be entered to the "Arguments " box. - + -To start the debugging directly without the submission window, user can -specify the debugging and execution parameters from the command line. -For example the number of MPI processes is set by option "-np 4". -Skipping the dialog is done by "-start" option. To see the list of the -"ddt" command line parameters, run "ddt --help".  +To start the debugging directly without the submission window, user can specify the debugging and execution parameters from the command line. For example the number of MPI processes is set by option "-np 4". Skipping the dialog is done by "-start" option. To see the list of the "ddt" command line parameters, run "ddt --help". +```bash ddt -start -np 4 ./hello_debug_impi - - +``` Documentation ------------- +Users can find original User Guide after loading the DDT module: -Users can find original User Guide after loading the DDT module: - +```bash $DDTPATH/doc/userguide.pdf +``` - - - - - [1] Discipline, Magic, Inspiration and Science: Best Practice -Debugging with Allinea DDT, Workshop conducted at LLNL by Allinea on May -10, 2013, -[link](https://computing.llnl.gov/tutorials/allineaDDT/index.html) - - - +[1] Discipline, Magic, Inspiration and Science: Best Practice Debugging with Allinea DDT, Workshop conducted at LLNL by Allinea on May 10, 2013, [link](https://computing.llnl.gov/tutorials/allineaDDT/index.html) \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md index 3c2e3ee645fc31b55e0cc6132c6a31f160674826..c58e94b3a3a71ba8ab3e9025a20350c56ca2ff0e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md @@ -1,21 +1,13 @@ -Allinea Performance Reports +Allinea Performance Reports =========================== -quick application profiling - - +##quick application profiling Introduction ------------ +Allinea Performance Reports characterize the performance of HPC application runs. 
After executing your application through the tool, a synthetic HTML report is generated automatically, containing information about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. -Allinea Performance Reports characterize the performance of HPC -application runs. After executing your application through the tool, a -synthetic HTML report is generated automatically, containing information -about several metrics along with clear behavior statements and hints to -help you improve the efficiency of your runs. - -The Allinea Performance Reports is most useful in profiling MPI -programs. +The Allinea Performance Reports is most useful in profiling MPI programs. Our license is limited to 64 MPI processes. @@ -24,54 +16,47 @@ Modules Allinea Performance Reports version 6.0 is available +```bash $ module load PerformanceReports/6.0 +``` -The module sets up environment variables, required for using the Allinea -Performance Reports. This particular command loads the default module, -which is performance reports version 4.2. +The module sets up environment variables, required for using the Allinea Performance Reports. This particular command loads the default module, which is performance reports version 4.2. Usage ----- +>Use the the perf-report wrapper on your (MPI) program. -Use the the perf-report wrapper on your (MPI) program. - -Instead of [running your MPI program the usual -way](../mpi-1.html), use the the perf report wrapper: +Instead of [running your MPI program the usual way](../mpi-1.md), use the the perf report wrapper: +```bash $ perf-report mpirun ./mympiprog.x +``` -The mpi program will run as usual. The perf-report creates two -additional files, in *.txt and *.html format, containing the -performance report. Note that [demanding MPI codes should be run within -the queue -system](../../resource-allocation-and-job-execution/job-submission-and-execution.html). +The mpi program will run as usual. The perf-report creates two additional files, in *.txt and *.html format, containing the performance report. Note that [demanding MPI codes should be run within the queue system](../../resource-allocation-and-job-execution/job-submission-and-execution.md). Example ------- - -In this example, we will be profiling the mympiprog.x MPI program, using -Allinea performance reports. Assume that the code is compiled with intel -compilers and linked against intel MPI library: +In this example, we will be profiling the mympiprog.x MPI program, using Allinea performance reports. Assume that the code is compiled with intel compilers and linked against intel MPI library: First, we allocate some nodes via the express queue: +```bash $ qsub -q qexp -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 -I qsub: waiting for job 262197.dm2 to start qsub: job 262197.dm2 ready +``` Then we load the modules and run the program the usual way: +```bash $ module load intel impi allinea-perf-report/4.2 $ mpirun ./mympiprog.x +``` Now lets profile the code: +```bash $ perf-report mpirun ./mympiprog.x +``` -Performance report files -[mympiprog_32p*.txt](mympiprog_32p_2014-10-15_16-56.txt) -and -[mympiprog_32p*.html](mympiprog_32p_2014-10-15_16-56.html) -were created. We can see that the code is very efficient on MPI and is -CPU bounded. - +Performance report files [mympiprog_32p*.txt](mympiprog_32p_2014-10-15_16-56.txt) and [mympiprog_32p*.html](mympiprog_32p_2014-10-15_16-56.html) were created. We can see that the code is very efficient on MPI and is CPU bounded. 
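+
+For non-interactive runs the same profiling can be wrapped in a job script. The sketch below is illustrative only - the module version, queue and resource values mirror the examples above and should be adjusted to your project and job size:
+
+```bash
+ #!/bin/bash
+ #PBS -q qprod
+ #PBS -l select=2:ncpus=16:mpiprocs=16
+ #PBS -A PROJECT_ID
+
+ # run from the directory the job was submitted from
+ cd $PBS_O_WORKDIR
+
+ module load intel impi PerformanceReports/6.0
+
+ # wrap the usual mpirun invocation; the *.txt and *.html performance
+ # reports are created automatically when the run finishes
+ perf-report mpirun ./mympiprog.x
+```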
\ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md index 008d86c04f18021fcc536afb7da083be182cd959..96d237f7291022e29d8390f7b1114508a7c32313 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/cube.md @@ -1,65 +1,37 @@ -CUBE +CUBE ==== Introduction ------------ +CUBE is a graphical performance report explorer for displaying data from Score-P and Scalasca (and other compatible tools). The name comes from the fact that it displays performance data in a three-dimensions : -CUBE is a graphical performance report explorer for displaying data from -Score-P and Scalasca (and other compatible tools). The name comes from -the fact that it displays performance data in a three-dimensions : - -- **performance metric**, where a number of metrics are available, - such as communication time or cache misses, +- **performance metric**, where a number of metrics are available, such as communication time or cache misses, - **call path**, which contains the call tree of your program -- s**ystem resource**, which contains system's nodes, processes and - threads, depending on the parallel programming model. +- s**ystem resource**, which contains system's nodes, processes and threads, depending on the parallel programming model. -Each dimension is organized in a tree, for example the time performance -metric is divided into Execution time and Overhead time, call path -dimension is organized by files and routines in your source code etc. +Each dimension is organized in a tree, for example the time performance metric is divided into Execution time and Overhead time, call path dimension is organized by files and routines in your source code etc.  *Figure 1. Screenshot of CUBE displaying data from Scalasca.* -* -*Each node in the tree is colored by severity (the color scheme is -displayed at the bottom of the window, ranging from the least severe -blue to the most severe being red). For example in Figure 1, we can see -that most of the point-to-point MPI communication happens in routine -exch_qbc, colored red. +Each node in the tree is colored by severity (the color scheme is displayed at the bottom of the window, ranging from the least severe blue to the most severe being red). For example in Figure 1, we can see that most of the point-to-point MPI communication happens in routine exch_qbc, colored red. Installed versions ------------------ +Currently, there are two versions of CUBE 4.2.3 available as [modules](../../environment-and-modules.html) : -Currently, there are two versions of CUBE 4.2.3 available as -[modules](../../environment-and-modules.html) : - -- class="s1"> cube/4.2.3-gcc, - compiled with GCC - -- class="s1"> cube/4.2.3-icc, - compiled with Intel compiler +- cube/4.2.3-gcc, compiled with GCC +- cube/4.2.3-icc, compiled with Intel compiler Usage ----- +CUBE is a graphical application. Refer to [Graphical User Interface documentation](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for a list of methods to launch graphical applications on Anselm. -CUBE is a graphical application. Refer to [Graphical User Interface -documentation](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -for a list of methods to launch graphical applications on Anselm. 
- -Analyzing large data sets can consume large amount of CPU and RAM. Do -not perform large analysis on login nodes. - -After loading the apropriate module, simply launch -cube command, or alternatively you can use - scalasca -examine command to launch the -GUI. Note that for Scalasca datasets, if you do not analyze the data -with > scalasca --examine before to opening them with CUBE, not all -performance data will be available. +>Analyzing large data sets can consume large amount of CPU and RAM. Do not perform large analysis on login nodes. - >References +After loading the apropriate module, simply launch cube command, or alternatively you can use scalasca -examine command to launch the GUI. Note that for Scalasca datasets, if you do not analyze the data with scalasca -examine before to opening them with CUBE, not all performance data will be available. +References 1. <http://www.scalasca.org/software/cube-4.x/download.html> diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/debuggers.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/debuggers.md new file mode 100644 index 0000000000000000000000000000000000000000..04a6a0b5ebed669d5b92333e1cf8149762173bca --- /dev/null +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/debuggers.md @@ -0,0 +1,61 @@ +Debuggers and profilers summary +=============================== + +Introduction +------------ +We provide state of the art programms and tools to develop, profile and debug HPC codes at IT4Innovations. On these pages, we provide an overview of the profiling and debugging tools available on Anslem at IT4I. + +Intel debugger +-------------- +The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. Use [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for running the GUI. + +```bash + $ module load intel + $ idb +``` + +Read more at the [Intel Debugger](intel-suite/intel-debugger.html) page. + +Allinea Forge (DDT/MAP) +----------------------- +Allinea DDT, is a commercial debugger primarily for debugging parallel MPI or OpenMP programs. It also has a support for GPU (CUDA) and Intel Xeon Phi accelerators. DDT provides all the standard debugging features (stack trace, breakpoints, watches, view variables, threads etc.) for every thread running as part of your program, or for every process even if these processes are distributed across a cluster using an MPI implementation. + +```bash + $ module load Forge + $ forge +``` + +Read more at the [Allinea DDT](debuggers/allinea-ddt.html) page. + +Allinea Performance Reports +--------------------------- +Allinea Performance Reports characterize the performance of HPC application runs. After executing your application through the tool, a synthetic HTML report is generated automatically, containing information about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes. + +```bash + $ module load PerformanceReports/6.0 + $ perf-report mpirun -n 64 ./my_application argument01 argument02 +``` + +Read more at the [Allinea Performance Reports](debuggers/allinea-performance-reports.html) page. + +RougeWave Totalview +------------------- +TotalView is a source- and machine-level debugger for multi-process, multi-threaded programs. 
Its wide range of tools provides ways to analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity. + +```bash + $ module load totalview + $ totalview +``` + +Read more at the [Totalview](debuggers/total-view.html) page. + +Vampir trace analyzer +--------------------- +Vampir is a GUI trace analyzer for traces in OTF format. + +```bash + $ module load Vampir/8.5.0 + $ vampir +``` + +Read more at the [Vampir](../../salomon/software/debuggers/vampir.html) page. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md index eeb8206b2c86d2cc09b8ae5f1e84409b700a597b..b552be9ca6fc356cc7359f51e007e48c2f44c54d 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md @@ -1,42 +1,35 @@ -Intel Performance Counter Monitor +Intel Performance Counter Monitor ================================= Introduction ------------ - -Intel PCM (Performance Counter Monitor) is a tool to monitor performance -hardware counters on Intel>® processors, similar to -[PAPI](papi.html). The difference between PCM and PAPI -is that PCM supports only Intel hardware, but PCM can monitor also -uncore metrics, like memory controllers and >QuickPath Interconnect -links. +Intel PCM (Performance Counter Monitor) is a tool to monitor performance hardware counters on Intel>® processors, similar to [PAPI](papi.html). The difference between PCM and PAPI is that PCM supports only Intel hardware, but PCM can monitor also uncore metrics, like memory controllers and >QuickPath Interconnect links. Installed version ------------------------------ +Currently installed version 2.6. To load the [module](../../environment-and-modules.html), issue: -Currently installed version 2.6. To load the -[module](../../environment-and-modules.html), issue : - +```bash $ module load intelpcm +``` Command line tools ------------------ - -PCM provides a set of tools to monitor system/or application. +PCM provides a set of tools to monitor system/or application. ### pcm-memory -Measures memory bandwidth of your application or the whole system. -Usage: + Measures memory bandwidth of your application or the whole system. Usage: +```bash $ pcm-memory.x <delay>|[external_program parameters] +``` -Specify either a delay of updates in seconds or an external program to -monitor. If you get an error about PMU in use, respond "y" and relaunch -the program. +Specify either a delay of updates in seconds or an external program to monitor. If you get an error about PMU in use, respond "y" and relaunch the program. Sample output: +```bash ---------------------------------------||--------------------------------------- -- Socket 0 --||-- Socket 1 -- ---------------------------------------||--------------------------------------- @@ -61,11 +54,11 @@ Sample output: -- System Write Throughput(MB/s): 3.43 -- -- System Memory Throughput(MB/s): 8.35 -- ---------------------------------------||--------------------------------------- +``` ### pcm-msr -Command pcm-msr.x can be used to -read/write model specific registers of the CPU. +Command pcm-msr.x can be used to read/write model specific registers of the CPU. 
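+
+A minimal illustrative call might look as follows. The register address is an example only and the exact option set depends on the installed PCM version, so consult the usage output printed by the tool itself:
+
+```bash
+ # example only: read one model specific register by its (hexadecimal) address;
+ # 0x611 is the package energy status MSR on the Intel Xeon E5 processors
+ $ pcm-msr.x 0x611
+```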
### pcm-numa @@ -73,23 +66,19 @@ NUMA monitoring utility does not work on Anselm. ### pcm-pcie -Can be used to monitor PCI Express bandwith. Usage: -pcm-pcie.x <delay> +Can be used to monitor PCI Express bandwith. Usage: pcm-pcie.x <delay> ### pcm-power -Displays energy usage and thermal headroom for CPU and DRAM sockets. -Usage: > pcm-power.x <delay> | -<external program> +Displays energy usage and thermal headroom for CPU and DRAM sockets. Usage: pcm-power.x <delay> | <external program> ### pcm -This command provides an overview of performance counters and memory -usage. >Usage: > pcm.x -<delay> | <external program> +This command provides an overview of performance counters and memory usage. Usage: pcm.x <delay> | <external program> Sample output : +```bash $ pcm.x ./matrix Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100 ID=db05e43) @@ -122,8 +111,8 @@ Sample output : IPC : instructions per CPU cycle FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost) AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost) - L3MISS: L3 cache misses - L2MISS: L2 cache misses (including other core's L2 cache *hits*) + L3MISS: L3 cache misses + L2MISS: L2 cache misses (including other core's L2 cache *hits*) L3HIT : L3 cache hit ratio (0.00-1.00) L2HIT : L2 cache hit ratio (0.00-1.00) L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency @@ -166,21 +155,21 @@ Sample output : Intel(r) QPI data traffic estimation in bytes (data traffic coming to CPU/socket through QPI links): - QPI0 QPI1 | QPI0 QPI1 + QPI0 QPI1 | QPI0 QPI1 ---------------------------------------------------------------------------------------------- - SKT 0 0 0 | 0% 0% - SKT 1 0 0 | 0% 0% + SKT 0 0 0 | 0% 0% + SKT 1 0 0 | 0% 0% ---------------------------------------------------------------------------------------------- Total QPI incoming data traffic: 0 QPI data traffic/Memory controller traffic: 0.00 Intel(r) QPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through QPI links): - QPI0 QPI1 | QPI0 QPI1 + QPI0 QPI1 | QPI0 QPI1 ---------------------------------------------------------------------------------------------- - SKT 0 0 0 | 0% 0% - SKT 1 0 0 | 0% 0% + SKT 0 0 0 | 0% 0% + SKT 1 0 0 | 0% 0% ---------------------------------------------------------------------------------------------- - Total QPI outgoing data and non-data traffic: 0 + Total QPI outgoing data and non-data traffic: 0 ---------------------------------------------------------------------------------------------- SKT 0 package consumed 4.06 Joules @@ -194,28 +183,21 @@ Sample output : ---------------------------------------------------------------------------------------------- TOTAL: 8.47 Joules Cleaning up - - +``` ### pcm-sensor -Can be used as a sensor for ksysguard GUI, which is currently not -installed on Anselm. +Can be used as a sensor for ksysguard GUI, which is currently not installed on Anselm. API --- +In a similar fashion to PAPI, PCM provides a C++ API to access the performance counter from within your application. Refer to the [doxygen documentation](http://intel-pcm-api-documentation.github.io/classPCM.html) for details of the API. 
-In a similar fashion to PAPI, PCM provides a C++ API to access the -performance counter from within your application. Refer to the [doxygen -documentation](http://intel-pcm-api-documentation.github.io/classPCM.html) -for details of the API. - -Due to security limitations, using PCM API to monitor your applications -is currently not possible on Anselm. (The application must be run as -root user) +>Due to security limitations, using PCM API to monitor your applications is currently not possible on Anselm. (The application must be run as root user) Sample program using the API : +```cpp #include <stdlib.h> #include <stdio.h> #include "cpucounters.h" @@ -260,13 +242,17 @@ Sample program using the API : return 0; } +``` Compile it with : +```bash $ icc matrix.cpp -o matrix -lpthread -lpcm +``` -Sample output : +Sample output: +```bash $ ./matrix Number of physical cores: 16 Number of logical cores: 16 @@ -286,14 +272,11 @@ Sample output : Instructions per clock:1.7 L3 cache hit ratio:1.0 Bytes read:12513408 +``` References ---------- - 1. <https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization> -2. <https://software.intel.com/sites/default/files/m/3/2/2/xeon-e5-2600-uncore-guide.pdf> Intel® - Xeon® Processor E5-2600 Product Family Uncore Performance - Monitoring Guide. -3. <http://intel-pcm-api-documentation.github.io/classPCM.html> API - Documentation +2. <https://software.intel.com/sites/default/files/m/3/2/2/xeon-e5-2600-uncore-guide.pdf> Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Guide. +3. <http://intel-pcm-api-documentation.github.io/classPCM.html> API Documentation diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md index b22b73f2b336d267a0279892d251d20b30381838..67ddd11fe19e4af5b370bbb0c90662014c039f02 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md @@ -1,15 +1,9 @@ -Intel VTune Amplifier +Intel VTune Amplifier ===================== - - Introduction ------------ - -Intel*® *VTune™ >Amplifier, part of Intel Parallel studio, is a GUI -profiling tool designed for Intel processors. It offers a graphical -performance analysis of single core and multithreaded applications. A -highlight of the features: +Intel*® *VTune™ >Amplifier, part of Intel Parallel studio, is a GUI profiling tool designed for Intel processors. It offers a graphical performance analysis of single core and multithreaded applications. A highlight of the features: - Hotspot analysis - Locks and waits analysis @@ -21,89 +15,57 @@ highlight of the features: Usage ----- - To launch the GUI, first load the module: +```bash $ module add VTune/2016_update1 +``` - class="s1">and launch the GUI : +and launch the GUI : +```bash $ amplxe-gui +``` + +>To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on [using GUI applications](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). + +The GUI will open in new window. Click on "*New Project...*" to create a new project. 
After clicking *OK*, a new window with project properties will appear.  At "*Application:*", select the bath to your binary you want to profile (the binary should be compiled with -g flag). Some additional options such as command line arguments can be selected. At "*Managed code profiling mode:*" select "*Native*" (unless you want to profile managed mode .NET/Mono applications). After clicking *OK*, your project is created. -To profile an application with VTune Amplifier, special kernel -modules need to be loaded. The modules are not loaded on Anselm login -nodes, thus direct profiling on login nodes is not possible. Use VTune -on compute nodes and refer to the documentation on [using GUI -applications](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). - -The GUI will open in new window. Click on "*New Project...*" to -create a new project. After clicking *OK*, a new window with project -properties will appear.  At "*Application:*", select the bath to your -binary you want to profile (the binary should be compiled with -g flag). -Some additional options such as command line arguments can be selected. -At "*Managed code profiling mode:*" select "*Native*" (unless you want -to profile managed mode .NET/Mono applications). After clicking *OK*, -your project is created. - -To run a new analysis, click "*New analysis...*". You will see a list of -possible analysis. Some of them will not be possible on the current CPU -(eg. Intel Atom analysis is not possible on Sandy Bridge CPU), the GUI -will show an error box if you select the wrong analysis. For example, -select "*Advanced Hotspots*". Clicking on *Start *will start profiling -of the application. +To run a new analysis, click "*New analysis...*". You will see a list of possible analysis. Some of them will not be possible on the current CPU (eg. Intel Atom analysis is not possible on Sandy Bridge CPU), the GUI will show an error box if you select the wrong analysis. For example, select "*Advanced Hotspots*". Clicking on *Start *will start profiling of the application. Remote Analysis --------------- - -VTune Amplifier also allows a form of remote analysis. In this mode, -data for analysis is collected from the command line without GUI, and -the results are then loaded to GUI on another machine. This allows -profiling without interactive graphical jobs. To perform a remote -analysis, launch a GUI somewhere, open the new analysis window and then -click the button "*Command line*" in bottom right corner. It will show -the command line needed to perform the selected analysis. +VTune Amplifier also allows a form of remote analysis. In this mode, data for analysis is collected from the command line without GUI, and the results are then loaded to GUI on another machine. This allows profiling without interactive graphical jobs. To perform a remote analysis, launch a GUI somewhere, open the new analysis window and then click the button "*Command line*" in bottom right corner. It will show the command line needed to perform the selected analysis. The command line will look like this: +```bash /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-and-callcount -mrte-mode=native -target-duration-type=veryshort -app-working-dir /home/sta545/test -- /home/sta545/test_pgsesv +``` -Copy the line to clipboard and then you can paste it in your jobscript -or in command line. 
After the collection is run, open the GUI once -again, click the menu button in the upper right corner, and select -"*Open > Result...*". The GUI will load the results from the run. +Copy the line to clipboard and then you can paste it in your jobscript or in command line. After the collection is run, open the GUI once again, click the menu button in the upper right corner, and select "*Open > Result...*". The GUI will load the results from the run. Xeon Phi -------- +>This section is outdated. It will be updated with new information soon. -This section is outdated. It will be updated with new information soon. - -It is possible to analyze both native and offload Xeon Phi applications. -For offload mode, just specify the path to the binary. For native mode, -you need to specify in project properties: +It is possible to analyze both native and offload Xeon Phi applications. For offload mode, just specify the path to the binary. For native mode, you need to specify in project properties: Application: ssh -Application parameters: mic0 source ~/.profile -&& /path/to/your/bin +Application parameters: mic0 source ~/.profile && /path/to/your/bin -Note that we include source ~/.profile -in the command to setup environment paths [as described -here](../intel-xeon-phi.html). +Note that we include source ~/.profile in the command to setup environment paths [as described here](../intel-xeon-phi.html). -If the analysis is interrupted or aborted, further analysis on the card -might be impossible and you will get errors like "ERROR connecting to -MIC card". In this case please contact our support to reboot the MIC -card. +>If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card. -You may also use remote analysis to collect data from the MIC and then -analyze it in the GUI later : +You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later : +```bash $ amplxe-cl -collect knc-hotspots -no-auto-finalize -- ssh mic0 "export LD_LIBRARY_PATH=/apps/intel/composer_xe_2015.2.164/compiler/lib/mic/:/apps/intel/composer_xe_2015.2.164/mkl/lib/mic/; export KMP_AFFINITY=compact; /tmp/app.mic" +``` References ---------- - -1. ><https://www.rcac.purdue.edu/tutorials/phi/PerformanceTuningXeonPhi-Tullos.pdf> Performance - Tuning for Intel® Xeon Phi™ Coprocessors - +1. <https://www.rcac.purdue.edu/tutorials/phi/PerformanceTuningXeonPhi-Tullos.pdf> Performance Tuning for Intel® Xeon Phi™ Coprocessors \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md index df34e4232f4b6059c4522aa033e9e8bf0fc5c08f..e7465fbc5b99504cbc36cd7fcebb7cb34b5d599c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/papi.md @@ -1,49 +1,34 @@ -PAPI +PAPI ==== - - Introduction ------------ - dir="auto">Performance Application Programming -Interface >(PAPI)  is a portable interface to access -hardware performance counters (such as instruction counts and cache -misses) found in most modern architectures. With the new component -framework, PAPI is not limited only to CPU counters, but offers also -components for CUDA, network, Infiniband etc. 
+Performance Application Programming Interface (PAPI)  is a portable interface to access hardware performance counters (such as instruction counts and cache misses) found in most modern architectures. With the new component framework, PAPI is not limited only to CPU counters, but offers also components for CUDA, network, Infiniband etc. -PAPI provides two levels of interface - a simpler, high level -interface and more detailed low level interface. +PAPI provides two levels of interface - a simpler, high level interface and more detailed low level interface. PAPI can be used with parallel as well as serial programs. Usage ----- +To use PAPI, load [module](../../environment-and-modules.html) papi: -To use PAPI, load -[module](../../environment-and-modules.html) -papi : - +```bash $ module load papi +``` -This will load the default version. Execute -module avail papi for a list of installed -versions. +This will load the default version. Execute module avail papi for a list of installed versions. Utilites -------- - -The bin directory of PAPI (which is -automatically added to $PATH upon -loading the module) contains various utilites. +The bin directory of PAPI (which is automatically added to $PATH upon loading the module) contains various utilites. ### papi_avail -Prints which preset events are available on the current CPU. The third -column indicated whether the preset event is available on the current -CPU. +Prints which preset events are available on the current CPU. The third column indicated whether the preset event is available on the current CPU. +```bash $ papi_avail Available events and hardware information. -------------------------------------------------------------------------------- @@ -74,12 +59,12 @@ CPU. PAPI_L1_TCM 0x80000006 Yes Yes Level 1 cache misses PAPI_L2_TCM 0x80000007 Yes No Level 2 cache misses PAPI_L3_TCM 0x80000008 Yes No Level 3 cache misses - .... + .... +``` ### papi_native_avail -Prints which native events are available on the current -CPU. +Prints which native events are available on the current CPU. ### class="s1">papi_cost @@ -87,60 +72,46 @@ Measures the cost (in cycles) of basic PAPI operations. ###papi_mem_info -Prints information about the memory architecture of the current -CPU. +Prints information about the memory architecture of the current CPU. PAPI API -------- +PAPI provides two kinds of events: -PAPI provides two kinds of events: - -- **Preset events** is a set of predefined common CPU events, - >standardized across platforms. -- **Native events **is a set of all events supported by the - current hardware. This is a larger set of features than preset. For - other components than CPU, only native events are usually available. +- **Preset events** is a set of predefined common CPU events, standardized across platforms. +- **Native events **is a set of all events supported by the current hardware. This is a larger set of features than preset. For other components than CPU, only native events are usually available. -To use PAPI in your application, you need to link the appropriate -include file. +To use PAPI in your application, you need to link the appropriate include file. - papi.h for C - f77papi.h for Fortran 77 - f90papi.h for Fortran 90 - fpapi.h for Fortran with preprocessor -The include path is automatically added by papi module to -$INCLUDE. +The include path is automatically added by papi module to $INCLUDE. 
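+
+As an illustration of the preset events and the papi.h include discussed above, a minimal C sketch using the high level counter calls might look as follows (error handling is trimmed; compile and link with -lpapi as in the larger example below):
+
+```cpp
+ #include <stdio.h>
+ #include "papi.h"
+
+ int main() {
+     /* two preset events: total instructions and level 1 cache misses */
+     int events[2] = {PAPI_TOT_INS, PAPI_L1_TCM};
+     long long values[2];
+     volatile double sum = 0.0;
+     int i;
+
+     /* start counting the two events */
+     if (PAPI_start_counters(events, 2) != PAPI_OK) printf("Error!");
+
+     /* the work to be measured */
+     for (i = 0; i < 1000000; i++) sum += i * 0.5;
+
+     /* stop counting and read the counter values */
+     if (PAPI_stop_counters(values, 2) != PAPI_OK) printf("Error!");
+
+     printf("Total instructions: %lld\n", values[0]);
+     printf("L1 cache misses:    %lld\n", values[1]);
+     return 0;
+ }
+```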
### High level API -Please refer -to <http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:High_Level> for a -description of the High level API. +Please refer to <http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:High_Level> for a description of the High level API. ### Low level API -Please refer -to <http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:Low_Level> for a -description of the Low level API. +Please refer to <http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:Low_Level> for a description of the Low level API. ### Timers -PAPI provides the most accurate timers the platform can support. -See <http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:Timers> +PAPI provides the most accurate timers the platform can support. See <http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:Timers> ### System information -PAPI can be used to query some system infromation, such as CPU name and -MHz. -See <http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:System_Information> +PAPI can be used to query some system infromation, such as CPU name and MHz. See <http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:System_Information> Example ------- -The following example prints MFLOPS rate of a naive matrix-matrix -multiplication : +The following example prints MFLOPS rate of a naive matrix-matrix multiplication: +```bash #include <stdlib.h> #include <stdio.h> #include "papi.h" @@ -158,7 +129,7 @@ multiplication : mresult[0][i] = 0.0; matrixa[0][i] = matrixb[0][i] = rand()*(float)1.1; } -  + /* Setup PAPI library and begin collecting data from the counters */ if((retval=PAPI_flops( &real_time, &proc_time, &flpins, &mflops))<PAPI_OK) printf("Error!"); @@ -172,96 +143,97 @@ multiplication : /* Collect the data into the variables passed in */ if((retval=PAPI_flops( &real_time, &proc_time, &flpins, &mflops))<PAPI_OK) printf("Error!"); - + printf("Real_time:t%fnProc_time:t%fnTotal flpins:t%lldnMFLOPS:tt%fn", real_time, proc_time, flpins, mflops); PAPI_shutdown(); return 0; } +``` - Now compile and run the example : +Now compile and run the example : +```bash $ gcc matrix.c -o matrix -lpapi $ ./matrix Real_time: 8.852785 Proc_time: 8.850000 Total flpins: 6012390908 - MFLOPS: 679.366211 + MFLOPS: 679.366211 +``` Let's try with optimizations enabled : +```bash $ gcc -O3 matrix.c -o matrix -lpapi $ ./matrix Real_time: 0.000020 Proc_time: 0.000000 Total flpins: 6 MFLOPS: inf +``` -Now we see a seemingly strange result - the multiplication took no time -and only 6 floating point instructions were issued. This is because the -compiler optimizations have completely removed the multiplication loop, -as the result is actually not used anywhere in the program. We can fix -this by adding some "dummy" code at the end of the Matrix-Matrix -multiplication routine : +Now we see a seemingly strange result - the multiplication took no time and only 6 floating point instructions were issued. This is because the compiler optimizations have completely removed the multiplication loop, as the result is actually not used anywhere in the program. We can fix this by adding some "dummy" code at the end of the Matrix-Matrix multiplication routine : +```cpp for (i=0; i<SIZE;i++) for (j=0; j<SIZE; j++) if (mresult[i][j] == -1.0) printf("x"); +``` -Now the compiler won't remove the multiplication loop. (However it is -still not that smart to see that the result won't ever be negative). Now -run the code again: +Now the compiler won't remove the multiplication loop. (However it is still not that smart to see that the result won't ever be negative). 
Now run the code again: +```bash $ gcc -O3 matrix.c -o matrix -lpapi $ ./matrix Real_time: 8.795956 Proc_time: 8.790000 Total flpins: 18700983160 - MFLOPS: 2127.529297 + MFLOPS: 2127.529297 +``` ### Intel Xeon Phi -PAPI currently supports only a subset of counters on the Intel Xeon Phi -processor compared to Intel Xeon, for example the floating point -operations counter is missing. +>PAPI currently supports only a subset of counters on the Intel Xeon Phi processor compared to Intel Xeon, for example the floating point operations counter is missing. -To use PAPI in [Intel Xeon -Phi](../intel-xeon-phi.html) native applications, you -need to load module with " -mic" suffix, -for example " papi/5.3.2-mic" : +To use PAPI in [Intel Xeon Phi](../intel-xeon-phi.html) native applications, you need to load module with " -mic" suffix, for example " papi/5.3.2-mic" : +```bash $ module load papi/5.3.2-mic +``` Then, compile your application in the following way: +```bash $ module load intel $ icc -mmic -Wl,-rpath,/apps/intel/composer_xe_2013.5.192/compiler/lib/mic matrix-mic.c -o matrix-mic -lpapi -lpfm +``` -To execute the application on MIC, you need to manually set -LD_LIBRARY_PATH : +To execute the application on MIC, you need to manually set LD_LIBRARY_PATH: +```bash $ qsub -q qmic -A NONE-0-0 -I $ ssh mic0 - $ export LD_LIBRARY_PATH=/apps/tools/papi/5.4.0-mic/lib/ - $ ./matrix-mic + $ export LD_LIBRARY_PATH=/apps/tools/papi/5.4.0-mic/lib/ + $ ./matrix-mic +``` -Alternatively, you can link PAPI statically ( --static flag), then -LD_LIBRARY_PATH does not need to be set. +Alternatively, you can link PAPI statically (-static flag), then LD_LIBRARY_PATH does not need to be set. You can also execute the PAPI tools on MIC : +```bash $ /apps/tools/papi/5.4.0-mic/bin/papi_native_avail +``` -To use PAPI in offload mode, you need to provide both host and MIC -versions of PAPI: +To use PAPI in offload mode, you need to provide both host and MIC versions of PAPI: +```bash $ module load papi/5.4.0 $ icc matrix-offload.c -o matrix-offload -offload-option,mic,compiler,"-L$PAPI_HOME-mic/lib -lpapi" -lpapi +``` References ---------- - 1. <http://icl.cs.utk.edu/papi/> Main project page 2. <http://icl.cs.utk.edu/projects/papi/wiki/Main_Page> Wiki -3. <http://icl.cs.utk.edu/papi/docs/> API Documentation - +3. <http://icl.cs.utk.edu/papi/docs/> API Documentation \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md index df942821880a9b653310e3d01b1489cf13665e76..a91e07b4d8158f16f0986834a164a072ee2e5403 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/scalasca.md @@ -1,107 +1,70 @@ -Scalasca +Scalasca ======== Introduction ------------------------- +[Scalasca](http://www.scalasca.org/) is a software tool that supports the performance optimization of parallel programs by measuring and analyzing their runtime behavior. The analysis identifies potential performance bottlenecks – in particular those concerning communication and synchronization – and offers guidance in exploring their causes. -[Scalasca](http://www.scalasca.org/) is a software tool -that supports the performance optimization of parallel programs by -measuring and analyzing their runtime behavior. 
The analysis identifies -potential performance bottlenecks – in particular those concerning -communication and synchronization – and offers guidance in exploring -their causes. - -Scalasca supports profiling of MPI, OpenMP and hybrid MPI+OpenMP -applications. +Scalasca supports profiling of MPI, OpenMP and hybrid MPI+OpenMP applications. Installed versions ------------------ +There are currently two versions of Scalasca 2.0 [modules](../../environment-and-modules.html) installed on Anselm: -There are currently two versions of Scalasca 2.0 -[modules](../../environment-and-modules.html) installed -on Anselm: - -- class="s1"> - scalasca2/2.0-gcc-openmpi, for usage with - [GNU Compiler](../compilers.html) and - [OpenMPI](../mpi-1/Running_OpenMPI.html), - -- class="s1"> - scalasca2/2.0-icc-impi, for usage with - [Intel Compiler](../compilers.html) and [Intel - MPI](../mpi-1/running-mpich2.html). +- scalasca2/2.0-gcc-openmpi, for usage with [GNU Compiler](../compilers.html) and [OpenMPI](../mpi-1/Running_OpenMPI.html), +- scalasca2/2.0-icc-impi, for usage with [Intel Compiler](../compilers.html) and [Intel MPI](../mpi-1/running-mpich2.html). Usage ----- - Profiling a parallel application with Scalasca consists of three steps: -1. Instrumentation, compiling the application such way, that the - profiling data can be generated. -2. Runtime measurement, running the application with the Scalasca - profiler to collect performance data. +1. Instrumentation, compiling the application such way, that the profiling data can be generated. +2. Runtime measurement, running the application with the Scalasca profiler to collect performance data. 3. Analysis of reports ### Instrumentation -Instrumentation via " scalasca --instrument" is discouraged. Use [Score-P -instrumentation](score-p.html). +Instrumentation via " scalasca -instrument" is discouraged. Use [Score-P instrumentation](score-p.html). ### Runtime measurement -After the application is instrumented, runtime measurement can be -performed with the " scalasca -analyze" -command. The syntax is : +After the application is instrumented, runtime measurement can be performed with the " scalasca -analyze" command. The syntax is: - scalasca -analyze [scalasca options] -[launcher] [launcher options] [program] [program options] +scalasca -analyze [scalasca options] [launcher] [launcher options] [program] [program options] An example : +```bash $ scalasca -analyze mpirun -np 4 ./mympiprogram +``` Some notable Scalsca options are: --t Enable trace data collection. By default, only summary data are -collected. --e <directory> Specify a directory to save the collected data to. -By default, Scalasca saves the data to a directory with -prefix >scorep_, followed by name of the executable and launch -configuration. +**-t Enable trace data collection. By default, only summary data are collected.** +**-e <directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep_, followed by name of the executable and launch configuration.** -Scalasca can generate a huge amount of data, especially if tracing is -enabled. Please consider saving the data to a [scratch -directory](../../storage.html). +>Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage.html). ### Analysis of reports -For the analysis, you must have [Score-P](score-p.html) -and [CUBE](cube.html) modules loaded. 
The analysis is -done in two steps, first, the data is preprocessed and then CUBE GUI -tool is launched. +For the analysis, you must have [Score-P](score-p.html) and [CUBE](cube.html) modules loaded. The analysis is done in two steps, first, the data is preprocessed and then CUBE GUI tool is launched. To launch the analysis, run : -` +```bash scalasca -examine [options] <experiment_directory> -` +``` If you do not wish to launch the GUI tool, use the "-s" option : -` +```bash scalasca -examine -s <experiment_directory> -` +``` -Alternatively you can open CUBE and load the data directly from here. -Keep in mind that in that case the preprocessing is not done and not all -metrics will be shown in the viewer. +Alternatively you can open CUBE and load the data directly from here. Keep in mind that in that case the preprocessing is not done and not all metrics will be shown in the viewer. -Refer to [CUBE documentation](cube.html) on usage of the -GUI viewer. +Refer to [CUBE documentation](cube.html) on usage of the GUI viewer. References ---------- - -1. <http://www.scalasca.org/> - +1. <http://www.scalasca.org/> \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/score-p.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/score-p.md index 0462b82ff2e97f3e2c0078740dc241bae80272e2..4453d0d121641d6fba326f556d6b88737ca6800c 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/score-p.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/score-p.md @@ -1,86 +1,56 @@ -Score-P +Score-P ======= Introduction ------------ +The [Score-P measurement infrastructure](http://www.vi-hps.org/projects/score-p/) is a highly scalable and easy-to-use tool suite for profiling, event tracing, and online analysis of HPC applications. -The [Score-P measurement -infrastructure](http://www.vi-hps.org/projects/score-p/) -is a highly scalable and easy-to-use tool suite for profiling, event -tracing, and online analysis of HPC applications. - -Score-P can be used as an instrumentation tool for -[Scalasca](scalasca.html). +Score-P can be used as an instrumentation tool for [Scalasca](scalasca.html). Installed versions ------------------ +There are currently two versions of Score-P version 1.2.6 [modules](../../environment-and-modules.html) installed on Anselm : -There are currently two versions of Score-P version 1.2.6 -[modules](../../environment-and-modules.html) installed -on Anselm : - -- class="s1">scorep/1.2.3-gcc-openmpi>, for usage - with [GNU - Compiler](../compilers.html)> and [OpenMPI](../mpi-1/Running_OpenMPI.html){.internal>, - -- class="s1">scorep/1.2.3-icc-impi>, for usage - with [Intel - Compiler](../compilers.html)> and [Intel - MPI](../mpi-1/running-mpich2.html)>. +- scorep/1.2.3-gcc-openmpi, for usage with [GNU Compiler](../compilers.html) and [OpenMPI](../mpi-1/Running_OpenMPI.html) +- scorep/1.2.3-icc-impi, for usage with [Intel Compiler](../compilers.html)> and [Intel MPI](../mpi-1/running-mpich2.html)>. Instrumentation --------------- +There are three ways to instrument your parallel applications in order to enable performance data collection : -There are three ways to instrument your parallel applications in -order to enable performance data collection : - -1. >Automated instrumentation using compiler -2. >Manual instrumentation using API calls -3. >Manual instrumentation using directives +1. Automated instrumentation using compiler +2. Manual instrumentation using API calls +3. 
Manual instrumentation using directives ### Automated instrumentation -is the easiest method. Score-P will automatically add instrumentation to -every routine entry and exit using compiler hooks, and will intercept -MPI calls and OpenMP regions. This method might, however, produce a -large number of data. If you want to focus on profiler a specific -regions of your code, consider using the manual instrumentation methods. -To use automated instrumentation, simply prepend -scorep to your compilation command. For -example, replace : +is the easiest method. Score-P will automatically add instrumentation to every routine entry and exit using compiler hooks, and will intercept MPI calls and OpenMP regions. This method might, however, produce a large number of data. If you want to focus on profiler a specific regions of your code, consider using the manual instrumentation methods. To use automated instrumentation, simply prepend scorep to your compilation command. For example, replace: -` +```bash $ mpif90 -c foo.f90 $ mpif90 -c bar.f90 $ mpif90 -o myapp foo.o bar.o -` +``` -with : +with: -` +```bash $ scorep mpif90 -c foo.f90 $ scorep mpif90 -c bar.f90 $ scorep mpif90 -o myapp foo.o bar.o -` +``` -Usually your program is compiled using a Makefile or similar script, so -it advisable to add the scorep command to -your definition of variables CC, -CXX, class="monospace">FCC etc. +Usually your program is compiled using a Makefile or similar script, so it advisable to add the scorep command to your definition of variables CC, CXX, FCC etc. -It is important that scorep is prepended -also to the linking command, in order to link with Score-P -instrumentation libraries. +It is important that scorep is prepended also to the linking command, in order to link with Score-P instrumentation libraries. ###Manual instrumentation using API calls -To use this kind of instrumentation, use -scorep with switch ---user. You will then mark regions to be -instrumented by inserting API calls. +To use this kind of instrumentation, use scorep with switch --user. You will then mark regions to be instrumented by inserting API calls. An example in C/C++ : +```cpp #include <scorep/SCOREP_User.h> void foo() { @@ -90,9 +60,11 @@ An example in C/C++ : // do something SCOREP_USER_REGION_END( my_region_handle ) } +```  and Fortran : +```cpp #include "scorep/SCOREP_User.inc" subroutine foo SCOREP_USER_REGION_DEFINE( my_region_handle ) @@ -101,18 +73,17 @@ An example in C/C++ : ! do something SCOREP_USER_REGION_END( my_region_handle ) end subroutine foo +``` -Please refer to the [documentation for description of the -API](https://silc.zih.tu-dresden.de/scorep-current/pdf/scorep.pdf). +Please refer to the [documentation for description of the API](https://silc.zih.tu-dresden.de/scorep-current/pdf/scorep.pdf). ###Manual instrumentation using directives -This method uses POMP2 directives to mark regions to be instrumented. To -use this method, use command scorep ---pomp. +This method uses POMP2 directives to mark regions to be instrumented. To use this method, use command scorep --pomp. Example directives in C/C++ : +```cpp void foo(...) { /* declarations */ @@ -126,9 +97,11 @@ Example directives in C/C++ : ... #pragma pomp inst end(foo) } +``` and in Fortran : +```cpp subroutine foo(...) !declarations !POMP$ INST BEGIN(foo) @@ -140,9 +113,6 @@ and in Fortran : ... !POMP$ INST END(foo) end subroutine foo +``` -The directives are ignored if the program is compiled without Score-P. 
-Again, please refer to the -[documentation](https://silc.zih.tu-dresden.de/scorep-current/pdf/scorep.pdf) -for a more elaborate description. - +The directives are ignored if the program is compiled without Score-P. Again, please refer to the [documentation](https://silc.zih.tu-dresden.de/scorep-current/pdf/scorep.pdf) for a more elaborate description. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/summary.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/summary.md deleted file mode 100644 index 2b1f34c23030b3dc1f0e1ad95218e114964272ff..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/summary.md +++ /dev/null @@ -1,86 +0,0 @@ -Debuggers and profilers summary -=============================== - - - -Introduction ------------- - -We provide state of the art programms and tools to develop, profile and -debug HPC codes at IT4Innovations. -On these pages, we provide an overview of the profiling and debugging -tools available on Anslem at IT4I. - -Intel debugger --------------- - -The intel debugger version 13.0 is available, via module intel. The -debugger works for applications compiled with C and C++ compiler and the -ifort fortran 77/90/95 compiler. The debugger provides java GUI -environment. Use [X -display](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -for running the GUI. - - $ module load intel - $ idb - -Read more at the [Intel -Debugger](../intel-suite/intel-debugger.html) page. - -Allinea Forge (DDT/MAP) ------------------------ - -Allinea DDT, is a commercial debugger primarily for debugging parallel -MPI or OpenMP programs. It also has a support for GPU (CUDA) and Intel -Xeon Phi accelerators. DDT provides all the standard debugging features -(stack trace, breakpoints, watches, view variables, threads etc.) for -every thread running as part of your program, or for every process - -even if these processes are distributed across a cluster using an MPI -implementation. - - $ module load Forge - $ forge - -Read more at the [Allinea DDT](allinea-ddt.html) page. - -Allinea Performance Reports ---------------------------- - -Allinea Performance Reports characterize the performance of HPC -application runs. After executing your application through the tool, a -synthetic HTML report is generated automatically, containing information -about several metrics along with clear behavior statements and hints to -help you improve the efficiency of your runs. Our license is limited to -64 MPI processes. - - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 - -Read more at the [Allinea Performance -Reports](allinea-performance-reports.html) page. - -RougeWave Totalview -------------------- - -TotalView is a source- and machine-level debugger for multi-process, -multi-threaded programs. Its wide range of tools provides ways to -analyze, organize, and test programs, making it easy to isolate and -identify problems in individual threads and processes in programs of -great complexity. - - $ module load totalview - $ totalview - -Read more at the [Totalview](total-view.html) page. - -Vampir trace analyzer ---------------------- - -Vampir is a GUI trace analyzer for traces in OTF format. - - $ module load Vampir/8.5.0 - $ vampir - -Read more at -the [Vampir](../../../salomon/software/debuggers/vampir.html) page. 
- diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md index 6299f5b9f8fcb19bc5381389b708af21b4b94ff3..cc7f78ef4b80046b97294d8657b2db923ccdacaf 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/total-view.md @@ -1,22 +1,22 @@ -Total View +Total View ========== -TotalView is a GUI-based source code multi-process, multi-thread -debugger. +##TotalView is a GUI-based source code multi-process, multi-thread debugger. License and Limitations for Anselm Users ---------------------------------------- +On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel processes. These limitation means that: -On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel -processes. These limitation means that: - +```bash    1 user can debug up 64 processes, or    32 users can debug 2 processes, etc. +``` Debugging of GPU accelerated codes is also supported. You can check the status of the licenses here: +```bash cat /apps/user/licenses/totalview_features_state.txt # totalview @@ -26,73 +26,75 @@ You can check the status of the licenses here: TotalView_Team                    64     0    64 Replay                            64     0    64 CUDA                              64     0    64 +``` Compiling Code to run with TotalView ------------------------------------ - ### Modules Load all necessary modules to compile the code. For example: +```bash module load intel module load impi  ... or ... module load openmpi/X.X.X-icc +``` Load the TotalView module: +```bash module load totalview/8.12 +``` Compile the code: +```bash mpicc -g -O0 -o test_debug test.c mpif90 -g -O0 -o test_debug test.f +``` ### Compiler flags Before debugging, you need to compile your code with theses flags: --g** : Generates extra debugging information usable by GDB. -g3** -includes even more debugging information. This option is available for -GNU and INTEL C/C++ and Fortran compilers. +>**-g** : Generates extra debugging information usable by GDB. **-g3** includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. --O0** : Suppress all optimizations.** +>**-O0** : Suppress all optimizations. Starting a Job with TotalView ----------------------------- +Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: -Be sure to log in with an X window forwarding enabled. This could mean -using the -X in the ssh: - - ssh -X username@anselm.it4i.cz +```bash + ssh -X username@anselm.it4i.cz +``` -Other options is to access login node using VNC. Please see the detailed -information on how to use graphic user interface on Anselm -[here](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33#VNC). +Other options is to access login node using VNC. Please see the detailed information on how to use graphic user interface on Anselm [here](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33#VNC). 
-From the login node an interactive session with X windows forwarding (-X -option) can be started by following command: +From the login node an interactive session with X windows forwarding (-X option) can be started by following command: - qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 +```bash + qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 +``` -Then launch the debugger with the totalview command followed by the name -of the executable to debug. +Then launch the debugger with the totalview command followed by the name of the executable to debug. ### Debugging a serial code To debug a serial code use: +```bash totalview test_debug +``` ### Debugging a parallel code - option 1 -To debug a parallel code compiled with >**OpenMPI** you need -to setup your TotalView environment: +To debug a parallel code compiled with **OpenMPI** you need to setup your TotalView environment: -Please note:** To be able to run parallel debugging procedure from the -command line without stopping the debugger in the mpiexec source code -you have to add the following function to your **~/.tvdrc** file: +>**Please note:** To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file: +```bash proc mpi_auto_run_starter {loaded_id} {    set starter_programs {mpirun mpiexec orterun}    set executable_name [TV::symbol get $loaded_id full_pathname] @@ -110,56 +112,49 @@ you have to add the following function to your **~/.tvdrc** file: # TotalView run this program automatically. dlappend TV::image_load_callbacks mpi_auto_run_starter - +``` The source code of this function can be also found in +```bash /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl +``` -You can also add only following line to you ~/.tvdrc file instead of -the entire function: - -source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl** +>You can also add only following line to you ~/.tvdrc file instead of the entire function: +**source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl** You need to do this step only once. Now you can run the parallel debugger using: +```bash mpirun -tv -n 5 ./test_debug +``` When following dialog appears click on "Yes"  -At this point the main TotalView GUI window will appear and you can -insert the breakpoints and start debugging: +At this point the main TotalView GUI window will appear and you can insert the breakpoints and start debugging:  ### Debugging a parallel code - option 2 -Other option to start new parallel debugging session from a command line -is to let TotalView to execute mpirun by itself. In this case user has -to specify a MPI implementation used to compile the source code. +Other option to start new parallel debugging session from a command line is to let TotalView to execute mpirun by itself. In this case user has to specify a MPI implementation used to compile the source code. -The following example shows how to start debugging session with Intel -MPI: +The following example shows how to start debugging session with Intel MPI: +```bash module load intel/13.5.192 impi/4.1.1.036 totalview/8/13 totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi +``` -After running previous command you will see the same window as shown in -the screenshot above. +After running previous command you will see the same window as shown in the screenshot above. 
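Putting option 2 together with the compilation steps above, a complete interactive session might look like the following sketch (the source file name hello_debug_impi.c is assumed from the binary used above, and the unversioned module load is a simplification; use the module versions listed in the previous examples):

```bash
# allocate an interactive job with X forwarding
qsub -I -X -A NONE-0-0 -q qexp -l select=1:ncpus=16:mpiprocs=16,walltime=01:00:00

# load compiler, Intel MPI and TotalView, compile with debug info and no optimizations
module load intel impi totalview
mpicc -g -O0 -o hello_debug_impi hello_debug_impi.c

# let TotalView launch mpirun itself (option 2)
totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi
```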
-More information regarding the command line parameters of the TotalView -can be found TotalView Reference Guide, Chapter 7: TotalView Command -Syntax.  +More information regarding the command line parameters of the TotalView can be found TotalView Reference Guide, Chapter 7: TotalView Command Syntax. Documentation ------------- - -[1] The [TotalView -documentation](http://www.roguewave.com/support/product-documentation/totalview-family.aspx#totalview) -web page is a good resource for learning more about some of the advanced -TotalView features. +[1] The [TotalView documentation](http://www.roguewave.com/support/product-documentation/totalview-family.aspx#totalview) web page is a good resource for learning more about some of the advanced TotalView features. diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/valgrind.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/valgrind.md index 615b26a1fe671bd05d586143f70f955b30f09b93..b70e955d178b41166ecaf1b7619737354d3424ac 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/valgrind.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/valgrind.md @@ -1,4 +1,4 @@ -Valgrind +Valgrind ======== Valgrind is a tool for memory debugging and profiling. @@ -6,61 +6,36 @@ Valgrind is a tool for memory debugging and profiling. About Valgrind -------------- -Valgrind is an open-source tool, used mainly for debuggig memory-related -problems, such as memory leaks, use of uninitalized memory etc. in C/C++ -applications. The toolchain was however extended over time with more -functionality, such as debugging of threaded applications, cache -profiling, not limited only to C/C++. +Valgrind is an open-source tool, used mainly for debuggig memory-related problems, such as memory leaks, use of uninitalized memory etc. in C/C++ applications. The toolchain was however extended over time with more functionality, such as debugging of threaded applications, cache profiling, not limited only to C/C++. -Valgind is an extremely useful tool for debugging memory errors such as -[off-by-one](http://en.wikipedia.org/wiki/Off-by-one_error). -Valgrind uses a virtual machine and dynamic recompilation of binary -code, because of that, you can expect that programs being debugged by -Valgrind run 5-100 times slower. +Valgind is an extremely useful tool for debugging memory errors such as [off-by-one](http://en.wikipedia.org/wiki/Off-by-one_error). Valgrind uses a virtual machine and dynamic recompilation of binary code, because of that, you can expect that programs being debugged by Valgrind run 5-100 times slower. The main tools available in Valgrind are : -- **Memcheck**, the original, must used and default tool. Verifies - memory access in you program and can detect use of unitialized - memory, out of bounds memory access, memory leaks, double free, etc. +- **Memcheck**, the original, must used and default tool. Verifies memory access in you program and can detect use of unitialized memory, out of bounds memory access, memory leaks, double free, etc. - **Massif**, a heap profiler. -- **Hellgrind** and **DRD** can detect race conditions in - multi-threaded applications. +- **Hellgrind** and **DRD** can detect race conditions in multi-threaded applications. - **Cachegrind**, a cache profiler. - **Callgrind**, a callgraph analyzer. -- For a full list and detailed documentation, please refer to the - [official Valgrind - documentation](http://valgrind.org/docs/). 
+- For a full list and detailed documentation, please refer to the [official Valgrind documentation](http://valgrind.org/docs/). Installed versions ------------------ - There are two versions of Valgrind available on Anselm. -- >Version 3.6.0, installed by operating system vendor - in /usr/bin/valgrind. - >This version is available by default, without the need - to load any module. This version however does not provide additional - MPI support. -- >Version 3.9.0 with support for Intel MPI, available in - [module](../../environment-and-modules.html) - valgrind/3.9.0-impi. After loading the - module, this version replaces the default valgrind. +- Version 3.6.0, installed by operating system vendor in /usr/bin/valgrind. This version is available by default, without the need to load any module. This version however does not provide additional MPI support. +- Version 3.9.0 with support for Intel MPI, available in [module](../../environment-and-modules.html) valgrind/3.9.0-impi. After loading the module, this version replaces the default valgrind. Usage ----- - -Compile the application which you want to debug as usual. It is -advisable to add compilation flags -g (to -add debugging information to the binary so that you will see original -source code lines in the output) and -O0 -(to disable compiler optimizations). +Compile the application which you want to debug as usual. It is advisable to add compilation flags -g (to add debugging information to the binary so that you will see original source code lines in the output) and -O0 (to disable compiler optimizations). For example, lets look at this C code, which has two problems : +```cpp #include <stdlib.h> - void f(void) + void f(void) { int* x = malloc(10 * sizeof(int)); x[10] = 0; // problem 1: heap block overrun @@ -71,27 +46,28 @@ For example, lets look at this C code, which has two problems : f(); return 0; } +``` Now, compile it with Intel compiler : +```bash $ module add intel - $ icc -g valgrind-example.c -o valgrind-example + $ icc -g valgrind-example.c -o valgrind-example +``` Now, lets run it with Valgrind. The syntax is : - valgrind [valgrind options] <your program -binary> [your program options] + *valgrind [valgrind options] <your program binary> [your program options]* -If no Valgrind options are specified, Valgrind defaults to running -Memcheck tool. Please refer to the Valgrind documentation for a full -description of command line options. +If no Valgrind options are specified, Valgrind defaults to running Memcheck tool. Please refer to the Valgrind documentation for a full description of command line options. +```bash $ valgrind ./valgrind-example ==12652== Memcheck, a memory error detector ==12652== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==12652== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info ==12652== Command: ./valgrind-example - ==12652== + ==12652== ==12652== Invalid write of size 4 ==12652== at 0x40053E: f (valgrind-example.c:6) ==12652== by 0x40054E: main (valgrind-example.c:11) @@ -99,12 +75,12 @@ description of command line options. 
==12652== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) ==12652== by 0x400528: f (valgrind-example.c:5) ==12652== by 0x40054E: main (valgrind-example.c:11) - ==12652== - ==12652== + ==12652== + ==12652== ==12652== HEAP SUMMARY: ==12652== in use at exit: 40 bytes in 1 blocks ==12652== total heap usage: 1 allocs, 0 frees, 40 bytes allocated - ==12652== + ==12652== ==12652== LEAK SUMMARY: ==12652== definitely lost: 40 bytes in 1 blocks ==12652== indirectly lost: 0 bytes in 0 blocks @@ -112,21 +88,20 @@ description of command line options. ==12652== still reachable: 0 bytes in 0 blocks ==12652== suppressed: 0 bytes in 0 blocks ==12652== Rerun with --leak-check=full to see details of leaked memory - ==12652== + ==12652== ==12652== For counts of detected and suppressed errors, rerun with: -v ==12652== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6) +``` -In the output we can see that Valgrind has detected both errors - the -off-by-one memory access at line 5 and a memory leak of 40 bytes. If we -want a detailed analysis of the memory leak, we need to run Valgrind -with --leak-check=full option : +In the output we can see that Valgrind has detected both errors - the off-by-one memory access at line 5 and a memory leak of 40 bytes. If we want a detailed analysis of the memory leak, we need to run Valgrind with --leak-check=full option : +```bash $ valgrind --leak-check=full ./valgrind-example ==23856== Memcheck, a memory error detector ==23856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==23856== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info ==23856== Command: ./valgrind-example - ==23856== + ==23856== ==23856== Invalid write of size 4 ==23856== at 0x40067E: f (valgrind-example.c:6) ==23856== by 0x40068E: main (valgrind-example.c:11) @@ -134,42 +109,41 @@ with --leak-check=full option : ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) ==23856== by 0x400668: f (valgrind-example.c:5) ==23856== by 0x40068E: main (valgrind-example.c:11) - ==23856== - ==23856== + ==23856== + ==23856== ==23856== HEAP SUMMARY: ==23856== in use at exit: 40 bytes in 1 blocks ==23856== total heap usage: 1 allocs, 0 frees, 40 bytes allocated - ==23856== + ==23856== ==23856== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) ==23856== by 0x400668: f (valgrind-example.c:5) ==23856== by 0x40068E: main (valgrind-example.c:11) - ==23856== + ==23856== ==23856== LEAK SUMMARY: ==23856== definitely lost: 40 bytes in 1 blocks ==23856== indirectly lost: 0 bytes in 0 blocks ==23856== possibly lost: 0 bytes in 0 blocks ==23856== still reachable: 0 bytes in 0 blocks ==23856== suppressed: 0 bytes in 0 blocks - ==23856== + ==23856== ==23856== For counts of detected and suppressed errors, rerun with: -v ==23856== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6) +``` -Now we can see that the memory leak is due to the -malloc() at line 6. +Now we can see that the memory leak is due to the malloc() at line 6. Usage with MPI --------------------------- +Although Valgrind is not primarily a parallel debugger, it can be used to debug parallel applications as well. When launching your parallel applications, prepend the valgrind command. For example : -Although Valgrind is not primarily a parallel debugger, it can be used -to debug parallel applications as well. When launching your parallel -applications, prepend the valgrind command. 
For example : - +```bash $ mpirun -np 4 valgrind myapplication +``` -The default version without MPI support will however report a large -number of false errors in the MPI library, such as : +The default version without MPI support will however report a large number of false errors in the MPI library, such as : +```bash ==30166== Conditional jump or move depends on uninitialised value(s) ==30166== at 0x4C287E8: strlen (mc_replace_strmem.c:282) ==30166== by 0x55443BD: I_MPI_Processor_model_number (init_interface.c:427) @@ -182,17 +156,15 @@ number of false errors in the MPI library, such as : ==30166== by 0x554650A: MPIR_Init_thread (initthread.c:539) ==30166== by 0x553369F: PMPI_Init (init.c:195) ==30166== by 0x4008BD: main (valgrind-example-mpi.c:18) +``` -so it is better to use the MPI-enabled valgrind from module. The MPI -version requires library -/apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so, -which must be included in the LD_PRELOAD -environment variable. +so it is better to use the MPI-enabled valgrind from module. The MPI version requires library /apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so, which must be included in the LD_PRELOAD environment variable. Lets look at this MPI example : +```cpp #include <stdlib.h> - #include <mpi.h> + #include <mpi.h> int main(int argc, char *argv[]) { @@ -200,32 +172,34 @@ Lets look at this MPI example :      MPI_Init(&argc, &argv);     MPI_Bcast(data, 100, MPI_INT, 0, MPI_COMM_WORLD); -      MPI_Finalize(); +      MPI_Finalize();        return 0; } +``` -There are two errors - use of uninitialized memory and invalid length of -the buffer. Lets debug it with valgrind : +There are two errors - use of uninitialized memory and invalid length of the buffer. Lets debug it with valgrind : +```bash $ module add intel impi $ mpicc -g valgrind-example-mpi.c -o valgrind-example-mpi $ module add valgrind/3.9.0-impi $ mpirun -np 2 -env LD_PRELOAD /apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi +``` -Prints this output : (note that there is output printed for every -launched MPI process) +Prints this output : (note that there is output printed for every launched MPI process) +```bash ==31318== Memcheck, a memory error detector ==31318== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==31318== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info ==31318== Command: ./valgrind-example-mpi - ==31318== + ==31318== ==31319== Memcheck, a memory error detector ==31319== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
==31319== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info ==31319== Command: ./valgrind-example-mpi - ==31319== + ==31319== valgrind MPI wrappers 31319: Active for pid 31319 valgrind MPI wrappers 31319: Try MPIWRAP_DEBUG=help for possible options valgrind MPI wrappers 31318: Active for pid 31318 @@ -237,7 +211,7 @@ launched MPI process) ==31319== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd ==31319== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) ==31319== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31319== + ==31319== ==31318== Uninitialised byte(s) found during client check request ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) @@ -245,7 +219,7 @@ launched MPI process) ==31318== Address 0x6929040 is 0 bytes inside a block of size 396 alloc'd ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31318== + ==31318== ==31318== Unaddressable byte(s) found during client check request ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) @@ -253,17 +227,17 @@ launched MPI process) ==31318== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31318== - ==31318== + ==31318== + ==31318== ==31318== HEAP SUMMARY: ==31318== in use at exit: 3,172 bytes in 67 blocks ==31318== total heap usage: 191 allocs, 124 frees, 81,203 bytes allocated - ==31318== - ==31319== + ==31318== + ==31319== ==31319== HEAP SUMMARY: ==31319== in use at exit: 3,172 bytes in 67 blocks ==31319== total heap usage: 175 allocs, 108 frees, 48,435 bytes allocated - ==31319== + ==31319== ==31318== LEAK SUMMARY: ==31318== definitely lost: 408 bytes in 3 blocks ==31318== indirectly lost: 256 bytes in 1 blocks @@ -271,7 +245,7 @@ launched MPI process) ==31318== still reachable: 2,508 bytes in 63 blocks ==31318== suppressed: 0 bytes in 0 blocks ==31318== Rerun with --leak-check=full to see details of leaked memory - ==31318== + ==31318== ==31318== For counts of detected and suppressed errors, rerun with: -v ==31318== Use --track-origins=yes to see where uninitialised values come from ==31318== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4) @@ -282,11 +256,10 @@ launched MPI process) ==31319== still reachable: 2,508 bytes in 63 blocks ==31319== suppressed: 0 bytes in 0 blocks ==31319== Rerun with --leak-check=full to see details of leaked memory - ==31319== + ==31319== ==31319== For counts of detected and suppressed errors, rerun with: -v ==31319== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4) +``` -We can see that Valgrind has reported use of unitialised memory on the -master process (which reads the array to be broadcasted) and use of -unaddresable memory on both processes. +We can see that Valgrind has reported use of unitialised memory on the master process (which reads the array to be broadcasted) and use of unaddresable memory on both processes. 
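As the summary lines in the output suggest, more detail can be obtained by re-running Memcheck with additional options. A sketch combining them with the MPI wrapper used above:

```bash
# --leak-check=full shows the allocation stacks of leaked blocks,
# --track-origins=yes shows where the uninitialised values come from
$ mpirun -np 2 -env LD_PRELOAD /apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so valgrind --leak-check=full --track-origins=yes ./valgrind-example-mpi
```

Note that --track-origins=yes slows the run down further, so use it only when hunting a specific uninitialised-value error.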
diff --git a/docs.it4i/anselm-cluster-documentation/software/debuggers/vampir.md b/docs.it4i/anselm-cluster-documentation/software/debuggers/vampir.md index 25fb484c6bf776ca15594007823269357514b8d3..1575596251bdbb2d1fe719ae4faa34fb48c67dc2 100644 --- a/docs.it4i/anselm-cluster-documentation/software/debuggers/vampir.md +++ b/docs.it4i/anselm-cluster-documentation/software/debuggers/vampir.md @@ -1,33 +1,23 @@ -Vampir +Vampir ====== -Vampir is a commercial trace analysis and visualisation tool. It can -work with traces in OTF and OTF2 formats. It does not have the -functionality to collect traces, you need to use a trace collection tool -(such -as [Score-P](../../../salomon/software/debuggers/score-p.html)) -first to collect the traces. +Vampir is a commercial trace analysis and visualisation tool. It can work with traces in OTF and OTF2 formats. It does not have the functionality to collect traces, you need to use a trace collection tool (such as [Score-P](../../../salomon/software/debuggers/score-p.html)) first to collect the traces.  -------------------------------------- Installed versions ------------------ +Version 8.5.0 is currently installed as module Vampir/8.5.0 : -Version 8.5.0 is currently installed as module -Vampir/8.5.0 : - +```bash $ module load Vampir/8.5.0 $ vampir & +``` User manual ----------- - -You can find the detailed user manual in PDF format in -$EBROOTVAMPIR/doc/vampir-manual.pdf +You can find the detailed user manual in PDF format in $EBROOTVAMPIR/doc/vampir-manual.pdf References ---------- - -1. <https://www.vampir.eu> - +[1]. <https://www.vampir.eu> \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/gpi2.md b/docs.it4i/anselm-cluster-documentation/software/gpi2.md index 79cf34786cbae1912747eb9dc563d189cc9c8df7..4e49e6b61fee8e2694bddfd0385df838b191671e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/gpi2.md +++ b/docs.it4i/anselm-cluster-documentation/software/gpi2.md @@ -1,123 +1,109 @@ -GPI-2 +GPI-2 ===== -A library that implements the GASPI specification - - +##A library that implements the GASPI specification Introduction ------------ +Programming Next Generation Supercomputers: GPI-2 is an API library for asynchronous interprocess, cross-node communication. It provides a flexible, scalable and fault tolerant interface for parallel applications. -Programming Next Generation Supercomputers: GPI-2 is an API library for -asynchronous interprocess, cross-node communication. It provides a -flexible, scalable and fault tolerant interface for parallel -applications. - -The GPI-2 library -([www.gpi-site.com/gpi2/](http://www.gpi-site.com/gpi2/)) -implements the GASPI specification (Global Address Space Programming -Interface, -[www.gaspi.de](http://www.gaspi.de/en/project.html)). -GASPI is a Partitioned Global Address Space (PGAS) API. It aims at -scalable, flexible and failure tolerant computing in massively parallel -environments. +The GPI-2 library ([www.gpi-site.com/gpi2/](http://www.gpi-site.com/gpi2/)) implements the GASPI specification (Global Address Space Programming Interface, [www.gaspi.de](http://www.gaspi.de/en/project.html)). GASPI is a Partitioned Global Address Space (PGAS) API. It aims at scalable, flexible and failure tolerant computing in massively parallel environments. Modules ------- - The GPI-2, version 1.0.2 is available on Anselm via module gpi2: +```bash $ module load gpi2 +``` -The module sets up environment variables, required for linking and -running GPI-2 enabled applications. 
This particular command loads the -default module, which is gpi2/1.0.2 +The module sets up environment variables, required for linking and running GPI-2 enabled applications. This particular command loads the default module, which is gpi2/1.0.2 Linking ------- +>Link with -lGPI2 -libverbs -Link with -lGPI2 -libverbs - -Load the gpi2 module. Link using **-lGPI2** and *** **-libverbs** -switches to link your code against GPI-2. The GPI-2 requires the OFED -infinband communication library ibverbs. +Load the gpi2 module. Link using **-lGPI2** and **-libverbs** switches to link your code against GPI-2. The GPI-2 requires the OFED infinband communication library ibverbs. ### Compiling and linking with Intel compilers +```bash $ module load intel $ module load gpi2 $ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs +``` ### Compiling and linking with GNU compilers +```bash $ module load gcc $ module load gpi2 $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs +``` Running the GPI-2 codes ----------------------- - gaspi_run -gaspi_run starts the GPI-2 application +>gaspi_run starts the GPI-2 application The gaspi_run utility is used to start and run GPI-2 applications: +```bash $ gaspi_run -m machinefile ./myprog.x +``` -A machine file (**machinefile**) with the hostnames of nodes where the -application will run, must be provided. The*** machinefile lists all -nodes on which to run, one entry per node per process. This file may be -hand created or obtained from standard $PBS_NODEFILE: +A machine file (**machinefile**) with the hostnames of nodes where the application will run, must be provided. The machinefile lists all nodes on which to run, one entry per node per process. This file may be hand created or obtained from standard $PBS_NODEFILE: +```bash $ cut -f1 -d"." $PBS_NODEFILE > machinefile +``` machinefile: +```bash cn79 cn80 +``` -This machinefile will run 2 GPI-2 processes, one on node cn79 other on -node cn80. +This machinefile will run 2 GPI-2 processes, one on node cn79 other on node cn80. machinefle: +```bash cn79 cn79 cn80 cn80 +``` -This machinefile will run 4 GPI-2 processes, 2 on node cn79 o 2 on node -cn80. +This machinefile will run 4 GPI-2 processes, 2 on node cn79 o 2 on node cn80. -Use the **mpiprocs** to control how many GPI-2 processes will run per -node +>Use the **mpiprocs** to control how many GPI-2 processes will run per node Example: - $ qsub -A OPEN-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=16 -I +```bash + $ qsub -A OPEN-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=16 -I +``` This example will produce $PBS_NODEFILE with 16 entries per node. ### gaspi_logger -gaspi_logger views the output form GPI-2 application ranks +>gaspi_logger views the output form GPI-2 application ranks -The gaspi_logger utility is used to view the output from all nodes -except the master node (rank 0). The gaspi_logger is started, on -another session, on the master node - the node where the gaspi_run is -executed. The output of the application, when called with -gaspi_printf(), will be redirected to the gaspi_logger. Other I/O -routines (e.g. printf) will not. +The gaspi_logger utility is used to view the output from all nodes except the master node (rank 0). The gaspi_logger is started, on another session, on the master node - the node where the gaspi_run is executed. The output of the application, when called with gaspi_printf(), will be redirected to the gaspi_logger. Other I/O routines (e.g. printf) will not. 
Example ------- Following is an example GPI-2 enabled code: +```cpp #include <GASPI.h> #include <stdlib.h> - + void success_or_exit ( const char* file, const int line, const int ec) { if (ec != GASPI_SUCCESS) @@ -126,37 +112,41 @@ Following is an example GPI-2 enabled code: exit (1); } } - + #define ASSERT(ec) success_or_exit (__FILE__, __LINE__, ec); - + int main(int argc, char *argv[]) { gaspi_rank_t rank, num; gaspi_return_t ret; - + /* Initialize GPI-2 */ ASSERT( gaspi_proc_init(GASPI_BLOCK) ); - + /* Get ranks information */ ASSERT( gaspi_proc_rank(&rank) ); ASSERT( gaspi_proc_num(&num) ); - + gaspi_printf("Hello from rank %d of %dn", rank, num); - + /* Terminate */ ASSERT( gaspi_proc_term(GASPI_BLOCK) ); - + return 0; } +``` Load modules and compile: +```bash $ module load gcc gpi2 $ gcc helloworld_gpi.c -o helloworld_gpi.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs +``` Submit the job and run the GPI-2 application +```bash $ qsub -q qexp -l select=2:ncpus=1:mpiprocs=1,place=scatter,walltime=00:05:00 -I qsub: waiting for job 171247.dm2 to start qsub: job 171247.dm2 ready @@ -165,18 +155,15 @@ Submit the job and run the GPI-2 application cn79 $ cut -f1 -d"." $PBS_NODEFILE > machinefile cn79 $ gaspi_run -m machinefile ./helloworld_gpi.x Hello from rank 0 of 2 +``` At the same time, in another session, you may start the gaspi logger: +```bash $ ssh cn79 cn79 $ gaspi_logger GASPI Logger (v1.1) [cn80:0] Hello from rank 1 of 2 +``` -In this example, we compile the helloworld_gpi.c code using the **gnu -compiler** (gcc) and link it to the GPI-2 and ibverbs library. The -library search path is compiled in. For execution, we use the qexp -queue, 2 nodes 1 core each. The GPI module must be loaded on the master -compute node (in this example the cn79), gaspi_logger is used from -different session to view the output of the second process. - +In this example, we compile the helloworld_gpi.c code using the **gnu compiler** (gcc) and link it to the GPI-2 and ibverbs library. The library search path is compiled in. For execution, we use the qexp queue, 2 nodes 1 core each. The GPI module must be loaded on the master compute node (in this example the cn79), gaspi_logger is used from different session to view the output of the second process. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite.md deleted file mode 100644 index 7c41a4badd18bd8868a96cf81a5bfd780f2daef5..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite.md +++ /dev/null @@ -1,93 +0,0 @@ -Intel Parallel Studio -===================== - - - -The Anselm cluster provides following elements of the Intel Parallel -Studio XE - - Intel Parallel Studio XE - ------------------------------------------------- - Intel Compilers - Intel Debugger - Intel MKL Library - Intel Integrated Performance Primitives Library - Intel Threading Building Blocks Library - -Intel compilers ---------------- - -The Intel compilers version 13.1.3 are available, via module intel. The -compilers include the icc C and C++ compiler and the ifort fortran -77/90/95 compiler. - - $ module load intel - $ icc -v - $ ifort -v - -Read more at the [Intel -Compilers](intel-suite/intel-compilers.html) page. - -Intel debugger --------------- - - The intel debugger version 13.0 is available, via module intel. The -debugger works for applications compiled with C and C++ compiler and the -ifort fortran 77/90/95 compiler. 
The debugger provides java GUI -environment. Use [X -display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -for running the GUI. - - $ module load intel - $ idb - -Read more at the [Intel -Debugger](intel-suite/intel-debugger.html) page. - -Intel Math Kernel Library -------------------------- - -Intel Math Kernel Library (Intel MKL) is a library of math kernel -subroutines, extensively threaded and optimized for maximum performance. -Intel MKL unites and provides these basic components: BLAS, LAPACK, -ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and -many more. - - $ module load mkl - -Read more at the [Intel MKL](intel-suite/intel-mkl.html) -page. - -Intel Integrated Performance Primitives ---------------------------------------- - -Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX -is available, via module ipp. The IPP is a library of highly optimized -algorithmic building blocks for media and data applications. This -includes signal, image and frame processing algorithms, such as FFT, -FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many -more. - - $ module load ipp - -Read more at the [Intel -IPP](intel-suite/intel-integrated-performance-primitives.html) -page. - -Intel Threading Building Blocks -------------------------------- - -Intel Threading Building Blocks (Intel TBB) is a library that supports -scalable parallel programming using standard ISO C++ code. It does not -require special languages or compilers. It is designed to promote -scalable data parallel programming. Additionally, it fully supports -nested parallelism, so you can build larger parallel components from -smaller parallel components. To use the library, you specify tasks, not -threads, and let the library map tasks onto threads in an efficient -manner. - - $ module load tbb - -Read more at the [Intel TBB](intel-suite/intel-tbb.html) -page. - diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-compilers.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-compilers.md index 14ce867f6134899a2dc91b51f5b932d6b752a424..a0fc0a341a657d9c1e61342f08438883093cb0ce 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-compilers.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-compilers.md @@ -1,66 +1,37 @@ -Intel Compilers +Intel Compilers =============== - - -The Intel compilers version 13.1.1 are available, via module intel. The -compilers include the icc C and C++ compiler and the ifort fortran -77/90/95 compiler. +The Intel compilers version 13.1.1 are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. +```bash $ module load intel $ icc -v $ ifort -v +``` -The intel compilers provide for vectorization of the code, via the AVX -instructions and support threading parallelization via OpenMP +The intel compilers provide for vectorization of the code, via the AVX instructions and support threading parallelization via OpenMP -For maximum performance on the Anselm cluster, compile your programs -using the AVX instructions, with reporting where the vectorization was -used. We recommend following compilation options for high performance +For maximum performance on the Anselm cluster, compile your programs using the AVX instructions, with reporting where the vectorization was used. 
We recommend following compilation options for high performance +```bash $ icc -ipo -O3 -vec -xAVX -vec-report1 myprog.c mysubroutines.c -o myprog.x $ ifort -ipo -O3 -vec -xAVX -vec-report1 myprog.f mysubroutines.f -o myprog.x +``` -In this example, we compile the program enabling interprocedural -optimizations between source files (-ipo), aggresive loop optimizations -(-O3) and vectorization (-vec -xAVX) +In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggresive loop optimizations (-O3) and vectorization (-vec -xAVX) -The compiler recognizes the omp, simd, vector and ivdep pragmas for -OpenMP parallelization and AVX vectorization. Enable the OpenMP -parallelization by the **-openmp** compiler switch. +The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX vectorization. Enable the OpenMP parallelization by the **-openmp** compiler switch. +```bash $ icc -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.c mysubroutines.c -o myprog.x $ ifort -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.f mysubroutines.f -o myprog.x +``` -Read more at -<http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm> +Read more at <http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm> Sandy Bridge/Haswell binary compatibility ----------------------------------------- +Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon will use Haswell architecture. >The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors, should also run on the new Haswell nodes. >To get optimal performance out of the Haswell processors a program should make use of the special AVX2 instructions for this processor. One can do this by recompiling codes with the compiler flags >designated to invoke these instructions. For the Intel compiler suite, there are two ways of doing >this: -Anselm nodes are currently equipped with Sandy Bridge CPUs, while -Salomon will use Haswell architecture. >The new processors are -backward compatible with the Sandy Bridge nodes, so all programs that -ran on the Sandy Bridge processors, should also run on the new Haswell -nodes. >To get optimal performance out of the Haswell -processors a program should make use of the special >AVX2 -instructions for this processor. One can do this by recompiling codes -with the compiler flags >designated to invoke these -instructions. For the Intel compiler suite, there are two ways of -doing >this: - -- >Using compiler flag (both for Fortran and C): - -xCORE-AVX2. This will create a - binary class="s1">with AVX2 instructions, specifically - for the Haswell processors. Note that the - executable >will not run on Sandy Bridge nodes. -- >Using compiler flags (both for Fortran and C): - -xAVX -axCORE-AVX2. This - will >generate multiple, feature specific auto-dispatch - code paths for Intel® processors, if there is >a - performance benefit. So this binary will run both on Sandy Bridge - and Haswell >processors. During runtime it will be - decided which path to follow, dependent on - which >processor you are running on. In general this - will result in larger binaries. - +- Using compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge nodes. 
+- Using compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. So this binary will run both on Sandy Bridge and Haswell processors. During runtime it will be decided which path to follow, dependent on which processor you are running on. In general this will result in larger binaries. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-debugger.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-debugger.md index 35ba0de033b074f26a5a2b1a455f3b3245e012c4..e0fce8a66fd615247a5b3773957b4105084c475f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-debugger.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-debugger.md @@ -1,32 +1,26 @@ -Intel Debugger +Intel Debugger ============== - - Debugging serial applications ----------------------------- +The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. Use [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for running the GUI. - The intel debugger version 13.0 is available, via module intel. The -debugger works for applications compiled with C and C++ compiler and the -ifort fortran 77/90/95 compiler. The debugger provides java GUI -environment. Use [X -display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -for running the GUI. - +```bash $ module load intel $ idb +``` The debugger may run in text mode. To debug in text mode, use +```bash $ idbc +``` -To debug on the compute nodes, module intel must be loaded. -The GUI on compute nodes may be accessed using the same way as in [the -GUI -section](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) +To debug on the compute nodes, module intel must be loaded. The GUI on compute nodes may be accessed using the same way as in [the GUI section](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) Example: +```bash $ qsub -q qexp -l select=1:ncpus=16 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19654.srv11 ready @@ -35,63 +29,47 @@ Example: $ module load java $ icc -O0 -g myprog.c -o myprog.x $ idb ./myprog.x +``` -In this example, we allocate 1 full compute node, compile program -myprog.c with debugging options -O0 -g and run the idb debugger -interactively on the myprog.x executable. The GUI access is via X11 port -forwarding provided by the PBS workload manager. +In this example, we allocate 1 full compute node, compile program myprog.c with debugging options -O0 -g and run the idb debugger interactively on the myprog.x executable. The GUI access is via X11 port forwarding provided by the PBS workload manager. Debugging parallel applications ------------------------------- - -Intel debugger is capable of debugging multithreaded and MPI parallel -programs as well. +Intel debugger is capable of debugging multithreaded and MPI parallel programs as well. 
### Small number of MPI ranks -For debugging small number of MPI ranks, you may execute and debug each -rank in separate xterm terminal (do not forget the [X -display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33)). -Using Intel MPI, this may be done in following way: +For debugging small number of MPI ranks, you may execute and debug each rank in separate xterm terminal (do not forget the [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33)). Using Intel MPI, this may be done in following way: +```bash $ qsub -q qexp -l select=2:ncpus=16 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19655.srv11 ready $ module load intel impi $ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x +``` -In this example, we allocate 2 full compute node, run xterm on each node -and start idb debugger in command line mode, debugging two ranks of -mympiprog.x application. The xterm will pop up for each rank, with idb -prompt ready. The example is not limited to use of Intel MPI +In this example, we allocate 2 full compute node, run xterm on each node and start idb debugger in command line mode, debugging two ranks of mympiprog.x application. The xterm will pop up for each rank, with idb prompt ready. The example is not limited to use of Intel MPI ### Large number of MPI ranks -Run the idb debugger from within the MPI debug option. This will cause -the debugger to bind to all ranks and provide aggregated outputs across -the ranks, pausing execution automatically just after startup. You may -then set break points and step the execution manually. Using Intel MPI: +Run the idb debugger from within the MPI debug option. This will cause the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set break points and step the execution manually. Using Intel MPI: +```bash $ qsub -q qexp -l select=2:ncpus=16 -X -I qsub: waiting for job 19654.srv11 to start qsub: job 19655.srv11 ready $ module load intel impi $ mpirun -n 32 -idb ./mympiprog.x +``` ### Debugging multithreaded application -Run the idb debugger in GUI mode. The menu Parallel contains number of -tools for debugging multiple threads. One of the most useful tools is -the **Serialize Execution** tool, which serializes execution of -concurrent threads for easy orientation and identification of -concurrency related bugs. +Run the idb debugger in GUI mode. The menu Parallel contains number of tools for debugging multiple threads. One of the most useful tools is the **Serialize Execution** tool, which serializes execution of concurrent threads for easy orientation and identification of concurrency related bugs. 
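A minimal sketch of such a GUI session, reusing the interactive allocation from the serial example above (the -openmp switch and the OMP_NUM_THREADS setting are assumptions for an OpenMP code, not part of the original example):

```bash
$ qsub -q qexp -l select=1:ncpus=16 -X -I
$ module load intel
$ module load java
# compile the OpenMP code with debug info and no optimizations
$ icc -O0 -g -openmp myprog.c -o myprog.x
# run with 16 threads, then use the Serialize Execution tool from the Parallel menu as needed
$ export OMP_NUM_THREADS=16
$ idb ./myprog.x
```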
Further information ------------------- - -Exhaustive manual on idb features and usage is published at Intel -website, -<http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/debugger/user_guide/index.htm> +Exhaustive manual on idb features and usage is published at [Intel website](http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/debugger/user_guide/index.htm) diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md index 5cef1f8f67d5d7759953e5579300adeff1c21af1..c88980f6f796a31ad44d00010ee4b1cb9f443b96 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md @@ -1,30 +1,22 @@ -Intel IPP +Intel IPP ========= - - Intel Integrated Performance Primitives --------------------------------------- +Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX vector instructions is available, via module ipp. The IPP is a very rich library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax, as well as cryptographic functions, linear algebra functions and many more. -Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX -vector instructions is available, via module ipp. The IPP is a very rich -library of highly optimized algorithmic building blocks for media and -data applications. This includes signal, image and frame processing -algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough -transform, Sum, MinMax, as well as cryptographic functions, linear -algebra functions and many more. - -Check out IPP before implementing own math functions for data -processing, it is likely already there. +>Check out IPP before implementing own math functions for data processing, it is likely already there. +```bash $ module load ipp +``` -The module sets up environment variables, required for linking and -running ipp enabled applications. +The module sets up environment variables, required for linking and running ipp enabled applications. IPP example ----------- +```cpp #include "ipp.h" #include <stdio.h> int main(int argc, char* argv[]) @@ -63,32 +55,28 @@ IPP example return 0; } +``` - Compile above example, using any compiler and the ipp module. +Compile above example, using any compiler and the ipp module. +```bash $ module load intel $ module load ipp $ icc testipp.c -o testipp.x -lippi -lipps -lippcore +``` -You will need the ipp module loaded to run the ipp enabled executable. -This may be avoided, by compiling library search paths into the -executable +You will need the ipp module loaded to run the ipp enabled executable. This may be avoided, by compiling library search paths into the executable +```bash $ module load intel $ module load ipp $ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore +``` Code samples and documentation ------------------------------ +Intel provides number of [Code Samples for IPP](https://software.intel.com/en-us/articles/code-samples-for-intel-integrated-performance-primitives-library), illustrating use of IPP. 
-Intel provides number of [Code Samples for -IPP](https://software.intel.com/en-us/articles/code-samples-for-intel-integrated-performance-primitives-library), -illustrating use of IPP. - -Read full documentation on IPP [on Intel -website,](http://software.intel.com/sites/products/search/search.php?q=&x=15&y=6&product=ipp&version=7.1&docos=lin) -in particular the [IPP Reference -manual.](http://software.intel.com/sites/products/documentation/doclib/ipp_sa/71/ipp_manual/index.htm) - +Read full documentation on IPP [on Intel website,](http://software.intel.com/sites/products/search/search.php?q=&x=15&y=6&product=ipp&version=7.1&docos=lin) in particular the [IPP Reference manual.](http://software.intel.com/sites/products/documentation/doclib/ipp_sa/71/ipp_manual/index.htm) \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md index 935f78fcce4c90447fcd259318490f69f03fced7..a4792b029cd92198f8758d0cff58640731fa7615 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-mkl.md @@ -1,190 +1,121 @@ -Intel MKL +Intel MKL ========= - - Intel Math Kernel Library ------------------------- +Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL provides these basic math kernels: -Intel Math Kernel Library (Intel MKL) is a library of math kernel -subroutines, extensively threaded and optimized for maximum performance. -Intel MKL provides these basic math kernels: - -- - - - - BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, - offering vector, vector-matrix, and matrix-matrix operations. -- - - - - The PARDISO direct sparse solver, an iterative sparse solver, - and supporting sparse BLAS (level 1, 2, and 3) routines for solving - sparse systems of equations. -- - - - - ScaLAPACK distributed processing linear algebra routines for - Linux* and Windows* operating systems, as well as the Basic Linear - Algebra Communications Subprograms (BLACS) and the Parallel Basic - Linear Algebra Subprograms (PBLAS). -- - - - - Fast Fourier transform (FFT) functions in one, two, or three - dimensions with support for mixed radices (not limited to sizes that - are powers of 2), as well as distributed versions of - these functions. -- - - +- BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations. +- The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations. +- ScaLAPACK distributed processing linear algebra routines for Linux* and Windows* operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). +- Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions. +- Vector Math Library (VML) routines for optimized mathematical operations on vectors. +- Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions. 
+- Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search. +- Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver. - Vector Math Library (VML) routines for optimized mathematical - operations on vectors. -- - - - - Vector Statistical Library (VSL) routines, which offer - high-performance vectorized random number generators (RNG) for - several probability distributions, convolution and correlation - routines, and summary statistics functions. -- - - - - Data Fitting Library, which provides capabilities for - spline-based approximation of functions, derivatives and integrals - of functions, and search. -- Extended Eigensolver, a shared memory version of an eigensolver - based on the Feast Eigenvalue Solver. - -For details see the [Intel MKL Reference -Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm). +For details see the [Intel MKL Reference Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm). Intel MKL version 13.5.192 is available on Anselm +```bash $ module load mkl +``` -The module sets up environment variables, required for linking and -running mkl enabled applications. The most important variables are the -$MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES +The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES -The MKL library may be linked using any compiler. -With intel compiler use -mkl option to link default threaded MKL. +>The MKL library may be linked using any compiler. With intel compiler use -mkl option to link default threaded MKL. ### Interfaces -The MKL library provides number of interfaces. The fundamental once are -the LP64 and ILP64. The Intel MKL ILP64 libraries use the 64-bit integer -type (necessary for indexing large arrays, with more than 231^-1 -elements), whereas the LP64 libraries index arrays with the 32-bit -integer type. +The MKL library provides number of interfaces. The fundamental once are the LP64 and ILP64. The Intel MKL ILP64 libraries use the 64-bit integer type (necessary for indexing large arrays, with more than 231^-1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type. - |Interface|Integer type| - ----- |---|---|------------------------------------- - |LP64|32-bit, int, integer(kind=4), MPI_INT| - ILP64 64-bit, long int, integer(kind=8), MPI_INT64 +|Interface|Integer type| +|---|---| +|LP64|32-bit, int, integer(kind=4), MPI_INT| +|ILP64|64-bit, long int, integer(kind=8), MPI_INT64| ### Linking -Linking MKL libraries may be complex. Intel [mkl link line -advisor](http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor) -helps. See also [examples](intel-mkl.html#examples) below. +Linking MKL libraries may be complex. Intel [mkl link line advisor](http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor) helps. See also [examples](intel-mkl.html#examples) below. -You will need the mkl module loaded to run the mkl enabled executable. -This may be avoided, by compiling library search paths into the -executable. Include rpath on the compile line: +You will need the mkl module loaded to run the mkl enabled executable. This may be avoided, by compiling library search paths into the executable. 
Include rpath on the compile line: +```bash $ icc .... -Wl,-rpath=$LIBRARY_PATH ... +``` ### Threading -Advantage in using the MKL library is that it brings threaded -parallelization to applications that are otherwise not parallel. +>Advantage in using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel. -For this to work, the application must link the threaded MKL library -(default). Number and behaviour of MKL threads may be controlled via the -OpenMP environment variables, such as OMP_NUM_THREADS and -KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS +For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS +```bash $ export OMP_NUM_THREADS=16 $ export KMP_AFFINITY=granularity=fine,compact,1,0 +``` -The application will run with 16 threads with affinity optimized for -fine grain parallelization. +The application will run with 16 threads with affinity optimized for fine grain parallelization. Examples ------------ - -Number of examples, demonstrating use of the MKL library and its linking -is available on Anselm, in the $MKL_EXAMPLES directory. In the -examples below, we demonstrate linking MKL to Intel and GNU compiled -program for multi-threaded matrix multiplication. +Number of examples, demonstrating use of the MKL library and its linking is available on Anselm, in the $MKL_EXAMPLES directory. In the examples below, we demonstrate linking MKL to Intel and GNU compiled program for multi-threaded matrix multiplication. ### Working with examples +```bash $ module load intel $ module load mkl $ cp -a $MKL_EXAMPLES/cblas /tmp/ $ cd /tmp/cblas $ make sointel64 function=cblas_dgemm +``` -In this example, we compile, link and run the cblas_dgemm example, -demonstrating use of MKL example suite installed on Anselm. +In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL example suite installed on Anselm. ### Example: MKL and Intel compiler +```bash $ module load intel $ module load mkl $ cp -a $MKL_EXAMPLES/cblas /tmp/ $ cd /tmp/cblas - $ + $ $ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x $ ./cblas_dgemmx.x data/cblas_dgemmx.d +``` -In this example, we compile, link and run the cblas_dgemm example, -demonstrating use of MKL with icc -mkl option. Using the -mkl option is -equivalent to: +In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL with icc -mkl option. Using the -mkl option is equivalent to: - $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x +```bash + $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 +``` -In this example, we compile and link the cblas_dgemm example, using -LP64 interface to threaded MKL and Intel OMP threads implementation. +In this example, we compile and link the cblas_dgemm example, using LP64 interface to threaded MKL and Intel OMP threads implementation. 
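If the 64-bit integer (ILP64) interface described earlier is needed instead, the same example can in principle be linked against the ILP64 libraries. The line below is only a sketch: the -DMKL_ILP64 define and the mkl_intel_ilp64 library name follow the usual MKL naming conventions and are not part of the installed example suite, so verify the exact set of libraries with the mkl link line advisor mentioned above.

```bash
 $ icc -w -DMKL_ILP64 source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5
```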
### Example: MKL and GNU compiler +```bash $ module load gcc $ module load mkl $ cp -a $MKL_EXAMPLES/cblas /tmp/ $ cd /tmp/cblas - - $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x + + $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm $ ./cblas_dgemmx.x data/cblas_dgemmx.d +``` -In this example, we compile, link and run the cblas_dgemm example, -using LP64 interface to threaded MKL and gnu OMP threads implementation. +In this example, we compile, link and run the cblas_dgemm example, using LP64 interface to threaded MKL and gnu OMP threads implementation. MKL and MIC accelerators ------------------------ - -The MKL is capable to automatically offload the computations o the MIC -accelerator. See section [Intel Xeon -Phi](../intel-xeon-phi.html) for details. +The MKL is capable to automatically offload the computations o the MIC accelerator. See section [Intel XeonPhi](../intel-xeon-phi.html) for details. Further reading --------------- - -Read more on [Intel -website](http://software.intel.com/en-us/intel-mkl), in -particular the [MKL users -guide](https://software.intel.com/en-us/intel-mkl/documentation/linux). - +Read more on [Intel website](http://software.intel.com/en-us/intel-mkl), in particular the [MKL users guide](https://software.intel.com/en-us/intel-mkl/documentation/linux). \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-parallel-studio-introduction.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-parallel-studio-introduction.md deleted file mode 100644 index 26077232e61fd20ffac4312e58be70dcc12c7934..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-parallel-studio-introduction.md +++ /dev/null @@ -1,90 +0,0 @@ -Intel Parallel Studio -===================== - - - -The Anselm cluster provides following elements of the Intel Parallel -Studio XE - - Intel Parallel Studio XE - ------------------------------------------------- - Intel Compilers - Intel Debugger - Intel MKL Library - Intel Integrated Performance Primitives Library - Intel Threading Building Blocks Library - -Intel compilers ---------------- - -The Intel compilers version 13.1.3 are available, via module intel. The -compilers include the icc C and C++ compiler and the ifort fortran -77/90/95 compiler. - - $ module load intel - $ icc -v - $ ifort -v - -Read more at the [Intel Compilers](intel-compilers.html) -page. - -Intel debugger --------------- - - The intel debugger version 13.0 is available, via module intel. The -debugger works for applications compiled with C and C++ compiler and the -ifort fortran 77/90/95 compiler. The debugger provides java GUI -environment. Use [X -display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -for running the GUI. - - $ module load intel - $ idb - -Read more at the [Intel Debugger](intel-debugger.html) -page. - -Intel Math Kernel Library -------------------------- - -Intel Math Kernel Library (Intel MKL) is a library of math kernel -subroutines, extensively threaded and optimized for maximum performance. -Intel MKL unites and provides these basic components: BLAS, LAPACK, -ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and -many more. - - $ module load mkl - -Read more at the [Intel MKL](intel-mkl.html) page. 
- -Intel Integrated Performance Primitives ---------------------------------------- - -Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX -is available, via module ipp. The IPP is a library of highly optimized -algorithmic building blocks for media and data applications. This -includes signal, image and frame processing algorithms, such as FFT, -FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many -more. - - $ module load ipp - -Read more at the [Intel -IPP](intel-integrated-performance-primitives.html) page. - -Intel Threading Building Blocks -------------------------------- - -Intel Threading Building Blocks (Intel TBB) is a library that supports -scalable parallel programming using standard ISO C++ code. It does not -require special languages or compilers. It is designed to promote -scalable data parallel programming. Additionally, it fully supports -nested parallelism, so you can build larger parallel components from -smaller parallel components. To use the library, you specify tasks, not -threads, and let the library map tasks onto threads in an efficient -manner. - - $ module load tbb - -Read more at the [Intel TBB](intel-tbb.html) page. - diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md index 29c0fa654de6a6bfe8bf4f34b0ba73c756b28a5b..973e31bdfa8c6924222b85392090c673ea68ce85 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/intel-tbb.md @@ -1,54 +1,43 @@ -Intel TBB +Intel TBB ========= - - Intel Threading Building Blocks ------------------------------- - -Intel Threading Building Blocks (Intel TBB) is a library that supports -scalable parallel programming using standard ISO C++ code. It does not -require special languages or compilers. To use the library, you specify -tasks, not threads, and let the library map tasks onto threads in an -efficient manner. The tasks are executed by a runtime scheduler and may -be offloaded to [MIC -accelerator](../intel-xeon-phi.html). +Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. The tasks are executed by a runtime scheduler and may +be offloaded to [MIC accelerator](../intel-xeon-phi.html). Intel TBB version 4.1 is available on Anselm +```bash $ module load tbb +``` -The module sets up environment variables, required for linking and -running tbb enabled applications. +The module sets up environment variables, required for linking and running tbb enabled applications. -Link the tbb library, using -ltbb +>Link the tbb library, using -ltbb Examples -------- +Number of examples, demonstrating use of TBB and its built-in scheduler is available on Anselm, in the $TBB_EXAMPLES directory. -Number of examples, demonstrating use of TBB and its built-in scheduler -is available on Anselm, in the $TBB_EXAMPLES directory. 
- +```bash $ module load intel $ module load tbb $ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ $ cd /tmp/parallel_reduce/primes $ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb $ ./primes.x +``` -In this example, we compile, link and run the primes example, -demonstrating use of parallel task-based reduce in computation of prime -numbers. +In this example, we compile, link and run the primes example, demonstrating use of parallel task-based reduce in computation of prime numbers. -You will need the tbb module loaded to run the tbb enabled executable. -This may be avoided, by compiling library search paths into the -executable. +You will need the tbb module loaded to run the tbb enabled executable. This may be avoided, by compiling library search paths into the executable. +```bash $ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb +``` Further reading --------------- - -Read more on Intel website, -<http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/index.htm> +Read more on Intel website, <http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/index.htm> diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-suite/introduction.md b/docs.it4i/anselm-cluster-documentation/software/intel-suite/introduction.md new file mode 100644 index 0000000000000000000000000000000000000000..99867b30bad4389576b5ece35048719b2fb4d365 --- /dev/null +++ b/docs.it4i/anselm-cluster-documentation/software/intel-suite/introduction.md @@ -0,0 +1,65 @@ +Intel Parallel Studio +===================== + +The Anselm cluster provides following elements of the Intel Parallel Studio XE + +|Intel Parallel Studio XE| +|-------------------------------------------------| +|Intel Compilers| +|Intel Debugger| +|Intel MKL Library| +|Intel Integrated Performance Primitives Library| +|Intel Threading Building Blocks Library| + +Intel compilers +--------------- +The Intel compilers version 13.1.3 are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. + +```bash + $ module load intel + $ icc -v + $ ifort -v +``` + +Read more at the [Intel Compilers](intel-compilers.html) page. + +Intel debugger +-------------- +The intel debugger version 13.0 is available, via module intel. The debugger works for applications compiled with C and C++ compiler and the ifort fortran 77/90/95 compiler. The debugger provides java GUI environment. Use [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for running the GUI. + +```bash + $ module load intel + $ idb +``` + +Read more at the [Intel Debugger](intel-debugger.html) page. + +Intel Math Kernel Library +------------------------- +Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. + +```bash + $ module load mkl +``` + +Read more at the [Intel MKL](intel-mkl.html) page. + +Intel Integrated Performance Primitives +--------------------------------------- +Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX is available, via module ipp. The IPP is a library of highly optimized algorithmic building blocks for media and data applications. 
This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. + +```bash + $ module load ipp +``` + +Read more at the [Intel IPP](intel-integrated-performance-primitives.html) page. + +Intel Threading Building Blocks +------------------------------- +Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. + +```bash + $ module load tbb +``` + +Read more at the [Intel TBB](intel-tbb.html) page. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md index aba81d0dd10865fbf1eae0bcc866efa394615fe4..d7dd7454e9085bf903534be6be9059adf74afa85 100644 --- a/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/intel-xeon-phi.md @@ -1,35 +1,33 @@ -Intel Xeon Phi +Intel Xeon Phi ============== -A guide to Intel Xeon Phi usage +##A guide to Intel Xeon Phi usage - - -Intel Xeon Phi can be programmed in several modes. The default mode on -Anselm is offload mode, but all modes described in this document are -supported. +Intel Xeon Phi can be programmed in several modes. The default mode on Anselm is offload mode, but all modes described in this document are supported. Intel Utilities for Xeon Phi ---------------------------- +To get access to a compute node with Intel Xeon Phi accelerator, use the PBS interactive session -To get access to a compute node with Intel Xeon Phi accelerator, use the -PBS interactive session - +```bash $ qsub -I -q qmic -A NONE-0-0 +``` To set up the environment module "Intel" has to be loaded +```bash $ module load intel/13.5.192 +``` -Information about the hardware can be obtained by running -the micinfo program on the host. +Information about the hardware can be obtained by running the micinfo program on the host. +```bash $ /usr/bin/micinfo +``` -The output of the "micinfo" utility executed on one of the Anselm node -is as follows. (note: to get PCIe related details the command has to be -run with root privileges) +The output of the "micinfo" utility executed on one of the Anselm node is as follows. (note: to get PCIe related details the command has to be run with root privileges) +```bash MicInfo Utility Log Created Mon Jul 22 00:23:50 2013 @@ -89,27 +87,26 @@ run with root privileges)                GDDR Speed              : 5.000000 GT/s                GDDR Frequency          : 2500000 kHz                GDDR Voltage            : 1501000 uV +``` Offload Mode ------------ +To compile a code for Intel Xeon Phi a MPSS stack has to be installed on the machine where compilation is executed. Currently the MPSS stack is only installed on compute nodes equipped with accelerators. -To compile a code for Intel Xeon Phi a MPSS stack has to be installed on -the machine where compilation is executed. Currently the MPSS stack is -only installed on compute nodes equipped with accelerators. 
- +```bash $ qsub -I -q qmic -A NONE-0-0 $ module load intel/13.5.192 +``` -For debugging purposes it is also recommended to set environment -variable "OFFLOAD_REPORT". Value can be set from 0 to 3, where higher -number means more debugging information. +For debugging purposes it is also recommended to set environment variable "OFFLOAD_REPORT". Value can be set from 0 to 3, where higher number means more debugging information. +```bash export OFFLOAD_REPORT=3 +``` -A very basic example of code that employs offload programming technique -is shown in the next listing. Please note that this code is sequential -and utilizes only single core of the accelerator. +A very basic example of code that employs offload programming technique is shown in the next listing. Please note that this code is sequential and utilizes only single core of the accelerator. +```bash $ vim source-offload.cpp #include <iostream> @@ -127,22 +124,26 @@ and utilizes only single core of the accelerator.    result /= niter;    std::cout << "Pi ~ " << result << 'n'; } +``` To compile a code using Intel compiler run +```bash $ icc source-offload.cpp -o bin-offload +``` To execute the code, run the following command on the host +```bash ./bin-offload +``` ### Parallelization in Offload Mode Using OpenMP -One way of paralelization a code for Xeon Phi is using OpenMP -directives. The following example shows code for parallel vector -addition. +One way of paralelization a code for Xeon Phi is using OpenMP directives. The following example shows code for parallel vector addition. - $ vim ./vect-add +```bash + $ vim ./vect-add #include <stdio.h> @@ -217,68 +218,57 @@ addition.  compare(res, res_cpu, SIZE); } +``` -During the compilation Intel compiler shows which loops have been -vectorized in both host and accelerator. This can be enabled with -compiler option "-vec-report2". To compile and execute the code run +During the compilation Intel compiler shows which loops have been vectorized in both host and accelerator. This can be enabled with compiler option "-vec-report2". To compile and execute the code run +```bash $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add - $ ./vect-add + $ ./vect-add +``` Some interesting compiler flags useful not only for code debugging are: -Debugging - openmp_report[0|1|2] - controls the compiler based vectorization -diagnostic level - vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic -level +>Debugging + openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level + vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic level -Performance ooptimization - xhost - FOR HOST ONLY - to generate AVX (Advanced Vector Extensions) -instructions. +>Performance ooptimization + xhost - FOR HOST ONLY - to generate AVX (Advanced Vector Extensions) instructions. Automatic Offload using Intel MKL Library ----------------------------------------- +Intel MKL includes an Automatic Offload (AO) feature that enables computationally intensive MKL functions called in user code to benefit from attached Intel Xeon Phi coprocessors automatically and transparently. -Intel MKL includes an Automatic Offload (AO) feature that enables -computationally intensive MKL functions called in user code to benefit -from attached Intel Xeon Phi coprocessors automatically and -transparently. +Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. 
Complete list of controls is listed [ here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). -Behavioral of automatic offload mode is controlled by functions called -within the program or by environmental variables. Complete list of -controls is listed [ -here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). - -The Automatic Offload may be enabled by either an MKL function call -within the code: +The Automatic Offload may be enabled by either an MKL function call within the code: +```cpp mkl_mic_enable(); +``` or by setting environment variable +```bash $ export MKL_MIC_ENABLE=1 +``` -To get more information about automatic offload please refer to "[Using -Intel® MKL Automatic Offload on Intel ® Xeon Phi™ -Coprocessors](http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf)" -white paper or [ Intel MKL -documentation](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). +To get more information about automatic offload please refer to "[Using Intel® MKL Automatic Offload on Intel ® Xeon Phi™ Coprocessors](http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf)" white paper or [ Intel MKL documentation](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). ### Automatic offload example -At first get an interactive PBS session on a node with MIC accelerator -and load "intel" module that automatically loads "mkl" module as well. +At first get an interactive PBS session on a node with MIC accelerator and load "intel" module that automatically loads "mkl" module as well. +```bash $ qsub -I -q qmic -A OPEN-0-0 -l select=1:ncpus=16 $ module load intel +``` -Following example show how to automatically offload an SGEMM (single -precision - g dir="auto">eneral matrix multiply) function to -MIC coprocessor. The code can be copied to a file and compiled without -any necessary modification. +Following example show how to automatically offload an SGEMM (single precision - g dir="auto">eneral matrix multiply) function to MIC coprocessor. The code can be copied to a file and compiled without any necessary modification. +```bash $ vim sgemm-ao-short.c #include <stdio.h> @@ -319,7 +309,7 @@ any necessary modification.        printf("Enabling Automatic Offloadn");        /* Alternatively, set environment variable MKL_MIC_ENABLE=1 */        mkl_mic_enable(); -        +        int ndevices = mkl_mic_get_device_count(); /* Number of MIC devices */        printf("Automatic Offload enabled: %d MIC devices presentn",  ndevices); @@ -333,23 +323,25 @@ any necessary modification.    return 0; } +``` -Please note: This example is simplified version of an example from MKL. -The expanded version can be found here: -$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** +>Please note: This example is simplified version of an example from MKL. The expanded version can be found here: **$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** To compile a code using Intel compiler use: +```bash $ icc -mkl sgemm-ao-short.c -o sgemm +``` -For debugging purposes enable the offload report to see more information -about automatic offloading. +For debugging purposes enable the offload report to see more information about automatic offloading. 
+```bash $ export OFFLOAD_REPORT=2 +``` -The output of a code should look similar to following listing, where -lines starting with [MKL] are generated by offload reporting: +The output of a code should look similar to following listing, where lines starting with [MKL] are generated by offload reporting: +```bash Computing SGEMM on the host Enabling Automatic Offload Automatic Offload enabled: 1 MIC devices present @@ -361,132 +353,122 @@ lines starting with [MKL] are generated by offload reporting: [MKL] [MIC 00] [AO SGEMM CPU->MIC Data] 52428800 bytes [MKL] [MIC 00] [AO SGEMM MIC->CPU Data] 26214400 bytes Done - - +``` Native Mode ----------- +In the native mode a program is executed directly on Intel Xeon Phi without involvement of the host machine. Similarly to offload mode, the code is compiled on the host computer with Intel compilers. -In the native mode a program is executed directly on Intel Xeon Phi -without involvement of the host machine. Similarly to offload mode, the -code is compiled on the host computer with Intel compilers. - -To compile a code user has to be connected to a compute with MIC and -load Intel compilers module. To get an interactive session on a compute -node with an Intel Xeon Phi and load the module use following commands: +To compile a code user has to be connected to a compute with MIC and load Intel compilers module. To get an interactive session on a compute node with an Intel Xeon Phi and load the module use following commands: +```bash $ qsub -I -q qmic -A NONE-0-0 $ module load intel/13.5.192 +``` -Please note that particular version of the Intel module is specified. -This information is used later to specify the correct library paths. +>Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. -To produce a binary compatible with Intel Xeon Phi architecture user has -to specify "-mmic" compiler flag. Two compilation examples are shown -below. The first example shows how to compile OpenMP parallel code -"vect-add.c" for host only: +To produce a binary compatible with Intel Xeon Phi architecture user has to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: +```bash $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host +``` To run this code on host, use: +```bash $ ./vect-add-host +``` -The second example shows how to compile the same code for Intel Xeon -Phi: +The second example shows how to compile the same code for Intel Xeon Phi: +```bash $ icc -mmic -fopenmp vect-add.c -o vect-add-mic +``` ### Execution of the Program in Native Mode on Intel Xeon Phi -The user access to the Intel Xeon Phi is through the SSH. Since user -home directories are mounted using NFS on the accelerator, users do not -have to copy binary files or libraries between the host and accelerator. - +The user access to the Intel Xeon Phi is through the SSH. Since user home directories are mounted using NFS on the accelerator, users do not have to copy binary files or libraries between the host and accelerator. To connect to the accelerator run: +```bash $ ssh mic0 +``` If the code is sequential, it can be executed directly: +```bash mic0 $ ~/path_to_binary/vect-add-seq-mic +``` -If the code is parallelized using OpenMP a set of additional libraries -is required for execution. 
To locate these libraries new path has to be -added to the LD_LIBRARY_PATH environment variable prior to the -execution: +If the code is parallelized using OpenMP a set of additional libraries is required for execution. To locate these libraries new path has to be added to the LD_LIBRARY_PATH environment variable prior to the execution: +```bash mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH +``` -Please note that the path exported in the previous example contains path -to a specific compiler (here the version is 5.192). This version number -has to match with the version number of the Intel compiler module that -was used to compile the code on the host computer. +>Please note that the path exported in the previous example contains path to a specific compiler (here the version is 5.192). This version number has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. -For your information the list of libraries and their location required -for execution of an OpenMP parallel code on Intel Xeon Phi is: +For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is: -/apps/intel/composer_xe_2013.5.192/compiler/lib/mic +>/apps/intel/composer_xe_2013.5.192/compiler/lib/mic -libiomp5.so +>libiomp5.so libimf.so libsvml.so libirng.so libintlc.so.5 -Finally, to run the compiled code use: +Finally, to run the compiled code use: + +```bash $ ~/path_to_binary/vect-add-mic +``` OpenCL ------------------- +OpenCL (Open Computing Language) is an open standard for general-purpose parallel programming for diverse mix of multi-core CPUs, GPU coprocessors, and other parallel processors. OpenCL provides a flexible execution model and uniform programming environment for software developers to write portable code for systems running on both the CPU and graphics processors or accelerators like the Intel® Xeon Phi. -OpenCL (Open Computing Language) is an open standard for -general-purpose parallel programming for diverse mix of multi-core CPUs, -GPU coprocessors, and other parallel processors. OpenCL provides a -flexible execution model and uniform programming environment for -software developers to write portable code for systems running on both -the CPU and graphics processors or accelerators like the Intel® Xeon -Phi. - -On Anselm OpenCL is installed only on compute nodes with MIC -accelerator, therefore OpenCL code can be compiled only on these nodes. +On Anselm OpenCL is installed only on compute nodes with MIC accelerator, therefore OpenCL code can be compiled only on these nodes. +```bash module load opencl-sdk opencl-rt +``` -Always load "opencl-sdk" (providing devel files like headers) and -"opencl-rt" (providing dynamic library libOpenCL.so) modules to compile -and link OpenCL code. Load "opencl-rt" for running your compiled code. +Always load "opencl-sdk" (providing devel files like headers) and "opencl-rt" (providing dynamic library libOpenCL.so) modules to compile and link OpenCL code. Load "opencl-rt" for running your compiled code. -There are two basic examples of OpenCL code in the following -directory: +There are two basic examples of OpenCL code in the following directory: +```bash /apps/intel/opencl-examples/ +``` -First example "CapsBasic" detects OpenCL compatible hardware, here -CPU and MIC, and prints basic information about the capabilities of -it. 
+First example "CapsBasic" detects OpenCL compatible hardware, here CPU and MIC, and prints basic information about the capabilities of it. +```bash /apps/intel/opencl-examples/CapsBasic/capsbasic +``` -To compile and run the example copy it to your home directory, get -a PBS interactive session on of the nodes with MIC and run make for -compilation. Make files are very basic and shows how the OpenCL code can -be compiled on Anselm. +To compile and run the example copy it to your home directory, get a PBS interactive session on of the nodes with MIC and run make for compilation. Make files are very basic and shows how the OpenCL code can be compiled on Anselm. +```bash $ cp /apps/intel/opencl-examples/CapsBasic/* . $ qsub -I -q qmic -A NONE-0-0 $ make +``` -The compilation command for this example is: +The compilation command for this example is: +```bash $ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ +``` -After executing the complied binary file, following output should -be displayed. +After executing the complied binary file, following output should be displayed. +```bash ./capsbasic Number of available platforms: 1 @@ -510,27 +492,28 @@ be displayed.    CL_DEVICE_AVAILABLE: 1 ... +``` -More information about this example can be found on Intel website: -<http://software.intel.com/en-us/vcsource/samples/caps-basic/> +>More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> -The second example that can be found in -"/apps/intel/opencl-examples" >directory is General Matrix -Multiply. You can follow the the same procedure to download the example -to your directory and compile it. +The second example that can be found in "/apps/intel/opencl-examples" directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. +```bash $ cp -r /apps/intel/opencl-examples/* . $ qsub -I -q qmic -A NONE-0-0 - $ cd GEMM + $ cd GEMM $ make +``` -The compilation command for this example is: +The compilation command for this example is: +```bash $ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ +``` -To see the performance of Intel Xeon Phi performing the DGEMM run -the example as follows: +To see the performance of Intel Xeon Phi performing the DGEMM run the example as follows: +```bash ./gemm -d 1 Platforms (1): [0] Intel(R) OpenCL [Selected] @@ -548,40 +531,42 @@ the example as follows: Host time: 0.293334 sec. Host perf: 426.081 GFLOPS ... +``` -Please note: GNU compiler is used to compile the OpenCL codes for -Intel MIC. You do not need to load Intel compiler module. +>Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. -MPI +MPI ----------------- ### Environment setup and compilation -Again an MPI code for Intel Xeon Phi has to be compiled on a compute -node with accelerator and MPSS software stack installed. To get to a -compute node with accelerator use: +Again an MPI code for Intel Xeon Phi has to be compiled on a compute node with accelerator and MPSS software stack installed. To get to a compute node with accelerator use: +```bash $ qsub -I -q qmic -A NONE-0-0 +``` -The only supported implementation of MPI standard for Intel Xeon Phi is -Intel MPI. 
To setup a fully functional development environment a -combination of Intel compiler and Intel MPI has to be used. On a host -load following modules before compilation: +The only supported implementation of MPI standard for Intel Xeon Phi is Intel MPI. To setup a fully functional development environment a combination of Intel compiler and Intel MPI has to be used. On a host load following modules before compilation: - $ module load intel/13.5.192 impi/4.1.1.036 +```bash + $ module load intel/13.5.192 impi/4.1.1.036 +``` To compile an MPI code for host use: +```bash $ mpiicc -xhost -o mpi-test mpi-test.c +```bash To compile the same code for Intel Xeon Phi architecture use: +```bash $ mpiicc -mmic -o mpi-test-mic mpi-test.c +``` -An example of basic MPI version of "hello-world" example in C language, -that can be executed on both host and Xeon Phi is (can be directly copy -and pasted to a .c file) +An example of basic MPI version of "hello-world" example in C language, that can be executed on both host and Xeon Phi is (can be directly copy and pasted to a .c file) +```cpp #include <stdio.h> #include <mpi.h> @@ -602,55 +587,48 @@ and pasted to a .c file)  printf( "Hello world from process %d of %d on host %s n", rank, size, node );  MPI_Finalize(); -  return 0; +  return 0; } +``` ### MPI programming models -Intel MPI for the Xeon Phi coprocessors offers different MPI -programming models: +Intel MPI for the Xeon Phi coprocessors offers different MPI programming models: -Host-only model** - all MPI ranks reside on the host. The coprocessors -can be used by using offload pragmas. (Using MPI calls inside offloaded -code is not supported.)** +>**Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.) -Coprocessor-only model** - all MPI ranks reside only on the -coprocessors. +>**Coprocessor-only model** - all MPI ranks reside only on the coprocessors. -Symmetric model** - the MPI ranks reside on both the host and the -coprocessor. Most general MPI case. +>**Symmetric model** - the MPI ranks reside on both the host and the coprocessor. Most general MPI case. ###Host-only model -In this case all environment variables are set by modules, -so to execute the compiled MPI program on a single node, use: +In this case all environment variables are set by modules, so to execute the compiled MPI program on a single node, use: +```bash $ mpirun -np 4 ./mpi-test +``` The output should be similar to: +```bash Hello world from process 1 of 4 on host cn207 Hello world from process 3 of 4 on host cn207 Hello world from process 2 of 4 on host cn207 Hello world from process 0 of 4 on host cn207 +``` ### Coprocessor-only model -There are two ways how to execute an MPI code on a single -coprocessor: 1.) lunch the program using "**mpirun**" from the -coprocessor; or 2.) lunch the task using "**mpiexec.hydra**" from a -host. +There are two ways how to execute an MPI code on a single coprocessor: 1.) lunch the program using "**mpirun**" from the +coprocessor; or 2.) lunch the task using "**mpiexec.hydra**" from a host. -Execution on coprocessor** +**Execution on coprocessor** -Similarly to execution of OpenMP programs in native mode, since the -environmental module are not supported on MIC, user has to setup paths -to Intel MPI libraries and binaries manually. One time setup can be done -by creating a "**.profile**" file in user's home directory. 
This file -sets up the environment on the MIC automatically once user access to the -accelerator through the SSH. +Similarly to execution of OpenMP programs in native mode, since the environmental module are not supported on MIC, user has to setup paths to Intel MPI libraries and binaries manually. One time setup can be done by creating a "**.profile**" file in user's home directory. This file sets up the environment on the MIC automatically once user access to the accelerator through the SSH. - $ vim ~/.profile +```bash + $ vim ~/.profile PS1='[u@h W]$ ' export PATH=/usr/bin:/usr/sbin:/bin:/sbin @@ -658,145 +636,144 @@ accelerator through the SSH. #OpenMP export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH - #Intel MPI + #Intel MPI export LD_LIBRARY_PATH=/apps/intel/impi/4.1.1.036/mic/lib/:$LD_LIBRARY_PATH export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH +``` -Please note: - - this file sets up both environmental variable for both MPI and OpenMP -libraries. - - this file sets up the paths to a particular version of Intel MPI -library and particular version of an Intel compiler. These versions have -to match with loaded modules. +>Please note: + - this file sets up both environmental variable for both MPI and OpenMP libraries. + - this file sets up the paths to a particular version of Intel MPI library and particular version of an Intel compiler. These versions have to match with loaded modules. -To access a MIC accelerator located on a node that user is currently -connected to, use: +To access a MIC accelerator located on a node that user is currently connected to, use: +```bash $ ssh mic0 +``` or in case you need specify a MIC accelerator on a particular node, use: +```bash $ ssh cn207-mic0 +``` -To run the MPI code in parallel on multiple core of the accelerator, -use: +To run the MPI code in parallel on multiple core of the accelerator, use: +```bash $ mpirun -np 4 ./mpi-test-mic +``` The output should be similar to: +```bash Hello world from process 1 of 4 on host cn207-mic0 Hello world from process 2 of 4 on host cn207-mic0 Hello world from process 3 of 4 on host cn207-mic0 Hello world from process 0 of 4 on host cn207-mic0 +``` **Execution on host** -If the MPI program is launched from host instead of the coprocessor, the -environmental variables are not set using the ".profile" file. Therefore -user has to specify library paths from the command line when calling -"mpiexec". +If the MPI program is launched from host instead of the coprocessor, the environmental variables are not set using the ".profile" file. Therefore user has to specify library paths from the command line when calling "mpiexec". 
-First step is to tell mpiexec that the MPI should be executed on a local -accelerator by setting up the environmental variable "I_MPI_MIC" +First step is to tell mpiexec that the MPI should be executed on a local accelerator by setting up the environmental variable "I_MPI_MIC" +```bash $ export I_MPI_MIC=1 +``` Now the MPI program can be executed as: +```bash $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic +``` or using mpirun +```bash $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic +``` -Please note: - - the full path to the binary has to specified (here: -"**>~/mpi-test-mic**") - - the LD_LIBRARY_PATH has to match with Intel MPI module used to -compile the MPI code +>Please note: + - the full path to the binary has to specified (here: "**>~/mpi-test-mic**") + - the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code The output should be again similar to: +```bash Hello world from process 1 of 4 on host cn207-mic0 Hello world from process 2 of 4 on host cn207-mic0 Hello world from process 3 of 4 on host cn207-mic0 Hello world from process 0 of 4 on host cn207-mic0 +``` -Please note that the "mpiexec.hydra" requires a file -"**>pmi_proxy**" from Intel MPI library to be copied to the -MIC filesystem. If the file is missing please contact the system -administrators. A simple test to see if the file is present is to -execute: +>>Please note that the **"mpiexec.hydra"** requires a file the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: +```bash   $ ssh mic0 ls /bin/pmi_proxy  /bin/pmi_proxy +``` -**Execution on host - MPI processes distributed over multiple -accelerators on multiple nodes** +**Execution on host - MPI processes distributed over multiple accelerators on multiple nodes** -To get access to multiple nodes with MIC accelerator, user has to -use PBS to allocate the resources. To start interactive session, that -allocates 2 compute nodes = 2 MIC accelerators run qsub command with -following parameters: +To get access to multiple nodes with MIC accelerator, user has to use PBS to allocate the resources. To start interactive session, that allocates 2 compute nodes = 2 MIC accelerators run qsub command with following parameters: +```bash $ qsub -I -q qmic -A NONE-0-0 -l select=2:ncpus=16 $ module load intel/13.5.192 impi/4.1.1.036 +``` -This command connects user through ssh to one of the nodes -immediately. To see the other nodes that have been allocated use: +This command connects user through ssh to one of the nodes immediately. To see the other nodes that have been allocated use: +```bash $ cat $PBS_NODEFILE +``` -For example: +For example: +```bash cn204.bullx cn205.bullx +``` -This output means that the PBS allocated nodes cn204 and cn205, -which means that user has direct access to "**cn204-mic0**" and -"**cn-205-mic0**" accelerators. - -Please note: At this point user can connect to any of the -allocated nodes or any of the allocated MIC accelerators using ssh: -- to connect to the second node : ** $ ssh -cn205** -- to connect to the accelerator on the first node from the first -node: **$ ssh cn204-mic0** or - $ ssh mic0** --** to connect to the accelerator on the second node from the first -node: **$ ssh cn205-mic0** - -At this point we expect that correct modules are loaded and binary -is compiled. For parallel execution the mpiexec.hydra is used. 
-Again the first step is to tell mpiexec that the MPI can be executed on -MIC accelerators by setting up the environmental variable "I_MPI_MIC" +This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and "**cn-205-mic0**" accelerators. +>Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: +- to connect to the second node : ** $ ssh cn205** +- to connect to the accelerator on the first node from the first node: **$ ssh cn204-mic0** or **$ ssh mic0** +- to connect to the accelerator on the second node from the first node: **$ ssh cn205-mic0** + +At this point we expect that correct modules are loaded and binary is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC" + +```bash $ export I_MPI_MIC=1 +``` The launch the MPI program use: - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ - -genv I_MPI_FABRICS_LIST tcp -  -genv I_MPI_FABRICS shm:tcp -  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 - -host cn204-mic0 -n 4 ~/mpi-test-mic +```bash + $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ + -genv I_MPI_FABRICS_LIST tcp +  -genv I_MPI_FABRICS shm:tcp +  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 + -host cn204-mic0 -n 4 ~/mpi-test-mic : -host cn205-mic0 -n 6 ~/mpi-test-mic - +``` or using mpirun: - $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ - -genv I_MPI_FABRICS_LIST tcp -  -genv I_MPI_FABRICS shm:tcp -  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 - -host cn204-mic0 -n 4 ~/mpi-test-mic +```bash + $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ + -genv I_MPI_FABRICS_LIST tcp +  -genv I_MPI_FABRICS shm:tcp +  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 + -host cn204-mic0 -n 4 ~/mpi-test-mic : -host cn205-mic0 -n 6 ~/mpi-test-mic +``` -In this case four MPI processes are executed on accelerator cn204-mic -and six processes are executed on accelerator cn205-mic0. The sample -output (sorted after execution) is: +In this case four MPI processes are executed on accelerator cn204-mic and six processes are executed on accelerator cn205-mic0. The sample output (sorted after execution) is: +```bash Hello world from process 0 of 10 on host cn204-mic0 Hello world from process 1 of 10 on host cn204-mic0 Hello world from process 2 of 10 on host cn204-mic0 @@ -807,89 +784,82 @@ output (sorted after execution) is: Hello world from process 7 of 10 on host cn205-mic0 Hello world from process 8 of 10 on host cn205-mic0 Hello world from process 9 of 10 on host cn205-mic0 +``` -The same way MPI program can be executed on multiple hosts: +The same way MPI program can be executed on multiple hosts: - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ - -genv I_MPI_FABRICS_LIST tcp -  -genv I_MPI_FABRICS shm:tcp +```bash + $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ + -genv I_MPI_FABRICS_LIST tcp +  -genv I_MPI_FABRICS shm:tcp  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 - -host cn204 -n 4 ~/mpi-test + -host cn204 -n 4 ~/mpi-test : -host cn205 -n 6 ~/mpi-test +``` -###Symmetric model +###Symmetric model -In a symmetric mode MPI programs are executed on both host -computer(s) and MIC accelerator(s). 
Since MIC has a different -architecture and requires different binary file produced by the Intel -compiler two different files has to be compiled before MPI program is -executed. +In a symmetric mode MPI programs are executed on both host computer(s) and MIC accelerator(s). Since MIC has a different +architecture and requires different binary file produced by the Intel compiler two different files has to be compiled before MPI program is executed. -In the previous section we have compiled two binary files, one for -hosts "**mpi-test**" and one for MIC accelerators "**mpi-test-mic**". -These two binaries can be executed at once using mpiexec.hydra: +In the previous section we have compiled two binary files, one for hosts "**mpi-test**" and one for MIC accelerators "**mpi-test-mic**". These two binaries can be executed at once using mpiexec.hydra: - $ mpiexec.hydra - -genv I_MPI_FABRICS_LIST tcp - -genv I_MPI_FABRICS shm:tcp -  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 - -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ - -host cn205 -n 2 ~/mpi-test +```bash + $ mpiexec.hydra + -genv I_MPI_FABRICS_LIST tcp + -genv I_MPI_FABRICS shm:tcp +  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 + -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ + -host cn205 -n 2 ~/mpi-test : -host cn205-mic0 -n 2 ~/mpi-test-mic +``` -In this example the first two parameters (line 2 and 3) sets up required -environment variables for execution. The third line specifies binary -that is executed on host (here cn205) and the last line specifies the -binary that is execute on the accelerator (here cn205-mic0). +In this example the first two parameters (line 2 and 3) sets up required environment variables for execution. The third line specifies binary that is executed on host (here cn205) and the last line specifies the binary that is execute on the accelerator (here cn205-mic0). -The output of the program is: +The output of the program is: +```bash Hello world from process 0 of 4 on host cn205 Hello world from process 1 of 4 on host cn205 Hello world from process 2 of 4 on host cn205-mic0 Hello world from process 3 of 4 on host cn205-mic0 +``` -The execution procedure can be simplified by using the mpirun -command with the machine file a a parameter. Machine file contains list -of all nodes and accelerators that should used to execute MPI processes. +The execution procedure can be simplified by using the mpirun command with the machine file a a parameter. Machine file contains list of all nodes and accelerators that should used to execute MPI processes. -An example of a machine file that uses 2 >hosts (**cn205** -and **cn206**) and 2 accelerators **(cn205-mic0** and **cn206-mic0**) to -run 2 MPI processes on each of them: +An example of a machine file that uses 2 >hosts (**cn205** and **cn206**) and 2 accelerators **(cn205-mic0** and **cn206-mic0**) to run 2 MPI processes on each of them: +```bash $ cat hosts_file_mix cn205:2 cn205-mic0:2 cn206:2 cn206-mic0:2 +``` -In addition if a naming convention is set in a way that the name -of the binary for host is **"bin_name"** and the name of the binary -for the accelerator is **"bin_name-mic"** then by setting up the -environment variable **I_MPI_MIC_POSTFIX** to **"-mic"** user do not -have to specify the names of booth binaries. In this case mpirun needs -just the name of the host binary file (i.e. "mpi-test") and uses the -suffix to get a name of the binary for accelerator (i..e. -"mpi-test-mic"). 
+In addition if a naming convention is set in a way that the name of the binary for host is **"bin_name"** and the name of the binary for the accelerator is **"bin_name-mic"** then by setting up the environment variable **I_MPI_MIC_POSTFIX** to **"-mic"** user do not have to specify the names of booth binaries. In this case mpirun needs just the name of the host binary file (i.e. "mpi-test") and uses the suffix to get a name of the binary for accelerator (i..e. "mpi-test-mic"). +```bash $ export I_MPI_MIC_POSTFIX=-mic +``` - >To run the MPI code using mpirun and the machine file -"hosts_file_mix" use: +To run the MPI code using mpirun and the machine file "hosts_file_mix" use: - $ mpirun - -genv I_MPI_FABRICS shm:tcp - -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ - -genv I_MPI_FABRICS_LIST tcp -  -genv I_MPI_FABRICS shm:tcp -  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 - -machinefile hosts_file_mix +```bash + $ mpirun + -genv I_MPI_FABRICS shm:tcp + -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ + -genv I_MPI_FABRICS_LIST tcp +  -genv I_MPI_FABRICS shm:tcp +  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 + -machinefile hosts_file_mix ~/mpi-test +``` -A possible output of the MPI "hello-world" example executed on two -hosts and two accelerators is: +A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is: - Hello world from process 0 of 8 on host cn204 +```bash + Hello world from process 0 of 8 on host cn204 Hello world from process 1 of 8 on host cn204 Hello world from process 2 of 8 on host cn204-mic0 Hello world from process 3 of 8 on host cn204-mic0 @@ -897,32 +867,21 @@ hosts and two accelerators is: Hello world from process 5 of 8 on host cn205 Hello world from process 6 of 8 on host cn205-mic0 Hello world from process 7 of 8 on host cn205-mic0 +``` -Please note: At this point the MPI communication between MIC -accelerators on different nodes uses 1Gb Ethernet only. +>Please note: At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only. -Using the PBS automatically generated node-files +**Using the PBS automatically generated node-files** -PBS also generates a set of node-files that can be used instead of -manually creating a new one every time. Three node-files are genereated: +PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated: -**Host only node-file:** - - /lscratch/${PBS_JOBID}/nodefile-cn -MIC only node-file: - - /lscratch/${PBS_JOBID}/nodefile-mic -Host and MIC node-file: +>**Host only node-file:** + - /lscratch/${PBS_JOBID}/nodefile-cn MIC only node-file: + - /lscratch/${PBS_JOBID}/nodefile-mic Host and MIC node-file:  - /lscratch/${PBS_JOBID}/nodefile-mix -Please note each host or accelerator is listed only per files. User has -to specify how many jobs should be executed per node using "-n" -parameter of the mpirun command. +Please note each host or accelerator is listed only per files. User has to specify how many jobs should be executed per node using "-n" parameter of the mpirun command. 
Optimization ------------ - -For more details about optimization techniques please read Intel -document [Optimization and Performance Tuning for Intel® Xeon Phi™ -Coprocessors](http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization "http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization") - - - +For more details about optimization techniques please read Intel document [Optimization and Performance Tuning for Intel® Xeon Phi™ Coprocessors](http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization "http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization") \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md index 719fa3fd918379f0dd6564387b74270fdc5be2bd..9cce45ebbd2127f8cea56ae89865317a1b18edb7 100644 --- a/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md +++ b/docs.it4i/anselm-cluster-documentation/software/isv_licenses.md @@ -1,60 +1,40 @@ -ISV Licenses +ISV Licenses ============ -A guide to managing Independent Software Vendor licences +##A guide to managing Independent Software Vendor licences - +On Anselm cluster there are also installed commercial software applications, also known as ISV (Independent Software Vendor), which are subjects to licensing. The licenses are limited and their usage may be restricted only to some users or user groups. -On Anselm cluster there are also installed commercial software -applications, also known as ISV (Independent Software Vendor), which are -subjects to licensing. The licenses are limited and their usage may be -restricted only to some users or user groups. +Currently Flex License Manager based licensing is supported on the cluster for products Ansys, Comsol and Matlab. More information about the applications can be found in the general [Software](../software.1.html) section. -Currently Flex License Manager based licensing is supported on the -cluster for products Ansys, Comsol and Matlab. More information about -the applications can be found in the general -[Software](../software.1.html) section. - -If an ISV application was purchased for educational (research) purposes -and also for commercial purposes, then there are always two separate -versions maintained and suffix "edu" is used in the name of the -non-commercial version. +If an ISV application was purchased for educational (research) purposes and also for commercial purposes, then there are always two separate versions maintained and suffix "edu" is used in the name of the non-commercial version. Overview of the licenses usage ------------------------------ - -The overview is generated every minute and is accessible from web or -command line interface. +>The overview is generated every minute and is accessible from web or command line interface. 
### Web interface -For each license there is a table, which provides the information about -the name, number of available (purchased/licensed), number of used and -number of free license features - +For each license there is a table, which provides the information about the name, number of available (purchased/licensed), number of used and number of free license features <https://extranet.it4i.cz/anselm/licenses> ### Text interface -For each license there is a unique text file, which provides the -information about the name, number of available (purchased/licensed), -number of used and number of free license features. The text files are -accessible from the Anselm command prompt. +For each license there is a unique text file, which provides the information about the name, number of available (purchased/licensed), number of used and number of free license features. The text files are accessible from the Anselm command prompt. - Product File with license state Note - ------ |---|---|------------------------------------------- --------------------- - ansys /apps/user/licenses/ansys_features_state.txt Commercial - comsol /apps/user/licenses/comsol_features_state.txt Commercial - comsol-edu /apps/user/licenses/comsol-edu_features_state.txt Non-commercial only - matlab /apps/user/licenses/matlab_features_state.txt Commercial - matlab-edu /apps/user/licenses/matlab-edu_features_state.txt Non-commercial only +|Product|File with license state|Note| +|---|---| +|ansys|/apps/user/licenses/ansys_features_state.txt|Commercial| +|comsol|/apps/user/licenses/comsol_features_state.txt|Commercial| +|comsol-edu|/apps/user/licenses/comsol-edu_features_state.txt|Non-commercial only| +|matlab|/apps/user/licenses/matlab_features_state.txt|Commercial| +|matlab-edu|/apps/user/licenses/matlab-edu_features_state.txt|Non-commercial only| -The file has a header which serves as a legend. All the info in the -legend starts with a hash (#) so it can be easily filtered when parsing -the file via a script. +The file has a header which serves as a legend. All the info in the legend starts with a hash (#) so it can be easily filtered when parsing the file via a script. Example of the Commercial Matlab license state: +```bash $ cat /apps/user/licenses/matlab_features_state.txt # matlab # ------------------------------------------------- @@ -71,20 +51,15 @@ Example of the Commercial Matlab license state: Optimization_Toolbox 1 0 1 Signal_Toolbox 1 0 1 Statistics_Toolbox 1 0 1 +``` License tracking in PBS Pro scheduler and users usage ----------------------------------------------------- - -Each feature of each license is accounted and checked by the scheduler -of PBS Pro. If you ask for certain licences, the scheduler won't start -the job until the asked licenses are free (available). This prevents to -crash batch jobs, just because of - unavailability of the -needed licenses. +Each feature of each license is accounted and checked by the scheduler of PBS Pro. If you ask for certain licences, the scheduler won't start the job until the asked licenses are free (available). This prevents to crash batch jobs, just because of unavailability of the needed licenses. 
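Before relying on the scheduler, current availability can also be checked by hand from the state files described above. The following is a small sketch; the exact column layout is assumed from the Matlab example above, where the last column holds the number of free licenses.

```bash
 # Print the free-license count for a single feature, here the Statistics_Toolbox
 # of the commercial Matlab license; the column positions (feature name first,
 # free count last) are assumed from the example state file shown above.
 $ grep -v "#" /apps/user/licenses/matlab_features_state.txt | awk '$1 == "Statistics_Toolbox" {print $NF}'
```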
The general format of the name is: -feature__APP__FEATURE** +**feature__APP__FEATURE** Names of applications (APP): @@ -94,56 +69,38 @@ Names of applications (APP): - matlab - matlab-edu - +To get the FEATUREs of a license take a look into the corresponding state file ([see above](isv_licenses.html#Licence)), or use: -To get the FEATUREs of a license take a look into the corresponding -state file ([see above](isv_licenses.html#Licence)), or -use: - - |Application |List of provided features | - | --- | --- | - |ansys |<pre><code>$ grep -v "#" /apps/user/licenses/ansys_features_state.txt | cut -f1 -d' '</code></pre> | - |comsol |<pre><code>$ grep -v "#" /apps/user/licenses/comsol_features_state.txt | cut -f1 -d' '</code></pre> | - |comsol-edu |<pre><code>$ grep -v "#" /apps/user/licenses/comsol-edu_features_state.txt | cut -f1 -d' '</code></pre> | - |matlab |<pre><code>$ grep -v "#" /apps/user/licenses/matlab_features_state.txt | cut -f1 -d' '</code></pre> | - |matlab-edu |<pre><code>$ grep -v "#" /apps/user/licenses/matlab-edu_features_state.txt | cut -f1 -d' '</code></pre> | - - +**Application and List of provided features** +- **ansys** $ grep -v "#" /apps/user/licenses/ansys_features_state.txt | cut -f1 -d' ' +- **comsol** $ grep -v "#" /apps/user/licenses/comsol_features_state.txt | cut -f1 -d' ' +- **comsol-ed** $ grep -v "#" /apps/user/licenses/comsol-edu_features_state.txt | cut -f1 -d' ' +- **matlab** $ grep -v "#" /apps/user/licenses/matlab_features_state.txt | cut -f1 -d' ' +- **matlab-edu** $ grep -v "#" /apps/user/licenses/matlab-edu_features_state.txt | cut -f1 -d' ' Example of PBS Pro resource name, based on APP and FEATURE name: -<col width="33%" /> -<col width="33%" /> -<col width="33%" /> |Application |Feature |PBS Pro resource name | | --- | --- | - |ansys |acfd |feature__ansys__acfd | - |ansys |aa_r |feature__ansys__aa_r | - |comsol |COMSOL |feature__comsol__COMSOL | - |comsol |HEATTRANSFER |feature__comsol__HEATTRANSFER | - |comsol-edu |COMSOLBATCH |feature__comsol-edu__COMSOLBATCH | - |comsol-edu |STRUCTURALMECHANICS |feature__comsol-edu__STRUCTURALMECHANICS | - |matlab |MATLAB |feature__matlab__MATLAB | - |matlab |Image_Toolbox |feature__matlab__Image_Toolbox | - |matlab-edu |MATLAB_Distrib_Comp_Engine |feature__matlab-edu__MATLAB_Distrib_Comp_Engine | - |matlab-edu |Image_Acquisition_Toolbox |feature__matlab-edu__Image_Acquisition_Toolbox\ | - -Be aware, that the resource names in PBS Pro are CASE SENSITIVE!** + |ansys |acfd |feature_ansys_acfd | + |ansys |aa_r |feature_ansys_aa_r | + |comsol |COMSOL |feature_comsol_COMSOL | + |comsol |HEATTRANSFER |feature_comsol_HEATTRANSFER | + |comsol-edu |COMSOLBATCH |feature_comsol-edu_COMSOLBATCH | + |comsol-edu |STRUCTURALMECHANICS |feature_comsol-edu_STRUCTURALMECHANICS | + |matlab |MATLAB |feature_matlab_MATLAB | + |matlab |Image_Toolbox |feature_matlab_Image_Toolbox | + |matlab-edu |MATLAB_Distrib_Comp_Engine |feature_matlab-edu_MATLAB_Distrib_Comp_Engine | + |matlab-edu |Image_Acquisition_Toolbox |feature_matlab-edu_Image_Acquisition_Toolbox\ | + +**Be aware, that the resource names in PBS Pro are CASE SENSITIVE!** ### Example of qsub statement -Run an interactive PBS job with 1 Matlab EDU license, 1 Distributed -Computing Toolbox and 32 Distributed Computing Engines (running on 32 -cores): +Run an interactive PBS job with 1 Matlab EDU license, 1 Distributed Computing Toolbox and 32 Distributed Computing Engines (running on 32 cores): +```bash $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 -l 
feature__matlab-edu__MATLAB=1 -l feature__matlab-edu__Distrib_Computing_Toolbox=1 -l feature__matlab-edu__MATLAB_Distrib_Comp_Engine=32 +``` -The license is used and accounted only with the real usage of the -product. So in this example, the general Matlab is used after Matlab is -run vy the user and not at the time, when the shell of the interactive -job is started. Also the Distributed Computing licenses are used at the -time, when the user uses the distributed parallel computation in Matlab -(e. g. issues pmode start, matlabpool, etc.). - - - +The license is used and accounted only with the real usage of the product. So in this example, the general Matlab is used after Matlab is run vy the user and not at the time, when the shell of the interactive job is started. Also the Distributed Computing licenses are used at the time, when the user uses the distributed parallel computation in Matlab (e. g. issues pmode start, matlabpool, etc.). \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/java.md b/docs.it4i/anselm-cluster-documentation/software/java.md index 9094578fb82ad669d7ec1cd25caaf132bc73fc22..4755ee2ba4eebaf3919be8fb93209ef16cd98831 100644 --- a/docs.it4i/anselm-cluster-documentation/software/java.md +++ b/docs.it4i/anselm-cluster-documentation/software/java.md @@ -1,33 +1,29 @@ -Java +Java ==== -Java on ANSELM +##Java on ANSELM - - -Java is available on Anselm cluster. Activate java by loading the java -module +Java is available on Anselm cluster. Activate java by loading the java module +```bash $ module load java +``` -Note that the java module must be loaded on the compute nodes as well, -in order to run java on compute nodes. +Note that the java module must be loaded on the compute nodes as well, in order to run java on compute nodes. Check for java version and path +```bash $ java -version $ which java +``` -With the module loaded, not only the runtime environment (JRE), but also -the development environment (JDK) with the compiler is available. +With the module loaded, not only the runtime environment (JRE), but also the development environment (JDK) with the compiler is available. +```bash $ javac -version $ which javac +``` -Java applications may use MPI for interprocess communication, in -conjunction with OpenMPI. Read more -on <http://www.open-mpi.org/faq/?category=java>. -This functionality is currently not supported on Anselm cluster. In case -you require the java interface to MPI, please contact [Anselm -support](https://support.it4i.cz/rt/). +Java applications may use MPI for interprocess communication, in conjunction with OpenMPI. Read more on <http://www.open-mpi.org/faq/?category=java>. This functionality is currently not supported on Anselm cluster. In case you require the java interface to MPI, please contact [Anselm support](https://support.it4i.cz/rt/). diff --git a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md index 803b896d1f2dc44638431b0e9c4d24efd4699d7e..1f1043692226c6404be06d049fce26aab7e86a99 100644 --- a/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md +++ b/docs.it4i/anselm-cluster-documentation/software/kvirtualization.md @@ -1,80 +1,45 @@ -Virtualization +Virtualization ============== -Running virtual machines on compute nodes - - +##Running virtual machines on compute nodes Introduction ------------ +There are situations when Anselm's environment is not suitable for user needs. 
-There are situations when Anselm's environment is not suitable for user -needs. - -- Application requires different operating system (e.g Windows), - application is not available for Linux -- Application requires different versions of base system libraries and - tools -- Application requires specific setup (installation, configuration) of - complex software stack +- Application requires different operating system (e.g Windows), application is not available for Linux +- Application requires different versions of base system libraries and tools +- Application requires specific setup (installation, configuration) of complex software stack - Application requires privileged access to operating system - ... and combinations of above cases - We offer solution for these cases - **virtualization**. Anselm's -environment gives the possibility to run virtual machines on compute -nodes. Users can create their own images of operating system with -specific software stack and run instances of these images as virtual -machines on compute nodes. Run of virtual machines is provided by -standard mechanism of [Resource Allocation and Job -Execution](../../resource-allocation-and-job-execution/introduction.html). +We offer solution for these cases - **virtualization**. Anselm's environment gives the possibility to run virtual machines on compute nodes. Users can create their own images of operating system with specific software stack and run instances of these images as virtual machines on compute nodes. Run of virtual machines is provided by standard mechanism of [Resource Allocation and Job Execution](../../resource-allocation-and-job-execution/introduction.html). -Solution is based on QEMU-KVM software stack and provides -hardware-assisted x86 virtualization. +Solution is based on QEMU-KVM software stack and provides hardware-assisted x86 virtualization. Limitations ----------- -Anselm's infrastructure was not designed for virtualization. Anselm's -environment is not intended primary for virtualization, compute nodes, -storages and all infrastructure of Anselm is intended and optimized for -running HPC jobs, this implies suboptimal configuration of -virtualization and limitations. +Anselm's infrastructure was not designed for virtualization. Anselm's environment is not intended primary for virtualization, compute nodes, storages and all infrastructure of Anselm is intended and optimized for running HPC jobs, this implies suboptimal configuration of virtualization and limitations. -Anselm's virtualization does not provide performance and all features of -native environment. There is significant performance hit (degradation) -in I/O performance (storage, network). Anselm's virtualization is not -suitable for I/O (disk, network) intensive workloads. +Anselm's virtualization does not provide performance and all features of native environment. There is significant performance hit (degradation) in I/O performance (storage, network). Anselm's virtualization is not suitable for I/O (disk, network) intensive workloads. -Virtualization has also some drawbacks, it is not so easy to setup -efficient solution. +Virtualization has also some drawbacks, it is not so easy to setup efficient solution. -Solution described in chapter -[HOWTO](virtualization.html#howto) - is suitable for single node tasks, does not -introduce virtual machine clustering. +Solution described in chapter [HOWTO](virtualization.html#howto) is suitable for single node tasks, does not introduce virtual machine clustering. 
-Please consider virtualization as last resort solution for your needs. +>Please consider virtualization as last resort solution for your needs. -Please consult use of virtualization with IT4Innovation's support. +>Please consult use of virtualization with IT4Innovation's support. -For running Windows application (when source code and Linux native -application are not available) consider use of Wine, Windows -compatibility layer. Many Windows applications can be run using Wine -with less effort and better performance than when using virtualization. +>For running Windows application (when source code and Linux native application are not available) consider use of Wine, Windows compatibility layer. Many Windows applications can be run using Wine with less effort and better performance than when using virtualization. Licensing --------- -IT4Innovations does not provide any licenses for operating systems and -software of virtual machines. Users are ( > -in accordance with [Acceptable use policy -document](http://www.it4i.cz/acceptable-use-policy.pdf)) -fully responsible for licensing all software running in virtual machines -on Anselm. Be aware of complex conditions of licensing software in -virtual environments. +IT4Innovations does not provide any licenses for operating systems and software of virtual machines. Users are ( in accordance with [Acceptable use policy document](http://www.it4i.cz/acceptable-use-policy.pdf)) fully responsible for licensing all software running in virtual machines on Anselm. Be aware of complex conditions of licensing software in virtual environments. -Users are responsible for licensing OS e.g. MS Windows and all software -running in their virtual machines. +>Users are responsible for licensing OS e.g. MS Windows and all software running in their virtual machines.  HOWTO ---------- @@ -83,20 +48,9 @@ running in their virtual machines. We propose this job workflow: -Workflow](virtualization-job-workflow "Virtualization Job Workflow") - -Our recommended solution is that job script creates distinct shared job -directory, which makes a central point for data exchange between -Anselm's environment, compute node (host) (e.g HOME, SCRATCH, local -scratch and other local or cluster filesystems) and virtual machine -(guest). Job script links or copies input data and instructions what to -do (run script) for virtual machine to job directory and virtual machine -process input data according instructions in job directory and store -output back to job directory. We recommend, that virtual machine is -running in so called [snapshot -mode](virtualization.html#snapshot-mode), image is -immutable - image does not change, so one image can be used for many -concurrent jobs. + + +Our recommended solution is that job script creates distinct shared job directory, which makes a central point for data exchange between Anselm's environment, compute node (host) (e.g HOME, SCRATCH, local scratch and other local or cluster filesystems) and virtual machine (guest). Job script links or copies input data and instructions what to do (run script) for virtual machine to job directory and virtual machine process input data according instructions in job directory and store output back to job directory. We recommend, that virtual machine is running in so called [snapshot mode](virtualization.html#snapshot-mode), image is immutable - image does not change, so one image can be used for many concurrent jobs. 
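The fragment below sketches only the data-exchange part of this workflow, before the full procedure is described; the directory layout and source paths are assumptions modelled on the complete example job script later in this chapter.

```bash
 #!/bin/bash
 # Sketch of the data-exchange setup only - see the complete example job script below.
 # Create a distinct shared job directory on scratch as the central exchange point.
 JOB_DIR=/scratch/$USER/win/${PBS_JOBID}
 mkdir -p ${JOB_DIR} && cd ${JOB_DIR} || exit 1

 # Link or copy input data and the run script (instructions for the guest)
 # into the job directory; the source paths here are illustrative assumptions.
 ln -s ~/work/data data
 cp ~/work/run.bat run.bat

 # The virtual machine, started later in snapshot mode, reads its input from this
 # directory and writes the results back here (e.g. into ${JOB_DIR}/output).
```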
### Procedure @@ -112,71 +66,48 @@ You can either use your existing image or create new image from scratch. QEMU currently supports these image types or formats: -- raw -- cloop -- cow -- qcow -- qcow2 -- vmdk - VMware 3 & 4, or 6 image format, for exchanging images with - that product -- vdi - VirtualBox 1.1 compatible image format, for exchanging images - with VirtualBox. - -You can convert your existing image using qemu-img convert command. -Supported formats of this command are: blkdebug blkverify bochs cloop -cow dmg file ftp ftps host_cdrom host_device host_floppy http https -nbd parallels qcow qcow2 qed raw sheepdog tftp vdi vhdx vmdk vpc vvfat. +- raw +- cloop +- cow +- qcow +- qcow2 +- vmdk - VMware 3 & 4, or 6 image format, for exchanging images with that product +- vdi - VirtualBox 1.1 compatible image format, for exchanging images with VirtualBox. + +You can convert your existing image using qemu-img convert command. Supported formats of this command are: blkdebug blkverify bochs cloop cow dmg file ftp ftps host_cdrom host_device host_floppy http https nbd parallels qcow qcow2 qed raw sheepdog tftp vdi vhdx vmdk vpc vvfat. We recommend using advanced QEMU native image format qcow2. -[More about QEMU -Images](http://en.wikibooks.org/wiki/QEMU/Images) +[More about QEMU Images](http://en.wikibooks.org/wiki/QEMU/Images) ### Optimize image of your virtual machine -Use virtio devices (for disk/drive and network adapter) and install -virtio drivers (paravirtualized drivers) into virtual machine. There is -significant performance gain when using virtio drivers. For more -information see [Virtio -Linux](http://www.linux-kvm.org/page/Virtio) and [Virtio -Windows](http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers). +Use virtio devices (for disk/drive and network adapter) and install virtio drivers (paravirtualized drivers) into virtual machine. There is significant performance gain when using virtio drivers. For more information see [Virtio Linux](http://www.linux-kvm.org/page/Virtio) and [Virtio Windows](http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers). -Disable all -unnecessary services -and tasks. Restrict all unnecessary operating system operations. +Disable all unnecessary services and tasks. Restrict all unnecessary operating system operations. -Remove all -unnecessary software and -files. +Remove all unnecessary software and files. - -Remove all paging -space, swap files, partitions, etc. +Remove all paging space, swap files, partitions, etc. -Shrink your image. (It is recommended to zero all free space and -reconvert image using qemu-img.) +Shrink your image. (It is recommended to zero all free space and reconvert image using qemu-img.) ### Modify your image for running jobs -Your image should run some kind of operating system startup script. -Startup script should run application and when application exits run -shutdown or quit virtual machine. +Your image should run some kind of operating system startup script. Startup script should run application and when application exits run shutdown or quit virtual machine. 
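The recommended behaviour and a Windows example are given below; for a Linux guest, an rc.local-style script along the following lines may serve the same purpose. This is only a sketch: the SMB share //10.0.2.4/qemu and the mount options are assumptions based on the user network backend described later in this chapter, and run.sh is a hypothetical name for the run script.

```bash
 #!/bin/bash
 # Sketch of a Linux guest startup script (rc.local or an init/systemd unit).
 LOG=/var/log/startup.log
 JOBDIR=/mnt/job

 # Map the job directory shared by the host; 10.0.2.4 is the SMB server address
 # of the QEMU user network backend (the share name is an assumption).
 mkdir -p ${JOBDIR}
 mount -t cifs //10.0.2.4/qemu ${JOBDIR} -o guest >>${LOG} 2>&1

 # Run the run script from the job directory and wait for the application to exit;
 # if no run script is present, wait a few minutes (management window), then quit.
 if [ -x ${JOBDIR}/run.sh ]; then
     ${JOBDIR}/run.sh >>${LOG} 2>&1
 else
     sleep 300
 fi

 # Shut down the virtual machine once the application has finished.
 poweroff
```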
We recommend, that startup script -maps Job Directory from host (from compute node) -runs script (we call it "run script") from Job Directory and waits for -application's exit -- for management purposes if run script does not exist wait for some - time period (few minutes) +- maps Job Directory from host (from compute node) +- runs script (we call it "run script") from Job Directory and waits for application's exit + - for management purposes if run script does not exist wait for some time period (few minutes) +- shutdowns/quits OS -shutdowns/quits OS -For Windows operating systems we suggest using Local Group Policy -Startup script, for Linux operating systems rc.local, runlevel init -script or similar service. +For Windows operating systems we suggest using Local Group Policy Startup script, for Linux operating systems rc.local, runlevel init script or similar service. Example startup script for Windows virtual machine: +```bash @echo off set LOG=c:startup.log set MAPDRIVE=z: @@ -212,20 +143,19 @@ Example startup script for Windows virtual machine: echo %DATE% %TIME% Shut down>%LOG% shutdown /s /t 0 +``` -Example startup script maps shared job script as drive z: and looks for -run script called run.bat. If run script is found it is run else wait -for 5 minutes, then shutdown virtual machine. +Example startup script maps shared job script as drive z: and looks for run script called run.bat. If run script is found it is run else wait for 5 minutes, then shutdown virtual machine. ### Create job script for executing virtual machine -Create job script according recommended +Create job script according recommended -[Virtual Machine Job -Workflow](virtualization.html#virtual-machine-job-workflow). +[Virtual Machine Job Workflow](virtualization.html#virtual-machine-job-workflow). Example job for Windows virtual machine: +```bash #/bin/sh JOB_DIR=/scratch/$USER/win/${PBS_JOBID} @@ -244,168 +174,166 @@ Example job for Windows virtual machine: # Run virtual machine export TMPDIR=/lscratch/${PBS_JOBID} module add qemu - qemu-system-x86_64 -  -enable-kvm -  -cpu host -  -smp ${VM_SMP} -  -m ${VM_MEMORY} -  -vga std -  -localtime -  -usb -usbdevice tablet -  -device virtio-net-pci,netdev=net0 -  -netdev user,id=net0,smb=${JOB_DIR},hostfwd=tcp::3389-:3389 -  -drive file=${VM_IMAGE},media=disk,if=virtio -  -snapshot + qemu-system-x86_64 +  -enable-kvm +  -cpu host +  -smp ${VM_SMP} +  -m ${VM_MEMORY} +  -vga std +  -localtime +  -usb -usbdevice tablet +  -device virtio-net-pci,netdev=net0 +  -netdev user,id=net0,smb=${JOB_DIR},hostfwd=tcp::3389-:3389 +  -drive file=${VM_IMAGE},media=disk,if=virtio +  -snapshot  -nographic +``` -Job script links application data (win), input data (data) and run -script (run.bat) into job directory and runs virtual machine. +Job script links application data (win), input data (data) and run script (run.bat) into job directory and runs virtual machine. Example run script (run.bat) for Windows virtual machine: +```bash z: cd winappl call application.bat z:data z:output +``` -Run script runs application from shared job directory (mapped as drive -z:), process input data (z:data) from job directory and store output -to job directory (z:output). +Run script runs application from shared job directory (mapped as drive z:), process input data (z:data) from job directory and store output to job directory (z:output). ### Run jobs -Run jobs as usual, see [Resource Allocation and Job -Execution](../../resource-allocation-and-job-execution/introduction.html). 
-Use only full node allocation for virtualization jobs. +Run jobs as usual, see [Resource Allocation and Job Execution](../../resource-allocation-and-job-execution/introduction.html). Use only full node allocation for virtualization jobs. ### Running Virtual Machines -Virtualization is enabled only on compute nodes, virtualization does not -work on login nodes. +Virtualization is enabled only on compute nodes, virtualization does not work on login nodes. Load QEMU environment module: +```bash $ module add qemu +``` Get help +```bash $ man qemu - +``` Run virtual machine (simple) +```bash $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -vnc :0 $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -vnc :0 +``` -You can access virtual machine by VNC viewer (option -vnc) connecting to -IP address of compute node. For VNC you must use [VPN -network](../../accessing-the-cluster/vpn-access.html). +You can access virtual machine by VNC viewer (option -vnc) connecting to IP address of compute node. For VNC you must use [VPN network](../../accessing-the-cluster/vpn-access.html). Install virtual machine from iso file +```bash $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -cdrom linux-install.iso -boot d -vnc :0 $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -cdrom win-install.iso -boot d -vnc :0 +``` -Run virtual machine using optimized devices, user network backend with -sharing and port forwarding, in snapshot mode +Run virtual machine using optimized devices, user network backend with sharing and port forwarding, in snapshot mode +```bash $ qemu-system-x86_64 -drive file=linux.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 -vnc :0 -snapshot $ qemu-system-x86_64 -drive file=win.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 -vnc :0 -snapshot +``` -Thanks to port forwarding you can access virtual machine via SSH (Linux) -or RDP (Windows) connecting to IP address of compute node (and port 2222 -for SSH). You must use [VPN -network](../../accessing-the-cluster/vpn-access.html). +Thanks to port forwarding you can access virtual machine via SSH (Linux) or RDP (Windows) connecting to IP address of compute node (and port 2222 for SSH). You must use [VPN network](../../accessing-the-cluster/vpn-access.html). -Keep in mind, that if you use virtio devices, you must have virtio -drivers installed on your virtual machine. +>Keep in mind, that if you use virtio devices, you must have virtio drivers installed on your virtual machine. ### Networking and data sharing -For networking virtual machine we suggest to use (default) user network -backend (sometimes called slirp). This network backend NATs virtual -machines and provides useful services for virtual machines as DHCP, DNS, -SMB sharing, port forwarding. +For networking virtual machine we suggest to use (default) user network backend (sometimes called slirp). This network backend NATs virtual machines and provides useful services for virtual machines as DHCP, DNS, SMB sharing, port forwarding. 
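With the port forwarding shown in the examples above, a Linux guest can be reached from a VPN-connected client with an ordinary SSH call against the compute node; a minimal sketch (the node name cn204 and the guest account are placeholders):

```bash
 # Connect to the guest's SSH daemon through the forwarded port
 # (hostfwd=tcp::2222-:22 in the examples above); cn204 and "user" are
 # placeholders for the compute node running the VM and the guest account.
 $ ssh -p 2222 user@cn204
```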
-In default configuration IP network 10.0.2.0/24 is used, host has IP -address 10.0.2.2, DNS server 10.0.2.3, SMB server 10.0.2.4 and virtual -machines obtain address from range 10.0.2.15-10.0.2.31. Virtual machines -have access to Anselm's network via NAT on compute node (host). +In default configuration IP network 10.0.2.0/24 is used, host has IP address 10.0.2.2, DNS server 10.0.2.3, SMB server 10.0.2.4 and virtual machines obtain address from range 10.0.2.15-10.0.2.31. Virtual machines have access to Anselm's network via NAT on compute node (host). Simple network setup +```bash $ qemu-system-x86_64 ... -net nic -net user +``` (It is default when no -net options are given.) -Simple network setup with sharing and port forwarding (obsolete but -simpler syntax, lower performance) +Simple network setup with sharing and port forwarding (obsolete but simpler syntax, lower performance) +```bash $ qemu-system-x86_64 ... -net nic -net user,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 +``` Optimized network setup with sharing and port forwarding +```bash $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 +``` ### Advanced networking -Internet access** +**Internet access** -Sometime your virtual machine needs access to internet (install -software, updates, software activation, etc). We suggest solution using -Virtual Distributed Ethernet (VDE) enabled QEMU with SLIRP running on -login node tunnelled to compute node. Be aware, this setup has very low -performance, the worst performance of all described solutions. +Sometime your virtual machine needs access to internet (install software, updates, software activation, etc). We suggest solution using Virtual Distributed Ethernet (VDE) enabled QEMU with SLIRP running on login node tunnelled to compute node. Be aware, this setup has very low performance, the worst performance of all described solutions. -Load VDE enabled QEMU environment module (unload standard QEMU module -first if necessary). +Load VDE enabled QEMU environment module (unload standard QEMU module first if necessary). +```bash $ module add qemu/2.1.2-vde2 +``` Create virtual network switch. +```bash $ vde_switch -sock /tmp/sw0 -mgmt /tmp/sw0.mgmt -daemon +``` -Run SLIRP daemon over SSH tunnel on login node and connect it to virtual -network switch. +Run SLIRP daemon over SSH tunnel on login node and connect it to virtual network switch. +```bash $ dpipe vde_plug /tmp/sw0 = ssh login1 $VDE2_DIR/bin/slirpvde -s - --dhcp & +``` Run qemu using vde network backend, connect to created virtual switch. Basic setup (obsolete syntax) +```bash $ qemu-system-x86_64 ... -net nic -net vde,sock=/tmp/sw0 +``` Setup using virtio device (obsolete syntax) +```bash $ qemu-system-x86_64 ... -net nic,model=virtio -net vde,sock=/tmp/sw0 +``` Optimized setup +```bash $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev vde,id=net0,sock=/tmp/sw0 +``` -TAP interconnect** +**TAP interconnect** -Both user and vde network backend have low performance. For fast -interconnect (10Gbps and more) of compute node (host) and virtual -machine (guest) we suggest using Linux kernel TAP device. +Both user and vde network backend have low performance. For fast interconnect (10Gbps and more) of compute node (host) and virtual machine (guest) we suggest using Linux kernel TAP device. -Cluster Anselm provides TAP device tap0 for your job. TAP interconnect -does not provide any services (like NAT, DHCP, DNS, SMB, etc.) 
just raw -networking, so you should provide your services if you need them. +Cluster Anselm provides TAP device tap0 for your job. TAP interconnect does not provide any services (like NAT, DHCP, DNS, SMB, etc.) just raw networking, so you should provide your services if you need them. Run qemu with TAP network backend: - $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net1 +```bash + $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net1 -netdev tap,id=net1,ifname=tap0,script=no,downscript=no +``` -Interface tap0 has IP address 192.168.1.1 and network mask 255.255.255.0 -(/24). In virtual machine use IP address from range -192.168.1.2-192.168.1.254. For your convenience some ports on tap0 -interface are redirected to higher numbered ports, so you as -non-privileged user can provide services on these ports. +Interface tap0 has IP address 192.168.1.1 and network mask 255.255.255.0 (/24). In virtual machine use IP address from range 192.168.1.2-192.168.1.254. For your convenience some ports on tap0 interface are redirected to higher numbered ports, so you as non-privileged user can provide services on these ports. Redirected ports: @@ -413,18 +341,17 @@ Redirected ports: - DHCP udp/67->udp3067 - SMB tcp/139->tcp3139, tcp/445->tcp3445). -You can configure IP address of virtual machine statically or -dynamically. For dynamic addressing provide your DHCP server on port -3067 of tap0 interface, you can also provide your DNS server on port -3053 of tap0 interface for example: +You can configure IP address of virtual machine statically or dynamically. For dynamic addressing provide your DHCP server on port 3067 of tap0 interface, you can also provide your DNS server on port 3053 of tap0 interface for example: +```bash $ dnsmasq --interface tap0 --bind-interfaces -p 3053 --dhcp-alternate-port=3067,68 --dhcp-range=192.168.1.15,192.168.1.32 --dhcp-leasefile=/tmp/dhcp.leasefile +``` -You can also provide your SMB services (on ports 3139, 3445) to obtain -high performance data sharing. +You can also provide your SMB services (on ports 3139, 3445) to obtain high performance data sharing. Example smb.conf (not optimized) +```bash [global] socket address=192.168.1.1 smb ports = 3445 3139 @@ -453,32 +380,31 @@ Example smb.conf (not optimized) follow symlinks=yes wide links=yes force user=USER +``` (Replace USER with your login name.) Run SMB services +```bash smbd -s /tmp/qemu-smb/smb.conf +``` - - -Virtual machine can of course have more than one network interface -controller, virtual machine can use more than one network backend. So, -you can combine for example use network backend and TAP interconnect. +Virtual machine can of course have more than one network interface controller, virtual machine can use more than one network backend. So, you can combine for example use network backend and TAP interconnect. ### Snapshot mode -In snapshot mode image is not written, changes are written to temporary -file (and discarded after virtual machine exits). **It is strongly -recommended mode for running your jobs.** Set TMPDIR environment -variable to local scratch directory for placement temporary files. +In snapshot mode image is not written, changes are written to temporary file (and discarded after virtual machine exits). **It is strongly recommended mode for running your jobs.** Set TMPDIR environment variable to local scratch directory for placement temporary files. +```bash $ export TMPDIR=/lscratch/${PBS_JOBID} $ qemu-system-x86_64 ... 
-snapshot +``` ### Windows guests For Windows guests we recommend these options, life will be easier: +```bash $ qemu-system-x86_64 ... -localtime -usb -usbdevice tablet - +``` \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi-1/mpi.md b/docs.it4i/anselm-cluster-documentation/software/mpi-1/mpi.md deleted file mode 100644 index fb348bb0091106d09774641c992a2f6ecca9cd56..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/mpi-1/mpi.md +++ /dev/null @@ -1,171 +0,0 @@ -MPI -=== - - - -Setting up MPI Environment --------------------------- - -The Anselm cluster provides several implementations of the MPI library: - - |MPI Library |Thread support | - | --- | --- | - |The highly optimized and stable <strong>bullxmpi 1.2.4.1</strong>\ |<strong></strong>Partial thread support up to MPI_THREAD_SERIALIZED | - |The <strong>Intel MPI 4.1</strong> |Full thread support up to MPI_THREAD_MULTIPLE | - |The <a href="http://www.open-mpi.org/" <strong>OpenMPI 1.6.5</strong></a> |Full thread support up to MPI_THREAD_MULTIPLE, BLCR c/r support | - |The OpenMPI 1.8.1 |Full thread support up to MPI_THREAD_MULTIPLE, MPI-3.0 support | - |The <strong><strong>mpich2 1.9</strong></strong> |Full thread support up to <strong></strong> MPI_THREAD_MULTIPLE, BLCR c/r support | - -MPI libraries are activated via the environment modules. - -Look up section modulefiles/mpi in module avail - - $ module avail - ------------------------- /opt/modules/modulefiles/mpi ------------------------- - bullxmpi/bullxmpi-1.2.4.1 mvapich2/1.9-icc - impi/4.0.3.008 openmpi/1.6.5-gcc(default) - impi/4.1.0.024 openmpi/1.6.5-gcc46 - impi/4.1.0.030 openmpi/1.6.5-icc - impi/4.1.1.036(default) openmpi/1.8.1-gcc - openmpi/1.8.1-gcc46 - mvapich2/1.9-gcc(default) openmpi/1.8.1-gcc49 - mvapich2/1.9-gcc46 openmpi/1.8.1-icc - -There are default compilers associated with any particular MPI -implementation. The defaults may be changed, the MPI libraries may be -used in conjunction with any compiler. 
-The defaults are selected via the modules in following way - - Module MPI Compiler suite - -------- |---|---|-------- -------------------------------------------------------------------------------- - PrgEnv-gnu bullxmpi-1.2.4.1 bullx GNU 4.4.6 - PrgEnv-intel Intel MPI 4.1.1 Intel 13.1.1 - bullxmpi bullxmpi-1.2.4.1 none, select via module - impi Intel MPI 4.1.1 none, select via module - openmpi OpenMPI 1.6.5 GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers - openmpi OpenMPI 1.8.1 GNU compilers 4.8.1, GNU compilers 4.4.6, GNU compilers 4.9.0, Intel Compilers - mvapich2 MPICH2 1.9 GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers - -Examples: - - $ module load openmpi - -In this example, we activate the latest openmpi with latest GNU -compilers - -To use openmpi with the intel compiler suite, use - - $ module load intel - $ module load openmpi/1.6.5-icc - -In this example, the openmpi 1.6.5 using intel compilers is activated - -Compiling MPI Programs ----------------------- - -After setting up your MPI environment, compile your program using one of -the mpi wrappers - - $ mpicc -v - $ mpif77 -v - $ mpif90 -v - -Example program: - - // helloworld_mpi.c - #include <stdio.h> - - #include<mpi.h> - - int main(int argc, char **argv) { - - int len; - int rank, size; - char node[MPI_MAX_PROCESSOR_NAME]; - - // Initiate MPI - MPI_Init(&argc, &argv); - MPI_Comm_rank(MPI_COMM_WORLD,&rank); - MPI_Comm_size(MPI_COMM_WORLD,&size); - - // Get hostame and print - MPI_Get_processor_name(node,&len); - printf("Hello world! from rank %d of %d on host %sn",rank,size,node); - - // Finalize and exit - MPI_Finalize(); - - return 0; - } - -Compile the above example with - - $ mpicc helloworld_mpi.c -o helloworld_mpi.x - -Running MPI Programs --------------------- - -The MPI program executable must be compatible with the loaded MPI -module. -Always compile and execute using the very same MPI module. - -It is strongly discouraged to mix mpi implementations. Linking an -application with one MPI implementation and running mpirun/mpiexec form -other implementation may result in unexpected errors. - -The MPI program executable must be available within the same path on all -nodes. This is automatically fulfilled on the /home and /scratch -filesystem. You need to preload the executable, if running on the local -scratch /lscratch filesystem. - -### Ways to run MPI programs - -Optimal way to run an MPI program depends on its memory requirements, -memory access pattern and communication pattern. - -Consider these ways to run an MPI program: -1. One MPI process per node, 16 threads per process -2. Two MPI processes per node, 8 threads per process -3. 16 MPI processes per node, 1 thread per process. - -One MPI** process per node, using 16 threads, is most useful for -memory demanding applications, that make good use of processor cache -memory and are not memory bound. This is also a preferred way for -communication intensive applications as one process per node enjoys full -bandwidth access to the network interface. - -Two MPI** processes per node, using 8 threads each, bound to processor -socket is most useful for memory bandwidth bound applications such as -BLAS1 or FFT, with scalable memory demand. However, note that the two -processes will share access to the network interface. The 8 threads and -socket binding should ensure maximum memory access bandwidth and -minimize communication, migration and numa effect overheads. - -Important! Bind every OpenMP thread to a core! 
- -In the previous two cases with one or two MPI processes per node, the -operating system might still migrate OpenMP threads between cores. You -want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY -environment variables. - -16 MPI** processes per node, using 1 thread each bound to processor -core is most suitable for highly scalable applications with low -communication demand. - -### Running OpenMPI - -The **bullxmpi-1.2.4.1** and [**OpenMPI -1.6.5**](http://www.open-mpi.org/) are both based on -OpenMPI. Read more on [how to run -OpenMPI](Running_OpenMPI.html) based MPI. - -### Running MPICH2 - -The **Intel MPI** and **mpich2 1.9** are MPICH2 based implementations. -Read more on [how to run MPICH2](running-mpich2.html) -based MPI. - -The Intel MPI may run on the Intel Xeon Phi accelerators as well. Read -more on [how to run Intel MPI on -accelerators](../intel-xeon-phi.html). - diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi-1/Running_OpenMPI.md b/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md similarity index 60% rename from docs.it4i/anselm-cluster-documentation/software/mpi-1/Running_OpenMPI.md rename to docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md index 03477ba6e3b3ecd0b61b5086adb72f431dfc91b1..e00940965ed5b4c77a74f8b733b948e8c69557c3 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi-1/Running_OpenMPI.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/Running_OpenMPI.md @@ -1,21 +1,17 @@ -Running OpenMPI -=============== - - +Running OpenMPI +============== OpenMPI program execution ------------------------- - -The OpenMPI programs may be executed only via the PBS Workload manager, -by entering an appropriate queue. On Anselm, the **bullxmpi-1.2.4.1** -and **OpenMPI 1.6.5** are OpenMPI based MPI implementations. +The OpenMPI programs may be executed only via the PBS Workload manager, by entering an appropriate queue. On Anselm, the **bullxmpi-1.2.4.1** and **OpenMPI 1.6.5** are OpenMPI based MPI implementations. ### Basic usage -Use the mpiexec to run the OpenMPI code. +>Use the mpiexec to run the OpenMPI code. Example: +```bash $ qsub -q qexp -l select=4:ncpus=16 -I qsub: waiting for job 15210.srv11 to start qsub: job 15210.srv11 ready @@ -29,24 +25,16 @@ Example: Hello world! from rank 1 of 4 on host cn108 Hello world! from rank 2 of 4 on host cn109 Hello world! from rank 3 of 4 on host cn110 +``` -Please be aware, that in this example, the directive **-pernode** is -used to run only **one task per node**, which is normally an unwanted -behaviour (unless you want to run hybrid code with just one MPI and 16 -OpenMP tasks per node). In normal MPI programs **omit the -pernode -directive** to run up to 16 MPI tasks per each node. +>Please be aware, that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted behaviour (unless you want to run hybrid code with just one MPI and 16 OpenMP tasks per node). In normal MPI programs **omit the -pernode directive** to run up to 16 MPI tasks per each node. -In this example, we allocate 4 nodes via the express queue -interactively. We set up the openmpi environment and interactively run -the helloworld_mpi.x program. -Note that the executable -helloworld_mpi.x must be available within the -same path on all nodes. This is automatically fulfilled on the /home and -/scratch filesystem. +In this example, we allocate 4 nodes via the express queue interactively. 
We set up the openmpi environment and interactively run the helloworld_mpi.x program. Note that the executable helloworld_mpi.x must be available within the +same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem. -You need to preload the executable, if running on the local scratch -/lscratch filesystem +You need to preload the executable, if running on the local scratch /lscratch filesystem +```bash $ pwd /lscratch/15210.srv11 @@ -55,147 +43,134 @@ You need to preload the executable, if running on the local scratch Hello world! from rank 1 of 4 on host cn108 Hello world! from rank 2 of 4 on host cn109 Hello world! from rank 3 of 4 on host cn110 +``` -In this example, we assume the executable -helloworld_mpi.x is present on compute node -cn17 on local scratch. We call the mpiexec whith the ---preload-binary** argument (valid for openmpi). The mpiexec will copy -the executable from cn17 to the -/lscratch/15210.srv11 directory on cn108, cn109 -and cn110 and execute the program. +In this example, we assume the executable helloworld_mpi.x is present on compute node cn17 on local scratch. We call the mpiexec whith the **--preload-binary** argument (valid for openmpi). The mpiexec will copy the executable from cn17 to the /lscratch/15210.srv11 directory on cn108, cn109 and cn110 and execute the program. -MPI process mapping may be controlled by PBS parameters. +>MPI process mapping may be controlled by PBS parameters. -The mpiprocs and ompthreads parameters allow for selection of number of -running MPI processes per node as well as number of OpenMP threads per -MPI process. +The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. ### One MPI process per node -Follow this example to run one MPI process per node, 16 threads per -process. +Follow this example to run one MPI process per node, 16 threads per process. +```bash $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I $ module load openmpi $ mpiexec --bind-to-none ./helloworld_mpi.x +``` -In this example, we demonstrate recommended way to run an MPI -application, using 1 MPI processes per node and 16 threads per socket, -on 4 nodes. +In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 16 threads per socket, on 4 nodes. ### Two MPI processes per node -Follow this example to run two MPI processes per node, 8 threads per -process. Note the options to mpiexec. +Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpiexec. +```bash $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I $ module load openmpi $ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x +``` -In this example, we demonstrate recommended way to run an MPI -application, using 2 MPI processes per node and 8 threads per socket, -each process and its threads bound to a separate processor socket of the -node, on 4 nodes +In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 8 threads per socket, each process and its threads bound to a separate processor socket of the node, on 4 nodes ### 16 MPI processes per node -Follow this example to run 16 MPI processes per node, 1 thread per -process. Note the options to mpiexec. +Follow this example to run 16 MPI processes per node, 1 thread per process. Note the options to mpiexec. 
+```bash $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I $ module load openmpi $ mpiexec -bycore -bind-to-core ./helloworld_mpi.x +``` -In this example, we demonstrate recommended way to run an MPI -application, using 16 MPI processes per node, single threaded. Each -process is bound to separate processor core, on 4 nodes. +In this example, we demonstrate recommended way to run an MPI application, using 16 MPI processes per node, single threaded. Each process is bound to separate processor core, on 4 nodes. ### OpenMP thread affinity -Important! Bind every OpenMP thread to a core! +>Important! Bind every OpenMP thread to a core! -In the previous two examples with one or two MPI processes per node, the -operating system might still migrate OpenMP threads between cores. You -might want to avoid this by setting these environment variable for GCC -OpenMP: +In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: +```bash $ export GOMP_CPU_AFFINITY="0-15" +``` or this one for Intel OpenMP: +```bash $ export KMP_AFFINITY=granularity=fine,compact,1,0 +`` -As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and -later) the following variables may be used for Intel or GCC: +As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: +```bash $ export OMP_PROC_BIND=true - $ export OMP_PLACES=cores + $ export OMP_PLACES=cores +``` OpenMPI Process Mapping and Binding ------------------------------------------------ +The mpiexec allows for precise selection of how the MPI processes will be mapped to the computational nodes and how these processes will bind to particular processor sockets and cores. -The mpiexec allows for precise selection of how the MPI processes will -be mapped to the computational nodes and how these processes will bind -to particular processor sockets and cores. - -MPI process mapping may be specified by a hostfile or rankfile input to -the mpiexec program. Altough all implementations of MPI provide means -for process mapping and binding, following examples are valid for the -openmpi only. +MPI process mapping may be specified by a hostfile or rankfile input to the mpiexec program. Altough all implementations of MPI provide means for process mapping and binding, following examples are valid for the openmpi only. ### Hostfile Example hostfile +```bash cn110.bullx cn109.bullx cn108.bullx cn17.bullx +``` Use the hostfile to control process placement +```bash $ mpiexec -hostfile hostfile ./helloworld_mpi.x Hello world! from rank 0 of 4 on host cn110 Hello world! from rank 1 of 4 on host cn109 Hello world! from rank 2 of 4 on host cn108 Hello world! from rank 3 of 4 on host cn17 +``` -In this example, we see that ranks have been mapped on nodes according -to the order in which nodes show in the hostfile +In this example, we see that ranks have been mapped on nodes according to the order in which nodes show in the hostfile ### Rankfile -Exact control of MPI process placement and resource binding is provided -by specifying a rankfile +Exact control of MPI process placement and resource binding is provided by specifying a rankfile -Appropriate binding may boost performance of your application. +>Appropriate binding may boost performance of your application. 
Example rankfile +```bash rank 0=cn110.bullx slot=1:0,1 rank 1=cn109.bullx slot=0:* rank 2=cn108.bullx slot=1:1-2 rank 3=cn17.bullx slot=0:1,1:0-2 rank 4=cn109.bullx slot=0:*,1:* +``` -This rankfile assumes 5 ranks will be running on 4 nodes and provides -exact mapping and binding of the processes to the processor sockets and -cores +This rankfile assumes 5 ranks will be running on 4 nodes and provides exact mapping and binding of the processes to the processor sockets and cores Explanation: rank 0 will be bounded to cn110, socket1 core0 and core1 rank 1 will be bounded to cn109, socket0, all cores rank 2 will be bounded to cn108, socket1, core1 and core2 -rank 3 will be bounded to cn17, socket0 core1, socket1 core0, core1, -core2 +rank 3 will be bounded to cn17, socket0 core1, socket1 core0, core1, core2 rank 4 will be bounded to cn109, all cores on both sockets +```bash $ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x [cn17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . .][B B B . . . . .] (slot list 0:1,1:0-2) [cn110:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . .][B B . . . . . .] (slot list 1:0,1) @@ -207,28 +182,24 @@ rank 4 will be bounded to cn109, all cores on both sockets Hello world! from rank 0 of 5 on host cn110 Hello world! from rank 4 of 5 on host cn109 Hello world! from rank 2 of 5 on host cn108 +``` -In this example we run 5 MPI processes (5 ranks) on four nodes. The -rankfile defines how the processes will be mapped on the nodes, sockets -and cores. The **--report-bindings** option was used to print out the -actual process location and bindings. Note that ranks 1 and 4 run on the -same node and their core binding overlaps. +In this example we run 5 MPI processes (5 ranks) on four nodes. The rankfile defines how the processes will be mapped on the nodes, sockets and cores. The **--report-bindings** option was used to print out the actual process location and bindings. Note that ranks 1 and 4 run on the same node and their core binding overlaps. -It is users responsibility to provide correct number of ranks, sockets -and cores. +It is users responsibility to provide correct number of ranks, sockets and cores. ### Bindings verification -In all cases, binding and threading may be verified by executing for -example: +In all cases, binding and threading may be verified by executing for example: +```bash $ mpiexec -bysocket -bind-to-socket --report-bindings echo $ mpiexec -bysocket -bind-to-socket numactl --show $ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS +``` Changes in OpenMPI 1.8 ---------------------- - Some options have changed in OpenMPI version 1.8. |version 1.6.5 |version 1.8.1 | @@ -238,5 +209,4 @@ Some options have changed in OpenMPI version 1.8. 
|--bind-to-socket |--bind-to socket | |-bysocket |--map-by socket | |-bycore |--map-by core | - |-pernode |--map-by ppr:1:node\ | - + |-pernode |--map-by ppr:1:node | \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md new file mode 100644 index 0000000000000000000000000000000000000000..b853d1360a5e3474ee3c173e2d9995733ea1180a --- /dev/null +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi.md @@ -0,0 +1,144 @@ +MPI +=== + +Setting up MPI Environment +-------------------------- +The Anselm cluster provides several implementations of the MPI library: + + |MPI Library |Thread support | + | --- | --- | + |The highly optimized and stable **bullxmpi 1.2.4.1** |Partial thread support up to MPI_THREAD_SERIALIZED | + |The **Intel MPI 4.1** |Full thread support up to MPI_THREAD_MULTIPLE | + |The [OpenMPI 1.6.5](href="http://www.open-mpi.org)| Full thread support up to MPI_THREAD_MULTIPLE, BLCR c/r support | + |The OpenMPI 1.8.1 |Full thread support up to MPI_THREAD_MULTIPLE, MPI-3.0 support | + |The **mpich2 1.9** |Full thread support up to MPI_THREAD_MULTIPLE, BLCR c/r support | + +MPI libraries are activated via the environment modules. + +Look up section modulefiles/mpi in module avail + +```bash + $ module avail + ------------------------- /opt/modules/modulefiles/mpi ------------------------- + bullxmpi/bullxmpi-1.2.4.1 mvapich2/1.9-icc + impi/4.0.3.008 openmpi/1.6.5-gcc(default) + impi/4.1.0.024 openmpi/1.6.5-gcc46 + impi/4.1.0.030 openmpi/1.6.5-icc + impi/4.1.1.036(default) openmpi/1.8.1-gcc + openmpi/1.8.1-gcc46 + mvapich2/1.9-gcc(default) openmpi/1.8.1-gcc49 + mvapich2/1.9-gcc46 openmpi/1.8.1-icc +``` + +There are default compilers associated with any particular MPI implementation. The defaults may be changed, the MPI libraries may be used in conjunction with any compiler. The defaults are selected via the modules in following way + +|Module|MPI|Compiler suite| +|-------- |---|---| +|PrgEnv-gnu|bullxmpi-1.2.4.1|bullx GNU 4.4.6| +|PrgEnv-intel|Intel MPI 4.1.1|Intel 13.1.1| +|bullxmpi|bullxmpi-1.2.4.1|none, select via module| +|impi|Intel MPI 4.1.1|none, select via module| +|openmpi|OpenMPI 1.6.5|GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers| +|openmpi|OpenMPI 1.8.1|GNU compilers 4.8.1, GNU compilers 4.4.6, GNU compilers 4.9.0, Intel Compilers| +|mvapich2|MPICH2 1.9|GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers| + +Examples: + +```bash + $ module load openmpi +``` + +In this example, we activate the latest openmpi with latest GNU compilers + +To use openmpi with the intel compiler suite, use + +```bash + $ module load intel + $ module load openmpi/1.6.5-icc +``` + +In this example, the openmpi 1.6.5 using intel compilers is activated + +Compiling MPI Programs +---------------------- +>After setting up your MPI environment, compile your program using one of the mpi wrappers + +```bash + $ mpicc -v + $ mpif77 -v + $ mpif90 -v +``` + +Example program: + +```cpp + // helloworld_mpi.c + #include <stdio.h> + + #include<mpi.h> + + int main(int argc, char **argv) { + + int len; + int rank, size; + char node[MPI_MAX_PROCESSOR_NAME]; + + // Initiate MPI + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD,&rank); + MPI_Comm_size(MPI_COMM_WORLD,&size); + + // Get hostame and print + MPI_Get_processor_name(node,&len); + printf("Hello world! 
from rank %d of %d on host %sn",rank,size,node); + + // Finalize and exit + MPI_Finalize(); + + return 0; + } +``` + +Compile the above example with + +```bash + $ mpicc helloworld_mpi.c -o helloworld_mpi.x +``` + +Running MPI Programs +-------------------- +>The MPI program executable must be compatible with the loaded MPI module. +Always compile and execute using the very same MPI module. + +It is strongly discouraged to mix mpi implementations. Linking an application with one MPI implementation and running mpirun/mpiexec form other implementation may result in unexpected errors. + +The MPI program executable must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem. You need to preload the executable, if running on the local scratch /lscratch filesystem. + +### Ways to run MPI programs + +Optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern. + +>Consider these ways to run an MPI program: +1. One MPI process per node, 16 threads per process +2. Two MPI processes per node, 8 threads per process +3. 16 MPI processes per node, 1 thread per process. + +**One MPI** process per node, using 16 threads, is most useful for memory demanding applications, that make good use of processor cache memory and are not memory bound. This is also a preferred way for communication intensive applications as one process per node enjoys full bandwidth access to the network interface. + +**Two MPI** processes per node, using 8 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 8 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and numa effect overheads. + +>Important! Bind every OpenMP thread to a core! + +In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. + +**16 MPI** processes per node, using 1 thread each bound to processor core is most suitable for highly scalable applications with low communication demand. + +### Running OpenMPI + +The **bullxmpi-1.2.4.1** and [**OpenMPI 1.6.5**](http://www.open-mpi.org/) are both based on OpenMPI. Read more on [how to run OpenMPI](Running_OpenMPI.html) based MPI. + +### Running MPICH2 + +The **Intel MPI** and **mpich2 1.9** are MPICH2 based implementations. Read more on [how to run MPICH2](running-mpich2.html) based MPI. + +The Intel MPI may run on the Intel Xeon Phi accelerators as well. Read more on [how to run Intel MPI on accelerators](../intel-xeon-phi.html). 
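
As a quick reference for the three layouts described in "Ways to run MPI programs" above, the corresponding PBS resource requests might look as follows. This is only a sketch: the jobscript name ./myjobscript and the PROJECT_ID placeholder are assumptions, and the matching mpirun/mpiexec options are covered on the Running OpenMPI and Running MPICH2 pages.

```bash
 # 1 MPI process per node, 16 OpenMP threads per process, on 4 nodes
 $ qsub -A PROJECT_ID -q qprod -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 ./myjobscript

 # 2 MPI processes per node (one per socket), 8 threads per process, on 4 nodes
 $ qsub -A PROJECT_ID -q qprod -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 ./myjobscript

 # 16 MPI processes per node, single threaded, on 4 nodes
 $ qsub -A PROJECT_ID -q qprod -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 ./myjobscript
```
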
\ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi-1/mpi4py-mpi-for-python.md b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi4py-mpi-for-python.md similarity index 64% rename from docs.it4i/anselm-cluster-documentation/software/mpi-1/mpi4py-mpi-for-python.md rename to docs.it4i/anselm-cluster-documentation/software/mpi/mpi4py-mpi-for-python.md index e79ef4b1f0d649557a23691f3cc9e03070193127..9025af6c90774bee2d007db0aa391b476d86e7c0 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi-1/mpi4py-mpi-for-python.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/mpi4py-mpi-for-python.md @@ -1,59 +1,51 @@ -MPI4Py (MPI for Python) +MPI4Py (MPI for Python) ======================= OpenMPI interface to Python - - Introduction ------------ +MPI for Python provides bindings of the Message Passing Interface (MPI) standard for the Python programming language, allowing any Python program to exploit multiple processors. -MPI for Python provides bindings of the Message Passing Interface (MPI) -standard for the Python programming language, allowing any Python -program to exploit multiple processors. - -This package is constructed on top of the MPI-1/2 specifications and -provides an object oriented interface which closely follows MPI-2 C++ -bindings. It supports point-to-point (sends, receives) and collective -(broadcasts, scatters, gathers) communications of any picklable Python -object, as well as optimized communications of Python object exposing -the single-segment buffer interface (NumPy arrays, builtin -bytes/string/array objects). +This package is constructed on top of the MPI-1/2 specifications and provides an object oriented interface which closely follows MPI-2 C++ bindings. It supports point-to-point (sends, receives) and collective (broadcasts, scatters, gathers) communications of any picklable Python object, as well as optimized communications of Python object exposing the single-segment buffer interface (NumPy arrays, builtin bytes/string/array objects). On Anselm MPI4Py is available in standard Python modules. Modules ------- +MPI4Py is build for OpenMPI. Before you start with MPI4Py you need to load Python and OpenMPI modules. -MPI4Py is build for OpenMPI. Before you start with MPI4Py you need to -load Python and OpenMPI modules. - +```bash $ module load python $ module load openmpi +``` Execution --------- +You need to import MPI to your python program. Include the following line to the python script: -You need to import MPI to your python program. Include the following -line to the python script: - +```cpp from mpi4py import MPI +``` -The MPI4Py enabled python programs [execute as any other -OpenMPI](Running_OpenMPI.html) code.The simpliest way is -to run +The MPI4Py enabled python programs [execute as any other OpenMPI](Running_OpenMPI.html) code.The simpliest way is to run +```bash $ mpiexec python <script>.py +``` For example +```bash $ mpiexec python hello_world.py +``` Examples -------- ### Hello world! +```cpp from mpi4py import MPI comm = MPI.COMM_WORLD @@ -61,9 +53,11 @@ Examples print "Hello! I'm rank %d from %d running in total..." % (comm.rank, comm.size) comm.Barrier()  # wait for everybody to synchronize +``` ###Collective Communication with NumPy arrays +```cpp from mpi4py import MPI from __future__ import division import numpy as np @@ -88,18 +82,17 @@ Examples # Everybody should now have the same... 
print "[%02d] %s" % (comm.rank, A) +``` Execute the above code as: +```bash $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I $ module load python openmpi $ mpiexec -bycore -bind-to-core python hello_world.py +``` -In this example, we run MPI4Py enabled code on 4 nodes, 16 cores per -node (total of 64 processes), each python process is bound to a -different core. -More examples and documentation can be found on [MPI for Python -webpage](https://pythonhosted.org/mpi4py/usrman/index.html). +In this example, we run MPI4Py enabled code on 4 nodes, 16 cores per node (total of 64 processes), each python process is bound to a different core. More examples and documentation can be found on [MPI for Python webpage](https://pythonhosted.org/mpi4py/usrman/index.html). diff --git a/docs.it4i/anselm-cluster-documentation/software/mpi-1/running-mpich2.md b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md similarity index 50% rename from docs.it4i/anselm-cluster-documentation/software/mpi-1/running-mpich2.md rename to docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md index cf4b32cc7d7574df4928fcb0e9c1b64afab0dce2..13ce6296e70e8e4ba5905d372f14305508d44306 100644 --- a/docs.it4i/anselm-cluster-documentation/software/mpi-1/running-mpich2.md +++ b/docs.it4i/anselm-cluster-documentation/software/mpi/running-mpich2.md @@ -1,22 +1,17 @@ -Running MPICH2 +Running MPICH2 ============== - - MPICH2 program execution ------------------------ - -The MPICH2 programs use mpd daemon or ssh connection to spawn processes, -no PBS support is needed. However the PBS allocation is required to -access compute nodes. On Anselm, the **Intel MPI** and **mpich2 1.9** -are MPICH2 based MPI implementations. +The MPICH2 programs use mpd daemon or ssh connection to spawn processes, no PBS support is needed. However the PBS allocation is required to access compute nodes. On Anselm, the **Intel MPI** and **mpich2 1.9** are MPICH2 based MPI implementations. ### Basic usage -Use the mpirun to execute the MPICH2 code. +>Use the mpirun to execute the MPICH2 code. Example: +```bash $ qsub -q qexp -l select=4:ncpus=16 -I qsub: waiting for job 15210.srv11 to start qsub: job 15210.srv11 ready @@ -28,18 +23,14 @@ Example: Hello world! from rank 1 of 4 on host cn108 Hello world! from rank 2 of 4 on host cn109 Hello world! from rank 3 of 4 on host cn110 +``` -In this example, we allocate 4 nodes via the express queue -interactively. We set up the intel MPI environment and interactively run -the helloworld_mpi.x program. We request MPI to spawn 1 process per -node. -Note that the executable helloworld_mpi.x must be available within the -same path on all nodes. This is automatically fulfilled on the /home and -/scratch filesystem. +In this example, we allocate 4 nodes via the express queue interactively. We set up the intel MPI environment and interactively run the helloworld_mpi.x program. We request MPI to spawn 1 process per node. +Note that the executable helloworld_mpi.x must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem. -You need to preload the executable, if running on the local scratch -/lscratch filesystem +You need to preload the executable, if running on the local scratch /lscratch filesystem +```bash $ pwd /lscratch/15210.srv11 $ mpirun -ppn 1 -hostfile $PBS_NODEFILE cp /home/username/helloworld_mpi.x . @@ -48,145 +39,124 @@ You need to preload the executable, if running on the local scratch Hello world! 
from rank 1 of 4 on host cn108 Hello world! from rank 2 of 4 on host cn109 Hello world! from rank 3 of 4 on host cn110 +``` -In this example, we assume the executable helloworld_mpi.x is present -on shared home directory. We run the cp command via mpirun, copying the -executable from shared home to local scratch . Second mpirun will -execute the binary in the /lscratch/15210.srv11 directory on nodes cn17, -cn108, cn109 and cn110, one process per node. +In this example, we assume the executable helloworld_mpi.x is present on shared home directory. We run the cp command via mpirun, copying the executable from shared home to local scratch . Second mpirun will execute the binary in the /lscratch/15210.srv11 directory on nodes cn17, cn108, cn109 and cn110, one process per node. -MPI process mapping may be controlled by PBS parameters. +>MPI process mapping may be controlled by PBS parameters. -The mpiprocs and ompthreads parameters allow for selection of number of -running MPI processes per node as well as number of OpenMP threads per -MPI process. +The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process. ### One MPI process per node -Follow this example to run one MPI process per node, 16 threads per -process. Note that no options to mpirun are needed +Follow this example to run one MPI process per node, 16 threads per process. Note that no options to mpirun are needed +```bash $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I $ module load mvapich2 $ mpirun ./helloworld_mpi.x +``` -In this example, we demonstrate recommended way to run an MPI -application, using 1 MPI processes per node and 16 threads per socket, -on 4 nodes. +In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 16 threads per socket, on 4 nodes. ### Two MPI processes per node -Follow this example to run two MPI processes per node, 8 threads per -process. Note the options to mpirun for mvapich2. No options are needed -for impi. +Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpirun for mvapich2. No options are needed for impi. +```bash $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I $ module load mvapich2 $ mpirun -bind-to numa ./helloworld_mpi.x +``` -In this example, we demonstrate recommended way to run an MPI -application, using 2 MPI processes per node and 8 threads per socket, -each process and its threads bound to a separate processor socket of the -node, on 4 nodes +In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 8 threads per socket, each process and its threads bound to a separate processor socket of the node, on 4 nodes ### 16 MPI processes per node -Follow this example to run 16 MPI processes per node, 1 thread per -process. Note the options to mpirun for mvapich2. No options are needed -for impi. +Follow this example to run 16 MPI processes per node, 1 thread per process. Note the options to mpirun for mvapich2. No options are needed for impi. +```bash $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I $ module load mvapich2 $ mpirun -bind-to core ./helloworld_mpi.x +``` -In this example, we demonstrate recommended way to run an MPI -application, using 16 MPI processes per node, single threaded. Each -process is bound to separate processor core, on 4 nodes. 
+In this example, we demonstrate recommended way to run an MPI application, using 16 MPI processes per node, single threaded. Each process is bound to separate processor core, on 4 nodes. ### OpenMP thread affinity -Important! Bind every OpenMP thread to a core! +>Important! Bind every OpenMP thread to a core! -In the previous two examples with one or two MPI processes per node, the -operating system might still migrate OpenMP threads between cores. You -might want to avoid this by setting these environment variable for GCC -OpenMP: +In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: +```bash $ export GOMP_CPU_AFFINITY="0-15" +``` or this one for Intel OpenMP: +```bash $ export KMP_AFFINITY=granularity=fine,compact,1,0 +``` -As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and -later) the following variables may be used for Intel or GCC: +As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: +```bash $ export OMP_PROC_BIND=true - $ export OMP_PLACES=cores - - + $ export OMP_PLACES=cores +``` MPICH2 Process Mapping and Binding ---------------------------------- - -The mpirun allows for precise selection of how the MPI processes will be -mapped to the computational nodes and how these processes will bind to -particular processor sockets and cores. +The mpirun allows for precise selection of how the MPI processes will be mapped to the computational nodes and how these processes will bind to particular processor sockets and cores. ### Machinefile -Process mapping may be controlled by specifying a machinefile input to -the mpirun program. Altough all implementations of MPI provide means for -process mapping and binding, following examples are valid for the impi -and mvapich2 only. +Process mapping may be controlled by specifying a machinefile input to the mpirun program. Altough all implementations of MPI provide means for process mapping and binding, following examples are valid for the impi and mvapich2 only. Example machinefile +```bash cn110.bullx cn109.bullx cn108.bullx cn17.bullx cn108.bullx +``` Use the machinefile to control process placement +```bash $ mpirun -machinefile machinefile helloworld_mpi.x Hello world! from rank 0 of 5 on host cn110 Hello world! from rank 1 of 5 on host cn109 Hello world! from rank 2 of 5 on host cn108 Hello world! from rank 3 of 5 on host cn17 Hello world! from rank 4 of 5 on host cn108 +``` -In this example, we see that ranks have been mapped on nodes according -to the order in which nodes show in the machinefile +In this example, we see that ranks have been mapped on nodes according to the order in which nodes show in the machinefile ### Process Binding -The Intel MPI automatically binds each process and its threads to the -corresponding portion of cores on the processor socket of the node, no -options needed. The binding is primarily controlled by environment -variables. Read more about mpi process binding on [Intel -website](https://software.intel.com/sites/products/documentation/hpc/ics/impi/41/lin/Reference_Manual/Environment_Variables_Process_Pinning.htm). -The MPICH2 uses the -bind-to option Use -bind-to numa or -bind-to core -to bind the process on single core or entire socket. 
+The Intel MPI automatically binds each process and its threads to the corresponding portion of cores on the processor socket of the node, no options needed. The binding is primarily controlled by environment variables. Read more about mpi process binding on [Intel website](https://software.intel.com/sites/products/documentation/hpc/ics/impi/41/lin/Reference_Manual/Environment_Variables_Process_Pinning.htm). The MPICH2 uses the -bind-to option Use -bind-to numa or -bind-to core to bind the process on single core or entire socket. ### Bindings verification In all cases, binding and threading may be verified by executing +```bash $ mpirun -bindto numa numactl --show $ mpirun -bindto numa echo $OMP_NUM_THREADS +``` Intel MPI on Xeon Phi --------------------- -The[MPI section of Intel Xeon Phi -chapter](../intel-xeon-phi.html) provides details on how -to run Intel MPI code on Xeon Phi architecture. - +The[MPI section of Intel Xeon Phi chapter](../intel-xeon-phi.html) provides details on how to run Intel MPI code on Xeon Phi architecture. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/copy_of_matlab.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/copy_of_matlab.md deleted file mode 100644 index eec66a690a4eb80bcb53fe398bfe0fe6d514147d..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/copy_of_matlab.md +++ /dev/null @@ -1,346 +0,0 @@ -Matlab -====== - - - -Introduction ------------- - -Matlab is available in versions R2015a and R2015b. There are always two -variants of the release: - -- Non commercial or so called EDU variant, which can be used for - common research and educational purposes. -- Commercial or so called COM variant, which can used also for - commercial activities. The licenses for commercial variant are much - more expensive, so usually the commercial variant has only subset of - features compared to the EDU available. - - - -To load the latest version of Matlab load the module - - $ module load MATLAB - -By default the EDU variant is marked as default. If you need other -version or variant, load the particular version. To obtain the list of -available versions use - - $ module avail MATLAB - -If you need to use the Matlab GUI to prepare your Matlab programs, you -can use Matlab directly on the login nodes. But for all computations use -Matlab on the compute nodes via PBS Pro scheduler. - -If you require the Matlab GUI, please follow the general informations -about [running graphical -applications](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). - -Matlab GUI is quite slow using the X forwarding built in the PBS (qsub --X), so using X11 display redirection either via SSH or directly by -xauth (please see the "GUI Applications on Compute Nodes over VNC" part -[here](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html)) -is recommended. - -To run Matlab with GUI, use - - $ matlab - -To run Matlab in text mode, without the Matlab Desktop GUI environment, -use - - $ matlab -nodesktop -nosplash - -plots, images, etc... will be still available. 
- -Running parallel Matlab using Distributed Computing Toolbox / Engine ------------------------------------------------------------------------- - -Distributed toolbox is available only for the EDU variant - -The MPIEXEC mode available in previous versions is no longer available -in MATLAB 2015. Also, the programming interface has changed. Refer -to [Release -Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1). - -Delete previously used file mpiLibConf.m, we have observed crashes when -using Intel MPI. - -To use Distributed Computing, you first need to setup a parallel -profile. We have provided the profile for you, you can either import it -in MATLAB command line: - - > parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonPBSPro.settings') - - ans = - - SalomonPBSPro - -Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster -Profiles..., click Import and navigate to : - -/apps/all/MATLAB/2015a-EDU/SalomonPBSPro.settings - -With the new mode, MATLAB itself launches the workers via PBS, so you -can either use interactive mode or a batch mode on one node, but the -actual parallel processing will be done in a separate job started by -MATLAB itself. Alternatively, you can use "local" mode to run parallel -code on just a single node. - -The profile is confusingly named Salomon, but you can use it also on -Anselm. - -### Parallel Matlab interactive session - -Following example shows how to start interactive session with support -for Matlab GUI. For more information about GUI based applications on -Anselm see [this -page](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). - - $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 - -This qsub command example shows how to run Matlab on a single node. - -The second part of the command shows how to request all necessary -licenses. In this case 1 Matlab-EDU license and 48 Distributed Computing -Engines licenses. - -Once the access to compute nodes is granted by PBS, user can load -following modules and start Matlab: - - r1i0n17$ module load MATLAB/2015b-EDU - r1i0n17$ matlab & - -### Parallel Matlab batch job in Local mode - -To run matlab in batch mode, write an matlab script, then write a bash -jobscript and execute via the qsub command. By default, matlab will -execute one matlab worker instance per allocated core. - - #!/bin/bash - #PBS -A PROJECT ID - #PBS -q qprod - #PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1 - - # change to shared scratch directory - SCR=/scratch/work/user/$USER/$PBS_JOBID - mkdir -p $SCR ; cd $SCR || exit - - # copy input file to scratch - cp $PBS_O_WORKDIR/matlabcode.m . - - # load modules - module load MATLAB/2015a-EDU - - # execute the calculation - matlab -nodisplay -r matlabcode > output.out - - # copy output file to home - cp output.out $PBS_O_WORKDIR/. - -This script may be submitted directly to the PBS workload manager via -the qsub command. The inputs and matlab script are in matlabcode.m -file, outputs in output.out file. Note the missing .m extension in the -matlab -r matlabcodefile call, **the .m must not be included**. Note -that the **shared /scratch must be used**. Further, it is **important to -include quit** statement at the end of the matlabcode.m script. 
- -Submit the jobscript using qsub - - $ qsub ./jobscript - -### Parallel Matlab Local mode program example - -The last part of the configuration is done directly in the user Matlab -script before Distributed Computing Toolbox is started. - - cluster = parcluster('local') - -This script creates scheduler object "cluster" of type "local" that -starts workers locally. - -Please note: Every Matlab script that needs to initialize/use matlabpool -has to contain these three lines prior to calling parpool(sched, ...) -function. - -The last step is to start matlabpool with "cluster" object and correct -number of workers. We have 24 cores per node, so we start 24 workers. - - parpool(cluster,16); - - - ... parallel code ... - - - parpool close - -The complete example showing how to use Distributed Computing Toolbox in -local mode is shown here. - - cluster = parcluster('local'); - cluster - - parpool(cluster,24); - - n=2000; - - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') -    -    T = W*x; % Calculation performed on labs, in parallel. -             % T and W are both codistributed arrays here. - end - T; - whos        % T and W are both distributed arrays here. - - parpool close - quit - -You can copy and paste the example in a .m file and execute. Note that -the parpool size should correspond to **total number of cores** -available on allocated nodes. - -### Parallel Matlab Batch job using PBS mode (workers spawned in a separate job) - -This mode uses PBS scheduler to launch the parallel pool. It uses the -SalomonPBSPro profile that needs to be imported to Cluster Manager, as -mentioned before. This methodod uses MATLAB's PBS Scheduler interface - -it spawns the workers in a separate job submitted by MATLAB using qsub. - -This is an example of m-script using PBS mode: - - cluster = parcluster('SalomonPBSPro'); - set(cluster, 'SubmitArguments', '-A OPEN-0-0'); - set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=16'); - set(cluster, 'NumWorkers', 160); - - pool = parpool(cluster, 160); - - n=2000; - - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') - - T = W*x; % Calculation performed on labs, in parallel. - % T and W are both codistributed arrays here. - end - whos % T and W are both distributed arrays here. - - % shut down parallel pool - delete(pool) - -Note that we first construct a cluster object using the imported -profile, then set some important options, namely : SubmitArguments, -where you need to specify accounting id, and ResourceTemplate, where you -need to specify number of nodes to run the job. - -You can start this script using batch mode the same way as in Local mode -example. - -### Parallel Matlab Batch with direct launch (workers spawned within the existing job) - -This method is a "hack" invented by us to emulate the mpiexec -functionality found in previous MATLAB versions. We leverage the MATLAB -Generic Scheduler interface, but instead of submitting the workers to -PBS, we launch the workers directly within the running job, thus we -avoid the issues with master script and workers running in separate jobs -(issues with license not available, waiting for the worker's job to -spawn etc.) - -Please note that this method is experimental. 
- -For this method, you need to use SalomonDirect profile, import it -using [the same way as -SalomonPBSPro](copy_of_matlab.html#running-parallel-matlab-using-distributed-computing-toolbox---engine) - -This is an example of m-script using direct mode: - - parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonDirect.settings') - cluster = parcluster('SalomonDirect'); - set(cluster, 'NumWorkers', 48); - - pool = parpool(cluster, 48); - - n=2000; - - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') - - T = W*x; % Calculation performed on labs, in parallel. - % T and W are both codistributed arrays here. - end - whos % T and W are both distributed arrays here. - - % shut down parallel pool - delete(pool) - -### Non-interactive Session and Licenses - -If you want to run batch jobs with Matlab, be sure to request -appropriate license features with the PBS Pro scheduler, at least the " --l __feature__matlab__MATLAB=1" for EDU variant of Matlab. More -information about how to check the license features states and how to -request them with PBS Pro, please [look -here](../isv_licenses.html). - -In case of non-interactive session please read the [following -information](../isv_licenses.html) on how to modify the -qsub command to test for available licenses prior getting the resource -allocation. - -### Matlab Distributed Computing Engines start up time - -Starting Matlab workers is an expensive process that requires certain -amount of time. For your information please see the following table: - - |compute nodes|number of workers|start-up time[s]| - |---|---|---| - |16|384|831| - |8|192|807| - |4|96|483| - |2|48|16| - -MATLAB on UV2000 ------------------ - -UV2000 machine available in queue "qfat" can be used for MATLAB -computations. This is a SMP NUMA machine with large amount of RAM, which -can be beneficial for certain types of MATLAB jobs. CPU cores are -allocated in chunks of 8 for this machine. - -You can use MATLAB on UV2000 in two parallel modes : - -### Threaded mode - -Since this is a SMP machine, you can completely avoid using Parallel -Toolbox and use only MATLAB's threading. MATLAB will automatically -detect the number of cores you have allocated and will set -maxNumCompThreads accordingly and certain -operations, such as fft, , eig, svd, -etc. will be automatically run in threads. The advantage of this mode is -that you don't need to modify your existing sequential codes. - -### Local cluster mode - -You can also use Parallel Toolbox on UV2000. Use l[ocal cluster -mode](copy_of_matlab.html#parallel-matlab-batch-job-in-local-mode), -"SalomonPBSPro" profile will not work. - - - - - diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/introduction.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/introduction.md index 6c425451b294e75ad697aaca3de1dfe83491ab1a..26594af551c0d33f51e72b56d5373fd90da1d060 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/introduction.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/introduction.md @@ -1,48 +1,42 @@ -Numerical languages +Numerical languages =================== Interpreted languages for numerical computations and analysis - - Introduction ------------ - -This section contains a collection of high-level interpreted languages, -primarily intended for numerical computations. 
+This section contains a collection of high-level interpreted languages, primarily intended for numerical computations. Matlab ------ +MATLAB®^ is a high-level language and interactive environment for numerical computation, visualization, and programming. -MATLAB®^ is a high-level language and interactive environment for -numerical computation, visualization, and programming. - +```bash $ module load MATLAB/2015b-EDU $ matlab +``` -Read more at the [Matlab -page](matlab.html). +Read more at the [Matlab page](matlab.md). Octave ------ +GNU Octave is a high-level interpreted language, primarily intended for numerical computations. The Octave language is quite similar to Matlab so that most programs are easily portable. -GNU Octave is a high-level interpreted language, primarily intended for -numerical computations. The Octave language is quite similar to Matlab -so that most programs are easily portable. - +```bash $ module load Octave $ octave +``` -Read more at the [Octave page](octave.html). +Read more at the [Octave page](octave.md). R -- +--- -The R is an interpreted language and environment for statistical -computing and graphics. +The R is an interpreted language and environment for statistical computing and graphics. +```bash $ module load R $ R +``` -Read more at the [R page](r.html). - +Read more at the [R page](r.md). \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab 2013-2014.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab 2013-2014.md new file mode 100644 index 0000000000000000000000000000000000000000..3ccc8a4f114dd286e1df2441bd31207a5d304323 --- /dev/null +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab 2013-2014.md @@ -0,0 +1,205 @@ +Matlab 2013-2014 +================ + +Introduction +------------ +>This document relates to the old versions R2013 and R2014. For MATLAB 2015, please use [this documentation instead](copy_of_matlab.html). + +Matlab is available in the latest stable version. There are always two variants of the release: + +- Non commercial or so called EDU variant, which can be used for common research and educational purposes. +- Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available. + +To load the latest version of Matlab load the module + +```bash + $ module load matlab +``` + +By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use + +```bash + $ module avail matlab +``` + +If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use Matlab on the compute nodes via PBS Pro scheduler. + +If you require the Matlab GUI, please follow the general informations about [running graphical applications](https://docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). 
+ +Matlab GUI is quite slow using the X forwarding built in the PBS (qsub -X), so using X11 display redirection either via SSH or directly by xauth (please see the "GUI Applications on Compute Nodes over VNC" part [here](https://docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/resolveuid/11e53ad0d2fd4c5187537f4baeedff33)) is recommended. + +To run Matlab with GUI, use + +```bash + $ matlab +``` + +To run Matlab in text mode, without the Matlab Desktop GUI environment, use + +```bash```bash + $ matlab -nodesktop -nosplash +``` + +plots, images, etc... will be still available. + +Running parallel Matlab using Distributed Computing Toolbox / Engine +-------------------------------------------------------------------- +Recommended parallel mode for running parallel Matlab on Anselm is MPIEXEC mode. In this mode user allocates resources through PBS prior to starting Matlab. Once resources are granted the main Matlab instance is started on the first compute node assigned to job by PBS and workers are started on all remaining nodes. User can use both interactive and non-interactive PBS sessions. This mode guarantees that the data processing is not performed on login nodes, but all processing is on compute nodes. + + + +For the performance reasons Matlab should use system MPI. On Anselm the supported MPI implementation for Matlab is Intel MPI. To switch to system MPI user has to override default Matlab setting by creating new configuration file in its home directory. The path and file name has to be exactly the same as in the following listing: + +```bash + $ vim ~/matlab/mpiLibConf.m + + function [lib, extras] = mpiLibConf + %MATLAB MPI Library overloading for Infiniband Networks + + mpich = '/opt/intel/impi/4.1.1.036/lib64/'; + + disp('Using Intel MPI 4.1.1.036 over Infiniband') + + lib = strcat(mpich, 'libmpich.so'); + mpl = strcat(mpich, 'libmpl.so'); + opa = strcat(mpich, 'libopa.so'); + + extras = {}; +``` + +System MPI library allows Matlab to communicate through 40Gbps Infiniband QDR interconnect instead of slower 1Gb ethernet network. + +>Please note: The path to MPI library in "mpiLibConf.m" has to match with version of loaded Intel MPI module. In this example the version 4.1.1.036 of Iintel MPI is used by Matlab and therefore module impi/4.1.1.036 has to be loaded prior to starting Matlab. + +### Parallel Matlab interactive session + +Once this file is in place, user can request resources from PBS. Following example shows how to start interactive session with support for Matlab GUI. For more information about GUI based applications on Anselm see [this page](https://docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). + +```bash + $ xhost + + $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=4:ncpus=16:mpiprocs=16 -l walltime=00:30:00 + -l feature__matlab__MATLAB=1 +``` + +This qsub command example shows how to run Matlab with 32 workers in following configuration: 2 nodes (use all 16 cores per node) and 16 workers = mpirocs per node (-l select=2:ncpus=16:mpiprocs=16). If user requires to run smaller number of workers per node then the "mpiprocs" parameter has to be changed. + +The second part of the command shows how to request all necessary licenses. In this case 1 Matlab-EDU license and 32 Distributed Computing Engines licenses. 
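
For illustration, if you only need 8 workers per node on 2 nodes (16 workers in total), the resource request from the example above could be adjusted like this (a sketch only; NONE-0-0 is just a placeholder project ID):

```bash
 $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=8 -l walltime=00:30:00 -l feature__matlab__MATLAB=1
```

Remember to set the matlabpool size in your Matlab script to match the total number of workers requested this way (here 2 x 8 = 16).
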
+ +Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: + +```bash + cn79$ module load matlab/R2013a-EDU + cn79$ module load impi/4.1.1.036 + cn79$ matlab & +``` + +### Parallel Matlab batch job + +To run matlab in batch mode, write an matlab script, then write a bash jobscript and execute via the qsub command. By default, matlab will execute one matlab worker instance per allocated core. + +```bash + #!/bin/bash + #PBS -A PROJECT ID + #PBS -q qprod + #PBS -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 + + # change to shared scratch directory + SCR=/scratch/$USER/$PBS_JOBID + mkdir -p $SCR ; cd $SCR || exit + + # copy input file to scratch + cp $PBS_O_WORKDIR/matlabcode.m . + + # load modules + module load matlab/R2013a-EDU + module load impi/4.1.1.036 + + # execute the calculation + matlab -nodisplay -r matlabcode > output.out + + # copy output file to home + cp output.out $PBS_O_WORKDIR/. +``` + +This script may be submitted directly to the PBS workload manager via the qsub command. The inputs and matlab script are in matlabcode.m file, outputs in output.out file. Note the missing .m extension in the matlab -r matlabcodefile call, **the .m must not be included**. Note that the **shared /scratch must be used**. Further, it is **important to include quit** statement at the end of the matlabcode.m script. + +Submit the jobscript using qsub + +```bash + $ qsub ./jobscript +``` + +### Parallel Matlab program example + +The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. + +```bash + sched = findResource('scheduler', 'type', 'mpiexec'); + set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun'); + set(sched, 'EnvironmentSetMethod', 'setenv'); +``` + +This script creates scheduler object "sched" of type "mpiexec" that starts workers using mpirun tool. To use correct version of mpirun, the second line specifies the path to correct version of system Intel MPI library. + +>Please note: Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling matlabpool(sched, ...) function. + +The last step is to start matlabpool with "sched" object and correct number of workers. In this case qsub asked for total number of 32 cores, therefore the number of workers is also set to 32. + +```bash + matlabpool(sched,32); + + + ... parallel code ... + + + matlabpool close +``` + +The complete example showing how to use Distributed Computing Toolbox is show here. + +```bash + sched = findResource('scheduler', 'type', 'mpiexec'); + set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun') + set(sched, 'EnvironmentSetMethod', 'setenv') + set(sched, 'SubmitArguments', '') + sched + + matlabpool(sched,32); + + n=2000; + + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') + +    T = W*x; % Calculation performed on labs, in parallel. +             % T and W are both codistributed arrays here. + end + T; + whos        % T and W are both distributed arrays here. + + matlabpool close + quit +``` + +You can copy and paste the example in a .m file and execute. Note that the matlabpool size should correspond to **total number of cores** available on allocated nodes. 
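
To avoid hard-coding the worker count in the jobscript, you can read it from the PBS node file, which (with mpiprocs set as in the examples above) should contain one line per granted MPI slot. A minimal sketch; the NWORKERS variable name is arbitrary and the echo is only for verification:

```bash
 # inside the jobscript, after PBS has granted the resources
 NWORKERS=$(wc -l < $PBS_NODEFILE)     # one line per mpiprocs slot on all allocated nodes
 echo "PBS granted $NWORKERS worker slots"
```

This number should match the value passed to matlabpool(sched, ...) in your Matlab script.
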
+ +### Non-interactive Session and Licenses + +If you want to run batch jobs with Matlab, be sure to request appropriate license features with the PBS Pro scheduler, at least the " -l _feature_matlab_MATLAB=1" for EDU variant of Matlab. More information about how to check the license features states and how to request them with PBS Pro, please [look here](../isv_licenses.html). + +In case of non-interactive session please read the [following information](../isv_licenses.html) on how to modify the qsub command to test for available licenses prior getting the resource allocation. + +### Matlab Distributed Computing Engines start up time + +Starting Matlab workers is an expensive process that requires certain amount of time. For your information please see the following table: + + |compute nodes|number of workers|start-up time[s]| + |---|---|---| + |16|256|1008| + |8|128|534| + |4|64|333| + |2|32|210| \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md index 9b6b8a062e6c52bfc5a860b3078ece7ea9e14be2..b57aa7ee2ba31b78dba59595f3347abe21cc12c5 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/matlab.md @@ -1,213 +1,157 @@ -Matlab 2013-2014 -================ - - +Matlab +====== Introduction ------------ +Matlab is available in versions R2015a and R2015b. There are always two variants of the release: -This document relates to the old versions R2013 and R2014. For MATLAB -2015, please use [this documentation -instead](copy_of_matlab.html). - -Matlab is available in the latest stable version. There are always two -variants of the release: - -- Non commercial or so called EDU variant, which can be used for - common research and educational purposes. -- Commercial or so called COM variant, which can used also for - commercial activities. The licenses for commercial variant are much - more expensive, so usually the commercial variant has only subset of - features compared to the EDU available. - - +- Non commercial or so called EDU variant, which can be used for common research and educational purposes. +- Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available. To load the latest version of Matlab load the module - $ module load matlab +```bash + $ module load MATLAB +``` -By default the EDU variant is marked as default. If you need other -version or variant, load the particular version. To obtain the list of -available versions use +By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use - $ module avail matlab +```bash + $ module avail MATLAB +``` -If you need to use the Matlab GUI to prepare your Matlab programs, you -can use Matlab directly on the login nodes. But for all computations use -Matlab on the compute nodes via PBS Pro scheduler. +If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use Matlab on the compute nodes via PBS Pro scheduler. 
-If you require the Matlab GUI, please follow the general informations -about [running graphical -applications](https://docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). +If you require the Matlab GUI, please follow the general informations about [running graphical applications](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). -Matlab GUI is quite slow using the X forwarding built in the PBS (qsub --X), so using X11 display redirection either via SSH or directly by -xauth (please see the "GUI Applications on Compute Nodes over VNC" part -[here](https://docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/resolveuid/11e53ad0d2fd4c5187537f4baeedff33)) -is recommended. +Matlab GUI is quite slow using the X forwarding built in the PBS (qsub -X), so using X11 display redirection either via SSH or directly by xauth (please see the "GUI Applications on Compute Nodes over VNC" part [here](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html)) is recommended. To run Matlab with GUI, use +```bash $ matlab +``` -To run Matlab in text mode, without the Matlab Desktop GUI environment, -use +To run Matlab in text mode, without the Matlab Desktop GUI environment, use +```bash $ matlab -nodesktop -nosplash +``` plots, images, etc... will be still available. Running parallel Matlab using Distributed Computing Toolbox / Engine --------------------------------------------------------------------- +------------------------------------------------------------------------ +>Distributed toolbox is available only for the EDU variant -Recommended parallel mode for running parallel Matlab on Anselm is -MPIEXEC mode. In this mode user allocates resources through PBS prior to -starting Matlab. Once resources are granted the main Matlab instance is -started on the first compute node assigned to job by PBS and workers are -started on all remaining nodes. User can use both interactive and -non-interactive PBS sessions. This mode guarantees that the data -processing is not performed on login nodes, but all processing is on -compute nodes. +The MPIEXEC mode available in previous versions is no longer available in MATLAB 2015. Also, the programming interface has changed. Refer to [Release Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1). -  +Delete previously used file mpiLibConf.m, we have observed crashes when using Intel MPI. -For the performance reasons Matlab should use system MPI. On Anselm the -supported MPI implementation for Matlab is Intel MPI. To switch to -system MPI user has to override default Matlab setting by creating new -configuration file in its home directory. The path and file name has to -be exactly the same as in the following listing: +To use Distributed Computing, you first need to setup a parallel profile. 
We have provided the profile for you, you can either import it in MATLAB command line: - $ vim ~/matlab/mpiLibConf.m +```bash + >> parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonPBSPro.settings') - function [lib, extras] = mpiLibConf - %MATLAB MPI Library overloading for Infiniband Networks + ans = - mpich = '/opt/intel/impi/4.1.1.036/lib64/'; + SalomonPBSPro +``` - disp('Using Intel MPI 4.1.1.036 over Infiniband') +Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click Import and navigate to: - lib = strcat(mpich, 'libmpich.so'); - mpl = strcat(mpich, 'libmpl.so'); - opa = strcat(mpich, 'libopa.so'); +/apps/all/MATLAB/2015a-EDU/SalomonPBSPro.settings - extras = {}; +With the new mode, MATLAB itself launches the workers via PBS, so you can either use interactive mode or a batch mode on one node, but the actual parallel processing will be done in a separate job started by MATLAB itself. Alternatively, you can use "local" mode to run parallel code on just a single node. -System MPI library allows Matlab to communicate through 40Gbps -Infiniband QDR interconnect instead of slower 1Gb ethernet network. - -Please note: The path to MPI library in "mpiLibConf.m" has to match with -version of loaded Intel MPI module. In this example the version -4.1.1.036 of Iintel MPI is used by Matlab and therefore module -impi/4.1.1.036 has to be loaded prior to starting Matlab. +>The profile is confusingly named Salomon, but you can use it also on Anselm. ### Parallel Matlab interactive session -Once this file is in place, user can request resources from PBS. -Following example shows how to start interactive session with support -for Matlab GUI. For more information about GUI based applications on -Anselm see [this -page](https://docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). +Following example shows how to start interactive session with support for Matlab GUI. For more information about GUI based applications on Anselm see [this page](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). +```bash $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=4:ncpus=16:mpiprocs=16 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 + $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 + -l feature__matlab__MATLAB=1 +``` -This qsub command example shows how to run Matlab with 32 workers in -following configuration: 2 nodes (use all 16 cores per node) and 16 -workers = mpirocs per node (-l select=2:ncpus=16:mpiprocs=16). If user -requires to run smaller number of workers per node then the "mpiprocs" -parameter has to be changed. +This qsub command example shows how to run Matlab on a single node. -The second part of the command shows how to request all necessary -licenses. In this case 1 Matlab-EDU license and 32 Distributed Computing -Engines licenses. +The second part of the command shows how to request all necessary licenses. In this case 1 Matlab-EDU license and 48 Distributed Computing Engines licenses. 
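
Once Matlab is running (see the module commands below), you can verify that the imported profile is visible before creating the parallel pool; a quick check from the MATLAB command line (output omitted, as it depends on your configuration):

```bash
 >> parallel.clusterProfiles   % the list should now include SalomonPBSPro
```
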
-Once the access to compute nodes is granted by PBS, user can load -following modules and start Matlab: +Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: - cn79$ module load matlab/R2013a-EDU - cn79$ module load impi/4.1.1.036 - cn79$ matlab & +```bash + r1i0n17$ module load MATLAB/2015b-EDU + r1i0n17$ matlab & +``` -### Parallel Matlab batch job +### Parallel Matlab batch job in Local mode -To run matlab in batch mode, write an matlab script, then write a bash -jobscript and execute via the qsub command. By default, matlab will -execute one matlab worker instance per allocated core. +To run matlab in batch mode, write an matlab script, then write a bash jobscript and execute via the qsub command. By default, matlab will execute one matlab worker instance per allocated core. +```bash #!/bin/bash #PBS -A PROJECT ID #PBS -q qprod - #PBS -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 + #PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1 # change to shared scratch directory - SCR=/scratch/$USER/$PBS_JOBID + SCR=/scratch/work/user/$USER/$PBS_JOBID mkdir -p $SCR ; cd $SCR || exit - # copy input file to scratch + # copy input file to scratch cp $PBS_O_WORKDIR/matlabcode.m . # load modules - module load matlab/R2013a-EDU - module load impi/4.1.1.036 + module load MATLAB/2015a-EDU # execute the calculation matlab -nodisplay -r matlabcode > output.out # copy output file to home cp output.out $PBS_O_WORKDIR/. +``` -This script may be submitted directly to the PBS workload manager via -the qsub command. The inputs and matlab script are in matlabcode.m -file, outputs in output.out file. Note the missing .m extension in the -matlab -r matlabcodefile call, **the .m must not be included**. Note -that the **shared /scratch must be used**. Further, it is **important to -include quit** statement at the end of the matlabcode.m script. +This script may be submitted directly to the PBS workload manager via the qsub command. The inputs and matlab script are in matlabcode.m file, outputs in output.out file. Note the missing .m extension in the matlab -r matlabcodefile call, **the .m must not be included**. Note that the **shared /scratch must be used**. Further, it is **important to include quit** statement at the end of the matlabcode.m script. Submit the jobscript using qsub +```bash $ qsub ./jobscript +``` + +### Parallel Matlab Local mode program example + +The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. -### Parallel Matlab program example +```bash + cluster = parcluster('local') +``` -The last part of the configuration is done directly in the user Matlab -script before Distributed Computing Toolbox is started. +This script creates scheduler object "cluster" of type "local" that starts workers locally. - sched = findResource('scheduler', 'type', 'mpiexec'); - set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun'); - set(sched, 'EnvironmentSetMethod', 'setenv'); +>Please note: Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling parpool(sched, ...) function. -This script creates scheduler object "sched" of type "mpiexec" that -starts workers using mpirun tool. To use correct version of mpirun, the -second line specifies the path to correct version of system Intel MPI -library. +The last step is to start matlabpool with "cluster" object and correct number of workers. 
We have 24 cores per node, so we start 24 workers. -Please note: Every Matlab script that needs to initialize/use matlabpool -has to contain these three lines prior to calling matlabpool(sched, ...) -function. +```bash + parpool(cluster,16); -The last step is to start matlabpool with "sched" object and correct -number of workers. In this case qsub asked for total number of 32 cores, -therefore the number of workers is also set to 32. - matlabpool(sched,32); - - ... parallel code ... - - - matlabpool close -The complete example showing how to use Distributed Computing Toolbox is -show here. - sched = findResource('scheduler', 'type', 'mpiexec'); - set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun') - set(sched, 'EnvironmentSetMethod', 'setenv') - set(sched, 'SubmitArguments', '') - sched + parpool close +``` +The complete example showing how to use Distributed Computing Toolbox in local mode is shown here. - matlabpool(sched,32); +```bash + cluster = parcluster('local'); + cluster + + parpool(cluster,24); n=2000; @@ -217,47 +161,117 @@ show here. x = distributed(x); spmd [~, name] = system('hostname') -    +    T = W*x; % Calculation performed on labs, in parallel.             % T and W are both codistributed arrays here. end T; whos        % T and W are both distributed arrays here. - matlabpool close + parpool close quit +``` + +You can copy and paste the example in a .m file and execute. Note that the parpool size should correspond to **total number of cores** available on allocated nodes. + +### Parallel Matlab Batch job using PBS mode (workers spawned in a separate job) + +This mode uses PBS scheduler to launch the parallel pool. It uses the SalomonPBSPro profile that needs to be imported to Cluster Manager, as mentioned before. This methodod uses MATLAB's PBS Scheduler interface - it spawns the workers in a separate job submitted by MATLAB using qsub. + +This is an example of m-script using PBS mode: + +```bash + cluster = parcluster('SalomonPBSPro'); + set(cluster, 'SubmitArguments', '-A OPEN-0-0'); + set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=16'); + set(cluster, 'NumWorkers', 160); + + pool = parpool(cluster, 160); + + n=2000; + + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') + + T = W*x; % Calculation performed on labs, in parallel. + % T and W are both codistributed arrays here. + end + whos % T and W are both distributed arrays here. + + % shut down parallel pool + delete(pool) +``` + +Note that we first construct a cluster object using the imported profile, then set some important options, namely: SubmitArguments, where you need to specify accounting id, and ResourceTemplate, where you need to specify number of nodes to run the job. + +You can start this script using batch mode the same way as in Local mode example. + +### Parallel Matlab Batch with direct launch (workers spawned within the existing job) + +This method is a "hack" invented by us to emulate the mpiexec functionality found in previous MATLAB versions. We leverage the MATLAB Generic Scheduler interface, but instead of submitting the workers to PBS, we launch the workers directly within the running job, thus we avoid the issues with master script and workers running in separate jobs (issues with license not available, waiting for the worker's job to spawn etc.) + +Please note that this method is experimental. -You can copy and paste the example in a .m file and execute. 
Note that -the matlabpool size should correspond to **total number of cores** -available on allocated nodes. +For this method, you need to use SalomonDirect profile, import it using [the same way as SalomonPBSPro](copy_of_matlab.html#running-parallel-matlab-using-distributed-computing-toolbox---engine) + +This is an example of m-script using direct mode: + +```bash + parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonDirect.settings') + cluster = parcluster('SalomonDirect'); + set(cluster, 'NumWorkers', 48); + + pool = parpool(cluster, 48); + + n=2000; + + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') + + T = W*x; % Calculation performed on labs, in parallel. + % T and W are both codistributed arrays here. + end + whos % T and W are both distributed arrays here. + + % shut down parallel pool + delete(pool) +``` ### Non-interactive Session and Licenses -If you want to run batch jobs with Matlab, be sure to request -appropriate license features with the PBS Pro scheduler, at least the " --l __feature__matlab__MATLAB=1" for EDU variant of Matlab. More -information about how to check the license features states and how to -request them with PBS Pro, please [look -here](../isv_licenses.html). +If you want to run batch jobs with Matlab, be sure to request appropriate license features with the PBS Pro scheduler, at least the " -l _feature_matlab_MATLAB=1" for EDU variant of Matlab. More information about how to check the license features states and how to request them with PBS Pro, please [look here](../isv_licenses.html). -In case of non-interactive session please read the [following -information](../isv_licenses.html) on how to modify the -qsub command to test for available licenses prior getting the resource -allocation. +In case of non-interactive session please read the [following information](../isv_licenses.html) on how to modify the qsub command to test for available licenses prior getting the resource allocation. ### Matlab Distributed Computing Engines start up time -Starting Matlab workers is an expensive process that requires certain -amount of time. For your information please see the following table: +Starting Matlab workers is an expensive process that requires certain amount of time. For your information please see the following table: |compute nodes|number of workers|start-up time[s]| |---|---|---| - 16 256 1008 - 8 128 534 - 4 64 333 - 2 32 210 + |16|384|831| + |8|192|807| + |4|96|483| + |2|48|16| + +MATLAB on UV2000 +----------------- +UV2000 machine available in queue "qfat" can be used for MATLAB computations. This is a SMP NUMA machine with large amount of RAM, which can be beneficial for certain types of MATLAB jobs. CPU cores are allocated in chunks of 8 for this machine. + +You can use MATLAB on UV2000 in two parallel modes: + +### Threaded mode - +Since this is a SMP machine, you can completely avoid using Parallel Toolbox and use only MATLAB's threading. MATLAB will automatically detect the number of cores you have allocated and will set maxNumCompThreads accordingly and certain operations, such as fft, , eig, svd, etc. will be automatically run in threads. The advantage of this mode is that you don't need to modify your existing sequential codes. - +### Local cluster mode +You can also use Parallel Toolbox on UV2000. Use l[ocal cluster mode](copy_of_matlab.html#parallel-matlab-batch-job-in-local-mode), "SalomonPBSPro" profile will not work. 
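For illustration, a minimal interactive allocation on the UV2000 might look like the sketch below; the qfat queue and the 8-core allocation chunks are described above, while the project ID, the number of chunks and the MATLAB module version are placeholders to adjust to your own project and to the modules actually installed.

```bash
# request one 8-core chunk in the qfat queue (OPEN-0-0 is a placeholder project ID)
$ qsub -A OPEN-0-0 -q qfat -l select=1:ncpus=8 -I

# inside the session, load MATLAB (module version is an assumption) and start it;
# threaded mode needs no further setup, for local cluster mode use parcluster('local')
# as described above (matlabcode.m is the example script name used earlier on this page)
$ module load MATLAB/2015a-EDU
$ matlab -nodisplay -r matlabcode
```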
\ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md index 6db86f6251cf7ff58eda6530eddab0dae8ab4de9..91b59910cbe8af9c0f087832bc97d03804f674dd 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/octave.md @@ -1,56 +1,40 @@ -Octave +Octave ====== - - Introduction ------------ - -GNU Octave is a high-level interpreted language, primarily intended for -numerical computations. It provides capabilities for the numerical -solution of linear and nonlinear problems, and for performing other -numerical experiments. It also provides extensive graphics capabilities -for data visualization and manipulation. Octave is normally used through -its interactive command line interface, but it can also be used to write -non-interactive programs. The Octave language is quite similar to Matlab -so that most programs are easily portable. Read more on -<http://www.gnu.org/software/octave/>*** +GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to Matlab so that most programs are easily portable. Read more on <http://www.gnu.org/software/octave/> Two versions of octave are available on Anselm, via module - Version module - ----------------------------------------------------- |---|---|----------------- - Octave 3.8.2, compiled with GCC and Multithreaded MKL Octave/3.8.2-gimkl-2.11.5 - Octave 4.0.1, compiled with GCC and Multithreaded MKL Octave/4.0.1-gimkl-2.11.5 - Octave 4.0.0, compiled with >GCC and OpenBLAS Octave/4.0.0-foss-2015g +|Version|module| +|---|---| +|Octave 3.8.2, compiled with GCC and Multithreaded MKL|Octave/3.8.2-gimkl-2.11.5| +|Octave 4.0.1, compiled with GCC and Multithreaded MKL|Octave/4.0.1-gimkl-2.11.5| +|Octave 4.0.0, compiled with >GCC and OpenBLAS|Octave/4.0.0-foss-2015g|  Modules and execution ---------------------- $ module load Octave -The octave on Anselm is linked to highly optimized MKL mathematical -library. This provides threaded parallelization to many octave kernels, -notably the linear algebra subroutines. Octave runs these heavy -calculation kernels without any penalty. By default, octave would -parallelize to 16 threads. You may control the threads by setting the -OMP_NUM_THREADS environment variable. +The octave on Anselm is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many octave kernels, notably the linear algebra subroutines. Octave runs these heavy calculation kernels without any penalty. By default, octave would parallelize to 16 threads. You may control the threads by setting the OMP_NUM_THREADS environment variable. -To run octave interactively, log in with ssh -X parameter for X11 -forwarding. Run octave: +To run octave interactively, log in with ssh -X parameter for X11 forwarding. Run octave: +```bash $ octave +``` -To run octave in batch mode, write an octave script, then write a bash -jobscript and execute via the qsub command. 
By default, octave will use -16 threads when running MKL kernels. +To run octave in batch mode, write an octave script, then write a bash jobscript and execute via the qsub command. By default, octave will use 16 threads when running MKL kernels. +```bash #!/bin/bash # change to local scratch directory cd /lscratch/$PBS_JOBID || exit - # copy input file to scratch + # copy input file to scratch cp $PBS_O_WORKDIR/octcode.m . # load octave module @@ -64,40 +48,29 @@ jobscript and execute via the qsub command. By default, octave will use #exit exit +``` -This script may be submitted directly to the PBS workload manager via -the qsub command. The inputs are in octcode.m file, outputs in -output.out file. See the single node jobscript example in the [Job -execution -section](http://support.it4i.cz/docs/anselm-cluster-documentation/resource-allocation-and-job-execution). +This script may be submitted directly to the PBS workload manager via the qsub command. The inputs are in octcode.m file, outputs in output.out file. See the single node jobscript example in the [Job execution section](http://support.it4i.cz/docs/anselm-cluster-documentation/resource-allocation-and-job-execution). -The octave c compiler mkoctfile calls the GNU gcc 4.8.1 for compiling -native c code. This is very useful for running native c subroutines in -octave environment. +The octave c compiler mkoctfile calls the GNU gcc 4.8.1 for compiling native c code. This is very useful for running native c subroutines in octave environment. +```bash $ mkoctfile -v +``` -Octave may use MPI for interprocess communication -This functionality is currently not supported on Anselm cluster. In case -you require the octave interface to MPI, please contact [Anselm -support](https://support.it4i.cz/rt/). +Octave may use MPI for interprocess communication This functionality is currently not supported on Anselm cluster. In case you require the octave interface to MPI, please contact [Anselm support](https://support.it4i.cz/rt/). Xeon Phi Support ---------------- - -Octave may take advantage of the Xeon Phi accelerators. This will only -work on the [Intel Xeon Phi](../intel-xeon-phi.html) -[accelerated nodes](../../compute-nodes.html). +Octave may take advantage of the Xeon Phi accelerators. This will only work on the [Intel Xeon Phi](../intel-xeon-phi.html) [accelerated nodes](../../compute-nodes.html). ### Automatic offload support -Octave can accelerate BLAS type operations (in particular the Matrix -Matrix multiplications] on the Xeon Phi accelerator, via [Automatic -Offload using the MKL -library](../intel-xeon-phi.html#section-3) +Octave can accelerate BLAS type operations (in particular the Matrix Matrix multiplications] on the Xeon Phi accelerator, via [Automatic Offload using the MKL library](../intel-xeon-phi.html#section-3) Example +```bash $ export OFFLOAD_REPORT=2 $ export MKL_MIC_ENABLE=1 $ module load octave @@ -111,38 +84,26 @@ Example [MKL] [MIC 00] [AO DGEMM CPU->MIC Data]   1347200000 bytes [MKL] [MIC 00] [AO DGEMM MIC->CPU Data]   2188800000 bytes Elapsed time is 2.93701 seconds. +``` -In this example, the calculation was automatically divided among the CPU -cores and the Xeon Phi MIC accelerator, reducing the total runtime from -6.3 secs down to 2.9 secs. +In this example, the calculation was automatically divided among the CPU cores and the Xeon Phi MIC accelerator, reducing the total runtime from 6.3 secs down to 2.9 secs. 
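The same automatic offload settings can also be used in a batch job. The sketch below follows the generic Octave jobscript shown earlier on this page (octcode.m and output.out are the example file names used there); the queue and resource request are placeholders, and the job has to land on one of the Xeon Phi accelerated nodes for the offload to take effect.

```bash
#!/bin/bash
#PBS -q qprod
#PBS -l select=1:ncpus=16
# placeholder resource request - adjust it so the job runs on an accelerated node

# change to local scratch and copy the input, as in the generic jobscript above
cd /lscratch/$PBS_JOBID || exit
cp $PBS_O_WORKDIR/octcode.m .

# enable MKL automatic offload, as in the interactive example above
export OFFLOAD_REPORT=2
export MKL_MIC_ENABLE=1

# load the octave module and run the calculation
module load octave
octave -q ./octcode.m > output.out

# copy the output back to the submission directory
cp output.out $PBS_O_WORKDIR/.

exit
```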
### Native support -A version of [native](../intel-xeon-phi.html#section-4) -Octave is compiled for Xeon Phi accelerators. Some limitations apply for -this version: +A version of [native](../intel-xeon-phi.html#section-4) Octave is compiled for Xeon Phi accelerators. Some limitations apply for this version: -- Only command line support. GUI, graph plotting etc. is - not supported. +- Only command line support. GUI, graph plotting etc. is not supported. - Command history in interactive mode is not supported. -Octave is linked with parallel Intel MKL, so it best suited for batch -processing of tasks that utilize BLAS, LAPACK and FFT operations. By -default, number of threads is set to 120, you can control this -with > OMP_NUM_THREADS environment -variable. +Octave is linked with parallel Intel MKL, so it best suited for batch processing of tasks that utilize BLAS, LAPACK and FFT operations. By default, number of threads is set to 120, you can control this with > OMP_NUM_THREADS environment +variable. -Calculations that do not employ parallelism (either by using parallel -MKL eg. via matrix operations, fork() -function, [parallel -package](http://octave.sourceforge.net/parallel/) or -other mechanism) will actually run slower than on host CPU. +>Calculations that do not employ parallelism (either by using parallel MKL eg. via matrix operations, fork() function, [parallel package](http://octave.sourceforge.net/parallel/) or other mechanism) will actually run slower than on host CPU. To use Octave on a node with Xeon Phi: +```bash $ ssh mic0 # login to the MIC card $ source /apps/tools/octave/3.8.2-mic/bin/octave-env.sh # set up environment variables - $ octave -q /apps/tools/octave/3.8.2-mic/example/test0.m # run an example - - - + $ octave -q /apps/tools/octave/3.8.2-mic/example/test0.m # run an example +``` \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md index 694a9d570eb57a6c8a23934110518d13bca2ae08..779b24cc605063b4da9ee4f69bebd535ba316876 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-languages/r.md @@ -1,76 +1,57 @@ -R -= +R +=== - - -Introduction +Introduction ------------ +The R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. -The R is a language and environment for statistical computing and -graphics. R provides a wide variety of statistical (linear and -nonlinear modelling, classical statistical tests, time-series analysis, -classification, clustering, ...) and graphical techniques, and is highly -extensible. - -One of R's strengths is the ease with which well-designed -publication-quality plots can be produced, including mathematical -symbols and formulae where needed. Great care has been taken over the -defaults for the minor design choices in graphics, but the user retains -full control. +One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. 
-Another convenience is the ease with which the C code or third party -libraries may be integrated within R. +Another convenience is the ease with which the C code or third party libraries may be integrated within R. Extensive support for parallel computing is available within R. -Read more on <http://www.r-project.org/>, -<http://cran.r-project.org/doc/manuals/r-release/R-lang.html> +Read more on <http://www.r-project.org/>, <http://cran.r-project.org/doc/manuals/r-release/R-lang.html> Modules ------- +The R version 3.0.1 is available on Anselm, along with GUI interface Rstudio -**The R version 3.0.1 is available on Anselm, along with GUI interface -Rstudio - - |Application|Version|module| - ------- |---|---|---- --------- - **R** R 3.0.1 R - |**Rstudio**|Rstudio 0.97|Rstudio| +|Application|Version|module| +|---|---| +| **R**|R 3.0.1|R| +|**Rstudio**|Rstudio 0.97|Rstudio| +```bash $ module load R +``` Execution --------- - -The R on Anselm is linked to highly optimized MKL mathematical -library. This provides threaded parallelization to many R kernels, -notably the linear algebra subroutines. The R runs these heavy -calculation kernels without any penalty. By default, the R would -parallelize to 16 threads. You may control the threads by setting the -OMP_NUM_THREADS environment variable. +The R on Anselm is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many R kernels, notably the linear algebra subroutines. The R runs these heavy calculation kernels without any penalty. By default, the R would parallelize to 16 threads. You may control the threads by setting the OMP_NUM_THREADS environment variable. ### Interactive execution -To run R interactively, using Rstudio GUI, log in with ssh -X parameter -for X11 forwarding. Run rstudio: +To run R interactively, using Rstudio GUI, log in with ssh -X parameter for X11 forwarding. Run rstudio: +```bash $ module load Rstudio $ rstudio +``` ### Batch execution -To run R in batch mode, write an R script, then write a bash jobscript -and execute via the qsub command. By default, R will use 16 threads when -running MKL kernels. +To run R in batch mode, write an R script, then write a bash jobscript and execute via the qsub command. By default, R will use 16 threads when running MKL kernels. Example jobscript: +```bash #!/bin/bash # change to local scratch directory cd /lscratch/$PBS_JOBID || exit - # copy input file to scratch + # copy input file to scratch cp $PBS_O_WORKDIR/rscript.R . # load R module @@ -84,57 +65,45 @@ Example jobscript: #exit exit +``` -This script may be submitted directly to the PBS workload manager via -the qsub command. The inputs are in rscript.R file, outputs in -routput.out file. See the single node jobscript example in the [Job -execution -section](../../resource-allocation-and-job-execution/job-submission-and-execution.html). +This script may be submitted directly to the PBS workload manager via the qsub command. The inputs are in rscript.R file, outputs in routput.out file. See the single node jobscript example in the [Job execution section](../../resource-allocation-and-job-execution/job-submission-and-execution.html). Parallel R ---------- - -Parallel execution of R may be achieved in many ways. One approach is -the implied parallelization due to linked libraries or specially enabled -functions, as [described -above](r.html#interactive-execution). 
In the following -sections, we focus on explicit parallelization, where parallel -constructs are directly stated within the R script. +Parallel execution of R may be achieved in many ways. One approach is the implied parallelization due to linked libraries or specially enabled functions, as [described above](r.html#interactive-execution). In the following sections, we focus on explicit parallelization, where parallel constructs are directly stated within the R script. Package parallel -------------------- - -The package parallel provides support for parallel computation, -including by forking (taken from package multicore), by sockets (taken -from package snow) and random-number generation. +The package parallel provides support for parallel computation, including by forking (taken from package multicore), by sockets (taken from package snow) and random-number generation. The package is activated this way: +```bash $ R > library(parallel) +``` -More information and examples may be obtained directly by reading the -documentation available in R +More information and examples may be obtained directly by reading the documentation available in R +```bash > ?parallel > library(help = "parallel") > vignette("parallel") +``` -Download the package -[parallell](package-parallel-vignette) vignette. - -The forking is the most simple to use. Forking family of functions -provide parallelized, drop in replacement for the serial apply() family -of functions. +Download the package [parallell](package-parallel-vignette) vignette. -Forking via package parallel provides functionality similar to OpenMP -construct -#omp parallel for +The forking is the most simple to use. Forking family of functions provide parallelized, drop in replacement for the serial apply() family of functions. -Only cores of single node can be utilized this way! +>Forking via package parallel provides functionality similar to OpenMP construct +>omp parallel for +> +>Only cores of single node can be utilized this way! Forking example: +```bash library(parallel) #integrand function @@ -164,47 +133,39 @@ Forking example: #print results cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) } +``` -The above example is the classic parallel example for calculating the -number Ď€. Note the **detectCores()** and **mclapply()** functions. -Execute the example as: +The above example is the classic parallel example for calculating the number Ď€. Note the **detectCores()** and **mclapply()** functions. Execute the example as: +```bash $ R --slave --no-save --no-restore -f pi3p.R +``` -Every evaluation of the integrad function runs in parallel on different -process. +Every evaluation of the integrad function runs in parallel on different process. Package Rmpi ------------ +>package Rmpi provides an interface (wrapper) to MPI APIs. -package Rmpi provides an interface (wrapper) to MPI APIs. - -It also provides interactive R slave environment. On Anselm, Rmpi -provides interface to the -[OpenMPI](../mpi-1/Running_OpenMPI.html). +It also provides interactive R slave environment. On Anselm, Rmpi provides interface to the [OpenMPI](../mpi-1/Running_OpenMPI.html). 
-Read more on Rmpi at <http://cran.r-project.org/web/packages/Rmpi/>, -reference manual is available at -<http://cran.r-project.org/web/packages/Rmpi/Rmpi.pdf> +Read more on Rmpi at <http://cran.r-project.org/web/packages/Rmpi/>, reference manual is available at <http://cran.r-project.org/web/packages/Rmpi/Rmpi.pdf> When using package Rmpi, both openmpi and R modules must be loaded +```bash $ module load openmpi $ module load R - -Rmpi may be used in three basic ways. The static approach is identical -to executing any other MPI programm. In addition, there is Rslaves -dynamic MPI approach and the mpi.apply approach. In the following -section, we will use the number Ď€ integration example, to illustrate all -these concepts. +``` +Rmpi may be used in three basic ways. The static approach is identical to executing any other MPI programm. In addition, there is Rslaves dynamic MPI approach and the mpi.apply approach. In the following section, we will use the number Ď€ integration example, to illustrate all these concepts. ### static Rmpi -Static Rmpi programs are executed via mpiexec, as any other MPI -programs. Number of processes is static - given at the launch time. +Static Rmpi programs are executed via mpiexec, as any other MPI programs. Number of processes is static - given at the launch time. Static Rmpi example: +```cpp library(Rmpi) #integrand function @@ -246,21 +207,23 @@ Static Rmpi example: } mpi.quit() +``` + +The above is the static MPI example for calculating the number Ď€. Note the **library(Rmpi)** and **mpi.comm.dup()** function calls. -The above is the static MPI example for calculating the number Ď€. Note -the **library(Rmpi)** and **mpi.comm.dup()** function calls. Execute the example as: +```bash $ mpiexec R --slave --no-save --no-restore -f pi3.R +``` ### dynamic Rmpi -Dynamic Rmpi programs are executed by calling the R directly. openmpi -module must be still loaded. The R slave processes will be spawned by a -function call within the Rmpi program. +Dynamic Rmpi programs are executed by calling the R directly. openmpi module must be still loaded. The R slave processes will be spawned by a function call within the Rmpi program. Dynamic Rmpi example: +```cpp #integrand function f <- function(i,h) { x <- h*(i-0.5) @@ -316,26 +279,27 @@ Dynamic Rmpi example: workerpi() mpi.quit() +``` + +The above example is the dynamic MPI example for calculating the number Ď€. Both master and slave processes carry out the calculation. Note the mpi.spawn.Rslaves(), mpi.bcast.Robj2slave()** and the mpi.bcast.cmd()** function calls. -The above example is the dynamic MPI example for calculating the number -Ď€. Both master and slave processes carry out the calculation. Note the -mpi.spawn.Rslaves(), mpi.bcast.Robj2slave()** and the -mpi.bcast.cmd()** function calls. Execute the example as: +```bash $ R --slave --no-save --no-restore -f pi3Rslaves.R +``` ### mpi.apply Rmpi mpi.apply is a specific way of executing Dynamic Rmpi programs. -mpi.apply() family of functions provide MPI parallelized, drop in -replacement for the serial apply() family of functions. +>mpi.apply() family of functions provide MPI parallelized, drop in replacement for the serial apply() family of functions. Execution is identical to other dynamic Rmpi programs. mpi.apply Rmpi example: +```bash #integrand function f <- function(i,h) { x <- h*(i-0.5) @@ -381,18 +345,15 @@ mpi.apply Rmpi example: } mpi.quit() +``` -The above is the mpi.apply MPI example for calculating the number Ď€. 
-Only the slave processes carry out the calculation. Note the -mpi.parSapply(), ** function call. The package -parallel -[example](r.html#package-parallel)[above](r.html#package-parallel){.anchor -may be trivially adapted (for much better performance) to this structure -using the mclapply() in place of mpi.parSapply(). +The above is the mpi.apply MPI example for calculating the number Ď€. Only the slave processes carry out the calculation. Note the **mpi.parSapply()**, function call. The package parallel [example](r.html#package-parallel)[above](r.html#package-parallel) may be trivially adapted (for much better performance) to this structure using the mclapply() in place of mpi.parSapply(). Execute the example as: +```bash $ R --slave --no-save --no-restore -f pi3parSapply.R +``` Combining parallel and Rmpi --------------------------- @@ -402,13 +363,11 @@ Currently, the two packages can not be combined for hybrid calculations. Parallel execution ------------------ -The R parallel jobs are executed via the PBS queue system exactly as any -other parallel jobs. User must create an appropriate jobscript and -submit via the **qsub** +The R parallel jobs are executed via the PBS queue system exactly as any other parallel jobs. User must create an appropriate jobscript and submit via the **qsub** -Example jobscript for [static Rmpi](r.html#static-rmpi) -parallel R execution, running 1 process per core: +Example jobscript for [static Rmpi](r.html#static-rmpi) parallel R execution, running 1 process per core: +```bash #!/bin/bash #PBS -q qprod #PBS -N Rjob @@ -418,7 +377,7 @@ parallel R execution, running 1 process per core: SCRDIR=/scratch/$USER/myjob cd $SCRDIR || exit - # copy input file to scratch + # copy input file to scratch cp $PBS_O_WORKDIR/rscript.R . # load R and openmpi module @@ -433,9 +392,6 @@ parallel R execution, running 1 process per core: #exit exit +``` -For more information about jobscripts and MPI execution refer to the -[Job -submission](../../resource-allocation-and-job-execution/job-submission-and-execution.html) -and general [MPI](../mpi-1.html) sections. - +For more information about jobscripts and MPI execution refer to the [Job submission](../../resource-allocation-and-job-execution/job-submission-and-execution.html) and general [MPI](../mpi-1.html) sections. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/fftw.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/fftw.md index dc843fe8be9b69939bcbdbb202714a220d4cfdfd..97093adebedae07dc396ff3803068974c2f48f6f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/fftw.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/fftw.md @@ -1,48 +1,33 @@ -FFTW +FFTW ==== The discrete Fourier transform in one or more dimensions, MPI parallel - +FFTW is a C subroutine library for computing the discrete Fourier transform in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST). The FFTW library allows for MPI parallel, in-place discrete Fourier transform, with data distributed over number of nodes. - +Two versions, **3.3.3** and **2.1.5** of FFTW are available on Anselm, each compiled for **Intel MPI** and **OpenMPI** using **intel** and **gnu** compilers. 
These are available via modules: -FFTW is a C subroutine library for computing the discrete Fourier -transform in one or more dimensions, of arbitrary input size, and of -both real and complex data (as well as of even/odd data, i.e. the -discrete cosine/sine transforms or DCT/DST). The FFTW library allows for -MPI parallel, in-place discrete Fourier transform, with data distributed -over number of nodes. - -Two versions, **3.3.3** and **2.1.5** of FFTW are available on Anselm, -each compiled for **Intel MPI** and **OpenMPI** using **intel** and -gnu** compilers. These are available via modules: - -<col width="25%" /> -<col width="25%" /> -<col width="25%" /> -<col width="25%" /> |Version |Parallelization |module |linker options | | --- | --- | |FFTW3 gcc3.3.3 |pthread, OpenMP |fftw3/3.3.3-gcc |-lfftw3, -lfftw3_threads-lfftw3_omp | - |FFTW3 icc3.3.3\ |pthread, OpenMP |fftw3 |-lfftw3, -lfftw3_threads-lfftw3_omp | - |FFTW2 gcc2.1.5\ |pthread |fftw2/2.1.5-gcc |-lfftw, -lfftw_threads | + |FFTW3 icc3.3.3 |pthread, OpenMP |fftw3 |-lfftw3, -lfftw3_threads-lfftw3_omp | + |FFTW2 gcc2.1.5 |pthread |fftw2/2.1.5-gcc |-lfftw, -lfftw_threads | |FFTW2 icc2.1.5 |pthread |fftw2 |-lfftw, -lfftw_threads | |FFTW3 gcc3.3.3 |OpenMPI |fftw-mpi3/3.3.3-gcc |-lfftw3_mpi | |FFTW3 icc3.3.3 |Intel MPI |fftw3-mpi |-lfftw3_mpi | |FFTW2 gcc2.1.5 |OpenMPI |fftw2-mpi/2.1.5-gcc |-lfftw_mpi | |FFTW2 gcc2.1.5 |IntelMPI |fftw2-mpi/2.1.5-gcc |-lfftw_mpi | +```bash $ module load fftw3 +``` -The module sets up environment variables, required for linking and -running fftw enabled applications. Make sure that the choice of fftw -module is consistent with your choice of MPI library. Mixing MPI of -different implementations may have unpredictable results. +The module sets up environment variables, required for linking and running fftw enabled applications. Make sure that the choice of fftw module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results. Example ------- +```cpp #include <fftw3-mpi.h> int main(int argc, char **argv) { @@ -75,17 +60,17 @@ Example    MPI_Finalize(); } +``` Load modules and compile: +```bash $ module load impi intel $ module load fftw3-mpi $ mpicc testfftw3mpi.c -o testfftw3mpi.x -Wl,-rpath=$LIBRARY_PATH -lfftw3_mpi +``` - Run the example as [Intel MPI -program](../mpi-1/running-mpich2.html). - -Read more on FFTW usage on the [FFTW -website.](http://www.fftw.org/fftw3_doc/) +Run the example as [Intel MPI program](../mpi-1/running-mpich2.html). +Read more on FFTW usage on the [FFTW website.](http://www.fftw.org/fftw3_doc/) \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/gsl.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/gsl.md index 35894dad44cd6d9d3990a456db65894d75c7961e..0258e866cdd560234b95313479c1dd42f4163d1f 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/gsl.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/gsl.md @@ -1,107 +1,91 @@ -GSL +GSL === -The GNU Scientific Library. Provides a wide range of mathematical -routines. - - +The GNU Scientific Library. Provides a wide range of mathematical routines. Introduction ------------ +The GNU Scientific Library (GSL) provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total. 
The routines have been written from scratch in C, and present a modern Applications Programming Interface (API) for C programmers, allowing wrappers to be written for very high level languages. -The GNU Scientific Library (GSL) provides a wide range of mathematical -routines such as random number generators, special functions and -least-squares fitting. There are over 1000 functions in total. The -routines have been written from scratch in C, and present a modern -Applications Programming Interface (API) for C programmers, allowing -wrappers to be written for very high level languages. +The library covers a wide range of topics in numerical computing. Routines are available for the following areas: -The library covers a wide range of topics in numerical computing. -Routines are available for the following areas: - ------------------ |---|---|-------------- ------------------------ - Complex Numbers Roots of Polynomials + Complex Numbers Roots of Polynomials - Special Functions Vectors and Matrices + Special Functions Vectors and Matrices - Permutations Combinations + Permutations Combinations - Sorting BLAS Support + Sorting BLAS Support - Linear Algebra CBLAS Library + Linear Algebra CBLAS Library - Fast Fourier Transforms Eigensystems + Fast Fourier Transforms Eigensystems - Random Numbers Quadrature + Random Numbers Quadrature - Random Distributions Quasi-Random Sequences + Random Distributions Quasi-Random Sequences - Histograms Statistics + Histograms Statistics - Monte Carlo Integration N-Tuples + Monte Carlo Integration N-Tuples - Differential Equations Simulated Annealing + Differential Equations Simulated Annealing - Numerical Interpolation - Differentiation + Numerical Differentiation Interpolation - Series Acceleration Chebyshev Approximations + Series Acceleration Chebyshev Approximations - Root-Finding Discrete Hankel - Transforms + Root-Finding Discrete Hankel Transforms - Least-Squares Fitting Minimization + Least-Squares Fitting Minimization - IEEE Floating-Point Physical Constants + IEEE Floating-Point Physical Constants - Basis Splines Wavelets - ------------------ |---|---|-------------- ------------------------ + Basis Splines Wavelets Modules ------- -The GSL 1.16 is available on Anselm, compiled for GNU and Intel -compiler. These variants are available via modules: +The GSL 1.16 is available on Anselm, compiled for GNU and Intel compiler. These variants are available via modules: - Module Compiler - ----------------- |---|---|- - gsl/1.16-gcc gcc 4.8.6 - gsl/1.16-icc(default) icc +|Module|Compiler| +|---|---| +| gsl/1.16-gcc|gcc 4.8.6| +|gsl/1.16-icc(default)|icc| +```bash  $ module load gsl +``` -The module sets up environment variables, required for linking and -running GSL enabled applications. This particular command loads the -default module, which is gsl/1.16-icc +The module sets up environment variables, required for linking and running GSL enabled applications. This particular command loads the default module, which is gsl/1.16-icc Linking ------- - -Load an appropriate gsl module. Link using **-lgsl** switch to link your -code against GSL. The GSL depends on cblas API to BLAS library, which -must be supplied for linking. The BLAS may be provided, for example from -the MKL library, as well as from the BLAS GSL library (-lgslcblas). -Using the MKL is recommended. +Load an appropriate gsl module. Link using **-lgsl** switch to link your code against GSL. The GSL depends on cblas API to BLAS library, which must be supplied for linking. 
The BLAS may be provided, for example from the MKL library, as well as from the BLAS GSL library (-lgslcblas). Using the MKL is recommended. ### Compiling and linking with Intel compilers +```bash $ module load intel $ module load gsl $ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl +``` ### Compiling and linking with GNU compilers +```bash $ module load gcc $ module load mkl $ module load gsl/1.16-gcc $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lgsl +``` Example ------- +Following is an example of discrete wavelet transform implemented by GSL: -Following is an example of discrete wavelet transform implemented by -GSL: - +```cpp #include <stdio.h> #include <math.h> #include <gsl/gsl_sort.h> @@ -130,19 +114,19 @@ GSL:    {      abscoeff[i] = fabs (data[i]);    } -  +  gsl_sort_index (p, abscoeff, 1, n); -  +  for (i = 0; (i + nc) < n; i++)    data[p[i]] = 0; -  +  gsl_wavelet_transform_inverse (w, data, 1, n, work); -  +  for (i = 0; i < n; i++)    {      printf ("%gn", data[i]);    } -  +  gsl_wavelet_free (w);  gsl_wavelet_workspace_free (work); @@ -151,14 +135,13 @@ GSL:  free (p);  return 0; } +``` Load modules and compile: +```bash $ module load intel gsl icc dwt.c -o dwt.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl +``` -In this example, we compile the dwt.c code using the Intel compiler and -link it to the MKL and GSL library, note the -mkl and -lgsl options. The -library search path is compiled in, so that no modules are necessary to -run the code. - +In this example, we compile the dwt.c code using the Intel compiler and link it to the MKL and GSL library, note the -mkl and -lgsl options. The library search path is compiled in, so that no modules are necessary to run the code. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md index 7c6d9def6a429dcfb1695f4872af6cab1c8bc091..d41a4ed5ed7482692a136de9f9af18c54a31eec2 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/hdf5.md @@ -1,56 +1,35 @@ -HDF5 +HDF5 ==== Hierarchical Data Format library. Serial and MPI parallel version. - +[HDF5 (Hierarchical Data Format)](http://www.hdfgroup.org/HDF5/) is a general purpose library and file format for storing scientific data. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic objects, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs. -[HDF5 (Hierarchical Data Format)](http://www.hdfgroup.org/HDF5/) is a -general purpose library and file format for storing scientific data. -HDF5 can store two primary objects: datasets and groups. A dataset is -essentially a multidimensional array of data elements, and a group is a -structure for organizing objects in an HDF5 file. Using these two basic -objects, one can create and store almost any kind of scientific data -structure, such as images, arrays of vectors, and structured and -unstructured grids. You can also mix and match them in HDF5 files -according to your needs. 
+Versions **1.8.11** and **1.8.13** of HDF5 library are available on Anselm, compiled for **Intel MPI** and **OpenMPI** using **intel** and **gnu** compilers. These are available via modules: -Versions **1.8.11** and **1.8.13** of HDF5 library are available on -Anselm, compiled for **Intel MPI** and **OpenMPI** using **intel** and -gnu** compilers. These are available via modules: - - |Version |Parallelization |module |C linker options<th align="left">C++ linker options<th align="left">Fortran linker options | + |Version |Parallelization |module |C linker options|C++ linker options|Fortran linker options | | --- | --- | |HDF5 icc serial |pthread |hdf5/1.8.11 |$HDF5_INC $HDF5_SHLIB |$HDF5_INC $HDF5_CPP_LIB |$HDF5_INC $HDF5_F90_LIB | - |HDF5 icc parallel MPI\ |pthread, IntelMPI |hdf5-parallel/1.8.11 |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | + |HDF5 icc parallel MPI |pthread, IntelMPI |hdf5-parallel/1.8.11 |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | |HDF5 icc serial |pthread |hdf5/1.8.13 |$HDF5_INC $HDF5_SHLIB |$HDF5_INC $HDF5_CPP_LIB |$HDF5_INC $HDF5_F90_LIB | - |HDF5 icc parallel MPI\ |pthread, IntelMPI |hdf5-parallel/1.8.13 |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | - |HDF5 gcc parallel MPI\ |pthread, OpenMPI 1.6.5, gcc 4.8.1 |hdf5-parallel/1.8.11-gcc |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | - |HDF5 gcc parallel MPI\ |pthread, OpenMPI 1.6.5, gcc 4.8.1 |hdf5-parallel/1.8.13-gcc |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | - |HDF5 gcc parallel MPI\ |pthread, OpenMPI 1.8.1, gcc 4.9.0 |hdf5-parallel/1.8.13-gcc49 |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | - - + |HDF5 icc parallel MPI |pthread, IntelMPI |hdf5-parallel/1.8.13 |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | + |HDF5 gcc parallel MPI |pthread, OpenMPI 1.6.5, gcc 4.8.1 |hdf5-parallel/1.8.11-gcc |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | + |HDF5 gcc parallel MPI|pthread, OpenMPI 1.6.5, gcc 4.8.1 |hdf5-parallel/1.8.13-gcc |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | + |HDF5 gcc parallel MPI |pthread, OpenMPI 1.8.1, gcc 4.9.0 |hdf5-parallel/1.8.13-gcc49 |$HDF5_INC $HDF5_SHLIB |Not supported |$HDF5_INC $HDF5_F90_LIB | +```bash $ module load hdf5-parallel +``` + +The module sets up environment variables, required for linking and running HDF5 enabled applications. Make sure that the choice of HDF5 module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results. -The module sets up environment variables, required for linking and -running HDF5 enabled applications. Make sure that the choice of HDF5 -module is consistent with your choice of MPI library. Mixing MPI of -different implementations may have unpredictable results. - -Be aware, that GCC version of **HDF5 1.8.11** has serious performance -issues, since it's compiled with -O0 optimization flag. This version is -provided only for testing of code compiled only by GCC and IS NOT -recommended for production computations. For more informations, please -see: -<http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811> -All GCC versions of **HDF5 1.8.13** are not affected by the bug, are -compiled with -O3 optimizations and are recommended for production -computations. +>Be aware, that GCC version of **HDF5 1.8.11** has serious performance issues, since it's compiled with -O0 optimization flag. 
This version is provided only for testing of code compiled only by GCC and IS NOT recommended for production computations. For more informations, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811> +All GCC versions of **HDF5 1.8.13** are not affected by the bug, are compiled with -O3 optimizations and are recommended for production computations. Example ------- +```cpp #include "hdf5.h" #define FILE "dset.h5" @@ -94,27 +73,17 @@ Example /* Close the file. */ status = H5Fclose(file_id); } +``` Load modules and compile: +```bash $ module load intel impi $ module load hdf5-parallel $ mpicc hdf5test.c -o hdf5test.x -Wl,-rpath=$LIBRARY_PATH $HDF5_INC $HDF5_SHLIB +``` - Run the example as [Intel MPI -program](../anselm-cluster-documentation/software/mpi-1/running-mpich2.html). - -For further informations, please see the website: -<http://www.hdfgroup.org/HDF5/> - - - - - - - -class="smarterwiki-popup-bubble-tip"> - -btnI=I'm+Feeling+Lucky&btnI=I'm+Feeling+Lucky&q=HDF5%20icc%20serial%09pthread%09hdf5%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09%24HDF5_INC%20%24HDF5_CPP_LIB%09%24HDF5_INC%20%24HDF5_F90_LIB%0A%0AHDF5%20icc%20parallel%20MPI%0A%09pthread%2C%20IntelMPI%09hdf5-parallel%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09Not%20supported%09%24HDF5_INC%20%24HDF5_F90_LIB+wikipedia "Search Wikipedia"){.smarterwiki-popup-bubble +Run the example as [Intel MPI program](../anselm-cluster-documentation/software/mpi-1/running-mpich2.html). +For further informations, please see the website: <http://www.hdfgroup.org/HDF5/> \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/intel-numerical-libraries.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/intel-numerical-libraries.md index eb98c60fa8913a2ed75576197daf7dfbbe68d988..67c64a22321830163df2173d7dc44898dff2d6de 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/intel-numerical-libraries.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/intel-numerical-libraries.md @@ -1,54 +1,34 @@ -Intel numerical libraries +Intel numerical libraries ========================= Intel libraries for high performance in numerical computing - - Intel Math Kernel Library ------------------------- +Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. -Intel Math Kernel Library (Intel MKL) is a library of math kernel -subroutines, extensively threaded and optimized for maximum performance. -Intel MKL unites and provides these basic components: BLAS, LAPACK, -ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and -many more. - +```bash $ module load mkl +``` -Read more at the [Intel -MKL](../intel-suite/intel-mkl.html) page. +Read more at the [Intel MKL](../intel-suite/intel-mkl.html) page. Intel Integrated Performance Primitives --------------------------------------- +Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX is available, via module ipp. The IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. 
-Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX -is available, via module ipp. The IPP is a library of highly optimized -algorithmic building blocks for media and data applications. This -includes signal, image and frame processing algorithms, such as FFT, -FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many -more. - +```bash $ module load ipp +``` -Read more at the [Intel -IPP](../intel-suite/intel-integrated-performance-primitives.html) -page. +Read more at the [Intel IPP](../intel-suite/intel-integrated-performance-primitives.html) page. Intel Threading Building Blocks ------------------------------- +Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. -Intel Threading Building Blocks (Intel TBB) is a library that supports -scalable parallel programming using standard ISO C++ code. It does not -require special languages or compilers. It is designed to promote -scalable data parallel programming. Additionally, it fully supports -nested parallelism, so you can build larger parallel components from -smaller parallel components. To use the library, you specify tasks, not -threads, and let the library map tasks onto threads in an efficient -manner. - +```bash $ module load tbb +``` -Read more at the [Intel -TBB](../intel-suite/intel-tbb.html) page. - +Read more at the [Intel TBB](../intel-suite/intel-tbb.html) page. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md index 94aae9e9cdec1250d4392676ea5f40b0f6c767bc..201fac4dc200667a1c3dad4dbda39d82a924d9fe 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md @@ -1,52 +1,46 @@ -MAGMA for Intel Xeon Phi +MAGMA for Intel Xeon Phi ======================== -Next generation dense algebra library for heterogeneous systems with -accelerators +Next generation dense algebra library for heterogeneous systems with accelerators ### Compiling and linking with MAGMA -To be able to compile and link code with MAGMA library user has to load -following module: +To be able to compile and link code with MAGMA library user has to load following module: +```bash $ module load magma/1.3.0-mic +``` -To make compilation more user friendly module also sets these two -environment variables: +To make compilation more user friendly module also sets these two environment variables: -MAGMA_INC - contains paths to the MAGMA header files (to be used for -compilation step) +>MAGMA_INC - contains paths to the MAGMA header files (to be used for compilation step) -MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking -step).  +>MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking step). 
Compilation example: +```bash $ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall $MAGMA_INC -c testing_dgetrf_mic.cpp -o testing_dgetrf_mic.o $ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall -fPIC -Xlinker -zmuldefs -Wall -DNOCHANGE -DHOST testing_dgetrf_mic.o -o testing_dgetrf_mic $MAGMA_LIBS - - +``` ### Running MAGMA code -MAGMA implementation for Intel MIC requires a MAGMA server running on -accelerator prior to executing the user application. The server can be -started and stopped using following scripts: +MAGMA implementation for Intel MIC requires a MAGMA server running on accelerator prior to executing the user application. The server can be started and stopped using following scripts: -To start MAGMA server use: -$MAGMAROOT/start_magma_server** +>To start MAGMA server use: +**$MAGMAROOT/start_magma_server** -To stop the server use: -$MAGMAROOT/stop_magma_server** +>To stop the server use: +**$MAGMAROOT/stop_magma_server** -For deeper understanding how the MAGMA server is started, see the -following script: -$MAGMAROOT/launch_anselm_from_mic.sh** +>For deeper understanding how the MAGMA server is started, see the following script: +**$MAGMAROOT/launch_anselm_from_mic.sh** -To test if the MAGMA server runs properly we can run one of examples -that are part of the MAGMA installation: +To test if the MAGMA server runs properly we can run one of examples that are part of the MAGMA installation: +```bash [user@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic [user@cn204 ~]$ export OMP_NUM_THREADS=16 @@ -66,28 +60,17 @@ that are part of the MAGMA installation:  8256 8256    ---  ( --- )   446.97 (  0.84)    ---  9280 9280    ---  ( --- )   461.15 (  1.16)    --- 10304 10304    ---  ( --- )   500.70 (  1.46)    --- +``` - - -Please note: MAGMA contains several benchmarks and examples that can be -found in: -$MAGMAROOT/testing/** +>Please note: MAGMA contains several benchmarks and examples that can be found in: +**$MAGMAROOT/testing/** -MAGMA relies on the performance of all CPU cores as well as on the -performance of the accelerator. Therefore on Anselm number of CPU OpenMP -threads has to be set to 16:  ** -export OMP_NUM_THREADS=16** +>MAGMA relies on the performance of all CPU cores as well as on the performance of the accelerator. Therefore on Anselm number of CPU OpenMP threads has to be set to 16: +**export OMP_NUM_THREADS=16** - -See more details at [MAGMA home -page](http://icl.cs.utk.edu/magma/). +See more details at [MAGMA home page](http://icl.cs.utk.edu/magma/). References ---------- - -[1] MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors, -Jack Dongarra et. al, -[http://icl.utk.edu/projectsfiles/magma/pubs/24-MAGMA_MIC_03.pdf -](http://icl.utk.edu/projectsfiles/magma/pubs/24-MAGMA_MIC_03.pdf) - +[1] MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors, Jack Dongarra et. 
al, [http://icl.utk.edu/projectsfiles/magma/pubs/24-MAGMA_MIC_03.pdf](http://icl.utk.edu/projectsfiles/magma/pubs/24-MAGMA_MIC_03.pdf) \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/petsc.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/petsc.md index 5bf88ae2c57af507465e8de23e1a3b25c1253966..1afdbb886abd4e7a81abda016be2702b5058499e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/petsc.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/petsc.md @@ -1,89 +1,49 @@ -PETSc +PETSc ===== -PETSc is a suite of building blocks for the scalable solution of -scientific and engineering applications modelled by partial differential -equations. It supports MPI, shared memory, and GPUs through CUDA or -OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism. - - +PETSc is a suite of building blocks for the scalable solution of scientific and engineering applications modelled by partial differential equations. It supports MPI, shared memory, and GPUs through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism. Introduction ------------ - -PETSc (Portable, Extensible Toolkit for Scientific Computation) is a -suite of building blocks (data structures and routines) for the scalable -solution of scientific and engineering applications modelled by partial -differential equations. It allows thinking in terms of high-level -objects (matrices) instead of low-level objects (raw arrays). Written in -C language but can also be called from FORTRAN, C++, Python and Java -codes. It supports MPI, shared memory, and GPUs through CUDA or OpenCL, -as well as hybrid MPI-shared memory or MPI-GPU parallelism. +PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of building blocks (data structures and routines) for the scalable solution of scientific and engineering applications modelled by partial differential equations. It allows thinking in terms of high-level objects (matrices) instead of low-level objects (raw arrays). Written in C language but can also be called from FORTRAN, C++, Python and Java codes. It supports MPI, shared memory, and GPUs through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism. Resources --------- - - [project webpage](http://www.mcs.anl.gov/petsc/) - [documentation](http://www.mcs.anl.gov/petsc/documentation/) - [PETSc Users Manual (PDF)](http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) - [index of all manual pages](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html) -- PRACE Video Tutorial [part - 1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part - 2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part - 3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part - 4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part - 5](http://www.youtube.com/watch?v=iXkbLEBFjlM) +- PRACE Video Tutorial [part1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part5](http://www.youtube.com/watch?v=iXkbLEBFjlM) Modules ------- -You can start using PETSc on Anselm by loading the PETSc module. Module -names obey this pattern: +You can start using PETSc on Anselm by loading the PETSc module. Module names obey this pattern: +```bash # module load petsc/version-compiler-mpi-blas-variant, e.g. 
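    # the other variants described below follow the same naming pattern; for
    # example, a debug build would be loaded as (assuming it is installed):
    # module load petsc/3.4.4-icc-impi-mkl-dbg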
module load petsc/3.4.4-icc-impi-mkl-opt +``` -where `variant` is replaced by one of -`{dbg, opt, threads-dbg, threads-opt}`. The `opt` variant is compiled -without debugging information (no `-g` option) and with aggressive -compiler optimizations (`-O3 -xAVX`). This variant is suitable for -performance measurements and production runs. In all other cases use the -debug (`dbg`) variant, because it contains debugging information, -performs validations and self-checks, and provides a clear stack trace -and message in case of an error. The other two variants `threads-dbg` -and `threads-opt` are `dbg` and `opt`, respectively, built with [OpenMP -and pthreads threading -support](http://www.mcs.anl.gov/petsc/features/threads.html). +where `variant` is replaced by one of `{dbg, opt, threads-dbg, threads-opt}`. The `opt` variant is compiled without debugging information (no `-g` option) and with aggressive compiler optimizations (`-O3 -xAVX`). This variant is suitable for performance measurements and production runs. In all other cases use the debug (`dbg`) variant, because it contains debugging information, performs validations and self-checks, and provides a clear stack trace and message in case of an error. The other two variants `threads-dbg` and `threads-opt` are `dbg` and `opt`, respectively, built with [OpenMP and pthreads threading support](http://www.mcs.anl.gov/petsc/features/threads.html). External libraries ------------------ +PETSc needs at least MPI, BLAS and LAPACK. These dependencies are currently satisfied with Intel MPI and Intel MKL in Anselm `petsc` modules. -PETSc needs at least MPI, BLAS and LAPACK. These dependencies are -currently satisfied with Intel MPI and Intel MKL in Anselm `petsc` -modules. +PETSc can be linked with a plethora of [external numerical libraries](http://www.mcs.anl.gov/petsc/miscellaneous/external.html), extending PETSc functionality, e.g. direct linear system solvers, preconditioners or partitioners. See below a list of libraries currently included in Anselm `petsc` modules. -PETSc can be linked with a plethora of [external numerical -libraries](http://www.mcs.anl.gov/petsc/miscellaneous/external.html), -extending PETSc functionality, e.g. direct linear system solvers, -preconditioners or partitioners. See below a list of libraries currently -included in Anselm `petsc` modules. - -All these libraries can be used also alone, without PETSc. Their static -or shared program libraries are available in -`$PETSC_DIR/$PETSC_ARCH/lib` and header files in -`$PETSC_DIR/$PETSC_ARCH/include`. `PETSC_DIR` and `PETSC_ARCH` are -environment variables pointing to a specific PETSc instance based on the -petsc module loaded. +All these libraries can be used also alone, without PETSc. Their static or shared program libraries are available in +`$PETSC_DIR/$PETSC_ARCH/lib` and header files in `$PETSC_DIR/$PETSC_ARCH/include`. `PETSC_DIR` and `PETSC_ARCH` are environment variables pointing to a specific PETSc instance based on the petsc module loaded. ### Libraries linked to PETSc on Anselm (as of 11 April 2015) - dense linear algebra - [Elemental](http://libelemental.org/) - sparse linear system solvers - - [Intel MKL - Pardiso](https://software.intel.com/en-us/node/470282) + - [Intel MKL Pardiso](https://software.intel.com/en-us/node/470282) - [MUMPS](http://mumps.enseeiht.fr/) - [PaStiX](http://pastix.gforge.inria.fr/) - [SuiteSparse](http://faculty.cse.tamu.edu/davis/suitesparse.html) @@ -101,6 +61,4 @@ petsc module loaded. 
- preconditioners & multigrid - [Hypre](http://acts.nersc.gov/hypre/) - [Trilinos ML](http://trilinos.sandia.gov/packages/ml/) - - [SPAI - Sparse Approximate - Inverse](https://bitbucket.org/petsc/pkg-spai) - + - [SPAI - Sparse Approximate Inverse](https://bitbucket.org/petsc/pkg-spai) \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/trilinos.md b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/trilinos.md index ddd041eeb6ecad4e54a7d009a55fb64a29d7dc78..b19b95b2245d2a20838674139e87c4bd015b6ccd 100644 --- a/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/trilinos.md +++ b/docs.it4i/anselm-cluster-documentation/software/numerical-libraries/trilinos.md @@ -1,74 +1,50 @@ -Trilinos +Trilinos ======== -Packages for large scale scientific and engineering problems. Provides -MPI and hybrid parallelization. +Packages for large scale scientific and engineering problems. Provides MPI and hybrid parallelization. ### Introduction -Trilinos is a collection of software packages for the numerical solution -of large scale scientific and engineering problems. It is based on C++ -and feautures modern object-oriented design. Both serial as well as -parallel computations based on MPI and hybrid parallelization are -supported within Trilinos packages. +Trilinos is a collection of software packages for the numerical solution of large scale scientific and engineering problems. It is based on C++ and feautures modern object-oriented design. Both serial as well as parallel computations based on MPI and hybrid parallelization are supported within Trilinos packages. ### Installed packages -Current Trilinos installation on ANSELM contains (among others) the -following main packages - -- **Epetra** - core linear algebra package containing classes for - manipulation with serial and distributed vectors, matrices, - and graphs. Dense linear solvers are supported via interface to BLAS - and LAPACK (Intel MKL on ANSELM). Its extension **EpetraExt** - contains e.g. methods for matrix-matrix multiplication. -- **Tpetra** - next-generation linear algebra package. Supports 64bit - indexing and arbitrary data type using C++ templates. -- **Belos** - library of various iterative solvers (CG, block CG, - GMRES, block GMRES etc.). +Current Trilinos installation on ANSELM contains (among others) the following main packages + +- **Epetra** - core linear algebra package containing classes for manipulation with serial and distributed vectors, matrices, and graphs. Dense linear solvers are supported via interface to BLAS and LAPACK (Intel MKL on ANSELM). Its extension **EpetraExt** contains e.g. methods for matrix-matrix multiplication. +- **Tpetra** - next-generation linear algebra package. Supports 64bit indexing and arbitrary data type using C++ templates. +- **Belos** - library of various iterative solvers (CG, block CG, GMRES, block GMRES etc.). - **Amesos** - interface to direct sparse solvers. - **Anasazi** - framework for large-scale eigenvalue algorithms. -- **IFPACK** - distributed algebraic preconditioner (includes e.g. - incomplete LU factorization) -- **Teuchos** - common tools packages. This package contains classes - for memory management, output, performance monitoring, BLAS and - LAPACK wrappers etc. +- **IFPACK** - distributed algebraic preconditioner (includes e.g. incomplete LU factorization) +- **Teuchos** - common tools packages. 
This package contains classes for memory management, output, performance monitoring, BLAS and LAPACK wrappers etc. -For the full list of Trilinos packages, descriptions of their -capabilities, and user manuals see -[http://trilinos.sandia.gov.](http://trilinos.sandia.gov) +For the full list of Trilinos packages, descriptions of their capabilities, and user manuals see [http://trilinos.sandia.gov.](http://trilinos.sandia.gov) ### Installed version -Currently, Trilinos in version 11.2.3 compiled with Intel Compiler is -installed on ANSELM. +Currently, Trilinos in version 11.2.3 compiled with Intel Compiler is installed on ANSELM. ### Compilling against Trilinos First, load the appropriate module: +```bash $ module load trilinos +``` -For the compilation of CMake-aware project, Trilinos provides the -FIND_PACKAGE( Trilinos ) capability, which makes it easy to build -against Trilinos, including linking against the correct list of -libraries. For details, see -<http://trilinos.sandia.gov/Finding_Trilinos.txt> +For the compilation of CMake-aware project, Trilinos provides the FIND_PACKAGE( Trilinos ) capability, which makes it easy to build against Trilinos, including linking against the correct list of libraries. For details, see <http://trilinos.sandia.gov/Finding_Trilinos.txt> -For compiling using simple makefiles, Trilinos provides Makefile.export -system, which allows users to include important Trilinos variables -directly into their makefiles. This can be done simply by inserting the -following line into the makefile: +For compiling using simple makefiles, Trilinos provides Makefile.export system, which allows users to include important Trilinos variables directly into their makefiles. This can be done simply by inserting the following line into the makefile: +```bash include Makefile.export.Trilinos +``` or +```bash include Makefile.export.<package> +``` -if you are interested only in a specific Trilinos package. This will -give you access to the variables such as Trilinos_CXX_COMPILER, -Trilinos_INCLUDE_DIRS, Trilinos_LIBRARY_DIRS etc. For the detailed -description and example makefile see -<http://trilinos.sandia.gov/Export_Makefile.txt>. - +if you are interested only in a specific Trilinos package. This will give you access to the variables such as Trilinos_CXX_COMPILER, Trilinos_INCLUDE_DIRS, Trilinos_LIBRARY_DIRS etc. For the detailed description and example makefile see <http://trilinos.sandia.gov/Export_Makefile.txt>. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md b/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md index 01168124d53b6b2006bb10669c056b33eb97b2c5..dfa94b296da1f33ededea9a9b6ea0d7e5c74ac8e 100644 --- a/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md +++ b/docs.it4i/anselm-cluster-documentation/software/nvidia-cuda.md @@ -1,56 +1,54 @@ -nVidia CUDA +nVidia CUDA =========== -A guide to nVidia CUDA programming and GPU usage - - +##A guide to nVidia CUDA programming and GPU usage CUDA Programming on Anselm -------------------------- +The default programming model for GPU accelerators on Anselm is Nvidia CUDA. To set up the environment for CUDA use -The default programming model for GPU accelerators on Anselm is Nvidia -CUDA. To set up the environment for CUDA use - +```bash $ module load cuda +``` -If the user code is hybrid and uses both CUDA and MPI, the MPI -environment has to be set up as well. 
One way to do this is to use -the PrgEnv-gnu module, which sets up correct combination of GNU compiler -and MPI library. +If the user code is hybrid and uses both CUDA and MPI, the MPI environment has to be set up as well. One way to do this is to use the PrgEnv-gnu module, which sets up correct combination of GNU compiler and MPI library. +```bash $ module load PrgEnv-gnu +``` -CUDA code can be compiled directly on login1 or login2 nodes. User does -not have to use compute nodes with GPU accelerator for compilation. To -compile a CUDA source code, use nvcc compiler. +CUDA code can be compiled directly on login1 or login2 nodes. User does not have to use compute nodes with GPU accelerator for compilation. To compile a CUDA source code, use nvcc compiler. +```bash $ nvcc --version +``` -CUDA Toolkit comes with large number of examples, that can be -helpful to start with. To compile and test these examples user should -copy them to its home directory +CUDA Toolkit comes with large number of examples, that can be helpful to start with. To compile and test these examples user should copy them to its home directory +```bash $ cd ~ $ mkdir cuda-samples $ cp -R /apps/nvidia/cuda/6.5.14/samples/* ~/cuda-samples/ +``` -To compile an examples, change directory to the particular example (here -the example used is deviceQuery) and run "make" to start the compilation +To compile an examples, change directory to the particular example (here the example used is deviceQuery) and run "make" to start the compilation +```bash $ cd ~/cuda-samples/1_Utilities/deviceQuery - $ make + $ make +``` -To run the code user can use PBS interactive session to get access to a -node from qnvidia queue (note: use your project name with parameter -A -in the qsub command) and execute the binary file +To run the code user can use PBS interactive session to get access to a node from qnvidia queue (note: use your project name with parameter -A in the qsub command) and execute the binary file +```bash $ qsub -I -q qnvidia -A OPEN-0-0 $ module load cuda $ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery +``` -Expected output of the deviceQuery example executed on a node with Tesla -K20m is +Expected output of the deviceQuery example executed on a node with Tesla K20m is +```bash CUDA Device Query (Runtime API) version (CUDART static linking) Detected 1 CUDA Capable device(s) @@ -61,17 +59,17 @@ K20m is Total amount of global memory: 4800 MBytes (5032706048 bytes) (13) Multiprocessors x (192) CUDA Cores/MP: 2496 CUDA Cores GPU Clock rate: 706 MHz (0.71 GHz) - Memory Clock rate: 2600 Mhz + Memory Clock rate: 2600 Mhz Memory Bus Width: 320-bit L2 Cache Size: 1310720 bytes Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096) Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048 - Total amount of constant memory: 65536 bytes + Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 - Maximum number of threads per block: 1024 + Maximum number of threads per block: 1024 Maximum sizes of each dimension of a block: 1024 x 1024 x 64 Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 Maximum memory pitch: 2147483647 bytes @@ -80,19 +78,20 @@ K20m is Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes - Alignment requirement for Surfaces: Yes 
- Device has ECC support: Enabled + Alignment requirement for Surfaces: Yes + Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Device PCI Bus ID / PCI location ID: 2 / 0 - Compute Mode: - < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > - deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = Tesla K20m + Compute Mode: + < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > + deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = Tesla K20m +``` ### Code example -In this section we provide a basic CUDA based vector addition code -example. You can directly copy and paste the code to test it. +In this section we provide a basic CUDA based vector addition code example. You can directly copy and paste the code to test it. +```bash $ vim test.cu #define N (2048*2048) @@ -101,14 +100,14 @@ example. You can directly copy and paste the code to test it. #include <stdio.h> #include <stdlib.h> - // GPU kernel function to add two vectors + // GPU kernel function to add two vectors __global__ void add_gpu( int *a, int *b, int *c, int n){  int index = threadIdx.x + blockIdx.x * blockDim.x;  if (index < n)  c[index] = a[index] + b[index]; } - // CPU function to add two vectors + // CPU function to add two vectors void add_cpu (int *a, int *b, int *c, int n) {  for (int i=0; i < n; i++) c[i] = a[i] + b[i]; @@ -120,7 +119,7 @@ example. You can directly copy and paste the code to test it.  a[i] = rand() % 10000; // random number between 0 and 9999 } - // CPU function to compare two vectors + // CPU function to compare two vectors int compare_ints( int *a, int *b, int n ){  int pass = 0;  for (int i = 0; i < N; i++){ @@ -134,7 +133,7 @@ example. You can directly copy and paste the code to test it. } int main( void ) { -  +  int *a, *b, *c; // host copies of a, b, c  int *dev_a, *dev_b, *dev_c; // device copies of a, b, c  int size = N * sizeof( int ); // we need space for N integers @@ -148,7 +147,7 @@ example. You can directly copy and paste the code to test it.  a = (int*)malloc( size );  b = (int*)malloc( size );  c = (int*)malloc( size ); -  +  // Fill input vectors with random integer numbers  random_ints( a, N );  random_ints( b, N ); @@ -163,7 +162,7 @@ example. You can directly copy and paste the code to test it.  // copy device result back to host copy of c  cudaMemcpy( c, dev_c, size, cudaMemcpyDeviceToHost ); -  //Check the results with CPU implementation +  //Check the results with CPU implementation  int *c_h; c_h = (int*)malloc( size );  add_cpu (a, b, c_h, N);  compare_ints(c, c_h, N); @@ -178,37 +177,34 @@ example. You can directly copy and paste the code to test it.  return 0; } +``` This code can be compiled using following command +```bash $ nvcc test.cu -o test_cuda +``` -To run the code use interactive PBS session to get access to one of the -GPU accelerated nodes +To run the code use interactive PBS session to get access to one of the GPU accelerated nodes +```bash $ qsub -I -q qnvidia -A OPEN-0-0 $ module load cuda $ ./test.cuda +``` CUDA Libraries -------------- ### CuBLAS -The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a -GPU-accelerated version of the complete standard BLAS library with 152 -standard BLAS routines. 
Basic description of the library together with -basic performance comparison with MKL can be found -[here](https://developer.nvidia.com/cublas "Nvidia cuBLAS"). +The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library with 152 standard BLAS routines. Basic description of the library together with basic performance comparison with MKL can be found [here](https://developer.nvidia.com/cublas "Nvidia cuBLAS"). -CuBLAS example: SAXPY** +**CuBLAS example: SAXPY** -SAXPY function multiplies the vector x by the scalar alpha and adds it -to the vector y overwriting the latest vector with the result. The -description of the cuBLAS function can be found in [NVIDIA CUDA -documentation](http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy "Nvidia CUDA documentation "). -Code can be pasted in the file and compiled without any modification. +SAXPY function multiplies the vector x by the scalar alpha and adds it to the vector y overwriting the latest vector with the result. The description of the cuBLAS function can be found in [NVIDIA CUDA documentation](http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy "Nvidia CUDA documentation "). Code can be pasted in the file and compiled without any modification. +```cpp /* Includes, system */ #include <stdio.h> #include <stdlib.h> @@ -284,29 +280,29 @@ Code can be pasted in the file and compiled without any modification.    /* Shutdown */    cublasDestroy(handle); } +``` - Please note: cuBLAS has its own function for data transfers between CPU -and GPU memory: - - -[cublasSetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector) -- transfers data from CPU to GPU memory - - -[cublasGetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector) -- transfers data from GPU to CPU memory +>Please note: cuBLAS has its own function for data transfers between CPU and GPU memory: + - [cublasSetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector) - transfers data from CPU to GPU memory + - [cublasGetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector) - transfers data from GPU to CPU memory - To compile the code using NVCC compiler a "-lcublas" compiler flag has -to be specified: +To compile the code using NVCC compiler a "-lcublas" compiler flag has to be specified: +```bash $ module load cuda $ nvcc -lcublas test_cublas.cu -o test_cublas_nvcc +``` To compile the same code with GCC: +```bash $ module load cuda $ gcc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart +``` To compile the same code with Intel compiler: +```bash $ module load cuda intel $ icc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart - +``` \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/diagnostic-component-team.md b/docs.it4i/anselm-cluster-documentation/software/omics-master-1/diagnostic-component-team.md deleted file mode 100644 index 3a6927548f74855361b39ee291a0ccf95aac8046..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/diagnostic-component-team.md +++ /dev/null @@ -1,49 +0,0 @@ -Diagnostic component (TEAM) -=========================== - - - -### Access - -TEAM is available at the following address -: <http://omics.it4i.cz/team/> - -The address is accessible only via -[VPN. 
](../../accessing-the-cluster/vpn-access.html) - -### Diagnostic component (TEAM) {#diagnostic-component-team} - -VCF files are scanned by this diagnostic tool for known diagnostic -disease-associated variants. When no diagnostic mutation is found, the -file can be sent to the disease-causing gene discovery tool to see -wheter new disease associated variants can be found. - -TEAM >(27) is an intuitive and easy-to-use web tool that -fills the gap between the predicted mutations and the final diagnostic -in targeted enrichment sequencing analysis. The tool searches for known -diagnostic mutations, corresponding to a disease panel, among the -predicted patient’s variants. Diagnostic variants for the disease are -taken from four databases of disease-related variants (HGMD-public, -HUMSAVAR , ClinVar and COSMIC) If no primary diagnostic variant is -found, then a list of secondary findings that can help to establish a -diagnostic is produced. TEAM also provides with an interface for the -definition of and customization of panels, by means of which, genes and -mutations can be added or discarded to adjust panel definitions. - - - - - -*Figure 5. ***Interface of the application. Panels for defining -targeted regions of interest can be set up by just drag and drop known -disease genes or disease definitions from the lists. Thus, virtual -panels can be interactively improved as the knowledge of the disease -increases.* - -* -* - diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/overview.md b/docs.it4i/anselm-cluster-documentation/software/omics-master-1/overview.md deleted file mode 100644 index 0382b6936197e3ccd09774224ddd043c277b8135..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/overview.md +++ /dev/null @@ -1,827 +0,0 @@ -Overview -======== - -The human NGS data processing solution - - - -Introduction ------------- - -The scope of this OMICS MASTER solution is restricted to human genomics -research (disease causing gene discovery in whole human genome or exome) -or diagnosis (panel sequencing), although it could be extended in the -future to other usages. - -The pipeline inputs the raw data produced by the sequencing machines and -undergoes a processing procedure that consists on a quality control, the -mapping and variant calling steps that result in a file containing the -set of variants in the sample. From this point, the prioritization -component or the diagnostic component can be launched. - - - -*Figure 1.** *OMICS MASTER solution overview. Data is produced in the -external labs and comes to IT4I (represented by the blue dashed line). -The data pre-processor converts raw data into a list of variants and -annotations for each sequenced patient. These lists files together with -primary and secondary (alignment) data files are stored in IT4I sequence -DB and uploaded to the discovery (candidate prioritization) or -diagnostic component where they can be analyzed directly by the user -that produced them, depending of the experimental design carried -out*. style="text-align: left; "> - -Typical genomics pipelines are composed by several components that need -to be launched manually. The advantage of OMICS MASTER pipeline is that -all these components are invoked sequentially in an automated way. - -OMICS MASTER pipeline inputs a FASTQ file and outputs an enriched VCF -file. 
This pipeline is able to queue all the jobs to PBS by only -launching a process taking all the necessary input files and creates the -intermediate and final folders - -Let’s see each of the OMICS MASTER solution components: - -Components ----------- - -### Processing - -This component is composed by a set of programs that carry out quality -controls, alignment, realignment, variant calling and variant -annotation. It turns raw data from the sequencing machine into files -containing lists of variants (VCF) that once annotated, can be used by -the following components (discovery and diagnosis). - -We distinguish three types of sequencing instruments: bench sequencers -(MySeq, IonTorrent, and Roche Junior, although this last one is about -being discontinued), which produce relatively Genomes in the clinic - -low throughput (tens of million reads), and high end sequencers, which -produce high throughput (hundreds of million reads) among which we have -Illumina HiSeq 2000 (and new models) and SOLiD. All of them but SOLiD -produce data in sequence format. SOLiD produces data in a special format -called colour space that require of specific software for the mapping -process. Once the mapping has been done, the rest of the pipeline is -identical. Anyway, SOLiD is a technology which is also about being -discontinued by the manufacturer so, this type of data will be scarce in -the future. - -#### Quality control, preprocessing and statistics for FASTQ - - FastQC& FastQC. - -These steps are carried out over the original FASTQ file with optimized -scripts and includes the following steps: sequence cleansing, estimation -of base quality scores, elimination of duplicates and statistics. - -Input: FASTQ file. - -Output: FASTQ file plus an HTML file containing statistics on the -data. - -FASTQ format -It represents the nucleotide sequence and its corresponding -quality scores. - - -*Figure 2.**FASTQ file.** - -#### Mapping - -Component:** Hpg-aligner.** - -Sequence reads are mapped over the human reference genome. SOLiD reads -are not covered by this solution; they should be mapped with specific -software (among the few available options, SHRiMP seems to be the best -one). For the rest of NGS machine outputs we use HPG Aligner. -HPG-Aligner is an innovative solution, based on a combination of mapping -with BWT and local alignment with Smith-Waterman (SW), that drastically -increases mapping accuracy (97% versus 62-70% by current mappers, in the -most common scenarios). This proposal provides a simple and fast -solution that maps almost all the reads, even those containing a high -number of mismatches or indels. - -Input: FASTQ file. - -Output:** Aligned file in BAM format.*** - -Sequence Alignment/Map (SAM)** - -It is a human readable tab-delimited format in which each read and -its alignment is represented on a single line. The format can represent -unmapped reads, reads that are mapped to unique locations, and reads -that are mapped to multiple locations. - -The SAM format (1)^> consists of one header -section and one alignment section. The lines in the header section start -with character â€@’, and lines in the alignment section do not. All lines -are TAB delimited. - -In SAM, each alignment line has 11 mandatory fields and a variable -number of optional fields. The mandatory fields are briefly described in -Table 1. They must be present but their value can be a -â€*’> or a zero (depending on the field) if the -corresponding information is unavailable.  
- -<col width="33%" /> -<col width="33%" /> -<col width="33%" /> - |<strong>No.</strong>\ |<p><strong>Name</strong>\ |<p><strong>Description</strong></p> | - |1\ |<p>QNAME\ |<p>Query NAME of the read or the read pair</p> | - |2\ |<p>FLAG\ |<p>Bitwise FLAG (pairing,strand,mate strand,etc.)</p> | - |3\ |<p>RNAME \ |<p>Reference sequence NAME</p> | - |4\ |<p>POS \ |<p>1-Based  leftmost POSition of clipped alignment</p> | - |5\ |<p>MAPQ \ |<p>MAPping Quality (Phred-scaled)</p> | - |6\ |<p>CIGAR \ |<p>Extended CIGAR string (operations:MIDNSHP)</p> | - |7\ |<p>MRNM \ |<p>Mate REference NaMe ('=' if same RNAME)</p> | - |8\ |<p>MPOS \ |<p>1-Based leftmost Mate POSition</p> | - |9\ |<p>ISIZE \ |<p>Inferred Insert SIZE </p> | - |10\ |<p>SEQ \ |<p>Query SEQuence on the same strand as the reference</p> | - |11\ |<p>QUAL \ |<p>Query QUALity (ASCII-33=Phred base quality)</p> | - -*Table 1.** *Mandatory fields in the SAM format. - -The standard CIGAR description of pairwise alignment defines three -operations: â€M’ for match/mismatch, â€I’ for insertion compared with the -reference and â€D’ for deletion. The extended CIGAR proposed in SAM added -four more operations: â€N’ for skipped bases on the reference, â€S’ for -soft clipping, â€H’ for hard clipping and â€P’ for padding. These support -splicing, clipping, multi-part and padded alignments. Figure 3 shows -examples of CIGAR strings for different types of alignments. - - -* -Figure 3.** *SAM format file. The â€@SQ’ line in the header section -gives the order of reference sequences. Notably, r001 is the name of a -read pair. According to FLAG 163 (=1+2+32+128), the read mapped to -position 7 is the second read in the pair (128) and regarded as properly -paired (1 + 2); its mate is mapped to 37 on the reverse strand (32). -Read r002 has three soft-clipped (unaligned) bases. The coordinate shown -in SAM is the position of the first aligned base. The CIGAR string for -this alignment contains a P (padding) operation which correctly aligns -the inserted sequences. Padding operations can be absent when an aligner -does not support multiple sequence alignment. The last six bases of read -r003 map to position 9, and the first five to position 29 on the reverse -strand. The hard clipping operation H indicates that the clipped -sequence is not present in the sequence field. The NM tag gives the -number of mismatches. Read r004 is aligned across an intron, indicated -by the N operation.** - -Binary Alignment/Map (BAM)** - -BAM is the binary representation of SAM and keeps exactly the same -information as SAM. BAM uses lossless compression to reduce the size of -the data by about 75% and provides an indexing system that allows reads -that overlap a region of the genome to be retrieved and rapidly -traversed. - -#### Quality control, preprocessing and statistics for BAM - -Component:** Hpg-Fastq & FastQC. Some features: - -- Quality control: % reads with N errors, % reads with multiple - mappings, strand bias, paired-end insert, ... -- Filtering: by number of errors, number of hits, … - - Comparator: stats, intersection, ... - -Input:** BAM** file.** - -Output:** BAM file plus an HTML file containing statistics.** - -#### Variant Calling - -Component:** GATK.** - -Identification of single nucleotide variants and indels on the -alignments is performed using the Genome Analysis Toolkit (GATK). GATK -(2)^ is a software package developed at the Broad Institute to analyze -high-throughput sequencing data. 
The toolkit offers a wide variety of -tools, with a primary focus on variant discovery and genotyping as well -as strong emphasis on data quality assurance. - -Input:** BAM** - -Output:** VCF** - -**Variant Call Format (VCF)** - -VCF (3)^> is a standardized format for storing the -most prevalent types of sequence variation, including SNPs, indels and -larger structural variants, together with rich annotations. The format -was developed with the primary intention to represent human genetic -variation, but its use is not restricted >to diploid genomes -and can be used in different contexts as well. Its flexibility and user -extensibility allows representation of a wide variety of genomic -variation with respect to a single reference sequence. - -A VCF file consists of a header section and a data section. The -header contains an arbitrary number of metainformation lines, each -starting with characters â€##’, and a TAB delimited field definition -line, starting with a single â€#’ character. The meta-information header -lines provide a standardized description of tags and annotations used in -the data section. The use of meta-information allows the information -stored within a VCF file to be tailored to the dataset in question. It -can be also used to provide information about the means of file -creation, date of creation, version of the reference sequence, software -used and any other information relevant to the history of the file. The -field definition line names eight mandatory columns, corresponding to -data columns representing the chromosome (CHROM), a 1-based position of -the start of the variant (POS), unique identifiers of the variant (ID), -the reference allele (REF), a comma separated list of alternate -non-reference alleles (ALT), a phred-scaled quality score (QUAL), site -filtering information (FILTER) and a semicolon separated list of -additional, user extensible annotation (INFO). In addition, if samples -are present in the file, the mandatory header columns are followed by a -FORMAT column and an arbitrary number of sample IDs that define the -samples included in the VCF file. The FORMAT column is used to define -the information contained within each subsequent genotype column, which -consists of a colon separated list of fields. For example, the FORMAT -field GT:GQ:DP in the fourth data entry of Figure 1a indicates that the -subsequent entries contain information regarding the genotype, genotype -quality and read depth for each sample. All data lines are TAB -delimited and the number of fields in each data line must match the -number of fields in the header line. It is strongly recommended that all -annotation tags used are declared in the VCF header section. - - - -Figure 4.**> (a) Example of valid VCF. The header lines -##fileformat and #CHROM are mandatory, the rest is optional but -strongly recommended. Each line of the body describes variants present -in the sampled population at one genomic position or region. All -alternate alleles are listed in the ALT column and referenced from the -genotype fields as 1-based indexes to this list; the reference haplotype -is designated as 0. For multiploid data, the separator indicates whether -the data are phased (|) or unphased (/). Thus, the two alleles C and G -at the positions 2 and 5 in this figure occur on the same chromosome in -SAMPLE1. 
The first data line shows an example of a deletion (present in -SAMPLE1) and a replacement of two bases by another base (SAMPLE2); the -second line shows a SNP and an insertion; the third a SNP; the fourth a -large structural variant described by the annotation in the INFO column, -the coordinate is that of the base before the variant. (b–f ) Alignments -and VCF representations of different sequence variants: SNP, insertion, -deletion, replacement, and a large deletion. The REF columns shows the -reference bases replaced by the haplotype in the ALT column. The -coordinate refers to the first reference base. (g) Users are advised to -use simplest representation possible and lowest coordinate in cases -where the position is ambiguous. - -###Annotating - -Component:** HPG-Variant - -The functional consequences of every variant found are then annotated -using the HPG-Variant software, which extracts from CellBase**,** the -Knowledge database, all the information relevant on the predicted -pathologic effect of the variants. - -VARIANT (VARIant Analysis Tool) (4)^ reports information on the -variants found that include consequence type and annotations taken from -different databases and repositories (SNPs and variants from dbSNP and -1000 genomes, and disease-related variants from the Genome-Wide -Association Study (GWAS) catalog, Online Mendelian Inheritance in Man -(OMIM), Catalog of Somatic Mutations in Cancer (COSMIC) mutations, etc. -VARIANT also produces a rich variety of annotations that include -information on the regulatory (transcription factor or miRNAbinding -sites, etc.) or structural roles, or on the selective pressures on the -sites affected by the variation. This information allows extending the -conventional reports beyond the coding regions and expands the knowledge -on the contribution of non-coding or synonymous variants to the -phenotype studied. - -Input:** VCF** - -Output:** The output of this step is the Variant Calling Format (VCF) -file, which contains changes with respect to the reference genome with -the corresponding QC and functional annotations.** - -#### CellBase - -CellBase(5)^ is a relational database integrates biological information -from different sources and includes: - -**Core features:** - -We took genome sequences, genes, transcripts, exons, cytobands or cross -references (xrefs) identifiers (IDs) >from Ensembl -(6)^>. Protein information including sequences, xrefs or -protein features (natural variants, mutagenesis sites, -post-translational modifications, etc.) were imported from UniProt -(7)^>. - -**Regulatory:** - -CellBase imports miRNA from miRBase (8)^; curated and non-curated miRNA -targets from miRecords (9)^, >miRTarBase ^(10)^>, -TargetScan(11)^> and microRNA.org ^(12)^> and -CpG islands and conserved regions from the UCSC database -(13)^>.> - -**Functional annotation** - -OBO Foundry (14)^ develops many biomedical ontologies that are -implemented in OBO format. We designed a SQL schema to store these OBO -ontologies and >30 ontologies were imported. OBO ontology term -annotations were taken from Ensembl (6)^. InterPro ^(15)^ annotations -were also imported. 
- -**Variation** - -CellBase includes SNPs from dbSNP (16)^; SNP population frequencies -from HapMap (17)^, 1000 genomes project ^(18)^ and Ensembl ^(6)^; -phenotypically annotated SNPs were imported from NHRI GWAS Catalog -(19)^,^ ^>HGMD ^(20)^>, Open Access GWAS Database -(21)^>, UniProt ^(7)^> and OMIM -(22)^>; mutations from COSMIC ^(23)^> and -structural variations from Ensembl -(6)^>.> - -**Systems biology** - -We also import systems biology information like interactome information -from IntAct (24)^. Reactome ^(25)^> stores pathway and interaction -information in BioPAX (26)^> format. BioPAX data exchange -format >enables the integration of diverse pathway -resources. We successfully solved the problem of storing data released -in BioPAX format into a SQL relational schema, which allowed us -importing Reactome in CellBase. - -### [Diagnostic component (TEAM)](diagnostic-component-team.html) - -### [Priorization component (BiERApp)](priorization-component-bierapp.html) - -Usage ------ - -First of all, we should load ngsPipeline -module: - - $ module load ngsPipeline - -This command will load python/2.7.5 -module and all the required modules ( -hpg-aligner, -gatk, etc) - - If we launch ngsPipeline with â€-h’, we will get the usage -help: - - $ ngsPipeline -h - Usage: ngsPipeline.py [-h] -i INPUT -o OUTPUT -p PED --project PROJECT --queue -            QUEUE [--stages-path STAGES_PATH] [--email EMAIL] - [--prefix PREFIX] [-s START] [-e END] --log - - Python pipeline - - optional arguments: -  -h, --help       show this help message and exit -  -i INPUT, --input INPUT -  -o OUTPUT, --output OUTPUT -             Output Data directory -  -p PED, --ped PED   Ped file with all individuals -  --project PROJECT   Project Id -  --queue QUEUE     Queue Id -  --stages-path STAGES_PATH -             Custom Stages path -  --email EMAIL     Email -  --prefix PREFIX    Prefix name for Queue Jobs name -  -s START, --start START -             Initial stage -  -e END, --end END   Final stage -  --log         Log to file - - - -Let us see a brief description of the arguments: - -     *-h --help*. Show the help. - -     *-i, --input.* The input data directory. This directory must to -have a special structure. We have to create one folder per sample (with -the same name). These folders will host the fastq files. These fastq -files must have the following pattern “sampleName” + “_” + “1 or 2” + -“.fq”. 1 for the first pair (in paired-end sequences), and 2 for the -second one. - -     *-o , --output.* The output folder. This folder will contain all -the intermediate and final folders. When the pipeline will be executed -completely, we could remove the intermediate folders and keep only the -final one (with the VCF file containing all the variants) - -     *-p , --ped*. The ped file with the pedigree. This file contains -all the sample names. These names must coincide with the names of the -input folders. If our input folder contains more samples than the .ped -file, the pipeline will use only the samples from the .ped file. - -     *--email.* Email for PBS notifications. - -     *--prefix.* Prefix for PBS Job names. - -    *-s, --start & -e, --end.*  Initial and final stage. If we want to -launch the pipeline in a specific stage we must use -s. If we want to -end the pipeline in a specific stage we must use -e. - -     *--log*. Using log argument NGSpipeline will prompt all the logs -to this file. - -    *--project*>. Project ID of your supercomputer -allocation. - -    *--queue*. 
-[Queue](../../resource-allocation-and-job-execution/introduction.html) -to run the jobs in. - - >Input, output and ped arguments are mandatory. If the output -folder does not exist, the pipeline will create it. - -Examples ---------------------- - -This is an example usage of NGSpipeline: - -We have a folder with the following structure in > -/apps/bio/omics/1.0/sample_data/ >: - - /apps/bio/omics/1.0/sample_data - └── data - ├── file.ped - ├── sample1 - │  ├── sample1_1.fq - │  └── sample1_2.fq - └── sample2 - ├── sample2_1.fq - └── sample2_2.fq - -The ped file ( file.ped) contains the -following info:> - - #family_ID sample_ID parental_ID maternal_ID sex phenotype - FAM sample_A 0 0 1 1 - FAM sample_B 0 0 2 2 - -Now, lets load the NGSPipeline module and copy the sample data to a -[scratch directory](../../storage.html) : - - $ module load ngsPipeline - $ mkdir -p /scratch/$USER/omics/results - $ cp -r /apps/bio/omics/1.0/sample_data /scratch/$USER/omics/ - -Now, we can launch the pipeline (replace OPEN-0-0 with your Project ID) -: - - $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped --project OPEN-0-0 --queue qprod - -This command submits the processing [jobs to the -queue](../../resource-allocation-and-job-execution/job-submission-and-execution.html). - -If we want to re-launch the pipeline from stage 4 until stage 20 we -should use the next command: - - $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped -s 4 -e 20 --project OPEN-0-0 --queue qprod - -Details on the pipeline ------------------------------------- - -The pipeline calls the following tools: - -- >[fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), - a> quality control tool for high throughput - sequence data. -- >[gatk](https://www.broadinstitute.org/gatk/), >The - Genome Analysis Toolkit or GATK is a software package developed at - the Broad Institute to analyze high-throughput sequencing data. The - toolkit offers a wide variety of tools, with a primary focus on - variant discovery and genotyping as well as strong emphasis on data - quality assurance. Its robust architecture, powerful processing - engine and high-performance computing features make it capable of - taking on projects of any size. -- >[hpg-aligner](http://wiki.opencb.org/projects/hpg/doku.php?id=aligner:downloads), >HPG - Aligner has been designed to align short and long reads with high - sensitivity, therefore any number of mismatches or indels - are allowed. HPG Aligner implements and combines two well known - algorithms: *Burrows-Wheeler Transform*> (BWT) to - speed-up mapping high-quality reads, - and *Smith-Waterman*> (SW) to increase sensitivity when - reads cannot be mapped using BWT. -- >[hpg-fastq](http://docs.bioinfo.cipf.es/projects/fastqhpc/wiki), > a - quality control tool for high throughput - sequence data. -- >[hpg-variant](http://wiki.opencb.org/projects/hpg/doku.php?id=variant:downloads), >The - HPG Variant suite is an ambitious project aimed to provide a - complete suite of tools to work with genomic variation data, from - VCF tools to variant profiling or genomic statistics. It is being - implemented using High Performance Computing technologies to provide - the best performance possible. 
-- >[picard](http://picard.sourceforge.net/), >Picard - comprises Java-based command-line utilities that manipulate SAM - files, and a Java API (HTSJDK) for creating new programs that read - and write SAM files. Both SAM text format and SAM binary (BAM) - format are supported. -- >[samtools](http://samtools.sourceforge.net/samtools-c.shtml), >SAM - Tools provide various utilities for manipulating alignments in the - SAM format, including sorting, merging, indexing and generating - alignments in a - per-position format. -- >>[snpEff](http://snpeff.sourceforge.net/), <span>Genetic - variant annotation and effect - prediction toolbox. - -This listing show which tools are used in each step of the pipeline : - -- >stage-00: fastqc -- >stage-01: hpg_fastq -- >stage-02: fastqc -- >stage-03: hpg_aligner and samtools -- >stage-04: samtools -- >stage-05: samtools -- >stage-06: fastqc -- >stage-07: picard -- >stage-08: fastqc -- >stage-09: picard -- >stage-10: gatk -- >stage-11: gatk -- >stage-12: gatk -- >stage-13: gatk -- >stage-14: gatk -- >stage-15: gatk -- >stage-16: samtools -- >stage-17: samtools -- >stage-18: fastqc -- >stage-19: gatk -- >stage-20: gatk -- >stage-21: gatk -- >stage-22: gatk -- >stage-23: gatk -- >stage-24: hpg-variant -- >stage-25: hpg-variant -- >stage-26: snpEff -- >stage-27: snpEff -- >stage-28: hpg-variant - -Interpretation ---------------------------- - -The output folder contains all the subfolders with the intermediate -data. This folder contains the final VCF with all the variants. This -file can be uploaded into -[TEAM](diagnostic-component-team.html) by using the VCF -file button. It is important to note here that the entire management of -the VCF file is local: no patient’s sequence data is sent over the -Internet thus avoiding any problem of data privacy or confidentiality. - - - -*Figure 7**. *TEAM upload panel.* *Once the file has been uploaded, a -panel must be chosen from the Panel *** list. Then, pressing the Run -button the diagnostic process starts.* - -Once the file has been uploaded, a panel must be chosen from the Panel -list. Then, pressing the Run button the diagnostic process starts. TEAM -searches first for known diagnostic mutation(s) taken from four -databases: HGMD-public (20)^, -[HUMSAVAR](http://www.uniprot.org/docs/humsavar), -ClinVar (29)^ and COSMIC ^(23)^. - - - -*Figure 7.** *The panel manager. The elements used to define a panel -are (**A**) disease terms, (**B**) diagnostic mutations and (**C**) -genes. Arrows represent actions that can be taken in the panel manager. -Panels can be defined by using the known mutations and genes of a -particular disease. This can be done by dragging them to the **Primary -Diagnostic** box (action **D**). This action, in addition to defining -the diseases in the **Primary Diagnostic** box, automatically adds the -corresponding genes to the **Genes** box. The panels can be customized -by adding new genes (action **F**) or removing undesired genes (action -G**). New disease mutations can be added independently or associated -to an already existing disease term (action **E**). Disease terms can be -removed by simply dragging them back (action **H**).* - -For variant discovering/filtering we should upload the VCF file into -BierApp by using the following form: - -** - -**Figure 8.** *BierApp VCF upload panel. It is recommended to choose -a name for the job as well as a description.** - -Each prioritization (â€job’) has three associated screens that facilitate -the filtering steps. 
The first one, the â€Summary’ tab, displays a -statistic of the data set analyzed, containing the samples analyzed, the -number and types of variants found and its distribution according to -consequence types. The second screen, in the â€Variants and effect’ tab, -is the actual filtering tool, and the third one, the â€Genome view’ tab, -offers a representation of the selected variants within the genomic -context provided by an embedded version of >the Genome Maps Tool -(30)^>. - - - -**Figure 9.*** *This picture shows all the information associated to -the variants. If a variant has an associated phenotype we could see it -in the last column. In this case, the variant 7:132481242 C>T is -associated to the phenotype: large intestine tumor.** - -* -* - -References ------------------------ - -1. Heng Li, Bob Handsaker, Alec Wysoker, Tim - Fennell, Jue Ruan, Nils Homer, Gabor Marth5, Goncalo Abecasis6, - Richard Durbin and 1000 Genome Project Data Processing Subgroup: The - Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, - 25: 2078-2079. -2. >McKenna A, Hanna M, Banks E, Sivachenko - A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, - Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce - framework for analyzing next-generation DNA sequencing data. - *Genome Res* >2010, 20:1297-1303. -3. Petr Danecek, Adam Auton, Goncalo Abecasis, - Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. - Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean - McVean, Richard Durbin, and 1000 Genomes Project Analysis Group. The - variant call format and VCFtools. Bioinformatics 2011, - 27: 2156-2158. -4. Medina I, De Maria A, Bleda M, Salavert F, - Alonso R, Gonzalez CY, Dopazo J: VARIANT: Command Line, Web service - and Web interface for fast and accurate functional characterization - of variants found by Next-Generation Sequencing. Nucleic Acids Res - 2012, 40:W54-58. -5. Bleda M, Tarraga J, de Maria A, Salavert F, - Garcia-Alonso L, Celma M, Martin A, Dopazo J, Medina I: CellBase, a - comprehensive collection of RESTful web services for retrieving - relevant biological information from heterogeneous sources. Nucleic - Acids Res 2012, 40:W609-614. -6. Flicek,P., Amode,M.R., Barrell,D., Beal,K., - Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., - Fairley,S., Fitzgerald,S. et al. (2012) Ensembl 2012. Nucleic Acids - Res., 40, D84–D90. -7. UniProt Consortium. (2012) Reorganizing the - protein space at the Universal Protein Resource (UniProt). Nucleic - Acids Res., 40, D71–D75. -8. Kozomara,A. and Griffiths-Jones,S. (2011) - miRBase: integrating microRNA annotation and deep-sequencing data. - Nucleic Acids Res., 39, D152–D157. -9. Xiao,F., Zuo,Z., Cai,G., Kang,S., Gao,X. - and Li,T. (2009) miRecords: an integrated resource for - microRNA-target interactions. Nucleic Acids Res., - 37, D105–D110. -10. Hsu,S.D., Lin,F.M., Wu,W.Y., Liang,C., - Huang,W.C., Chan,W.L., Tsai,W.T., Chen,G.Z., Lee,C.J., Chiu,C.M. - et al. (2011) miRTarBase: a database curates experimentally - validated microRNA-target interactions. Nucleic Acids Res., - 39, D163–D169. -11. Friedman,R.C., Farh,K.K., Burge,C.B. - and Bartel,D.P. (2009) Most mammalian mRNAs are conserved targets - of microRNAs. Genome Res., 19, 92–105. -12. Betel,D., Wilson,M., Gabow,A., Marks,D.S. - and Sander,C. (2008) The microRNA.org resource: targets - and expression. Nucleic Acids Res., 36, D149–D153. -13. 
Dreszer,T.R., Karolchik,D., Zweig,A.S., - Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., - Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC genome browser - database: extensions and updates 2011. Nucleic Acids Res., - 40, D918–D923. -14. Smith,B., Ashburner,M., Rosse,C., Bard,J., - Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., - Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated - evolution of ontologies to support biomedical data integration. Nat. - Biotechnol., 25, 1251–1255. -15. Hunter,S., Jones,P., Mitchell,A., - Apweiler,R., Attwood,T.K.,Bateman,A., Bernard,T., Binns,D., - Bork,P., Burge,S. et al. (2012) InterPro in 2011: new developments - in the family and domain prediction database. Nucleic Acids Res., - 40, D306–D312. -16. Sherry,S.T., Ward,M.H., Kholodov,M., - Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the - NCBI database of genetic variation. Nucleic Acids Res., - 29, 308–311. -17. Altshuler,D.M., Gibbs,R.A., Peltonen,L., - Dermitzakis,E., Schaffner,S.F., Yu,F., Bonnen,P.E., de Bakker,P.I., - Deloukas,P., Gabriel,S.B. et al. (2010) Integrating common and rare - genetic variation in diverse human populations. Nature, - 467, 52–58. -18. 1000 Genomes Project Consortium. (2010) A map - of human genome variation from population-scale sequencing. Nature, - 467, 1061–1073. -19. Hindorff,L.A., Sethupathy,P., Junkins,H.A., - Ramos,E.M., Mehta,J.P., Collins,F.S. and Manolio,T.A. (2009) - Potential etiologic and functional implications of genome-wide - association loci for human diseases and traits. Proc. Natl Acad. - Sci. USA, 106, 9362–9367. -20. Stenson,P.D., Ball,E.V., Mort,M., - Phillips,A.D., Shiel,J.A., Thomas,N.S., Abeysinghe,S., Krawczak,M. - and Cooper,D.N. (2003) Human gene mutation database (HGMD): - 2003 update. Hum. Mutat., 21, 577–581. -21. Johnson,A.D. and O’Donnell,C.J. (2009) An - open access database of genome-wide association results. BMC Med. - Genet, 10, 6. -22. McKusick,V. (1998) A Catalog of Human Genes - and Genetic Disorders, 12th edn. John Hopkins University - Press,Baltimore, MD. -23. Forbes,S.A., Bindal,N., Bamford,S., Cole,C., - Kok,C.Y., Beare,D., Jia,M., Shepherd,R., Leung,K., Menzies,A. et al. - (2011) COSMIC: mining complete cancer genomes in the catalogue of - somatic mutations in cancer. Nucleic Acids Res., - 39, D945–D950. -24. Kerrien,S., Aranda,B., Breuza,L., Bridge,A., - Broackes-Carter,F., Chen,C., Duesbury,M., Dumousseau,M., - Feuermann,M., Hinz,U. et al. (2012) The Intact molecular interaction - database in 2012. Nucleic Acids Res., 40, D841–D846. -25. Croft,D., O’Kelly,G., Wu,G., Haw,R., - Gillespie,M., Matthews,L., Caudy,M., Garapati,P., - Gopinath,G., Jassal,B. et al. (2011) Reactome: a database of - reactions, pathways and biological processes. Nucleic Acids Res., - 39, D691–D697. -26. Demir,E., Cary,M.P., Paley,S., Fukuda,K., - Lemer,C., Vastrik,I.,Wu,G., D’Eustachio,P., Schaefer,C., Luciano,J. - et al. (2010) The BioPAX community standard for pathway - data sharing. Nature Biotechnol., 28, 935–942. -27. Alemán Z, GarcĂa-GarcĂa F, Medina I, Dopazo J - (2014): A web tool for the design and management of panels of genes - for targeted enrichment and massive sequencing for - clinical applications. Nucleic Acids Res 42: W83-7. -28. 
[Alemán - A](http://www.ncbi.nlm.nih.gov/pubmed?term=Alem%C3%A1n%20A%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Garcia-Garcia - F](http://www.ncbi.nlm.nih.gov/pubmed?term=Garcia-Garcia%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Salavert - F](http://www.ncbi.nlm.nih.gov/pubmed?term=Salavert%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Medina - I](http://www.ncbi.nlm.nih.gov/pubmed?term=Medina%20I%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Dopazo - J](http://www.ncbi.nlm.nih.gov/pubmed?term=Dopazo%20J%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)> (2014). - A web-based interactive framework to assist in the prioritization of - disease candidate genes in whole-exome sequencing studies. - [Nucleic - Acids Res.](http://www.ncbi.nlm.nih.gov/pubmed/?term=BiERapp "Nucleic acids research.")>42 :W88-93. -29. Landrum,M.J., Lee,J.M., Riley,G.R., Jang,W., - Rubinstein,W.S., Church,D.M. and Maglott,D.R. (2014) ClinVar: public - archive of relationships among sequence variation and - human phenotype. Nucleic Acids Res., 42, D980–D985. -30. Medina I, Salavert F, Sanchez R, de Maria A, - Alonso R, Escobar P, Bleda M, Dopazo J: Genome Maps, a new - generation genome browser. Nucleic Acids Res 2013, 41:W41-46. - - - diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/priorization-component-bierapp.md b/docs.it4i/anselm-cluster-documentation/software/omics-master-1/priorization-component-bierapp.md deleted file mode 100644 index a6cd22b5866bbbb95035359d3f1ffddf1c4772cf..0000000000000000000000000000000000000000 --- a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/priorization-component-bierapp.md +++ /dev/null @@ -1,42 +0,0 @@ -Priorization component (BiERApp) -================================ - -### Access - -BiERApp is available at the following address -: <http://omics.it4i.cz/bierapp/> - -The address is accessible only -via [VPN. ](../../accessing-the-cluster/vpn-access.html) - -###BiERApp - -###This tool is aimed to discover new disease genes or variants by studying affected families or cases and controls. It carries out a filtering process to sequentially remove: (i) variants which are not no compatible with the disease because are not expected to have impact on the protein function; (ii) variants that exist at frequencies incompatible with the disease; (iii) variants that do not segregate with the disease. The result is a reduced set of disease gene candidates that should be further validated experimentally. - -BiERapp >(28) efficiently helps in the identification of -causative variants in family and sporadic genetic diseases. The program -reads lists of predicted variants (nucleotide substitutions and indels) -in affected individuals or tumor samples and controls. In family -studies, different modes of inheritance can easily be defined to filter -out variants that do not segregate with the disease along the family. -Moreover, BiERapp integrates additional information such as allelic -frequencies in the general population and the most popular damaging -scores to further narrow down the number of putative variants in -successive filtering steps. BiERapp provides an interactive and -user-friendly interface that implements the filtering strategy used in -the context of a large-scale genomic project carried out by the Spanish -Network for Research, in Rare Diseases (CIBERER) and the Medical Genome -Project. in which more than 800 exomes have been analyzed. - - - -*Figure 6**. 
*Web interface to the prioritization tool.* *This -figure* *shows the interface of the web tool for candidate gene -prioritization with the filters available. The tool includes a genomic -viewer (Genome Maps >30) that enables the representation of -the variants in the corresponding genomic coordinates.* - diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master/diagnostic-component-team.md b/docs.it4i/anselm-cluster-documentation/software/omics-master/diagnostic-component-team.md new file mode 100644 index 0000000000000000000000000000000000000000..0b1d54245bf02f31224a5e86ef320ebe3a62b183 --- /dev/null +++ b/docs.it4i/anselm-cluster-documentation/software/omics-master/diagnostic-component-team.md @@ -0,0 +1,18 @@ +Diagnostic component (TEAM) +=========================== + +### Access + +TEAM is available at the following address: <http://omics.it4i.cz/team/> + +>The address is accessible only via [VPN. ](../../accessing-the-cluster/vpn-access.html) + +### Diagnostic component (TEAM) + +VCF files are scanned by this diagnostic tool for known diagnostic disease-associated variants. When no diagnostic mutation is found, the file can be sent to the disease-causing gene discovery tool to see wheter new disease associated variants can be found. + +TEAM (27) is an intuitive and easy-to-use web tool that fills the gap between the predicted mutations and the final diagnostic in targeted enrichment sequencing analysis. The tool searches for known diagnostic mutations, corresponding to a disease panel, among the predicted patient’s variants. Diagnostic variants for the disease are taken from four databases of disease-related variants (HGMD-public, HUMSAVAR , ClinVar and COSMIC) If no primary diagnostic variant is found, then a list of secondary findings that can help to establish a diagnostic is produced. TEAM also provides with an interface for the definition of and customization of panels, by means of which, genes and mutations can be added or discarded to adjust panel definitions. + + + +**Figure 5.** Interface of the application. Panels for defining targeted regions of interest can be set up by just drag and drop known disease genes or disease definitions from the lists. Thus, virtual panels can be interactively improved as the knowledge of the disease increases. 
\ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig1.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig1.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig1.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig1.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig2.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig2.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig2.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig2.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig3.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig3.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig3.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig3.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig4.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig4.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig4.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig4.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig5.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig5.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig5.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig5.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig6.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig6.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig6.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig6.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig7.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig7.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig7.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig7.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig7x.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig7x.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig7x.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig7x.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig8.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig8.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig8.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig8.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig9.png b/docs.it4i/anselm-cluster-documentation/software/omics-master/fig9.png similarity index 100% rename from docs.it4i/anselm-cluster-documentation/software/omics-master-1/fig9.png rename to docs.it4i/anselm-cluster-documentation/software/omics-master/fig9.png diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md 
b/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md new file mode 100644 index 0000000000000000000000000000000000000000..f9716d38d234b7be566aa6d4388050f42527766d --- /dev/null +++ b/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md @@ -0,0 +1,393 @@ +Overview +======== + +The human NGS data processing solution + +Introduction +------------ +The scope of this OMICS MASTER solution is restricted to human genomics research (disease-causing gene discovery in whole human genome or exome) or diagnosis (panel sequencing), although it could be extended in the future to other usages. + +The pipeline takes the raw data produced by the sequencing machines through a processing procedure that consists of quality control, mapping and variant calling steps, resulting in a file containing the set of variants in the sample. From this point, the prioritization component or the diagnostic component can be launched. + + + +**Figure 1.** OMICS MASTER solution overview. Data is produced in the external labs and comes to IT4I (represented by the blue dashed line). The data pre-processor converts raw data into a list of variants and annotations for each sequenced patient. These list files, together with primary and secondary (alignment) data files, are stored in the IT4I sequence DB and uploaded to the discovery (candidate prioritization) or diagnostic component, where they can be analyzed directly by the user that produced them, depending on the experimental design carried out. + +Typical genomics pipelines are composed of several components that need to be launched manually. The advantage of the OMICS MASTER pipeline is that all these components are invoked sequentially in an automated way. + +The OMICS MASTER pipeline takes a FASTQ file as input and outputs an enriched VCF file. The pipeline is able to queue all the jobs to PBS by launching a single process that takes all the necessary input files and creates the intermediate and final folders. + +Let's see each of the OMICS MASTER solution components: + +Components +---------- + +### Processing + +This component is composed of a set of programs that carry out quality controls, alignment, realignment, variant calling and variant annotation. It turns raw data from the sequencing machine into files containing lists of variants (VCF) that, once annotated, can be used by the following components (discovery and diagnosis). + +We distinguish two types of sequencing instruments: bench sequencers (MiSeq, IonTorrent, and Roche Junior, although this last one is about to be discontinued), which produce relatively low throughput (tens of millions of reads), and high-end sequencers, which produce high throughput (hundreds of millions of reads), among which we have Illumina HiSeq 2000 (and newer models) and SOLiD. All of them but SOLiD produce data in sequence format. SOLiD produces data in a special format, called colour space, that requires specific software for the mapping process. Once the mapping has been done, the rest of the pipeline is identical. SOLiD is also about to be discontinued by the manufacturer, so this type of data will be scarce in the future. + +#### Quality control, preprocessing and statistics for FASTQ + +**Component:** Hpg-Fastq & FastQC. + +These steps are carried out over the original FASTQ file with optimized scripts and include sequence cleansing, estimation of base quality scores, elimination of duplicates and statistics.
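+For reference, the same kind of HTML quality report can also be produced by hand with FastQC alone. The following is a minimal sketch, assuming FastQC is available in your environment (for example as a module) and using a hypothetical sample file name; in normal use the pipeline described below runs this step automatically:
+
+```bash
+# FastQC does not create the output directory itself, so create it first
+mkdir -p qc_reports
+# Generate an HTML report with per-base quality, duplication and other statistics for one FASTQ file
+fastqc -o qc_reports sample1_1.fq
+```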
+ +Input: **FASTQ file.** + +Output: **FASTQ file plus an HTML file containing statistics on the data.** + +**FASTQ format.** It represents the nucleotide sequence and its corresponding quality scores. + +**Figure 2.** FASTQ file. + +#### Mapping + +**Component:** Hpg-aligner. + +Sequence reads are mapped over the human reference genome. SOLiD reads are not covered by this solution; they should be mapped with specific software (among the few available options, SHRiMP seems to be the best one). For the rest of NGS machine outputs we use HPG Aligner. HPG-Aligner is an innovative solution, based on a combination of mapping with BWT and local alignment with Smith-Waterman (SW), that drastically increases mapping accuracy (97% versus 62-70% by current mappers, in the most common scenarios). This proposal provides a simple and fast solution that maps almost all the reads, even those containing a high number of mismatches or indels. + +Input: **FASTQ file.** + +Output: **Aligned file in BAM format.** + +**Sequence Alignment/Map (SAM)** + +SAM is a human-readable, TAB-delimited format in which each read and its alignment is represented on a single line. The format can represent unmapped reads, reads that are mapped to unique locations, and reads that are mapped to multiple locations. + +The SAM format (1) consists of one header section and one alignment section. The lines in the header section start with the character '@', and lines in the alignment section do not. All lines are TAB delimited. + +In SAM, each alignment line has 11 mandatory fields and a variable number of optional fields. The mandatory fields are briefly described in Table 1. They must be present but their value can be a '*' or a zero (depending on the field) if the corresponding information is unavailable. + + |**No.** |**Name** |**Description**| + |---|---|---| + |1 |QNAME |Query NAME of the read or the read pair | + |2 |FLAG |Bitwise FLAG (pairing, strand, mate strand, etc.) | + |3 |RNAME |Reference sequence NAME | + |4 |POS |1-Based leftmost POSition of clipped alignment | + |5 |MAPQ |MAPping Quality (Phred-scaled) | + |6 |CIGAR |Extended CIGAR string (operations: MIDNSHP) | + |7 |MRNM |Mate REference NaMe ('=' if same RNAME) | + |8 |MPOS |1-Based leftmost Mate POSition | + |9 |ISIZE |Inferred Insert SIZE | + |10 |SEQ |Query SEQuence on the same strand as the reference | + |11 |QUAL |Query QUALity (ASCII-33=Phred base quality) | + +**Table 1.** Mandatory fields in the SAM format. + +The standard CIGAR description of pairwise alignment defines three operations: 'M' for match/mismatch, 'I' for insertion compared with the reference and 'D' for deletion. The extended CIGAR proposed in SAM added four more operations: 'N' for skipped bases on the reference, 'S' for soft clipping, 'H' for hard clipping and 'P' for padding. These support splicing, clipping, multi-part and padded alignments. Figure 3 shows examples of CIGAR strings for different types of alignments. + + + +**Figure 3.** SAM format file. The '@SQ' line in the header section gives the order of reference sequences. Notably, r001 is the name of a read pair. According to FLAG 163 (=1+2+32+128), the read mapped to position 7 is the second read in the pair (128) and regarded as properly paired (1 + 2); its mate is mapped to 37 on the reverse strand (32). Read r002 has three soft-clipped (unaligned) bases. The coordinate shown in SAM is the position of the first aligned base.
The CIGAR string for this alignment contains a P (padding) operation which correctly aligns the inserted sequences. Padding operations can be absent when an aligner does not support multiple sequence alignment. The last six bases of read r003 map to position 9, and the first five to position 29 on the reverse strand. The hard clipping operation H indicates that the clipped sequence is not present in the sequence field. The NM tag gives the number of mismatches. Read r004 is aligned across an intron, indicated by the N operation. + +**Binary Alignment/Map (BAM)** + +BAM is the binary representation of SAM and keeps exactly the same information as SAM. BAM uses lossless compression to reduce the size of the data by about 75% and provides an indexing system that allows reads that overlap a region of the genome to be retrieved and rapidly traversed. + +#### Quality control, preprocessing and statistics for BAM + +**Component:** Hpg-Fastq & FastQC. Some features: + +- Quality control: % reads with N errors, % reads with multiple mappings, strand bias, paired-end insert, ... +- Filtering: by number of errors, number of hits, ... +- Comparator: stats, intersection, ... + +**Input:** BAM file. + +**Output:** BAM file plus an HTML file containing statistics. + +#### Variant Calling + +**Component:** GATK. + +Identification of single nucleotide variants and indels on the alignments is performed using the Genome Analysis Toolkit (GATK). GATK (2) is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. + +**Input:** BAM + +**Output:** VCF + +**Variant Call Format (VCF)** + +VCF (3) is a standardized format for storing the most prevalent types of sequence variation, including SNPs, indels and larger structural variants, together with rich annotations. The format was developed with the primary intention to represent human genetic variation, but its use is not restricted to diploid genomes and can be used in different contexts as well. Its flexibility and user extensibility allow representation of a wide variety of genomic variation with respect to a single reference sequence. + +A VCF file consists of a header section and a data section. The header contains an arbitrary number of meta-information lines, each starting with the characters '##', and a TAB-delimited field definition line, starting with a single '#' character. The meta-information header lines provide a standardized description of tags and annotations used in the data section. The use of meta-information allows the information stored within a VCF file to be tailored to the dataset in question. It can also be used to provide information about the means of file creation, date of creation, version of the reference sequence, software used and any other information relevant to the history of the file. The field definition line names eight mandatory columns, corresponding to data columns representing the chromosome (CHROM), a 1-based position of the start of the variant (POS), unique identifiers of the variant (ID), the reference allele (REF), a comma separated list of alternate non-reference alleles (ALT), a phred-scaled quality score (QUAL), site filtering information (FILTER) and a semicolon separated list of additional, user extensible annotation (INFO).
In addition, if samples are present in the file, the mandatory header columns are followed by a FORMAT column and an arbitrary number of sample IDs that define the samples included in the VCF file. The FORMAT column is used to define the information contained within each subsequent genotype column, which consists of a colon separated list of fields. For example, the FORMAT field GT:GQ:DP in the fourth data entry of Figure 4a indicates that the subsequent entries contain information regarding the genotype, genotype quality and read depth for each sample. All data lines are TAB delimited and the number of fields in each data line must match the number of fields in the header line. It is strongly recommended that all annotation tags used are declared in the VCF header section. + + + +**Figure 4.** (a) Example of valid VCF. The header lines ##fileformat and #CHROM are mandatory, the rest is optional but strongly recommended. Each line of the body describes variants present in the sampled population at one genomic position or region. All alternate alleles are listed in the ALT column and referenced from the genotype fields as 1-based indexes to this list; the reference haplotype is designated as 0. For multiploid data, the separator indicates whether the data are phased (|) or unphased (/). Thus, the two alleles C and G at the positions 2 and 5 in this figure occur on the same chromosome in SAMPLE1. The first data line shows an example of a deletion (present in SAMPLE1) and a replacement of two bases by another base (SAMPLE2); the second line shows a SNP and an insertion; the third a SNP; the fourth a large structural variant described by the annotation in the INFO column; the coordinate is that of the base before the variant. (b–f) Alignments and VCF representations of different sequence variants: SNP, insertion, deletion, replacement, and a large deletion. The REF column shows the reference bases replaced by the haplotype in the ALT column. The coordinate refers to the first reference base. (g) Users are advised to use the simplest representation possible and the lowest coordinate in cases where the position is ambiguous. + +### Annotating + +**Component:** HPG-Variant + +The functional consequences of every variant found are then annotated using the HPG-Variant software, which extracts from CellBase, the knowledge database, all the information relevant to the predicted pathologic effect of the variants. + +VARIANT (VARIant Analysis Tool) (4) reports information on the variants found, including consequence type and annotations taken from different databases and repositories (SNPs and variants from dbSNP and 1000 genomes, and disease-related variants from the Genome-Wide Association Study (GWAS) catalog, Online Mendelian Inheritance in Man (OMIM), the Catalog of Somatic Mutations in Cancer (COSMIC), etc.). VARIANT also produces a rich variety of annotations that include information on the regulatory (transcription factor or miRNA binding sites, etc.) or structural roles, or on the selective pressures on the sites affected by the variation. This information allows the conventional reports to be extended beyond the coding regions and expands the knowledge on the contribution of non-coding or synonymous variants to the phenotype studied. + +**Input:** VCF + +**Output:** The output of this step is the Variant Call Format (VCF) file, which contains changes with respect to the reference genome with the corresponding QC and functional annotations.
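+A quick way to sanity-check this annotated output is to inspect the VCF directly on the command line. The following is a minimal sketch using only standard tools, with a hypothetical result file name (sample1.annotated.vcf); the actual file name and location depend on your run:
+
+```bash
+# Count the data records (one per variant); header lines start with '#'
+grep -vc '^#' sample1.annotated.vcf
+# Show CHROM, POS, REF, ALT and the INFO annotations of the first few records
+grep -v '^#' sample1.annotated.vcf | head -n 3 | cut -f 1,2,4,5,8
+```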
+ +#### CellBase + +CellBase (5) is a relational database that integrates biological information from different sources and includes: + +**Core features:** + +We took genome sequences, genes, transcripts, exons, cytobands and cross-reference (xref) identifiers (IDs) from Ensembl (6). Protein information, including sequences, xrefs and protein features (natural variants, mutagenesis sites, post-translational modifications, etc.), was imported from UniProt (7). + +**Regulatory:** + +CellBase imports miRNA from miRBase (8); curated and non-curated miRNA targets from miRecords (9), miRTarBase (10), TargetScan (11) and microRNA.org (12); and CpG islands and conserved regions from the UCSC database (13). + +**Functional annotation:** + +OBO Foundry (14) develops many biomedical ontologies that are implemented in OBO format. We designed a SQL schema to store these OBO ontologies and more than 30 ontologies were imported. OBO ontology term annotations were taken from Ensembl (6). InterPro (15) annotations were also imported. + +**Variation:** + +CellBase includes SNPs from dbSNP (16); SNP population frequencies from HapMap (17), the 1000 genomes project (18) and Ensembl (6); phenotypically annotated SNPs were imported from the NHGRI GWAS Catalog (19), HGMD (20), the Open Access GWAS Database (21), UniProt (7) and OMIM (22); mutations from COSMIC (23) and structural variations from Ensembl (6). + +**Systems biology:** + +We also import systems biology information, such as interactome data from IntAct (24). Reactome (25) stores pathway and interaction information in BioPAX (26) format. The BioPAX data exchange format enables the integration of diverse pathway resources. We successfully solved the problem of storing data released in BioPAX format into a SQL relational schema, which allowed us to import Reactome into CellBase. + +### [Diagnostic component (TEAM)](diagnostic-component-team.md) + +### [Priorization component (BiERApp)](priorization-component-bierapp.md) + +Usage +----- +First of all, we should load the ngsPipeline module: + +```bash + $ module load ngsPipeline +``` + +This command will load the python/2.7.5 module and all the required modules (hpg-aligner, gatk, etc.). + +If we launch ngsPipeline with '-h', we will get the usage help: + +```bash + $ ngsPipeline -h + Usage: ngsPipeline.py [-h] -i INPUT -o OUTPUT -p PED --project PROJECT --queue +            QUEUE [--stages-path STAGES_PATH] [--email EMAIL] + [--prefix PREFIX] [-s START] [-e END] --log + + Python pipeline + + optional arguments: +  -h, --help       show this help message and exit +  -i INPUT, --input INPUT +  -o OUTPUT, --output OUTPUT +             Output Data directory +  -p PED, --ped PED   Ped file with all individuals +  --project PROJECT   Project Id +  --queue QUEUE     Queue Id +  --stages-path STAGES_PATH +             Custom Stages path +  --email EMAIL     Email +  --prefix PREFIX    Prefix name for Queue Jobs name +  -s START, --start START +             Initial stage +  -e END, --end END   Final stage +  --log         Log to file + +``` + +Let us see a brief description of the arguments: + +- *-h, --help*. Show the help. + +- *-i, --input*. The input data directory. This directory must have a special structure. We have to create one folder per sample (with the same name). These folders will host the fastq files. These fastq files must have the following pattern "sampleName" + "_" + "1 or 2" + ".fq": 1 for the first pair (in paired-end sequences), and 2 for the second one. + +- *-o, --output*. The output folder.
This folder will contain all the intermediate and final folders. Once the pipeline has finished, we can remove the intermediate folders and keep only the final one (with the VCF file containing all the variants). + +- *-p, --ped*. The ped file with the pedigree. This file contains all the sample names. These names must coincide with the names of the input folders. If our input folder contains more samples than the .ped file, the pipeline will use only the samples from the .ped file. + +- *--email*. Email for PBS notifications. + +- *--prefix*. Prefix for PBS job names. + +- *-s, --start* and *-e, --end*. Initial and final stage. If we want to start the pipeline at a specific stage we must use -s; if we want to end the pipeline at a specific stage we must use -e. + +- *--log*. With the log argument, ngsPipeline writes all the logs to this file. + +- *--project*. Project ID of your supercomputer allocation. + +- *--queue*. [Queue](../../resource-allocation-and-job-execution/introduction.html) to run the jobs in. + +Input, output and ped arguments are mandatory. If the output folder does not exist, the pipeline will create it. + +Examples +--------------------- + +This is an example usage of ngsPipeline: + +We have a folder with the following structure in /apps/bio/omics/1.0/sample_data/: + +```bash + /apps/bio/omics/1.0/sample_data + └── data + ├── file.ped + ├── sample1 + │  ├── sample1_1.fq + │  └── sample1_2.fq + └── sample2 + ├── sample2_1.fq + └── sample2_2.fq +``` + +The ped file (file.ped) contains the following info: + +```bash + #family_ID sample_ID parental_ID maternal_ID sex phenotype + FAM sample_A 0 0 1 1 + FAM sample_B 0 0 2 2 +``` + +Now, let's load the ngsPipeline module and copy the sample data to a [scratch directory](../../storage.html): + +```bash + $ module load ngsPipeline + $ mkdir -p /scratch/$USER/omics/results + $ cp -r /apps/bio/omics/1.0/sample_data /scratch/$USER/omics/ +``` + +Now, we can launch the pipeline (replace OPEN-0-0 with your Project ID): + +```bash + $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped --project OPEN-0-0 --queue qprod +``` + +This command submits the processing [jobs to the queue](../../resource-allocation-and-job-execution/job-submission-and-execution.html). + +If we want to re-launch the pipeline from stage 4 until stage 20 we should use the following command: + +```bash + $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped -s 4 -e 20 --project OPEN-0-0 --queue qprod +``` + +Details on the pipeline +------------------------------------ + +The pipeline calls the following tools: + +- [fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), a quality control tool for high-throughput sequence data. +- [gatk](https://www.broadinstitute.org/gatk/), The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
+- [hpg-aligner](http://wiki.opencb.org/projects/hpg/doku.php?id=aligner:downloads), HPG Aligner has been designed to align short and long reads with high sensitivity; therefore, any number of mismatches or indels is allowed. HPG Aligner implements and combines two well-known algorithms: *Burrows-Wheeler Transform* (BWT) to speed up mapping of high-quality reads, and *Smith-Waterman* (SW) to increase sensitivity when reads cannot be mapped using BWT. +- [hpg-fastq](http://docs.bioinfo.cipf.es/projects/fastqhpc/wiki), a quality control tool for high-throughput sequence data. +- [hpg-variant](http://wiki.opencb.org/projects/hpg/doku.php?id=variant:downloads), The HPG Variant suite is an ambitious project aimed at providing a complete suite of tools to work with genomic variation data, from VCF tools to variant profiling or genomic statistics. It is being implemented using High Performance Computing technologies to provide the best performance possible. +- [picard](http://picard.sourceforge.net/), Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (HTSJDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported. +- [samtools](http://samtools.sourceforge.net/samtools-c.shtml), SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. +- [snpEff](http://snpeff.sourceforge.net/), Genetic variant annotation and effect prediction toolbox. + +This listing shows which tools are used in each step of the pipeline: + +- stage-00: fastqc +- stage-01: hpg_fastq +- stage-02: fastqc +- stage-03: hpg_aligner and samtools +- stage-04: samtools +- stage-05: samtools +- stage-06: fastqc +- stage-07: picard +- stage-08: fastqc +- stage-09: picard +- stage-10: gatk +- stage-11: gatk +- stage-12: gatk +- stage-13: gatk +- stage-14: gatk +- stage-15: gatk +- stage-16: samtools +- stage-17: samtools +- stage-18: fastqc +- stage-19: gatk +- stage-20: gatk +- stage-21: gatk +- stage-22: gatk +- stage-23: gatk +- stage-24: hpg-variant +- stage-25: hpg-variant +- stage-26: snpEff +- stage-27: snpEff +- stage-28: hpg-variant + +Interpretation +--------------------------- + +The output folder contains all the subfolders with the intermediate data. This folder contains the final VCF with all the variants. This file can be uploaded into [TEAM](diagnostic-component-team.html) by using the VCF file button. It is important to note here that the entire management of the VCF file is local: no patient's sequence data is sent over the Internet, thus avoiding any problem of data privacy or confidentiality. + + + +**Figure 7.** *TEAM upload panel. Once the file has been uploaded, a panel must be chosen from the Panel list. Then, pressing the Run button starts the diagnostic process.* + +Once the file has been uploaded, a panel must be chosen from the Panel list. Then, pressing the Run button starts the diagnostic process. TEAM searches first for known diagnostic mutation(s) taken from four databases: HGMD-public (20), [HUMSAVAR](http://www.uniprot.org/docs/humsavar), ClinVar (29) and COSMIC (23). + + + +**Figure 7.** *The panel manager. The elements used to define a panel are (**A**) disease terms, (**B**) diagnostic mutations and (**C**) genes. Arrows represent actions that can be taken in the panel manager. Panels can be defined by using the known mutations and genes of a particular disease.
This can be done by dragging them to the **Primary Diagnostic** box (action **D**). This action, in addition to defining the diseases in the **Primary Diagnostic** box, automatically adds the corresponding genes to the **Genes** box. The panels can be customized by adding new genes (action **F**) or removing undesired genes (action **G**). New disease mutations can be added independently or associated to an already existing disease term (action **E**). Disease terms can be removed by simply dragging them back (action **H**).* + +For variant discovery/filtering we should upload the VCF file into BiERApp by using the following form: + + + +**Figure 8.** *BiERApp VCF upload panel. It is recommended to choose a name for the job as well as a description.* + +Each prioritization ('job') has three associated screens that facilitate the filtering steps. The first one, the 'Summary' tab, displays statistics of the data set analyzed, including the samples analyzed, the number and types of variants found and their distribution according to consequence types. The second screen, in the 'Variants and effect' tab, is the actual filtering tool, and the third one, the 'Genome view' tab, offers a representation of the selected variants within the genomic context provided by an embedded version of the Genome Maps Tool (30). + + + +**Figure 9.** This picture shows all the information associated with the variants. If a variant has an associated phenotype, we can see it in the last column. In this case, the variant 7:132481242 C>T is associated with the phenotype: large intestine tumor. + +References +----------------------- + +1. Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin and 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25: 2078-2079. +2. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. *Genome Res* 2010, 20:1297-1303. +3. Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean, Richard Durbin, and 1000 Genomes Project Analysis Group. The variant call format and VCFtools. Bioinformatics 2011, 27: 2156-2158. +4. Medina I, De Maria A, Bleda M, Salavert F, Alonso R, Gonzalez CY, Dopazo J: VARIANT: Command Line, Web service and Web interface for fast and accurate functional characterization of variants found by Next-Generation Sequencing. Nucleic Acids Res 2012, 40:W54-58. +5. Bleda M, Tarraga J, de Maria A, Salavert F, Garcia-Alonso L, Celma M, Martin A, Dopazo J, Medina I: CellBase, a comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources. Nucleic Acids Res 2012, 40:W609-614. +6. Flicek,P., Amode,M.R., Barrell,D., Beal,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fairley,S., Fitzgerald,S. et al. (2012) Ensembl 2012. Nucleic Acids Res., 40, D84–D90. +7. UniProt Consortium. (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res., 40, D71–D75. +8. Kozomara,A. and Griffiths-Jones,S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res., 39, D152–D157. +9. Xiao,F., Zuo,Z., Cai,G., Kang,S., Gao,X. and Li,T.
(2009) miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res., 37, D105–D110. +10. Hsu,S.D., Lin,F.M., Wu,W.Y., Liang,C., Huang,W.C., Chan,W.L., Tsai,W.T., Chen,G.Z., Lee,C.J., Chiu,C.M. et al. (2011) miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res., 39, D163–D169. +11. Friedman,R.C., Farh,K.K., Burge,C.B. and Bartel,D.P. (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res., 19, 92–105. 12. Betel,D., Wilson,M., Gabow,A., Marks,D.S. and Sander,C. (2008) The microRNA.org resource: targets and expression. Nucleic Acids Res., 36, D149–D153. +13. Dreszer,T.R., Karolchik,D., Zweig,A.S., Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC genome browser database: extensions and updates 2011. Nucleic Acids Res.,40, D918–D923. +14. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255. +15. Hunter,S., Jones,P., Mitchell,A., Apweiler,R., Attwood,T.K.,Bateman,A., Bernard,T., Binns,D., Bork,P., Burge,S. et al. (2012) InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res.,40, D306–D312. +16. Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311. +17. Altshuler,D.M., Gibbs,R.A., Peltonen,L., Dermitzakis,E., Schaffner,S.F., Yu,F., Bonnen,P.E., de Bakker,P.I., Deloukas,P., Gabriel,S.B. et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature, 467, 52–58. +18. 1000 Genomes Project Consortium. (2010) A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073. +19. Hindorff,L.A., Sethupathy,P., Junkins,H.A., Ramos,E.M., Mehta,J.P., Collins,F.S. and Manolio,T.A. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA, 106, 9362–9367. +20. Stenson,P.D., Ball,E.V., Mort,M., Phillips,A.D., Shiel,J.A., Thomas,N.S., Abeysinghe,S., Krawczak,M. and Cooper,D.N. (2003) Human gene mutation database (HGMD): 2003 update. Hum. Mutat., 21, 577–581. +21. Johnson,A.D. and O’Donnell,C.J. (2009) An open access database of genome-wide association results. BMC Med. Genet, 10, 6. +22. McKusick,V. (1998) A Catalog of Human Genes and Genetic Disorders, 12th edn. John Hopkins University Press,Baltimore, MD. +23. Forbes,S.A., Bindal,N., Bamford,S., Cole,C., Kok,C.Y., Beare,D., Jia,M., Shepherd,R., Leung,K., Menzies,A. et al. (2011) COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res., 39, D945–D950. +24. Kerrien,S., Aranda,B., Breuza,L., Bridge,A., Broackes-Carter,F., Chen,C., Duesbury,M., Dumousseau,M., Feuermann,M., Hinz,U. et al. (2012) The Intact molecular interaction database in 2012. Nucleic Acids Res., 40, D841–D846. +25. Croft,D., O’Kelly,G., Wu,G., Haw,R., Gillespie,M., Matthews,L., Caudy,M., Garapati,P., Gopinath,G., Jassal,B. et al. (2011) Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res., 39, D691–D697. +26. Demir,E., Cary,M.P., Paley,S., Fukuda,K., Lemer,C., Vastrik,I.,Wu,G., D’Eustachio,P., Schaefer,C., Luciano,J. et al. 
(2010) The BioPAX community standard for pathway data sharing. Nature Biotechnol., 28, 935–942. +27. Alemán A, García-García F, Medina I, Dopazo J (2014): A web tool for the design and management of panels of genes for targeted enrichment and massive sequencing for clinical applications. Nucleic Acids Res 42: W83-7. +28. [Alemán A](http://www.ncbi.nlm.nih.gov/pubmed?term=Alem%C3%A1n%20A%5BAuthor%5D&cauthor=true&cauthor_uid=24803668), [Garcia-Garcia F](http://www.ncbi.nlm.nih.gov/pubmed?term=Garcia-Garcia%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668), [Salavert F](http://www.ncbi.nlm.nih.gov/pubmed?term=Salavert%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668), [Medina I](http://www.ncbi.nlm.nih.gov/pubmed?term=Medina%20I%5BAuthor%5D&cauthor=true&cauthor_uid=24803668), [Dopazo J](http://www.ncbi.nlm.nih.gov/pubmed?term=Dopazo%20J%5BAuthor%5D&cauthor=true&cauthor_uid=24803668) (2014). A web-based interactive framework to assist in the prioritization of disease candidate genes in whole-exome sequencing studies. [Nucleic Acids Res.](http://www.ncbi.nlm.nih.gov/pubmed/?term=BiERapp "Nucleic acids research.") 42: W88-93. +29. Landrum,M.J., Lee,J.M., Riley,G.R., Jang,W., Rubinstein,W.S., Church,D.M. and Maglott,D.R. (2014) ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res., 42, D980–D985. +30. Medina I, Salavert F, Sanchez R, de Maria A, Alonso R, Escobar P, Bleda M, Dopazo J: Genome Maps, a new generation genome browser. Nucleic Acids Res 2013, 41:W41-46. \ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master/priorization-component-bierapp.md b/docs.it4i/anselm-cluster-documentation/software/omics-master/priorization-component-bierapp.md new file mode 100644 index 0000000000000000000000000000000000000000..9d26f896434ded3483f4aa1b1f441220c5c968f5 --- /dev/null +++ b/docs.it4i/anselm-cluster-documentation/software/omics-master/priorization-component-bierapp.md @@ -0,0 +1,20 @@ +Priorization component (BiERApp) +================================ + +### Access + +BiERApp is available at the following address: <http://omics.it4i.cz/bierapp/> + +>The address is accessible only via [VPN](../../accessing-the-cluster/vpn-access.html). + +### BiERApp + +**This tool is aimed at discovering new disease genes or variants by studying affected families or cases and controls. It carries out a filtering process to sequentially remove: (i) variants which are not compatible with the disease because they are not expected to have an impact on the protein function; (ii) variants that exist at frequencies incompatible with the disease; (iii) variants that do not segregate with the disease. The result is a reduced set of disease gene candidates that should be further validated experimentally.** + +BiERapp (28) efficiently helps in the identification of causative variants in family and sporadic genetic diseases. The program reads lists of predicted variants (nucleotide substitutions and indels) in affected individuals or tumor samples and controls. In family studies, different modes of inheritance can easily be defined to filter out variants that do not segregate with the disease along the family. Moreover, BiERapp integrates additional information such as allelic frequencies in the general population and the most popular damaging scores to further narrow down the number of putative variants in successive filtering steps.
BiERapp provides an interactive and user-friendly interface that implements the filtering strategy used in the context of a large-scale genomic project carried out by the Spanish Network for Research in Rare Diseases (CIBERER) and the Medical Genome Project, in which more than 800 exomes have been analyzed. + + + +**Figure 6.** Web interface to the prioritization tool. This figure shows the interface of the web tool for candidate gene prioritization with the filters available. The tool includes a genomic viewer (Genome Maps 30) that enables the representation of the variants in the corresponding genomic coordinates. + diff --git a/docs.it4i/anselm-cluster-documentation/software/openfoam.md b/docs.it4i/anselm-cluster-documentation/software/openfoam.md index f5579ca1e525599353caf270ff971a04d4c23d58..a6e7465967928bcfd98c070e9e30e41c76d2aefa 100644 --- a/docs.it4i/anselm-cluster-documentation/software/openfoam.md +++ b/docs.it4i/anselm-cluster-documentation/software/openfoam.md @@ -1,110 +1,100 @@ -OpenFOAM +OpenFOAM ======== -A free, open source CFD software package +##A free, open source CFD software package - - -Introduction** ---------------- - -OpenFOAM is a free, open source CFD software package developed -by [**OpenCFD Ltd**](http://www.openfoam.com/about) at [**ESI -Group**](http://www.esi-group.com/) and distributed by the [**OpenFOAM -Foundation **](http://www.openfoam.org/). It has a large user base -across most areas of engineering and science, from both commercial and -academic organisations. +OpenFOAM is a free, open source CFD software package developed by [**OpenCFD Ltd**](http://www.openfoam.com/about) at [**ESI Group**](http://www.esi-group.com/) and distributed by the [**OpenFOAM Foundation**](http://www.openfoam.org/). It has a large user base across most areas of engineering and science, from both commercial and academic organisations. Homepage: <http://www.openfoam.com/> -###Installed version** +###Installed version -Currently, several version compiled by GCC/ICC compilers in -single/double precision with several version of openmpi are available on -Anselm. +Currently, several versions compiled by GCC/ICC compilers in single/double precision with several versions of openmpi are available on Anselm. For example syntax of available OpenFOAM module is: < openfoam/2.2.1-icc-openmpi1.6.5-DP > -this means openfoam version >2.2.1 compiled by -ICC compiler with >openmpi1.6.5 in> double -precision. +this means openfoam version 2.2.1 compiled by the ICC compiler with openmpi1.6.5 in double precision.
Naming convention of the installed versions is as follows: -  openfoam/<>VERSION>>-<>COMPILER<span>>-<</span><span>openmpiVERSION</span><span>>-<</span><span>PRECISION</span><span>></span> -- ><>VERSION>> - version of - openfoam -- ><>COMPILER> - version of used - compiler -- ><>openmpiVERSION> - version of used - openmpi/impi -- ><>PRECISION> - DP/>SP – - double/single precision +- <VERSION> - version of openfoam +- <COMPILER> - version of used compiler +- <openmpiVERSION> - version of used openmpi/impi +- <PRECISION> - DP/SP – double/single precision -###Available OpenFOAM modules** +###Available OpenFOAM modules To check available modules use +```bash $ module avail +``` -In /opt/modules/modulefiles/engineering you can see installed -engineering softwares: +In /opt/modules/modulefiles/engineering you can see installed engineering software: +```bash ------------------------------------ /opt/modules/modulefiles/engineering ------------------------------------------------------------- ansys/14.5.x              matlab/R2013a-COM                               openfoam/2.2.1-icc-impi4.1.1.036-DP comsol/43b-COM            matlab/R2013a-EDU                               openfoam/2.2.1-icc-openmpi1.6.5-DP comsol/43b-EDU            openfoam/2.2.1-gcc481-openmpi1.6.5-DP           paraview/4.0.1-gcc481-bullxmpi1.2.4.1-osmesa10.0 lsdyna/7.x.x              openfoam/2.2.1-gcc481-openmpi1.6.5-SP +``` -For information how to use modules please [look -here](../environment-and-modules.html "Environment and Modules "). +For information on how to use modules please [look here](../environment-and-modules.html "Environment and Modules "). -Getting Started** +Getting Started ------------------- To create OpenFOAM environment on ANSELM give the commands: +```bash $ module load openfoam/2.2.1-icc-openmpi1.6.5-DP $ source $FOAM_BASHRC +``` -Pleas load correct module with your requirements “compiler - GCC/ICC, -precision - DP/SP”. +>Please load the correct module for your requirements: compiler - GCC/ICC, precision - DP/SP. -Create a project directory within the $HOME/OpenFOAM directory -named ><USER>-<OFversion> and create a directory -named run within it, e.g. by typing: +Create a project directory within the $HOME/OpenFOAM directory named <USER>-<OFversion> and create a directory named run within it, e.g. by typing: +```bash $ mkdir -p $FOAM_RUN +``` Project directory is now available by typing: +```bash $ cd /home/<USER>/OpenFOAM/<USER>-<OFversion>/run +``` <OFversion> - for example <2.2.1> or +```bash $ cd $FOAM_RUN +``` -Copy the tutorial examples directory in the OpenFOAM distribution to -the run directory: +Copy the tutorial examples directory in the OpenFOAM distribution to the run directory: +```bash $ cp -r $FOAM_TUTORIALS $FOAM_RUN +``` -Now you can run the first case for example incompressible laminar flow -in a cavity. +Now you can run the first case, for example incompressible laminar flow in a cavity.
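+For a quick interactive test, the cavity case can be run directly from the copied tutorials. This is a minimal sketch, preferably executed inside an interactive job on a compute node rather than on the login node; the exact tutorial path may differ between OpenFOAM versions:
+
+```bash
+ $ cd $FOAM_RUN/tutorials/incompressible/icoFoam/cavity
+ $ blockMesh   # generate the mesh from the case's blockMeshDict
+ $ icoFoam     # run the incompressible laminar solver on the cavity case
+```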
-Running Serial Applications** ------------------------------- -Create a Bash script >test.sh - +```bash #!/bin/bash module load openfoam/2.2.1-icc-openmpi1.6.5-DP source $FOAM_BASHRC @@ -116,33 +106,25 @@ Create a Bash script >test.sh runApplication blockMesh runApplication icoFoam - - - - +``` Job submission - +```bash $ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh +``` +For information about job submission please [look here](../resource-allocation-and-job-execution/job-submission-and-execution.html "Job submission"). - - - For information about job submission please [look -here](../resource-allocation-and-job-execution/job-submission-and-execution.html "Job submission"). -Running applications in parallel** ------------------------------------------------- -Run the second case for example external incompressible turbulent -flow - case - motorBike. +Run the second case, for example external incompressible turbulent flow - case - motorBike. -First we must run serial application bockMesh and decomposePar for -preparation of parallel computation. +First we must run the serial applications blockMesh and decomposePar to prepare for the parallel computation. -Create a Bash scrip test.sh: +>Create a Bash script test.sh: - +```bash #!/bin/bash module load openfoam/2.2.1-icc-openmpi1.6.5-DP source $FOAM_BASHRC @@ -154,23 +136,19 @@ Create a Bash scrip test.sh: runApplication blockMesh runApplication decomposePar - - +``` Job submission - +```bash $ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh +``` - +This job creates a simple block mesh and the domain decomposition. Check your decomposition, and submit the parallel computation:
-Check your decomposition, and submit parallel computation: +> Create a PBS script testParallel.pbs: -Create a PBS script> -testParallel.pbs: - - +```bash #!/bin/bash #PBS -N motorBike #PBS -l select=2:ncpus=16 @@ -190,72 +168,65 @@ testParallel.pbs: mpirun -hostfile ${PBS_NODEFILE} -np $nproc potentialFoam -noFunctionObject-writep -parallel | tee potentialFoam.log mpirun -hostfile ${PBS_NODEFILE} -np $nproc simpleFoam -parallel | tee simpleFoam.log - - +``` nproc – number of subdomains Job submission - +```bash $ qsub testParallel.pbs - - - -Compile your own solver** +``` +Compile your own solver ---------------------------------------- Initialize OpenFOAM environment before compiling your solver - +```bash $ module load openfoam/2.2.1-icc-openmpi1.6.5-DP $ source $FOAM_BASHRC $ cd $FOAM_RUN/ +``` Create directory applications/solvers in user directory - +```bash $ mkdir -p applications/solvers $ cd applications/solvers - - +``` Copy icoFoam solver’s source files - +```bash $ cp -r $FOAM_SOLVERS/incompressible/icoFoam/ My_icoFoam $ cd My_icoFoam +``` Rename icoFoam.C to My_icoFOAM.C - +```bash $ mv icoFoam.C My_icoFoam.C - - +``` Edit >*files* file in *Make* directory: - +```bash icoFoam.C EXE = $(FOAM_APPBIN)/icoFoam +``` and change to: - My_icoFoam.C +```bash + My_icoFoam.C EXE = $(FOAM_USER_APPBIN)/My_icoFoam +``` In directory My_icoFoam give the compilation command: - +```bash $ wmake +``` ------------------------------------------------------------------------ - - - - Have a fun with OpenFOAM :)** - - id="__caret"> - - - + **Have a fun with OpenFOAM :)** diff --git a/docs.it4i/anselm-cluster-documentation/software/operating-system.md b/docs.it4i/anselm-cluster-documentation/software/operating-system.md index af15c05074ea33347892db42012d41d0b6b7a7cf..9487d4fab704cb68677b605cb3f4427c91d3f9a1 100644 --- a/docs.it4i/anselm-cluster-documentation/software/operating-system.md +++ b/docs.it4i/anselm-cluster-documentation/software/operating-system.md @@ -1,13 +1,8 @@ -Operating System -================ +Operating System +=============== -The operating system, deployed on ANSELM +##The operating system, deployed on ANSELM +The operating system on Anselm is Linux - bullx Linux Server release 6.3. - - -The operating system on Anselm is Linux - bullx Linux Server release -6.3. - -bullx Linux is based on Red Hat Enterprise Linux. bullx Linux is a Linux -distribution provided by Bull and dedicated to HPC applications. +bullx Linux is based on Red Hat Enterprise Linux. bullx Linux is a Linux distribution provided by Bull and dedicated to HPC applications. diff --git a/docs.it4i/anselm-cluster-documentation/software/paraview.md b/docs.it4i/anselm-cluster-documentation/software/paraview.md index 7aafe20e77601fc324681cc958b651ee1b37edcd..3448b506e0ddc96b8efeff9e256f5ab79e255b84 100644 --- a/docs.it4i/anselm-cluster-documentation/software/paraview.md +++ b/docs.it4i/anselm-cluster-documentation/software/paraview.md @@ -1,93 +1,63 @@ -ParaView +ParaView ======== -An open-source, multi-platform data analysis and visualization -application - - +##An open-source, multi-platform data analysis and visualization application Introduction ------------ -ParaView** is an open-source, multi-platform data analysis and -visualization application. ParaView users can quickly build -visualizations to analyze their data using qualitative and quantitative -techniques. The data exploration can be done interactively in 3D or -programmatically using ParaView's batch processing capabilities. 
+**ParaView** is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or programmatically using ParaView's batch processing capabilities. -ParaView was developed to analyze extremely large datasets using -distributed memory computing resources. It can be run on supercomputers -to analyze datasets of exascale size as well as on laptops for smaller -data. +ParaView was developed to analyze extremely large datasets using distributed memory computing resources. It can be run on supercomputers to analyze datasets of exascale size as well as on laptops for smaller data. Homepage : <http://www.paraview.org/> Installed version ----------------- - -Currently, version 4.0.1 compiled with GCC 4.8.1 against Bull MPI -library and OSMesa 10.0 is installed on Anselm. +Currently, version 4.0.1 compiled with GCC 4.8.1 against Bull MPI library and OSMesa 10.0 is installed on Anselm. Usage ----- - -On Anselm, ParaView is to be used in client-server mode. A parallel -ParaView server is launched on compute nodes by the user, and client is -launched on your desktop PC to control and view the visualization. -Download ParaView client application for your OS here -: <http://paraview.org/paraview/resources/software.php>. Important : -your version must match the version number installed on Anselm** ! -(currently v4.0.1) +On Anselm, ParaView is to be used in client-server mode. A parallel ParaView server is launched on compute nodes by the user, and client is launched on your desktop PC to control and view the visualization. Download ParaView client application for your OS here: <http://paraview.org/paraview/resources/software.php>. Important : **your version must match the version number installed on Anselm** ! (currently v4.0.1) ### Launching server To launch the server, you must first allocate compute nodes, for example -:> +```bash $ qsub -I -q qprod -A OPEN-0-0 -l select=2 +``` -to launch an interactive session on 2 nodes. Refer to [Resource -Allocation and Job -Execution](../resource-allocation-and-job-execution/introduction.html) -for details. +to launch an interactive session on 2 nodes. Refer to [Resource Allocation and Job Execution](../resource-allocation-and-job-execution/introduction.html) for details. After the interactive session is opened, load the ParaView module : +```bash $ module add paraview +``` -Now launch the parallel server, with number of nodes times 16 processes -: +Now launch the parallel server, with number of nodes times 16 processes: +```bash $ mpirun -np 32 pvserver --use-offscreen-rendering Waiting for client... Connection URL: cs://cn77:11111 Accepting connection(s): cn77:11111 +``` - Note the that the server is listening on compute node cn77 in this -case, we shall use this information later. +Note the that the server is listening on compute node cn77 in this case, we shall use this information later. ### Client connection -Because a direct connection is not allowed to compute nodes on Anselm, -you must establish a SSH tunnel to connect to the server. Choose a port -number on your PC to be forwarded to ParaView server, for example 12345. -If your PC is running Linux, use this command to estabilish a SSH tunnel -: +Because a direct connection is not allowed to compute nodes on Anselm, you must establish a SSH tunnel to connect to the server. 
Choose a port number on your PC to be forwarded to ParaView server, for example 12345. If your PC is running Linux, use this command to estabilish a SSH tunnel: +```bash ssh -TN -L 12345:cn77:11111 username@anselm.it4i.cz +``` -replace username with your login and cn77 -with the name of compute node your ParaView server is running on (see -previous step). If you use PuTTY on Windows, load Anselm connection -configuration, t>hen go to Connection-> -SSH>->Tunnels to set up the -port forwarding. Click Remote radio button. Insert 12345 to Source port -textbox. Insert cn77:11111. Click Add button, then Open. [Read -more about port -forwarding.](https://docs.it4i.cz/anselm-cluster-documentation/software/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) +replace username with your login and cn77 with the name of compute node your ParaView server is running on (see previous step). If you use PuTTY on Windows, load Anselm connection configuration, t>hen go to Connection-> SSH>->Tunnels to set up the port forwarding. Click Remote radio button. Insert 12345 to Source port textbox. Insert cn77:11111. Click Add button, then Open. [Read more about port forwarding.](https://docs.it4i.cz/anselm-cluster-documentation/software/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) -Now launch ParaView client installed on your desktop PC. Select -File->Connect..., click Add Server. Fill in the following : +Now launch ParaView client installed on your desktop PC. Select File->Connect..., click Add Server. Fill in the following : Name : Anselm tunnel @@ -97,25 +67,18 @@ Host : localhost Port : 12345 -Click Configure, Save, the configuration is now saved for later use. Now -click Connect to connect to the ParaView server. In your terminal where -you have interactive session with ParaView server launched, you should -see : +Click Configure, Save, the configuration is now saved for later use. Now click Connect to connect to the ParaView server. In your terminal where you have interactive session with ParaView server launched, you should see: +```bash Client connected. +``` You can now use Parallel ParaView. ### Close server -Remember to close the interactive session after you finish working with -ParaView server, as it will remain launched even after your client is -disconnected and will continue to consume resources. +Remember to close the interactive session after you finish working with ParaView server, as it will remain launched even after your client is disconnected and will continue to consume resources. GPU support ----------- - -Currently, GPU acceleration is not supported in the server and ParaView -will not take advantage of accelerated nodes on Anselm. Support for GPU -acceleration might be added in the future. - +Currently, GPU acceleration is not supported in the server and ParaView will not take advantage of accelerated nodes on Anselm. Support for GPU acceleration might be added in the future. 
\ No newline at end of file diff --git a/docs.it4i/anselm-cluster-documentation/storage-1/cesnet-data-storage.md b/docs.it4i/anselm-cluster-documentation/storage/cesnet-data-storage.md similarity index 100% rename from docs.it4i/anselm-cluster-documentation/storage-1/cesnet-data-storage.md rename to docs.it4i/anselm-cluster-documentation/storage/cesnet-data-storage.md diff --git a/docs.it4i/anselm-cluster-documentation/storage-1/storage.md b/docs.it4i/anselm-cluster-documentation/storage/storage.md similarity index 100% rename from docs.it4i/anselm-cluster-documentation/storage-1/storage.md rename to docs.it4i/anselm-cluster-documentation/storage/storage.md