From 198f6972fa754534bf052a4f9036dfe1bf726b88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Luk=C3=A1=C5=A1=20Krup=C4=8D=C3=ADk?= <lukas.krupcik@vsb.cz>
Date: Thu, 26 Jan 2017 11:47:00 +0100
Subject: [PATCH] formatting style

---
 .gitlab-ci.yml                                |   2 +-
 .../compute-nodes.md                          | 171 +++++-----
 .../introduction.md                           |   6 +-
 .../job-submission-and-execution.md           |  52 ++-
 .../anselm-cluster-documentation/prace.md     |  42 ++-
 .../resources-allocation-policy.md            |  45 ++-
 .../shell-and-data-access.md                  |  52 ++-
 .../software/gpi2.md                          |  41 ++-
 .../omics-master/diagnostic-component-team.md |  13 +-
 .../software/omics-master/overview.md         | 305 +++++++++---------
 .../priorization-component-bierapp.md         |  15 +-
 11 files changed, 360 insertions(+), 384 deletions(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 91877db56..b5405032b 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -9,7 +9,7 @@ docs:
   image: davidhrbac/docker-mdcheck:latest
   allow_failure: true
   script:
-  - mdl -r ~MD013 *.md docs.it4i/
+  - mdl -r ~MD013,~MD033,~MD014 *.md docs.it4i/
 
 two spaces:
   stage: test
diff --git a/docs.it4i/anselm-cluster-documentation/compute-nodes.md b/docs.it4i/anselm-cluster-documentation/compute-nodes.md
index 2d4f1707c..f7b65a0f1 100644
--- a/docs.it4i/anselm-cluster-documentation/compute-nodes.md
+++ b/docs.it4i/anselm-cluster-documentation/compute-nodes.md
@@ -1,55 +1,53 @@
-Compute Nodes
-=============
+# Compute Nodes
+
+## Nodes Configuration
 
-Nodes Configuration
--------------------
 Anselm is a cluster of x86-64 Intel-based nodes built on the Bull Extreme Computing bullx technology. The cluster contains four types of compute nodes.
 
-###Compute Nodes Without Accelerator
-
--    180 nodes
--    2880 cores in total
--    two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
--    64 GB of physical memory per node
--    one 500GB SATA 2,5” 7,2 krpm HDD per node
--    bullx B510 blade servers
--    cn[1-180]
-
-###Compute Nodes With GPU Accelerator
-
--    23 nodes
--    368 cores in total
--    two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
--    96 GB of physical memory per node
--    one 500GB SATA 2,5” 7,2 krpm HDD per node
--    GPU accelerator 1x NVIDIA Tesla Kepler K20 per node
--    bullx B515 blade servers
--    cn[181-203]
-
-###Compute Nodes With MIC Accelerator
-
--    4 nodes
--    64 cores in total
--    two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
--    96 GB of physical memory per node
--    one 500GB SATA 2,5” 7,2 krpm HDD per node
--    MIC accelerator 1x Intel Phi 5110P per node
--    bullx B515 blade servers
--    cn[204-207]
-
-###Fat Compute Nodes
-
--    2 nodes
--    32 cores in total
--    2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
--    512 GB of physical memory per node
--    two 300GB SAS 3,5”15krpm HDD (RAID1) per node
--    two 100GB SLC SSD per node
--    bullx R423-E3 servers
--    cn[208-209]
+### Compute Nodes Without Accelerator
+
+* 180 nodes
+* 2880 cores in total
+* two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
+* 64 GB of physical memory per node
+* one 500GB SATA 2,5” 7,2 krpm HDD per node
+* bullx B510 blade servers
+* cn[1-180]
+
+### Compute Nodes With GPU Accelerator
+
+* 23 nodes
+* 368 cores in total
+* two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
+* 96 GB of physical memory per node
+* one 500GB SATA 2,5” 7,2 krpm HDD per node
+* GPU accelerator 1x NVIDIA Tesla Kepler K20 per node
+* bullx B515 blade servers
+* cn[181-203]
+
+### Compute Nodes With MIC Accelerator
+
+* 4 nodes
+* 64 cores in total
+* two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
+* 96 GB of physical memory per node
+* one 500GB SATA 2,5” 7,2 krpm HDD per node
+* MIC accelerator 1x Intel Phi 5110P per node
+* bullx B515 blade servers
+* cn[204-207]
+
+### Fat Compute Nodes
+
+* 2 nodes
+* 32 cores in total
+* 2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
+* 512 GB of physical memory per node
+* two 300GB SAS 3,5” 15 krpm HDD (RAID1) per node
+* two 100GB SLC SSD per node
+* bullx R423-E3 servers
+* cn[208-209]
 
 ![](../img/bullxB510.png)
-
 **Figure Anselm bullx B510 servers**
 
 ### Compute Nodes Summary
@@ -61,31 +59,29 @@ Anselm is cluster of x86-64 Intel based nodes built on Bull Extreme Computing bu
   |Nodes with MIC accelerator|4|cn[204-207]|96GB|16 @ 2.3GHz|qmic, qprod|
   |Fat compute nodes|2|cn[208-209]|512GB|16 @ 2.4GHz|qfat, qprod|
 
-Processor Architecture
-----------------------
+## Processor Architecture
+
 Anselm is equipped with Intel Sandy Bridge processors Intel Xeon E5-2665 (nodes without accelerator and fat nodes) and Intel Xeon E5-2470 (nodes with accelerator). Processors support Advanced Vector Extensions (AVX) 256-bit instruction set.
 
 ### Intel Sandy Bridge E5-2665 Processor
 
--   eight-core
--   speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology
--   peak performance:  19.2 GFLOP/s per
-    core
--   caches:
-    -   L2: 256 KB per core
-    -   L3: 20 MB per processor
--   memory bandwidth at the level of the processor: 51.2 GB/s
+* eight-core
+* speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology
+* peak performance: 19.2 GFLOP/s per core (2.4 GHz × 8 FLOP per cycle with AVX)
+* caches:
+  * L2: 256 KB per core
+  * L3: 20 MB per processor
+* memory bandwidth at the level of the processor: 51.2 GB/s
 
 ### Intel Sandy Bridge E5-2470 Processor
 
--   eight-core
--   speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology
--   peak performance:  18.4 GFLOP/s per
-    core
--   caches:
-    -   L2: 256 KB per core
-    -   L3: 20 MB per processor
--   memory bandwidth at the level of the processor: 38.4 GB/s
+* eight-core
+* speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology
+* peak performance: 18.4 GFLOP/s per core (2.3 GHz × 8 FLOP per cycle with AVX)
+* caches:
+  * L2: 256 KB per core
+  * L3: 20 MB per processor
+* memory bandwidth at the level of the processor: 38.4 GB/s
 
 Nodes equipped with Intel Xeon E5-2665 CPU have set PBS resource attribute cpu_freq = 24, nodes equipped with Intel Xeon E5-2470 CPU have set PBS resource attribute cpu_freq = 23.
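+
+As a sketch (the exact resource string should be verified against the cluster's PBS configuration), the cpu_freq attribute above may be used to request nodes of a single CPU type only:
+
+```bash
+    $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16:cpu_freq=24 -I
+```
+
+In this example, only nodes equipped with the 2.4 GHz Intel Xeon E5-2665 processors would be allocated.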
 
@@ -101,35 +97,34 @@ Intel Turbo Boost Technology is used by default,  you can disable it for all nod
     $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I
 ```
 
-Memory Architecture
--------------------
+## Memory Architecture
 
 ### Compute Node Without Accelerator
 
--   2 sockets
--   Memory Controllers are integrated into processors.
-    -   8 DDR3 DIMMs per node
-    -   4 DDR3 DIMMs per CPU
-    -   1 DDR3 DIMMs per channel
-    -   Data rate support: up to 1600MT/s
--   Populated memory: 8 x 8 GB DDR3 DIMM 1600 MHz
+* 2 sockets
+* Memory Controllers are integrated into processors.
+  * 8 DDR3 DIMMs per node
+  * 4 DDR3 DIMMs per CPU
+  * 1 DDR3 DIMMs per channel
+  * Data rate support: up to 1600MT/s
+* Populated memory: 8 x 8 GB DDR3 DIMM 1600 MHz
 
 ### Compute Node With GPU or MIC Accelerator
 
--   2 sockets
--   Memory Controllers are integrated into processors.
-    -   6 DDR3 DIMMs per node
-    -   3 DDR3 DIMMs per CPU
-    -   1 DDR3 DIMMs per channel
-    -   Data rate support: up to 1600MT/s
--   Populated memory: 6 x 16 GB DDR3 DIMM 1600 MHz
+* 2 sockets
+* Memory Controllers are integrated into processors.
+  * 6 DDR3 DIMMs per node
+  * 3 DDR3 DIMMs per CPU
+  * 1 DDR3 DIMMs per channel
+  * Data rate support: up to 1600MT/s
+* Populated memory: 6 x 16 GB DDR3 DIMM 1600 MHz
 
 ### Fat Compute Node
 
--   2 sockets
--   Memory Controllers are integrated into processors.
-    -   16 DDR3 DIMMs per node
-    -   8 DDR3 DIMMs per CPU
-    -   2 DDR3 DIMMs per channel
-    -   Data rate support: up to 1600MT/s
--   Populated memory: 16 x 32 GB DDR3 DIMM 1600 MHz
+* 2 sockets
+* Memory Controllers are integrated into processors.
+  * 16 DDR3 DIMMs per node
+  * 8 DDR3 DIMMs per CPU
+  * 2 DDR3 DIMMs per channel
+  * Data rate support: up to 1600MT/s
+* Populated memory: 16 x 32 GB DDR3 DIMM 1600 MHz
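+
+The NUMA layout described above can be inspected directly on an allocated compute node, for example with the standard Linux numactl tool (assuming it is installed on the nodes):
+
+```bash
+    $ numactl --hardware
+```
+
+The output lists the two NUMA nodes (one per socket) and the amount of memory attached to each.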
diff --git a/docs.it4i/anselm-cluster-documentation/introduction.md b/docs.it4i/anselm-cluster-documentation/introduction.md
index ffdac2802..6cf377ecf 100644
--- a/docs.it4i/anselm-cluster-documentation/introduction.md
+++ b/docs.it4i/anselm-cluster-documentation/introduction.md
@@ -1,13 +1,11 @@
-Introduction
-============
+# Introduction
 
 Welcome to Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totaling 3344 compute cores with 15 TB RAM and giving over 94 TFLOP/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 16 cores, at least 64 GB RAM, and 500 GB hard disk drive. Nodes are interconnected by fully non-blocking fat-tree InfiniBand network and equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview](hardware-overview/).
 
-The cluster runs bullx Linux ([bull](http://www.bull.com/bullx-logiciels/systeme-exploitation.html)) [operating system](software/operating-system/), which is compatible with the  RedHat [ Linux family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment](environment-and-modules/).
+The cluster runs a bullx Linux [operating system](software/operating-system/), which is compatible with the RedHat [Linux family](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg). We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment](environment-and-modules/).
 
 User data shared file-system (HOME, 320 TB) and job data shared file-system (SCRATCH, 146 TB) are available to users.
 
 The PBS Professional workload manager provides [computing resources allocations and job execution](resources-allocation-policy/).
 
 Read more on how to [apply for resources](../get-started-with-it4innovations/applying-for-resources/), [obtain login credentials,](../get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials/) and [access the cluster](shell-and-data-access/).
-
diff --git a/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md b/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md
index b500a5a2b..a7dd20406 100644
--- a/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md
+++ b/docs.it4i/anselm-cluster-documentation/job-submission-and-execution.md
@@ -1,19 +1,18 @@
-Job submission and execution
-============================
+# Job submission and execution
+
+## Job Submission
 
-Job Submission
---------------
 When allocating computational resources for the job, please specify
 
-1.  suitable queue for your job (default is qprod)
-2.  number of computational nodes required
-3.  number of cores per node required
-4.  maximum wall time allocated to your calculation, note that jobs exceeding maximum wall time will be killed
-5.  Project ID
-6.  Jobscript or interactive switch
+1. suitable queue for your job (default is qprod)
+1. number of computational nodes required
+1. number of cores per node required
+1. maximum wall time allocated to your calculation; note that jobs exceeding the maximum wall time will be killed
+1. Project ID
+1. Jobscript or interactive switch
 
 !!! Note "Note"
-	Use the **qsub** command to submit your job to a queue for allocation of the computational resources.
+    Use the **qsub** command to submit your job to a queue for allocation of the computational resources.
 
 Submit the job using the qsub command:
 
@@ -61,8 +60,7 @@ By default, the PBS batch system sends an e-mail only when the job is aborted. D
 $ qsub -m n
 ```
 
-Advanced job placement
-----------------------
+## Advanced job placement
 
 ### Placement by name
 
@@ -103,8 +101,7 @@ We recommend allocating compute nodes of a single switch when best possible comp
 
 In this example, we request all the 18 nodes sharing the isw11 switch for 24 hours. Full chassis will be allocated.
 
-Advanced job handling
----------------------
+## Advanced job handling
 
 ### Selecting Turbo Boost off
 
@@ -133,10 +130,10 @@ The MPI processes will be distributed differently on the nodes connected to the
 
 Although this example is somewhat artificial, it demonstrates the flexibility of the qsub command options.
 
-Job Management
---------------
+## Job Management
+
 !!! Note "Note"
-	Check status of your jobs using the **qstat** and **check-pbs-jobs** commands
+    Check status of your jobs using the **qstat** and **check-pbs-jobs** commands
 
 ```bash
 $ qstat -a
@@ -217,7 +214,7 @@ Run loop 3
 In this example, we see actual output (some iteration loops) of the job 35141.dm2
 
 !!! Note "Note"
-	Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel**, **qsig** or **qalter** commands
+    Manage your queued or running jobs, using the **qhold**, **qrls**, **qdel**, **qsig** or **qalter** commands
 
 You may release your allocation at any time, using qdel command
 
@@ -237,18 +234,17 @@ Learn more by reading the pbs man page
 $ man pbs_professional
 ```
 
-Job Execution
--------------
+## Job Execution
 
 ### Jobscript
 
 !!! Note "Note"
-	Prepare the jobscript to run batch jobs in the PBS queue system
+    Prepare the jobscript to run batch jobs in the PBS queue system
 
 The Jobscript is a user made script, controlling sequence of commands for executing the calculation. It is often written in bash, other scripts may be used as well. The jobscript is supplied to PBS **qsub** command as an argument and executed by the PBS Professional workload manager.
 
 !!! Note "Note"
-	The jobscript or interactive shell is executed on first of the allocated nodes.
+    The jobscript or interactive shell is executed on first of the allocated nodes.
 
 ```bash
 $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob
@@ -278,7 +274,7 @@ $ pwd
 In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory.
 
 !!! Note "Note"
-	All nodes within the allocation may be accessed via ssh.  Unallocated nodes are not accessible to user.
+    All nodes within the allocation may be accessed via ssh.  Unallocated nodes are not accessible to user.
 
 The allocated nodes are accessible via ssh from login nodes. The nodes may access each other via ssh as well.
 
@@ -310,7 +306,7 @@ In this example, the hostname program is executed via pdsh from the interactive
 ### Example Jobscript for MPI Calculation
 
 !!! Note "Note"
-	Production jobs must use the /scratch directory for I/O
+    Production jobs must use the /scratch directory for I/O
 
 The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory.
 
@@ -342,12 +338,12 @@ exit
 In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes.
 
 !!! Note "Note"
-	Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts.
+    Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts.
 
 In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished.
 
 !!! Note "Note"
-	Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution.
+    Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution.
 
 Example jobscript for an MPI job with preloaded inputs and executables, options for qsub are stored within the script :
 
@@ -380,7 +376,7 @@ sections.
 ### Example Jobscript for Single Node Calculation
 
 !!! Note "Note"
-	Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends.
+    Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends.
 
 Example jobscript for single node calculation, using [local scratch](storage/) on the node:
 
diff --git a/docs.it4i/anselm-cluster-documentation/prace.md b/docs.it4i/anselm-cluster-documentation/prace.md
index c48fc3e22..0b1243838 100644
--- a/docs.it4i/anselm-cluster-documentation/prace.md
+++ b/docs.it4i/anselm-cluster-documentation/prace.md
@@ -1,26 +1,24 @@
-PRACE User Support
-==================
+# PRACE User Support
+
+## Intro
 
-Intro
------
 PRACE users coming to Anselm as to TIER-1 system offered through the DECI calls are in general treated as standard users and so most of the general documentation applies to them as well. This section shows the main differences for quicker orientation, but often uses references to the original documentation. PRACE users who don't undergo the full procedure (including signing the IT4I AuP on top of the PRACE AuP) will not have a password and thus access to some services intended for regular users. This can lower their comfort, but otherwise they should be able to use the TIER-1 system as intended. Please see the [Obtaining Login Credentials section](../get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials/), if the same level of access is required.
 
 All general [PRACE User Documentation](http://www.prace-ri.eu/user-documentation/) should be read before continuing reading the local documentation here.
 
-Help and Support
---------------------
+## Help and Support
+
 If you have any troubles, need information, request support or want to install additional software, please use [PRACE Helpdesk](http://www.prace-ri.eu/helpdesk-guide264/).
 
 Information about the local services is provided in the [introduction of general user documentation](introduction/). Please keep in mind that standard PRACE accounts don't have a password to access the web interface of the local (IT4Innovations) request tracker and thus a new ticket should be created by sending an e-mail to support[at]it4i.cz.
 
-Obtaining Login Credentials
----------------------------
+## Obtaining Login Credentials
+
 In general PRACE users already have a PRACE account setup through their HOMESITE (institution from their country) as a result of rewarded PRACE project proposal. This includes signed PRACE AuP, generated and registered certificates, etc.
 
 If there's a special need a PRACE user can get a standard (local) account at IT4Innovations. To get an account on the Anselm cluster, the user needs to obtain the login credentials. The procedure is the same as for general users of the cluster, so please see the corresponding section of the general documentation here.
 
-Accessing the cluster
----------------------
+## Accessing the cluster
 
 ### Access with GSI-SSH
 
@@ -30,11 +28,11 @@ The user will need a valid certificate and to be present in the PRACE LDAP (plea
 
 Most of the information needed by PRACE users accessing the Anselm TIER-1 system can be found here:
 
--   [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs)
--   [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ)
--   [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh)
--   [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details)
--   [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer)
+* [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs)
+* [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ)
+* [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh)
+* [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details)
+* [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer)
 
 Before you start to use any of the services don't forget to create a proxy certificate from your certificate:
 
@@ -116,8 +114,8 @@ If the user uses GSI SSH based access, then the procedure is similar to the SSH
 
 After successful obtainment of login credentials for the local IT4Innovations account, the PRACE users can access the cluster as regular users using SSH. For more information please see the section in general documentation.
 
-File transfers
-------------------
+## File transfers
+
 PRACE users can use the same transfer mechanisms as regular users (if they've undergone the full registration procedure). For information about this, please see the section in the general documentation.
 
 Apart from the standard mechanisms, for PRACE users to transfer data to/from Anselm cluster, a GridFTP server running Globus Toolkit GridFTP service is available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners).
@@ -199,9 +197,9 @@ Generally both shared file systems are available through GridFTP:
 
 More information about the shared file systems is available [here](storage/).
 
-Usage of the cluster
---------------------
- There are some limitations for PRACE user when using the cluster. By default PRACE users aren't allowed to access special queues in the PBS Pro to have high priority or exclusive access to some special equipment like accelerated nodes and high memory (fat) nodes. There may be also restrictions obtaining a working license for the commercial software installed on the cluster, mostly because of the license agreement or because of insufficient amount of licenses.
+## Usage of the cluster
+
+There are some limitations for PRACE users when using the cluster. By default, PRACE users aren't allowed to access the special queues in PBS Pro that give high priority or exclusive access to special equipment such as accelerated nodes and high memory (fat) nodes. There may also be restrictions on obtaining a working license for the commercial software installed on the cluster, mostly because of the license agreement or because of an insufficient number of licenses.
 
 For production runs always use scratch file systems, either the global shared or the local ones. The available file systems are described [here](hardware-overview/).
 
@@ -225,7 +223,7 @@ For PRACE users, the default production run queue is "qprace". PRACE users can a
 |---|---|---|---|---|---|---|
 |**qexp** Express queue|no|none required|2 reserved, 8 total|high|no|1 / 1h|
 |**qprace** Production queue|yes|> 0|178 w/o accelerator|medium|no|24 / 48 h|
-|**qfree** Free resource queue|yes|none required|178 w/o accelerator|very low|no|	12 / 12 h|
+|**qfree** Free resource queue|yes|none required|178 w/o accelerator|very low|no|12 / 12 h|
 
 **qprace**, the PRACE: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter the qprace. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprace is 48 hours. If the job needs a longer time, it must use checkpoint/restart functionality.
 
@@ -238,7 +236,7 @@ PRACE users should check their project accounting using the [PRACE Accounting To
 Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received local password may check at any time, how many core-hours have been consumed by themselves and their projects using the command "it4ifree". Please note that you need to know your user password to use the command and that the displayed core hours are "system core hours" which differ from PRACE "standardized core hours".
 
 !!! Note "Note"
-	The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
+    The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
 
 ```bash
     $ it4ifree
diff --git a/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md b/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md
index 0576b4e4f..92d338d84 100644
--- a/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md
+++ b/docs.it4i/anselm-cluster-documentation/resources-allocation-policy.md
@@ -1,31 +1,30 @@
-Resources Allocation Policy
-===========================
+# Resources Allocation Policy
+
+## Introduction
 
-Resources Allocation Policy
----------------------------
 The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The Fair-share at Anselm ensures that individual users may consume approximately equal amount of resources per week. Detailed information in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following table provides the queue partitioning overview:
 
 !!! Note "Note"
-	Check the queue status at https://extranet.it4i.cz/anselm/
+    Check the queue status at [https://extranet.it4i.cz/anselm/](https://extranet.it4i.cz/anselm/)
 
- |queue |active project |project resources |nodes|min ncpus|priority|authorization|walltime |
- | --- | --- | --- | --- | --- | --- | --- | --- |
- |qexp |no |none required |2 reserved, 31 totalincluding MIC, GPU and FAT nodes |1 |150 |no |1 h |
- |qprod |yes |0 |178 nodes w/o accelerator |16 |0 |no |24/48 h |
- |qlong |yes |0 |60 nodes w/o accelerator |16 |0 |no |72/144 h |
- |qnvidia, qmic, qfat |yes |0 |23 total qnvidia4 total qmic2 total qfat |16 |200 |yes |24/48 h |
- |qfree |yes |none required |178 w/o accelerator |16 |-1024 |no |12 h |
+|queue|active project|project resources|nodes|min ncpus|priority|authorization|walltime|
+|---|---|---|---|---|---|---|---|
+|qexp|no|none required|2 reserved, 31 total including MIC, GPU and FAT nodes|1|150|no|1 h|
+|qprod|yes|0|178 nodes w/o accelerator|16|0|no|24/48 h|
+|qlong|yes|0|60 nodes w/o accelerator|16|0|no|72/144 h|
+|qnvidia, qmic, qfat|yes|0|23 total qnvidia, 4 total qmic, 2 total qfat|16|200|yes|24/48 h|
+|qfree|yes|none required|178 w/o accelerator|16|-1024|no|12 h|
 
 !!! Note "Note"
-	**The qfree queue is not free of charge**. [Normal accounting](#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue.
+    **The qfree queue is not free of charge**. [Normal accounting](#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project has exhausted all its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default. Usage of qfree after exhaustion of a DD project's computational resources is allowed upon request for this queue.
 
-    **The qexp queue is equipped with the nodes not having the very same CPU clock speed.** Should you need the very same CPU speed, you have to select the proper nodes during the PSB job submission.
+**The qexp queue is equipped with nodes that do not all have the very same CPU clock speed.** Should you need the very same CPU speed, you have to select the proper nodes during the PBS job submission.
 
-- **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512GB RAM (cn208-209). This enables to test and tune also accelerated code or code with higher RAM requirements. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour.
-- **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerator are included. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
-- **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 * 48 h).
-- **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to access the Nvidia accelerated nodes, the qmic to access MIC nodes and qfat the Fat nodes. It is required that active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node are allocated. The queues run with very high priority, the jobs will be scheduled before the jobs coming from the qexp queue. An PI needs explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated to her/his Project.
-- **qfree**, The Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for persmission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
+* **qexp**, the Express queue: This queue is dedicated to testing and running very small jobs. It is not required to specify a project to enter the qexp. There are always 2 nodes reserved for this queue (w/o accelerator); a maximum of 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512GB RAM (cn208-209). This also enables testing and tuning of accelerated code or code with higher RAM requirements. The nodes may be allocated on a per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour.
+* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerator are included. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
+* **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 * 48 h).
+* **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to access the Nvidia accelerated nodes, the qmic to access MIC nodes and qfat the Fat nodes. It is required that active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node are allocated. The queues run with very high priority, the jobs will be scheduled before the jobs coming from the qexp queue. An PI needs explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated to her/his Project.
+* **qfree**, The Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for persmission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
 
 ### Notes
 
@@ -37,7 +36,8 @@ Anselm users may check current queue configuration at <https://extranet.it4i.cz/
 
 ### Queue status
 
->Check the status of jobs, queues and compute nodes at <https://extranet.it4i.cz/anselm/>
+!!! tip
+    Check the status of jobs, queues and compute nodes at <https://extranet.it4i.cz/anselm/>
 
 ![rspbs web interface](../img/rsweb.png)
 
@@ -105,8 +105,7 @@ Options:
   --incl-finished       Include finished jobs
 ```
 
-Resources Accounting Policy
--------------------------------
+## Resources Accounting Policy
 
 ### The Core-Hour
 
@@ -115,7 +114,7 @@ The resources that are currently subject to accounting are the core-hours. The c
 ### Check consumed resources
 
 !!! Note "Note"
-	The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
+    The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
 
 Users may check at any time how many core-hours have been consumed by themselves and their projects. The command is available on the clusters' login nodes.
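The accounting arithmetic is simply allocated cores times wall-clock hours. A minimal sketch (the job parameters below are illustrative, not taken from the accounting database):

```bash
# Core-hours = allocated cores x wall-clock hours.
# Illustrative job: 2 full nodes (16 cores each) running for the 48 h qprod limit.
nodes=2
cores_per_node=16
hours=48
echo $(( nodes * cores_per_node * hours ))   # prints 1536
```

A job is thus accounted for all cores of the allocated nodes, whether the application uses them or not.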
 
diff --git a/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md b/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md
index a72995bb0..1b318ba45 100644
--- a/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md
+++ b/docs.it4i/anselm-cluster-documentation/shell-and-data-access.md
@@ -1,8 +1,7 @@
-Accessing the Cluster
-==============================
+# Accessing the Cluster
+
+## Shell Access
 
-Shell Access
------------------
 The Anselm cluster is accessed by SSH protocol via login nodes login1 and login2 at address anselm.it4i.cz. The login nodes may be addressed specifically, by prepending the login node name to the address.
 
 |Login address|Port|Protocol|Login node|
@@ -13,11 +12,11 @@ The Anselm cluster is accessed by SSH protocol via login nodes login1 and login2
 
 The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/)
 
-!!! Note "Note"
-	Please verify SSH fingerprints during the first logon. They are identical on all login nodes:
+!!! note
+    Please verify SSH fingerprints during the first logon. They are identical on all login nodes:
 
-	29:b3:f4:64:b0:73:f5:6f:a7:85:0f:e0:0d:be:76:bf (DSA)
-	d4:6f:5c:18:f4:3f:70:ef:bc:fc:cc:2b:fd:13:36:b7 (RSA)
+    29:b3:f4:64:b0:73:f5:6f:a7:85:0f:e0:0d:be:76:bf (DSA)
+    d4:6f:5c:18:f4:3f:70:ef:bc:fc:cc:2b:fd:13:36:b7 (RSA)
 
 Private key authentication:
 
@@ -55,10 +54,10 @@ Last login: Tue Jul  9 15:57:38 2013 from your-host.example.com
 Example to the cluster login:
 
 !!! Note "Note"
-	The environment is **not** shared between login nodes, except for [shared filesystems](storage/#shared-filesystems).
+    The environment is **not** shared between login nodes, except for [shared filesystems](storage/#shared-filesystems).
+
+## Data Transfer
 
-Data Transfer
--------------
 Data in and out of the system may be transferred by the [scp](http://en.wikipedia.org/wiki/Secure_copy) and sftp protocols.  (Not available yet.) In case large volumes of data are transferred, use dedicated data mover node dm1.anselm.it4i.cz for increased performance.
 
 |Address|Port|Protocol|
@@ -71,14 +70,14 @@ Data in and out of the system may be transferred by the [scp](http://en.wikipedi
 The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/)
 
 !!! Note "Note"
-	Data transfer rates up to **160MB/s** can be achieved with scp or sftp.
+    Data transfer rates up to **160MB/s** can be achieved with scp or sftp.
 
     1TB may be transferred in 1:50h.
 
 To achieve 160MB/s transfer rates, the end user must be connected by 10G line all the way to IT4Innovations and use computer with fast processor for the transfer. Using Gigabit ethernet connection, up to 110MB/s may be expected.  Fast cipher (aes128-ctr) should be used.
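The 1:50h figure follows directly from the quoted rate; a back-of-the-envelope check (taking 1 TB here as 2^20 MB and assuming the full 160 MB/s is sustained):

```bash
# Time to move 1 TB (taken as 1048576 MB) at a sustained 160 MB/s.
awk 'BEGIN { s = 1048576 / 160; printf "%dh %dm\n", int(s/3600), int(s%3600/60) }'
# prints 1h 49m
```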
 
 !!! Note "Note"
-	If you experience degraded data transfer performance, consult your local network provider.
+    If you experience degraded data transfer performance, consult your local network provider.
 
 On linux or Mac, use scp or sftp client to transfer the data to Anselm:
 
@@ -116,9 +115,8 @@ On Windows, use [WinSCP client](http://winscp.net/eng/download.php) to transfer
 
 More information about the shared file systems is available [here](storage/).
 
+## Connection restrictions
 
-Connection restrictions
------------------------
 Outgoing connections, from Anselm Cluster login nodes to the outside world, are restricted to following ports:
 
 |Port|Protocol|
@@ -129,17 +127,16 @@ Outgoing connections, from Anselm Cluster login nodes to the outside world, are
 |9418|git|
 
 !!! Note "Note"
-	Please use **ssh port forwarding** and proxy servers to connect from Anselm to all other remote ports.
+    Please use **ssh port forwarding** and proxy servers to connect from Anselm to all other remote ports.
 
 Outgoing connections from Anselm Cluster compute nodes are restricted to the internal network. Direct connections from compute nodes to the outside world are cut.
 
-Port forwarding
----------------
+## Port forwarding
 
 ### Port forwarding from login nodes
 
 !!! Note "Note"
-	Port forwarding allows an application running on Anselm to connect to arbitrary remote host and port.
+    Port forwarding allows an application running on Anselm to connect to arbitrary remote host and port.
 
 It works by tunneling the connection from Anselm back to users workstation and forwarding from the workstation to the remote host.
 
@@ -159,7 +156,8 @@ Port forwarding may be established directly to the remote host. However, this re
 $ ssh -L 6000:localhost:1234 remote.host.com
 ```
 
-Note: Port number 6000 is chosen as an example only. Pick any free port.
+!!! note
+    Port number 6000 is chosen as an example only. Pick any free port.
 
 ### Port forwarding from compute nodes
 
@@ -180,7 +178,7 @@ In this example, we assume that port forwarding from login1:6000 to remote.host.
 Port forwarding is static; each single port is mapped to a particular port on the remote host. Connecting to another remote host requires a new forward.
 
 !!! Note "Note"
+    Applications with inbuilt proxy support experience unlimited access to remote hosts via a single proxy server.
+    Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server.
 
 To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run:
 
@@ -198,13 +196,11 @@ local $ ssh -R 6000:localhost:1080 anselm.it4i.cz
 
 Now, configure the applications proxy settings to **localhost:6000**. Use port forwarding  to access the [proxy server from compute nodes](#port-forwarding-from-compute-nodes) as well.
 
-Graphical User Interface
-------------------------
+## Graphical User Interface
 
--   The [X Window system](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/) is a principal way to get GUI access to the clusters.
--   The [Virtual Network Computing](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer).
+* The [X Window system](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/) is a principal way to get GUI access to the clusters.
+* The [Virtual Network Computing](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer).
 
-VPN Access
-----------
+## VPN Access
 
--   Access to IT4Innovations internal resources via [VPN](../get-started-with-it4innovations/accessing-the-clusters/vpn-access/).
+* Access to IT4Innovations internal resources via [VPN](../get-started-with-it4innovations/accessing-the-clusters/vpn-access/).
diff --git a/docs.it4i/anselm-cluster-documentation/software/gpi2.md b/docs.it4i/anselm-cluster-documentation/software/gpi2.md
index d61fbed6f..8818844f9 100644
--- a/docs.it4i/anselm-cluster-documentation/software/gpi2.md
+++ b/docs.it4i/anselm-cluster-documentation/software/gpi2.md
@@ -1,16 +1,13 @@
-GPI-2
-=====
+# GPI-2
 
-##A library that implements the GASPI specification
+## Introduction
 
-Introduction
-------------
 Programming Next Generation Supercomputers: GPI-2 is an API library for asynchronous interprocess, cross-node communication. It provides a flexible, scalable and fault tolerant interface for parallel applications.
 
 The GPI-2 library ([www.gpi-site.com/gpi2/](http://www.gpi-site.com/gpi2/)) implements the GASPI specification (Global Address Space Programming Interface, [www.gaspi.de](http://www.gaspi.de/en/project.html)). GASPI is a Partitioned Global Address Space (PGAS) API. It aims at scalable, flexible and failure tolerant computing in massively parallel environments.
 
-Modules
--------
+## Modules
+
 The GPI-2, version 1.0.2 is available on Anselm via module gpi2:
 
 ```bash
@@ -19,10 +16,10 @@ The GPI-2, version 1.0.2 is available on Anselm via module gpi2:
 
 The module sets up environment variables, required for linking and running GPI-2 enabled applications. This particular command loads the default module, which is gpi2/1.0.2
 
-Linking
--------
-!!! Note "Note"
-	Link with -lGPI2 -libverbs
+## Linking
+
+!!! note
+    Link with -lGPI2 -libverbs
 
 Load the gpi2 module. Link using **-lGPI2** and **-libverbs** switches to link your code against GPI-2. The GPI-2 requires the OFED InfiniBand communication library ibverbs.
 
@@ -42,11 +39,10 @@ Load the gpi2 module. Link using **-lGPI2** and **-libverbs** switches to link y
     $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs
 ```
 
-Running the GPI-2 codes
------------------------
+## Running the GPI-2 codes
 
-!!! Note "Note"
-	gaspi_run starts the GPI-2 application
+!!! note
+    gaspi_run starts the GPI-2 application
 
 The gaspi_run utility is used to start and run GPI-2 applications:
 
@@ -54,7 +50,7 @@ The gaspi_run utility is used to start and run GPI-2 applications:
     $ gaspi_run -m machinefile ./myprog.x
 ```
 
-A machine file (**machinefile**) with the hostnames of nodes where the application will run, must be provided. The machinefile lists all nodes on which to run, one entry per node per process. This file may be hand created or obtained from standard $PBS_NODEFILE:
+A machine file (**machinefile**) with the hostnames of nodes where the application will run must be provided. The machinefile lists all nodes on which to run, one entry per node per process. This file may be hand created or obtained from standard $PBS_NODEFILE:
 
 ```bash
     $ cut -f1 -d"." $PBS_NODEFILE > machinefile
@@ -80,8 +76,8 @@ machinefle:
 
 This machinefile will run 4 GPI-2 processes, 2 on node cn79 and 2 on node cn80.
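A machinefile with several processes per node need not be written by hand. One hedged sketch, using a local stand-in file for $PBS_NODEFILE (the hostnames are the example's; the file names are arbitrary):

```bash
# Build a machinefile with two GPI-2 processes per node.
# nodefile.txt stands in for the $PBS_NODEFILE of the example above.
printf 'cn79.bullx\ncn80.bullx\n' > nodefile.txt
cut -f1 -d"." nodefile.txt | sed 'p' > machinefile   # sed 'p' doubles every line
cat machinefile
# prints cn79, cn79, cn80, cn80 on separate lines
```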
 
-!!! Note "Note"
-	Use the **mpiprocs** to control how many GPI-2 processes will run per node
+!!! note
+    Use the **mpiprocs** to control how many GPI-2 processes will run per node
 
 Example:
 
@@ -93,13 +89,12 @@ This example will produce $PBS_NODEFILE with 16 entries per node.
 
 ### gaspi_logger
 
-!!! Note "Note"
-	gaspi_logger views the output form GPI-2 application ranks
+!!! note
+    gaspi_logger views the output from GPI-2 application ranks
 
 The gaspi_logger utility is used to view the output from all nodes except the master node (rank 0). The gaspi_logger is started, in another session, on the master node - the node where the gaspi_run is executed. The output of the application, when called with gaspi_printf(), will be redirected to the gaspi_logger. Other I/O routines (e.g. printf) will not.
 
-Example
--------
+## Example
 
 Following is an example GPI-2 enabled code:
 
@@ -169,4 +164,4 @@ At the same time, in another session, you may start the gaspi logger:
     [cn80:0] Hello from rank 1 of 2
 ```
 
-In this example, we compile the helloworld_gpi.c code using the **gnu compiler** (gcc) and link it to the GPI-2 and ibverbs library. The library search path is compiled in. For execution, we use the qexp queue, 2 nodes 1 core each. The GPI module must be loaded on the master compute node (in this example the cn79), gaspi_logger is used from different session to view the output of the second process.
+In this example, we compile the helloworld_gpi.c code using the **gnu compiler** (gcc) and link it to the GPI-2 and ibverbs libraries. The library search path is compiled in. For execution, we use the qexp queue, 2 nodes, 1 core each. The GPI module must be loaded on the master compute node (in this example cn79); gaspi_logger is used from a different session to view the output of the second process.
diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master/diagnostic-component-team.md b/docs.it4i/anselm-cluster-documentation/software/omics-master/diagnostic-component-team.md
index 47f3e6478..908641a19 100644
--- a/docs.it4i/anselm-cluster-documentation/software/omics-master/diagnostic-component-team.md
+++ b/docs.it4i/anselm-cluster-documentation/software/omics-master/diagnostic-component-team.md
@@ -1,14 +1,13 @@
-Diagnostic component (TEAM)
-===========================
+# Diagnostic component (TEAM)
 
-### Access
+## Access
 
 TEAM is available at the [following address](http://omics.it4i.cz/team/)
 
-!!! Note "Note"
-	The address is accessible only via VPN.
+!!! note
+    The address is accessible only via VPN.
 
-### Diagnostic component (TEAM)
+## Diagnostic component
 
 VCF files are scanned by this diagnostic tool for known diagnostic disease-associated variants. When no diagnostic mutation is found, the file can be sent to the disease-causing gene discovery tool to see whether new disease associated variants can be found.
 
@@ -16,4 +15,4 @@ TEAM (27) is an intuitive and easy-to-use web tool that fills the gap between th
 
 ![Interface of the application. Panels for defining targeted regions of interest can be set up by just drag and drop known disease genes or disease definitions from the lists. Thus, virtual panels can be interactively improved as the knowledge of the disease increases.](../../../img/fig5.png)
 
-**Figure 5.** Interface of the application. Panels for defining targeted regions of interest can be set up by just drag and drop known disease genes or disease definitions from the lists. Thus, virtual panels can be interactively improved as the knowledge of the disease increases.
+**Figure 5.** Interface of the application. Panels for defining targeted regions of interest can be set up by just drag and drop known disease genes or disease definitions from the lists. Thus, virtual panels can be interactively improved as the knowledge of the disease increases.
diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md b/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md
index 2b3be2f52..a02517c34 100644
--- a/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md
+++ b/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md
@@ -1,10 +1,9 @@
-Overview
-========
+# Overview
 
 The human NGS data processing solution
 
-Introduction
-------------
+## Introduction
+
 The scope of this OMICS MASTER solution is restricted to human genomics research (disease causing gene discovery in whole human genome or exome) or diagnosis (panel sequencing), although it could be extended in the future to other usages.
 
 The pipeline inputs the raw data produced by the sequencing machines and applies a processing procedure that consists of quality control, mapping and variant calling steps, resulting in a file containing the set of variants in the sample. From this point, the prioritization component or the diagnostic component can be launched.
@@ -12,7 +11,7 @@ The pipeline inputs the raw data produced by the sequencing machines and undergo
 ![OMICS MASTER solution overview. Data is produced in the external labs and comes to IT4I (represented by the blue dashed line). The data pre-processor converts raw data into a list of variants and annotations for each sequenced patient. These lists files together with primary and secondary (alignment) data files are stored in IT4I sequence DB and uploaded to the discovery (candidate priorization) or diagnostic component where they can be analysed directly by the user that produced
 them, depending of the experimental design carried out.](../../../img/fig1.png)
 
-**Figure 1.** OMICS MASTER solution overview. Data is produced in the external labs and comes to IT4I (represented by the blue dashed line). The data pre-processor converts raw data into a list of variants and annotations for each sequenced patient. These lists files together with primary and secondary (alignment) data files are stored in IT4I sequence DB and uploaded to the discovery (candidate prioritization) or diagnostic component where they can be analyzed directly by the user that produced them, depending of the experimental design carried out.
+**Figure 1.** OMICS MASTER solution overview. Data is produced in the external labs and comes to IT4I (represented by the blue dashed line). The data pre-processor converts raw data into a list of variants and annotations for each sequenced patient. These list files together with primary and secondary (alignment) data files are stored in IT4I sequence DB and uploaded to the discovery (candidate prioritization) or diagnostic component where they can be analyzed directly by the user that produced them, depending on the experimental design carried out.
 
 Typical genomics pipelines are composed by several components that need to be launched manually. The advantage of OMICS MASTER pipeline is that all these components are invoked sequentially in an automated way.
 
@@ -20,8 +19,7 @@ OMICS MASTER pipeline inputs a FASTQ file and outputs an enriched VCF file. This
 
 Let’s see each of the OMICS MASTER solution components:
 
-Components
-----------
+## Components
 
 ### Processing
 
@@ -37,26 +35,26 @@ FastQC& FastQC.
 
 These steps are carried out over the original FASTQ file with optimized scripts and includes the following steps: sequence cleansing, estimation of base quality scores, elimination of duplicates and statistics.
 
-Input: **FASTQ file.**
+Input: **FASTQ file**.
 
-Output: **FASTQ file plus an HTML file containing statistics on the data.**
+Output: **FASTQ file plus an HTML file containing statistics on the data**.
 
 The FASTQ format represents the nucleotide sequence and its corresponding quality scores.
 
 ![FASTQ file.](../../../img/fig2.png "fig2.png")
-**Figure 2.**FASTQ file.
+**Figure 2.** FASTQ file.
 
 #### Mapping
 
-Component:** Hpg-aligner.**
+Component: **Hpg-aligner**.
 
 Sequence reads are mapped over the human reference genome. SOLiD reads are not covered by this solution; they should be mapped with specific software (among the few available options, SHRiMP seems to be the best one). For the rest of NGS machine outputs we use HPG Aligner. HPG-Aligner is an innovative solution, based on a combination of mapping with BWT and local alignment with Smith-Waterman (SW), that drastically increases mapping accuracy (97% versus 62-70% by current mappers, in the most common scenarios). This proposal provides a simple and fast solution that maps almost all the reads, even those containing a high number of mismatches or indels.
 
-Input: **FASTQ file.**
+Input: **FASTQ file**.
 
-Output:** Aligned file in BAM format.**
+Output: **Aligned file in BAM format**.
 
-**Sequence Alignment/Map (SAM)**
+**Sequence Alignment/Map (SAM)**
 
 It is a human readable tab-delimited format in which each read and its alignment is represented on a single line. The format can represent unmapped reads, reads that are mapped to unique locations, and reads that are mapped to multiple locations.
 
@@ -65,55 +63,61 @@ The SAM format (1) consists of one header section and one alignment section. The
 In SAM, each alignment line has 11 mandatory fields and a variable number of optional fields. The mandatory fields are briefly described in Table 1. They must be present but their value can be a ‘*’ or a zero (depending on the field) if the
 corresponding information is unavailable.
 
- |**No.** |**Name** |**Description**|
- |--|--|
- |1 |QNAME |Query NAME of the read or the read pai |
- |2 |FLAG |Bitwise FLAG (pairing,strand,mate strand,etc.) |
- |3 |RNAME |<p>Reference sequence NAME |
- |4 |POS  |<p>1-Based  leftmost POSition of clipped alignment |
- |5 |MAPQ  |<p>MAPping Quality (Phred-scaled) |
- |6 |CIGAR |<p>Extended CIGAR string (operations:MIDNSHP) |
- |7 |MRNM  |<p>Mate REference NaMe ('=' if same RNAME) |
- |8 |MPOS  |<p>1-Based leftmost Mate POSition |
- |9 |ISIZE |<p>Inferred Insert SIZE |
- |10 |SEQ  |<p>Query SEQuence on the same strand as the reference |
- |11 |QUAL |<p>Query QUALity (ASCII-33=Phred base quality) |
-
-**Table 1.** Mandatory fields in the SAM format.
+ |**No.**|**Name**|**Description**|
+ |--|--|--|
+ |1|QNAME|Query NAME of the read or the read pair|
+ |2|FLAG|Bitwise FLAG (pairing, strand, mate strand, etc.)|
+ |3|RNAME|Reference sequence NAME|
+ |4|POS|1-Based leftmost POSition of clipped alignment|
+ |5|MAPQ|MAPping Quality (Phred-scaled)|
+ |6|CIGAR|Extended CIGAR string (operations: MIDNSHP)|
+ |7|MRNM|Mate REference NaMe ('=' if same RNAME)|
+ |8|MPOS|1-Based leftmost Mate POSition|
+ |9|ISIZE|Inferred Insert SIZE|
+ |10|SEQ|Query SEQuence on the same strand as the reference|
+ |11|QUAL|Query QUALity (ASCII-33=Phred base quality)|
+
+**Table 1.** Mandatory fields in the SAM format.
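Since SAM is tab-delimited, the mandatory fields can be addressed by position with ordinary text tools. A hedged sketch (the alignment record below is a made-up example in the spirit of the figure, not output from the pipeline):

```bash
# A sample tab-separated alignment line; awk picks out the mandatory
# fields by their positions in Table 1 (field 5 = MAPQ, field 6 = CIGAR).
line=$(printf 'r001\t163\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*')
echo "$line" | awk -F'\t' '{ print "MAPQ=" $5, "CIGAR=" $6 }'
# prints MAPQ=30 CIGAR=8M2I4M1D3M
```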
 
 The standard CIGAR description of pairwise alignment defines three operations: ‘M’ for match/mismatch, ‘I’ for insertion compared with the reference and ‘D’ for deletion. The extended CIGAR proposed in SAM added four more operations: ‘N’ for skipped bases on the reference, ‘S’ for soft clipping, ‘H’ for hard clipping and ‘P’ for padding. These support splicing, clipping, multi-part and padded alignments. Figure 3 shows examples of CIGAR strings for different types of alignments.
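To see the seven-operation alphabet in action, a CIGAR string can be split into its length+operation tokens; a quick sketch (the string itself is an arbitrary example):

```bash
# Tokenize an extended CIGAR string using the MIDNSHP alphabet above.
echo "1S2I6M1P1I4M" | grep -oE '[0-9]+[MIDNSHP]'
# prints the tokens 1S, 2I, 6M, 1P, 1I, 4M on separate lines
```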
 
 ![SAM format file. The ‘@SQ’ line in the header section gives the order of reference sequences. Notably, r001 is the name of a read pair. According to FLAG 163 (=1+2+32+128), the read mapped to position 7 is the second read in the pair (128) and regarded as properly paired (1 + 2); its mate is mapped to 37 on the reverse strand (32). Read r002 has three soft-clipped (unaligned) bases. The coordinate shown in SAM is the position of the first aligned base. The CIGAR string for this alignment contains a P (padding) operation which correctly aligns the inserted sequences. Padding operations can be absent when an aligner does not support multiple sequence alignment. The last six bases of read r003 map to position 9, and the first five to position 29 on the reverse strand. The hard clipping operation H indicates that the clipped sequence is not present in the sequence field. The NM tag gives the number of mismatches. Read r004 is aligned across an intron, indicated by the N operation.](../../../img/fig3.png)
 
-**Figure 3.** SAM format file. The ‘@SQ’ line in the header section gives the order of reference sequences. Notably, r001 is the name of a read pair. According to FLAG 163 (=1+2+32+128), the read mapped to position 7 is the second read in the pair (128) and regarded as properly paired (1 + 2); its mate is mapped to 37 on the reverse strand (32). Read r002 has three soft-clipped (unaligned) bases. The coordinate shown in SAM is the position of the first aligned base. The CIGAR string for this alignment contains a P (padding) operation which correctly aligns the inserted sequences. Padding operations can be absent when an aligner does not support multiple sequence alignment. The last six bases of read r003 map to position 9, and the first five to position 29 on the reverse strand. The hard clipping operation H indicates that the clipped sequence is not present in the sequence field. The NM tag gives the number of mismatches. Read r004 is aligned across an intron, indicated by the N operation.
+**Figure 3.** SAM format file. The ‘@SQ’ line in the header section gives the order of reference sequences. Notably, r001 is the name of a read pair. According to FLAG 163 (=1+2+32+128), the read mapped to position 7 is the second read in the pair (128) and regarded as properly paired (1 + 2); its mate is mapped to 37 on the reverse strand (32). Read r002 has three soft-clipped (unaligned) bases. The coordinate shown in SAM is the position of the first aligned base. The CIGAR string for this alignment contains a P (padding) operation which correctly aligns the inserted sequences. Padding operations can be absent when an aligner does not support multiple sequence alignment. The last six bases of read r003 map to position 9, and the first five to position 29 on the reverse strand. The hard clipping operation H indicates that the clipped sequence is not present in the sequence field. The NM tag gives the number of mismatches. Read r004 is aligned across an intron, indicated by the N operation.
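The FLAG arithmetic in the caption (163 = 1 + 2 + 32 + 128) can be verified with shell bitwise tests; a bit is set exactly when the AND is nonzero:

```bash
# FLAG 163 decomposed: 1 (read paired) + 2 (proper pair)
# + 32 (mate on reverse strand) + 128 (second read in pair).
flag=163
for bit in 1 2 4 8 16 32 64 128; do
    [ $(( flag & bit )) -ne 0 ] && echo "bit $bit set"
done
# reports bits 1, 2, 32 and 128 as set
```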
 
-**Binary Alignment/Map (BAM)**
+**Binary Alignment/Map (BAM)**
 
 BAM is the binary representation of SAM and keeps exactly the same information as SAM. BAM uses lossless compression to reduce the size of the data by about 75% and provides an indexing system that allows reads that overlap a region of the genome to be retrieved and rapidly traversed.
 
 #### Quality control, preprocessing and statistics for BAM
 
-**Component:** Hpg-Fastq & FastQC. Some features:
+**Component:** Hpg-Fastq & FastQC.
+
+Some features:
 
--   Quality control: % reads with N errors, % reads with multiple mappings, strand bias, paired-end insert, ...
--   Filtering: by number of errors, number of hits, …
-    -   Comparator: stats, intersection, ...
+* Quality control
+  * % reads with N errors
+  * % reads with multiple mappings
+  * strand bias
+  * paired-end insert
+* Filtering: by number of errors, number of hits, ...
+  * Comparator: stats, intersection, ...
 
-**Input:** BAM file.
+**Input:** BAM file.
 
-**Output:** BAM file plus an HTML file containing statistics.
+**Output:** BAM file plus an HTML file containing statistics.
 
 #### Variant Calling
 
-Component:** GATK.**
+Component: **GATK**.
 
 Identification of single nucleotide variants and indels on the alignments is performed using the Genome Analysis Toolkit (GATK). GATK (2) is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance.
 
-**Input:** BAM
+**Input:** BAM
 
-**Output:** VCF
+**Output:** VCF
 
-**Variant Call Format (VCF)**
+**Variant Call Format (VCF)**
 
 VCF (3) is a standardized format for storing the most prevalent types of sequence variation, including SNPs, indels and larger structural variants, together with rich annotations. The format was developed with the primary intention to represent human genetic variation, but its use is not restricted to diploid genomes and can be used in different contexts as well. Its flexibility and user extensibility allow representation of a wide variety of genomic variation with respect to a single reference sequence.
 
@@ -123,42 +127,42 @@ A VCF file consists of a header section and a data section. The header contains
 this list; the reference haplotype is designated as 0. For multiploid data, the separator indicates whether the data are phased (|) or unphased (/). Thus, the two alleles C and G at the positions 2 and 5 in this figure occur on the same chromosome in SAMPLE1. The first data line shows an example of a deletion (present in SAMPLE1) and a replacement of
 two bases by another base (SAMPLE2); the second line shows a SNP and an insertion; the third a SNP; the fourth a large structural variant described by the annotation in the INFO column, the coordinate is that of the base before the variant. (b–f ) Alignments and VCF representations of different sequence variants: SNP, insertion, deletion, replacement, and a large deletion. The REF columns shows the reference bases replaced by the haplotype in the ALT column. The coordinate refers to the first reference base. (g) Users are advised to use simplest representation possible and lowest coordinate in cases where the position is ambiguous.](../../../img/fig4.png)
 
-**Figure 4.** (a) Example of valid VCF. The header lines ##fileformat and #CHROM are mandatory, the rest is optional but strongly recommended. Each line of the body describes variants present in the sampled population at one genomic position or region. All alternate alleles are listed in the ALT column and referenced from the genotype fields as 1-based indexes to this list; the reference haplotype is designated as 0. For multiploid data, the separator indicates whether the data are phased (|) or unphased (/). Thus, the two alleles C and G at the positions 2 and 5 in this figure occur on the same chromosome in SAMPLE1. The first data line shows an example of a deletion (present in SAMPLE1) and a replacement of two bases by another base (SAMPLE2); the second line shows a SNP and an insertion; the third a SNP; the fourth a large structural variant described by the annotation in the INFO column, the coordinate is that of the base before the variant. (b–f ) Alignments and VCF representations of different sequence variants: SNP, insertion, deletion, replacement, and a large deletion. The REF columns shows the reference bases replaced by the haplotype in the ALT column. The coordinate refers to the first reference base. (g) Users are advised to use simplest representation possible and lowest coordinate in cases where the position is ambiguous.
+**Figure 4.** (a) Example of valid VCF. The header lines ##fileformat and #CHROM are mandatory, the rest is optional but strongly recommended. Each line of the body describes variants present in the sampled population at one genomic position or region. All alternate alleles are listed in the ALT column and referenced from the genotype fields as 1-based indexes to this list; the reference haplotype is designated as 0. For multiploid data, the separator indicates whether the data are phased (|) or unphased (/). Thus, the two alleles C and G at the positions 2 and 5 in this figure occur on the same chromosome in SAMPLE1. The first data line shows an example of a deletion (present in SAMPLE1) and a replacement of two bases by another base (SAMPLE2); the second line shows a SNP and an insertion; the third a SNP; the fourth a large structural variant described by the annotation in the INFO column, the coordinate is that of the base before the variant. (b–f) Alignments and VCF representations of different sequence variants: SNP, insertion, deletion, replacement, and a large deletion. The REF columns shows the reference bases replaced by the haplotype in the ALT column. The coordinate refers to the first reference base. (g) Users are advised to use simplest representation possible and lowest coordinate in cases where the position is ambiguous.
 
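+A minimal sketch of a valid VCF file may help illustrate the layout (the values below are adapted from the VCF specification's canonical example, not transcribed from Figure 4):
+
+```
+##fileformat=VCFv4.1
+#CHROM  POS    ID         REF  ALT  QUAL  FILTER  INFO   FORMAT  SAMPLE1
+20      14370  rs6054257  G    A    29    PASS    DP=14  GT      0/1
+```
+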
-###Annotating
+### Annotating
 
-**Component:** HPG-Variant
+**Component:** HPG-Variant
 
-The functional consequences of every variant found are then annotated using the HPG-Variant software, which extracts from CellBase**,** the Knowledge database, all the information relevant on the predicted pathologic effect of the variants.
+The functional consequences of every variant found are then annotated using the HPG-Variant software, which extracts from CellBase, the Knowledge database, all the information relevant on the predicted pathologic effect of the variants.
 
 VARIANT (VARIant Analysis Tool) (4) reports information on the variants found that include consequence type and annotations taken from different databases and repositories (SNPs and variants from dbSNP and 1000 genomes, and disease-related variants from the Genome-Wide Association Study (GWAS) catalog, Online Mendelian Inheritance in Man (OMIM), Catalog of Somatic Mutations in Cancer (COSMIC) mutations, etc. VARIANT also produces a rich variety of annotations that include information on the regulatory (transcription factor or miRNAbinding sites, etc.) or structural roles, or on the selective pressures on the sites affected by the variation. This information allows extending the conventional reports beyond the coding regions and expands the knowledge on the contribution of non-coding or synonymous variants to the phenotype studied.
 
-**Input:** VCF
+**Input:** VCF
 
-**Output:** The output of this step is the Variant Calling Format (VCF) file, which contains changes with respect to the reference genome with the corresponding QC and functional annotations.
+**Output:** The output of this step is the Variant Calling Format (VCF) file, which contains changes with respect to the reference genome with the corresponding QC and functional annotations.
 
 #### CellBase
 
 CellBase(5) is a relational database integrates biological information from different sources and includes:
 
-**Core features:**
+**Core features:**
 
 We took genome sequences, genes, transcripts, exons, cytobands or cross references (xrefs) identifiers (IDs) from Ensembl (6). Protein information including sequences, xrefs or protein features (natural variants, mutagenesis sites, post-translational modifications, etc.) were imported from UniProt (7).
 
-**Regulatory:**
+**Regulatory:**
 
 CellBase imports miRNA from miRBase (8); curated and non-curated miRNA targets from miRecords (9), miRTarBase (10),
 TargetScan(11) and microRNA.org (12) and CpG islands and conserved regions from the UCSC database (13).
 
-**Functional annotation**
+**Functional annotation**
 
-OBO Foundry (14) develops many biomedical ontologies that are implemented in OBO format. We designed a SQL schema to store these OBO ontologies and &gt;30 ontologies were imported. OBO ontology term annotations were taken from Ensembl (6). InterPro (15) annotations were also imported.
+OBO Foundry (14) develops many biomedical ontologies that are implemented in OBO format. We designed a SQL schema to store these OBO ontologies and >30 ontologies were imported. OBO ontology term annotations were taken from Ensembl (6). InterPro (15) annotations were also imported.
 
-**Variation**
+**Variation**
 
 CellBase includes SNPs from dbSNP (16)^; SNP population frequencies from HapMap (17), 1000 genomes project (18) and Ensembl (6); phenotypically annotated SNPs were imported from NHRI GWAS Catalog (19),HGMD (20), Open Access GWAS Database (21), UniProt (7) and OMIM (22); mutations from COSMIC (23) and structural variations from Ensembl (6).
 
-**Systems biology**
+**Systems biology**
 
 We also import systems biology information like interactome information from IntAct (24). Reactome (25) stores pathway and interaction information in BioPAX (26) format. BioPAX data exchange format enables the integration of diverse pathway
 resources. We successfully solved the problem of storing data released in BioPAX format into a SQL relational schema, which allowed us importing Reactome in CellBase.
@@ -167,8 +171,8 @@ resources. We successfully solved the problem of storing data released in BioPAX
 
 ### [Priorization component (BiERApp)](priorization-component-bierapp/)
 
-Usage
------
+## Usage
+
 First of all, we should load  ngsPipeline module:
 
 ```bash
@@ -182,27 +186,27 @@ If we launch ngsPipeline with ‘-h’, we will get the usage help:
 ```bash
     $ ngsPipeline -h
     Usage: ngsPipeline.py [-h] -i INPUT -o OUTPUT -p PED --project PROJECT --queue
-                          QUEUE [--stages-path STAGES_PATH] [--email EMAIL]
+                          QUEUE [--stages-path STAGES_PATH] [--email EMAIL]
      [--prefix PREFIX] [-s START] [-e END] --log
 
     Python pipeline
 
     optional arguments:
-      -h, --help            show this help message and exit
+      -h, --help            show this help message and exit
       -i INPUT, --input INPUT
       -o OUTPUT, --output OUTPUT
-                            Output Data directory
+                            Output Data directory
       -p PED, --ped PED     Ped file with all individuals
       --project PROJECT     Project Id
       --queue QUEUE         Queue Id
       --stages-path STAGES_PATH
-                            Custom Stages path
+                            Custom Stages path
       --email EMAIL         Email
       --prefix PREFIX       Prefix name for Queue Jobs name
       -s START, --start START
-                            Initial stage
+                            Initial stage
       -e END, --end END     Final stage
-      --log                 Log to file
+      --log                 Log to file
 
 ```
 
@@ -233,8 +237,8 @@ second one.
 
 Input, output and ped arguments are mandatory. If the output folder does not exist, the pipeline will create it.
 
-Examples
----------------------
+## Examples
+
 This is an example usage of NGSpipeline:
 
 We have a folder with the following structure in
@@ -249,8 +253,8 @@ We have a folder with the following structure in
         │   ├── sample1_1.fq
         │   └── sample1_2.fq
         └── sample2
-            ├── sample2_1.fq
-            └── sample2_2.fq
+            ├── sample2_1.fq
+            └── sample2_2.fq
 ```
 
 The ped file ( file.ped) contains the following info:
@@ -283,109 +287,106 @@ If we want to re-launch the pipeline from stage 4 until stage 20 we should use t
     $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped -s 4 -e 20 --project OPEN-0-0 --queue qprod
 ```
 
-Details on the pipeline
-------------------------------------
+## Details on the pipeline
+
+The pipeline calls the following tools:
 
-The pipeline calls the following tools:
--   [fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), quality control tool for high throughput
-    sequence data.
--   [gatk](https://www.broadinstitute.org/gatk/), The Genome Analysis Toolkit or GATK is a software package developed at
+* [fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), quality control tool for high throughput sequence data.
+* [gatk](https://www.broadinstitute.org/gatk/), The Genome Analysis Toolkit or GATK is a software package developed at
     the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
--   [hpg-aligner](https://github.com/opencb-hpg/hpg-aligner), HPG Aligner has been designed to align short and long reads with high sensitivity, therefore any number of mismatches or indels are allowed. HPG Aligner implements and combines two well known algorithms: *Burrows-Wheeler Transform* (BWT) to speed-up mapping high-quality reads, and *Smith-Waterman*> (SW) to increase sensitivity when reads cannot be mapped using BWT.
--   [hpg-fastq](http://docs.bioinfo.cipf.es/projects/fastqhpc/wiki), a quality control tool for high throughput sequence data.
--   [hpg-variant](http://docs.bioinfo.cipf.es/projects/hpg-variant/wiki), The HPG Variant suite is an ambitious project aimed to provide a complete suite of tools to work with genomic variation data, from VCF tools to variant profiling or genomic statistics. It is being implemented using High Performance Computing technologies to provide the best performance possible.
--   [picard](http://picard.sourceforge.net/), Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (HTSJDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.
--   [samtools](http://samtools.sourceforge.net/samtools-c.shtml), SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
--   [snpEff](http://snpeff.sourceforge.net/), Genetic variant annotation and effect prediction toolbox.
-
-This listing show which tools are used in each step of the pipeline :
-
--   stage-00: fastqc
--   stage-01: hpg_fastq
--   stage-02: fastqc
--   stage-03: hpg_aligner and samtools
--   stage-04: samtools
--   stage-05: samtools
--   stage-06: fastqc
--   stage-07: picard
--   stage-08: fastqc
--   stage-09: picard
--   stage-10: gatk
--   stage-11: gatk
--   stage-12: gatk
--   stage-13: gatk
--   stage-14: gatk
--   stage-15: gatk
--   stage-16: samtools
--   stage-17: samtools
--   stage-18: fastqc
--   stage-19: gatk
--   stage-20: gatk
--   stage-21: gatk
--   stage-22: gatk
--   stage-23: gatk
--   stage-24: hpg-variant
--   stage-25: hpg-variant
--   stage-26: snpEff
--   stage-27: snpEff
--   stage-28: hpg-variant
-
-Interpretation
----------------------------
+* [hpg-aligner](https://github.com/opencb-hpg/hpg-aligner), HPG Aligner has been designed to align short and long reads with high sensitivity, therefore any number of mismatches or indels are allowed. HPG Aligner implements and combines two well known algorithms: *Burrows-Wheeler Transform* (BWT) to speed-up mapping high-quality reads, and *Smith-Waterman* (SW) to increase sensitivity when reads cannot be mapped using BWT.
+* [hpg-fastq](http://docs.bioinfo.cipf.es/projects/fastqhpc/wiki), a quality control tool for high throughput sequence data.
+* [hpg-variant](http://docs.bioinfo.cipf.es/projects/hpg-variant/wiki), The HPG Variant suite is an ambitious project aimed to provide a complete suite of tools to work with genomic variation data, from VCF tools to variant profiling or genomic statistics. It is being implemented using High Performance Computing technologies to provide the best performance possible.
+* [picard](http://picard.sourceforge.net/), Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (HTSJDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.
+* [samtools](http://samtools.sourceforge.net/samtools-c.shtml), SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
+* [snpEff](http://snpeff.sourceforge.net/), Genetic variant annotation and effect prediction toolbox.
+
+This listing shows which tools are used in each step of the pipeline:
+
+* stage-00: fastqc
+* stage-01: hpg_fastq
+* stage-02: fastqc
+* stage-03: hpg_aligner and samtools
+* stage-04: samtools
+* stage-05: samtools
+* stage-06: fastqc
+* stage-07: picard
+* stage-08: fastqc
+* stage-09: picard
+* stage-10: gatk
+* stage-11: gatk
+* stage-12: gatk
+* stage-13: gatk
+* stage-14: gatk
+* stage-15: gatk
+* stage-16: samtools
+* stage-17: samtools
+* stage-18: fastqc
+* stage-19: gatk
+* stage-20: gatk
+* stage-21: gatk
+* stage-22: gatk
+* stage-23: gatk
+* stage-24: hpg-variant
+* stage-25: hpg-variant
+* stage-26: snpEff
+* stage-27: snpEff
+* stage-28: hpg-variant
+
+## Interpretation
 
 The output folder contains all the subfolders with the intermediate data. This folder contains the final VCF with all the variants. This file can be uploaded into [TEAM](diagnostic-component-team.html) by using the VCF file button. It is important to note here that the entire management of the VCF file is local: no patient’s sequence data is sent over the Internet thus avoiding any problem of data privacy or confidentiality.
 
 ![TEAM upload panel. Once the file has been uploaded, a panel must be chosen from the Panel list. Then, pressing the Run button the diagnostic process starts.]((../../../img/fig7.png)
 
-**Figure 7**. *TEAM upload panel.* *Once the file has been uploaded, a panel must be chosen from the Panel* list. Then, pressing the Run button the diagnostic process starts.
+**Figure 7.** *TEAM upload panel. Once the file has been uploaded, a panel must be chosen from the Panel list. Then, pressing the Run button, the diagnostic process starts.*
 
-Once the file has been uploaded, a panel must be chosen from the Panel list. Then, pressing the Run button the diagnostic process starts. TEAM searches first for known diagnostic mutation(s) taken from four databases: HGMD-public (20), [HUMSAVAR](http://www.uniprot.org/docs/humsavar), ClinVar (29)^ and COSMIC (23).
+Once the file has been uploaded, a panel must be chosen from the Panel list. Then, pressing the Run button the diagnostic process starts. TEAM searches first for known diagnostic mutation(s) taken from four databases: HGMD-public (20), [HUMSAVAR](http://www.uniprot.org/docs/humsavar), ClinVar (29) and COSMIC (23).
 
 ![The panel manager. The elements used to define a panel are (A) disease terms, (B) diagnostic mutations and (C) genes. Arrows represent actions that can be taken in the panel manager. Panels can be defined by using the known mutations and genes of a particular disease. This can be done by dragging them to the Primary Diagnostic box (action D). This action, in addition to defining the diseases in the Primary Diagnostic box, automatically adds the corresponding genes to the Genes box. The panels can be customized by adding new genes (action F) or removing undesired genes (action G). New disease mutations can be added independently or associated to an already existing disease term (action E). Disease terms can be removed by simply dragging themback (action H).](../../../img/fig7x.png)
 
-**Figure 7.** *The panel manager. The elements used to define a panel are (**A**) disease terms, (**B**) diagnostic mutations and (**C**) genes. Arrows represent actions that can be taken in the panel manager. Panels can be defined by using the known mutations and genes of a particular disease. This can be done by dragging them to the **Primary Diagnostic** box (action **D**). This action, in addition to defining the diseases in the **Primary Diagnostic** box, automatically adds the corresponding genes to the **Genes** box. The panels can be customized by adding new genes (action **F**) or removing undesired genes (action **G**). New disease mutations can be added independently or associated to an already existing disease term (action **E**). Disease terms can be removed by simply dragging them back (action **H**).*
+**Figure 7.** The panel manager. The elements used to define a panel are (**A**) disease terms, (**B**) diagnostic mutations and (**C**) genes. Arrows represent actions that can be taken in the panel manager. Panels can be defined by using the known mutations and genes of a particular disease. This can be done by dragging them to the **Primary Diagnostic** box (action **D**). This action, in addition to defining the diseases in the **Primary Diagnostic** box, automatically adds the corresponding genes to the **Genes** box. The panels can be customized by adding new genes (action **F**) or removing undesired genes (action **G**). New disease mutations can be added independently or associated to an already existing disease term (action **E**). Disease terms can be removed by simply dragging them back (action **H**).
 
 For variant discovering/filtering we should upload the VCF file into BierApp by using the following form:
 
 *![BierApp VCF upload panel. It is recommended to choose a name for the job as well as a description.](../../../img/fig8.png)*
 
-**Figure 8.** *BierApp VCF upload panel. It is recommended to choose a name for the job as well as a description.**
+**Figure 8.** *BierApp VCF upload panel. It is recommended to choose a name for the job as well as a description.*
 
 Each prioritization (‘job’) has three associated screens that facilitate the filtering steps. The first one, the ‘Summary’ tab, displays a statistic of the data set analyzed, containing the samples analyzed, the number and types of variants found and its distribution according to consequence types. The second screen, in the ‘Variants and effect’ tab, is the actual filtering tool, and the third one, the ‘Genome view’ tab, offers a representation of the selected variants within the genomic context provided by an embedded version of the Genome Maps Tool (30).
 
-![This picture shows all the information associated to the variants. If a variant has an associated phenotype we could see it in the last column. In this case, the variant 7:132481242 C&gt;T is associated to the phenotype: large intestine tumor.](../../../img/fig9.png)
-
-**Figure 9.** This picture shows all the information associated to the variants. If a variant has an associated phenotype we could see it in the last column. In this case, the variant 7:132481242 C&gt;T is associated to the phenotype: large intestine tumor.
-
-References
------------------------
-
-1.  Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth5, Goncalo Abecasis6, Richard Durbin and 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25: 2078-2079.
-2.  McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.   *Genome Res* >2010, 20:1297-1303.
-3.  Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean, Richard Durbin, and 1000 Genomes Project Analysis Group. The variant call format and VCFtools. Bioinformatics 2011, 27: 2156-2158.
-4.  Medina I, De Maria A, Bleda M, Salavert F, Alonso R, Gonzalez CY, Dopazo J: VARIANT: Command Line, Web service and Web interface for fast and accurate functional characterization of variants found by Next-Generation Sequencing. Nucleic Acids Res 2012, 40:W54-58.
-5.  Bleda M, Tarraga J, de Maria A, Salavert F, Garcia-Alonso L, Celma M, Martin A, Dopazo J, Medina I: CellBase, a  comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources. Nucleic Acids Res 2012, 40:W609-614.
-6.  Flicek,P., Amode,M.R., Barrell,D., Beal,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fairley,S., Fitzgerald,S. et al. (2012) Ensembl 2012. Nucleic Acids Res., 40, D84–D90.
-7.  UniProt Consortium. (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic   Acids Res., 40, D71–D75.
-8.  Kozomara,A. and Griffiths-Jones,S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data.    Nucleic Acids Res., 39, D152–D157.
-9.  Xiao,F., Zuo,Z., Cai,G., Kang,S., Gao,X. and Li,T. (2009) miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res., 37, D105–D110.
-10. Hsu,S.D., Lin,F.M., Wu,W.Y., Liang,C., Huang,W.C., Chan,W.L., Tsai,W.T., Chen,G.Z., Lee,C.J., Chiu,C.M. et al. (2011) miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res., 39, D163–D169.
-11. Friedman,R.C., Farh,K.K., Burge,C.B. and Bartel,D.P. (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res., 19, 92–105. 12. Betel,D., Wilson,M., Gabow,A., Marks,D.S. and Sander,C. (2008) The microRNA.org resource: targets and expression. Nucleic Acids Res., 36, D149–D153.
-13. Dreszer,T.R., Karolchik,D., Zweig,A.S., Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC genome browser database: extensions and updates 2011. Nucleic Acids Res.,40, D918–D923.
-14. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat.    Biotechnol., 25, 1251–1255.
-15.  Hunter,S., Jones,P., Mitchell,A., Apweiler,R., Attwood,T.K.,Bateman,A., Bernard,T., Binns,D., Bork,P., Burge,S. et al. (2012) InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res.,40, D306–D312.
-16.  Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
-17.  Altshuler,D.M., Gibbs,R.A., Peltonen,L., Dermitzakis,E., Schaffner,S.F., Yu,F., Bonnen,P.E., de Bakker,P.I.,  Deloukas,P., Gabriel,S.B. et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature, 467, 52–58.
-18.  1000 Genomes Project Consortium. (2010) A map of human genome variation from population-scale sequencing. Nature,    467, 1061–1073.
-19.  Hindorff,L.A., Sethupathy,P., Junkins,H.A., Ramos,E.M., Mehta,J.P., Collins,F.S. and Manolio,T.A. (2009)   Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad.    Sci. USA, 106, 9362–9367.
-20.  Stenson,P.D., Ball,E.V., Mort,M., Phillips,A.D., Shiel,J.A., Thomas,N.S., Abeysinghe,S., Krawczak,M.   and Cooper,D.N. (2003) Human gene mutation database (HGMD):    2003 update. Hum. Mutat., 21, 577–581.
-21.  Johnson,A.D. and O’Donnell,C.J. (2009) An open access database of genome-wide association results. BMC Med.    Genet, 10, 6.
-22.  McKusick,V. (1998) A Catalog of Human Genes  and Genetic  Disorders, 12th edn. John Hopkins University    Press,Baltimore, MD.
-23.  Forbes,S.A., Bindal,N., Bamford,S., Cole,C.,    Kok,C.Y., Beare,D., Jia,M., Shepherd,R., Leung,K., Menzies,A. et al.    (2011) COSMIC: mining complete cancer genomes in the catalogue of    somatic mutations in cancer. Nucleic Acids Res.,    39, D945–D950.
-24.  Kerrien,S., Aranda,B., Breuza,L., Bridge,A.,    Broackes-Carter,F., Chen,C., Duesbury,M., Dumousseau,M.,    Feuermann,M., Hinz,U. et al. (2012) The Intact molecular interaction    database in 2012. Nucleic Acids Res., 40, D841–D846.
-25.  Croft,D., O’Kelly,G., Wu,G., Haw,R.,    Gillespie,M., Matthews,L., Caudy,M., Garapati,P.,    Gopinath,G., Jassal,B. et al. (2011) Reactome: a database of    reactions, pathways and biological processes. Nucleic Acids Res.,    39, D691–D697.
-26.  Demir,E., Cary,M.P., Paley,S., Fukuda,K.,    Lemer,C., Vastrik,I.,Wu,G., D’Eustachio,P., Schaefer,C., Luciano,J.    et al. (2010) The BioPAX community standard for pathway    data sharing. Nature Biotechnol., 28, 935–942.
-27.  Alemán Z, García-García F, Medina I, Dopazo J    (2014): A web tool for the design and management of panels of genes    for targeted enrichment and massive sequencing for    clinical applications. Nucleic Acids Res 42: W83-7.
-28.  [Alemán    A](http://www.ncbi.nlm.nih.gov/pubmed?term=Alem%C3%A1n%20A%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Garcia-Garcia    F](http://www.ncbi.nlm.nih.gov/pubmed?term=Garcia-Garcia%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Salavert    F](http://www.ncbi.nlm.nih.gov/pubmed?term=Salavert%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Medina    I](http://www.ncbi.nlm.nih.gov/pubmed?term=Medina%20I%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Dopazo    J](http://www.ncbi.nlm.nih.gov/pubmed?term=Dopazo%20J%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)> (2014).    A web-based interactive framework to assist in the prioritization of    disease candidate genes in whole-exome sequencing studies.    [Nucleic    Acids Res.](http://www.ncbi.nlm.nih.gov/pubmed/?term=BiERapp "Nucleic acids research.")>42 :W88-93.
-29.  Landrum,M.J., Lee,J.M., Riley,G.R., Jang,W.,    Rubinstein,W.S., Church,D.M. and Maglott,D.R. (2014) ClinVar: public    archive of relationships among sequence variation and    human phenotype. Nucleic Acids Res., 42, D980–D985.
-30.  Medina I, Salavert F, Sanchez R, de Maria A,    Alonso R, Escobar P, Bleda M, Dopazo J: Genome Maps, a new    generation genome browser. Nucleic Acids Res 2013, 41:W41-46.
+![This picture shows all the information associated to the variants. If a variant has an associated phenotype we could see it in the last column. In this case, the variant 7:132481242 C>T is associated to the phenotype: large intestine tumor.](../../../img/fig9.png)
+
+**Figure 9.** This picture shows all the information associated to the variants. If a variant has an associated phenotype we could see it in the last column. In this case, the variant 7:132481242 C>T is associated to the phenotype: large intestine tumor.
+
+## References
+
+1. Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth5, Goncalo Abecasis6, Richard Durbin and 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25: 2078-2079.
+1. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. *Genome Res* >2010, 20:1297-1303.
+1. Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean, Richard Durbin, and 1000 Genomes Project Analysis Group. The variant call format and VCFtools. Bioinformatics 2011, 27: 2156-2158.
+1. Medina I, De Maria A, Bleda M, Salavert F, Alonso R, Gonzalez CY, Dopazo J: VARIANT: Command Line, Web service and Web interface for fast and accurate functional characterization of variants found by Next-Generation Sequencing. Nucleic Acids Res 2012, 40:W54-58.
+1. Bleda M, Tarraga J, de Maria A, Salavert F, Garcia-Alonso L, Celma M, Martin A, Dopazo J, Medina I: CellBase, a  comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources. Nucleic Acids Res 2012, 40:W609-614.
+1. Flicek,P., Amode,M.R., Barrell,D., Beal,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fairley,S., Fitzgerald,S. et al. (2012) Ensembl 2012. Nucleic Acids Res., 40, D84–D90.
+1. UniProt Consortium. (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic   Acids Res., 40, D71–D75.
+1. Kozomara,A. and Griffiths-Jones,S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res., 39, D152–D157.
+1. Xiao,F., Zuo,Z., Cai,G., Kang,S., Gao,X. and Li,T. (2009) miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res., 37, D105–D110.
+1. Hsu,S.D., Lin,F.M., Wu,W.Y., Liang,C., Huang,W.C., Chan,W.L., Tsai,W.T., Chen,G.Z., Lee,C.J., Chiu,C.M. et al. (2011) miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res., 39, D163–D169.
+1. Friedman,R.C., Farh,K.K., Burge,C.B. and Bartel,D.P. (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res., 19, 92–105. 12. Betel,D., Wilson,M., Gabow,A., Marks,D.S. and Sander,C. (2008) The microRNA.org resource: targets and expression. Nucleic Acids Res., 36, D149–D153.
+1. Dreszer,T.R., Karolchik,D., Zweig,A.S., Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC genome browser database: extensions and updates 2011. Nucleic Acids Res.,40, D918–D923.
+1. Smith,B., Ashburner,M., Rosse,C., Bard,J., Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25, 1251–1255.
+1. Hunter,S., Jones,P., Mitchell,A., Apweiler,R., Attwood,T.K.,Bateman,A., Bernard,T., Binns,D., Bork,P., Burge,S. et al. (2012) InterPro in 2011: new developments in the family and domain prediction  database. Nucleic Acids Res.,40, D306–D312.
+1. Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
+1. Altshuler,D.M., Gibbs,R.A., Peltonen,L., Dermitzakis,E., Schaffner,S.F., Yu,F., Bonnen,P.E., de Bakker,P.I.,  Deloukas,P., Gabriel,S.B. et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature, 467, 52–58.
+1. 1000 Genomes Project Consortium. (2010) A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073.
+1. Hindorff,L.A., Sethupathy,P., Junkins,H.A., Ramos,E.M., Mehta,J.P., Collins,F.S. and Manolio,T.A. (2009)   Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA, 106, 9362–9367.
+1. Stenson,P.D., Ball,E.V., Mort,M., Phillips,A.D., Shiel,J.A., Thomas,N.S., Abeysinghe,S., Krawczak,M. and Cooper,D.N. (2003) Human gene mutation database (HGMD): 2003 update. Hum. Mutat., 21, 577–581.
+1. Johnson,A.D. and O’Donnell,C.J. (2009) An open access database of genome-wide association results. BMC Med. Genet, 10, 6.
+1. McKusick,V. (1998) A Catalog of Human Genes and Genetic Disorders, 12th edn. John Hopkins University Press,Baltimore, MD.
+1. Forbes,S.A., Bindal,N., Bamford,S., Cole,C., Kok,C.Y., Beare,D., Jia,M., Shepherd,R., Leung,K., Menzies,A. et al. (2011) COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res., 39, D945–D950.
+1. Kerrien,S., Aranda,B., Breuza,L., Bridge,A., Broackes-Carter,F., Chen,C., Duesbury,M., Dumousseau,M., Feuermann,M., Hinz,U. et al. (2012) The Intact molecular interaction database in 2012. Nucleic Acids Res., 40, D841–D846.
+1. Croft,D., O’Kelly,G., Wu,G., Haw,R., Gillespie,M., Matthews,L., Caudy,M., Garapati,P., Gopinath,G., Jassal,B. et al. (2011) Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res.,    39, D691–D697.
+1. Demir,E., Cary,M.P., Paley,S., Fukuda,K., Lemer,C., Vastrik,I., Wu,G., D’Eustachio,P., Schaefer,C., Luciano,J. et al. (2010) The BioPAX community standard for pathway data sharing. Nature Biotechnol., 28, 935–942.
+1. Alemán Z, García-García F, Medina I, Dopazo J (2014): A web tool for the design and management of panels of genes for targeted enrichment and massive sequencing for clinical applications. Nucleic Acids Res 42: W83-7.
+1. [Alemán A](http://www.ncbi.nlm.nih.gov/pubmed?term=Alem%C3%A1n%20A%5BAuthor%5D&cauthor=true&cauthor_uid=24803668), [Garcia-Garcia F](http://www.ncbi.nlm.nih.gov/pubmed?term=Garcia-Garcia%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668), [Salavert F](http://www.ncbi.nlm.nih.gov/pubmed?term=Salavert%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668), [Medina I](http://www.ncbi.nlm.nih.gov/pubmed?term=Medina%20I%5BAuthor%5D&cauthor=true&cauthor_uid=24803668), [Dopazo J](http://www.ncbi.nlm.nih.gov/pubmed?term=Dopazo%20J%5BAuthor%5D&cauthor=true&cauthor_uid=24803668) (2014). A web-based interactive framework to assist in the prioritization of disease candidate genes in whole-exome sequencing studies. [Nucleic Acids Res.](http://www.ncbi.nlm.nih.gov/pubmed/?term=BiERapp "Nucleic acids research.") 42: W88–93.
+1. Landrum,M.J., Lee,J.M., Riley,G.R., Jang,W., Rubinstein,W.S., Church,D.M. and Maglott,D.R. (2014) ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res., 42, D980–D985.
+1. Medina I, Salavert F, Sanchez R, de Maria A, Alonso R, Escobar P, Bleda M, Dopazo J: Genome Maps, a new generation genome browser. Nucleic Acids Res 2013, 41:W41-46.
diff --git a/docs.it4i/anselm-cluster-documentation/software/omics-master/priorization-component-bierapp.md b/docs.it4i/anselm-cluster-documentation/software/omics-master/priorization-component-bierapp.md
index 3c2c24cd0..53f94e012 100644
--- a/docs.it4i/anselm-cluster-documentation/software/omics-master/priorization-component-bierapp.md
+++ b/docs.it4i/anselm-cluster-documentation/software/omics-master/priorization-component-bierapp.md
@@ -1,21 +1,20 @@
-Prioritization component (BiERapp)
-================================
+# Prioritization component (BiERapp)
 
-### Access
+## Access
 
 BiERapp is available at the [following address](http://omics.it4i.cz/bierapp/).
 
-!!! Note "Note"
-	The address is accessible onlyvia VPN.
+!!! note
+    The address is accessible only via VPN.
 
-###BiERapp
+## BiERapp
 
-**This tool is aimed to discover new disease genes or variants by studying affected families or cases and controls. It carries out a filtering process to sequentially remove: (i) variants which are not no compatible with the disease because are not expected to have impact on the protein function; (ii) variants that exist at frequencies incompatible with the disease; (iii) variants that do not segregate with the disease. The result is a reduced set of disease gene candidates that should be further validated experimentally.**
+**This tool is aimed at discovering new disease genes or variants by studying affected families or cases and controls. It carries out a filtering process to sequentially remove: (i) variants that are not compatible with the disease because they are not expected to have an impact on protein function; (ii) variants that exist at frequencies incompatible with the disease; (iii) variants that do not segregate with the disease. The result is a reduced set of disease gene candidates that should be further validated experimentally.**
 
 BiERapp (28) efficiently helps in the identification of causative variants in family and sporadic genetic diseases. The program reads lists of predicted variants (nucleotide substitutions and indels) in affected individuals or tumor samples and controls. In family studies, different modes of inheritance can easily be defined to filter out variants that do not segregate with the disease along the family. Moreover, BiERapp integrates additional information such as allelic frequencies in the general population and the most popular damaging scores to further narrow down the number of putative variants in successive filtering steps. BiERapp provides an interactive and user-friendly interface that implements the filtering strategy used in the context of a large-scale genomic project carried out by the Spanish Network for Research in Rare Diseases (CIBERER) and the Medical Genome Project, in which more than 800 exomes have been analyzed.
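The sequential filtering strategy described above can be sketched in a few lines of Python. This is an illustrative toy model only: the `Variant` record, the thresholds, and the field names are hypothetical and do not correspond to BiERapp's actual data model or defaults.

```python
# Hypothetical sketch of the three-stage filtering described above:
# (i) drop variants with no predicted protein impact, (ii) drop variants
# too common in the general population, (iii) drop variants that do not
# segregate with the disease. Field names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    impact: str             # e.g. "missense", "synonymous"
    population_freq: float  # allele frequency in the general population
    segregates: bool        # segregates with the disease in the family?

def prioritize(variants, max_freq=0.01):
    """Return the reduced set of candidate variants after all three filters."""
    candidates = [v for v in variants if v.impact != "synonymous"]        # (i)
    candidates = [v for v in candidates if v.population_freq <= max_freq] # (ii)
    candidates = [v for v in candidates if v.segregates]                  # (iii)
    return candidates

variants = [
    Variant("GENE1", "missense",   0.001, True),
    Variant("GENE2", "synonymous", 0.001, True),   # removed in step (i)
    Variant("GENE3", "missense",   0.200, True),   # removed in step (ii)
    Variant("GENE4", "missense",   0.001, False),  # removed in step (iii)
]
print([v.gene for v in prioritize(variants)])  # ['GENE1']
```

In the real tool each filter is applied interactively in the web interface, with the number of surviving variants shown after every step.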
 
 ![Web interface to the prioritization tool. This figure shows the interface of the web tool for candidate gene prioritization with the filters available. The tool includes a genomic viewer (Genome Maps 30) that enables the representation of the variants in the corresponding genomic coordinates.](../../../img/fig6.png)
 
-**Figure 6**. Web interface to the prioritization tool. This figure shows the interface of the web tool for candidate gene
+**Figure 6**. Web interface to the prioritization tool. This figure shows the interface of the web tool for candidate gene
 prioritization with the filters available. The tool includes a genomic viewer (Genome Maps 30) that enables the representation of the variants in the corresponding genomic coordinates.
 
-- 
GitLab