test

Commit dc9950c2 authored 8 years ago by Lukáš Krupčík
--- a/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md
+++ b/docs.it4i/anselm-cluster-documentation/software/omics-master/overview.md
@@ -85,7 +85,7 @@ The standard CIGAR description of pairwise alignment defines three operations: 
 Figure 3. SAM format file. The ‘@SQ’ line in the header section gives the order of reference sequences. Notably, r001 is the name of a read pair. According to FLAG 163 (=1+2+32+128), the read mapped to position 7 is the second read in the pair (128) and regarded as properly paired (1 + 2); its mate is mapped to 37 on the reverse strand (32). Read r002 has three soft-clipped (unaligned) bases. The coordinate shown in SAM is the position of the first aligned base. The CIGAR string for this alignment contains a P (padding) operation which correctly aligns the inserted sequences. Padding operations can be absent when an aligner does not support multiple sequence alignment. The last six bases of read r003 map to position 9, and the first five to position 29 on the reverse strand. The hard clipping operation H indicates that the clipped sequence is not present in the sequence field. The NM tag gives the number of mismatches. Read r004 is aligned across an intron, indicated by the N operation.
- Binary Alignment/Map (BAM) 
+##### Binary Alignment/Map (BAM) 
 BAM is the binary representation of SAM and keeps exactly the same information as SAM. BAM uses lossless compression to reduce the size of the data by about 75% and provides an indexing system that allows reads that overlap a region of the genome to be retrieved and rapidly traversed.
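The FLAG and CIGAR fields explained in the Figure 3 caption can be decoded mechanically. The following is a minimal Python sketch, assuming the FLAG bit assignments and CIGAR operation letters of the SAM format specification; the helper names `decode_flag`, `parse_cigar` and `reference_span` are hypothetical and are not part of any tool in this pipeline. FLAG 163 decodes to exactly the bits 1, 2, 32 and 128 listed in the caption, and the illustrative CIGAR string `3S6M1P1I4M` covers 10 reference bases, because soft clips (S), insertions (I) and padding (P) do not consume the reference.

```python
import re

# FLAG bit values as defined in the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}

# One CIGAR element: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def decode_flag(flag: int) -> list[str]:
    """Return the description of every bit set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    ops = CIGAR_RE.findall(cigar)
    # Reject strings containing characters the regex did not consume.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment (M, D, N, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


print(decode_flag(163))  # ['read paired', 'read mapped in proper pair', 'mate reverse strand', 'second in pair']
print(reference_span("3S6M1P1I4M"))  # 10
```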
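The region retrieval mentioned above relies, in the standard BAI index for BAM files, on the hierarchical binning scheme described in the SAM/BAM specification: each alignment is assigned the smallest bin that fully contains it (bins are 512 Mbp, 64 Mbp, 8 Mbp, 1 Mbp, 128 kbp and 16 kbp wide), and a region query only has to inspect the bins that could overlap that region. Below is a minimal Python transcription of the two binning functions given in the specification, under those assumptions; it covers only the bin arithmetic, not the linear index or the BGZF virtual file offsets that a complete BAM reader also needs.

```python
# (first bin id at this level, log2 of bin width), from coarsest to finest level.
LEVELS = ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14))


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin fully containing the 0-based, half-open region [beg, end)."""
    end -= 1  # work with the last base inclusively
    # Try the finest level first; the first level where both ends share a bin wins.
    for offset, shift in reversed(LEVELS):
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # bin 0 spans the whole 512 Mbp address space


def reg2bins(beg: int, end: int) -> list[int]:
    """All bins that may hold alignments overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for offset, shift in LEVELS:
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


print(reg2bin(0, 100))   # 4681 (first 16 kbp bin)
print(reg2bins(0, 100))  # [0, 1, 9, 73, 585, 4681]
```

Because an alignment spanning a 16 kbp boundary is placed in a coarser bin, a query only needs a handful of bins per level instead of scanning every read in the file.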
@@ -145,24 +145,24 @@ VARIANT (VARIant Analysis Tool) (4) reports information on the variants found th
 CellBase (5) is a relational database that integrates biological information from different sources and includes:
- Core features: 
+Core features: 
 We took genome sequences, genes, transcripts, exons, cytobands and cross-reference (xref) identifiers (IDs) from Ensembl (6). Protein information, including sequences, xrefs and protein features (natural variants, mutagenesis sites, post-translational modifications, etc.), was imported from UniProt (7).
- Regulatory: 
+Regulatory: 
 CellBase imports miRNA from miRBase (8); curated and non-curated miRNA targets from miRecords (9), miRTarBase (10),
 TargetScan (11) and microRNA.org (12); and CpG islands and conserved regions from the UCSC database (13).
- Functional annotation 
+Functional annotation 
 OBO Foundry (14) develops many biomedical ontologies that are implemented in OBO format. We designed a SQL schema to store these OBO ontologies and imported 30 of them. OBO ontology term annotations were taken from Ensembl (6). InterPro (15) annotations were also imported.
- Variation 
+Variation 
 CellBase includes SNPs from dbSNP (16); SNP population frequencies from HapMap (17), the 1000 Genomes Project (18) and Ensembl (6); phenotypically annotated SNPs imported from the NHGRI GWAS Catalog (19), HGMD (20), Open Access GWAS Database (21), UniProt (7) and OMIM (22); mutations from COSMIC (23); and structural variations from Ensembl (6).
- Systems biology 
+Systems biology 
 We also import systems biology information, such as interactome data from IntAct (24). Reactome (25) stores pathway and interaction information in BioPAX (26) format. The BioPAX data exchange format enables the integration of diverse pathway
 resources. We solved the problem of storing data released in BioPAX format in a SQL relational schema, which allowed us to import Reactome into CellBase.
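The OBO ontology import described above can be illustrated with a small loader. This sketch is hypothetical: the two-table layout (`term`, `term_is_a`), the function names `parse_obo_terms` and `load_ontology`, and the `EX:` identifiers are invented for illustration and are not CellBase's actual schema. It reads only the `id`, `name` and `is_a` tags of `[Term]` stanzas and stores them with Python's built-in `sqlite3` module; a real importer would also handle other tags (synonyms, relationships, obsolete terms) and stanza types.

```python
import sqlite3

# Hypothetical two-table layout for illustration only; not CellBase's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS term (
    id TEXT PRIMARY KEY,
    name TEXT,
    ontology TEXT           -- which OBO file the term came from
);
CREATE TABLE IF NOT EXISTS term_is_a (
    child TEXT NOT NULL,    -- term id
    parent TEXT NOT NULL    -- id of one is_a parent
);
"""


def parse_obo_terms(text: str) -> list[dict]:
    """Collect id, name and is_a parents from each [Term] stanza."""
    terms: list[dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"id": None, "name": None, "is_a": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                current[tag] = value
            elif tag == "is_a":
                # OBO allows a trailing "! comment" after the parent id.
                current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


def load_ontology(conn: sqlite3.Connection, ontology: str, text: str) -> int:
    """Insert the terms of one OBO document; returns the number of terms loaded."""
    terms = parse_obo_terms(text)
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO term (id, name, ontology) VALUES (?, ?, ?)",
        [(t["id"], t["name"], ontology) for t in terms],
    )
    conn.executemany(
        "INSERT INTO term_is_a (child, parent) VALUES (?, ?)",
        [(t["id"], parent) for t in terms for parent in t["is_a"]],
    )
    conn.commit()
    return len(terms)


# Toy OBO snippet with hypothetical EX: identifiers.
EXAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: example root

[Term]
id: EX:0000002
name: example child
is_a: EX:0000001 ! example root

[Typedef]
id: part_of
name: part of
"""

conn = sqlite3.connect(":memory:")
load_ontology(conn, "example", EXAMPLE_OBO)
rows = conn.execute(
    "SELECT t.name FROM term_is_a r JOIN term t ON t.id = r.parent WHERE r.child = ?",
    ("EX:0000002",),
).fetchall()
print(rows)  # [('example root',)]
```

Keeping `is_a` links in a separate table lets a single join walk from a term to its parents, which is the kind of query an ontology-backed annotation lookup needs.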