diff --git a/czi-format/benchmark_results/benchmark_data.sqlite b/czi-format/benchmark_results/benchmark_data.sqlite
new file mode 100644
index 0000000000000000000000000000000000000000..adbf19c2e1ff9d857e2d4027a6d0e6e6f0650869
Binary files /dev/null and b/czi-format/benchmark_results/benchmark_data.sqlite differ
diff --git a/czi-format/benchmark_results/results.xlsx b/czi-format/benchmark_results/results.xlsx
index 7b2aa545f136e6514cca24513055857dd71cf576..4e35e6c5518a8a3c140470adea08adca4040eb35 100644
Binary files a/czi-format/benchmark_results/results.xlsx and b/czi-format/benchmark_results/results.xlsx differ
diff --git a/czi-format/czi-parser/compression/documentation.txt b/czi-format/czi-parser/compression/documentation.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a70c51a63089912af91c2e4248dd822227dbe1df
--- /dev/null
+++ b/czi-format/czi-parser/compression/documentation.txt
@@ -0,0 +1,6 @@
+
+gzip (zlib) compress2
+    http://refspecs.linuxbase.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/zlib-compress2-1.html
+lzma
+
+bzip2
diff --git a/document/data/artemia_bzip2_level.dat b/document/data/artemia_bzip2_level.dat
new file mode 100644
index 0000000000000000000000000000000000000000..c6bda19d462ec474376dddfa559352ea39dcbdc4
--- /dev/null
+++ b/document/data/artemia_bzip2_level.dat
@@ -0,0 +1,10 @@
+x f(x)
+1 3.58176
+2 3.60657
+3 3.61728
+4 3.62128
+5 3.62762
+6 3.62970
+7 3.63206
+8 3.63472
+9 3.63400
\ No newline at end of file
diff --git a/document/data/artemia_comp_level.txt b/document/data/artemia_comp_level.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0e4bc50dd3b00e34d49844b08bfff2a124be273c
--- /dev/null
+++ b/document/data/artemia_comp_level.txt
@@ -0,0 +1,10 @@
+level gzip lzma bzip2
+1 2.53352 3.06497 3.58176
+2 2.54137 3.09320 3.60657
+3 2.66279 3.11559 3.61728
+4 2.66761 3.35998 3.62128
+5 2.61082 3.46081 3.62762
+6 2.67796 3.46118 3.62970
+7 2.69937 3.46118 3.63206
+8 2.69891 3.46118 3.63472
+9 2.71106 3.46118 3.63400
\ No newline at end of file
diff --git a/document/data/artemia_gzip_level.dat b/document/data/artemia_gzip_level.dat
new file mode 100644
index 0000000000000000000000000000000000000000..1f00c59847f7bc80d36af36fea298b221c56f6e3
--- /dev/null
+++ b/document/data/artemia_gzip_level.dat
@@ -0,0 +1,10 @@
+x f(x)
+1 2.53352
+2 2.54137
+3 2.66279
+4 2.66761
+5 2.61082
+6 2.67796
+7 2.69937
+8 2.69891
+9 2.71106
\ No newline at end of file
diff --git a/document/data/artemia_lzma_level.dat b/document/data/artemia_lzma_level.dat
new file mode 100644
index 0000000000000000000000000000000000000000..f6392d3a0e8e3ae2079c754f3360d588fcfbf0a2
--- /dev/null
+++ b/document/data/artemia_lzma_level.dat
@@ -0,0 +1,10 @@
+x f(x)
+1 3.06497
+2 3.09320
+3 3.11559
+4 3.35998
+5 3.46081
+6 3.46118
+7 3.46118
+8 3.46118
+9 3.46118
\ No newline at end of file
diff --git a/document/document.pdf b/document/document.pdf
index d7c7f0c4d18cb806dca470222bcb0c8b1814cf3a..46ade2ce10b949af0f9c6e15fd703920d0dfcaed 100644
Binary files a/document/document.pdf and b/document/document.pdf differ
diff --git a/document/document.tex b/document/document.tex
index c6977c417a04f9fae53b8b663e6558fd1d8b93f0..0e23388e6bf876310b835cdfa17a5cb76473f62a 100644
--- a/document/document.tex
+++ b/document/document.tex
@@ -11,6 +11,8 @@
 \usepackage{amsmath}
 \usepackage{amssymb}
 \usepackage{graphicx}
+\usepackage{tikz}
+\usepackage{pgfplots}
 \usepackage{dirtytalk}
 \usepackage{siunitx}
@@ -21,6 +23,16 @@
 \newcommand{\image}[4]{\begin{figure}[h!]
 \centering \includegraphics[width=#1\linewidth]{figures/#2} \caption{#4} \label{#3} \end{figure}}
 \newcommand{\bThreed}{B$^3$D }
+\pgfplotsset{
+    compat=1.5,
+    width=7cm,
+    /pgfplots/ybar legend/.style={
+        /pgfplots/legend image code/.code={%
+            \draw[##1,/tikz/.cd,yshift=-0.25em]
+            (0cm,0cm) rectangle (3pt,0.8em);},
+    }
+}
+
 \author{Moravec Vojtěch}
 \title{Compression Methods for Bioinformatics Data for Transfer to HPC Infrastructure}
 \date{2018/2019}
@@ -115,6 +127,8 @@ First, we present three methods that are used for lossless data compression.
 We then describe \bThreed compression, which specifically targets images acquired from microscopes.
+\emph{TODO: Z ORDER}
+
 \subsection{Standard methods}
 The standard methods presented here are widely used in the most common lossless compression programs. These methods are designed so that
@@ -130,7 +144,80 @@ LZMA combines several kinds of algorithms, LZ77 \cite{LZ77}, arithmetic
 The third algorithm is bzip2; like LZMA, it combines several methods: run-length encoding, Huffman coding, and block-sorting compression \cite{block_sorting}.
-Of these three, LZMA promises the highest compression ratio.
+In the following plots we compare the three methods. All three algorithms let the user set a compression level; in general, a higher level gives
+a higher compression ratio but also higher memory and time costs. Using the maximum level, usually 9, is therefore not always the best choice.
+The following tests were run on a CZI file containing 39 slices of a brine shrimp; each slice is stored as a $1388 \times 1040$ pixel image with pixel type Gray16,
+i.e. 2 bytes per pixel, with values in the range $0$--$65535$.
+
+\begin{figure}[h!]
+    \centering
+    % \pgfplotsset{scaled x ticks=false}
+    \begin{tikzpicture}
+        \begin{axis}[
+            width=0.8\linewidth,
+            xlabel = {Compression level},
+            ylabel = {Compression ratio},
+            domain=1:9,
+            legend entries = {gzip, lzma, bzip2},
+            legend pos = outer north east,
+            ymin=2, ymax=4,
+        ]
+        \addplot[red, thick] table{data/artemia_gzip_level.dat};
+        \addplot[blue, thick] table{data/artemia_lzma_level.dat};
+        \addplot[green, thick] table{data/artemia_bzip2_level.dat};
+        \end{axis}
+    \end{tikzpicture}
+    \caption{Average compression ratio for each compression level}
+    \label{fig:comp_level_comp}
+\end{figure}
+
+Figure \ref{fig:comp_level_comp} shows that the compression level has little effect for gzip and bzip2, whereas lzma gives noticeably better
+results at higher levels, plateauing from level 5. We also see that bzip2 achieves the highest compression ratio on these brine shrimp images and gzip the lowest. The reported compression ratios are averaged over
+all 39 slices. Figure \ref{fig:more_files_comp} compares the algorithms on several files; the reported compression ratios are averaged over
+all images in each file, at compression level 6.
+
+\begin{figure}[h!]
+    \centering
+    \begin{tikzpicture}
+        \begin{axis}[
+            width=0.8\linewidth,
+            height=0.55\linewidth,
+            xbar,
+            enlargelimits=0.25,
+            ytick = data,
+            xlabel={Compression ratio},
+            legend pos = outer north east,
+            symbolic y coords={{Artemia},{Artemia Flash}, {LLC Emerald}, {16 Bit Z Stack}},
+            nodes near coords,
+        ]
+        % gzip
+        \addplot coordinates {
+            (2.67797,{Artemia})
+            (1.84856,{Artemia Flash})
+            (1.40943,{LLC Emerald})
+            (1.47818,{16 Bit Z Stack})
+        };
+        % lzma
+        \addplot coordinates {
+            (3.46118,{Artemia})
+            (2.3531,{Artemia Flash})
+            (1.64554,{LLC Emerald})
+            (1.80586,{16 Bit Z Stack})
+        };
+        % bzip2
+        \addplot coordinates {
+            (3.62971,{Artemia})
+            (2.47042,{Artemia Flash})
+            (1.72,{LLC Emerald})
+            (1.90709,{16 Bit Z Stack})
+        };
+        \legend{gzip,lzma,bzip2}
+        \end{axis}
+    \end{tikzpicture}
+    \caption{Comparison of compression ratios across multiple files at compression level 6}
+    \label{fig:more_files_comp}
+\end{figure}
+
 \subsection{Compression of microscope images, the \bThreed library}
 This section is based on \cite{Balazs164624}; the plots are also taken from that work.