Vojtech Moravec
data_project




Data compression project

Thesis

Reread the first paper and start turning it into the thesis using the template
BigDataViewer
Visualization
Look into scalar quantization

Use an evolutionary algorithm to find quantization values with the correct distribution
Find a way to generate a good quantization sequence
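A minimal sketch of what the evolutionary search could look like, assuming the data is a flat list of sample values and that quality is measured by mean squared error. It uses a simple (1+1) evolution strategy (one parent, one Gaussian-mutated child, keep the better one), which is only one possible evolutionary algorithm; the names `quantize`, `quantization_mse`, `evolve_levels` and the parameters `generations`, `sigma`, `seed` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

// Map a sample to the nearest quantization level (levels must be non-empty).
double quantize(double x, const std::vector<double>& levels) {
    double best = levels.front();
    for (double l : levels)
        if (std::abs(x - l) < std::abs(x - best)) best = l;
    return best;
}

// Fitness: mean squared error between the data and its quantized version.
double quantization_mse(const std::vector<double>& data, const std::vector<double>& levels) {
    double sum = 0.0;
    for (double x : data) {
        double e = x - quantize(x, levels);
        sum += e * e;
    }
    return data.empty() ? 0.0 : sum / data.size();
}

// (1+1) evolution strategy: mutate all levels with Gaussian noise and keep
// the child only if it lowers the MSE. Returns the levels sorted ascending.
std::vector<double> evolve_levels(const std::vector<double>& data,
                                  std::vector<double> levels,
                                  int generations, double sigma, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, sigma);
    double best = quantization_mse(data, levels);
    for (int g = 0; g < generations; ++g) {
        std::vector<double> child = levels;
        for (double& l : child) l += noise(rng);
        double f = quantization_mse(data, child);
        if (f < best) {
            best = f;
            levels = std::move(child);
        }
    }
    std::sort(levels.begin(), levels.end());
    return levels;
}
```

The fitness function is the reusable part: a GA or DE variant could call `quantization_mse` unchanged. Fitting the data distribution comes from evaluating the fitness on real image samples rather than on a uniform range.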


Look into vector quantization

Start with the simple variant, then possibly try the adaptive one
Evolutionary algorithms for finding the codebook:

Codebook Design For Vector Quantization Using Genetic Algorithm
Vector Quantization using the Improved Differential Evolution Algorithm for Image Compression


Vector quantization using the firefly algorithm for image compression
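The papers above all search for a codebook that minimizes the distortion of the same encoding step. A minimal sketch of that step (plain, non-adaptive VQ with nearest-codeword search by squared Euclidean distance), assuming image blocks are already flattened into vectors of doubles; `Vec`, `sq_dist`, `vq_encode` and `vq_decode` are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

// Squared Euclidean distance between two vectors of equal length.
double sq_dist(const Vec& a, const Vec& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Encode: replace each input vector (e.g. a flattened 2x2 pixel block)
// by the index of its nearest codeword.
std::vector<std::size_t> vq_encode(const std::vector<Vec>& blocks, const std::vector<Vec>& codebook) {
    std::vector<std::size_t> indices;
    indices.reserve(blocks.size());
    for (const Vec& b : blocks) {
        std::size_t best = 0;
        double bestDist = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < codebook.size(); ++k) {
            double d = sq_dist(b, codebook[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        indices.push_back(best);
    }
    return indices;
}

// Decode: look the indices up in the codebook.
std::vector<Vec> vq_decode(const std::vector<std::size_t>& indices, const std::vector<Vec>& codebook) {
    std::vector<Vec> out;
    out.reserve(indices.size());
    for (std::size_t i : indices) out.push_back(codebook[i]);
    return out;
}
```

The sum of `sq_dist` between each block and its decoded codeword is the distortion a GA, DE or firefly codebook search would minimize. With fixed-length indices, a codebook of K codewords costs log2(K) bits per block.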


SVD approximation

Try this.


Wavelet technique (what is it?)
Find a better error measure for lossy compression: what amount of error is still acceptable?
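As a baseline for the error measure, a minimal sketch of mean squared error and PSNR, the usual starting points for judging lossy compression. It assumes 16-bit grayscale pixels in flat buffers of equal size; the function names and the `maxValue` parameter are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Mean squared error between an original and a reconstructed image.
double mse(const std::vector<uint16_t>& a, const std::vector<uint16_t>& b) {
    if (a.size() != b.size() || a.empty()) throw std::invalid_argument("size mismatch");
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = double(a[i]) - double(b[i]);
        sum += d * d;
    }
    return sum / a.size();
}

// Peak signal-to-noise ratio in dB. maxValue is the largest possible pixel
// value, e.g. 65535 for full 16-bit data or (1 << bitsPerPixel) - 1.
double psnr(const std::vector<uint16_t>& a, const std::vector<uint16_t>& b, double maxValue) {
    double e = mse(a, b);
    if (e == 0.0) return std::numeric_limits<double>::infinity();  // identical images
    return 10.0 * std::log10(maxValue * maxValue / e);
}
```

Which PSNR value counts as "still acceptable" is exactly the open question above; the sketch only makes the measurement repeatable.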


Phase 2 TODOs - Completed


 Rename the consume_ functions in InBinaryStream to more descriptive names, like consume_int8

 Implement OutBinaryStreamBase, OutBinaryFileStream


 Implement serialization for CZI parts and switch parsing / writing to this approach (WIP). Current goals:


 Finish implementation of DimensionEntryDV1

 Finish implementation of DirectoryEntryDV

 Finish implementation of CziSubblockSegment

 Finish implementation of CziSubblockDirectorySegment

 Proper strategy for setting variables in CziFile_


 Finish writing


 Static parse settings in Serializable class

 Callback functions to load pixel data

 Multi-file support, specification p. 9 (the problem is that we don't have any multi-file CZI files)


Phase 1 TODOs - Completed

TODO LIST


 Tweak Z-Order reordering

Values stored in more than one byte must stay together.
The channels of multi-channel pixels can be separated in the final Z-order (try both separated and non-separated).

Currently, reordering moves the whole pixel (all channels); once a pixel's channels are separated, we will move just one channel at a time.
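A minimal sketch of the Z-order reordering, assuming a row-major image buffer in which each pixel occupies `bytesPerPixel` consecutive bytes. The pixel's bytes are copied as one unit, so multi-byte values stay together. `spread_bits`, `morton2d` and `to_z_order` are hypothetical names, and coordinates are limited to 0..65535.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Spread the lower 16 bits of v so that bit i moves to bit 2*i.
uint32_t spread_bits(uint32_t v) {
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Morton (Z-order) code: x bits on even positions, y bits on odd positions.
uint32_t morton2d(uint32_t x, uint32_t y) {
    return spread_bits(x) | (spread_bits(y) << 1);
}

// Reorder a row-major width x height image into Z-order.
// Assumes src.size() == width * height * bytesPerPixel.
std::vector<uint8_t> to_z_order(const std::vector<uint8_t>& src, uint32_t width,
                                uint32_t height, std::size_t bytesPerPixel) {
    std::vector<uint32_t> order(std::size_t(width) * height);
    std::iota(order.begin(), order.end(), 0u);
    // Morton codes are unique per pixel, so a plain sort gives the Z-order.
    std::sort(order.begin(), order.end(), [width](uint32_t a, uint32_t b) {
        return morton2d(a % width, a / width) < morton2d(b % width, b / width);
    });
    std::vector<uint8_t> dst;
    dst.reserve(src.size());
    for (uint32_t idx : order) {
        auto first = src.begin() + static_cast<std::ptrdiff_t>(idx * bytesPerPixel);
        dst.insert(dst.end(), first, first + static_cast<std::ptrdiff_t>(bytesPerPixel));
    }
    return dst;
}
```

For the separated-channel variant, the image could first be split into one plane per channel, and `to_z_order` called on each plane with `bytesPerPixel` equal to the size of one channel sample (e.g. 2 for 16-bit data).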


 Feed binary data to existing compressors and produce results (tables, graphs)
Try the data both in its original order and in Z-order

Compressors to try:


 gzip (zlib, DEFLATE: LZ77 combined with Huffman coding)

 bzip2

 LZMA (7-zip)

 Add a CLI option to specify the compression level for the above algorithms

 Add a CSV writer that writes test results to a file, and a test function that takes a folder of test files to parse.


 Describe what the compression ratio is and how we calculate it.

 Write about the Z-curve order and plot a comparison of the Z-curve order with the normal order.

 Plot / tabulate BPP (bits per pixel) and compression throughput (MB/s).

 Plot how individual frames compress: x axis -> frame, y axis -> compression ratio.
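A sketch of how the reported numbers could be computed. It assumes the common definition of compression ratio as original size / compressed size (so values below 1.0 mean the output grew), BPP computed from the compressed size, and throughput measured on the uncompressed input with MB = 10^6 bytes. These definitions are assumptions until the description above is written, and the function names are hypothetical.

```cpp
#include <cstdint>

// Compression ratio: > 1.0 means the data got smaller, < 1.0 means it grew.
double compression_ratio(std::uint64_t originalBytes, std::uint64_t compressedBytes) {
    return double(originalBytes) / double(compressedBytes);
}

// Bits per pixel of the compressed stream.
double bits_per_pixel(std::uint64_t compressedBytes, std::uint64_t pixelCount) {
    return 8.0 * double(compressedBytes) / double(pixelCount);
}

// Throughput in MB/s on the uncompressed input (MB = 10^6 bytes here).
double throughput_mb_per_s(std::uint64_t originalBytes, double seconds) {
    return double(originalBytes) / 1e6 / seconds;
}
```

The elapsed time for the throughput can be measured with `std::chrono::steady_clock` around the compressor call; the per-frame plot is then just `compression_ratio` evaluated frame by frame.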

 Look at image differences. Negative values can be mapped to odd/even numbers, but the difference must then be stored in more than one byte.

Tested so far: short - short = int, then mapping the int to ushort; the results are not better.

 Try Z-order on the ushort-mapped ints.
Try the raw byte difference and find out whether it can be mapped to 1 or 2 bytes.
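A minimal sketch of the difference mapping, assuming 16-bit frames of equal size. It uses zigzag encoding, which sends non-negative differences to even numbers and negative ones to odd numbers; the existing NegativeToEven mapping may use the opposite parity, so the exact convention here is an assumption. `zigzag_encode`, `zigzag_decode`, `mapped_difference` and `bytes_needed` are hypothetical helpers for the 1-or-2-byte question.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zigzag mapping: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
// Valid for |v| < 2^31, which covers any 16-bit difference.
uint32_t zigzag_encode(int32_t v) {
    return v >= 0 ? uint32_t(v) * 2u : uint32_t(-int64_t(v)) * 2u - 1u;
}

int32_t zigzag_decode(uint32_t u) {
    return (u & 1u) ? -int32_t((u + 1u) / 2u) : int32_t(u / 2u);
}

// Pixel-wise difference of two 16-bit frames, mapped to unsigned values.
// The result lies in 0..131070, i.e. up to 17 bits.
std::vector<uint32_t> mapped_difference(const std::vector<uint16_t>& cur,
                                        const std::vector<uint16_t>& prev) {
    std::vector<uint32_t> out(cur.size());
    for (std::size_t i = 0; i < cur.size(); ++i)
        out[i] = zigzag_encode(int32_t(cur[i]) - int32_t(prev[i]));
    return out;
}

// Smallest storage width (1, 2 or 4 bytes) that holds every mapped value.
int bytes_needed(const std::vector<uint32_t>& mapped) {
    uint32_t m = 0;
    for (uint32_t v : mapped) m = std::max(m, v);
    return m <= 0xFFu ? 1 : (m <= 0xFFFFu ? 2 : 4);
}
```

Because a 16-bit difference lies in -65535..65535, the mapped value needs 17 bits in the worst case, so it cannot always fit a ushort; `bytes_needed` shows whether a particular frame pair happens to fit in 1 or 2 bytes.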

 DeltaCompression

bsdiff, bspatch

The initial test shows poor results, with compression ratios below 1.0. Maybe we should try a different bsdiff library implementation.


 Histogram of difference values

 Benchmark the difference with NegativeToEven mapping.

 Benchmark the B3D CUDA library (it finally works).

 Look at patterns in images

 Try moving pixels around and evaluate the result by quadratic (squared) error
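A brute-force sketch of moving pixels around and evaluating by squared error: compare two 16-bit frames at every integer shift within a small window and keep the shift with the lowest mean squared error over the overlapping region. The window size `maxShift`, the `Shift` struct and the function names are hypothetical. The cost is O(width * height * (2 * maxShift + 1)^2), so for larger shifts FFT-based phase correlation is the usual faster alternative.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Shift {
    int dx;
    int dy;
    double mse;
};

// MSE between a(x, y) and b(x + dx, y + dy) over the overlapping region.
// Both images are row-major, width x height.
double shifted_mse(const std::vector<uint16_t>& a, const std::vector<uint16_t>& b,
                   int width, int height, int dx, int dy) {
    double sum = 0.0;
    std::size_t n = 0;
    for (int y = 0; y < height; ++y) {
        int by = y + dy;
        if (by < 0 || by >= height) continue;
        for (int x = 0; x < width; ++x) {
            int bx = x + dx;
            if (bx < 0 || bx >= width) continue;
            double d = double(a[std::size_t(y) * width + x]) -
                       double(b[std::size_t(by) * width + bx]);
            sum += d * d;
            ++n;
        }
    }
    return n ? sum / n : std::numeric_limits<double>::infinity();
}

// Try every shift in [-maxShift, maxShift]^2 and return the one with the lowest MSE.
Shift best_shift(const std::vector<uint16_t>& a, const std::vector<uint16_t>& b,
                 int width, int height, int maxShift) {
    Shift best{0, 0, std::numeric_limits<double>::infinity()};
    for (int dy = -maxShift; dy <= maxShift; ++dy)
        for (int dx = -maxShift; dx <= maxShift; ++dx) {
            double e = shifted_mse(a, b, width, height, dx, dy);
            if (e < best.mse) best = {dx, dy, e};
        }
    return best;
}
```

The same search fits the image-difference question further below: an MSE near zero at a non-zero shift suggests two images show the same content, just positioned differently. Note that large shifts are evaluated over fewer overlapping pixels.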


Singularity on Anselm

To access the NVIDIA nodes we need qnvidia access.
We can't run sudo commands inside the prepared Singularity images. Options:

We have to prepare our own Singularity image
We have to use Docker inside Singularity


CZI parser TODO list
These are the things that have to be done first:


 Parse SubBlockDirectory (it contains a collection of DirectoryEntryDVs)


 An exact copy of each DirectoryEntryDV is located in the referenced SubBlock


 Parse dimension entries


 Parse IEEE 4-byte / 8-byte floats.

Parsing is done via a memcpy call. The alternative is a union, but parsing doubles didn't work with it. Later we can look at this again, possibly improve the conversion and get rid of the copy.
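A portable sketch of the memcpy approach, assuming the values are stored little-endian in the file. Assembling the bytes explicitly makes it independent of the host byte order. In C++, reading a union member other than the one last written is formally undefined behavior, which may explain why the union version misbehaved for doubles. Compilers typically turn a fixed-size memcpy like this into a plain register move, and C++20 offers `std::bit_cast` for the same conversion. The function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Read a little-endian IEEE 754 binary32 value from a byte buffer.
float read_float32_le(const uint8_t* p) {
    uint32_t bits = uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
                    (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
    float value;
    static_assert(sizeof(value) == sizeof(bits), "float must be 32 bits");
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Read a little-endian IEEE 754 binary64 value from a byte buffer.
double read_float64_le(const uint8_t* p) {
    uint64_t bits = 0;
    for (int i = 7; i >= 0; --i) bits = (bits << 8) | p[i];  // p[7] ends up most significant
    double value;
    static_assert(sizeof(value) == sizeof(bits), "double must be 64 bits");
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```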


 Parse SubBlock


 Parse image data into the proper pixel type. Do we really want to do that? We only care about the bytes. Parsing image data into a PixelType matrix is useful only if we want to perform operations on the image, so this is moved to a later section.


 Parse / extract image data from SubBlock

We know the position and size of the data in the CZI file, and with these two pieces of information we can extract the data easily.
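A minimal sketch of the extraction, assuming the absolute file offset and the byte size of the pixel data have already been obtained from the parsed SubBlock. `read_subblock_data` is a hypothetical helper that uses only the standard library.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read `size` bytes starting at absolute file offset `offset`.
// Throws if the file cannot be opened or is shorter than offset + size.
std::vector<uint8_t> read_subblock_data(const std::string& path, std::uint64_t offset,
                                        std::uint64_t size) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    std::vector<uint8_t> data(static_cast<std::size_t>(size));
    if (!in.read(reinterpret_cast<char*>(data.data()), static_cast<std::streamsize>(size)))
        throw std::runtime_error("unexpected end of file");
    return data;
}
```

Reading only the needed range keeps memory use proportional to one SubBlock rather than the whole file.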


 Parse image values into matrices. Matrix<Gray8> etc.

This is outside the scope of this project. The main goal is to try and compare different compression methods on image data. Right now we are able to obtain the image data and more, so that is enough for now.

Parse important information from the XML metadata (e.g. BitsPerPixel, and compare it with the value from the parsed binary data)

Support multi-file situations

Obtain multi-file CZI files. Currently we don't have any multi-file CZI files, so we can't really parse them. Secondary files should have a different GUID than the master file, and their filepart should be different from 0. The files we have right now have (0) in their names, but their filepart is 0 and the master file GUID isn't set correctly.
One master file and several secondary files (our current CziFile class more or less supports this situation, so keep going that way)


Compression of images

I tested FLIF compression on a ~200 MB file. The compression ratio was good, but the speed was really bad. Decompression was no better; it was slow too.

Even lossy compression was slow. FLIF is probably only suitable for small images.


Space filling (Peano) curves

Wikipedia
Find out what this is about
Look specifically at the Z-order curve and the Hilbert curve
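For comparison with the Z-order (Morton) curve, a sketch of the Hilbert curve index-to-coordinate conversion, adapted from the iterative d2xy algorithm described in the Wikipedia article on Hilbert curves. It assumes a square n x n grid with n a power of two; `hilbert_d2xy` and `hilbert_rot` are names chosen here.

```cpp
#include <utility>

// Rotate/flip a quadrant so the sub-curve has the correct orientation.
void hilbert_rot(int n, int& x, int& y, int rx, int ry) {
    if (ry == 0) {
        if (rx == 1) {
            x = n - 1 - x;
            y = n - 1 - y;
        }
        std::swap(x, y);
    }
}

// Map distance d along the Hilbert curve to cell (x, y) in an n x n grid.
void hilbert_d2xy(int n, int d, int& x, int& y) {
    x = y = 0;
    int t = d;
    for (int s = 1; s < n; s *= 2) {
        int rx = 1 & (t / 2);
        int ry = 1 & (t ^ rx);
        hilbert_rot(s, x, y, rx, ry);
        x += s * rx;
        y += s * ry;
        t /= 4;
    }
}
```

Unlike the Z-order curve, consecutive Hilbert indices are always adjacent cells, with no jumps between quadrants; that locality is the property that might help when reordering pixels before compression.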


Image difference

Find the differences between the included images.
Plot those differences and find out whether some images are actually the same, just positioned slightly differently

Pattern matching: find out whether there are patterns in the image sets.