# Data compression *Thesis*

## Thesis

  ### Current TODO
  - [x] Merge the slices together and build a dictionary from them
    - send the statistical data to doc. Dvorský
    - Compression of all slices - one dictionary for all slices
  - [x] Do not use histograms so much, and when needed, make them with ggplot2
  - [x] Invert the compression ratio, 2 --> 0.5
  - [x] For the colormap, do not report the dictionary size and compression ratio; show Bits Per Pixel (BPP) instead
  - [x] Create documentation of the testing procedure
  - [x] Redo the plots in English
  - [ ] Do not forget the table at the end - | original image | (dictionary size) | compression ratio | BPP | error image | image after compression |
 - [BigDataViewer](https://imagej.net/BigDataViewer)
   - [GitHub repositories](https://github.com/bigdataviewer)
     - [Compression](https://github.com/theazgra/BdvServerCompression) - Our compression library
     - [Core](https://github.com/bigdataviewer/bigdataviewer-core) - ImgLib2-based viewer for registered SPIM stacks and more
     - [Fiji plugin](https://github.com/bigdataviewer/bigdataviewer_fiji) - Fiji plugins for starting BigDataViewer and exporting data.
     - [Server](https://github.com/bigdataviewer/bigdataviewer-server) - Serving datasets over http to Fiji client
     - [GUI](https://github.com/bigdataviewer/bigdataviewer-ui-panel) - GUI of Fiji plugin
   - Compression target:
     - Client - Server communication
     - Presentation of biological data over network
   - Data (3D chunks of a view) are stored in HDF5 multidimensional arrays - look at how they are stored
   - [Paper](https://arxiv.org/pdf/1412.0488.pdf)
 - [Visualization](http://julius2.it4i.cz/sona/)
 - **Scalar Quantization**
    - Lloyd-Max algorithm to find optimal quantization values - based on probability distribution function
    - Differential evolution based method to find optimal quantization values
    - `Entropy-Coded Quantization` *p. 276* can help with coding of quantization values
 - **Vector quantization**
   - https://github.com/droidadroit/LBG/blob/master/lbg_split.py
   - [Vector Quantization LBG Java App](https://github.com/sherifabdlnaby/Vector-Quantization-LBG-Image-Compression)
    - Start with the simple one, then maybe try the adaptive one
    - Evolution algorithms to find codebook:
      - [Vector Quantization using the Improved Differential Evolution Algorithm for Image Compression](https://arxiv.org/ftp/arxiv/papers/1710/1710.05311.pdf)
      - [Vector quantization using the firefly algorithm for image compression](https://www.sciencedirect.com/science/article/pii/S0957417411010700)
      - [Codebook Design For Vector Quantization Using Genetic Algorithm](https://pdfs.semanticscholar.org/a5bb/5fd47b0a7f8a96949d006ccde77a88ad63df.pdf)
 - SVD approximation
    - Try this.
 - Wavelet technique (*what is it?*)
 - Better error measurement for lossy compression: what is still acceptable?
   - Now we have MSE and PSNR.
 - Image difference, do it right in the server and visualize in the core.
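
The scalar quantization and error measurement items above can be sketched together. This is a minimal, unoptimized Lloyd-Max iteration on raw samples, plus MSE and PSNR; it is not the thesis implementation. The function names, the uniform initialization, and the default `max_value=255` are assumptions (pass `65535` for 16-bit data).

```python
import math


def lloyd_max(samples, levels, iterations=50):
    """Return sorted reconstruction values (centroids) of a scalar quantizer."""
    lo, hi = min(samples), max(samples)
    # Assumed initialization: centroids spread uniformly over the sample range.
    centroids = [lo + (hi - lo) * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iterations):
        # Decision boundaries are midpoints between neighbouring centroids.
        bounds = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        sums = [0.0] * levels
        counts = [0] * levels
        for s in samples:
            cell = sum(1 for b in bounds if s > b)  # index of the cell containing s
            sums[cell] += s
            counts[cell] += 1
        # Each centroid moves to the mean of its cell; empty cells stay put.
        updated = [sums[i] / counts[i] if counts[i] else centroids[i]
                   for i in range(levels)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def quantize(samples, centroids):
    """Replace every sample by its nearest centroid."""
    return [min(centroids, key=lambda c: abs(s - c)) for s in samples]


def mse(original, restored):
    """Mean squared error between two equally long sample lists."""
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


def psnr(original, restored, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(original, restored)
    return math.inf if error == 0 else 10 * math.log10(max_value ** 2 / error)
```

Typical use would be `centroids = lloyd_max(pixels, 8)` followed by `psnr(pixels, quantize(pixels, centroids))`. Working on raw samples instead of an explicit probability distribution function keeps the sketch short; a histogram-based version would scale better to whole image stacks.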


# Semestral project archive (everything below)


## Phase 1 TODOs - Completed
### TODO LIST
- [x] Tweak Z-Order reordering
  - Values that are stored in more than one byte must stay together.
  - Pixels that contain multiple `channels` can be separated in the final Z-order. (Try both separated and non-separated.)
    - Right now, when we `reorder` bytes we move the *whole* pixel (*all channels*), but once a pixel's channels are separated we will move just one channel.
- [x] Feed binary data to existing compressors and produce results (tables, graphs)
- Try the data both in their original order and in `Z-order`
- Compressors to try:
  - [x] gzip (zlib, DEFLATE: LZ77 combined with Huffman coding)
  - [x] bzip2
  - [x] LZMA (7-zip)
  - [x] Add a *level* CLI option to specify the compression level for the above algorithms
  - [x] Add a CSV writer that writes test results to a file, and a test function that takes a folder of test files to parse.
- [x] Describe what the compression ratio is and how we calculate it.
- [x] Write about Z-Curve order, plot a comparison of Z-Curve order with normal order.
- [x] Plot or tabulate BPP (Bits Per Pixel) and compression throughput (MB/s).
- [x] Plot how different frames are compressed: x axis -> frame, y axis -> compression ratio
- [x] Look at *Image difference* (*Negative values can be mapped to odd/even numbers. But the difference must be saved in more than one byte.*)
    - Things we have tested so far: short - short = int, mapping int to ushort; the results are not better
    - [x] Try Z-Order on ushort mapped ints.
    - ~~Try raw byte difference, find if it can be mapped to 1 or 2 bytes~~
    - [x] [DeltaCompression](http://www.diva-portal.org/smash/get/diva2:817831/FULLTEXT01.pdf)
        - bsdiff, bspatch
            - Initial test shows quite poor results, with compression ratios below 1.0. Maybe we should check a different library implementation of bsdiff?
- [x] Histogram of difference values
- [x] Benchmark difference with `NegativeToEven` mapping.
- [x] Benchmark B3D CUDA library - it finally works.
- [ ] Look at patterns in images
- [ ] Try moving pixels around and evaluate by quadratic error (pow 2)
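
A minimal sketch of the compressor comparison from the list above, using Python's standard `zlib`, `bz2` and `lzma` modules as stand-ins for the gzip, bzip2 and LZMA tools. The function name and result keys are hypothetical. The ratio is assumed to be uncompressed size divided by compressed size, so values below 1.0 mean the output grew (the reading used in the bsdiff note); the thesis TODO later inverts this convention.

```python
import bz2
import lzma
import zlib


def benchmark(raw: bytes, pixel_count: int, level: int = 6):
    """Compress `raw` with three stdlib codecs and report size, ratio and BPP."""
    compressors = {
        "gzip (zlib/deflate)": lambda data: zlib.compress(data, level),
        "bzip2": lambda data: bz2.compress(data, level),
        "LZMA": lambda data: lzma.compress(data, preset=level),
    }
    results = {}
    for name, compress in compressors.items():
        size = len(compress(raw))
        results[name] = {
            "compressed_bytes": size,
            # Assumed convention: uncompressed / compressed.
            "ratio": len(raw) / size,
            # Bits per pixel after compression.
            "bpp": 8 * size / pixel_count,
        }
    return results
```

For a frame of 16-bit pixels, `pixel_count` would be `len(raw) // 2`. Running the same function on the row-major bytes and on the Z-ordered bytes gives the two columns to compare.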

## Phase 2 TODOs - Completed
- [x] Rename `consume_` functions in `InBinaryStream` to more proper names, like `consume_int8`
- [x] Implement `OutBinaryStreamBase`, `OutBinaryFileStream`
- [ ] Implement serialization for CZI parts and move to this form of parsing / writing - WIP. *Actual goals are as follows:*
  - [x] Finish implementation of DimensionEntryDV1
  - [x] Finish implementation of DirectoryEntryDV
  - [x] Finish implementation of CziSubblockSegment
  - [x] Finish implementation of CziSubblockDirectorySegment
  - [x] Proper strategy to set variables in `CziFile_`
- [ ] Static parse settings in `Serializable` class
- [ ] Callback functions to load pixel data
- [ ] Multi-file support, specification p. 9 (**the problem being that we don't have multi-file CZI files.**)

## CZI parser TODO list
This is the list of things that have to be done first:
- [x] Parse `SubBlockDirectory` (there will be collection of `DirectoryEntryDV`'s)
  - [x] Exact copy of `DirectoryEntryDV` will be located in the referenced `SubBlock`
  - [x] Parse dimensions entries
- [x] Parse IEEE 4 / 8 byte float.
  - Parsing is done via a `memcpy` call. The alternative is using a `union`, but double parsing wasn't working with it. Later we can take another look at this, maybe improve the conversion and get rid of the copy.
- [x] Parse `SubBlock`
    - [ ] ~~Parse image data to proper pixel type~~. Do we really want to do that? We really care only about **bytes**. Parsing image data into some `PixelType` matrix is useful only if we want to perform operations on the image, so this is moved to the *later* section.
- [x] Parse / extract image data from `SubBlock`
  - We know the position and size of the data in the CZI file. With these two pieces of information we can extract the data easily.
- [x] Parse image values into matrices. `Matrix<Gray8>` etc.
- **This is outside the scope of this project. The main goal is to try and compare different compressions on image data. Right now we are able to obtain image data and more, so it is enough for now.**
  - ~~Parse important information from XML metadata (e.g. BitsPerPixel, and compare it with the value from the parsed binary data)~~
  - ~~Support multi-file situations~~
    - ~~Obtain multi-file CZI files. Currently we don't have any, so we can't really parse them. Secondary files should have a different GUID than the master file, and their filepart should be different from 0. The files we have right now have (0) in their name, but their filepart is 0 and the GUID of the master file isn't set correctly.~~
    - ~~One master file and more *secondary files* (our current `CziFile` class kind of supports that situation, so keep going that way)~~
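
For the IEEE 4 / 8 byte float item earlier in this list, the following is a Python analogue of the `memcpy` approach, not the parser's actual code: it reinterprets raw bytes from a buffer as a float without any arithmetic conversion. The function names are hypothetical, and little-endian byte order is an assumption here.

```python
import struct


def parse_float32(buffer: bytes, offset: int = 0) -> float:
    """Reinterpret 4 bytes at `offset` as an IEEE 754 single (assumed little-endian)."""
    return struct.unpack_from("<f", buffer, offset)[0]


def parse_float64(buffer: bytes, offset: int = 0) -> float:
    """Reinterpret 8 bytes at `offset` as an IEEE 754 double (assumed little-endian)."""
    return struct.unpack_from("<d", buffer, offset)[0]
```

For example, the bytes `00 00 80 3f` decode to `1.0` as a 4 byte float. The `offset` parameter mirrors reading a field at a known position inside a larger segment buffer.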

## Compression of images
- I tested [FLIF](https://flif.info/) compression on a ~200 MB file. The compression ratio was good, but the speed was really bad. Decompression was slow too.
  - Even lossy compression was slow. This compression is probably only good for small images.

## Space filling (*Peano*) curves
- [Wikipedia](https://en.wikipedia.org/wiki/Space-filling_curve)
- Find out what this is about
- Look specifically at [Z-order curve](https://en.wikipedia.org/wiki/Z-order_curve) and [Hilbert curve](https://en.wikipedia.org/wiki/Hilbert_curve)
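
As a quick illustration of the Z-order curve linked above, here is a minimal sketch that interleaves coordinate bits into a Morton index and reorders a row-major pixel list along the curve. The function names are hypothetical. Every list element is treated as one whole pixel, so multi-byte values stay together.

```python
def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x (even positions) and y (odd positions)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index


def z_order_pixels(pixels, width, height):
    """Reorder a row-major pixel list so it follows the Z-order curve."""
    coords = [(x, y) for y in range(height) for x in range(width)]
    # Sorting by Morton index also handles sizes that are not powers of two.
    coords.sort(key=lambda c: morton_index(c[0], c[1]))
    return [pixels[y * width + x] for x, y in coords]
```

On a 4x4 image whose pixels are numbered 0 to 15 in row-major order, the result starts `0, 1, 4, 5, 2, 3, 6, 7`, which is the familiar "Z" pattern over 2x2 blocks. Sorting is simple but costs O(n log n); a production version would more likely compute target positions directly.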


## Image difference
- Find difference between included images.
- Plot those differences and find out whether certain images are the same, just positioned a little bit differently
- *Pattern matching* - Find if there are some patterns in image sets.
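
A minimal sketch of a frame difference with a sign-removing mapping, in the spirit of the `NegativeToEven` idea from the Phase 1 list. The exact mapping used in the project is not stated here, so the one below is an assumption: negative differences go to even codes, non-negative ones to odd codes. All function names are hypothetical.

```python
def negative_to_even(d: int) -> int:
    """Map a signed difference to an unsigned code (assumed mapping)."""
    # Negative values become even codes, non-negative values become odd codes.
    return -2 * d if d < 0 else 2 * d + 1


def even_to_negative(m: int) -> int:
    """Inverse of negative_to_even."""
    return -(m // 2) if m % 2 == 0 else (m - 1) // 2


def frame_difference(previous, current):
    """Per-pixel difference of two frames, mapped to unsigned codes."""
    return [negative_to_even(c - p) for p, c in zip(previous, current)]


def restore_frame(previous, mapped):
    """Rebuild the current frame from the previous frame and the mapped difference."""
    return [p + even_to_negative(m) for p, m in zip(previous, mapped)]
```

With 16-bit pixels the difference spans -65535 to 65535, so the mapped codes need 17 bits, which is why the difference does not fit back into the original pixel type. A histogram of the mapped codes shows directly how concentrated the differences are around zero.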