The PCIF algorithm: compression through RLE and Huffman codes

Moderately chaotic layers are compressed with a technique that combines Run Length Encoding (RLE) and Huffman coding. This combination is widely used to obtain high compression from Huffman codes even when a single symbol (0, in our case) occurs far more often than the others. In this way we obtain a compressed size very close to the entropy of the layer without resorting to computationally expensive techniques such as arithmetic coding.

This method can be expressed shortly by the following steps:

  1. The layer is scanned from left to right and from top to bottom, proceeding in the zig-zag pattern described in the figure (an example on an 8x5 matrix). The number of consecutive zeros and ones encountered is counted (RLE).
  2. Every pair formed by a run of zeros and the following run of ones is encoded with a Huffman tree.
Figure: zig-zag visit order for RLE and Huffman compressed layers.
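Step 1 can be sketched as follows. The exact two-row zig-zag visit is an assumption based on the figure, and the class and method names are illustrative, not taken from the PCIF sources:

```java
import java.util.ArrayList;
import java.util.List;

public class ZigZagRle {

    /**
     * Visit a binary layer in zig-zag order (two rows at a time, alternating
     * the vertical direction at each column) and return the RLE run lengths.
     * By convention the first run counts zeros, so it may be empty (0).
     */
    static List<Integer> runLengths(int[][] layer) {
        int rows = layer.length, cols = layer[0].length;
        List<Integer> runs = new ArrayList<>();
        int current = 0;   // RLE starts by counting zeros
        int count = 0;
        for (int r = 0; r < rows; r += 2) {
            for (int c = 0; c < cols; c++) {
                // within a row pair, alternate upper/lower visit per column
                int first  = (c % 2 == 0) ? r     : r + 1;
                int second = (c % 2 == 0) ? r + 1 : r;
                for (int rr : new int[]{first, second}) {
                    if (rr >= rows) continue;   // odd number of rows
                    int bit = layer[rr][c];
                    if (bit == current) {
                        count++;
                    } else {
                        runs.add(count);        // close the current run
                        current = bit;
                        count = 1;
                    }
                }
            }
        }
        runs.add(count);                        // close the final run
        return runs;
    }
}
```

The run lengths alternate between zeros and ones by construction, so the pairing of step 2 can simply take them two at a time.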

The Huffman tree must be encoded alongside the compressed bitstream. Pairing the sequences is useful to capture information about the density of ones in a given zone: if we have just found a long run of zeros, it is reasonable to expect a short run of ones to follow. By grouping runs of zeros and ones together, the coding exploits this information and obtains a higher compression ratio.
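The pairing and the Huffman coding of the resulting pair symbols can be sketched as below. Representing a pair as a string like "3,1" and the class name are illustrative assumptions; PCIF's actual symbol alphabet may differ:

```java
import java.util.*;

public class PairHuffman {

    // Huffman tree node: leaves carry a symbol, internal nodes carry children.
    static class Node {
        final int weight; final String symbol; final Node left, right;
        Node(int w, String s) { weight = w; symbol = s; left = null; right = null; }
        Node(Node l, Node r)  { weight = l.weight + r.weight; symbol = null; left = l; right = r; }
    }

    /** Pair each zero run with the one run that follows it: (runs[0], runs[1]), ... */
    static Map<String, Integer> pairFrequencies(List<Integer> runs) {
        Map<String, Integer> freq = new HashMap<>();
        for (int i = 0; i < runs.size(); i += 2) {
            int zeros = runs.get(i);
            int ones = (i + 1 < runs.size()) ? runs.get(i + 1) : 0;
            freq.merge(zeros + "," + ones, 1, Integer::sum);
        }
        return freq;
    }

    /** Standard Huffman construction: returns a prefix-free code per pair symbol. */
    static Map<String, String> codes(Map<String, Integer> freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>(Comparator.comparingInt(n -> n.weight));
        for (Map.Entry<String, Integer> e : freq.entrySet())
            pq.add(new Node(e.getValue(), e.getKey()));
        Map<String, String> codes = new HashMap<>();
        if (pq.size() == 1) {               // degenerate case: one symbol, 1-bit code
            codes.put(pq.poll().symbol, "0");
            return codes;
        }
        while (pq.size() > 1)               // repeatedly merge the two lightest nodes
            pq.add(new Node(pq.poll(), pq.poll()));
        assign(pq.poll(), "", codes);
        return codes;
    }

    static void assign(Node n, String prefix, Map<String, String> codes) {
        if (n.symbol != null) { codes.put(n.symbol, prefix); return; }
        assign(n.left, prefix + "0", codes);
        assign(n.right, prefix + "1", codes);
    }
}
```

Frequent pairs receive short codes, which is how long stretches of zeros end up costing only a few bits each.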

The zig-zag order is used to capture correlation in both dimensions without losing the computational advantage given by the principle of locality, as only two rows of the layer are considered at a time. A more effective way to capture two-dimensional correlation, even if this would reduce the computational efficiency, would be a fractal-style visit order (for example, a Hilbert curve).
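As an illustration of such a fractal-style visit (not part of PCIF itself), the classic index-to-coordinate conversion for a Hilbert curve is sketched below; it requires the grid side to be a power of two:

```java
public class Hilbert {

    /**
     * Convert index d along a Hilbert curve covering an n x n grid
     * (n a power of two) into (x, y) coordinates.
     * Classic iterative algorithm: at each scale, extract the quadrant
     * from two bits of d and rotate/flip the coordinates accordingly.
     */
    static int[] d2xy(int n, int d) {
        int x = 0, y = 0, t = d;
        for (int s = 1; s < n; s *= 2) {
            int rx = 1 & (t / 2);
            int ry = 1 & (t ^ rx);
            if (ry == 0) {                  // rotate the quadrant built so far
                if (rx == 1) { x = s - 1 - x; y = s - 1 - y; }
                int tmp = x; x = y; y = tmp;
            }
            x += s * rx;
            y += s * ry;
            t /= 4;
        }
        return new int[]{x, y};
    }
}
```

Consecutive indices always map to grid neighbours, which is exactly the locality property a bidimensional visit order is after.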

The evolution of the PCIF algorithm is now available! It achieves a higher compression ratio, runs much faster, and is available both as Java bytecode and as native executables. Take a look at the new BCIF algorithm.