The Polyomino Compressed Format algorithm

In this section we describe the PCIF compression algorithm. This technique can be applied to true color images; this means that every pixel of the image is represented by 3 bytes of data, one for each of the primary colors red, green and blue. Since a byte represents an integer from 0 to 255, we can see a raster image as the combination of three matrices containing non-negative integer values lower than 256.
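As a rough illustration of this "three matrices" view, the following sketch separates the channels of a java.awt.image.BufferedImage; this is only an assumed input representation, since the PCIF code itself may read pixels differently.

    import java.awt.image.BufferedImage;

    /** Sketch: a true color image as three matrices of values in 0..255. */
    public class Channels {

        /** Returns { red, green, blue }, each a height x width matrix. */
        static int[][][] split(BufferedImage img) {
            int w = img.getWidth(), h = img.getHeight();
            int[][][] rgb = new int[3][h][w];
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    int p = img.getRGB(x, y);          // packed 0xAARRGGBB
                    rgb[0][y][x] = (p >> 16) & 0xFF;   // red
                    rgb[1][y][x] = (p >> 8) & 0xFF;    // green
                    rgb[2][y][x] = p & 0xFF;           // blue
                }
            }
            return rgb;
        }
    }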

The Polyomino Compressed Format algorithm operates in three steps: filtering, decomposition and actual compression.

Figure: PCIF compression scheme

Filtering

The first phase is filtering. In this phase the size of the data is not reduced, but the values of the image samples undergo a reversible transform. The aim of this transformation is to obtain many values equal or close to zero, which yields an easily compressible set of data for any compression algorithm that follows. The step is divided into three substeps:

  • Standard filtering: First of all, an ordering on the points of the image is defined. Then, starting from the last point and proceeding to the first, the algorithm makes an estimate of the value of each point based on the previous ones. This guess is based on the assumption that compressible images are usually 'smooth' and have a limited number of sharp edges. The sample is replaced with its difference from the estimated value, modulo 256. Thanks to the defined ordering, once the estimation function is known this operation is completely reversible (a small code sketch of this idea follows the list). This phase is described in detail in the standard filtering section.
  • Color filtering: The second filtering step consists in computing and subtracting from every sample of the image another estimate, but this time, instead of relying on smoothness assumptions, the algorithm exploits color correlation. In other words, we expect that where a zone of the image now contains particularly irregular variations (represented by the current values, since the first filtering phase has already been executed), these changes are reflected in more than one color at a time. This phase is one of the most innovative in comparison with previous image compression algorithms; it produces effects similar to what we could obtain through a locally-optimized color transform. Read more about the color filtering in the dedicated section.
  • Remapping: To provide a convenient set of values to the next phase of the algorithm, every possible value is mapped onto another. What we want to obtain is a high frequency of low values and an increasingly lower frequency for higher values. This step is described further in the remapping section.
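The sketch below illustrates the filtering and remapping ideas under simplifying assumptions: a plain left-neighbor predictor and a standard "folding" remap. The actual PCIF predictors and mapping are defined in the dedicated sections and differ from this simplified version.

    /** Illustrative sketch of filtering (prediction + mod-256 difference) and remapping. */
    public class FilterSketch {

        /** Replace each sample with (sample - prediction) mod 256. */
        static int[][] filter(int[][] channel) {
            int h = channel.length, w = channel[0].length;
            int[][] out = new int[h][w];
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    int prediction = (x > 0) ? channel[y][x - 1] : 0; // hypothetical predictor
                    out[y][x] = ((channel[y][x] - prediction) % 256 + 256) % 256;
                }
            }
            return out;
        }

        /** Fold residuals 0, 255, 1, 254, ... onto 0, 1, 2, 3, ... so that low values dominate. */
        static int remap(int v) {
            return (v < 128) ? 2 * v : 2 * (256 - v) - 1;
        }
    }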

Decomposition in layers

The second phase consists in decomposing each of the three matrices representing the colors of the image into 8 layers. Each of these layers contains binary values, and each layer represents a different bit of the binary representation of the original matrix; for example, there will be a layer representing the least significant bit of the values of the red component of the image.
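A minimal sketch of this decomposition, assuming the channel is given as a matrix of values in 0..255, is the following bit-plane split (bit 0 is the least significant bit):

    /** Sketch: split a byte-valued color matrix into 8 binary layers, one per bit. */
    public class Decomposer {

        static boolean[][][] decompose(int[][] channel) {
            int h = channel.length, w = channel[0].length;
            boolean[][][] layers = new boolean[8][h][w];
            for (int bit = 0; bit < 8; bit++) {
                for (int y = 0; y < h; y++) {
                    for (int x = 0; x < w; x++) {
                        layers[bit][y][x] = ((channel[y][x] >> bit) & 1) == 1;
                    }
                }
            }
            return layers;
        }
    }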

Figure: Decomposition of the red component

An example of decomposition is shown in the figure, where the red component matrix containing values from 0 to 255 (in the figure: shades of grey) is decomposed into eight matrices containing the values 0 or 1 (in the figure: black and white). The starting image is the red component of the famous Lenna image after filtering. The top layers represent the least significant bits, and as we expect after the remapping phase they are more chaotic than the others.

This operation is useful for various reasons. First of all, the partition of the data into more or less chaotic zones makes it possible to apply different compression techniques to different types of data, choosing an optimal algorithm for each layer. Furthermore, adjacent high values of the original matrix will result in groups of adjacent 'one' values in the layers associated with the more significant bits. This will be useful when we group these points into polyominoes in the next step. More information about this phase can be found in the decomposition phase section.

Compression

The third phase of the algorithm actually compresses the data processed up to this point. Since the correlation between the various layers has been minimized in the previous step, they are now processed independently. This is the most time-expensive phase of the algorithm, but thanks to this independence it can be easily parallelized, drastically reducing the execution time.
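One way this independence could be exploited is sketched below, handing each layer to a hypothetical per-layer compressor through a parallel stream; the reference implementation may organize its threads differently.

    import java.util.List;
    import java.util.stream.Collectors;

    /** Sketch: independent layers can be compressed concurrently. */
    public class ParallelLayers {

        /** Placeholder for the per-layer coder (polyominoes, RLE+Huffman or raw storage). */
        static byte[] compressLayer(boolean[][] layer) {
            return new byte[0]; // hypothetical stub
        }

        static List<byte[]> compressAll(List<boolean[][]> layers) {
            return layers.parallelStream()
                         .map(ParallelLayers::compressLayer)
                         .collect(Collectors.toList());
        }
    }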

Depending on how chaotic a layer is, it will be processed by one of the following methods. More information about the compression phase and the choice of the compression algorithm is in the compression section.

  • Compression through polyominoes: This is another innovative part of the algorithm. The most ordered layers are selected for this technique. The values equal to one are grouped into a series of hv-convex polyominoes, for which a convenient coding has been defined. The coding of the layer proceeds by coding pairs of polyominoes and integers, called distances, that represent the position of the polyominoes relative to one another. More information about the compression through polyominoes can be found in the dedicated section.
  • Compression through RLE and Huffman coding: To compress the more chaotic layers, Huffman coding is used in combination with Run Length Encoding. Briefly, the layer is scanned in a zig-zag order to exploit correlation between points in both dimensions. The encountered values are RLE coded, and the resulting values are paired and coded through Huffman trees (a sketch of the zig-zag scan and run-length step follows this list). With this method we obtain results similar to high-modelling codings using a combination of basic techniques, while maintaining good time performance. Details about compression using RLE and Huffman coding can be found in the dedicated section.
  • Uncompressed storage: If the algorithm estimates that a layer is very chaotic, it is stored without any compression. This is useful first of all if the layer contains incompressible data, since in these cases an attempt at compression would increase the data size. Furthermore, since good performance is one of the aims of the algorithm, this speeds up the compression and decompression of images containing layers that, even if compressed, would shrink by only a few bytes. Isolating incompressible data is another good result obtained thanks to the decomposition phase.
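The following sketch shows only the run-length front end for the chaotic layers, assuming a boustrophedon ("zig-zag") row scan in which odd rows are traversed right-to-left so that consecutive scan positions stay adjacent in the image; the pairing of the runs and their Huffman coding are omitted here.

    import java.util.ArrayList;
    import java.util.List;

    /** Sketch: zig-zag scan of a binary layer followed by run-length encoding. */
    public class RleSketch {

        /** Returns the run lengths of the scan, alternating runs of 0s and 1s (first run may be empty). */
        static List<Integer> runLengths(boolean[][] layer) {
            List<Integer> runs = new ArrayList<>();
            boolean current = false;
            int length = 0;
            for (int y = 0; y < layer.length; y++) {
                int w = layer[y].length;
                for (int i = 0; i < w; i++) {
                    int x = (y % 2 == 0) ? i : w - 1 - i; // reverse direction on odd rows
                    if (layer[y][x] == current) {
                        length++;
                    } else {
                        runs.add(length);
                        current = !current;
                        length = 1;
                    }
                }
            }
            runs.add(length);
            return runs;
        }
    }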

The evolution of the PCIF algorithm is now available! It has a greater compression ratio, it is much faster, and the implementation is available both as Java bytecode and as native executables. Take a look at the new BCIF algorithm.