Taco de Wolff

Data Scientist

JPEG scan data

Published on February 20, 2012

This articles provides information on how the JPEG scan data entropy encoded segment is constructed at binary level. It is accompanied with this article: JPEG file format. The conversion from a raw bitmap to JPEG is implemented in this tool: JPEG

I assert you have loaded a raw bitmap into memory with RGB components. The following steps in that sequence will need to occur to acquire the scan data.

Blocks

Divide the image into blocks of 8×8 (row-major). These will be processed individually throughout and can be parallelized if you wish. If your image is not a multiple of 8 in width or height, you can pad the pixels with black to make it a multiple of 8. This will give artifacts at the edges though. It is better to stretch the last pixel to fill till the multiple-of-8 boundary.

RGB to YCbCr

Convert the RGB values to YCbCr values. Since the human eye is more sensitive to luminance than to chrominance, we can use a different quantization table for chrominance, which discarts more information (resulting in smaller files). The relation between RGB and YCbCr is as follows (assuming RGB values of 0-255):

Y  = 0.299 * R + 0.587 * G + 0.114 * B;
Cb = 0.492 * (B - Y);
Cr = 0.877 * (R - Y);
Y -= 128.0;

Fourier transform

A Fourier transform is performed over each color components separately (Y, Cb and Cr). In particular the DCT variant is used. After the DCT we need to normalize our values. When using FFTW the values need to be divided by 16. Additionally the first row and first column need to be divided by √2 (thus the DC value, the top-left element, need to be divided by 2).

The reason for using a Fourier transform is by converting the image into the frequency domain. Top-left elements will contains slowly evolving characteristics of the image, while the bottom-right of the block will contain the high frequency components. These high frequency components will be discarted the most, leaving all edges somewhat blurry. At a slight distance this is hardly noticeable.

Quantization

Divide each element of the block by the corresponding element in the quantization table. The quantization table itself should be multiplied by a certain alpha value and it is this multiplied table that should be written to the JPEG file. The alpha value is calculated as such:

double alpha(double quality)
{
	return (quality <= 50.0
			 ? 50.0 / quality
			   : 2.0 - quality / 50.0);
}

Use a rounding function to round this to a signed integer. I use the truncate function:

double truncate(double i)
{
	double r = round(i);
	if (r < 0.0 && r == i - 0.5)
	{
		return r + 1.0;
	}
	return r;
}

Huffman encoding

Convert the 8×8 block of quantized DCT values into a 1D array by using the serpentine / zigzag order (see bottom of the page). This will cause most of the zero elements to be put at the end of the array, making compression more efficient with run-length encoding. After reordering, the first element will be the DC value, which will be treated special, and any subsequent value is AC.

The DC value is encoded as follows:

Subtract the previous DC value from the current
Find k and t so that dc = (2^k-1 + t) for positive DC and dc = -(2^k - t - 1) for negative DC
Set s = 0 for negative dc and s = 1 for positive
Huffman encode k, the maximal value for k is 11 (see bottom of the page)
Write the DC Huffman encoded k, s and t to the file. k has variable number of bits (Huffman encoded), s has one bit, t is written with k - 1 bits

If the DC value is zero instead, just write the DC Huffman encoded 0x00 to the file.

Non-zero AC values are encoded as follows. Zero AC values are skipped.

Let d denote the run of zeros before the current non-zero AC value.
Write d as d = 16p + r, where r = 0, 1, 2, …, 15
Find k and t so that ac = (2^k-1 + t) for positive AC and ac = -(2^k - t - 1) for negative AC
p is written as p times 0xF0
r and k are both written as 4 bits each and appended to eachother
Notice that 0xF0 is the same as r = 15 and k = 0. Which means a run of 15 zeros and another zero -> 16 zeros!
AC Huffman encode each byte, there are p + 1 bytes, and write those to the file
Write s (1 bit) and t (k - 1 bits) to the file as well

If the last values of the AC are zero and there is no non-zero AC to specify the number of zero elements, simply write the AC Huffman encoded 0x00 to denote the end of the block. If the last AC element is non-zero you do not end the block with an AC Huffman encoded 0x00. This is also the case if all AC elements are zero. So if both the DC and all AC values are zero, write the DC Huffman encoded 0x00 followed by the AC Huffman encoded 0x00.

Serpentine order

This is a way of converting a 2D array into a 1D array, by going over each diagonal in zigzag way. Example:

 0,  1,  2,  3,
 4,  5,  6,  7,
 8,  9, 10, 11,
12, 13, 14, 15

becomes

0,
1, 4,
8, 5, 2,
3, 6, 9, 12,
13, 10, 7,
11, 14,
15

Huffman encoding

The default Huffman tables are given in JPEG file format which gives two arrays for each table: bits and values. The bits array denotes how many values have a certain bit length. The first entry in bits tells how many values use 1 bit, second entry tells how many values use 2 bits, etc. That means that the cumulated number of each bits entry must match with the length of the value array.

Using a Huffman tree we can determine the bit sequence associated with a certain value. This allows us to map any given input value to its corresponding Huffman encoded bit sequence. The Huffman tree works as follows. We start with a root node, each node has two childs. The left child stands for binary 0 and the right child stands for binary 1. For each child node we can again add two child nodes for binary 0 and 1. When traversing down the tree, for each edge you would add either a 0 or a 1 to your bit sequence.

For the DC coefficients the first value is represented by 2 bits. That means that at depth 2 the first child (when looking from the left) is associated with the first value in the value array (bit sequence 00). It has become a leaf. The other three nodes at that depth do get child nodes in the next iteration and according to the DC table five values are associated at this depth (010, 011, 100, 101, 110). This goes on until all values have associated Huffman encoded bit sequences.

More information

http://www.seas.gwu.edu/~ayoussef/cs225/standards.html#DCH

http://www.impulseadventure.com/photo/jpeg-huffman-coding.html

http://www.w3.org/Graphics/JPEG/itu-t81.pdf