Huffman Coding Technique
Average code length: $\bar{L} = \sum_{i=1}^{n} w_i \ell_i$
Entropy: $H(A) = -\sum_{i=1}^{n} w_i \log_2 w_i$
where $w_i$ is the probability (or weight) of symbol $a_i$ and $\ell_i$ is the length of its codeword.
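As a worked illustration, take the five-symbol source used in the example later in this section, with probabilities 0.4, 0.2, 0.2, 0.1, 0.1:

$$H(A) = -\big(0.4\log_2 0.4 + 2(0.2\log_2 0.2) + 2(0.1\log_2 0.1)\big) \approx 2.122 \text{ bits/symbol}$$

A Huffman code for this source has codeword lengths 1, 2, 3, 4, 4, so

$$\bar{L} = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 2.2 \text{ bits/symbol},$$

only 0.078 bits/symbol above the entropy lower bound.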
Efficiency of Huffman Codes
Redundancy (r) is the difference between the average code length and the entropy of the source: $r = \bar{L} - H(A)$.
In this example, if we use fixed-length codes then we have to spend three bits per sample, which gives a code redundancy of 3 − 2.122 = 0.878 bits per sample.
For a Huffman code, the redundancy is zero when the probabilities are negative powers of two.
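For instance (a standard dyadic illustration, not taken from the example above): a source with probabilities 1/2, 1/4, 1/8, 1/8 gets codeword lengths 1, 2, 3, 3, so

$$\bar{L} = \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{8}(3) = 1.75 = H(A), \qquad r = 0.$$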
Minimum Variance Huffman Codes
When two or more symbols in a Huffman tree have the same probability, different merge orders produce different Huffman codes.
(Figure: two code trees built from the same symbol probabilities.)
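For example, for the five-symbol source with probabilities 0.4, 0.2, 0.2, 0.1, 0.1 used later in this section, two different merge orders yield codeword lengths $\{1, 2, 3, 4, 4\}$ or $\{2, 2, 2, 3, 3\}$. Both codes have the same average length, $\bar{L} = 2.2$ bits/symbol, but the length variances $\sum_i w_i(\ell_i - \bar{L})^2$ are 1.36 and 0.16 respectively. The minimum-variance code is preferable when transmitting at a fixed rate, since it requires a smaller buffer.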
Encoding of data:
Given the characters and their frequencies, run the algorithm to generate a code, then write out the characters using that code.
Decoding of data:
Given the Huffman tree, figure out what each character is (possible because of the prefix property).
ENCODING:
Algorithm:
1. Find the gray-level probabilities for the image by finding its histogram.
2. Order the input probabilities (histogram magnitudes) from smallest to largest.
3. Combine the smallest two by addition.
4. Go back to step 2 and repeat until only two probabilities are left.
5. Working backward along the tree, generate the code by assigning 0 and 1 to the two branches at each node (see the sketch after the next list).
Coding procedure for an N-symbol source:
1. Source reduction
a. List all probabilities in descending order.
b. Merge the two symbols with the smallest probabilities into a new compound symbol.
c. Repeat the two steps above N − 2 times.
2. Codeword assignment
a. Start from the final two-symbol reduced source and work back to the original source.
b. Each merging point corresponds to a node in the binary codeword tree.
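The following is a minimal Python sketch of this procedure, assuming a {symbol: count} histogram as input. It is not the MATLAB listing referenced later, just a self-contained illustration: the min-heap replaces the explicit re-sorting of step 2, and the names huffman_codes, weights, and hist are illustrative.

import heapq
from itertools import count

def huffman_codes(weights):
    """Build a Huffman code from a {symbol: weight} mapping."""
    tie = count()  # tie-breaker so entries with equal weight never compare the dicts
    # Each heap entry carries the partial code table for its subtree.
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Source reduction: pop the two smallest-weight entries.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Codeword assignment: prepend 0 to one branch, 1 to the other.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

# Gray-level histogram from the worked example below (10x10 image).
hist = {1: 20, 2: 40, 3: 20, 4: 10, 5: 10}
for level, code in sorted(huffman_codes(hist).items()):
    print(level, code)

Any Huffman code this produces for the histogram below has the same average length, 2.2 bits/pixel, although the individual codewords depend on how ties are broken.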
Example:
Consider an image with 3 bits/pixel, giving 8 possible gray levels. The image is 10 rows by 10 columns (100 pixels).
Step 1: Find the histogram for the image and convert it into probabilities by normalizing by the total number of pixels:
Gray level 1 has 20 pixels (probability 0.2)
Gray level 2 has 40 pixels (probability 0.4)
Gray level 3 has 20 pixels (probability 0.2)
Gray level 4 has 10 pixels (probability 0.1)
Gray level 5 has 10 pixels (probability 0.1)
(The remaining three gray levels do not occur.)
Step 2: The probabilities are ordered:
a5 = 0.1, a4 = 0.1, a3 = 0.2, a1 = 0.2, a2 = 0.4
Step 3: Combine the smallest two by addition.
Step 4: Repeat steps 2 and 3: reorder (if necessary) and add the two smallest probabilities, until only two values remain.
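The reduction stages for this source (a worked trace, where each arrow merges the two smallest probabilities and re-sorts):

$$(0.1, 0.1, 0.2, 0.2, 0.4) \rightarrow (0.2, 0.2, 0.2, 0.4) \rightarrow (0.2, 0.4, 0.4) \rightarrow (0.4, 0.6)$$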
Step 5: The actual code assignment is made. Start at the right-hand side of the tree (the final two probabilities) and assign 0s and 1s, working back toward the original symbols.
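The tree figure from the original notes is not reproduced; one possible assignment consistent with the trace above is:

a2 = 0, a1 = 11, a3 = 101, a4 = 1000, a5 = 1001

so that $\bar{L} = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 2.2$ bits/pixel, compared with 3 bits/pixel for the fixed-length code.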
The gray level represented by a single bit, a2, is the most likely to occur (40% of the time) and thus carries the least information in the information-theoretic sense.
DECODING:
The process of decompression is simply a matter of translating the stream of prefix codes back into individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream; reaching a leaf node terminates the search for that particular byte value. For this to work, however, the decoder must reconstruct the Huffman tree, so the information needed to rebuild it (for example, the symbol frequencies or the code table) must be sent a priori.
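A minimal decoding sketch under the same assumptions as the encoder above, rebuilding the tree as nested dicts from a transmitted code table (the table below is the illustrative assignment from the example above, and the representation is one choice among several):

def build_tree(codes):
    """Rebuild a Huffman tree (nested dicts) from a {symbol: codeword} table."""
    root = {}
    for symbol, codeword in codes.items():
        node = root
        for bit in codeword[:-1]:
            node = node.setdefault(bit, {})
        node[codeword[-1]] = symbol  # leaf stores the symbol itself
    return root

def decode(bits, codes):
    """Walk the tree bit by bit; a leaf emits a symbol and resets to the root."""
    root = build_tree(codes)
    node, out = root, []
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):  # leaf reached: codeword complete
            out.append(node)
            node = root
    return out

codes = {2: "0", 1: "11", 3: "101", 4: "1000", 5: "1001"}
print(decode("0" "11" "101" "0", codes))  # -> [2, 1, 3, 2]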
MATLAB CODE:
[The MATLAB listing and its output screenshot are not reproduced here.]
Advantages:
The algorithm is easy to implement.
Produces lossless compression of images.
Reduces the size of data by 20%–90% in general.
Huffman codes express the most common source symbols using shorter strings of bits than are used for less common source symbols.
The running time of Huffman's method is efficient: O(n log n) for n symbols when a priority queue is used.
Huffman coding is optimal among symbol-by-symbol codes: no other mapping of individual source symbols to unique strings of bits produces a smaller average output size when the actual symbol frequencies agree with those used to create the code.
It is generally beneficial to minimize the variance of codeword lengths (see Minimum Variance Huffman Codes above).
Disadvantages:
Although Huffman's original algorithm is optimal for symbol-by-symbol coding (i.e., a stream of unrelated symbols) with a known input probability distribution, it is not optimal when the symbol-by-symbol restriction is dropped, or when the probability mass functions are unknown, not identically distributed, or not independent (e.g., in English text "cat" is more common than "cta"). Other methods, such as arithmetic coding and LZW coding, often have better compression capability.
If no characters occur more frequently than others, it offers no advantage over a fixed-length code such as ASCII.
Applications
Huffman coding is a technique used to compress files for transmission.
It uses statistical coding: more frequently used symbols have shorter code words.
It works well for text and fax transmissions.
It is an application that exercises several data structures (binary trees, priority queues).
Both the .mp3 and .jpg file formats use Huffman coding at one stage of their compression.
An alternative method that achieves higher compression but is slower (arithmetic coding) was long encumbered by patents, notably IBM's, which made Huffman codes attractive.