Assignment No: 02
Title: Huffman Algorithm
Title of Assignment:
Implementation of Huffman's algorithm using the greedy strategy.
Objective:
To understand the concept of data compression (encoding and decoding)
To use a tree data structure for encoding and decoding data
To understand the use of a heap to achieve optimal time complexity
Problem statement:
Accept a set of characters along with their frequencies, and from these generate the
prefix codes used for encoding and decoding strings consisting of those characters.
Theory:
Greedy Method
The greedy method is the most straightforward design technique.
It can be applied to a wide variety of problems.
Most of these problems have n inputs and require us to obtain a subset that
satisfies some constraints.
Any subset that satisfies these constraints is called a feasible solution.
We need a feasible solution that either maximizes or minimizes a given objective
function. A feasible solution that does this is called an optimal solution.
The greedy method proposes an algorithm that works in stages, considering one
input at a time.
At each stage a decision is made regarding whether a particular input belongs to
the optimal solution or not.
During the process, inputs are considered in an order determined by some
selection procedure.
If the inclusion of the next input into the partially constructed optimal solution
results in an infeasible solution, we discard this input; otherwise it is added to the
partially constructed solution (see the sketch after this list).
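The stage-wise process described above can be written as a small control abstraction. The following C sketch is only illustrative: the function-pointer parameters select and feasible are hypothetical stand-ins for the problem-specific selection procedure and constraint check, not part of the assignment itself.

typedef int (*select_fn)(const int inputs[], int n, int stage);
typedef int (*feasible_fn)(const int solution[], int k, int candidate);

// Generic greedy control loop: consider the inputs one stage at a time,
// keep an input only if the partial solution stays feasible.
int greedy(const int inputs[], int n, int solution[],
           select_fn select, feasible_fn feasible)
{
    int k = 0;                               // size of the partial solution
    for (int stage = 0; stage < n; stage++) {
        int x = select(inputs, n, stage);    // consider the next input
        if (feasible(solution, k, x))        // keep it only if still feasible
            solution[k++] = x;
        // otherwise the input is discarded
    }
    return k;                                // number of inputs kept
}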
Huffman coding is a statistical technique which attempts to reduce the number of bits
required to represent a string of symbols. To achieve this, the codes assigned to the
symbols are allowed to be of varying lengths: shorter codes are assigned to the most
frequently used symbols, and longer codes to the symbols which appear less frequently
in the string (that is where the statistical part comes in).
Building a Huffman Tree
The Huffman code for an alphabet (set of symbols) may be generated by constructing a
binary tree with nodes containing the symbols to be encoded and their probabilities of
occurrence. The data structures and construction steps used are described below.
Data structures used
struct node
{
    char alphabet;                  // the symbol itself
    float frequency;                // frequency (relative frequency) of occurrence
    char pref_code[20];             // prefix code assigned to the symbol
    struct node *lchild, *rchild;   // left and right children in the Huffman tree
};
typedef struct node tnode; // tree node structure
struct term
{
    tnode *data;                    // pointer to a tree node
    struct term *link;              // next node in the queue
};
typedef struct term qnode; // node structure for queue
qnode *front, *rear; // front and rear are pointers to qnode
Main algorithm steps
1. accept n, the number of alphabets
2. read the first alphabet and its frequency
3. create a tree node t for it
4. Heap_list[1] = t
5. heap-size = 1
6. for i = 2 to n
   I.   read the i-th alphabet and its frequency
   II.  create a tree node t for it
   III. Heap_list[heap-size + 1] = t
   IV.  adjust the heap so that the min-heap property is restored (a sketch of the
        heap operations follows step 11)
   V.   increment heap-size
7. root = create_tree()
8. compute_prefix_codes(root)
9. encode()
10. decode()
11. end
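The steps above rely on a min-heap of tnode pointers keyed on frequency. A possible sketch of these heap operations is given below; the array heap_list (Heap_list in the steps), the counter heap_size, MAX_NODES and the helper names insert_heap and delete_min are illustrative assumptions, not a prescribed implementation.

#define MAX_NODES 100            // illustrative capacity

tnode *heap_list[MAX_NODES + 1]; // 1-based min-heap of tree nodes
int    heap_size = 0;            // current number of nodes in the heap

// Insert a node and sift it up until the min-heap property holds.
void insert_heap(tnode *t)
{
    int i = ++heap_size;
    while (i > 1 && heap_list[i / 2]->frequency > t->frequency) {
        heap_list[i] = heap_list[i / 2];     // move the larger parent down
        i /= 2;
    }
    heap_list[i] = t;
}

// Remove and return the node with minimum frequency, then sift the
// last element down to restore the min-heap property ("adjust").
tnode *delete_min(void)
{
    tnode *min  = heap_list[1];
    tnode *last = heap_list[heap_size--];
    int i = 1;
    while (2 * i <= heap_size) {
        int child = 2 * i;
        if (child < heap_size &&
            heap_list[child + 1]->frequency < heap_list[child]->frequency)
            child++;                         // pick the smaller child
        if (last->frequency <= heap_list[child]->frequency)
            break;
        heap_list[i] = heap_list[child];     // move the smaller child up
        i = child;
    }
    heap_list[i] = last;
    return min;
}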
Steps for creating the tree (heap_size should be at least 2)
1. t1 = delete-min from the heap: the node having the minimum frequency is removed
   and heap_size is decremented by 1
2. adjust the heap after the deletion to maintain the min-heap property
3. t2 = delete-min from the heap: the node having the minimum frequency is removed
   and heap_size is decremented by 1
4. adjust the heap after the deletion to maintain the min-heap property
5. create a new tree node, temp
6. add the frequencies of the two deleted nodes and assign the sum as the frequency
   of the newly created node
7. temp->lchild = t1 and temp->rchild = t2
8. insert temp back into the heap and increment heap_size
9. repeat steps 1-8 until only one node remains in the heap; that node is the root of
   the Huffman tree (a sketch of create_tree() follows these steps)
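Putting these steps together, create_tree() might look as follows. This is only a sketch: it assumes the heap helpers sketched earlier and the tnode structure given above, and it re-inserts each combined node until only the root remains.

#include <stdlib.h>

// Repeatedly combine the two minimum-frequency nodes (steps 1-8 above)
// until a single node, the root of the Huffman tree, remains.
tnode *create_tree(void)
{
    while (heap_size > 1) {
        tnode *t1 = delete_min();                         // steps 1-2
        tnode *t2 = delete_min();                         // steps 3-4

        tnode *temp = (tnode *)malloc(sizeof(tnode));     // step 5
        temp->alphabet  = '\0';                           // internal node, no symbol
        temp->frequency = t1->frequency + t2->frequency;  // step 6
        temp->lchild    = t1;                             // step 7
        temp->rchild    = t2;

        insert_heap(temp);                                // step 8
    }
    return delete_min();                                  // remaining node is the root
}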
An alternative is to keep the nodes in a sorted list instead of a heap. The time
complexity for deletion of the minimum is then O(1), but the n insertions are of the
order O(n*n).
Once the Huffman tree is built, the code for each symbol may be obtained by tracing
the path to the symbol from the root of the tree. A 1 is assigned to a branch in one
direction and a 0 to a branch in the other direction. For example, a symbol which is
reached by branching right once and then left twice may be represented by the pattern
'100'. The figure below depicts the codes for the leaves of a sample tree (a sketch of
this traversal is given after the figure).
[Figure: a sample Huffman tree whose five leaves carry the codes 00, 01, 100, 101 and 11]
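One way to realize compute_prefix_codes() is a recursive traversal that appends a 0 for every left branch and a 1 for every right branch and stores the accumulated string in each leaf's pref_code field. The sketch below assumes this signature and that left branches are labelled 0 and right branches 1.

#include <stdio.h>
#include <string.h>

// Walk the tree from the root, appending '0' for a left branch and
// '1' for a right branch; store the accumulated string in each leaf.
static void assign_codes(tnode *t, char code[], int depth)
{
    if (t == NULL)
        return;
    if (t->lchild == NULL && t->rchild == NULL) {  // leaf: one symbol
        code[depth] = '\0';
        strcpy(t->pref_code, code);
        printf("%c : %s\n", t->alphabet, t->pref_code);
        return;
    }
    code[depth] = '0';                             // left branch  -> 0
    assign_codes(t->lchild, code, depth + 1);
    code[depth] = '1';                             // right branch -> 1
    assign_codes(t->rchild, code, depth + 1);
}

void compute_prefix_codes(tnode *root)
{
    char code[20];                                 // matches pref_code[20]
    assign_codes(root, code, 0);
}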
Huffman Encoding
This problem is that of finding the minimum length bit string which can be used to
encode a string of symbols. One application is text compression:
What's the smallest number of bits (hence the minimum size of file) we
can use to store an arbitrary piece of text?
Huffman's scheme uses a table of frequency of occurrence for each symbol (or
character) in the input. This table may be derived from the input itself or from data which
is representative of the input. For instance, the frequency of occurrence of letters in
normal English might be derived from processing a large number of text documents and
then used for encoding all text documents. We then need to assign a variable-length bit
string to each character that unambiguously represents that character. This means that
no character's code may be a prefix of any other character's code. If the characters to
be encoded are arranged as the leaves of a binary tree, this prefix property holds
automatically, because the path from the root to one leaf never continues on to another
leaf.
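Once each symbol carries its prefix code, encode() reduces to looking up the code of every input character and concatenating the results. A minimal sketch, assuming the tree and pref_code fields built above; the helper lookup() and the caller-supplied output buffer are illustrative (a table built during compute_prefix_codes() would make the lookup O(1) per character).

#include <string.h>

// Find the prefix code of symbol c by searching the tree's leaves.
static const char *lookup(tnode *t, char c)
{
    if (t == NULL)
        return NULL;
    if (t->lchild == NULL && t->rchild == NULL)    // leaf
        return (t->alphabet == c) ? t->pref_code : NULL;
    const char *code = lookup(t->lchild, c);
    return (code != NULL) ? code : lookup(t->rchild, c);
}

// Write the concatenated bit string for `text` into `out`.
void encode(tnode *root, const char *text, char *out)
{
    out[0] = '\0';
    for (size_t i = 0; text[i] != '\0'; i++) {
        const char *code = lookup(root, text[i]);
        if (code != NULL)
            strcat(out, code);                     // append this symbol's code
    }
}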
The time complexity of the Huffman algorithm is O(n log n). Using a heap to store the
weight of each tree, each iteration requires O(log n) time to determine the cheapest
weight and to insert the new weight. There are O(n) iterations, one for each item.
Decoding Huffman-encoded Data
"How do we decode a Huffman-encoded bit string? With these variable
length strings, it's not possible to break up an encoded string of bits into
characters!"
The decoding procedure is simple. Start with the root node. Beginning with the first bit
in the stream, we use successive bits to determine whether to go left or right in the
decoding tree: we traverse to the left child if the bit is 0 and to the right child if it is 1.
When we reach a leaf of the tree, we have decoded a character, so we place that
character onto the decoded (uncompressed) output stream. The next bit in the input
stream is the first bit of the prefix code of the next character; we reset the tree pointer
to the root node and decode the next prefix code in the same fashion. The process
terminates at the end of the encoded string.
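This procedure translates almost directly into code. A minimal sketch, assuming the encoded input is available as a string of '0' and '1' characters and that the tree has at least two leaves:

#include <stdio.h>

// Walk the tree bit by bit; emit a symbol and restart at the root
// whenever a leaf is reached.
void decode(tnode *root, const char *bits)
{
    tnode *t = root;
    for (size_t i = 0; bits[i] != '\0'; i++) {
        t = (bits[i] == '0') ? t->lchild : t->rchild;  // 0 = left, 1 = right
        if (t->lchild == NULL && t->rchild == NULL) {  // reached a leaf
            putchar(t->alphabet);                      // decoded one character
            t = root;                                  // restart at the root
        }
    }
    putchar('\n');
}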
Applications:
Huffman coding today is often used as a "back-end" to some other compression
method. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3
have a front-end model and quantization followed by Huffman coding.
Testing
Input:
Accept no. of alphabets: 5
Alphabet    Frequency
a           0.5
b           0.3
c           0.1
d           0.4
e           0.6
Output:
Prefix codes are as follows