Assignment No: 02 Title: Huffman Algorithm

MAEERS PUNE.

Department of Information Technology

Title of Assignment:
Implementation of Huffman's algorithm using the greedy strategy.
Objective:
To understand the concept of data compression (encoding and decoding)
To use a tree data structure for encoding and decoding data
To understand the use of a heap to achieve optimal time complexity
Problem statement:
Accept a set of characters along with their frequencies. From these, generate the
prefix codes used to encode and decode strings consisting of those
characters.
Theory:
Greedy Method
The greedy method is one of the most straightforward design techniques.
It can be applied to a wide variety of problems.
Most of these problems have n inputs and require us to obtain a subset that
satisfies some constraints.
Any subset that satisfies these constraints is called a feasible solution.
We need to find a feasible solution that either maximizes or minimizes a given
objective function.
A feasible solution that does this is called an optimal solution.
The greedy method proposes an algorithm that works in stages, considering one
input at a time.
At each stage a decision is made regarding whether a particular input is in
the optimal solution or not.
During the process, inputs are considered in an order determined by some
selection procedure.
If the inclusion of the next input into the partially constructed optimal solution results in
an infeasible solution, we discard this input; otherwise it is added to the partially
constructed solution.

TE IT (SEMESTER II)

Software Laboratory
II (Design and Analysis of Algorithms)
Selection procedure thus is based on some optimization measure, which may be
the objective function.

Huffman coding is a statistical technique which attempts to reduce the number of bits
required to represent a string of symbols. To achieve this, the codes assigned to
symbols are allowed to be of varying lengths. Shorter
codes are assigned to the most frequently used symbols, and longer codes to the
symbols which appear less frequently in the string (that's where the statistical part
comes in).
Building a Huffman Tree
The Huffman code for an alphabet (set of symbols) may be generated by constructing a
binary tree with nodes containing the symbols to be encoded and their probabilities of
occurrence. The tree may be constructed as follows:
Data structures used
struct node
{
    char alphabet;                // symbol stored at a leaf
    int frequency;                // frequency of the symbol (or sum for internal nodes)
    char pref_code[20];           // prefix code as a string of '0' and '1'
    struct node *lchild, *rchild;
};
typedef struct node tnode; // tree node structure
struct term
{
    tnode *data;
    struct term *link;
};
typedef struct term qnode; // node structure for queue
qnode *front, *rear; // front and rear are pointers to qnode

Steps for Huffman


1. Read the no. of characters, n
2. Read first character and its frequency
3. Create a tree node t and set its frequency and character

4. Heap_list[1] = t
5. heap-size = 1
6. for i = 2 to n
   I. read next character and its frequency
   II. create a new tree node, t
   III. assign frequency and character to the newly created node
   IV. insert the tree node into the heap, maintaining the min heap property
   V. increment heap-size
7. root = create_tree()
8. compute_prefix_codes(root)
9. encode()
10. decode()
11. end
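Steps 3 to 6 above can be sketched in C as follows. This is a minimal sketch, not the handout's reference implementation: the names new_node, heap_insert and MAX_HEAP are assumptions introduced for illustration, and the heap is kept 1-based to match Heap_list[1] in step 4. Because the node structure stores an int frequency, the decimal frequencies from the test run would need to be scaled to integers (for example 0.5 becomes 5).

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_HEAP 64                 /* assumed capacity, not from the handout */

typedef struct node {
    char alphabet;
    int frequency;
    char pref_code[20];
    struct node *lchild, *rchild;
} tnode;

tnode *heap_list[MAX_HEAP + 1];     /* 1-based array, as in step 4 */
int heap_size = 0;

/* Allocate a leaf node for one character (steps 3, 6.II and 6.III). */
tnode *new_node(char alphabet, int frequency)
{
    tnode *t = malloc(sizeof *t);
    if (t == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    t->alphabet = alphabet;
    t->frequency = frequency;
    t->pref_code[0] = '\0';
    t->lchild = t->rchild = NULL;
    return t;
}

/* Insert t and sift it up until its parent is no larger (steps 6.IV and 6.V). */
void heap_insert(tnode *t)
{
    if (heap_size >= MAX_HEAP) {
        fprintf(stderr, "heap is full\n");
        exit(EXIT_FAILURE);
    }
    int i = ++heap_size;
    while (i > 1 && heap_list[i / 2]->frequency > t->frequency) {
        heap_list[i] = heap_list[i / 2];    /* move the parent down one level */
        i /= 2;
    }
    heap_list[i] = t;
}
```

A caller would read each character and frequency, then call heap_insert(new_node(ch, freq)); afterwards heap_list[1] always holds the node with the smallest frequency.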
Steps for creating tree (heap_size should be at least 2)
1. t1 = delete from heap; the node having minimum frequency is deleted,
decrement heap size by 1
2. adjust the heap after deleting one element to maintain the min heap property
3. t2 = delete from heap; the node having minimum frequency is deleted,
decrement heap size by 1
4. adjust the heap after deleting one element to maintain the min heap property
5. create a new tree node, temp
6. add the frequencies of the deleted nodes and assign the sum as the frequency of the newly
created node
7. temp->lchild = t1 and temp->rchild = t2

8. if heap is empty return temp
9. else insert temp into heap and increment heap size by 1
10. go to step 1
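The tree-building steps above can be sketched as below. This is an illustrative sketch under assumptions: heap_insert, heap_delete_min, new_node, weighted_length and MAX_HEAP are hypothetical names, integer frequencies are used, and the loop is written as "while at least two nodes remain" rather than with an explicit jump back to step 1, which has the same effect.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_HEAP 64                 /* assumed capacity, not from the handout */

typedef struct node {
    char alphabet;
    int frequency;
    char pref_code[20];
    struct node *lchild, *rchild;
} tnode;

tnode *heap_list[MAX_HEAP + 1];     /* 1-based min heap ordered by frequency */
int heap_size = 0;

tnode *new_node(char alphabet, int frequency, tnode *l, tnode *r)
{
    tnode *t = malloc(sizeof *t);
    if (t == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    t->alphabet = alphabet;
    t->frequency = frequency;
    t->pref_code[0] = '\0';
    t->lchild = l;
    t->rchild = r;
    return t;
}

void heap_insert(tnode *t)
{
    int i = ++heap_size;            /* caller keeps heap_size below MAX_HEAP */
    while (i > 1 && heap_list[i / 2]->frequency > t->frequency) {
        heap_list[i] = heap_list[i / 2];
        i /= 2;
    }
    heap_list[i] = t;
}

/* Steps 1-4: remove the minimum node, then sift the last node down. */
tnode *heap_delete_min(void)
{
    tnode *min = heap_list[1];
    tnode *last = heap_list[heap_size--];
    int i = 1, child;
    while ((child = 2 * i) <= heap_size) {
        if (child < heap_size &&
            heap_list[child + 1]->frequency < heap_list[child]->frequency)
            child++;                            /* pick the smaller child */
        if (last->frequency <= heap_list[child]->frequency)
            break;
        heap_list[i] = heap_list[child];
        i = child;
    }
    heap_list[i] = last;
    return min;
}

/* Steps 5-10: merge the two cheapest trees until one tree remains. */
tnode *create_tree(void)
{
    while (heap_size >= 2) {
        tnode *t1 = heap_delete_min();
        tnode *t2 = heap_delete_min();
        heap_insert(new_node('\0', t1->frequency + t2->frequency, t1, t2));
    }
    return heap_size == 1 ? heap_delete_min() : NULL;
}

/* Sum of frequency * depth over all leaves: total encoded length in bits. */
int weighted_length(const tnode *t, int depth)
{
    if (t == NULL)
        return 0;
    if (t->lchild == NULL && t->rchild == NULL)
        return t->frequency * depth;
    return weighted_length(t->lchild, depth + 1) +
           weighted_length(t->rchild, depth + 1);
}
```

After inserting one leaf per character, root = create_tree() returns the root of the Huffman tree; its frequency equals the sum of all input frequencies. The exact tree shape can depend on how ties between equal frequencies are broken, but the weighted length does not.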
Steps for computing prefix code: uses root as a pointer to tree node
1. Initialize front and rear pointers of the queue to NULL
2. zero = "0" and one = "1"
3. root->pref_code = "" (empty string)
4. Create a new qnode; front and rear will point to this qnode. Queue is maintained
as a Singly Linked List
5. Data field of the queue is root (pointer to tree node); root is added to the queue
6. Delete a tree node from the queue, say temp
7. Copy prefix_code of temp to a string, say str
8. If temp->lchild is not NULL
8.1 concatenate string str with zero and store this as prefix_code of temp->lchild
8.2 add temp->lchild to the queue
9. If temp->rchild is not NULL
9.1 concatenate string str with one and store this as prefix_code of temp->rchild
9.2 add temp->rchild to the queue
10. If temp->lchild and temp->rchild are both NULL (temp is a leaf node)
10.1 store the alphabet, frequency and prefix_code of temp in an array
11. If queue is not empty go to step 6
12. Print prefix_code array: alphabet, frequency and prefix_code
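A minimal C sketch of the level-order procedure above follows. The helper names enqueue, dequeue and set_code are assumptions made for illustration. One simplification is also assumed: each leaf is printed as soon as it is dequeued instead of being collected in an array first, so leaves appear in level order.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    char alphabet;
    int frequency;
    char pref_code[20];
    struct node *lchild, *rchild;
} tnode;

typedef struct term {
    tnode *data;
    struct term *link;
} qnode;

qnode *front = NULL, *rear = NULL;  /* queue kept as a singly linked list */

void enqueue(tnode *t)
{
    qnode *q = malloc(sizeof *q);
    if (q == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    q->data = t;
    q->link = NULL;
    if (rear == NULL)
        front = q;
    else
        rear->link = q;
    rear = q;
}

tnode *dequeue(void)                /* caller checks front != NULL first */
{
    qnode *q = front;
    tnode *t = q->data;
    front = q->link;
    if (front == NULL)
        rear = NULL;
    free(q);
    return t;
}

/* child code = parent code followed by one bit (steps 8.1 and 9.1) */
void set_code(tnode *child, const tnode *parent, char bit)
{
    snprintf(child->pref_code, sizeof child->pref_code, "%s%c",
             parent->pref_code, bit);
}

void compute_prefix_codes(tnode *root)
{
    if (root == NULL)
        return;
    root->pref_code[0] = '\0';      /* the root has the empty code */
    enqueue(root);
    while (front != NULL) {
        tnode *temp = dequeue();
        if (temp->lchild != NULL) {
            set_code(temp->lchild, temp, '0');
            enqueue(temp->lchild);
        }
        if (temp->rchild != NULL) {
            set_code(temp->rchild, temp, '1');
            enqueue(temp->rchild);
        }
        if (temp->lchild == NULL && temp->rchild == NULL)
            printf("%c %d %s\n", temp->alphabet, temp->frequency,
                   temp->pref_code);
    }
}
```

Calling compute_prefix_codes(root) on a finished tree fills in pref_code for every node. Codes longer than 19 bits would be truncated by the 20-byte pref_code buffer, which is adequate for small alphabets such as the one in the test run.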
Note: One can use a heap (minimum heap) data structure to store the parentless tree
pointers, or a singly linked list to store the tree nodes. Depending on this choice, the
algorithms for insertion and deletion differ. If a heap is used, every insert/delete must be
followed by an adjust/heapify step to maintain the minimum heap; the time complexity is of
the order O(nlogn). If a linked list is used, insert places the node at the appropriate
position so that the list stays sorted in ascending order of frequency values. Deletion is
always from the beginning of the list, so deletion takes O(1) time. Each insertion, however,
takes O(n) time, so building the tree this way is of the order O(n*n).
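The linked-list alternative described in the note can be sketched as follows. The function names list_insert and list_delete_min are assumptions for illustration; the qnode structure is the one defined earlier in this handout.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    char alphabet;
    int frequency;
    char pref_code[20];
    struct node *lchild, *rchild;
} tnode;

typedef struct term {
    tnode *data;
    struct term *link;
} qnode;

/* Insert t so the list stays in ascending frequency order: O(n) per call.
   Nodes with equal frequency keep their insertion order. */
qnode *list_insert(qnode *head, tnode *t)
{
    qnode *q = malloc(sizeof *q);
    if (q == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    q->data = t;

    qnode **pp = &head;             /* pointer to the link we may rewrite */
    while (*pp != NULL && (*pp)->data->frequency <= t->frequency)
        pp = &(*pp)->link;
    q->link = *pp;
    *pp = q;
    return head;
}

/* Remove the first entry, which has the minimum frequency: O(1). */
tnode *list_delete_min(qnode **head)
{
    qnode *q = *head;
    if (q == NULL)
        return NULL;
    tnode *t = q->data;
    *head = q->link;
    free(q);
    return t;
}
```

Usage follows the heap version: start with head = NULL, call head = list_insert(head, t) for every node, and take the two cheapest trees with list_delete_min(&head).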
Once the Huffman tree is built, the code for each symbol may be obtained by tracing a
path to the symbol from the root of the tree. A 1 is assigned for a branch in one direction
and a 0 is assigned for a branch in the other direction. For example a symbol which is
reached by branching right once, then twice left may be represented by the pattern
'100'. The figure below depicts codes for nodes of a sample tree.

[Figure: sample Huffman tree. Square nodes are leaf nodes, for which prefix codes are found
and shown in bold: 00, 01, 100, 101, 11.]

Huffman Encoding
This problem is that of finding the minimum length bit string which can be used to
encode a string of symbols. One application is text compression:
What's the smallest number of bits (hence the minimum size of file) we
can use to store an arbitrary piece of text?
Huffman's scheme uses a table of frequency of occurrence for each symbol (or
character) in the input. This table may be derived from the input itself or from data which
is representative of the input. For instance, the frequency of occurrence of letters in
normal English might be derived from processing a large number of text documents and
then used for encoding all text documents. We then need to assign a variable-length bit
string to each character that unambiguously represents that character. This means that
no character's code may be a prefix of any other character's code. This property holds
when the characters to be encoded are arranged as the leaves of a binary tree.
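Once every character has its bit string, encoding is a table lookup followed by concatenation. The sketch below assumes a hypothetical code_entry table and encode function; the table is filled with the prefix codes reported in this handout's test run, and the output is kept as a string of '0' and '1' characters for readability rather than packed into real bits.

```c
#include <stdio.h>
#include <string.h>

struct code_entry {
    char alphabet;
    const char *pref_code;
};

/* Prefix codes taken from the handout's test run. */
const struct code_entry code_table[] = {
    {'d', "00"}, {'c', "010"}, {'b', "011"}, {'a', "10"}, {'e', "11"},
};
const int code_count = sizeof code_table / sizeof code_table[0];

/* Write the concatenated codes of msg into out.
   Returns 0 on success, -1 on an unknown character or a too-small buffer. */
int encode(const char *msg, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (; *msg != '\0'; msg++) {
        const char *code = NULL;
        for (int i = 0; i < code_count; i++) {
            if (code_table[i].alphabet == *msg) {
                code = code_table[i].pref_code;
                break;
            }
        }
        if (code == NULL)
            return -1;                      /* character not in the table */
        size_t len = strlen(code);
        if (used + len + 1 > out_size)
            return -1;                      /* no room for code + '\0' */
        memcpy(out + used, code, len + 1);  /* copies the terminator too */
        used += len;
    }
    return 0;
}
```

For example, encode("acdeb", buf, sizeof buf) fills buf with 10, 010, 00, 11 and 011 joined together. A linear search is used for clarity; a 256-entry array indexed by character would make each lookup O(1).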
Decoding Huffman-encoded Data
"How do we decode a Huffman-encoded bit string? With these variable
length strings, it's not possible to break up an encoded string of bits into
characters!"
The decoding procedure is simple. Start at the root node. Beginning with the first bit in
the stream, use successive bits from the stream to determine whether to go
left or right in the decoding tree: traverse to the left child if the bit is 0 and to
the right child if the bit is 1. When we reach a leaf of the tree, we have decoded a character, so
we place that character onto the decoded (uncompressed) output stream. The next bit
in the input stream is the first bit of the prefix code of the next character. We reset the tree
pointer to the root node and decode the next prefix code in the same fashion. The
process terminates at the end of the encoded string.
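The decoding walk described above can be sketched as follows. The function name decode and its error conventions are assumptions for illustration; the input is taken as a string of '0' and '1' characters, and the tree is assumed to contain at least two symbols so that the root is not itself a leaf.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct node {
    char alphabet;
    int frequency;
    char pref_code[20];
    struct node *lchild, *rchild;
} tnode;

/* Decode bits into out. Returns the number of characters decoded,
   or -1 on invalid input, a truncated code, or a too-small buffer. */
int decode(const tnode *root, const char *bits, char *out, size_t out_size)
{
    const tnode *cur = root;
    size_t n = 0;

    if (root == NULL || out_size == 0)
        return -1;
    for (; *bits != '\0'; bits++) {
        if (*bits == '0')
            cur = cur->lchild;              /* 0 means go left */
        else if (*bits == '1')
            cur = cur->rchild;              /* 1 means go right */
        else
            return -1;                      /* not a bit character */
        if (cur == NULL)
            return -1;                      /* walked off the tree */
        if (cur->lchild == NULL && cur->rchild == NULL) {
            if (n + 1 >= out_size)
                return -1;                  /* keep room for '\0' */
            out[n++] = cur->alphabet;       /* leaf: one character decoded */
            cur = root;                     /* restart at the root */
        }
    }
    out[n] = '\0';
    return cur == root ? (int)n : -1;       /* leftover bits: truncated code */
}
```

Because no code is a prefix of another, the walk never needs to look ahead or backtrack: reaching a leaf is the only signal needed to know that a character boundary has been found.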
Applications:
Huffman coding today is often used as a "back-end" to some other compression
method. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3
have a front-end model and quantization followed by Huffman coding.
Testing
Input :
Accept no. of alphabets: 5
Alphabet    Frequency
a           0.5
b           0.3
c           0.1
d           0.4
e           0.6
Output :
Prefix Codes are as follows

d -> 00
c -> 010
b -> 011
a -> 10
e -> 11
Enter a Message for encoding:
acdeb
Encoded Message is :
100100011011
Decoded string is acdeb
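As a quick check of the run above, the length of the encoded message can be worked out from the code lengths. The comparison with a fixed-length code is an added illustration, not part of the original test: it assumes each of the 5 symbols would otherwise be given the same number of bits.

```latex
\underbrace{2}_{a} + \underbrace{3}_{c} + \underbrace{2}_{d}
  + \underbrace{2}_{e} + \underbrace{3}_{b} = 12 \text{ bits}
\qquad \text{versus} \qquad
5 \times \lceil \log_2 5 \rceil = 5 \times 3 = 15 \text{ bits}
```

The 12 bits agree with the 12 digits of the encoded message 100100011011.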
Analysis:
The time complexity of the Huffman algorithm is O(nlogn). Using a heap to store the
weight of each tree, each iteration requires O(logn) time to remove the two cheapest
trees and insert the merged tree. There are n - 1 iterations for n symbols, so O(n) in all.
Advantages:
Arithmetic coding can be viewed as a generalization of Huffman coding; indeed, in
practice arithmetic coding is often preceded by Huffman coding, as it is easier to find an
arithmetic code for a binary input than for a nonbinary input. Also, although arithmetic
coding offers better compression performance than Huffman coding, Huffman coding is
still in wide use because of its simplicity and high speed.
Conclusion:
Thus we have studied and implemented Huffman's algorithm.
FAQs
1. Explain the difference between divide and conquer strategy and greedy strategy.
2. Give the characteristics of the greedy method.
3. Explain heapify or adjust operation while deleting or inserting into heap.
