DC 3
3 Huffman Coding
Data Compression
Get Prepared Together
Prepared and Edited by:- Divya Kaurani Designed by:- Kussh Prajapati
www.collegpt.com [email protected]
Unit - 3 : Huffman Coding
Some important points:
1. Codeword:
A codeword refers to the encoded representation of a specific symbol or
element within the original data. During compression, a codeword is assigned
to each symbol. This codeword is a sequence of bits (0s and 1s) that represents
the original symbol in a compressed form.
2. Self Information:
Self-information, also known as information content, surprisal, or Shannon
information, is a measure of the information content associated with the
outcome of a random variable. It quantifies the unexpectedness, surprise, or
amount of information gained by learning the outcome of a particular event.
3. Unary Code:
Unary coding, also known as the unary numeral system or thermometer code, is
a lossless data compression technique. It's a type of entropy encoding that
represents a natural number n with a code of length n + 1: n ones followed by
a zero (e.g., n = 2 → 110).
4. Digram Coding:
Digram coding is a data compression method that uses semi-static
dictionaries. In the first pass, the method finds all of the characters and most
frequently used two character blocks (digrams) in the source and inserts them
into a dictionary. In the second pass, compression is performed.
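Two of the definitions above can be made concrete with small helpers (a minimal sketch; the function names are illustrative, not from the original notes):

```python
import math

def self_information(p: float) -> float:
    """Bits of information gained from observing an outcome of probability p."""
    return -math.log2(p)

def unary_encode(n: int) -> str:
    """Unary code: n ones followed by a terminating zero (length n + 1)."""
    return "1" * n + "0"

# self_information(0.5) -> 1.0 (a fair coin flip carries one bit)
# unary_encode(2)       -> "110", matching the example above
```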
Run Length Coding :
It is a simple form of data compression where sequences of the same data value (runs)
are stored as a single value-and-count pair. Instead of repeating the same value
multiple times, the value and the number of repetitions are encoded, reducing
redundancy in the data.
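Run-length encoding can be sketched in a few lines (a minimal sketch; `rle_encode` is an illustrative name):

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse runs of identical characters into (value, count) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            # extend the current run
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # start a new run
            runs.append((ch, 1))
    return runs

# rle_encode("0011100") -> [("0", 2), ("1", 3), ("0", 2)]
```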
Shannon Fano Coding :
Algorithm:
1. Create a list of probabilities or frequency counts for the given set of symbols so that
the relative frequency of occurrence of each symbol is known.
2. Sort the list of symbols in decreasing order of probability, the most probable ones
to the left and the least probable ones to the right.
3. Split the list into two parts, with the total probability of both parts being as close to
each other as possible.
4. Assign the value 0 to the left part and 1 to the right part.
5. Repeat steps 3 and 4 for each part until all the symbols are split into individual
subgroups.
The resulting code is a prefix code: no codeword is a prefix of another, so each
symbol's codeword can be decoded unambiguously.
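The five steps above can be sketched as a recursive split (a minimal sketch; `shannon_fano` is an illustrative name, and integer counts are used to keep the splits exact):

```python
def shannon_fano(symbols):
    """Assign Shannon-Fano codewords.

    symbols: list of (symbol, frequency) pairs.
    Sorts by frequency, splits into two near-equal halves,
    assigns 0/1, and recurses on each half.
    """
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {}

    def assign(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(f for _, f in group)
        # choose the split point that best balances the two halves
        best_diff, cut = float("inf"), 1
        for i in range(1, len(group)):
            left = sum(f for _, f in group[:i])
            diff = abs(total - 2 * left)
            if diff < best_diff:
                best_diff, cut = diff, i
        assign(group[:cut], prefix + "0")
        assign(group[cut:], prefix + "1")

    assign(symbols, "")
    return codes

# shannon_fano([("A", 4), ("B", 2), ("C", 2), ("D", 1), ("E", 1)])
# -> {"A": "0", "B": "10", "C": "110", "D": "1110", "E": "1111"}
```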
Huffman Coding :
Huffman coding is a widely used lossless data compression algorithm that assigns
variable-length codes to symbols based on their frequency of occurrence in the data.
This approach achieves compression by focusing on the information content of each
symbol.
Algorithm:
1. Create a leaf node for each symbol and arrange the nodes in order of probability.
2. Repeat steps 3 and 4 until only one node (the root) remains.
3. Take the two nodes with the smallest probabilities, add their probabilities, and
remove them from the list.
4. Create a new parent node whose probability is that sum and insert it back into
the list.
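The merge loop above can be sketched with Python's `heapq` (a minimal sketch; `huffman_codes` is an illustrative name, and ties between equal probabilities may be broken differently than in hand-worked examples):

```python
import heapq

def huffman_codes(freqs):
    """freqs: dict of symbol -> frequency. Returns dict of symbol -> codeword.

    Repeatedly merges the two least-frequent nodes, prefixing 0 to the
    codewords in one subtree and 1 to the other.
    """
    # each heap entry: (frequency, tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                 # a single symbol still needs one bit
        return {sym: "0" for sym in freqs}
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# huffman_codes({"a": 2, "b": 1, "c": 1}) -> {"a": "0", "b": "10", "c": "11"}
```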
Applications of Huffman Coding:
Huffman coding, a versatile and efficient data compression technique, finds
applications in various domains thanks to its ability to significantly reduce data size
while preserving information integrity. Here's a breakdown of its key applications:
1. File Compression.
2. Network Communication.
3. Embedded Systems and Mobile Apps.
4. Multimedia Processing.
5. Database Systems.
6. Cryptography.
7. Other Applications.
Summarizing:
1. Standard Huffman Coding:
General-purpose: Suitable for various data types where simplicity and efficiency are
priorities.
Applications: File compression (ZIP, GZIP, etc.), network communication (HTTP
headers, PNG images), embedded systems, multimedia processing (audio/video),
database systems (compression of frequently accessed data).
Key Differences:
Standard and Minimum Variance: Primarily differ in their focus (average vs. variance of
codeword lengths) but share similar applications.
Adaptive: Stands out for its dynamic adaptation to changing data, making it suitable for
non-stationary data scenarios.
Extended: Targets data with specific statistical dependencies within predictable block
sizes.
Golomb Code :
Golomb codes are a lossless data compression method that encodes non-negative
integers. They are based on the assumption that larger integers are less likely to
occur, and they efficiently represent an integer by combining a unary code with a
binary code.
Algorithm (encoding n with parameter m):
1. Emit the unary code of the quotient q = ⌊n/m⌋.
2. Let k = ⌈log₂ m⌉, c = 2^k − m and r = n mod m.
   If 0 ≤ r < c, emit r in (k − 1) bits;
   otherwise emit r + c in k bits.
3. Concatenate the results of the two steps above.
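The algorithm above can be sketched directly (a minimal sketch; `golomb_encode` is an illustrative name, and m is assumed to be at least 2):

```python
import math

def golomb_encode(n: int, m: int) -> str:
    """Golomb code of a non-negative integer n with parameter m."""
    q, r = divmod(n, m)
    code = "1" * q + "0"            # unary part for the quotient
    k = math.ceil(math.log2(m))
    c = 2 ** k - m
    if r < c:
        code += format(r, "b").zfill(k - 1)   # r fits in k-1 bits
    else:
        code += format(r + c, "b").zfill(k)   # r + c needs k bits
    return code

# golomb_encode(7, 5) -> "1010"  (q=1 -> "10"; r=2 < c=3 -> "10")
# golomb_encode(9, 4) -> "11001" (m a power of 2 reduces to a Rice code)
```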
Tunstall Code :
Tunstall codes are a type of entropy coding used for lossless data compression. They
are a variable-to-fixed length code: a single fixed-length Tunstall codeword can
represent multiple source letters, which limits the error propagation seen with
variable-length codewords. The code maximizes the average number of source letters
per codeword while guaranteeing unique parsing: the dictionary phrases form a
prefix-free set (no phrase is a prefix of another).
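Dictionary construction can be sketched as repeatedly expanding the most probable phrase until the dictionary fills the codeword space (a minimal sketch; `tunstall_dictionary` is an illustrative name):

```python
def tunstall_dictionary(probs, codeword_bits):
    """Build a Tunstall parse dictionary for fixed-length codewords.

    probs: dict of source symbol -> probability.
    Starts from the single symbols and replaces the most probable phrase
    with all its one-symbol extensions while room remains for
    2**codeword_bits entries.
    """
    entries = dict(probs)
    limit = 2 ** codeword_bits
    while len(entries) + len(probs) - 1 <= limit:
        best = max(entries, key=entries.get)      # most probable phrase
        p = entries.pop(best)
        for sym, ps in probs.items():
            entries[best + sym] = p * ps          # extend by each symbol
    # assign fixed-length binary codewords to the final phrases
    return {phrase: format(i, "b").zfill(codeword_bits)
            for i, phrase in enumerate(sorted(entries))}

# tunstall_dictionary({"A": 0.6, "B": 0.4}, 2)
# -> four two-symbol phrases, each given a 2-bit codeword
```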
All the Best
"Enjoyed these notes? Feel free to share them with your friends!"
Visit: www.collegpt.com