DC 3
3 Huffman Coding
Data Compression
Get Prepared Together
Prepared and Edited by:- Divya Kaurani Designed by:- Kussh Prajapati
www.collegpt.com [email protected]
Unit - 3 : Huffman Coding
Some important points:
1. Codeword:
A codeword refers to the encoded representation of a specific symbol or
element within the original data. During compression, a codeword is assigned
to each symbol. This codeword is a sequence of bits (0s and 1s) that represents
the original symbol in a compressed form.
2. Self Information:
Self-information, also known as information content, surprisal, or Shannon
information, is a measure of the information content associated with the
outcome of a random variable. It quantifies the unexpectedness, surprise, or
amount of information gained by learning the outcome of a particular event.
3. Unary Code:
Unary coding, also known as the unary numeral system or thermometer code, is
a lossless data compression technique. It's a type of entropy encoding that
represents a natural number n with a code of length n + 1: n ones followed by
a zero (e.g., n = 2 → 110).
4. Digram Coding:
Digram coding is a data compression method that uses semi-static
dictionaries. In the first pass, the method finds all of the characters and most
frequently used two character blocks (digrams) in the source and inserts them
into a dictionary. In the second pass, compression is performed.
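Two of the definitions above can be made concrete with small helpers (a minimal sketch; the function names are illustrative, not from the original notes):

```python
import math

def self_information(p: float) -> float:
    """Bits of information gained from observing an outcome of probability p."""
    return -math.log2(p)

def unary_encode(n: int) -> str:
    """Unary code: n ones followed by a terminating zero (length n + 1)."""
    return "1" * n + "0"

# self_information(0.5) -> 1.0 (a fair coin flip carries one bit)
# unary_encode(2)       -> "110", matching the example above
```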
Run Length Coding :
It is a simple form of data compression where sequences of the same data value (runs)
are stored as a single value-and-count pair. Instead of repeating the same value
multiple times, the value and the number of repetitions are encoded, reducing
redundancy in the data.
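Run-length encoding can be sketched in a few lines (a minimal sketch; `rle_encode` is an illustrative name):

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse runs of identical characters into (value, count) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            # extend the current run
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # start a new run
            runs.append((ch, 1))
    return runs

# rle_encode("0011100") -> [("0", 2), ("1", 3), ("0", 2)]
```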
Shannon Fano Coding :
Algorithm:
1. Create a list of probabilities or frequency counts for the given set of symbols so that
the relative frequency of occurrence of each symbol is known.
2. Sort the list of symbols in decreasing order of probability, the most probable ones
to the left and the least probable ones to the right.
3. Split the list into two parts, with the total probability of both parts being as close to
each other as possible.
4. Assign the value 0 to the left part and 1 to the right part.
5. Repeat steps 3 and 4 for each part until all the symbols are split into individual
subgroups.
The resulting code is a prefix code: no codeword is a prefix of another, so each
symbol's codeword can be decoded unambiguously.
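The five steps above can be sketched as a recursive split (a minimal sketch; `shannon_fano` is an illustrative name, and integer counts are used to keep the splits exact):

```python
def shannon_fano(symbols):
    """Assign Shannon-Fano codewords.

    symbols: list of (symbol, frequency) pairs.
    Sorts by frequency, splits into two near-equal halves,
    assigns 0/1, and recurses on each half.
    """
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {}

    def assign(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(f for _, f in group)
        # choose the split point that best balances the two halves
        best_diff, cut = float("inf"), 1
        for i in range(1, len(group)):
            left = sum(f for _, f in group[:i])
            diff = abs(total - 2 * left)
            if diff < best_diff:
                best_diff, cut = diff, i
        assign(group[:cut], prefix + "0")
        assign(group[cut:], prefix + "1")

    assign(symbols, "")
    return codes

# shannon_fano([("A", 4), ("B", 2), ("C", 2), ("D", 1), ("E", 1)])
# -> {"A": "0", "B": "10", "C": "110", "D": "1110", "E": "1111"}
```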
Huffman Coding :
Huffman coding is a widely used lossless data compression algorithm that assigns
variable-length codes to symbols based on their frequency of occurrence in the data.
This approach achieves compression by focusing on the information content of each
symbol.
Algorithm:
1. Create a leaf node for each symbol and arrange the nodes in order of probability.
2. Repeat steps 3 and 4 until only one node (the root) remains.
3. Take the two nodes with the smallest probabilities, add their probabilities, and
remove them from the list.
4. Create a new parent node whose probability is that sum and insert it back into
the list.
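The merge loop above can be sketched with Python's `heapq` (a minimal sketch; `huffman_codes` is an illustrative name, and ties between equal probabilities may be broken differently than in hand-worked examples):

```python
import heapq

def huffman_codes(freqs):
    """freqs: dict of symbol -> frequency. Returns dict of symbol -> codeword.

    Repeatedly merges the two least-frequent nodes, prefixing 0 to the
    codewords in one subtree and 1 to the other.
    """
    # each heap entry: (frequency, tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                 # a single symbol still needs one bit
        return {sym: "0" for sym in freqs}
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# huffman_codes({"a": 2, "b": 1, "c": 1}) -> {"a": "0", "b": "10", "c": "11"}
```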
Applications of Huffman Coding:
Huffman coding, a versatile and efficient data compression technique, finds
applications in various domains thanks to its ability to significantly reduce data size
while preserving information integrity. Here's a breakdown of its key applications:
1. File Compression.
2. Network Communication.
3. Embedded Systems and Mobile Apps.
4. Multimedia Processing.
5. Database Systems.
6. Cryptography.
7. Other Applications.
Summarizing:
1. Standard Huffman Coding:
General-purpose: Suitable for various data types where simplicity and efficiency are
priorities.
Applications: File compression (ZIP, GZIP, etc.), network communication (HTTP
headers, PNG images), embedded systems, multimedia processing (audio/video),
database systems (compression of frequently accessed data).
Key Differences:
Standard and Minimum Variance: Primarily differ in their focus (average vs. variance of
codeword lengths) but share similar applications.
Adaptive: Stands out for its dynamic adaptation to changing data, making it suitable for
non-stationary data scenarios.
Extended: Targets data with specific statistical dependencies within predictable block
sizes.
Golomb Code :
Golomb codes are a lossless data compression method that encodes non-negative
integers. They are based on the assumption that larger integers are less likely to
occur, and they efficiently represent an integer by combining a unary code with a
binary code.
Algorithm (encoding n with parameter m):
1. Emit the unary code of the quotient q = ⌊n/m⌋.
2. Let k = ⌈log₂ m⌉, c = 2^k − m and r = n mod m.
   If 0 ≤ r < c, emit r in (k − 1) bits;
   otherwise emit r + c in k bits.
3. Concatenate the results of the two steps above.
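The algorithm above can be sketched directly (a minimal sketch; `golomb_encode` is an illustrative name, and m is assumed to be at least 2):

```python
import math

def golomb_encode(n: int, m: int) -> str:
    """Golomb code of a non-negative integer n with parameter m."""
    q, r = divmod(n, m)
    code = "1" * q + "0"            # unary part for the quotient
    k = math.ceil(math.log2(m))
    c = 2 ** k - m
    if r < c:
        code += format(r, "b").zfill(k - 1)   # r fits in k-1 bits
    else:
        code += format(r + c, "b").zfill(k)   # r + c needs k bits
    return code

# golomb_encode(7, 5) -> "1010"  (q=1 -> "10"; r=2 < c=3 -> "10")
# golomb_encode(9, 4) -> "11001" (m a power of 2 reduces to a Rice code)
```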
Tunstall Code :
Tunstall codes are a type of entropy coding used for lossless data compression. They
are a variable-to-fixed length code: a single fixed-length Tunstall codeword can
represent multiple source letters, which limits the error propagation seen with
variable-length codewords. The code maximizes the average number of source letters
per codeword while guaranteeing unique parsing: the dictionary phrases form a
prefix-free set (no phrase is a prefix of another).
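Dictionary construction can be sketched as repeatedly expanding the most probable phrase until the dictionary fills the codeword space (a minimal sketch; `tunstall_dictionary` is an illustrative name):

```python
def tunstall_dictionary(probs, codeword_bits):
    """Build a Tunstall parse dictionary for fixed-length codewords.

    probs: dict of source symbol -> probability.
    Starts from the single symbols and replaces the most probable phrase
    with all its one-symbol extensions while room remains for
    2**codeword_bits entries.
    """
    entries = dict(probs)
    limit = 2 ** codeword_bits
    while len(entries) + len(probs) - 1 <= limit:
        best = max(entries, key=entries.get)      # most probable phrase
        p = entries.pop(best)
        for sym, ps in probs.items():
            entries[best + sym] = p * ps          # extend by each symbol
    # assign fixed-length binary codewords to the final phrases
    return {phrase: format(i, "b").zfill(codeword_bits)
            for i, phrase in enumerate(sorted(entries))}

# tunstall_dictionary({"A": 0.6, "B": 0.4}, 2)
# -> four two-symbol phrases, each given a 2-bit codeword
```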
All the Best
"Enjoyed these notes? Feel free to share them with your friends!"
Visit: www.collegpt.com