ICE513 Module 4 - Source Coding
Lecture Outline
• Definitions
• Data Compression
• Kraft-McMillan Inequality
– Consequences of K-M Inequality
• Optimal Codes
• Huffman Codes
Data Compression
• Goal – establish the fundamental limit for the
compression of information.
• Data compression can be achieved by
assigning short descriptions to the most
frequent outcomes of the data source, and
necessarily longer descriptions to the less
frequent outcomes.
Data Compression
• For example, in Morse code, the most
frequent symbol is represented by a single
dot.
• In this module, we will find the shortest
average description length of a random
variable.
Digital Communication System
Source code C(x)
• A source code C of a random variable X is a
mapping from the range of X to the set of
finite-length strings (D*) of symbols from a
D-ary alphabet.
• Cx denotes the codeword corresponding to x,
while lx is the length of Cx.
Example 1
• Let X be a random variable depicting the toss of a fair coin;
D=2, since the alphabet is {0,1} ≡ {H,T}
Let code for H=00 ⇒ C(H)=00
Code for T ⇒ C(T)=11
∴ l(H)=l(T)=l[Cx]=2
Expected Length L(C) of a Code
L(C): Uniquely Decodeable
Example 2
Given that Px = {½, ¼, ¼}
and {C1=0, C2=01, C3=101},
find the entropy H(X) and the
expected length L(C) of the code.
Note: We shall see later that (2) is always valid for uniquely decodeable codes
with equality iff C(X) is an optimal code.
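The two quantities asked for in Example 2 can be computed directly. The sketch below assumes the usual definitions, H(X) = −Σ P(x) log2 P(x) in bits and L(C) = Σ P(x) l(x); the function names `entropy` and `expected_length` are illustrative and do not come from the slides.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum p * log2(p)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def expected_length(probs, codewords):
    """Expected codeword length: L(C) = sum p(x) * l(x)."""
    return sum(p * len(c) for p, c in zip(probs, codewords))

# Data of Example 2
probs = [0.5, 0.25, 0.25]
code = ["0", "01", "101"]

H = entropy(probs)                 # 1.5 bits
L = expected_length(probs, code)   # 1.75 bits
print(H, L)
```

For these numbers H(X) = 1.5 bits and L(C) = 1.75 bits, so L(C) exceeds H(X) by 0.25 bits.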
L(C): Optimal (Complete)
Example 3
L(C): Fixed-length Code
Example 4
Find the expected length for
the ensemble X of example 3,
if Cx is a fixed-length code
given as: Cx={00, 01, 10, 11}
i.e. lx={2, 2, 2, 2}.
For a fixed-length code, equation (1) reduces to the common codeword length
(here L(C) = 2 bits), because the probabilities sum to 1.
Prefix-free {Instantaneous} Codes
A code C(x) is said to be prefix-free
iff no codeword C(xi) is a prefix of any
other codeword C(xj).
Example 5
Consider the code:
C{1, 2, 3, 4} = {0, 10, 110, 111}
• Since none of the codewords is a prefix of another,
i.e. none of the codewords begins another,
C(x) is a prefix-free code:
– instantaneous,
– self-punctuating code
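The prefix-free condition is easy to test mechanically, and it is what makes symbol-by-symbol (instantaneous) decoding possible. The following is a minimal sketch; the helper names `is_prefix_free` and `decode_prefix_free` are hypothetical and used only for illustration.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

def decode_prefix_free(bits, table):
    """Decode a bit string with a prefix-free code.

    A symbol is emitted as soon as the buffer matches a codeword,
    which is why such codes are called instantaneous.
    """
    inverse = {cw: sym for sym, cw in table.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return out

example5 = {1: "0", 2: "10", 3: "110", 4: "111"}
print(is_prefix_free(list(example5.values())))      # True
print(is_prefix_free(["0", "01", "011", "111"]))    # False: "0" begins "01"
print(decode_prefix_free("010110111", example5))    # [1, 2, 3, 4]
```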
Extension C(x') of a Code
The extension C(x') of a
code C(x) is the
mapping from finite-length
strings of X to
finite-length strings of D,
obtained by concatenating the codewords
of the individual source symbols.
For Example 5
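As an illustration, the extension of the code of Example 5 can be sketched as plain concatenation of codewords. The function name `extend` and the two source strings are assumptions chosen for the example.

```python
def extend(code, source_string):
    """Extension of a code: concatenate the codewords of the source symbols."""
    return "".join(code[x] for x in source_string)

# Code of Example 5
example5 = {1: "0", 2: "10", 3: "110", 4: "111"}

print(extend(example5, [1, 2, 3, 4]))   # 010110111
print(extend(example5, [4, 1, 1, 2]))   # 1110010
```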
Uniquely-Decodeable Codes
• A code is uniquely decodeable if any encoded
string has only one possible source string that
can produce it.
• For this set of codes:
L(C) ≥ H(X)
with equality iff C(X) is a complete code
U-D Codes
Example 6
Check if a code given as C{1, 2, 3, 4} = {0, 1, 00, 11} with PX = { ½, ¼, ⅛, ⅛} is
uniquely decodeable.
Find H(X) and L(C).
H(X) = H{½, ¼, ⅛, ⅛} = 1.75 bits
L(C) = 1×½ + 1×¼ + 2×⅛ + 2×⅛ = 1.25 bits
L(C) < H(X)
The code is not uniquely decodeable
A given source string X1 = 134213 encodes as {000111000}, the same as another
source string X2 = 312431, which also encodes as {000111000}.
Hence the encoded string 000111000 cannot be decoded uniquely, since more than
one source string (X1 and X2) can produce it.
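The collision in Example 6 can be checked by encoding both source strings, and a short dynamic program counts how many different source strings map to the same bit string. This is a sketch only; `encode` and `count_parses` are illustrative names.

```python
code = {1: "0", 2: "1", 3: "00", 4: "11"}   # code of Example 6

def encode(code, source):
    """Concatenate the codewords of a string of digit symbols."""
    return "".join(code[int(ch)] for ch in source)

def count_parses(code, bits):
    """Number of ways to split `bits` into a sequence of codewords."""
    ways = [1] + [0] * len(bits)
    for end in range(1, len(bits) + 1):
        for cw in code.values():
            if end >= len(cw) and bits[end - len(cw):end] == cw:
                ways[end] += ways[end - len(cw)]
    return ways[-1]

x1 = encode(code, "134213")
x2 = encode(code, "312431")
print(x1, x2, x1 == x2)          # 000111000 000111000 True
print(count_parses(code, x1))    # 27 distinct source strings give this output
```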
U-D, Non-prefix Codes
Example 7
Check if a code given as C{A, B, C, D} ={0, 01, 011, 111} with
PX = { ½, ¼, ⅛, ⅛} is uniquely decodeable.
Find H(X) and L(C).
H(X) = H{½, ¼, ⅛, ⅛} = 1.75 bits
L(C) = 1×½ + 2×¼ + 3×⅛ + 3×⅛ = 1.75 bits
L(C) = H(X)
The code is uniquely decodeable and complete
Although C is uniquely decodeable and complete, it is
not a prefix code, since CA is a prefix of CB and CC, and CB is a prefix of CC.
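Unique decodability of a non-prefix code such as the one in Example 7 can be tested algorithmically. The slides do not name a method; the sketch below uses the Sardinas-Patterson test, a standard technique brought in here for illustration, with hypothetical function names.

```python
def dangling(a_set, b_set):
    """Nonempty suffixes w with a + w == b for some a in a_set, b in b_set."""
    return {b[len(a):] for a in a_set for b in b_set
            if b.startswith(a) and len(b) > len(a)}

def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test for unique decodability."""
    code = set(codewords)
    if len(code) < len(codewords):      # repeated codeword: singular code
        return False
    current = dangling(code, code)      # suffixes left when one codeword begins another
    seen = set()
    while current:
        if current & code:              # a dangling suffix is itself a codeword
            return False
        frozen = frozenset(current)
        if frozen in seen:              # the sets repeat: no ambiguity can arise
            return True
        seen.add(frozen)
        current = dangling(code, current) | dangling(current, code)
    return True

print(is_uniquely_decodable(["0", "01", "011", "111"]))   # Example 7: True
print(is_uniquely_decodable(["0", "1", "00", "11"]))      # Example 6: False
print(is_uniquely_decodable(["0", "10", "110", "111"]))   # Example 5: True
```

Every prefix-free code passes this test immediately, because its first set of dangling suffixes is empty.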
Kraft-McMillan Inequality
This inequality gives the limitation on the
set of minimal expected codeword
lengths possible for prefix codes to
describe a given source uniquely.
For any uniquely decodeable code
C over the binary alphabet {0,1},
the codeword lengths must satisfy:
Σ_i 2^(−l_i) ≤ 1    (5)
Conversely
Given a set of codeword lengths
that satisfy this inequality, there ∃
a uniquely decodeable prefix code
with these codeword lengths.
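The converse statement is constructive. One common way to build a prefix code from admissible lengths is to hand out codewords in order of increasing length, counting upward in binary; the sketch below follows that idea. The function name `prefix_code_from_lengths` is hypothetical, and this is only one of several valid constructions.

```python
def prefix_code_from_lengths(lengths):
    """Build a binary prefix code with the given codeword lengths.

    Raises ValueError if the lengths violate sum(2**-l) <= 1.
    """
    if sum(2.0 ** -l for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft-McMillan inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        value <<= lengths[i] - prev_len          # pad with zeros up to the new length
        codewords[i] = format(value, "0{}b".format(lengths[i]))
        value += 1                               # next unused codeword at this depth
        prev_len = lengths[i]
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))    # ['0', '10', '110', '111']
```

For the lengths {1, 2, 3, 3} this reproduces the code of Example 5.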
K-M Inequality: Consequences
a) If it holds with strict inequality {i.e. Σ_i 2^(−l_i) < 1},
then the code is redundant.
b) If it holds with equality {i.e. Σ_i 2^(−l_i) = 1},
then the code is a complete code.
c) If it does not hold {i.e. Σ_i 2^(−l_i) > 1}, then the code
is not uniquely decodeable.
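The three cases can be sketched as a small classifier on the codeword lengths. The labels follow the slide; note that a sum of at most 1 is a statement about the lengths only and does not by itself prove that a particular code is uniquely decodeable. The names `kraft_sum` and `classify`, and the length set {2, 2, 2}, are assumptions made for illustration.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Exact value of sum(D**-l) over the codeword lengths."""
    return sum(Fraction(1, D ** l) for l in lengths)

def classify(lengths):
    """Apply cases (a), (b), (c) to a set of binary codeword lengths."""
    s = kraft_sum(lengths)
    if s < 1:
        return "redundant"
    if s == 1:
        return "complete"
    return "not uniquely decodeable"

print(kraft_sum([1, 2, 3, 3]), classify([1, 2, 3, 3]))   # 1 complete (Example 5)
print(kraft_sum([1, 1, 2, 2]), classify([1, 1, 2, 2]))   # 3/2 not uniquely decodeable (Example 6)
print(kraft_sum([2, 2, 2]), classify([2, 2, 2]))         # 3/4 redundant (hypothetical lengths)
```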
Binary Trees
• Any binary tree can be
viewed as a prefix code for
the leaves (i.e. terminal
nodes) of the tree.
Example 8
For the binary tree presented, (1), (2), and (3) are depths.
b and c are at depth 1; d, e, f, and g are at depth 2, while
h, i, j, k are at depth 3.
Equation (6) gives:
Since the Kraft-McMillan inequality holds with strict inequality,
we conclude that this particular code has some redundancy.
Binary Tree
Example 9
Since the Kraft-McMillan inequality holds with equality,
we conclude that this is a complete code.
Lower Bound on L(C)
The expected length L(C) of a uniquely
decodeable code is bounded below by
H(X), i.e.:
L(C) ≥ H(X)
with equality iff 2^(−l_i) = P_i for every i
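The equality condition can be checked numerically. In the sketch below the first distribution is dyadic, so lengths l_i = log2(1/P_i) meet the bound exactly; the second is not, so a gap remains. The length set {2, 2, 2, 3, 3} is an assumed choice that satisfies the Kraft-McMillan inequality with equality, and `gap` is an illustrative name.

```python
from math import log2

def gap(probs, lengths):
    """Return L(C) - H(X) in bits for given probabilities and codeword lengths."""
    L = sum(p * l for p, l in zip(probs, lengths))
    H = -sum(p * log2(p) for p in probs)
    return L - H

# Dyadic probabilities: 2**(-l_i) equals P_i for every i, so L(C) = H(X)
dyadic = [0.5, 0.25, 0.125, 0.125]
print(gap(dyadic, [1, 2, 3, 3]))

# Non-dyadic probabilities: integer lengths cannot match log2(1/P_i), so L(C) > H(X)
non_dyadic = [0.3, 0.3, 0.2, 0.1, 0.1]
print(gap(non_dyadic, [2, 2, 2, 3, 3]))
```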
Lower Bound on L(C)
When L(C) = H(X), the code is
said to be optimal.
The proof uses a change of logarithm base
and the fundamental inequality ln x ≤ x − 1.
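A sketch of the derivation behind the lower bound, written in the P_i, l_i notation used above and assuming a binary code alphabet, might run as follows. The step labels match the two ingredients named on the slide; the final step uses the Kraft-McMillan inequality.

```latex
\begin{align*}
H(X) - L(C)
  &= \sum_i P_i \log_2 \frac{1}{P_i} - \sum_i P_i\, l_i
   = \sum_i P_i \log_2 \frac{2^{-l_i}}{P_i} \\
  &= \frac{1}{\ln 2} \sum_i P_i \ln \frac{2^{-l_i}}{P_i}
     && \text{(base change)} \\
  &\le \frac{1}{\ln 2} \sum_i P_i \left( \frac{2^{-l_i}}{P_i} - 1 \right)
     && \text{(fundamental inequality } \ln x \le x - 1\text{)} \\
  &= \frac{1}{\ln 2} \left( \sum_i 2^{-l_i} - 1 \right) \le 0
     && \text{(Kraft-McMillan inequality)}
\end{align*}
```

Both inequalities are tight exactly when 2^{-l_i} = P_i for every i, which is the equality condition for L(C) = H(X).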
Huffman Codes
• An entropy encoding
algorithm for lossless data
compression
• A simple algorithm that
allows for the construction
of an optimal {i.e. shortest
expected length} prefix
code
• It is the most efficient
compression method of its
type
• Simplest algorithm gives
highest priority to least
probable node
• Developed by David A.
Huffman while a PhD
student at MIT.
• Published in 1952 in “A
Method for the Construction of
Minimum-Redundancy Codes”
Huffman Codes
• Consider a random variable X taking values in
the set X = {1, 2, 3, 4, 5} with probabilities
0.25, 0.25, 0.2, 0.15, 0.15, respectively.
• We expect the optimal binary code for X to
have the longest codewords assigned to the
symbols 4 and 5.
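The construction can be sketched with a priority queue in which the least probable node is popped first. This is a minimal illustration, not the only way to implement the algorithm; the function name `huffman_code` and the counter-based tie-breaking rule are assumptions, and a different tie-break may give different codewords with the same expected length.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: codeword} for a binary Huffman code.

    `probs` maps each symbol to its probability (two or more symbols).
    """
    tiebreak = count()   # unique counter so heap entries never compare the dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable node
        p1, _, c1 = heapq.heappop(heap)   # second least probable node
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = huffman_code(probs)
lengths = {s: len(w) for s, w in code.items()}
L = sum(probs[s] * lengths[s] for s in probs)
print(code)
print(lengths, L)
```

For this distribution symbols 1, 2, and 3 receive 2-bit codewords and symbols 4 and 5 receive 3-bit codewords, giving an expected length of about 2.3 bits.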
Example 10
A source generates four different symbols {a, b, c, d} with
probabilities PX = { ½, ¼, ⅛, ⅛} respectively. Construct an optimal
code to encode this source and check for optimality.
Example 11
Construct a binary Huffman code for the following distribution
on five symbols: PX = {0.3, 0.3, 0.2, 0.1, 0.1} and check for
optimality.
Huffman Codes
Let Ax={1, 2, 3, 4, 5, 6, 7} and
Px={0.49, 0.26, 0.12, 0.04, 0.04, 0.03, 0.02}.
Construct a Huffman code for encoding the
alphabet and check for optimality.
Further verify if there is any redundancy.
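One way to approach this exercise is to compute only the Huffman codeword lengths and then compare L(C) with H(X) and evaluate the Kraft-McMillan sum. The sketch below does that; `huffman_lengths` is a hypothetical helper, and "redundancy" is taken here to mean L(C) − H(X), which is an assumption about the intended check.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code, in the order of `probs`."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]   # (prob, unique id, leaf indices)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p0, _, s0 = heapq.heappop(heap)
        p1, _, s1 = heapq.heappop(heap)
        for i in s0 + s1:            # every leaf under the merged node moves one level down
            lengths[i] += 1
        heapq.heappush(heap, (p0 + p1, next_id, s0 + s1))
        next_id += 1
    return lengths

Px = [0.49, 0.26, 0.12, 0.04, 0.04, 0.03, 0.02]
lx = huffman_lengths(Px)
L = sum(p * l for p, l in zip(Px, lx))
H = -sum(p * log2(p) for p in Px)
kraft = sum(2.0 ** -l for l in lx)

print(lx)          # [1, 2, 3, 5, 5, 5, 5]
print(L, H)        # L is about 2.02 bits, H about 2.013 bits
print(L - H)       # small positive gap between L(C) and H(X)
print(kraft)       # 1.0, so the code is complete
```

The Kraft-McMillan sum equals 1, so the code is complete, yet L(C) is slightly above H(X) because the probabilities are not negative powers of 2.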