5 Data Compression
• Does not take into account that some symbols occur more frequently than others.
• For rare symbol sequences, can we drop them with "acceptable" error? (Idea: by the AEP, ignore the non-typical set.)
$X^n \longrightarrow \text{Encoder} \longrightarrow M \longrightarrow \text{Decoder} \longrightarrow \hat{X}^n$

A $(2^{nR}, n)$ lossless source code of rate $R$ bits per source symbol consists of:
• an encoder $m : \mathcal{X}^n \to \{1, 2, \dots, 2^{nR}\}$, and
• a decoder $\hat{x}^n : \{1, 2, \dots, 2^{nR}\} \to \mathcal{X}^n$.
Probability of error: $P_e^{(n)} = P(\hat{X}^n \neq X^n)$.
A rate $R$ is said to be achievable if $\exists$ a sequence of $(2^{nR}, n)$ codes such that $P_e^{(n)} \to 0$ as $n \to \infty$.
$R^* =$ infimum of all achievable rates.
Theorem. If $X_1, X_2, \dots$ are i.i.d. (called a discrete memoryless source, i.e., DMS), then $R^* = H(X)$.
Proof. Achievability: Show that $\forall R > H(X)$, we can find a sequence of $(2^{nR}, n)$ codes s.t. $P_e^{(n)} \to 0$.
For $\varepsilon_0 > 0$, consider $R = (1 + \varepsilon_0)H(X)$. We have $|A_\varepsilon^{(n)}| \le 2^{n(H(X)+\varepsilon)} = 2^{nR}$, where $\varepsilon = \varepsilon_0 H(X)$.
Encoder: Assign a distinct $m(x^n)$ to each $x^n \in A_\varepsilon^{(n)}$. Assign $m(x^n) = 1$ for all $x^n \notin A_\varepsilon^{(n)}$.
Decoder: Let $\hat{x}^n(M)$ be the unique sequence in $A_\varepsilon^{(n)}$ corresponding to $M$, or declare an error if $M = 1$.
$P_e^{(n)} = P(\hat{X}^n \neq X^n) = P(X^n \notin A_\varepsilon^{(n)}) \to 0$, ∵ AEP.
∴ $R$ is achievable. $\varepsilon_0$ is arbitrary $\Longrightarrow$ all $R > H(X)$ are achievable.
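To make the achievability scheme concrete, here is a minimal sketch of the typical-set encoder for a Bernoulli source; the parameters `p`, `n`, and `eps` are illustrative choices, not values from the notes.

```python
import itertools
import math

# Minimal sketch of the typical-set encoder for a Bernoulli(p) DMS.
p, n, eps = 0.2, 16, 0.1                              # illustrative values
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)    # entropy H(X)

def prob(x):
    k = sum(x)                                        # number of ones
    return p**k * (1 - p)**(n - k)

# A_eps^{(n)}: sequences whose per-symbol log-likelihood is within eps of H.
typical = [x for x in itertools.product((0, 1), repeat=n)
           if abs(-math.log2(prob(x)) / n - H) <= eps]

# Encoder: distinct indices (2 and up) for typical sequences, 1 otherwise;
# the decoder inverts the table and declares an error on index 1.
index = {x: i + 2 for i, x in enumerate(typical)}
def encode(x):
    return index.get(x, 1)

# The two facts the proof relies on:
print(len(typical) <= 2 ** (n * (H + eps)))   # |A_eps| <= 2^{n(H+eps)}
print(sum(prob(x) for x in typical))          # P(X^n in A_eps) -> 1 as n grows
```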
Converse: Show that for every sequence of $(2^{nR}, n)$ codes with $\lim_{n \to \infty} P_e^{(n)} = 0$, we must have $R \ge H(X)$.
Let $M = m(X^n)$. From Fano's inequality, $H(X^n \mid M) \le 1 + n P_e^{(n)} \log|\mathcal{X}|$. Then
$$\begin{aligned}
nR &\ge H(M) \\
&= I(X^n; M) \quad \because H(M \mid X^n) = 0 \\
&= H(X^n) - H(X^n \mid M) \\
&= nH(X) - H(X^n \mid M) \\
&\ge nH(X) - 1 - n P_e^{(n)} \log|\mathcal{X}|.
\end{aligned}$$
Thus, $R \ge H(X) - \frac{1}{n} - P_e^{(n)} \log|\mathcal{X}| \to H(X)$.
Uniquely decodable codes:
• Given any encoded string, only one possible source string produces it.
• But we may need to look at the entire string to determine even the first symbol of the source string.
Example: $\{0, 10, 11\}$ $\longrightarrow$ prefix-free $\Longrightarrow$ uniquely decodable.
$\{0, 1, 01\}$ $\longrightarrow$ not prefix-free, not uniquely decodable.
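The standard algorithmic test for unique decodability is the Sardinas–Patterson algorithm, which the notes do not name; the sketch below is an assumed illustration. It tracks "dangling suffixes" left over when one codeword is a prefix of another: the code fails to be uniquely decodable iff some dangling suffix is itself a codeword.

```python
def dangling(a, b):
    # If b strictly extends a, return the leftover suffix of b.
    return {b[len(a):]} if b.startswith(a) and a != b else set()

def uniquely_decodable(code):
    # Sardinas-Patterson test (sketch).
    code = set(code)
    s = set()
    for a in code:
        for b in code:
            s |= dangling(a, b)          # initial dangling suffixes
    seen = set()
    while s and not (s & code):
        if frozenset(s) in seen:
            return True                  # suffix sets cycle harmlessly
        seen.add(frozenset(s))
        nxt = set()
        for c in code:
            for d in s:
                nxt |= dangling(c, d) | dangling(d, c)
        s = nxt
    return not (s & code)                # suffix = codeword => ambiguity

print(uniquely_decodable(["0", "10", "11"]))  # True  (prefix-free)
print(uniquely_decodable(["0", "1", "01"]))   # False ("01" vs "0","1")
```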
• Every prefix-free code with codeword lengths $\{l_i : 1 \le i \le |\mathcal{X}|\}$ satisfies the Kraft inequality
$$\sum_{i=1}^{|\mathcal{X}|} 2^{-l_i} \le 1. \tag{1}$$
• If (1) holds for lengths $\{l_i : 1 \le i \le |\mathcal{X}|\}$, then $\exists$ a prefix-free code with these codeword lengths.
Proof.
• Embed the codewords in a binary tree of depth $l_{\max} = \max_i l_i$. If a codeword has length $l$, then it has $2^{l_{\max} - l}$ descendants at depth $l_{\max}$; prefix-freeness makes these descendant sets disjoint. There are at most $2^{l_{\max}}$ leaves.
$$\Longrightarrow \sum_i 2^{l_{\max} - l_i} \le 2^{l_{\max}} \Longrightarrow \sum_i 2^{-l_i} \le 1.$$
• Converse: Given $l_1, \dots, l_m$ s.t. $\sum_i 2^{-l_i} \le 1$, sort the lengths so $l_1 \le l_2 \le \dots \le l_m$. Assign the first codeword to a node at depth $l_1$ of a binary tree and delete its descendants.
• Repeat the process for each $l_i$; the Kraft inequality guarantees an unused node at depth $l_i$ always remains, so the construction yields a prefix-free code.
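One way to make this construction explicit (an assumed sketch, not the notes' own procedure): sort the lengths and read codeword $i$ off the binary expansion of the running Kraft sum. The resulting dyadic intervals are disjoint, which is exactly the prefix-free property.

```python
from fractions import Fraction

def kraft_codewords(lengths):
    """Build a prefix-free code from lengths satisfying Kraft's inequality."""
    assert sum(Fraction(1, 2**l) for l in lengths) <= 1, "Kraft violated"
    acc = Fraction(0)                     # running sum of 2^{-l_j}, j < i
    words = []
    for l in sorted(lengths):
        # Codeword = first l bits of the binary expansion of acc.
        words.append(format(int(acc * 2**l), f"0{l}b"))
        acc += Fraction(1, 2**l)
    return words

print(kraft_codewords([2, 1, 2]))   # ['0', '10', '11']
```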
We want the minimum expected length prefix-free code, which must also satisfy the Kraft inequality. Formulate as an optimization problem:
$$\min_{\{l_i\}} \sum_i p_i l_i \quad \text{s.t.} \quad \sum_i 2^{-l_i} \le 1.$$
Theorem 3. The expected length $L$ of any prefix-free code satisfies $L \ge H(X)$, with equality iff $p_i = 2^{-l_i}$. (Note: such $p_i$ are called dyadic probabilities.)
Proof.
$$\begin{aligned}
H(X) - L &= \sum_i p_i \log \frac{1}{p_i} - \sum_i p_i l_i \\
&= \sum_i p_i \log \frac{2^{-l_i}}{p_i} \\
&= -D(p \,\|\, q) \\
&\le 0,
\end{aligned}$$
where $q_i = 2^{-l_i}$, $i = 1, \dots, m$, and $q_{m+1} = 1 - \sum_i 2^{-l_i} \ge 0$, from Kraft's inequality.
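A quick numeric check of the identity $H(X) - L = -D(p\|q)$, using illustrative (non-dyadic) probabilities and a full code, neither taken from the notes; since the code is full, $\sum_i 2^{-l_i} = 1$ and the $q_{m+1}$ term vanishes.

```python
import math

p = [0.5, 0.3, 0.2]                    # illustrative source distribution
lens = [1, 2, 2]                       # a full prefix-free code: 0, 10, 11
q = [2**-l for l in lens]              # q_i = 2^{-l_i}

H = sum(pi * math.log2(1 / pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, lens))
D = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

print(math.isclose(H - L, -D))         # True: H(X) - L = -D(p||q)
print(L >= H)                          # True: equality only for dyadic p
```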
We can choose $l_i = \left\lceil \log_2 \frac{1}{p_i} \right\rceil$, which implies that
$$\log \frac{1}{p_i} \le l_i < \log \frac{1}{p_i} + 1$$
and hence
$$\sum_i p_i \log \frac{1}{p_i} \le \sum_i p_i l_i < \sum_i p_i \log \frac{1}{p_i} + 1.$$
Therefore, the optimal prefix-free code has average code length $L_{\min}$ that satisfies
$$H(X) \le L_{\min} < H(X) + 1.$$
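A sanity check of this bound with the ceiling-lengths choice above (illustrative probabilities; the resulting $L$ upper-bounds $L_{\min}$):

```python
import math

p = [0.45, 0.25, 0.2, 0.1]                         # illustrative values
H = sum(pi * math.log2(1 / pi) for pi in p)
lens = [math.ceil(math.log2(1 / pi)) for pi in p]  # l_i = ceil(log2(1/p_i))
L = sum(pi * li for pi, li in zip(p, lens))

print(sum(2**-l for l in lens) <= 1)   # the lengths satisfy Kraft
print(H <= L < H + 1)                  # True, so H <= L_min < H + 1
```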
How do we find the optimal prefix-free code? Is this an integer optimization problem, which is in general NP-hard? The leading information theorists before 1952, including Shannon, had no exact solution.
Huffman, a student in Fano's information theory class at MIT in 1952, came up with an optimal prefix-free code construction method. This became known as the Huffman code.
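A compact Huffman construction using a binary heap; this is one standard way to implement the merge rule rather than the notes' own pseudocode, and the example probabilities are illustrative.

```python
import heapq
from itertools import count

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    tiebreak = count()                   # heap tie-breaker; never compares dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # pop the two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())
print(code, L)   # dyadic probabilities => L = H(X) = 1.75 exactly
```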
Some simple properties:
1. If $p_i > p_j$, then $l_i \le l_j$.
2. A prefix-free code is said to be full if no new codeword can be added without destroying the prefix-free property. An optimal prefix-free code is full: $\sum_i 2^{-l_i} = 1$.
3. Define the sibling of a codeword to be the binary string that differs from the codeword only in the final digit. In an optimal code, every longest codeword has a sibling that is another longest codeword.
Proof. – If there were a longest codeword without a sibling, we could delete the last bit of this codeword and still satisfy the prefix-free property, reducing $L$ and contradicting optimality. $\Longrightarrow$ every longest codeword has a sibling.
– Exchange the longest codewords so that the two lowest-probability symbols are associated with two siblings on the tree. $\Longrightarrow$ $L$ remains the same.
• Every uniquely decodable code with codeword lengths $\{l_i : 1 \le i \le |\mathcal{X}|\}$ satisfies
$$\sum_{i=1}^{|\mathcal{X}|} 2^{-l_i} \le 1. \tag{2}$$
• If (2) holds for lengths $\{l_i : 1 \le i \le |\mathcal{X}|\}$, then $\exists$ a uniquely decodable code with these codeword lengths.
See the very elegant proof in Theorem 5.5.1 of Cover and Thomas. Would you use a uniquely
decodable code that is not prefix-free?