
EE7101: Introduction to Information and Coding Theory

Handout 5 – Data Compression


Tay Wee Peng

1 Fixed length codes

Definition 1. A source code is a mapping C : X → {0, 1}^L, where {0, 1}^L is the set of binary strings of length L.

Need |X| ≤ 2^L. Choosing the smallest such L, i.e., L = ⌈log |X|⌉, gives log |X| ≤ L < log |X| + 1.


There are |X|^n n-tuples. To encode these with a fixed-length code of L bits, we need log |X| ≤ L/n < log |X| + 1/n bits per source symbol.

• does not take into account that some symbols occur more frequently than others.

• for rare symbol sequences, can we drop them with "acceptable" error? (idea: AEP −→ ignore the non-typical set).
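As a quick numerical check of the fixed-length rate bound above, here is a minimal Python sketch (the function name is ours); the per-symbol cost approaches log |X| from above as the block length n grows:

```python
import math

def bits_per_symbol(alphabet_size, n):
    # Fixed-length coding of n-tuples: L = ceil(n * log2|X|) bits in total,
    # so the rate satisfies log|X| <= L/n < log|X| + 1/n.
    return math.ceil(n * math.log2(alphabet_size)) / n

for n in (1, 2, 10, 100):
    print(n, bits_per_symbol(5, n))  # tends to log2(5) ~ 2.32 from above
```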

2 Lossless source coding

X^n −→ Encoder −→ M −→ Decoder −→ X̂^n.
A (2^{nR}, n) lossless source code of rate R bits per source symbol consists of:

• an encoder m : X^n → [1 : 2^{nR}) = {1, 2, · · · , 2^{⌊nR⌋}}, which assigns an index M = m(x^n) to each x^n;

• a decoder x̂^n : [1 : 2^{nR}) → X^n ∪ {e}, which assigns an estimate X̂^n = x̂^n(M) in X^n or declares an error e.

Probability of error: P_e^{(n)} = P(X̂^n ≠ X^n).
A rate R is said to be achievable if there exists a sequence of (2^{nR}, n) codes such that P_e^{(n)} → 0 as n → ∞. Let R* denote the infimum of all achievable rates.

Theorem. If X_1, X_2, · · · are i.i.d. (called a discrete memoryless source, or DMS), then R* = H(X).

Proof. Achievability: Show that for all R > H(X), we can find a sequence of (2^{nR}, n) codes such that P_e^{(n)} → 0.
For ε′ > 0, consider R = (1 + ε′)H(X). We have |A_ε^{(n)}| ≤ 2^{nR}, where ε = ε′H(X).
Encoder: Assign a distinct index m(x^n) to each x^n ∈ A_ε^{(n)}. Assign m(x^n) = 1 for all x^n ∉ A_ε^{(n)}.
Decoder: Let x̂^n(M) be the unique sequence in A_ε^{(n)} corresponding to M, or declare an error if M = 1.
P_e^{(n)} = P(X̂^n ≠ X^n) = P(X^n ∉ A_ε^{(n)}) → 0, by the AEP.

∴ R is achievable. Since ε′ is arbitrary, all R > H(X) are achievable.
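The encoder in the achievability proof can be made concrete for a small Bernoulli source. Below is a minimal Python sketch (all names are ours; brute-force enumeration, feasible only for small n):

```python
import itertools, math

def typical_set(n, p, eps):
    # A_eps^(n): sequences x^n with |-(1/n) log2 p(x^n) - H(X)| <= eps
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    A = []
    for x in itertools.product([0, 1], repeat=n):
        k = sum(x)  # number of ones
        logp = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-logp / n - H) <= eps:
            A.append(x)
    return A, H

n, p, eps = 12, 0.2, 0.1
A, H = typical_set(n, p, eps)
# Index 1 is reserved for all atypical sequences (decoded as an error);
# each typical sequence gets its own distinct index, as in the proof.
index = {x: i + 2 for i, x in enumerate(A)}
rate = math.log2(len(A) + 1) / n  # bits per source symbol actually used
print(f"H(X) = {H:.3f}, |A| = {len(A)}, rate = {rate:.3f}")
```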
Converse: Show that for every sequence of (2^{nR}, n) codes with lim_{n→∞} P_e^{(n)} = 0, we must have R ≥ H(X).
Let M = m(X^n). From Fano's inequality,

H(X^n | M) ≤ H(X^n | X̂^n) ≤ 1 + nP_e^{(n)} log |X|.

Since M takes at most 2^{nR} values,

nR ≥ H(M)
   = I(X^n; M)    (∵ H(M | X^n) = 0)
   = nH(X) − H(X^n | M)
   ≥ n(H(X) − 1/n − P_e^{(n)} log |X|).

Thus, R ≥ H(X) − 1/n − P_e^{(n)} log |X| → H(X).

Can we do better using variable length codes?

3 Variable length codes

Example: X = {1, 2, 3, 4}.


X P(X) C(X)
1 0.5 0
2 0.25 10
3 0.125 110
4 0.125 111
We can conclude from the table above that lower probability symbols are assigned longer codewords.
L(C) = E[l(X)] = 0.5 · 1 + 0.25 · 2 + 0.125 · 3 + 0.125 · 3 = 1.75 bits.
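A two-line check of this arithmetic in Python (the dictionaries mirror the table above):

```python
import math

p = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}
code = {1: "0", 2: "10", 3: "110", 4: "111"}

L = sum(p[x] * len(code[x]) for x in p)            # expected codeword length
H = -sum(px * math.log2(px) for px in p.values())  # source entropy
print(L, H)  # both 1.75: the probabilities are dyadic (see Theorem 3)
```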
Definition 2. A code C is uniquely decodable if, for any string of source symbols X^n, the concatenation of the corresponding codewords C(X_1)C(X_2) · · · C(X_n) is different from that of any other string of source symbols.

• Given any encoded string, only one possible source string produces it.

• But may need to look at the entire string to determine even the first symbol in the source
string.

Example. Non-uniquely decodable.


C(1) = 0, C(2) = 1, C(3) = 01; then "01" can be decoded as either "1, 2" or "3".
The code is not uniquely decodable even though the codewords are all distinct.
Definition 3. A code is prefix-free if no codeword is a prefix of any other codeword (such a code is also called an instantaneous code).

Example: 0, 10, 11 −→ prefix-free.
0, 1, 01 −→ not prefix-free, and not uniquely decodable.
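Checking the prefix-free property mechanically is straightforward; a minimal sketch (the function name is ours):

```python
def is_prefix_free(codewords):
    # True iff no codeword is a prefix of any other codeword
    return not any(c != d and d.startswith(c)
                   for c in codewords for d in codewords)

print(is_prefix_free(["0", "10", "11"]))  # True
print(is_prefix_free(["0", "1", "01"]))   # False
```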

Theorem 1. A prefix-free code is uniquely decodable.

Proof. Code C ⇐⇒ binary tree T :

• each branch is labeled as 0 or 1,

• each leaf corresponds to a codeword.

∵ prefix-free, no codewords at intermediate nodes.


Concatenate C(X_1), C(X_2):

• same as grafting a copy of T onto each leaf of T,

• each leaf in new tree corresponds to a concatenated string,

• no strings at any intermediate nodes,

• all concatenations of codewords lie on distinct nodes.

=⇒ uniquely decodable.
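The tree argument also explains why prefix-free codes are called instantaneous: the decoder can emit each symbol as soon as a codeword is matched. A minimal sketch (names are ours), using the code from the table above:

```python
def decode_prefix_free(bits, code):
    # Walk the bit string left to right; since no codeword is a prefix of
    # another, the first match is the only possible one, so emit it at once.
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return out

code = {1: "0", 2: "10", 3: "110", 4: "111"}
print(decode_prefix_free("01011101100", code))  # [1, 2, 4, 1, 3, 1]
```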

Are all uniquely decodable codes prefix-free?


No. For example, C(1) = 0, C(2) = 01, C(3) = 011 is uniquely decodable (0 acts as a separator), but it is not prefix-free.
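Unique decodability in general can be tested with the Sardinas–Patterson algorithm (not covered in this handout; sketched here for completeness). A code is uniquely decodable iff no "dangling suffix" generated below is itself a codeword:

```python
def is_uniquely_decodable(C):
    C = set(C)
    # Initial dangling suffixes: w such that b = a + w for codewords a != b
    S = {b[len(a):] for a in C for b in C if a != b and b.startswith(a)}
    seen = set()
    while S:
        if S & C:                  # a dangling suffix equals a codeword
            return False
        seen |= S
        nxt = set()
        for s in S:
            for c in C:
                if c.startswith(s) and c != s:
                    nxt.add(c[len(s):])   # c = s + w
                if s.startswith(c) and s != c:
                    nxt.add(s[len(c):])   # s = c + w
        S = nxt - seen             # suffix lengths are bounded: terminates
    return True

print(is_uniquely_decodable(["0", "01", "011"]))  # True (0 is a separator)
print(is_uniquely_decodable(["0", "1", "01"]))    # False ("01" is ambiguous)
```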

Theorem 2. Kraft inequality for prefix-free codes

• Every prefix-free code with codeword lengths {l_i : 1 ≤ i ≤ |X|} satisfies

∑_{i=1}^{|X|} 2^{-l_i} ≤ 1.    (1)

• If (1) holds for lengths {l_i : 1 ≤ i ≤ |X|}, then there exists a prefix-free code with these codeword lengths.

Proof.

• Suppose C is prefix-free. Consider the corresponding binary tree.

• Let l_max be the length of the longest codeword in C.

• Extend the binary tree to depth l_max.

A codeword of length l has 2^{l_max − l} descendants at depth l_max, and there are at most 2^{l_max} leaves at that depth.
⟹ ∑_i 2^{l_max − l_i} ≤ 2^{l_max} ⟹ ∑_i 2^{-l_i} ≤ 1.

• Converse. Given l_1, · · · , l_m such that ∑_i 2^{-l_i} ≤ 1, we can construct a prefix-free code on the binary tree (see the sketch after this list):

• Label the first free node at depth l_1 as codeword 1, and remove its descendants.

• Repeat the process for the remaining lengths, in increasing order.
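The greedy labeling in this constructive proof can be written out directly. A minimal Python sketch (names are ours); it returns codewords for the lengths in sorted order:

```python
def code_from_lengths(lengths):
    # Constructive half of Theorem 2: if Kraft holds, give each length
    # (shortest first) the next free node at its depth, then discard that
    # node's descendants by skipping past it.
    if sum(2.0 ** -l for l in lengths) > 1:
        raise ValueError("Kraft inequality violated")
    codewords, node, prev = [], 0, 0
    for l in sorted(lengths):
        node <<= (l - prev)               # descend to depth l
        codewords.append(format(node, f"0{l}b"))
        node += 1                         # skip this leaf and its subtree
        prev = l
    return codewords

print(code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```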

We want the minimum expected length prefix-free code, which must also satisfy the Kraft inequality. Formulate this as an optimization problem:

min ∑_i p_i l_i
s.t. ∑_i 2^{-l_i} ≤ 1.

Relaxing the integer constraint and using a Lagrange multiplier:

J = ∑_i p_i l_i + λ ∑_i 2^{-l_i}.

∂J/∂l_i = p_i − λ 2^{-l_i} ln 2 = 0,

so 2^{-l_i} = p_i / (λ ln 2).

At the optimum, ∑_i 2^{-l_i} = 1 (equality in Kraft; otherwise we could decrease some l_i and reduce L). Summing 2^{-l_i} = p_i / (λ ln 2) over i then gives

λ = 1 / ln 2,

hence 2^{-l_i} = p_i and

l_i = log2(1/p_i) −→ not necessarily an integer.

Shannon code: l_i = ⌈log2(1/p_i)⌉.

Theorem 3. The expected length L of any prefix-free code satisfies L ≥ H(X), with equality iff p_i = 2^{-l_i}. (Note: such probabilities are called dyadic.)

Proof.

H(X) − L = ∑_i p_i log(1/p_i) − ∑_i p_i l_i
         = ∑_i p_i log(2^{-l_i}/p_i)
         = −D(p‖q)
         ≤ 0,

where q_i = 2^{-l_i}, i = 1, · · · , m, and q_{m+1} = 1 − ∑_i 2^{-l_i} ≥ 0 by Kraft's inequality (take p_{m+1} = 0 so that p and q are distributions on the same alphabet).

We can choose l_i = ⌈log2(1/p_i)⌉, which implies that

log(1/p_i) ≤ l_i < log(1/p_i) + 1,

∑_i p_i log(1/p_i) ≤ ∑_i p_i l_i < ∑_i p_i log(1/p_i) + 1.

Therefore, the optimal prefix-free code has average code length L_min satisfying

H(X) ≤ L_min < H(X) + 1.
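A quick numerical check of the Shannon code lengths and this bound, for a non-dyadic distribution chosen here purely for illustration (the Shannon code itself satisfies H(X) ≤ L < H(X) + 1, and L_min can only be smaller while staying ≥ H(X)):

```python
import math

p = [0.4, 0.3, 0.2, 0.1]
ls = [math.ceil(math.log2(1.0 / pi)) for pi in p]  # Shannon code lengths
H = -sum(pi * math.log2(pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, ls))
print(ls, sum(2.0 ** -l for l in ls))  # [2, 2, 3, 4], Kraft sum 0.6875 <= 1
print(H <= L < H + 1, H, L)            # True: H ~ 1.846, L = 2.4
```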

How do we find the optimal prefix-free code? Is this an integer optimization problem, which is in general NP-hard? Before 1952, the leading information theorists, including Shannon, had no exact solution.
Huffman, a student in Fano's information theory class at MIT in 1952, came up with an optimal prefix-free code construction. It became known as the Huffman code.
Some simple properties:

1 If p_i > p_j, then l_i ≤ l_j.

2 A prefix-free code is said to be full if no new codeword can be added without destroying the prefix-free property.
The optimal prefix-free code is full, i.e., ∑_i 2^{-l_i} = 1.

3 Define the sibling of a codeword to be the binary string that differs from the codeword only in the final digit.
For each longest codeword, its sibling is another longest codeword.

Proof. The sibling of a longest codeword cannot be an intermediate node (any codeword below it would be even longer); therefore, it is a codeword.

4 Suppose p_1 ≥ p_2 ≥ · · · ≥ p_m. Then there is an optimal prefix-free code in which the codewords for symbols m − 1 and m are siblings and have maximal length.

Proof. – If there is a longest codeword without a sibling, we can delete the last bit of this codeword and still satisfy the prefix-free property. ⟹ every longest codeword has a sibling.
– Exchange codewords so that the two lowest-probability symbols are associated with two siblings on the tree. ⟹ L remains the same.

This gives us a recursive way of obtaining an optimal prefix-free code.

L = L′ + p_{m−1} + p_m, where L′ is the expected length of the code obtained after combining symbols m − 1 and m.
⟹ L_min = L′_min + p_{m−1} + p_m. By induction, the Huffman code is optimal.
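A compact implementation of this recursive construction (a sketch; the heap-based merging is one standard way to realize it, and all names are ours):

```python
import heapq, itertools

def huffman_code(p):
    tie = itertools.count()  # breaks ties between equal probabilities
    # heap entries: (probability, tie-breaker, symbols in this subtree)
    heap = [(px, next(tie), (x,)) for x, px in p.items()]
    heapq.heapify(heap)
    code = {x: "" for x in p}
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)  # the two lowest-probability nodes
        p2, _, b = heapq.heappop(heap)  # become siblings (property 4)
        for x in a:
            code[x] = "0" + code[x]     # prepend one bit in each subtree
        for x in b:
            code[x] = "1" + code[x]
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return code

p = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}
print(huffman_code(p))  # {1: '0', 2: '10', 3: '110', 4: '111'}, up to relabeling
```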

Theorem 4. Kraft inequality for uniquely decodable codes

• Every uniquely decodable code with codeword lengths {l_i : 1 ≤ i ≤ |X|} satisfies

∑_{i=1}^{|X|} 2^{-l_i} ≤ 1.    (2)

• If (2) holds for lengths {l_i : 1 ≤ i ≤ |X|}, then there exists a uniquely decodable code with these codeword lengths.

See the very elegant proof in Theorem 5.5.1 of Cover and Thomas. Would you use a uniquely
decodable code that is not prefix-free?
