Information Theory
Chapter 10: Fundamental Limits in Information Theory
Lecture edition by K. Heikkinen
Chapter 10 goals
To understand the fundamental limits of information theory
To learn the meaning of entropy and channel capacity
To understand the coding theorems and their principles
To learn the most common coding techniques used in information theory (e.g. Huffman coding)
Chapter 10 contents
Introduction
Uncertainty, Information and Entropy
Source Coding Theorem
Data Compaction
Discrete Memoryless Channels
Mutual Information
Channel Capacity
Channel Coding Theorem
Differential Entropy and Mutual Information for Continuous Ensembles
Information Capacity Theorem
Implications of the Information Capacity Theorem
Rate Distortion Theory
Compression of Information
Introduction
What is information theory?
Information theory is needed to enable a communication system to carry information (signals) from a sender to a receiver over a communication channel
It deals with the mathematical modelling and analysis of a communication system
Its major task is to answer the questions of signal compression and transfer rate
Those answers are given in terms of entropy and channel capacity
Entropy
Entropy is defined in terms of the probabilistic behaviour of a source of information
In information theory the source output is modelled as a discrete random variable drawn from a fixed finite alphabet with given symbol probabilities
Entropy is the average information content per source symbol
\[
H(\mathcal{S}) = \sum_{k=0}^{K-1} p_k \log_2\!\left(\frac{1}{p_k}\right)
\]
Entropy (example)
Entropy of a Binary Memoryless Source
A binary memoryless source has symbols 0 and 1 with probabilities p0 and p1 = 1 - p0
Compute the entropy as a function of p0
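A minimal sketch of this computation in Python (the function name binary_entropy and the sample values of p0 are my own choices, not from the lecture): it evaluates H(p0) = -p0 log2 p0 - (1 - p0) log2(1 - p0), which peaks at 1 bit per symbol for p0 = 0.5.

```python
import math

def binary_entropy(p0: float) -> float:
    """Entropy of a binary memoryless source with P(0) = p0, in bits/symbol."""
    if p0 in (0.0, 1.0):          # limiting case: 0 * log(1/0) is taken as 0
        return 0.0
    p1 = 1.0 - p0
    return -(p0 * math.log2(p0) + p1 * math.log2(p1))

for p0 in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p0 = {p0:.2f}  H = {binary_entropy(p0):.3f} bit")
```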
Source Coding Theorem
Source coding means an efficient representation of the data generated by a discrete source
The representation is produced by a source encoder
The statistics of the source must be known (e.g. so that the more probable symbols can be given priority in the coding)
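As a reminder, the standard statement of the source coding theorem (not spelled out on the slide) is that the average codeword length of any uniquely decodable code is bounded below by the source entropy,

\[
\bar{L} \ge H(\mathcal{S}),
\]

and the bound can be approached arbitrarily closely by encoding sufficiently long blocks of source symbols.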
Data Compaction
Data compaction (a.k.a. lossless data compression) means that we remove redundant information from the signal prior to transmission
Basically this is achieved by assigning short descriptions to the most frequent outcomes of the source output, and longer descriptions to the less frequent ones
Source-coding schemes used in data compaction include prefix coding, Huffman coding, and Lempel-Ziv coding
Data Compaction example
Prefix coding has the important feature that it is always uniquely decodable, and it also satisfies the Kraft-McMillan inequality (see formula 10.22, p. 624)
Prefix codes can also be referred to as instantaneous codes, meaning that each codeword can be decoded as soon as its last bit is received
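For reference, the Kraft-McMillan inequality mentioned above states that the codeword lengths l_k of any uniquely decodable binary code must satisfy

\[
\sum_{k=0}^{K-1} 2^{-l_k} \le 1 .
\]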
Data Compaction example
In Huffman coding each symbol of a given alphabet is assigned a sequence of bits according to the symbol's probability
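A minimal sketch of Huffman code construction in Python (the alphabet and probabilities are illustrative choices of mine, not from the lecture): repeatedly merge the two least probable subtrees until one tree remains, then read each symbol's codeword off the merge history.

```python
import heapq

def huffman_code(probs: dict[str, float]) -> dict[str, str]:
    """Build a binary Huffman code for a symbol -> probability mapping."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable subtree -> prefix '0'
        p1, _, code1 = heapq.heappop(heap)   # next least probable    -> prefix '1'
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

# Example alphabet (illustrative probabilities only)
print(huffman_code({"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}))
```

As expected, the more probable symbols receive shorter codewords, and the resulting code is a prefix code.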
Data Compaction example
In Lempel-Ziv coding no probabilities of the source symbols are needed, which is actually most often the situation in practice
The LZ algorithm parses the source data stream into segments that are the shortest subsequences not encountered previously
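A minimal sketch of this parsing rule in Python (an LZ78-flavoured illustration; the input bit string is my own example, not from the lecture): each new phrase is the shortest prefix of the remaining input that has not been seen before.

```python
def lz_parse(data: str) -> list[str]:
    """Parse a string into the shortest phrases not encountered previously."""
    seen: set[str] = set()
    phrases: list[str] = []
    phrase = ""
    for symbol in data:
        phrase += symbol
        if phrase not in seen:        # shortest new subsequence found
            seen.add(phrase)
            phrases.append(phrase)
            phrase = ""
    if phrase:                        # trailing phrase was already seen before
        phrases.append(phrase)
    return phrases

# Example binary source sequence (illustrative only)
print(lz_parse("1011010010"))
# -> ['1', '0', '11', '01', '00', '10']
```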
Discrete Memoryless Channels
A discrete memoryless channel is a statistical model with an input X and an output Y, where Y is a noisy version of X (both are random variables)
In each time slot the channel accepts an input symbol X selected from a given alphabet
We can form a channel matrix of fixed transition probabilities relating the channel inputs and outputs, and we can assign probabilities to the input symbols
Discrete Memoryless Channels
Example : Binary symmetric channel
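For the binary symmetric channel with crossover (error) probability p (notation added here for completeness), the channel matrix takes the familiar form

\[
\mathbf{P} =
\begin{pmatrix}
p(y=0 \mid x=0) & p(y=1 \mid x=0) \\
p(y=0 \mid x=1) & p(y=1 \mid x=1)
\end{pmatrix}
=
\begin{pmatrix}
1-p & p \\
p & 1-p
\end{pmatrix}.
\]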
Mutual Information
Mutual information is defined using the conditional entropy of the channel input X, selected from a known alphabet, given the channel output
Conditional entropy means the uncertainty remaining about the channel input after the channel output has been observed
Mutual information has several properties:
it is symmetric in the channel input and output
it is always nonnegative
it is related to the joint entropy of the channel input and channel output
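In symbols (standard definitions, added here for completeness):

\[
I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y).
\]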
Mutual Information
Example: Imagine a village in the distance, in which the inhabitants are nerd-smurfs. One could divide these smurfs into two distinct groups.
In group A, 50% always report the bit code correctly if asked, 30% will lie if asked, and 20% go berserk
In group B the corresponding percentages are 30%, 50% and 20%
Calculate the mutual information
Channel Capacity
The capacity of a channel is defined as the intrinsic ability of the channel to convey information
Using mutual information, the channel capacity of a discrete memoryless channel is the maximum average mutual information in any single use of the channel, where the maximum is taken over all possible input probability distributions
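Written out (standard formulation):

\[
C = \max_{p(x)} I(X;Y),
\]

measured in bits per channel use.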
Discrete Memoryless Channels
Example : Binary symmetric channel revisited
capacity of a binary symmetric channel with given input probabilities
how the capacity varies with the error probability
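A minimal sketch in Python, assuming the standard result that the BSC capacity with error probability p is C = 1 - H(p), where H is the binary entropy function (the function names are my own):

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy function H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def bsc_capacity(p: float) -> float:
    """Capacity of a binary symmetric channel with error probability p."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.1, 0.25, 0.5):
    print(f"p = {p:.2f}  C = {bsc_capacity(p):.3f} bit/use")
```

The capacity falls from 1 bit/use for a noiseless channel (p = 0) to 0 for a completely random channel (p = 0.5).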
Channel Coding Theorem
Channel coding consists of mapping the incoming data sequence into a channel input sequence, and of the inverse mapping from the channel output sequence back into an output data sequence
The mapping operations are performed by the channel encoder and the channel decoder
Block diagram: Source → Channel encoder → Channel → Channel decoder → Destination
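The content of the channel coding theorem itself (standard statement, not spelled out on the slide): if the code rate R satisfies

\[
R \le C,
\]

then there exist coding schemes whose probability of decoding error can be made arbitrarily small; conversely, if R > C, reliable communication is not possible.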
Information Capacity Theorem
A channel in which the received signal is corrupted by noise is modelled as a discrete-time, memoryless Gaussian channel (with a power limitation)
example: sphere packing
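The theorem itself, in its standard Shannon-Hartley form (added here for completeness): the capacity of a band-limited Gaussian channel with bandwidth B, average signal power P and noise power spectral density N0/2 is

\[
C = B \log_2\!\left(1 + \frac{P}{N_0 B}\right) \ \text{bits per second.}
\]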
Implications of the Information
Capacity Theorem
Set of M-ary examples