Introduction to Communication Systems
Upamanyu Madhow
University of California, Santa Barbara
Contents
Preface 9
Acknowledgements 13
1 Introduction 15
1.1 Analog or Digital? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1.1 Analog communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.2 Digital communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.3 Why digital? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.1.4 Why analog design remains important . . . . . . . . . . . . . . . . . . . . 21
1.2 A Technology Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.3 Scope of this Textbook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4 Why Study Communication Systems? . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.5 Concept Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Endnotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.8.2 Frequency Domain Relationships . . . . . . . . . . . . . . . . . . . . . . . 69
2.8.3 Complex baseband equivalent of passband filtering . . . . . . . . . . . . . 75
2.8.4 General Comments on Complex Baseband . . . . . . . . . . . . . . . . . . 76
2.9 Wireless Channel Modeling in Complex Baseband . . . . . . . . . . . . . . . . . . 77
2.10 Concept Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.11 Endnotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.2 Nyquist Criterion for ISI Avoidance . . . . . . . . . . . . . . . . . . . . . . 165
4.3.3 Bandwidth efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.3.4 Power-bandwidth tradeoffs: a sneak preview . . . . . . . . . . . . . . . . . 170
4.3.5 The Nyquist criterion at the link level . . . . . . . . . . . . . . . . . . . . 172
4.3.6 Linear modulation as a building block . . . . . . . . . . . . . . . . . . . . 173
4.4 Orthogonal and Biorthogonal Modulation . . . . . . . . . . . . . . . . . . . . . . . 173
4.5 Proofs of the Nyquist theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
4.6 Concept Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
4.7 Endnotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
4.A Power spectral density of a linearly modulated signal . . . . . . . . . . . . . . . . 191
4.B Simulation resource: bandlimited pulses and upsampling . . . . . . . . . . . . . . 192
5.D.1 Baseband representation of passband white noise . . . . . . . . . . . . . . 277
5.E SNR Computations for Analog Modulation . . . . . . . . . . . . . . . . . . . . . . 278
5.E.1 Noise Model and SNR Benchmark . . . . . . . . . . . . . . . . . . . . . . . 278
5.E.2 SNR for Amplitude Modulation . . . . . . . . . . . . . . . . . . . . . . . . 278
5.E.3 SNR for Angle Modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
7.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
Epilogue 457
Preface
Progress in telecommunications over the past two decades has been nothing short of revolution-
ary, with communications taken for granted in modern society to the same extent as electricity.
There is therefore a persistent need for engineers who are well-versed in the principles of commu-
nication systems. These principles apply to communication between points in space, as well as
communication between points in time (i.e., storage). Digital systems are fast replacing analog
systems in both domains. This book has been written in response to the following core question:
what is the basic material that an undergraduate student with an interest in communications
should learn, in order to be well prepared for either industry or graduate school? For example, a
number of institutions only teach digital communication, assuming that analog communication
is dead or dying. Is that the right approach? From a purely pedagogical viewpoint, there are
critical questions related to mathematical preparation: how much mathematics must a student
learn to become well-versed in system design, what should be assumed as background, and at
what point should the mathematics that is not in the background be introduced? Classically,
students learn probability and random processes, and then tackle communication. This does not
quite work today: students increasingly (and I believe, rightly) question the applicability of the
material they learn, and are less interested in abstraction for its own sake. On the other hand,
I have found from my own teaching experience that students get truly excited about abstract
concepts when they discover their power in applications, and it is possible to provide the means
for such discovery using software packages such as Matlab. Thus, we have the opportunity to
get a new generation of students excited about this field: by covering abstractions “just in time”
to shed light on engineering design, and by reinforcing concepts immediately using software ex-
periments in addition to conventional pen-and-paper problem solving, we can remove the lag
between learning and application, and ensure that the concepts stick.
This textbook represents my attempt to act upon the preceding observations, and is an out-
growth of my lectures for a two-course undergraduate elective sequence on communication at
UCSB, which is often also taken by beginning graduate students. Thus, it can be used as
the basis for a two-course sequence in communication systems, or a single course on digital com-
munication, at the undergraduate or beginning graduate level. The book also provides a review
or introduction to communication systems for practitioners, easing the path to study of more
advanced graduate texts and the research literature. The prerequisite is a course on signals and
systems, together with an introductory course on probability. The required material on random
processes is included in the text.
A student who masters the material here should be well-prepared for either graduate school or
the telecommunications industry. The student should leave with an understanding of baseband
and passband signals and channels, modulation formats appropriate for these channels, random
processes and noise, a systematic framework for optimum demodulation based on signal space
concepts, performance analysis and power-bandwidth tradeoffs for common modulation schemes,
a hint of the power of information theory and channel coding, and an introduction to communication
techniques for dispersive channels and multiple antenna systems. Given the significant ongoing
research and development activity in wireless communication, and the fact that an understanding
of wireless link design provides a sound background for approaching other communication links,
material enabling hands-on discovery of key concepts for wireless system design is interspersed
throughout the textbook.
I should add that I firmly believe that the utility of this material goes well beyond communica-
tions, important as that field is. Communications systems design merges concepts from signals
and systems, probability and random processes, and statistical inference. Given the broad appli-
cability of these concepts, a background in communications is of value in a large variety of areas
requiring “systems thinking,” as I discuss briefly at the end of Chapter 1.
The goal of the lecture-style exposition in this book is to clearly articulate a selection of concepts
that I deem fundamental to communication system design, rather than to provide comprehensive
coverage. “Just in time” coverage is provided by organizing and limiting the material so that we
get to core concepts and applications as quickly as possible, and by sometimes asking the reader
to operate with partial information (which is, of course, standard operating procedure in the
real world of engineering design). However, the topics that we do cover are covered in sufficient
detail to enable the student to solve nontrivial problems and to obtain hands-on involvement via
software labs. Descriptive material that can easily be looked up online is omitted.
Organization
• Chapter 1 provides a perspective on communication systems, including a discussion of the
transition from analog to digital communication and how it colors the selection of material in
this text.
• Chapter 2 provides a review of signals and systems (biased towards communications applica-
tions), and then discusses the complex baseband representation of passband signals and systems,
emphasizing its critical role in modeling, design and implementation. A software lab on modeling
and undoing phase offsets in complex baseband, while providing a sneak preview of digital mod-
ulation, is included. Chapter 2 also includes a section on wireless channel modeling in complex
baseband using ray tracing, reinforced by a software lab which applies these ideas to simulate
link time variations for a lamppost-based broadband wireless network.
• Chapter 3 covers analog communication techniques which are relevant even as the world goes
digital, including superheterodyne reception and phase locked loops. Legacy analog modulation
techniques are discussed to illustrate core concepts, as well as in recognition of the fact that
suboptimal analog techniques such as envelope detection and limiter-discriminator detection
may have to be resurrected as we push the limits of digital communication in terms of speed
and power consumption. Software labs reinforce and extend concepts in amplitude and angle
modulation.
• Chapter 4 discusses digital modulation, including linear modulation using constellations such
as Pulse Amplitude Modulation (PAM), Quadrature Amplitude Modulation (QAM), and Phase
Shift Keying (PSK), and orthogonal modulation and its variants. The chapter includes discussion
of the number of degrees of freedom available on a bandlimited channel, the Nyquist criterion
for avoidance of intersymbol interference, and typical choices of Nyquist and square root Nyquist
signaling pulses. We also provide a sneak preview of power-bandwidth tradeoffs (with detailed
discussion postponed until the effect of noise has been modeled in Chapters 5 and 6). A software
lab providing a hands-on feel for Nyquist signaling is included in this chapter.
The material in Chapters 2 through 4 requires only a background in signals and systems.
• Chapter 5 provides a review of basic probability and random variables, and then introduces
random processes. This chapter provides detailed discussion of Gaussian random variables, vec-
tors and processes; this is essential for modeling noise in communication systems. Examples
which provide a preview of receiver operations in communication systems, and computation of
performance measures such as error probability and signal-to-noise ratio (SNR), are provided.
Discussion of circular symmetry of white noise, and noise analysis of analog modulation tech-
niques is placed in an appendix, since this is material that is often skipped in modern courses on
communication systems.
• Chapter 6 covers classical material on optimum demodulation for M-ary signaling in the pres-
ence of additive white Gaussian noise (AWGN). The background on Gaussian random variables,
vectors and processes developed in Chapter 5 is applied to derive optimal receivers, and to analyze
their performance. After discussing error probability computation as a function of SNR, we are
able to combine the materials in Chapters 4 and 6 for a detailed discussion of power-bandwidth
tradeoffs. Chapter 6 concludes with an introduction to link budget analysis, which provides
guidelines on the choice of physical link parameters such as transmit and receive antenna gains,
and distance between transmitter and receiver, using what we know about the dependence of
error probability as a function of SNR. This chapter includes a software lab which builds on the
Nyquist signaling lab in Chapter 4 by investigating the effect of noise. It also includes another
software lab simulating performance over a time-varying wireless channel, examining the effects
of fading and diversity, and introduces the concept of differential demodulation for avoidance of
explicit channel tracking.
Chapters 2 through 6 provide a systematic lecture-style exposition of what I consider core con-
cepts in communication at an undergraduate level.
• Chapter 7 provides a glimpse of information theory and coding whose goal is to stimulate the
reader to explore further using more advanced resources such as graduate courses and textbooks.
It shows the critical role of channel coding, provides an initial exposure to information-theoretic
performance benchmarks, and discusses belief propagation in detail, reinforcing the basic con-
cepts through a software lab.
• Chapter 8 provides a first exposure to the more advanced topics of communication over dis-
persive channels, and of multiple antenna systems, often termed space-time communication, or
Multiple Input Multiple Output (MIMO) communication. These topics are grouped together be-
cause they use similar signal processing tools. We emphasize lab-style “discovery” in this chapter
using three software labs, one on adaptive linear equalization for single-carrier modulation, one on
basic OFDM transceiver operations, and one on MIMO signal processing for space-time coding
and spatial multiplexing. The goal is for students to acquire hands-on insight that hopefully
motivates them to undertake a deeper and more systematic investigation.
• Finally, the epilogue contains speculation on future directions in communications research and
technology. The goal is to provide a high-level perspective on where mastery of the introductory
material in this textbook could lead, and to argue that the innovations that this field has already
seen set the stage for many exciting developments to come.
The role of software: Software problems and labs are integrated into the text, with “code frag-
ments” implementing core functionalities provided in the text. While code can be provided online,
separate from the text (and indeed, sample code is made available online for instructors), code
fragments are integrated into the text for two reasons. First, they enable readers to immediately
see the software realization of a key concept as they read the text. Second, I feel that students
learn more by putting in the work of writing their own code, building on these code fragments
if they wish, rather than using code that is easily available online. The particular software that
we use is Matlab, because of its widespread availability, and because of its importance in design
and performance evaluation in both academia and industry. However, the code fragments can
also be viewed as “pseudocode,” and can be easily implemented using other software packages or
languages. Block-based packages such as Simulink (which builds upon Matlab) are avoided here,
because the use of software here is pedagogical rather than aimed at, say, designing a complete
system by putting together subsystems as one might do in industry.
Acknowledgements
This book is an outgrowth of lecture notes for an undergraduate elective course sequence in
communications at UCSB, and I am grateful to the succession of students who have used, and
provided encouraging comments on, the evolution of the course sequence and the notes. I would
also like to acknowledge faculty in the communications area at UCSB who were kind enough to
give me a “lock” on these courses over the past few years, as I was developing this textbook.
The first priority in a research university is to run a vibrant research program, hence I must
acknowledge the extraordinarily capable graduate students in my research group over the years
this textbook was developed. They have done superb research with minimal supervision from
me, and the strength of their peer interactions and collaborations is what gave me the mental
space, and time, needed to write this textbook. Current and former group members who have
directly helped with aspects of this book include Andrew Irish, Babak Mamandipoor, Dinesh
Ramasamy, Maryam Eslami Rasekh, Sumit Singh, Sriram Venkateswaran, and Aseem Wadhwa.
I gratefully acknowledge the funding agencies that have provided support for our research group in
recent years, including the National Science Foundation (NSF), the Army Research Office (ARO),
the Defense Advanced Research Projects Agency (DARPA), and the Systems on Nanoscale In-
formation Fabrics (SONIC), a center supported by DARPA and Microelectronics Advanced Re-
search Corporation (MARCO). One of the primary advantages of a research university is that
undergraduate education is influenced, and kept up to date, by cutting edge research. This text-
book embodies this paradigm both in its approach (an emphasis on what one can do with what
one learns) and content (emphasis of concepts that are fundamental background for research in
the area).
I thank Phil Meyler and his colleagues at Cambridge University Press for encouraging me to ini-
tiate this project, and for their blend of patience and persistence in getting me to see it through
despite a host of other commitments. I also thank the anonymous reviewers of the book pro-
posal and sample chapters sent to Cambridge several years back for their encouragement and
constructive comments. I am also grateful to a number of faculty colleagues who have given en-
couragement, helpful suggestions, and pointers to alternative pedagogical approaches: Professor
Soura Dasgupta (University of Iowa), Professor Jerry Gibson (UCSB), Professor Gerhard Kramer
(Technische Universitat Munchen, Munich, Germany), Professor Phil Schniter (Ohio State Uni-
versity), and Professor Venu Veeravalli (University of Illinois at Urbana-Champaign). Finally, I
thank the anonymous reviewers of the almost-final manuscript for their helpful comments.
Finally, I would like to thank my wife and children for always being the most enjoyable and
interesting people to spend time with. Recharging my batteries in their company, and that of
our many pets, is what provides me with the energy needed for an active professional life.
Chapter 1
Introduction
Speech, audio, images, and video are the messages we wish to convey over a communication system. In their original form–
both during generation and consumption–these message signals are analog: they are continuous
time signals, with the signal values also lying in a continuum. When someone plays the violin,
an analog acoustic signal is generated (often translated to an analog electrical signal using a
microphone). Even when this music is recorded onto a digital storage medium such as a CD (using
the digital communication framework outlined in Section 1.1.2), when we ultimately listen to the
CD being played on an audio system, we hear an analog acoustic signal. The transmitted signals
corresponding to physical communication media are also analog. For example, in both wireless
and optical communication, we employ electromagnetic waves, which correspond to continuous
time electric and magnetic fields taking values in a continuum.
Figure 1.1: Block diagram for an analog communication system (information source → modulator → channel → demodulator → information consumer). The modulator transforms the message signal into the transmitted signal. The channel distorts and adds noise to the transmitted signal. The demodulator extracts an estimate of the message signal from the received signal arriving from the channel.
Given the analog nature of both the message signal and the communication medium, a natural
design choice is to map the analog message signal (e.g., an audio signal, translated from the
acoustic to electrical domain using a microphone) to an analog transmitted signal (e.g., a radio
wave carrying the audio signal) that is compatible with the physical medium over which we wish
to communicate (e.g., broadcasting audio over the air from an FM radio station). This approach
to communication system design, depicted in Figure 1.1, is termed analog communication. Early
communication systems were all analog: examples include AM (amplitude modulation) and FM
(frequency modulation) radio, analog television, first generation cellular phone technology (based
on FM), vinyl records, audio cassettes, and VHS or Beta videocassettes.
While analog communication might seem like the most natural option, it is in fact obsolete. Cel-
lular phone technologies from the second generation onwards are digital, vinyl records and audio
cassettes have been supplanted by CDs, and videocassettes by DVDs. Broadcast technologies
such as radio and television are often slower to upgrade because of economic and political factors,
but digital broadcast radio and television technologies are either replacing or sidestepping (e.g.,
via satellite) analog FM/AM radio and television broadcast. Let us now define what we mean by
digital communication, before discussing the reasons for the inexorable trend away from analog
and towards digital communication.
• Digitization of the message: The message signal, whatever its original form, is converted into a sequence of binary digits (zeros or ones), or bits. This is true whether the information source is text, speech, au-
dio or video. Techniques for performing the mapping from the original source signal to a bit
sequence are generically termed source coding. They often involve compression, or removal of
redundancy, in a manner that exploits the properties of the source signal (e.g., the heavy spatial
correlation among adjacent pixels in an image can be exploited to represent it more efficiently
than a pixel-by-pixel representation).
• Digital information transfer: Once the source encoding is done, our communication task re-
duces to reliably transferring the bit sequence at the output of the source encoder across space or
time, without worrying about the original source and the sophisticated tricks that have been used
to encode it. The performance of any communication system depends on the relative strengths
of the signal and noise or interference, and the distortions imposed by the channel. Shannon
showed that, once we fix these operational parameters for any communication channel, there
exists a maximum possible rate of reliable communication, termed the channel capacity. Thus,
given the information bits at the output of the source encoder, in principle, we can transmit them
reliably over a given link as long as the information rate is smaller than the channel capacity,
and we cannot transmit them reliably if the information rate is larger than the channel capac-
ity. This sharp transition between reliable and unreliable communication differs fundamentally
from analog communication, where the quality of the reproduced source signal typically degrades
gradually as the channel conditions get worse.
A block diagram for a typical digital communication system based on these two threads is shown
in Figure 1.2. We now briefly describe the role of each component, together with simplified
examples of its function.
Figure 1.2: Block diagram for a typical digital communication system (source encoder, channel encoder, modulator, channel, demodulator, channel decoder, source decoder).
Source encoder: As already discussed, the source encoder converts the message signal into a
sequence of information bits. The information bit rate depends on the nature of the message
signal (e.g., speech, audio, video) and the application requirements. Even when we fix the class
of message signals, the choice of source encoder is heavily dependent on the setting. For example,
video signals are heavily compressed when they are sent over a cellular link to a mobile device,
but are lightly compressed when sent to a high definition television (HDTV) set. A cellular link
can support a much smaller bit rate than, say, the cable connecting a DVD player to an HDTV
set, and a smaller mobile display device requires lower resolution than a large HDTV screen. In
general, the source encoder must be chosen such that the bit rate it generates can be supported
by the digital communication link we wish to transfer information over. Other than this, source
coding can be decoupled entirely from link design (we comment further on this a bit later).
Example: A laptop display may have resolution 1024 × 768 pixels. For a grayscale digital image,
the intensity for each pixel might be represented by 8 bits. Multiplying by the number of
pixels gives us about 6.3 million bits, or about 0.8 Mbyte (a byte equals 8 bits). However,
for a typical image, the intensities for neighboring pixels are heavily correlated, which can be
exploited for significantly reducing the number of bits required to represent the image, without
noticeably distorting it. For example, one could take a two-dimensional Fourier transform, which
concentrates most of the information in the image at lower frequencies and then discard many
of the high frequency coefficients. There are other possible transforms one could use, and also
several more processing stages, but the bottom line is that, for natural images, state-of-the-art
image compression algorithms can provide 10X compression (i.e., reduction in the number of bits
relative to the original uncompressed digital image) with hardly any perceptual degradation. Far
more aggressive compression ratios are possible if we are willing to tolerate more distortion. For
video, in addition to the spatial correlation exploited for image compression, we can also exploit
temporal correlation across successive frames.
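For instance, the following Matlab fragment (a minimal sketch, with an arbitrarily chosen synthetic test image and retention fraction) illustrates the idea of keeping only the low frequency coefficients of the two-dimensional Fourier transform:
% Illustrative sketch: transform-based compression by retaining only the
% low-frequency 2D Fourier coefficients. Test image and retained fraction
% are arbitrary choices for this example.
x = peaks(256);                          % stand-in for a smooth grayscale image
[M, N] = size(x);
X = fftshift(fft2(x));                   % 2D DFT, low frequencies moved to the center
mM = round(M*sqrt(0.1)); mN = round(N*sqrt(0.1));   % central block with ~10% of coefficients
mask = zeros(M, N);
mask(floor((M-mM)/2)+(1:mM), floor((N-mN)/2)+(1:mN)) = 1;
xr = real(ifft2(ifftshift(X.*mask)));    % reconstruct from the retained coefficients
subplot(1,2,1); imagesc(x); axis image; colormap gray; title('original');
subplot(1,2,2); imagesc(xr); axis image; colormap gray; title('10% of coefficients');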
Channel encoder: The channel encoder adds redundancy to the information bits obtained
from the source encoder, in order to facilitate error recovery after transmission over the channel.
It might appear that we are putting in too much work, adding redundancy just after the source
encoder has removed it. However, the redundancy added by the channel encoder is tailored to
the channel over which information transfer is to occur, whereas the redundancy in the original
message signal is beyond our control, so that it would be inefficient to keep it when we transmit
the signal over the channel.
Example: The noise and distortion introduced by the channel can cause errors in the bits we
send over it. Consider the following abstraction for a channel: we can send a string of bits (zeros
or ones) over it, and the channel randomly flips each bit with probability 0.01 (i.e., the channel
has a 1% error rate). If we cannot tolerate this error rate, we could repeat each bit that we wish
to send three times, and use a majority rule to decide on its value. Now, we only make an error
if two or more of the three bits are flipped by the channel. It is left as an exercise to calculate
that an error now happens with probability approximately 0.0003 (i.e., the error rate has gone
down to 0.03%). That is, we have improved performance by introducing redundancy. Of course,
there are far more sophisticated and efficient techniques for introducing redundancy than the
simple repetition strategy just described; see Chapter 7.
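The following Matlab fragment (a minimal sketch) computes this error probability and checks it by simulation; the number of simulated bits is an arbitrary choice.
p = 0.01;                          % probability that the channel flips a bit
p_rep = 3*p^2*(1-p) + p^3          % majority rule errs if 2 or 3 of the copies are flipped: ~3e-4
% Monte Carlo check (number of simulated bits is an arbitrary choice)
nbits = 1e6;
flips = rand(nbits, 3) < p;        % flip pattern for the three transmitted copies of each bit
p_sim = mean(sum(flips, 2) >= 2)   % fraction of bits decoded incorrectly by the majority rule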
Modulator: The modulator maps the coded bits at the output of the channel encoder to a
transmitted signal to be sent over the channel. For example, we may insist that the transmitted
signal fit within a given frequency band and adhere to stringent power constraints in a wireless
system, where interference between users and between co-existing systems is a major concern.
Unlicensed WiFi transmissions typically occupy 20-40 MHz of bandwidth in the 2.4 or 5 GHz
bands. Transmissions in fourth generation cellular systems may often occupy bandwidths ranging
from 1-20 MHz at frequencies ranging from 700 MHz to 3 GHz. While these signal bandwidths
are being increased in an effort to increase data rates (e.g., up to 160 MHz for emerging WiFi
standards, and up to 100 MHz for emerging cellular standards), and new frequency bands are
being actively explored (see the epilogue for more discussion), the transmitted signal still needs
to be shaped to fit within certain spectral constraints.
Example: Suppose that we send bit value 0 by transmitting the signal s(t), and bit value 1 by
transmitting −s(t). Even for this simple example, we must design the signal s(t) so it fits within
spectral constraints (e.g., two different users may use two different segments of spectrum to avoid
interfering with each other), and we must figure out how to prevent successive bits of the same
user from interfering with each other. For wireless communication, these signals are voltages
generated by circuits coupled to antennas, and are ultimately emitted as electromagnetic waves
from the antennas.
The channel encoder and modulator are typically jointly designed, keeping in mind the antici-
pated channel conditions, and the result is termed a coded modulator.
Channel: The channel distorts and adds noise, and possibly interference, to the transmitted sig-
nal. Much of our success in developing communication technologies has resulted from being able
to optimize communication strategies based on accurate mathematical models for the channel.
Such models are typically statistical, and are developed with significant effort using a combi-
nation of measurement and computation. The physical characteristics of the communication
medium vary widely, and hence so do the channel models. Wireline channels are typically well
modeled as linear and time-invariant, while optical fiber channels exhibit nonlinearities. Wireless
mobile channels are particularly challenging because of the time variations caused by mobility,
and due to the potential for interference due to the broadcast nature of the medium. The link
design also depends on system-level characteristics, such as whether or not the transmitter has
feedback regarding the channel, and what strategy is used to manage interference.
Example: Consider communication between a cellular base station and a mobile device. The elec-
tromagnetic waves emitted by the base station can reach the mobile’s antennas through multiple
paths, including bounces off streets and building surfaces. The received signal at the mobile can
be modeled as multiple copies of the transmitted signal with different gains and delays. These
gains and delays change due to mobility, but the rate of change is often slow compared to the
data rate, hence over short intervals, we can get away with modeling the channel as a linear
time-invariant system that the transmitted signal goes through before arriving at the receiver.
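The following Matlab fragment (a minimal sketch; the waveform, path gains and path delays are arbitrary illustrative values rather than measured parameters) models such a two-path channel as a discrete time LTI system:
% Two-path wireless channel modeled as a discrete-time LTI system (illustrative values).
Ts = 1e-7;                          % sampling interval: 100 ns
t = 0:Ts:1e-4;
x = cos(2*pi*1e5*t);                % stand-in transmitted waveform (100 kHz tone)
gains = [1, 0.6];                   % direct path and a weaker reflected path
delays = [0, 2e-6];                 % path delays in seconds
h = zeros(1, round(max(delays)/Ts) + 1);
h(round(delays/Ts) + 1) = gains;    % channel impulse response: scaled, delayed impulses
y = conv(x, h);                     % received signal: sum of scaled, delayed copies of x
plot(t, x, t, y(1:length(t))); legend('transmitted', 'received');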
Demodulator: The demodulator processes the signal received from the channel to produce bit
estimates to be fed to the channel decoder. It typically performs a number of signal processing
tasks, such as synchronization of phase, frequency and timing, and compensating for distortions
induced by the channel.
Example: Consider the simplest possible channel model, where the channel just adds noise to
the transmitted signal. In our earlier example of sending ±s(t) to send 0 or 1, the demodulator
must guess, based on the noisy received signal, which of these two options is true. It might make
a hard decision (e.g., guess that 0 was sent), or hedge its bets, and make a soft decision, saying,
for example, that it is 80% sure that the transmitted bit is a zero. There are a host of other
aspects of demodulation that we have swept under the rug: for example, before making any
decisions, the demodulator has to perform functions such as synchronization (making sure that
the receiver’s notion of time and frequency is consistent with the transmitter’s) and equalization
(compensating for the distortions due to the channel).
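The following Matlab fragment (a toy sketch; the rectangular pulse and noise level are arbitrary choices, and synchronization and equalization are ignored) conveys the flavor of these decisions:
% Toy demodulator for the +/- s(t) example over an additive noise channel.
s = ones(1, 8);                       % a simple pulse standing in for s(t)
bit = 1;                              % information bit to be sent (0 or 1)
tx = (1 - 2*bit)*s;                   % send +s(t) for bit 0, -s(t) for bit 1
rx = tx + 0.5*randn(size(s));         % channel adds noise (noise level is arbitrary)
z = sum(rx .* s);                     % correlate the received signal against s(t)
hard_decision = (z < 0)               % hard decision: guess 1 if the correlation is negative
% A soft decision would instead pass on the real-valued correlator output z (or a
% confidence level derived from it) to the channel decoder.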
Channel decoder: The channel decoder processes the imperfect bit estimates provided by
the demodulator, and exploits the controlled redundancy introduced by the channel encoder to
estimate the information bits.
Example: The channel decoder takes the guesses from the demodulator and uses the redundancies
in the channel code to clean up the decisions. In our simple example of repeating every bit three
times, it might use a majority rule to make its final decision if the demodulator is putting out
hard decisions. For soft decisions, it might use more sophisticated combining rules with improved
performance.
While we have described the demodulator and decoder as operating separately and in sequence
for simplicity, there can be significant benefits from iterative information exchange between the
two. In addition, for certain coded modulation strategies in which channel coding and modulation
are tightly coupled, the demodulator and channel decoder may be integrated into a single entity.
Source decoder: The source decoder processes the estimated information bits at the output
of the channel decoder to obtain an estimate of the message. The message format may or may
not be the same as that of the original message input to the source encoder: for example, the
source encoder may translate speech to text before encoding into bits, and the source decoder
may output a text message to the end user.
Example: For the example of a digital image considered earlier, the compressed image can be
translated back to a pixel-by-pixel representation by taking the inverse spatial Fourier transform
of the coefficients that survived the compression.
We are now ready to compare analog and digital communication, and discuss why the trend
towards digital is inevitable.
The preceding makes it clear that source-channel separation, and the associated bit pipe abstrac-
tion, is crucial in the formation and growth of modern communication networks. However, there
are some important caveats that are worth noting. Joint source-channel design can provide bet-
ter performance in some settings, especially when there are constraints on delay or complexity,
or if multiple users are being supported simultaneously on a given communication medium. In
practice, this means that “local” violations of the separation principle (e.g., over a wireless last
hop in a communication network) may be a useful design trick. Similarly, the bit pipe abstraction
used by network designers is too simplistic for the design of wireless networks at the edge of the
Internet: physical properties of the wireless channel such as interference, multipath propagation
and mobility must be taken into account in network engineering.
Cellular networks divide the world into cells, with “spatial reuse” of precious spectrum resources in cells that are “far
enough” apart. Base stations serve mobiles in their cells, and hand them off to adjacent base
stations when the mobile moves to another cell. While cellular networks were invented to support
voice calls for mobile users, today’s mobile devices (e.g., “smart phones” and tablet computers)
are actually powerful computers with displays large enough for users to consume video on the go.
Thus, cellular networks must now support seamless access to the Internet. The billions of mobile
devices in use easily outnumber desktop and laptop computers, so that the most important parts
of the Internet today are arguably the cellular networks at its edge. Mobile service providers are
having great difficulty keeping up with the increase in demand resulting from this convergence
of cellular and Internet; by some estimates, the capacity of cellular networks must be scaled up
by several orders of magnitude, at least in densely populated urban areas! As discussed in the
epilogue, a major challenge for the communication researcher and technologist, therefore, is to
come up with the breakthroughs required to deliver such capacity gains.
Another major success in wireless is WiFi, a catchy term for a class of standardized wireless
local area network (WLAN) technologies based on the IEEE 802.11 family of standards. Cur-
rently, WiFi networks use unlicensed spectrum in the 2.4 and 5 GHz bands, and have come into
widespread use in both residential and commercial environments. WiFi transceivers are now
incorporated into almost every computer and mobile device. One way of alleviating the cellular
capacity crunch that was just mentioned is to offload Internet access to the nearest WiFi net-
work. Of course, since different WiFi networks are often controlled by different entities, seamless
switching between cellular and WiFi is not always possible.
It is instructive to devote some thought to the contrast between cellular and WiFi technologies.
Cellular transceivers and networks are far more tightly engineered. They employ spectrum that
mobile operators pay a great deal of money to license, hence it is critical to use this spectrum
efficiently. Furthermore, cellular networks must provide robust wide-area coverage in the face of
rapid mobility (e.g., automobiles at highway speeds). In contrast, WiFi uses unlicensed (i.e., free!)
spectrum, must only provide local coverage, and typically handles much slower mobility (e.g.,
pedestrian motion through a home or building). As a result, WiFi can be more loosely engineered
than cellular. It is interesting to note that despite the deployment of many uncoordinated
WiFi networks in an unlicensed setting, WiFi typically provides acceptable performance, partly
because the relatively large amount of unlicensed spectrum (especially in the 5 GHz band) allows
for channel switching when encountering excessive interference, and partly because of naturally
occurring spatial reuse (WiFi networks that are “far enough” from each other do not interfere
with each other). Of course, in densely populated urban environments with many independently
deployed WiFi networks, the performance can deteriorate significantly, a phenomenon sometimes
referred to as a tragedy of the commons (individually selfish behavior leading to poor utilization
of a shared resource). As we briefly discuss in the epilogue, both the cellular and WiFi design
paradigms need to evolve to meet our future needs.
Technology story 3: Moore’s law. Moore’s “law” is actually an empirical observation at-
tributed to Gordon Moore, one of the founders of Intel Corporation. It can be paraphrased as
saying that the density of transistors in an integrated circuit, and hence the amount of compu-
tation per unit cost, can be expected to increase exponentially over time. This observation has
become a self-fulfilling prophecy, because it has been taken up by the semiconductor industry
as a growth benchmark driving their technology roadmap. While Moore’s law may be slowing
down somewhat, it has already had a spectacular impact on the communications industry by
drastically lowering the cost and increasing the speed of digital computation. By converting
analog signals to the digital domain as soon as possible, advanced transceiver algorithms can
be implemented in digital signal processing (DSP) using low-cost integrated circuits, so that re-
search breakthroughs in coding and modulation can be quickly transitioned into products. This
leads to economies of scale that have been critical to the growth of mass market products in both
wireless (e.g., cellular and WiFi) and wireline (e.g., cable modems and DSL) communication.
Figure 1.3: The Internet has a core of routers and servers connected by high-speed fiber links,
with wireless networks hanging off the edge (figure courtesy Aseem Wadhwa).
How do these stories come together? The sketch in Figure 1.3 highlights key building blocks
of the Internet today. The core of the network consists of powerful routers that direct packets
of data from an incoming edge to an outgoing edge, and servers (often housed in large data
centers) that serve up content requested by clients such as personal computers and mobile devices.
The elements in the core network are connected by high-speed optical fiber. Wireless can be
viewed as hanging off the edge of the Internet. Wide area cellular networks may have worldwide
coverage, but each base station is typically connected by a high-speed link to the wired Internet.
WiFi networks are wireless local area networks, typically deployed indoors (but potentially also
providing outdoor coverage for low-mobility scenarios) in homes and office buildings, connected to
the Internet via last mile links, which might run over copper wires (a legacy of wired telephony,
with transceivers typically upgraded to support broadband Internet access) or coaxial cable
(originally deployed to deliver cable television, but now also providing broadband Internet access).
Some areas have been upgraded to optical fiber to the curb or even to the home, while some
others might be remote enough to require wireless last mile solutions.
Figure 1.4: A segment of a cellular network with idealized hexagonal shapes (figure courtesy
Aseem Wadhwa).
Zooming in now on cellular networks, Figure 1.4 shows three adjacent cells in a cellular network
with hexagonal cells. A working definition of a cell is that it is the area around a base station
where the signal strength is higher than that from other base stations. Of course, under realistic
propagation conditions, cells are never hexagonal, but the concept of spatial reuse still holds: the
interference between distant cells can be neglected, hence they can use the same communication
resources. For example, in Figure 1.4, we might decide to use three different frequency bands
in the three cells shown, but might then reuse these bands in other cells. Figure 1.4 also shows
that a user may be simultaneously in range of multiple base stations when near cell boundaries.
Crossing these boundaries may result in a handoff from one base station to another. In addition,
near cell boundaries, a mobile device may be in communication with multiple base stations
simultaneously, a concept known as soft handoff.
It is useful for a communication system designer to be aware of the preceding “big picture” of
technology trends and network architectures in order to understand how to direct his or her
talents as these systems continue to evolve (the epilogue contains more detailed speculation
regarding this evolution). However, the first order of business is to acquire the fundamentals
required to get going in this field. These are quite simply stated: a communication system
designer must be comfortable with mathematical modeling (in order to understand the state of
the art, as well as to devise new models as required), and with devising and evaluating signal
processing algorithms based on these models. The goal of this textbook is to provide a first
exposure to such a technical background.
introductions to these topics via code fragments and software labs that hopefully encourage the
reader to explore further.
1.6 Endnotes
There are a large number of textbooks on communication systems at both the undergraduate and
graduate level. Undergraduate texts include Haykin [1], Proakis and Salehi [2], Pursley [3], and
Ziemer and Tranter [4]. Graduate texts, which typically focus on digital communication, include
Barry, Lee and Messerschmitt [5], Benedetto and Biglieri [6], Madhow [7], and Proakis and Salehi
[8]. The first coherent exposition of the modern theory of communication receiver design is in
the classical (graduate level) textbook by Wozencraft and Jacobs [9]. Other important classical
graduate level texts are Viterbi and Omura [10] and Blahut [11]. More specialized references (e.g.,
on signal processing, information theory, channel coding, wireless communication) are mentioned
in later chapters. In addition to these textbooks, an overview of many important topics can be
found in the recently updated mobile communications handbook [12] edited by Gibson.
This book is intended to be accessible to readers who have never been exposed to communication
systems before. It has some overlap with more advanced graduate texts (e.g., Chapters 2, 4, 5
and 6 here overlap heavily with Chapters 2 and 3 in the author’s own graduate text [7]), and
provides the technical background and motivation required to easily access these more advanced
texts. Of course, the best way to continue building expertise in the field is by actually working
in it. Research and development in this field requires study of the research literature, of more
specialized texts (e.g., on information theory, channel coding, synchronization), and of commer-
cial standards. The Institute for Electrical and Electronics Engineers (IEEE) is responsible for
publication of many conference proceedings and journals in communications: major conferences
include the IEEE Global Telecommunications Conference (Globecom) and the IEEE International Conference on Communications (ICC); major journals and magazines include the IEEE Communications Magazine, the IEEE Transactions on Communications, and the IEEE Journal on Selected Areas in Communications. Closely related fields such as information theory and signal processing have their own
conferences, journals and magazines. Major conferences include the IEEE International Sympo-
sium on Information Theory (ISIT) and IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP); journals include the IEEE Transactions on Information Theory and
the IEEE Transactions on Signal Processing. The IEEE also publishes a number of standards
online, such as the IEEE 802 family of standards for local area networks.
A useful resource for learning source coding and data compression, which are not discussed in
this text, is the textbook by Sayood [13]. Textbooks on core concepts in communication networks
include Bertsekas and Gallager [14], Kumar, Manjunath and Kuri [15], and Walrand and Varaiya
[16].
Chapter 2
Signals and Systems
A communication link involves several stages of signal manipulation: the transmitter transforms
the message into a signal that can be sent over a communication channel; the channel distorts
the signal and adds noise to it; and the receiver processes the noisy received signal to extract
the message. Thus, communication systems design must be based on a sound understanding of
signals, and the systems that shape them. In this chapter, we discuss concepts and terminology
from signals and systems, with a focus on how we plan to apply them in our discussion of
communication systems. Much of this chapter is a review of concepts with which the reader
might already be familiar from prior exposure to signals and systems. However, special attention
should be paid to the discussion of baseband and passband signals and systems (Sections 2.7
and 2.8). This material, which is crucial for our purpose, is typically not emphasized in a first
course on signals and systems. Additional material on the geometric relationship between signals
is covered in later chapters, when we discuss digital communication.
Chapter Plan: After a review of complex numbers and complex arithmetic in Section 2.1, we
provide some examples of useful signals in Section 2.2. We then discuss LTI systems and convolu-
tion in Section 2.3. This is followed by Fourier series (Section 2.4) and Fourier transform (Section
2.5). These sections (Sections 2.1 through 2.5) correspond to a review of material that
is part of the assumed background for the core content of this textbook. However, even readers
familiar with the material are encouraged to skim through it quickly in order to gain familiarity
with the notation. This gets us to the point where we can classify signals and systems based
on the frequency band they occupy. Specifically, we discuss baseband and passband signals and
systems in Sections 2.7 and 2.8. Messages are typically baseband, while signals sent over channels
(especially radio channels) are typically passband. We discuss methods for going from baseband
to passband and back. We specifically emphasize the fact that a real-valued passband signal is
equivalent (in a mathematically convenient and physically meaningful sense) to a complex-valued
baseband signal, called the complex baseband representation, or complex envelope, of the pass-
band signal. We note that the information carried by a passband signal resides in its complex
envelope, so that modulation (or the process of encoding messages in waveforms that can be
sent over physical channels) consists of mapping information into a complex envelope, and then
converting this complex envelope into a passband signal. We discuss the physical significance
of the rectangular form of the complex envelope, which corresponds to the in-phase (I) and
quadrature (Q) components of the passband signal, and that of the polar form of the complex
envelope, which corresponds to the envelope and phase of the passband signal. We conclude by
discussing the role of complex baseband in transceiver implementations, and by illustrating its
use for wireless channel modeling.
Software: The software labs in this chapter introduce the use of Matlab for signal processing.
They provide practice in writing Matlab code from scratch (i.e., without using prepackaged
routines or Simulink) for simple computations. Software Lab 2.0 is an introduction to the use of
Matlab for typical operations of interest to us, and illustrates how we approximate continuous
time operations in discrete time. Software Lab 2.1 shows how to model and undo the effects of
carrier phase offsets in complex baseband. Software Lab 2.2 develops complex baseband models
for wireless multipath channels, and explores the phenomenon of signal fading due to constructive
and destructive interference between the paths.
2.1 Complex Numbers
Figure 2.1: A complex number z, viewed as a point (x, y) in the two-dimensional plane, with polar coordinates given by the length r and the angle θ measured from the Re(z) axis.
A complex number z can be written as $z = x + jy$, where x and y are real numbers, and $j = \sqrt{-1}$. We say that x = Re(z) is the real part of z and y = Im(z) is the imaginary part of z. As depicted in Figure 2.1, it is often advantageous to interpret the complex number z as a two-dimensional real vector, which can be represented in rectangular form as (x, y) = (Re(z), Im(z)), or in polar form (r, θ) as
$$r = |z| = \sqrt{x^2 + y^2}, \qquad \theta = \angle z = \tan^{-1}\frac{y}{x} \quad (2.1)$$
We can go back from polar form to rectangular form as follows:
$$x = r\cos\theta, \quad y = r\sin\theta \quad (2.2)$$
Example 2.1.1 (Computations with complex numbers) Consider the complex numbers $z_1 = 1 + j$ and $z_2 = 2e^{-j\pi/6}$. Find $z_1 + z_2$, $z_1 z_2$, and $z_1/z_2$. Also specify $z_1^*$, $z_2^*$.
For complex addition, it is convenient to express both numbers in rectangular form. Thus,
$$z_2 = 2\left(\cos(-\pi/6) + j\sin(-\pi/6)\right) = \sqrt{3} - j$$
and
$$z_1 + z_2 = (1 + j) + (\sqrt{3} - j) = \sqrt{3} + 1$$
For complex multiplication and division, it is convenient to express both numbers in polar form. We obtain $z_1 = \sqrt{2}\, e^{j\pi/4}$ by applying (2.1). Now, from (2.11), we have
$$z_1 z_2 = \sqrt{2}\, e^{j\pi/4} \cdot 2 e^{-j\pi/6} = 2\sqrt{2}\, e^{j(\pi/4 - \pi/6)} = 2\sqrt{2}\, e^{j\pi/12}$$
Similarly,
$$z_1/z_2 = \frac{\sqrt{2}\, e^{j\pi/4}}{2 e^{-j\pi/6}} = \frac{1}{\sqrt{2}}\, e^{j(\pi/4 + \pi/6)} = \frac{1}{\sqrt{2}}\, e^{j5\pi/12}$$
Multiplication using the rectangular forms of the complex numbers yields the following:
$$z_1 z_2 = (1 + j)(\sqrt{3} - j) = \sqrt{3} - j + \sqrt{3}\, j + 1 = \left(\sqrt{3} + 1\right) + j\left(\sqrt{3} - 1\right)$$
Note that $z_1^* = 1 - j = \sqrt{2}\, e^{-j\pi/4}$ and $z_2^* = 2e^{j\pi/6} = \sqrt{3} + j$. Division using rectangular forms gives
$$z_1/z_2 = z_1 z_2^*/|z_2|^2 = (1 + j)(\sqrt{3} + j)/4 = \frac{\sqrt{3} - 1}{4} + j\,\frac{\sqrt{3} + 1}{4}$$
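Since Matlab handles complex arithmetic natively, the computations in this example are easily checked numerically (a minimal verification sketch):
z1 = 1 + 1j;                  % z1 in rectangular form
z2 = 2*exp(-1j*pi/6);         % z2 specified in polar form
z1 + z2                       % = sqrt(3) + 1 = 2.7321 (imaginary part is zero)
z1*z2                         % = 2.7321 + 0.7321j, i.e., 2*sqrt(2)*exp(j*pi/12)
z1/z2                         % = 0.1830 + 0.6830j, i.e., (1/sqrt(2))*exp(j*5*pi/12)
[abs(z1), angle(z1)]          % polar form of z1: sqrt(2) and pi/4
[conj(z1), conj(z2)]          % z1* = 1 - j and z2* = sqrt(3) + j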
Euler's formula is also handy for deriving trigonometric identities. For example, $\cos(\theta_1 + \theta_2) = \mathrm{Re}\left(e^{j(\theta_1 + \theta_2)}\right)$. But
$$e^{j(\theta_1 + \theta_2)} = e^{j\theta_1} e^{j\theta_2} = (\cos\theta_1 + j\sin\theta_1)(\cos\theta_2 + j\sin\theta_2) = (\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + j(\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2)$$
Taking the real part, we can read off the identity
$$\cos(\theta_1 + \theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2$$
2.2 Signals
Signal: A signal s(t) is a function of time (or some other independent variable, such as fre-
quency, or spatial coordinates) which has an interesting physical interpretation. For example, it
is generated by a transmitter, or processed by a receiver. While physically realizable signals such
as those sent over a wire or over the air must take real values, we shall see that it is extremely
useful (and physically meaningful) to consider a pair of real-valued signals, interpreted as the
real and imaginary parts of a complex-valued signal. Thus, in general, we allow signals to take
complex values.
Discrete versus Continuous Time: We generically use the notation x(t) to denote continuous
time signals (t taking real values), and x[n] to denote discrete time signals (n taking integer
values). A continuous time signal x(t) sampled at rate Ts produces discrete time samples x(nTs +
t0 ) (t0 an arbitrary offset), which we often denote as a discrete time signal x[n]. While signals
sent over a physical communication channel are inherently continuous time, implementations at
both the transmitter and receiver make heavy use of discrete time implementations on digitized
samples corresponding to the analog continuous time waveforms of interest.
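The following Matlab fragment (a minimal sketch with arbitrary parameter choices) illustrates this: a densely sampled sinusoid stands in for the continuous time signal, and coarser samples taken every Ts seconds form the corresponding discrete time signal.
% Approximate a "continuous time" 1 kHz sinusoid by a densely sampled version,
% and compare with samples taken every Ts seconds (all values are illustrative).
f0 = 1e3;                       % sinusoid frequency
t = 0:1e-6:5e-3;                % fine grid standing in for continuous time
Ts = 1e-4;                      % sampling interval, i.e., sampling rate 1/Ts = 10 kHz
n = 0:round(5e-3/Ts);           % sample indices
x = cos(2*pi*f0*t);             % "continuous time" signal x(t)
x_n = cos(2*pi*f0*n*Ts);        % discrete time signal x[n] = x(n Ts)
plot(t, x); hold on; stem(n*Ts, x_n); hold off;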
We now introduce some signals that recur often in this text.
Sinusoid: This is a periodic function of time of the form
s(t) = A cos(2πf0 t + θ) (2.20)
where A > 0 is the amplitude, f0 is the frequency, and θ ∈ [0, 2π] is the phase. By setting θ = 0,
we obtain a pure cosine A cos 2πf0 t, and by setting θ = −π/2, we obtain a pure sine A sin 2πf0 t.
In general, using (2.18), we can rewrite (2.20) as
s(t) = Ac cos 2πf0 t − As sin 2πf0 t (2.21)
where Ac = A cos θ and As = A sin θ are real numbers. Using Euler’s formula, we can write
Aejθ = Ac + jAs (2.22)
Thus, the parameters of a sinusoid at frequency f0 can be represented by the complex number in (2.22), with (2.20) using the polar form, and (2.21) the rectangular form, of this number. Note that $A = \sqrt{A_c^2 + A_s^2}$ and $\theta = \tan^{-1}(A_s/A_c)$.
Clearly, sinusoids with known amplitude, phase and frequency are perfectly predictable, and
hence cannot carry any information. As we shall see, information can be transmitted by making
the complex number Aejθ = Ac + jAs associated with the parameters of sinusoid vary in a way
that depends on the message to be conveyed. Of course, once this is done, the resulting signal
will no longer be a pure sinusoid, and part of the work of the communication system designer is
to decide what shape such a signal should take in the frequency domain.
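The equivalence of the polar form (2.20) and the rectangular form (2.21) is easily verified numerically (a minimal sketch with arbitrary parameter values):
A = 2; theta = pi/3; f0 = 1e3;            % arbitrary amplitude, phase and frequency
t = 0:1e-5:2e-3;
Ac = A*cos(theta); As = A*sin(theta);     % rectangular-form parameters, as in (2.22)
max(abs( A*cos(2*pi*f0*t + theta) - (Ac*cos(2*pi*f0*t) - As*sin(2*pi*f0*t)) ))
% the result is zero up to floating point roundoff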
We now define complex exponentials, which play a key role in understanding signals and systems
in the frequency domain.
Complex exponential: A complex exponential at a frequency f0 is defined as
$$s(t) = A e^{j(2\pi f_0 t + \theta)} = A\cos(2\pi f_0 t + \theta) + jA\sin(2\pi f_0 t + \theta) \quad (2.23)$$
Its real part is precisely the sinusoid (2.20), so that real-valued sinusoids are “contained in” complex exponentials. Further, as we shall soon
see, the set of complex exponentials {ej2πf t }, where f takes values in (−∞, ∞), form a “basis”
for a large class of signals (basically, for all signals that are of interest to us), and the Fourier
transform of a signal is simply its expansion with respect to this basis. Such observations are
Figure 2.2: The impulse function may be viewed as a limit of tall thin pulses (a → 0 in the
examples shown in the figure).
Figure 2.3: Multiplying a signal s(t) with a tall thin unit-area pulse p(t), supported on [t0 − a1, t0 + a2], to select its value at t0.
key to why complex exponentials play such an important role in signals and systems in general,
and in communication systems in particular.
The Delta, or Impulse, Function: Another signal that plays a crucial role in signals and sys-
tems is the delta function, or the unit impulse, which we denote by δ(t). Physically, we can think
of it as a narrow, tall pulse with unit area: examples are shown in Figure 2.2. Mathematically,
we can think of it as a limit of such pulses as the pulse width shrinks (and hence the pulse height
goes to infinity). Such a limit is not physically realizable, but it serves a very useful purpose in
terms of understanding the structure of physically realizable signals. That is, consider a signal
s(t) that varies smoothly, and multiply it with a tall, thin pulse of unit area, centered at time
t0 , as shown in Figure 2.3. If we now integrate the product, we obtain
Z ∞ Z t0 +a2 Z t0 +a1
s(t)p(t)dt = s(t)p(t)dt ≈ s(t0 ) p(t)dt = s(t0 )
−∞ t0 −a1 t0 −a1
That is, the preceding operation “selects” the value of the signal at time t0 . Taking the limit of
the tall thin pulse as its width a1 + a2 → 0, we get a translated version of the delta function,
namely, δ(t − t0 ). Note that the exact shape of the pulse does not matter in the preceding
argument. The delta function is therefore defined by means of the following sifting property: for
any “smooth” function s(t), we have
Z ∞
s(t)δ(t − t0 )dt = s(t0 ) Sifting property of the impulse (2.24)
−∞
Thus, the delta function is defined mathematically by the way it acts on other signals, rather
than as a signal by itself. However, it is also important to keep in mind its intuitive interpretation
as (the limit of) a tall, thin, pulse of unit area.
32
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
The following function is useful for expressing signals compactly.
Indicator function: We use IA to denote the indicator function of a set A, defined as
1, x ∈ A
IA (x) =
0, otherwise
The indicator function of an interval is a rectangular pulse, as shown in Figure 2.4.
I (x)
[a,b]
x
a b
v(t)
u(t)
3
2
1
2
t 1 t
−1 1 −1
−1
Figure 2.5: The functions u(t) = 2(1 − |t|)I[−1,1] (t) and v(t) = 3I[−1,0] (t) + I[0,1] (t) − I[1,2] (t) can
be written compactly in terms of indicator functions.
The indicator function can also be used to compactly express more complex signals, as shown in
the examples in Figure 2.5.
Sinc function: The sinc function, plotted in Figure 2.6, is defined as
sin(πx)
sinc(x) =
πx
where the value at x = 0 is defined as the limit as x → 0 to be sinc(0) = 1. Since | sin(πx)| ≤ 1,
1
we have that |sinc(x)| ≤ π|x| , with equality if and only if x is an odd multiple of 1/2. That is,
1
the sinc function exhibits a sinusoidal variation, with an envelope that decays as |x| .
The analogy between signals and vectors: Even though signals can be complicated functions
of time that live in an infinite-dimensional space, the mathematics for manipulating them are
very similar to those for manipulating finite-dimensional vectors, with sums replaced by integrals.
A key building block of communication theory is the relative geometry of the signals used, which
is governed by the inner products between signals. Inner products for continuous-time signals can
be defined in a manner exactly analogous to the corresponding definitions in finite-dimensional
vector space.
Inner Product: The inner product for two m × 1 complex vectors s = (s[1], ..., s[m])T and
r = (r[1], ..., r[m])T is given by
Xm
hs, ri = s[i]r ∗ [i] = rH s (2.25)
i=1
33
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
1
0.8
0.6
0.4
sinc(x)
0.2
−0.2
−0.4
−5 −4 −3 −2 −1 0 1 2 3 4 5
x
Similarly, we define the inner product of two (possibly complex-valued) signals s(t) and r(t) as
follows: Z ∞
hs, ri = s(t)r ∗ (t) dt (2.26)
−∞
where a1 , a2 are complex-valued constants, and s, s1 , s2 , r, r1 , r2 are signals (or vectors). The
complex conjugation when we pull out constants from the second argument of the inner product
is something that we need to maintain awareness of when computing inner products for complex-
valued signals.
Energy and Norm: The energy Es of a signal s is defined as its inner product with itself:
Z ∞
2
Es = ||s|| = hs, si = |s(t)|2 dt (2.27)
−∞
where ||s|| denotes the norm of s. If the energy of s is zero, then s must be zero “almost
everywhere” (e.g., s(t) cannot be nonzero over any interval, no matter how small its length).
For continuous-time signals, we take this to be equivalent to being zero everywhere. With this
understanding, ||s|| = 0 implies that s is zero, which is a property that is true for norms in
finite-dimensional vector spaces.
Example 2.2.1 (Energy computations) Consider s(t) = 2I[0,T ] + jI[T /2,2T ] . Writing it out in
more detail, we have
2, 0 ≤ t < T /2
s(t) = 2 + j, T /2 ≤ t < T
j, T ≤ t < 2T
34
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
As another example, consider s(t) = e−3|t|+j2πt , for which the energy is given by
Z ∞ Z ∞ Z ∞
2 −3|t|+j2πt 2 −6|t|
||s|| = |e | dt = e dt = 2 e−6t dt = 1/3
−∞ −∞ 0
Note that the complex phase term j2πt does not affect the energy, since it goes away when we
take the magnitude.
Power: The power of a signal s(t) is defined as the time average of its energy computed over a
large time interval:
Z To
1 2
Ps = lim |s(t)|2 dt (2.28)
To →∞ To − To
2
That is, we compute the time average over an observation interval of length To , and then let
the observation interval get large. We can now rewrite the power computation in (2.28) in this
notation as follows.
Power: The power of a signal s(t) is defined as
Ps = |s(t)|2 (2.30)
Thus,
A
Z Z
| s(t)dt| = | s(t)dt| ≤ ℓ maxt |s(t)| = Aℓ <
I Ir f0
35
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
K/f 0
1/f 0 Interval Ir (length l )
[ ]
Interval I
Figure 2.7: The interval I for computing the time average of a periodic function with period
1/f0 can be decomposed into an integer number K of periods, with the remaining interval Ir of
length ℓ < f10 .
Essentially the same argument implies that, in general, the time average of a periodic signal
equals the average over a single period. We use this fact without further comment henceforth.
Power and DC value of a sinusoid: For a real-valued sinusoid s(t) = A cos(2πf0 t + θ), we
can use the results derived for complex exponentials above. Using Euler’s identity, a real-valued
sinusoid at f0 is a sum of complex exponentials at ±f0 :
A j(2πf0 t+θ) A −j(2πf0 t+θ)
s(t) = e + e
2 2
Since each complex exponential has zero DC value, we obtain
s=0
A2 A2 A2
Ps = s2 (t) = A2 cos2 (2πf0 t + θ) = + cos(4πf0 t + 2θ) =
2 2 2
since the DC value of the sinusoid at 2f0 is zero.
36
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Our primary focus here is on linear time invariant (LTI) systems, which provide good models
for filters at the transmitter and receiver, as well as for the distortion induced by a variety of
channels. We shall see that the input-output relationship is particularly easy to characterize for
such systems.
Linear system: Let x1 (t) and x2 (t) denote arbitrary input signals, and let y1 (t) and y2 (t)
denote the corresponding system outputs, respectively. Then, for arbitrary scalars a1 and a2 , the
response of the system to input a1 x1 (t) + a2 x2 (t) is a1 y1 (t) + a2 y2 (t).
Time invariant system: Let y(t) denote the system response to an input x(t). Then the
system response to a time-shifted version of the input, x1 (t) = x(t − t0 ) is y1 (t) = y(t − t0 ). That
is, a time shift in the input causes an identical time shift in the output.
Example 2.3.1 Examples of linear systems It can (and should) be checked that the following
systems are linear. These examples show that linear systems may or may not be time invariant.
Example 2.3.2 Examples of time invariant systems It can (and should) be checked that
the following systems are time invariant. These examples show that time invariant systems may
or may not be linear.
y(t) = e2x(t−1) nonlinear
Z t
y(t) = x(τ )e−(t−τ ) dτ linear
−∞
Z t+1
y(t) = x2 (τ )dτ nonlinear
t−1
Linear time invariant system: A linear time invariant (LTI) system is (unsurprisingly) defined
to be a system which is both linear and time invariant. What is surprising, however, is how
powerful the LTI property is in terms of dictating what the input-output relationship must look
like. Specifically, if we know the impulse response of an LTI system (i.e., the output signal
when the input signal is the delta function), then we can compute the system response for any
input signal. Before deriving and stating this result, we illustrate the LTI property using an
example; see Figure 2.8. Suppose that the response of an LTI system to the rectangular pulse
p1 (t) = I[− 1 , 1 ] (t) is given by the trapezoidal waveform h1 (t). We can now compute the system
2 2
response to any linear combination of time shifts of the pulse p(t), as illustrated by the example
in the figure. More generally, P using the LTI property,
P we infer that the response to an input
signal of the form x(t) = i ai p1 (t − ti ) is y(t) = i ai h1 (t − ti ).
Can we extend the preceding idea to compute the system response to arbitrary input signals?
The answer is yes: if we know the system response to thinner and thinner pulses, then we
can approximate arbitrary signals better and better using linear combinations of shifts of these
pulses. Consider p∆ (t) = ∆1 I[− ∆ , ∆ ] (t), where ∆ > 0 is getting smaller and smaller. Note that we
2 2
have normalized the area of the pulse to unity, so that the limit of p∆ (t) as ∆ → 0 is the delta
37
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
h 1(t)
p (t)
1
1 1
S
t
−0.5 0.5 t 0 1 2 3
2h1 (t)
2
x(t) = 2 p1 (t) − p (t−1)
1 y(t)
2 2
0 3
=
S + 3 4
1.5 t
t 0 1 2
−0.5
−h1 (t−1)
−1 −1
1 4
−1
Figure 2.8: Given that the response of an LTI system S to the pulse p1 (t) is h1 (t), we can use the
LTI property to infer that the response to x(t) = 2p1 (t) − p1 (t − 1) is y(t) = 2h1 (t) − h1 (t − 1).
x(t)
... ...
t
Figure 2.9: A smooth signal can be approximated as a linear combination of shifts of tall thin
pulses.
38
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
function. Figure 2.9 shows how to approximate a smooth input signal as a linear combination of
shifts of p∆ (t). That is, for ∆ small, we have
∞
X
x(t) ≈ x∆ (t) = x(k∆)∆p∆ (t − k∆) (2.31)
k=−∞
If the system response to p∆ (t) is h∆ (t), then we can use the LTI property to compute the
response y∆ (t) to x∆ (t), and use this to approximate the response y(t) to the input x(t), as
follows: ∞
X
y(t) ≈ y∆ (t) = x(k∆)∆h∆ (t − k∆) (2.32)
k=−∞
As ∆ → 0, the sums above tend to integrals, and the pulse p∆ (t) tends to the delta function δ(t).
The approximation to the input signal in equation (2.31) becomes exact, with the sum tending
to an integral: Z ∞
lim x∆ (t) = x(t) = x(τ )δ(t − τ )dτ
∆→0 −∞
replacing the discrete time shifts k∆ by the continuous variable τ , the discrete increment ∆ by
the infinitesimal dτ , and the sum by an integral. This is just a restatement of the sifting property
of the impulse. That is, an arbitrary input signal can be expressed as a linear combination of
time-shifted versions of the delta function, where we now consider a continuum of time shifts.
In similar fashion, the approximation to the output signal in (2.32) becomes exact, with the sum
reducing to the following convolution integral:
Z ∞
lim y∆ (t) = y(t) = x(τ )h(t − τ )dτ (2.33)
∆→0 −∞
Note that τ is a dummy variable that is integrated out in order to determine the value of the
signal v(t) at each possible time t. The role of u1 and u2 in the integral can be exchanged. This
can be proved using a change of variables, replacing t − τ by τ . We often drop the time variable,
and write v = u1 ∗ u2 = u2 ∗ u1 .
An LTI system is completely characterized by its impulse response: As derived in
(2.33), the output y of an LTI system is the convolution of the input signal u and the system
impulse response h. That is, y = u ∗ h. From (2.34), we realize that the role of the signal and the
system can be exchanged: that is, we would get the same output y if a signal h is sent through
a system with impulse response u.
Flip and slide: Consider the expression for the convolution in (2.34):
Z ∞
v(t) = u1 (τ )u2 (t − τ ) dτ
−∞
Fix a value of time t at which we wish to evaluate v. In order to compute v(t), we must multiply
two functions of a “dummy variable” τ and then integrate over τ . In particular, s2 (τ ) = u2 (−τ )
is the signal u2 (τ ) flipped around the origin, so that u2 (t − τ ) = u2 (−(τ − t)) = s2 (τ − t) is
s2 (τ ) translated to the right by t (if t < 0, translation to the right by t actually corresponds to
39
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
translation is to the left by |t|). In short, the mechanics of computing the convolution involves
flipping and sliding one of the signals, multiplying with the other signal, and integrating. Pictures
are extremely helpful when doing such computations by hand, as illustrated by the following
example.
u1 (τ ) u 2( τ ) Flip u 2 (−τ )
τ τ τ
5 11 1 3 −3 −1
u 2 (t−τ )
(a) t−1 < 5
τ
t−3 t−1
u 2 (t−τ )
(b) t−3 < 5, t−1 > 5
Slide by t
τ
t−3 t−1 Different ranges of t
u 2 (t−τ ) depicted in (a)−(e)
(c) t−3 > 5, t−1 < 11
τ
t−3 t−1
u 2 (t−τ )
(d) t−3 < 11, t−1 > 11
τ
t−3 t−1
u 2 (t−τ )
(e) t−3 > 11
τ
t−3 t−1
Figure 2.10: Illustrating the flip and slide operation for the convolution of two rectangular pulses.
v(t)
6 8 12 14 t
Figure 2.11: The convolution of the two rectangular pulses in Example 2.3.3 results in a trape-
zoidal pulse.
Example 2.3.3 Convolving rectangular pulses: Consider the rectangular pulses u1 (t) =
I[5,11] (t) and u2 (t) = I[1,3] (t). We wish to compute the convolution
Z ∞
v(t) = (u1 ∗ u2 )(t) = u1 (τ )u2 (t − τ )dτ
−∞
We now draw pictures of the signals involved in these “flip and slide” computations in order to
figure out the limits of integration for different ranges of t. Figure 2.10 shows that there are five
different ranges of interest, and yields the following result:
(a) For t < 6, u1 (τ )u2 (t − τ ) ≡ 0, so that v(t) = 0.
(b) For 6 < t < 8, u1 (τ )u2 (t − τ ) = 1 for 5 < τ < t − 1, so that
Z t−1
v(t) = dτ = t − 6
5
40
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
(c) For 8 < t < 12, u1 (τ )u2 (t − τ ) = 1 for t − 3 < τ < t − 1, so that
Z t−1
v(t) = dτ = 2
t−3
(d) For 12 < t < 14, u1 (τ )u2 (t − τ ) = 1 for t − 3 < τ < 11, so that
Z 11
v(t) = dτ = 11 − (t − 3) = 14 − t
t−3
1 1 a
* = −(b+a)/2 (b+a)/2
−a/2 a/2 −b/2 b/2 −(b−a)/2 (b−a)/2
1 1 a
* =
−a/2 a/2 −a/2 a/2 −a a
Figure 2.12: Convolution of two rectangular pulses as a function of pulse durations. The trape-
zoidal pulse reduces to a triangular pulse for equal pulse durations.
It is useful to record the general form of the convolution between two rectangular pulses of the
form I[−a/2,a/2] (t) and I[−b/2,b/2] (t), where we take a ≤ b without loss of generality. The result is
a trapezoidal pulse, which reduces to a triangular pulse for a = b, as shown in Figure 2.12. Once
we know this, using the LTI property, we can infer the convolution of any signals which can be
expressed as a linear combination of shifts of rectangular pulses.
Occasional notational sloppiness can be useful: As the preceding example shows, a con-
volution computation as in (2.34) requires a careful distinction between the variable t at which
the convolution is being evaluated, and the dummy variable τ . This is why we make sure that
the dummy variable does not appear in our notation (s ∗ r)(t) for the convolution between sig-
nals s(t) and r(t). However, it is sometimes convenient to abuse notation and use the notation
s(t) ∗ r(t) instead, as long we remain aware of what we are doing. For example, this enables us
to compactly state the following linear time invariance (LTI) property:
(a1 s1 (t − t1 ) + a2 s2 (t − t2 )) ∗ r(t) = a1 (s1 ∗ r)(t − t1 ) + a2 (s2 ∗ r)(t − t2 )
for any complex gains a1 and a2 , and any time offsets t1 and t2 .
Example 2.3.4 (Modeling a multipath channel) We can get a delayed version of a signal
by convolving it with a delayed impulse as follows:
y1 (t) = u(t) ∗ δ(t − t1 ) = u(t − t1 ) (2.35)
41
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
To see this, compute
Z Z
y1 (t) = u(τ )δ(t − τ − t1 )dτ = u(τ )δ(τ − (t − t1 ))dτ = u(t − t1 )
where we first use the fact that the delta function is even, and then use its sifting property.
Reflector
LOS
TX antenna path
RX antenna
Reflector
Figure 2.13: Multipath channels typical of wireless communication can include line of sight (LOS)
and reflected paths.
Equation (2.35) immediately tells us how to model multipath channels, in which multiple scat-
tered versions of a transmitted signal u(t) combine to give a received signal y(t) which is a
superposition of delayed versions of the transmitted signal, as illustrated in Figure 2.13:
(plus noise, which we have not talked about yet). From (2.35), we see that we can write
y(t) = α1 u(t) ∗ δ(t − τ1 ) + ... + αm u(t) ∗ δ(t − τm ) = u(t) ∗ (α1 δ(t − τ1 ) + ... + αm δ(t − τm ))
That is, we can model the received signal as a convolution of the transmitted signal with a
channel impulse response which is a linear combination of time-shifted impulses:
Figure 2.14 illustrates how a rectangular pulse spreads as it goes through a multipath channel
with impulse response h(t) = δ(t − 1) − 0.5δ(t − 1.5) + 0.5δ(t − 3.5). While the gains {αk } in this
example are real-valued, as we shall soon see (in Section 2.8), we need to allow both the signal
u(t) and the gains {αk } to take complex values in order to model, for example, signals carrying
information over radio channels.
42
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Complex exponential through an LTI system: In order to understand LTI systems in the
frequency domain, let us consider what happens to a complex exponential u(t) = ej2πf0 t when it
goes through an LTI system with impulse response h(t). The output is given by
R∞
y(t) = (u ∗ h)(t) = −∞ h(τ )ej2πf0 (t−τ ) dτ
R∞ (2.37)
= ej2πf0 t −∞ h(τ )e−j2πf0 τ dτ = H(f0 )ej2πf0 t
where Z ∞
H(f0 ) = h(τ )e−j2πf0 τ dτ
−∞
is the Fourier transform of h evaluated at the frequency f0 . We discuss the Fourier transform
and its properties in more detail shortly.
Complex exponentials are eigenfunctions of LTI systems: Recall that an eigenvector of
a matrix H is any vector x that satisfies Hx = λx. That is, the matrix leaves its eigenvectors
unchanged except for a scale factor λ, which is the eigenvalue associated with that eigenvector.
In an entirely analogous fashion, we see that the complex exponential signal ej2πf0 t is an eigen-
function of the LTI system with impulse response h, with eigenvalue H(f0). Since we have not
constrained h, we conclude that complex exponentials are eigenfunctions of any LTI system. We
shall soon see, when we discuss Fourier transforms, that this eigenfunction property allows us
to characterize LTI systems in the frequency domain, which in turn enables powerful frequency
domain design and analysis tools.
Matlab implements this using the “conv” function. This can be interpreted as u1 being the input
to a system with impulse response u2 , where a discrete time impulse is simply a one, followed by
all zeros.
Continuous time convolution between u1 (t) and u2 (t) can be approximated using discrete time
convolutions between the corresponding sampled signals. For example, for samples at rate 1/Ts ,
the infinitesimal dt is replaced by the sampling interval Ts as follows:
Z X
y(t) = (u1 ∗ u2 )(t) = u1 (τ )u2 (t − τ )dτ ≈ u1 (kTs )u2 (t − kTs )Ts
k
43
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Letting x[n] = x(nTs ) denote the discrete time waveform corresponding to the nth sample for
each of the preceding waveforms, we have
X
y(nTs ) = y[n] ≈ Ts u1 [k]u2 [n − k] = Ts (u1 ∗ u2 )[n] (2.39)
k
which shows us how to implement continuous time convolution using discrete time operations.
1
u1
0.9 u2
y
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
−2 0 2 4 6 8 10 12
Figure 2.16: Two signals and their continuous time convolution, computed in discrete time using
Code Fragment 2.3.1.
The following Matlab code provides an example of a continuous time convolution approximated
numerically using discrete time convolution, and then plotted against the original continuous
time index t, as shown in Figure 2.16 (cosmetic touches not included in the code below). The
two waveforms convolved are u1 (t) = t2 I[−1,1] (t) and u2 (t) = e−(t+1) I[−1,∞) (the latter is truncated
in our discrete time implementation).
44
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
%%PLOT u1, u2 and y
plot(t1,u1,’r-.’);
hold on;
plot(t2,u2,’r:’);
plot(time_axis,y);
legend(’u1’,’u2’,’y’,’Location’,’NorthEast’);
hold off;
where 1/T is the rate at which symbols are generated (termed the symbol rate). In order to
represent the analog pulse p(t) as discrete time samples, we may sample it at rate 1/Ts , typically
chosen to be an integer multiple of the symbol rate, so that T = mTs , where m is a positive
integer. Typical values employed in transmitter DSP modules might be m = 4 or m = 8. Thus,
the system we are interested is multi-rate: waveforms are sampled at rate 1/Ts = m/T , but
the input is at rate 1/T . Set u[k] = u(kTs ) and p[k] = p(kTs ) as the discrete time signals
corresponding to samples of the transmitted waveform u(t) and the pulse p(t), respectively. We
can write the sampled version of (2.40) as
X X
u[k] = b[n]p(kTs − nT ) = b[n]p[k − nm] (2.41)
n n
The preceding almost has the form of a discrete time convolution, but the key difference is
that the successive symbols {b[n]} are spaced by time T , which corresponds to m > 1 samples
at the sampling rate 1/Ts . Thus, in order to implement this system using convolution at rate
1/Ts , we must space out the input symbols by inserting m − 1 zeros between successive symbols
b[n], thus converting a rate 1/T signal to a rate 1/Ts = m/T signal. This process is termed
upsampling. While the upsampling function is available in certain Matlab toolboxes, we provide
a self-contained code fragment below that illustrates its use for digital modulation, and plots
the waveform obtained for symbol sequence −1, +1, +1, −1. The modulating pulse is a sine
pulse: p(t) = sin(πt/T )I[0,T ] , and our convention is to set T = 1 without loss of generality
(or, equivalently, to replace t by t/T ). We set the oversampling factor M = 16 in order to
obtain smooth plots, even though typical implementations in communication transmitters may
use smaller values.
45
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
1
0.8
0.6
0.4
0.2
u(t)
0
−0.2
−0.4
−0.6
−0.8
−1
0 0.5 1 1.5 2 2.5 3 3.5 4
t/T
Figure 2.17: Digitally modulated waveform obtained using Code Fragment 2.3.2.
%UPSAMPLE BY m
nsymbols = length(symbols);%length of original symbol sequence
nsymbols_upsampled = 1+(nsymbols-1)*m;%length of upsampled symbol sequence
symbols_upsampled = zeros(nsymbols_upsampled,1);%
symbols_upsampled(1:m:nsymbols_upsampled)=symbols;%insert symbols with spacing M
%GENERATE MODULATED SIGNAL BY DISCRETE TIME CONVOLUTION
u=conv(symbols_upsampled,p);
%PLOT MODULATED SIGNAL
time_u = 0:1/m:(length(u)-1)/m; %unit of time = symbol time T
plot(time_u,u);
xlabel(’t/T’);
whose frequencies are integer multiples of the fundamental frequency f0 . That is, we can write
∞
X ∞
X
u(t) = un ψn (t) = un ej2πnf0 t (2.42)
n=−∞ n=−∞
46
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
The coefficients {un } are in general complex-valued, and are called the Fourier series for u(t).
They can be computed as follows:
1
Z
uk = u(t)e−j2πkf0 t dt (2.43)
T0 T0
R
where T0 denotes an integral over any interval of length T0 .
Let us now derive (2.43). For m a nonzero integer, consider an arbitrary interval of length T0 , of
the form [D, D + T0 ], where the offset D is free to take on any real value. Then, for any nonzero
integer m 6= 0, we have
R D+T0 j2πmf t D+T0
ej2πmf0 t
D
e 0
dt = j2πmf0
D
(2.44)
ej2πf0 mD −ej(2πmf0 D+2πm)
= j2πmf0
=0
since ej2πm = 1. Thus, when we multiply both sides of (2.42) by e−j2πkf0 t and integrate over a
period, all terms corresponding to n 6= k drop out by virtue of (2.44), and we are left only with
the n = k term:
R D+T0 −j2πkf0 t
R D+T0 P∞ j2πnf0 t
−j2πkf t
D
u(t)e dt = D n=−∞ u n e e 0
dt
R D+T0 R D+T0
ej2πkf0 t e−j2πkf0 t dt + ej2π(n−k)f0 t dt = uk T0 + 0
P
= uk D n6=k un D
The energy over a period for a signal u is given by ||u||2T0 = hu, uiT0 , where ||u||T0 denotes the
norm computed over a period. We have assumed that the Fourier basis {ψn (t)} spans this vector
space, and have computed the Fourier series after showing that the basis is orthogonal:
hψn , ψm iT0 = 0 , n 6= m
47
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
That is,
hu, ψk iT0 hu, ψk iT0
uk = 2
= (2.46)
||ψk || T0
In general, the Fourier series of an arbitrary periodic signal may have an infinite number of terms.
In practice, one might truncate the Fourier series at a finite number of terms, with the number
of terms required to provide a good approximation to the signal depending on the nature of the
signal.
T0
A max
... ...
A min
Example 2.4.1 Fourier series of a square wave: Consider the periodic waveform u(t) as
shown in Figure 2.18. For k = 0, we get the DC value u0 = Amax +A
2
min
. For k 6= 0, we have,
using (2.43), that
1
R0 1
R T0
uk = T0 −
T0 Amin e−j2πkt/T0 dt + T0 0
2
Amax e−j2πkt/T0 dt
2
0 T0
Amin e−j2πkt/T0 Amax e−j2πkt/T0 2
= T0 −j2πk/T0 T0
+ T0 −j2πk/T0
− 2 0
For k even, ejπk = e−jπk = 1, which yields uk = 0. That is, there are no even harmonics. For k
odd, ejπk = e−jπk = −1, which yields uk = Amaxjπk
−Amin
. We therefore obtain
0, k even
uk = Amax −Amin
jπk
, k odd
Example 2.4.2 Fourier series of an impulse train: Even though the delta function is not
physically realizable, the Fourier series of an impulse train, as shown in Figure 2.19 turns out to
be extremely useful in theoretical development and in computations. Specifically, consider
∞
X
u(t) = δ(t − nT0 )
n=−∞
48
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
... ...
−T0 0 T0
using the sifting property of the impulse. That is, the delta function has equal frequency content
at all harmonics. This is yet another manifestation of the physical unrealizability of the impulse:
for well-behaved signals, the Fourier series should decay as the frequency increases.
While we have considered signals which are periodic functions of time, the concept of Fourier
series applies to periodic functions in general, whatever the physical interpretation of the argu-
ment of the function. In particular, as we shall see when we discuss the effect of time domain
sampling in the context of digital communication, the time domain samples of a waveform can
be interpreted as the Fourier series for a particular periodic function of frequency.
49
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
This yields the following Fourier series in terms of real-valued sinusoids:
∞
X ∞
X
u(t) = u0 + 2Ak cos(2πkf0 t + φk ) = u0 + 2|uk | cos (2πkf0 t + uk ) (2.47)
k=1 k=1
... ...
A min
d/dt
A max −A min
... T0/2
...
0 T0
... ...
−(A max −A min )
Figure 2.20: The derivative of a square wave is two interleaved impulse trains.
Compared to the impulse train in Example 2.4.2, the first impulse train above is offset by 0,
while the second is offset by T0 /2 (and inverted). We can therefore infer their Fourier series
using the time delay property, and add them up by linearity, to obtain
Amax − Amin Amax − Amin −j2πf0 kT0 /2 Amax − Amin
1 − e−jπk , , k 6= 0
xk = − e =
T0 T0 T0
50
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Using the differentiation property, we can therefore infer that
xk Amax −Amin
uk = j2πf0 k
= −j2πkf0 T0
1 − e−jπk
which gives us the same result as before. Note that the DC term u0 cannot be obtained using
this approach, since it vanishes upon differentiation. But it is easy to compute, since it is just
the average value of u(t), which can be seen to be u0 = (Amax + Amin )/2 by inspection.
In addition to simplifying computation for waveforms which can be described (or approximated)
as polynomial functions of time (so that enough differentiation ultimately reduces them to im-
pulse trains), the differentiation method explicitly reveals how the harmonic structure (i.e., the
strength and location of the harmonics) of a periodic waveform is related to its transitions in
the time domain. Once we understand the harmonic structure, we can shape it by appropriate
filtering. For example, if we wish to generate a sinusoid of frequency 300 MHz using a digital
circuit capable of generating symmetric square waves of frequency 100 MHz, we can choose a
filter to isolate the third harmonic. However, we cannot generate a sinusoid of frequency 200
MHz (unless we make the square wave suitably asymmetric), since the even harmonics do not
exist for a symmetric square wave (i.e., a square wave whose high and low durations are the
same).
Parseval’s identity (periodic inner product/power can be computed in either time
or frequency domain): Using the orthogonality of complex exponentials over a period, it can
be shown that
Z X∞
∗
hu, viT0 = u(t)v (t)dt = T0 uk vk∗ (2.50)
T0 k=−∞
Setting v = u, and dividing both sides by T0 , the preceding specializes to an expression for signal
power (which can be computed for a periodic signal by averaging over a period):
∞
1
Z X
2
|u(t)| dt = |uk |2 (2.51)
T0 T0 k=−∞
The inverse Fourier transform tells us that any finite energy signal can be written as a linear com-
bination of a continuum of complex exponentials, with the coefficients of the linear combination
given by the Fourier transform U(f ).
Notation: We call a signal and its Fourier transform a Fourier transform pair, and denote them
as u(t) ↔ U(f ). We also denote the Fourier transform operation by F , so that U(f ) = F (u(t)).
51
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Example 2.5.1 Rectangular pulse and sinc function form a Fourier transform pair:
Consider the rectangular pulse u(t) = I[−T /2,T /2] (t) of duration T . Its Fourier transform is given
by
R∞ R T /2
U(f ) = −∞ u(t)e−j2πf t dt = −T /2 e−j2πf t dt
e−j2πf t T /2 e−jπf T −ejπf T
= −j2πf −T /2
= −j2πf
sin(πf T )
= πf
= T sinc(f T )
We denote this as
I[−T /2,T /2] (t) ↔ T sinc(f T )
Duality: Given the similarity of the form of the Fourier transform (2.52) and inverse Fourier
transform (2.53), we can see that the roles of time and frequency can be switched simply by
negating one of the arguments. In particular, suppose that u(t) ↔ U(f ). Define the time
domain signal s(t) = U(t), replacing f by t. Then the Fourier transform of s(t) is given by
S(f ) = u(−f ), replacing t by −f . Since negating the argument corresponds to reflection around
the origin, we can simply switch time and frequency for signals which are symmetric around the
origin. Applying duality to the Example 2.5.1, we infer that a signal that is ideally bandlimited
in frequency corresponds to a sinc function in time:
I[−W/2,W/2] (f ) ↔ W sinc(W t)
Relation to Fourier series: The Fourier transform can be obtained by taking the limit of the
Fourier series as the period gets large, with T0 → ∞ and f0 → 0 (think of an aperiodic signal
as periodic with infinite period). We do not provide details, but sketch the process of taking
this limit: T0 uk tends to U(f ), where f = kf0 , and the Fourier series sum in (2.42) become the
inverse Fourier transform integral in (2.53), with f0 becoming df . Not surprisingly, therefore, the
Fourier transform exhibits properties entirely analogous to those for Fourier series, as we shall
see shortly. However, the Fourier transform applies to a broader class of signals, and we can
take advantage of time-frequency duality more easily, because both time and frequency are now
continuous-valued variables.
Application to infinite energy signals: In engineering applications, we routinely apply the
Fourier and the inverse Fourier transform to infinite energy signals, even though the derivation
of the Fourier transform as the limit of a Fourier series is based on the assumption that the
signal has finite energy. While infinite energy signals are not physically realizable, they are
useful approximations of finite energy signals, often simplifying mathematical manipulations.
For example, instead of considering a sinusoid over a large time interval, we can consider a
sinusoid of infinite duration. As we shall see, this leads to an impulsive function in the frequency
domain. As another example, delta functions in the time domain are useful in modeling the
impulse response of wireless multipath channels. Basically, once we are willing to work with
impulses, we can use the Fourier transform on a very broad class of signals.
Example 2.5.2 The delta function and the constant function form a Fourier trans-
form pair: For u(t) = δ(t), we have
Z ∞
U(f ) = δ(t)e−j2πf t dt = 1
−∞
52
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Example 2.5.3 Complex exponentials in the time domain correspond to impulses in
frequency domain: Let us show this using the inverse Fourier transform. For a frequency
domain impulse at f0 , U(f ) = δ(f − f0 ), the inverse Fourier transform is given by
Z ∞
u(t) = δ(f − f0 )ej2πf t df = ej2πf0 t
−∞
Once we embrace frequency domain impulses, we can fold Fourier series into Fourier transforms
as follows.
Example 2.5.4 Fourier series expressed in terms of Fourier transforms: We know that
a periodic signal u(t) with period T0 can be written as
∞
X
u(t) = un ej2πnf0 t
n=−∞
where f0 = 1/T0 is the fundamental frequency and {un } are the Fourier series coefficients. Using
Example 2.5.3 to take the Fourier transform of both sides, we obtain
∞
X
U(f ) = un δ(f − nf0 )
n=−∞
Thus, the Fourier transform of a periodic signal is constituted of impulses at the harmonics, with
coefficients given by the Fourier series.
Now that we have seen both the Fourier series and the Fourier transform, it is worth commenting
on the following frequently asked questions.
What do negative frequencies mean? Why do we need them? Consider a real-valued
sinusoid A cos(2πf0 t + θ), where f0 > 0. If we now replace f0 by −f0 , we obtain A cos(−2πf0 t +
θ) = A cos(2πf0 t−θ), using the fact that cosine is an even function. Thus, we do not need negative
frequencies when working with real-valued sinusoids. However, unlike complex exponentials, real-
valued sinusoids are not eigenfunctions of LTI systems: we can pass a cosine through an LTI
system and get a sine, for example. Thus, once we decide to work with a basis formed by complex
exponentials, we do need both positive and negative frequencies in order to describe all signals
of interest. For example, a real-valued sinusoid can be written in terms of complex exponentials
as
A j(2πf0 t+θ) A A
A cos(2πf0 t + θ) = e + e−j(2πf0 t+θ) = ejθ ej2πf0 t + e−jθ e−j2πf0 t
2 2 2
so that we need complex exponentials at both +f0 and −f0 to describe a real-valued sinusoid
at frequency f0 . Of course, the coefficients multiplying these two complex exponentials are not
arbitrary: they are complex conjugates of each other. More generally, as we have already seen,
such conjugate symmetry holds for both Fourier series and Fourier transforms of real-valued
signals. We can therefore state the following:
(a) We do need both positive and negative frequencies to form a complete basis using complex
exponentials;
(b) For real-valued (i.e., physically realizable) signals, the expansion in terms of a complex
exponential basis, whether it is the Fourier series or the Fourier transform, exhibits conjugate
symmetry. Hence, we only need to know the Fourier series or Fourier transform of a real-valued
signal for positive frequencies.
53
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
U(f − f0 ) ↔ u(t)ej2πf0 t
and has Fourier transform given by ū(t) ≡ ū ↔ ūδ(f ). Thus, we can write the overall Fourier
transform as
X(f )
U(f ) = + ūδ(f ) (2.55)
j2πf
We illustrate this via the following example.
Example 2.5.5 (Fourier transform of a step function) Let us use differentiation to com-
pute the Fourier transform of the unit step function
0, t < 0
u(t) =
1, t ≥ 0
54
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Its DC value is given by
ū = 1/2
and its derivative is the delta function (see Figure 2.21):
d
x(t) = u(t) = δ(t) ↔ X(f ) ≡ 1
dt
Applying (2.55), we obtain that the Fourier transform of the unit step function is given by
du/dt
u(t)
1
0 t 0 t
Figure 2.21: The unit step function and its derivative, the delta function.
1 1
U(f ) = + δ(f )
j2πf 2
Next, we discuss the significance of the Fourier transform in understanding the effect of LTI
systems.
Transfer function for an LTI system: The transfer function H(f ) of an LTI system is
defined to be the Fourier transform of its impulse response h(t). That is, H(f ) = F (h(t)). We
now discuss its significance.
From (2.37), we know that, when the input to an LTI system is the complex exponential ej2πf0 t ,
the output is given by H(f0 )ej2πf0 t . From the inverse Fourier transform (2.53), we know that
any input u(t) can be expressed as a linear combination of complex exponentials. Thus, the
corresponding response, which we know is given by y(t) = (u ∗ h)(t) must be a linear combination
of the responses to these complex exponentials. Thus, we have
Z ∞
y(t) = U(f )H(f )ej2πf t df
−∞
We recognize that the preceding function is in the form of an inverse Fourier transform, and
read off Y (f ) = U(f )H(f ). That is, the Fourier transform of the output is simply the product
of the Fourier transform of the input and the system transfer function. This is because complex
55
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
exponentials at different frequencies propagate through an LTI system without mixing with each
other, with a complex exponential at frequency f passing through with a scaling of H(f ).
Of course, we have also derived an expression for y(t) in terms of a convolution of the input
signal with the system impulse response: y(t) = (u ∗ h)(t). We can now infer the following key
property.
Convolution in the time domain corresponds to multiplication in the frequency do-
main
y(t) = (u ∗ h)(t) ↔ Y (f ) = U(f )H(f ) (2.56)
We can also infer the following dual property, either by using duality or by directly deriving it
from first principles.
Multiplication in the time domain corresponds to convolution in the frequency do-
main
y(t) = u(t)v(t) ↔ Y (f ) = (U ∗ V )(f ) (2.57)
LTI system response to real-valued sinusoidal signals: For a sinusoidal input u(t) =
cos(2πf0 t + θ), the response of an LTI system h is given by
y(t) = (u ∗ h)(t) = |H(f0 )| cos (2πf0 t + θ + H(f0 ))
This can be inferred from what we know about the response for complex exponentials, thanks
to Euler’s formula. Specifically, we have
1 j(2πf0 t+θ) 1 1
u(t) = e + e−j(2πf0 t+θ) = ejθ ej2πf0 t + e−jθ e−j2πf0 t
2 2 2
When u goes through an LTI system with transfer function H(f ), the output is given by
1 1
y(t) = ejθ H(f0 )ej2πf0 t + e−jθ H(−f0 )e−j2πf0 t
2 2
If the system is physically realizable, the impulse response h(t) is real-valued, and the transfer
function is conjugate symmetric. Thus, if H(f0 ) = Gejφ (G ≥ 0), then H(−f0 ) = H ∗ (f0 ) =
Ge−jφ . Substituting, we obtain
G j(2πf0 t+θ+φ) G −j(2πf0 t+θ+φ)
y(t) = e + e = G cos(2πf0 t + θ + φ)
2 2
This yields the well-known result that the sinusoid gets scaled by the magnitude of the transfer
function G = |H(f0 )|, and gets phase shifted by the phase of the transfer function φ = H(f0 ).
Example 2.5.6 (Delay spread, coherence bandwidth, and fading for a multipath
channel) The transfer function of a multipath channel as in (2.36) is given by
56
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
The first term e−j2πf τ1 corresponds simply to a pure delay τ1 (seen by all frequencies), and can
be dropped (taking τ1 as our time origin, without loss of generality), so that the transfer function
can be rewritten as
Xm
H(f ) = α1 + αk e−j2πf (τk −τ1 ) (2.59)
k=2
The period of the kth sinusoid above (k ≥ 2) is 1/(τk − τ1 ), so that, the smallest period, and
hence the fastest fluctuations as a function of f , occurs because of the largest delay difference
τd = τm − τ1 , which we call the channel delay spread. Thus, for a frequency interval which is
significantly smaller than 1/τd , the variation of |H(f )| over the interval is small. We define the
channel coherence bandwidth as the inverse of the delay spread, i.e., as Bc = 1/(τm − τ1 ) =
1/τd (this definition is not unique, but in general, the coherence bandwidth is defined to be
inversely proportional to some appropriately defined measure of the channel delay spread). As
we have noted, H(f ) can be well modeled as constant over intervals significantly smaller than
the coherence bandwidth.
Let us apply this to the example in Figure 2.14, where we have a multipath channel with impulse
response h(t) = δ(t − 1) − 0.5δ(t − 1.5) + 0.5δ(t − 3.5). Dropping the first delay as before, we
have
H(f ) = 1 − 0.5e−jπf + 0.5e−j5πf
For concreteness, suppose that time is measured in microseconds (typical numbers for an outdoor
wireless cellular link), so that frequency is measured in MHz. The delay spread is 2.5µs, hence
the coherence bandwidth is 400KHz. We therefore ballpark the size of the frequency interval
over which H(f ) can be approximated as constant to about 40KHz (i.e., of size 10% of the
coherence bandwidth). Note that this is a very fuzzy estimate: if the larger delays occur with
smaller relative amplitudes, as is typical, then they have a smaller effect on H(f ), and we could
potentially approximate H(f ) as constant over a larger fraction of the coherence bandwidth.
Figure 2.22 depicts the fluctuations in H(f ) first on a linear scale, and then on a log scale. A
plot of the transfer function magnitude is shown in Figure 2.22(a). This is the amplitude gain on
a linear scale, and shows significant variations as a function of f (while we do not show it here,
zooming in to 40 KHz bands shows relatively small fluctuations). The amount of fluctuation
becomes even more apparent on a log scale. Interpreting the gain at the smallest delay (α1 = 1
in our case) as that of a nominal channel, the fading gain is defined as the power gain relative
to this nominal, and is given by 20 log10 (|H(f )|/|α1|) in decibels (dB). This is shown in Figure
2.22(b). Note that the fading gain can dip below -18 dB in our example, which we term a fade
of depth 18 dB. If we are using a “narrowband” signal which has a bandwidth small compared to
the coherence bandwidth, and happen to get hit by such a fade, then we can expect much poorer
performance than nominal. To combat this, one must use diversity. For example, a “wideband”
signal whose bandwidth is larger than the coherence bandwidth provides frequency diversity,
while, if we are constrained to use narrowband signals, we may need to introduce other forms of
diversity (e.g., antenna diversity as in Software Lab 2.2).
57
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
2 10
1.8
5
1.6
1 −5
0.8
−10
0.6
0.4
−15
0.2
0 −20
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
Frequency (MHz) Frequency (MHz)
(a) Transfer Function Magnitude (linear scale) (b) Frequency-selective fading (dB)
Matlab is good at doing DFTs. When N is a power of 2, the DFT can be computed very
efficiently, and this procedure is called a Fast Fourier Transform (FFT). Comparing (2.60) with
the Fourier transform expression
Z ∞
U(f ) = u(t)e−j2πf t dt (2.61)
−∞
we can view the sum in the DFT (2.60) as an approximation for the integral in (2.61) under the
right set of conditions. Let us first assume that u(t) = 0 for t < 0: any waveform which can be
truncated so that most of its energy falls in a finite interval can be shifted so that this is true.
Next, suppose that we sample the waveform with spacing ts to get
u[n] = u(nts )
Now, suppose we want to compute the Fourier transform U(f ) for f = mfs , where fs is the
desired frequency resolution. We can approximate the integral for the Fourier transform by a
sum, using ts -spaced time samples as follows:
Z ∞ X
U(mfs ) = u(t)e−j2πmfs t dt ≈ u(nts )e−j2πmfs nts ts
−∞ n
(dt in the integral is replaced by the sample spacing ts .) Since u[n] = u(nts ), the approximation
can be computed using the DFT formula (2.60) as follows:
U(mfs ) ≈ ts U[m] (2.62)
as long as fs ts = N1 . That is, using a DFT of length N, we can get a frequency granularity of
fs = N1ts . This implies that if we choose the time samples close together (in order to represent
u(t) accurately), then we must also use a large N to get a desired frequency granularity. Often
this means that we must pad the time domain samples with zeros.
Another important observation is that, while the DFT in (2.60) ranges from m = 0, ..., N − 1, it
actually computes the Fourier transform for both positive and negative frequencies. Noting that
ej2πmn/N = ej2π(−N +m)n/N , we realize that the DFT values for m = N/2, ..., N − 1 correspond
to the Fourier transform evaluated at frequencies (m − N)fs = −N/2fs , ..., −fs . The DFT
values for m = 0, ..., N/2 − 1 correspond to the Fourier transform evaluated at frequencies
58
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
0, fs , ..., (N/2 − 1)fs . Thus, we should swap the left and right halves of the DFT output in order
to represent positive and negative frequencies, with DC falling in the middle. Matlab actually
has a function, fftshift, that does this.
Note that the DFT (2.60) is periodic with period N, so that the Fourier transform approximation
(2.62) is periodic with period Nfs = t1s . We typically limit the range of frequencies over which
we use the DFT to compute the Fourier transform to the fundamental period (− 2t1s , 2t1s ). This is
consistent with the sampling theorem, which says that the sampling rate 1/ts must be at least as
large as the size of the frequency band of interest. (The sampling theorem is reviewed in Chapter
4, when we discuss digital modulation.)
2 cos πf −jπf
U(f ) = e (2.63)
π(1 − 4f 2 )
Note that U(f ) has a 0/0 form at f = 1/2, but using L’Hospital’s rule, we can show that
U(1/2) 6= 0. Thus, the first zeros of U(f ) are at f = ±3/2. This is a timelimited pulse and
hence cannot be bandlimited, but U(f ) decays as 1/f 2 for f large, so we can capture most of
the energy of the pulse within a suitably chosen finite frequency interval. Let us use the DFT to
compute U(f ) over f ∈ (−8, 8). This means that we set 1/(2ts ) = 8, or ts = 1/16, which yields
about 16 samples over the interval [0, 1] over which the signal u(t) has support. Suppose now
that we want the frequency granularity to be at least fs = 1/160. Then we must use a DFT
with N ≥ ts1fs = 2560 = Nmin . In order to efficiently compute the DFT using the FFT, we
choose N = 4096, the next power of 2 at least as large as Nmin . Code fragment 2.5.1 performs
and plots this DFT. The resulting plot (with cosmetic touches not included in the code below)
is displayed in Figure 2.23. It is useful to compare this with a plot obtained from the analytical
formula (2.63), and we leave that as an exercise.
0.7
0.6
0.5
Magnitude Spectrum
0.4
0.3
0.2
0.1
0
−8 −6 −4 −2 0 2 4 6 8
Frequency
Figure 2.23: Plot of magnitude spectrum of sine pulse in Example 2.5.7 obtained numerically
using the DFT.
59
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
ts=1/16; %sampling interval
time_interval = 0:ts:1; %sampling time instants
%%time domain signal evaluated at sampling instants
signal_timedomain = sin(pi*time_interval); %sinusoidal pulse in our example
fs_desired = 1/160; %desired frequency granularity
Nmin = ceil(1/(fs_desired*ts)); %minimum length DFT for desired frequency granularity
%for efficient computation, choose FFT size to be power of 2
Nfft = 2^(nextpow2(Nmin)) %FFT size = the next power of 2 at least as big as Nmin
%Alternatively, one could also use DFT size equal to the minimum length
%Nfft=Nmin;
%note: fft function in Matlab is just the DFT when Nfft is not a power of 2
%freq domain signal computed using DFT
%fft function of size Nfft automatically zeropads as needed
signal_freqdomain = ts*fft(signal_timedomain,Nfft);
%fftshift function shifts DC to center of spectrum
signal_freqdomain_centered = fftshift(signal_freqdomain);
fs=1/(Nfft*ts); %actual frequency resolution attained
%set of frequencies for which Fourier transform has been computed using DFT
freqs = ((1:Nfft)-1-Nfft/2)*fs;
%plot the magnitude spectrum
plot(freqs,abs(signal_freqdomain_centered));
xlabel(’Frequency’);
ylabel(’Magnitude Spectrum’);
H(f)
∆f
1
Energy E u( f*) ∆ f
u(t)
Meter
f*
Energy Spectral Density: The energy spectral density Eu (f ) of a signal u(t) can be defined
operationally as shown in Figure 2.24. Pass the signal u(t) through an ideal narrowband filter
60
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
with transfer function as follows:
∆f ∆f
1, f ∗ − < f < f∗ +
Hf ∗ (f ) = 2 2
0, else
The energy spectral density Eu (f ∗ ) is defined to be the energy at the output of the filter, divided
by the width ∆f (in the limit as ∆f → 0). That is, the energy at the output of the filter is
approximately Eu (f ∗ )∆f . But the Fourier transform of the filter output is
assuming that U(f ) varies smoothly and ∆f is small enough. We can now infer that the energy
spectral density is simply the magnitude squared of the Fourier transform:
Eu (f ) = |U(f )|2 (2.64)
The integral of the energy spectral density equals the signal energy, which is consistent with
Parseval’s identity.
The inverse Fourier transform of the energy spectral density has a nice intuitive interpretation.
Noting that |U(f )|2 = U(f )U ∗ (f ) and U ∗ (f ) ↔ u∗ (−t), let us define uM F (t) = u∗ (−t) as (the
impulse response of) the matched filter for u(t), where the reasons for this term will be clarified
later. Then
|U(f )|2 = U(f ∗
R
)U (f ) ↔ (u ∗ u M F )(τ ) = u(t)uM F (τ − t)dt
R (2.65)
= u(t)u∗ (t − τ )dt
where t is a dummy variable for the integration, and the convolution is evaluated at the time
variable τ , which denotes the delay between the two versions of u being correlated: the extreme
right-hand side is simply the correlation of u with itself (after complex conjugation), evaluated
at different delays τ . We call this the autocorrelation function of the signal u. We have therefore
shown the following.
For a finite energy signal, the energy spectral density and the autocorrelation function form a
Fourier transform pair.
Bandwidth: The bandwidth of a signal u(t) is loosely defined to be the size of the band
of frequencies occupied by U(f ). The definition is “loose” because the concept of occupancy
can vary, depending on the application, since signals are seldom strictly bandlimited. One
possibility is to consider the band over which |U(f )|2 is within some fraction of its peak value
(setting the fraction equal to 21 corresponds to the 3 dB bandwidth). Alternatively, we might
be interested in energy containment bandwidth, which is the size of the smallest band which
contains a specified fraction of the signal energy (for a finite power signal, we define analogously
the power containment bandwidth).
Only positive frequencies count when computing bandwidth for physical (real-valued)
signals: For physically realizable (i.e., real-valued) signals, bandwidth is defined as its occupancy
of positive frequencies, because conjugate symmetry implies that the information at negative fre-
quencies is redundant.
While physically realizable time domain signals are real-valued, we shall soon introduce complex-
valued signals that have useful physical interpretation, in the sense that they have a well-defined
61
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
mapping to physically realizable signals. Conjugate symmetry in the frequency domain does not
hold for complex-valued time domain signals, with different information contained in positive
and negative frequencies in general. Thus, the bandwidth for a complex-valued signal is defined
as the size of the frequency band it occupies over both positive and negative frequencies. The
justification for this convention becomes apparent later in this chapter.
where we use Parseval’s identity to simplify computation for timelimited waveforms. Using the
fact that |U(f )| is even, we obtain that
Z W Z W
2
1.98 = 2 |U(f )| df = 2 4sinc2 (2f )df
0 0
where fc > W > 0. A channel modeled as a linear time invariant system is said to be passband
if its transfer function H(f ) satisfies (2.67).
Examples of baseband and passband signals are shown in Figures 2.25 and 2.26, respectively.
Physically realizable signals must be real-valued in the time domain, which means that their
Fourier transforms, which can be complex-valued, must be conjugate symmetric: U(−f ) =
U ∗ (f ). As discussed earlier, the bandwidth B for a real-valued signal u(t) is the size of the
frequency interval (counting only positive frequencies) occupied by U(f ).
Information sources typically emit baseband signals. For example, an analog audio signal has
significant frequency content ranging from DC to around 20 KHz. A digital signal in which zeros
and ones are represented by pulses is also a baseband signal, with the frequency content governed
by the shape of the pulse (as we shall see in more detail in Chapter 4). Even when the pulse is
timelimited, and hence not strictly bandlimited, most of the energy is concentrated in a band
around DC.
62
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Re(U(f))
f
−W 0 W
Im(U(f))
f
−W W
−1
Figure 2.25: Example of the spectrum U(f ) for a real-valued baseband signal. The bandwidth
of the signal is W .
Re(Up (f))
W
f
−f c fc
Im(Up (f))
−f c
f
fc
Figure 2.26: Example of the spectrum U(f ) for a real-valued passband signal. The bandwidth
of the signal is W . The figure shows an arbitrarily chosen frequency fc within the band in which
U(f ) is nonzero. Typically, fc is much larger than the signal bandwidth W .
63
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Wired channels (e.g., telephone lines, USB connectors) are typically modeled as baseband: the
attenuation over the wire increases with frequency, so that it makes sense to design the transmit-
ted signal to utilize a frequency band around DC. An example of passband communication over
a wire is Digital Subscriber Line (DSL), where high speed data transmission using frequencies
above 25 KHz co-exists with voice transmission in the band from 0-4 KHz. The design and use
of passband signals for communication is particularly important for wireless communication, in
which the transmitted signals must fit within frequency bands dictated by regulatory agencies,
such as the Federal Communication Commission (FCC) in the United States. For example, an
amplitude modulation (AM) radio signal typically occupies a frequency interval of length 10 KHz
somewhere in the 540-1600 KHz band allocated for AM radio. Thus, the baseband audio mes-
sage signal must be transformed into a passband signal before it can be sent over the passband
channel spanning the desired band. As another example, a transmitted signal in a WiFi network
may be designed to fit within a 20 MHz frequency interval in the 2.4 GHz unlicensed band, so
that digital messages to be sent over WiFi must be encoded onto passband signals occupying the
designated spectral band.
f
−W 0 W −fc f c −W fc f
Baseband Passband
64
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
Note that |Up (f )| and |Vp (f )| have frequency content in a band around fc , and are passband
signals (i.e., living in a band not containing DC) as long as fc > W .
I and Q components: If we use both the cosine and sine carriers, we can construct a passband
signal of the form
up (t) = uc (t) cos 2πfc t − us (t) sin 2πfc t (2.68)
where uc and us are real baseband signals of bandwidth at most W , with fc > W . The signal
uc (t) is called the in-phase (or I) component, and us (t) is called the quadrature (or Q) component.
The negative sign for the Q term is a standard convention. Since the sinusoidal terms are entirely
predictable once we specify fc , all information in the passband signal up must be contained in
the I and Q components. Modulation for a passband channel therefore corresponds to choosing
a method of encoding information into the I and Q components of the transmitted signal, while
demodulation corresponds to extracting this information from the received passband signal. In
order to accomplish modulation and demodulation, we must be able to upconvert from baseband
to passband, and downconvert from passband to baseband, as follows.
cos 2 π fc t 2cos 2 π fc t
u p (t) u p (t)
−sin 2π fc t −2sin 2π fc t
Lowpass u s (t)
u s (t)
Filter
Upconversion Downconversion
(baseband to passband) (passband to baseband)
Figure 2.28: Upconversion from baseband to passband, and downconversion from passband to
baseband.
We prove the desired result by showing that x(t) is a passband signal at 2fc, so that its DC component is zero. That is,
∫_{−∞}^{∞} x(t) dt = X(0) = 0
which is the desired result. To show this, note that
p(t) = (1/2) uc(t) us(t) ↔ (1/2) (Uc ∗ Us)(f)
is a baseband signal: if Uc (f ) is baseband with bandwidth W1 and Us (f ) is baseband with
bandwidth W2 , then their convolution has bandwidth at most W1 + W2 . In order for ap to be
passband, we must have fc > W1 , and in order for bp to be passband, we must have fc > W2 .
Thus, 2fc > W1 + W2 , which means that x(t) = p(t) sin 4πfc t is passband around 2fc , and is
therefore zero at DC. This completes the derivation.
Envelope and phase: Since a passband signal up is equivalent to a pair of real-valued baseband
waveforms (uc , us ), passband modulation is often called two-dimensional modulation. The repre-
sentation (2.68) in terms of I and Q components corresponds to thinking of this two-dimensional
waveform in rectangular coordinates (the “cosine axis” and the “sine axis”). We can also rep-
resent the passband waveform using polar coordinates. Consider the rectangular-polar transfor-
mation
e(t) = √( uc²(t) + us²(t) ) ,    θ(t) = tan−1 ( us(t) / uc(t) )
where e(t) ≥ 0 is termed the envelope and θ(t) is the phase. This corresponds to uc (t) =
e(t) cos θ(t) and us (t) = e(t) sin θ(t). Substituting in (2.68), we obtain
up (t) = e(t) cos θ(t) cos 2πfc t − e(t) sin θ(t) sin 2πfc t = e(t) cos (2πfc t + θ(t)) (2.70)
This provides an alternate representation of the passband signal in terms of baseband envelope
and phase signals.
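A minimal numerical check of this rectangular-polar transformation is shown below; the I and Q waveforms are arbitrary examples, and the four-quadrant arctangent atan2 is used so that the phase is recovered in the correct quadrant.

% Converting between (I, Q) and (envelope, phase); example waveforms only.
t = 0:0.001:1;
uc = cos(2*pi*t); us = 0.5*sin(2*pi*t);      % example I and Q components
e = sqrt(uc.^2 + us.^2);                     % envelope e(t) >= 0
theta = atan2(us, uc);                       % phase, via the four-quadrant arctangent
err = max(abs(uc - e.*cos(theta))) + max(abs(us - e.*sin(theta)))   % should be (numerically) zero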
Complex envelope: To obtain a third representation of a passband signal, we note that a
two-dimensional point can also be mapped to a complex number; see Section 2.1. We define the
complex envelope u(t) of the passband signal up(t) in (2.68) and (2.70) as follows:
u(t) = uc(t) + jus(t) = e(t)ejθ(t) , so that up(t) = Re( u(t)ej2πfc t )    (2.72)
Figure 2.29: The complex envelope u(t) = uc(t) + jus(t) in the I-Q plane, with envelope e(t) and phase θ(t).
While we have obtained (2.72) using the polar representation (2.70), we should also check that
it is consistent with the rectangular representation (2.68), writing out the real and imaginary
parts of the complex waveforms above as follows:
u(t)ej2πfc t = (uc(t) + jus(t)) (cos 2πfc t + j sin 2πfc t) = (uc(t) cos 2πfc t − us(t) sin 2πfc t) + j (us(t) cos 2πfc t + uc(t) sin 2πfc t)    (2.73)
Taking the real part, we obtain the expression (2.68) for up (t).
The relationship between the three time domain representations of a passband signal in terms
of its complex envelope is depicted in Figure 2.29. We now specify the corresponding frequency
domain relationship.
Information resides in complex baseband: The complex baseband representation corre-
sponds to subtracting out the rapid, but predictable, phase variation due to the fixed reference
frequency fc , and then considering the much slower amplitude and phase variations induced by
baseband modulation. Since the phase variation due to fc is predictable, it cannot convey any
information. Thus, all the information in a passband signal is contained in its complex envelope.
Choice of frequency/phase reference is arbitrary: We can define the complex baseband
representation of a passband signal using an arbitrary frequency reference fc (and can also vary
the phase reference), as long as we satisfy fc > W , where W is the bandwidth. We may often wish
to transform the complex baseband representations for two different references. For example, we
can write
up (t) = uc1(t) cos(2πf1 t+θ1 )−us1 (t) sin(2πf1 t+θ1 ) = uc2 (t) cos(2πf2 t+θ2 )−us2 (t) sin(2πf2 t+θ2 )
We can express this more compactly in terms of the complex envelopes u1 = uc1 + jus1 and
u2 = uc2 + jus2:
up(t) = Re( u1(t)ej(2πf1 t+θ1) ) = Re( u2(t)ej(2πf2 t+θ2) )    (2.74)
We can now find the relationship between these complex envelopes by transforming the expo-
nential term for one reference to the other:
up(t) = Re( u1(t)ej(2πf1 t+θ1) ) = Re( [u1(t)ej(2π(f1−f2)t+θ1−θ2)] ej(2πf2 t+θ2) )    (2.75)
Comparing with the extreme right-hand sides of (2.74) and (2.75), we can read off that
u2 (t) = u1 (t)ej(2π(f1 −f2 )t+θ1 −θ2 )
While we derived this result using algebraic manipulations, it has the following intuitive interpre-
tation: if the instantaneous phase 2πfi t + θi of the reference is ahead/behind, then the complex
envelope must be correspondingly retarded/advanced, so that the instantaneous phase of the
overall passband signal stays the same. We illustrate this via some examples below.
Example 2.8.2 (Change of reference frequency/phase) Consider the passband signal up (t) =
I[−1,1] (t) cos 400πt.
(a) Find the output when up (t) cos 401πt is passed through a lowpass filter.
(b) Find the output when up (t) sin(400πt − π4 ) is passed through a lowpass filter.
Solution: From Figure 2.28, we recognize that both (a) and (b) correspond to downconversion
operations with different frequency and phase references. Thus, by converting the complex en-
velope with respect to the appropriate reference, we can read off the answers.
(a) Letting u1 = uc1 + jus1 denote the complex envelope with respect to the reference ej401πt , we
recognize that the output of the LPF is uc1/2. The passband signal can be written as
up(t) = Re( I[−1,1](t) ej400πt )
We can now massage it to read off the complex envelope for the new reference:
up(t) = Re( I[−1,1](t) e−jπt ej401πt )
from which we see that u1 (t) = I[−1,1] (t)e−jπt = I[−1,1] (t) (cos πt − j sin πt). Taking real and
imaginary parts, we obtain uc1(t) = I[−1,1] (t) cos πt and us1 (t) = −I[−1,1] (t) sin πt, respectively.
Thus, the LPF output is (1/2) I[−1,1](t) cos πt.
(b) Letting u2 = uc2 + jus2 denote the complex envelope with respect to the reference ej(400πt−π/4), we recognize that the output of the LPF is −us2/2. We can convert to the new reference as before:
up(t) = Re( I[−1,1](t) ejπ/4 ej(400πt−π/4) )
which gives the complex envelope u2 = I[−1,1](t) ejπ/4 = I[−1,1](t) (cos π/4 + j sin π/4). Taking real and imaginary parts, we obtain uc2(t) = I[−1,1](t) cos π/4 and us2(t) = I[−1,1](t) sin π/4, respectively. Thus, the LPF output is given by −us2/2 = −(1/2) I[−1,1](t) sin π/4 = −(1/(2√2)) I[−1,1](t).
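These conversions can also be checked numerically. The fragment below mimics part (a): it mixes up(t) with the new reference and applies a crude lowpass filter (a sliding-window average chosen purely for illustration), and compares the result with the analytical answer (1/2) I[−1,1](t) cos πt.

% Numerical check of Example 2.8.2(a); sampling rate and LPF are illustrative assumptions.
fs = 10000; t = -2:1/fs:2;
up = (abs(t) <= 1).*cos(400*pi*t);            % up(t) = I_{[-1,1]}(t) cos(400 pi t)
x = up.*cos(401*pi*t);                        % mix with the new frequency reference
L = round(0.05*fs); h = ones(1,L)/L;          % crude LPF: sliding average over 0.05 time units
y = conv(x, h, 'same');                       % downconverter output
plot(t, y, t, 0.5*(abs(t) <= 1).*cos(pi*t), '--');
legend('numerical LPF output', 'analytical answer');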
From a practical point of view, keeping track of frequency/phase references becomes important
for the task of synchronization. For example, the carrier frequency used by the transmitter for
upconversion may not be exactly equal to that used by the receiver for downconversion. Thus,
the receiver must compensate for the phase rotation incurred by the complex envelope at the
output of the downconverter, as illustrated by the following example.
Comparing with the desired form
using Euler’s formula. Equating real and imaginary parts on both sides, we obtain
The phase offset therefore results in the I and Q components being mixed together at the output of the downconverter. Thus, in order for a coherent receiver to recover the original I and Q components uc, us, we must account for the (possibly time varying) phase offset θ(t). In particular, if we have
an estimate of the phase offset, then we can undo it by inverting the relationship in (2.76):
The preceding computations provide a typical example of the advantage of working in complex
baseband. Relationships between passband signals can be compactly represented in complex
baseband, as in (2.76) and (2.78). For signal processing using real-valued arithmetic, these
complex baseband relationships can be expanded out to obtain relationships involving real-valued
quantities, as in (2.77) and (2.79). See Software Lab 2.1 for an example of such computations.
Given a complex baseband signal u(t) with bandwidth W, where fc > W,
up(t) = Re( u(t)ej2πfc t )    (2.80)
is a real-valued passband signal whose frequency content is concentrated around ±fc, away from DC. Let
c(t) = u(t)ej2πfc t ↔ C(f) = U(f − fc)    (2.81)
That is, C(f) is the spectrum of the complex envelope, U(f), shifted to the right by fc. Since U(f) has frequency content in [−W, W], C(f) has frequency content in [fc − W, fc + W]. Since fc − W > 0, this band does not include DC. Now,
up(t) = Re(c(t)) = (1/2)(c(t) + c∗(t)) ↔ Up(f) = (1/2)(C(f) + C∗(−f))
Figure 2.30: Frequency domain relationship between a real-valued passband signal and its com-
plex envelope. The figure shows the spectrum Up (f ) of the passband signal, its scaled restriction
to positive frequencies C(f ), and the spectrum U(f ) of the complex envelope.
Since C ∗ (−f ) is the complex conjugated version of C(f ), flipped around the origin, it has fre-
quency content in the band of negative frequencies [−fc − W, −fc + W ] around −fc , which does
not include DC because −fc + W < 0. Thus, we have shown that up (t) is a passband signal. It
is real-valued by virtue of its construction using the time domain equation (2.80), which involves
taking the real part. But we can also doublecheck for consistency in the frequency domain:
Up (f ) is conjugate symmetric, since its positive frequency component is C(f ), and its nega-
tive frequency component is C ∗ (−f ). Substituting C(f ) by U(f − fc ), we obtain the passband
spectrum in terms of the complex baseband spectrum:
Up(f) = (1/2) ( U(f − fc) + U∗(−f − fc) )    (2.82)
So far, we have seen how to construct a real-valued passband signal given a complex-valued
baseband signal. To go in reverse, we must answer the following: do the equivalent representa-
tions (2.68), (2.70), (2.72) and (2.82) hold for any passband signal, and if so, how do we find
the spectrum of the complex envelope given the spectrum of the passband signal? To answer
these questions, we simply trace back the steps we used to arrive at (2.82). Given the spec-
trum Up (f ) for a real-valued passband signal up (t), we construct C(f ) as a scaled version of
Up+ (f ) = Up (f )I[0,∞) (f ), the positive frequency part of Up (f ), as follows:
C(f) = 2Up+(f) = { 2Up(f), f > 0;   0, f < 0 }
This means that Up(f) = (1/2) C(f) for positive frequencies. By the conjugate symmetry of Up(f), the negative frequency component must be (1/2) C∗(−f), so that Up(f) = (1/2) C(f) + (1/2) C∗(−f). In the time domain, this corresponds to
up(t) = (1/2) c(t) + (1/2) c∗(t) = Re(c(t))    (2.83)
Now, let us define the complex envelope as follows:
u(t) = c(t)e−j2πfc t ↔ U(f ) = C(f + fc )
Since c(t) = u(t)ej2πfc t , we obtain the desired relationship (2.68) on substituting into (2.83).
Since C(f ) has frequency content in a band around fc , U(f ), which is obtained by shifting C(f )
to the left by fc , is indeed a baseband signal with frequency content in a band around DC.
Frequency domain expressions for I and Q components: If we are given the time domain
complex envelope, we can read off the I and Q components as the real and imaginary parts:
uc(t) = Re(u(t)) = (1/2) (u(t) + u∗(t))
us(t) = Im(u(t)) = (1/(2j)) (u(t) − u∗(t))
Taking Fourier transforms, we obtain
Uc(f) = (1/2) (U(f) + U∗(−f))
Us(f) = (1/(2j)) (U(f) − U∗(−f))
Figure 2.30 shows the relation between the passband signal Up (f ), its scaled version C(f ) re-
stricted to positive frequencies, and the complex baseband signal U(f ). As this example em-
phasizes, all of these spectra can, in general, be complex-valued. Equation (2.80) corresponds to
starting with an arbitrary baseband signal U(f ) as in the bottom of the figure, and constructing
C(f ) as depicted in the middle of the figure. We then use C(f ) to construct a conjugate sym-
metric passband signal Up (f ), proceeding from the middle of the figure to the top. This example
also shows that U(f ) does not, in general, obey conjugate symmetry, so that the baseband signal
u(t) is, in general, complex-valued. However, by construction, Up (f ) is conjugate symmetric, and
hence the passband signal up (t) is real-valued.
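These frequency domain relationships can be exercised numerically using the DFT: keep twice the positive-frequency content of the passband signal to form C(f), return to the time domain, and undo the ej2πfc t rotation. The fragment below is a sketch under arbitrary example parameters; it is one convenient way of extracting a complex envelope, not the only one.

% DFT-based sketch: extracting the complex envelope of a passband signal.
fs = 1000; fc = 100;                          % sampling rate and reference frequency (assumed)
t = 0:1/fs:2-1/fs; N = length(t);
uc = cos(2*pi*t) - 0.5*cos(4*pi*t);           % example I component
us = sin(2*pi*t);                             % example Q component
up = uc.*cos(2*pi*fc*t) - us.*sin(2*pi*fc*t); % passband signal
f = (0:N-1)*fs/N;                             % DFT frequencies in [0, fs)
Up = fft(up);                                 % DFT of the passband signal
C = 2*Up.*(f > 0 & f < fs/2);                 % keep 2 Up(f) over positive frequencies only
c = ifft(C);                                  % c(t) = u(t) e^{j 2 pi fc t}
u_hat = c.*exp(-1j*2*pi*fc*t);                % estimated complex envelope
plot(t, real(u_hat), t, uc, '--');            % recovered I component vs. original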
Example 2.8.4 Let vp (t) denote a real-valued passband signal, with Fourier transform Vp (f )
specified as follows for negative frequencies:
Vp(f) = { −(f + 99), −101 ≤ f ≤ −99;   0, f < −101 or −99 < f ≤ 0 }
(a) Sketch Vp (f ) for both positive and negative frequencies.
(b) Without explicitly taking the inverse Fourier transform, can you say whether vp (t) = vp (−t)
or not?
(c) Find and sketch Vc (f ) and Vs (f ), the Fourier transforms of the I and Q components with
respect to a reference frequency fc = 99. Do this without going to the time domain.
(d) Find an explicit time domain expression for the output when vp (t) cos 200πt is passed through
an ideal lowpass filter of bandwidth 4.
(e) Find an explicit time domain expression for the output when vp (t) sin 202πt is passed through
an ideal lowpass filter of bandwidth 4.
Solution:
Figure 2.31: Sketch of the spectrum Vp(f), which is nonzero only for 99 ≤ |f| ≤ 101.
(a) Since vp (t) is real-valued, we have Vp (f ) = Vp∗ (−f ). Since the spectrum is also given to be
real-valued for f ≤ 0, we have Vp∗ (−f ) = Vp (−f ). The spectrum is sketched in Figure 2.31.
(b) Yes, vp (t) = vp (−t). Since vp (t) is real-valued, we have vp (−t) = vp∗ (−t) ↔ Vp∗ (f ). But
Vp∗ (f ) = Vp (f ), since the spectrum is real-valued.
(c) The spectrum of the complex envelope and the I and Q components are shown in Figure 2.32.
The complex envelope is obtained as V (f ) = 2Vp+ (f + fc ), while the I and Q components satisfy
Vc(f) = ( V(f) + V∗(−f) ) / 2 ,    Vs(f) = ( V(f) − V∗(−f) ) / (2j)
In our case, Vc (f ) = |f |I[−2,2] (f ) and jVs (f ) = f I[−2,2] (f ) are real-valued, and are plotted in the
figure.
(d) The output of the LPF is vc (t)/2, where vc is the I component with respect to fc = 100. In
Figure 2.33, we construct the complex envelope and the I component as in (c), except that the
reference frequency is different. Clearly, the boxcar spectrum corresponds to vc (t) = 4sinc(2t),
so that the output is 2sinc(2t).
(e) The output of the LPF is −vs(t)/2, where vs is the Q component with respect to fc = 101. In
Figure 2.34, we construct the complex envelope and the Q component as in (c), except that the
reference frequency is different. We now have to take the inverse Fourier transform, which is a
little painful if we do it from scratch. Instead, let us differentiate to see that
j dVs(f)/df = I[−2,2](f) − 4δ(f) ↔ 4 sinc(4t) − 4
But dVs(f)/df ↔ −j2πt vs(t), so that j dVs(f)/df ↔ 2πt vs(t). We therefore obtain that 2πt vs(t) = 4 sinc(4t) − 4, or vs(t) = 2(sinc(4t) − 1)/(πt). Thus, the output of the LPF is −vs(t)/2 = (1 − sinc(4t))/(πt).
Figure 2.32: Sketch of I and Q spectra in Example 2.8.4(c), taking reference frequency fc = 99.
Figure 2.33: Finding the I component in Example 2.8.4(d), taking reference frequency as fc =
100.
Figure 2.34: Finding the Q component in Example 2.8.4(e), taking reference frequency as fc =
101.
Figure 2.35: The relationship between passband filtering and its complex baseband analogue.
Figure 2.36: Complex baseband realization of passband filter. The constant scale factors of 1/2 have been omitted.
yc = (1/2) (uc ∗ hc − us ∗ hs) ,    ys = (1/2) (us ∗ hc + uc ∗ hs)    (2.85)
Figure 2.37: The trapezoid s(t) = (1/2) I[−1,1](t) ∗ I[0,3](t), supported on [−1, 4] and with maximum value 1.
Example 2.8.5 The passband signal u(t) = I[−1,1] (t) cos 100πt is passed through the passband
filter h(t) = I[0,3] (t) sin 100πt. Find an explicit time domain expression for the filter output.
Solution: We need to find the convolution yp (t) of the signal up (t) = I[−1,1] (t) cos 100πt with the
impulse response hp (t) = I[0,3] (t) sin 100πt, where we have inserted the subscript to explicitly
denote that the signals are passband. The corresponding relationship in complex baseband is
y = (1/2)u ∗ h. Taking a reference frequency fc = 50, we can read off the complex envelopes u(t) = I[−1,1](t) and h(t) = −jI[0,3](t), so that
y(t) = (1/2) (u ∗ h)(t) = −j (1/2) (I[−1,1] ∗ I[0,3])(t)
Let s(t) = (1/2)I[−1,1] (t) ∗ I[0,3] (t) denote the trapezoid obtained by convolving the two boxes, as
shown in Figure 2.37. Then
y(t) = −js(t)
That is, yc = 0 and ys = −s(t), so that yp (t) = s(t) sin 100πt.
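The example can be verified numerically: convolve the two passband waveforms directly, and separately compute y = (1/2) u ∗ h in complex baseband and upconvert the result. Approximating continuous-time convolution by conv scaled by the sample spacing is an assumption of this sketch, in the spirit of the contconv function developed in the lab at the end of this chapter.

% Passband filtering computed directly and via complex baseband (Example 2.8.5).
dt = 1e-3; fc = 50;
t1 = -1:dt:1; t2 = 0:dt:3;                    % supports of signal and filter
up = cos(100*pi*t1);                          % samples of I_{[-1,1]}(t) cos(100 pi t)
hp = sin(100*pi*t2);                          % samples of I_{[0,3]}(t) sin(100 pi t)
yp_direct = conv(up, hp)*dt;                  % direct passband convolution
u = ones(size(t1)); h = -1j*ones(size(t2));   % complex envelopes for the reference fc = 50
y = 0.5*conv(u, h)*dt;                        % y = (1/2) u * h in complex baseband
ty = (t1(1)+t2(1)) + (0:length(y)-1)*dt;      % time axis of the convolution outputs
yp_cb = real(y.*exp(1j*2*pi*fc*ty));          % back to passband: Re{y(t) e^{j 2 pi fc t}}
plot(ty, yp_direct, ty, yp_cb, '--');         % the two curves should agree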
Energy and power: The energy of a passband signal equals that of its complex envelope, up
to a scale factor which depends on the particular convention we adopt. In particular, for the
convention in (2.68), we have
||up||² = (1/2) ||uc||² + (1/2) ||us||² = (1/2) ||u||²    (2.86)
That is, the energy equals the sum of the energies of the I and Q components, up to a scalar
constant. The same relationship holds for the powers of finite-power passband signals and their
complex envelopes, since power is computed as a time average of energy. To show (2.86), consider
||up||² = ∫ uc²(t) cos²(2πfc t) dt + ∫ us²(t) sin²(2πfc t) dt − 2 ∫ uc(t) cos(2πfc t) us(t) sin(2πfc t) dt
The I-Q cross term drops out due to I-Q orthogonality, so that we are left with the I-I and Q-Q terms, as follows:
||up||² = ∫ uc²(t) cos²(2πfc t) dt + ∫ us²(t) sin²(2πfc t) dt
Using cos²θ = (1 + cos 2θ)/2 and sin²θ = (1 − cos 2θ)/2, this becomes
||up||² = (1/2) ∫ uc²(t) dt + (1/2) ∫ us²(t) dt + (1/2) ∫ uc²(t) cos 4πfc t dt − (1/2) ∫ us²(t) cos 4πfc t dt
The last two terms are zero, since they are equal to the DC components of passband waveforms
centered around 2fc , arguing in exactly the same fashion as in our derivation of I-Q orthogonality.
This gives the desired result (2.86).
Correlation between two signals: The correlation, or inner product, of two real-valued
passband signals up and vp is defined as
⟨up, vp⟩ = ∫_{−∞}^{∞} up(t) vp(t) dt
Arguing exactly as in the derivation of (2.86), the I-Q cross terms integrate to zero, and we obtain
⟨up, vp⟩ = (1/2) ( ⟨uc, vc⟩ + ⟨us, vs⟩ )    (2.87)
That is, we can implement a passband correlation by first downconverting, and then employing
baseband operations: correlating I against I, and Q against Q, and then summing the results. It
is also worth noting how this is related to the complex baseband inner product, which is defined
as
⟨u, v⟩ = ∫_{−∞}^{∞} u(t) v∗(t) dt = ∫_{−∞}^{∞} (uc(t) + jus(t)) (vc(t) − jvs(t)) dt = ( ⟨uc, vc⟩ + ⟨us, vs⟩ ) + j ( ⟨us, vc⟩ − ⟨uc, vs⟩ )    (2.88)
Comparing with (2.87), we obtain that
⟨up, vp⟩ = (1/2) Re( ⟨u, v⟩ )
That is, the passband inner product is the real part of the complex baseband inner product (up to
scale factor). Does the imaginary part of the complex baseband inner product have any meaning?
Indeed it does: it becomes important when there is phase uncertainty in the downconversion
operation, which causes the I and Q components to leak into each other. However, we postpone
discussion of such issues to later chapters.
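Both (2.86) and the relation between the passband and complex baseband inner products are easy to check numerically, as in the sketch below; the waveforms and parameters are arbitrary, the integrals are approximated by Riemann sums, and agreement is only approximate because the 2fc terms are not exactly rejected for timelimited signals.

% Numerical check of ||up||^2 = (1/2)||u||^2 and <up, vp> = (1/2) Re <u, v>.
dt = 1e-4; fc = 100; t = -2:dt:2;
u = (cos(pi*t) + 1j*sin(pi*t)).*(abs(t) <= 1);     % example complex envelope u
v = (1 + 0.5j*t).*(abs(t) <= 1);                   % example complex envelope v
up = real(u.*exp(1j*2*pi*fc*t));                   % corresponding passband signals
vp = real(v.*exp(1j*2*pi*fc*t));
[sum(up.^2)*dt, 0.5*sum(abs(u).^2)*dt]             % passband energy vs (1/2)||u||^2
[sum(up.*vp)*dt, 0.5*real(sum(u.*conj(v)))*dt]     % passband vs (1/2) Re complex baseband inner product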
Consider a passband transmitted signal at carrier frequency fc, of the form
up (t) = uc (t) cos 2πfc t − us (t) sin 2πfc t = e(t) cos(2πfc t + θ(t))
where
u(t) = uc (t) + jus (t) = e(t)ejθ(t)
is the complex baseband representation, or complex envelope. In order to model the propagation
of this signal through a multipath environment, let us consider its propagation through a path
of length r. The propagation attenuates the field by a factor of 1/r, and introduces a delay of
τ(r) = r/c, where c denotes the speed of light. Suppressing the dependence of τ on r, the received
signal is given by
vp(t) = (A/r) e(t − τ) cos( 2πfc(t − τ) + θ(t − τ) + φ )
where we consider relative values (across paths) for the constants A and φ. The complex envelope
of vp (t) with respect to the reference ej2πfc t is given by
v(t) = (A/r) u(t − τ) e−j(2πfc τ + φ)    (2.89)
For example, we may take A = 1, φ = 0 for a direct, or line of sight (LOS), path from transmitter
to receiver, which we may take as a reference. Figure 2.38 shows the geometry for a reflected path corresponding to a single bounce, relative to the LOS path. Following standard terminology, θi denotes the angle of incidence, and θg = π/2 − θi the grazing angle. The change in relative amplitude
and phase due to the reflection depends on the carrier frequency, the reflector material, the angle
of incidence, and the polarization with respect to the orientation of the reflector surface. Since
we do not wish to get into the underlying electromagnetics, we consider simplified models of
relative amplitude and phase. In particular, we note that for grazing incidence (θg ≈ 0), we have
A ≈ 1, φ ≈ π.
Figure 2.38: Ray tracing for a single bounce path. We can reflect the transmitter around the
reflector to create a virtual source. The line between the virtual source and the receiver tells us
where the ray will hit the reflector, following the law of reflection that the angles of incidence
and reflection must be equal. The length of the line equals the length of the reflected ray to be
plugged into (2.92).
Generalizing (2.89) to multiple paths of length r1 , r2 , ..., the complex envelope of the received
signal is given by
v(t) = Σ_i (Ai/ri) u(t − τi) e−j(2πfc τi + φi)    (2.90)
where τi = ri/c, and Ai, φi depend on the reflector characteristics and incidence angle for the ith ray. This corresponds to the complex baseband channel impulse response
h(t) = Σ_i (Ai/ri) e−j(2πfc τi + φi) δ(t − τi)    (2.91)
This is in exact correspondence with our original multipath model (2.36), with αi = (Ai/ri) e−j(2πfc τi + φi). The corresponding frequency domain response is given by
H(f) = Σ_i (Ai/ri) e−j(2πfc τi + φi) e−j2πf τi    (2.92)
Since we are modeling in complex baseband, f takes values around DC, with f = 0 corresponding
to the passband reference frequency fc .
Channel delay spread and coherence bandwidth: We have already introduced these con-
cepts in Example 2.5.6, but reiterate them here. Let τmin and τmax denote the minimum and
maximum of the delays {τi }. The difference τd = τmax − τmin is called the channel delay spread.
The reciprocal of the delay spread is termed the channel coherence bandwidth, Bc = 1/τd. A base-
band signal of bandwidth W is said to be narrowband if W τd = W/Bc ≪ 1, or equivalently, if
its bandwidth is significantly smaller than the channel coherence bandwidth.
We can now infer that, for a narrowband signal around the reference frequency, the received
complex baseband signal equals a delayed version of the transmitted signal, scaled by the complex
channel gain
h = H(0) = Σ_i (Ai/ri) e−j(2πfc τi + φi)    (2.93)
Example 2.9.1 (Two ray model) Suppose our propagation environment consists of the LOS ray and the single reflected ray shown in Figure 2.38. Then we have two rays, with r1 = √(R² + (hr − ht)²) and r2 = √(R² + (hr + ht)²). The corresponding delays are τi = ri/c, i = 1, 2, where c denotes the speed of propagation. The grazing angle is given by θg = tan−1( (ht + hr)/R ). Setting A1 = 1 and φ1 = 0, once we specify A2 and φ2 for the reflected path, we can specify the complex baseband channel. Numerical examples are explored in Problem 2.21, and in Software Lab 2.2.
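As a preview of those explorations, the sketch below uses the parameter values of Problem 2.22 (R = 200 m, ht = 2 m, hr = 10 m, A2 = 0.95, φ2 = π, carrier frequency 5 GHz) to evaluate the two-ray frequency response (2.92) over a few coherence bandwidths; the code is only an illustrative sketch of those computations.

% Two-ray channel of Example 2.9.1, with the parameter values of Problem 2.22.
c = 3e8; fc = 5e9;                              % propagation speed (m/s) and carrier frequency (Hz)
R = 200; ht = 2; hr = 10;                       % range and antenna heights (m)
r = [sqrt(R^2+(hr-ht)^2), sqrt(R^2+(hr+ht)^2)]; % LOS and reflected path lengths
tau = r/c;                                      % path delays (s)
A = [1, 0.95]; phi = [0, pi];                   % path amplitudes and phases
alpha = (A./r).*exp(-1j*(2*pi*fc*tau + phi));   % complex path gains, as in (2.91)
Bc = 1/(max(tau) - min(tau));                   % coherence bandwidth = 1/(delay spread)
f = linspace(-3*Bc, 3*Bc, 1000);                % frequencies around the reference fc
H = alpha(1)*exp(-1j*2*pi*f*tau(1)) + alpha(2)*exp(-1j*2*pi*f*tau(2));   % (2.92)
plot(f/Bc, 20*log10(abs(H))); xlabel('f/B_c'); ylabel('|H(f)| (dB)');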
• Fourier transform: standard pairs (sinc and boxcar, impulse and constant), effect of time de-
lay and frequency shift, conjugate symmetry for real-valued signals, Parseval’s identity, use of
differentiation to simplify computation, numerical computation using DFT
• Bandwidth: for physical signals, given by occupancy of positive frequencies; energy spectral
density equals magnitude squared of Fourier transform; computation of fractional energy con-
tainment bandwidth from energy spectral density
2.11 Endnotes
A detailed treatment of the material reviewed in Sections 2.1-2.5 can be found in basic textbooks
on signals and systems such as Oppenheim, Willsky and Nawab [17] or Lathi [18].
The Matlab code fragments and software labs interspersed in this textbook provide a glimpse of
the use of DSP in communication. However, for a background in core DSP algorithms, we refer
the reader to textbooks such as Oppenheim and Schafer [19] and Mitra [20].
Problems
LTI systems and Convolution
(a) Show that the system is LTI and find its impulse response.
(b) Find the transfer function H(f ) and plot |H(f )|.
(c) If the input x(t) = 2sinc(2t), find the energy of the output.
Fourier Series
Problem 2.3 A digital circuit generates the following periodic waveform with period 0.5:
u(t) = { 1, 0 ≤ t < 0.1;   0, 0.1 ≤ t < 0.5 }
where the unit of time is microseconds throughout this problem.
(a) Find the complex exponential Fourier series for du/dt.
(b) Find the complex exponential Fourier series for u(t), using the results of (a).
(c) Find an explicit time domain expression for the output when u(t) is passed through an ideal
lowpass filter of bandwidth 100 KHz.
(d) Repeat (c) when the filter bandwidth is increased to 300 KHz.
(e) Find an explicit time domain expression for the output when u(t) is passed through a filter
with impulse response h2 (t) = sinc(t) cos(8πt).
(f) Can you generate a sinusoidal waveform of frequency 1 MHz by appropriately filtering u(t)?
If so, specify in detail how you would do it.
Problem 2.4 Find and sketch the Fourier transforms for the following signals:
(a) u(t) = (1 − |t|)I[−1,1] (t).
(b) v(t) = sinc(2t)sinc(4t).
(c) s(t) = v(t) cos 200πt.
(d) Classify each of the signals in (a)-(c) as baseband or passband.
Problem 2.5 Use Parseval’s identity to compute the following integrals:
(a) ∫_{−∞}^{∞} sinc²(2t) dt
(b) ∫_{0}^{∞} sinc(t) sinc(2t) dt
Problem 2.6 (a) For u(t) = sinc(t) sinc(2t), where t is in microseconds, find and plot the
magnitude spectrum |U(f )|, carefully labeling the units of frequency on the x axis.
(b) Now, consider s(t) = u(t) cos 200πt. Plot the magnitude spectrum |S(f )|, again labeling
the units of frequency and carefully showing the frequency intervals over which the spectrum is
nonzero.
Problem 2.7 The signal s(t) = sinc(4t) is passed through a filter with impulse response h(t) = sinc²(t) cos 4πt to obtain output y(t). Find and sketch the Fourier transform Y(f) of the output
(sketch the real and imaginary parts separately if the spectrum is complex-valued).
(a) Show that the Fourier transform of this pulse is given by
P(f) = 2 cos(πf) / ( π(1 − 4f²) )
(b) Use this result to derive the formula (2.63) for the sine pulse in Example 2.5.7.
Problem 2.10 (Numerical computation of the Fourier transform) Modify Code Frag-
ment 2.5.1 for Example 2.5.7 to numerically compute the Fourier transform of the tent function
in Problem 2.8. Display the magnitude spectra of the DFT-based numerically computed Fourier
transform and the analytically computed Fourier transform (from Problem 2.8) in the same plot,
over the frequency interval [−10, 10]. Comment on the accuracy of the DFT-based computation.
Problem 2.11 For a signal s(t), the matched filter is defined as a filter with impulse response
h(t) = smf (t) = s∗ (−t) (we allow signals to be complex valued, since we want to handle complex
baseband signals as well as physical real-valued signals).
(a) Sketch the matched filter impulse response for s(t) = I[1,3] (t).
(b) Find and sketch the convolution y(t) = (s ∗ smf )(t). This is the output when the signal is
passed through its matched filter. Where does the peak of the output occur?
(c) (True or False) Y (f ) ≥ 0 for all f .
Problem 2.12 Repeat Problem 2.11 for s(t) = I[1,3] (t) − 2I[2,5] (t).
Problem 2.13 A wireless channel has impulse response given by h(t) = 2δ(t − 0.1) + jδ(t −
0.64) − 0.8δ(t − 2.2), where the unit of time is in microseconds.
(a) What is the delay spread and coherence bandwidth?
(b) Plot the magnitude and phase of the channel transfer function H(f ) over the interval
[−2Bc , 2Bc ], where Bc denotes the coherence bandwidth computed in (a). Comment on how
the phase behaves when |H(f )| is small.
(c) Express |H(f )| in dB, taking 0 dB as the gain of a nominal channel hnom (t) = 2δ(t − 0.1)
corresponding to the first ray alone. What are the fading depths that you see with respect to
this nominal?
Define the average channel power gain over a band [−W/2, W/2] as
Ḡ(W) = (1/W) ∫_{−W/2}^{W/2} |H(f)|² df
This is a simplified measure of how increasing signal bandwidth W can help compensate for
frequency-selective fading: we hope that, as W gets large, we can average out fluctuations in
|H(f )|.
(d) Plot Ḡ(W ) as a function of W/Bc , and comment on how large the bandwidth needs to be
(as a multiple of Bc ) to provide “enough averaging.”
where a(t) = sinc(2t), and where the unit of time is in microseconds.
(a) What is the frequency band occupied by up (t)?
(b) The signal up (t) cos 199πt is passed through a lowpass filter to obtain an output b(t). Give
an explicit expression for b(t), and sketch B(f ) (if B(f ) is complex-valued, sketch its real and
imaginary parts separately).
(c) The signal up (t) sin 199πt is passed through a lowpass filter to obtain an output c(t). Give
an explicit expression for c(t), and sketch C(f ) (if C(f ) is complex-valued, sketch its real and
imaginary parts separately).
(d) Can you reconstruct a(t) from simple real-valued operations performed on b(t) and c(t)? If
so, sketch a block diagram for the operations required. If not, say why not.
Figure 2.39: Downconversion branches for Problem 2.15, using the references 2 cos(401πt) and 2 sin(400πt + π/4).
Problem 2.15 Consider the signal s(t) = I[−1,1] (t) cos 400πt.
(a) Find and sketch the baseband signal u(t) that results when s(t) is downconverted as shown
in the upper branch of Figure 2.39.
(b) The signal s(t) is passed through the bandpass filter with impulse response h(t) = I[0,1](t) sin(400πt + π/4). Find and sketch the baseband signal v(t) that results when the filter output y(t) = (s ∗ h)(t) is downconverted as shown in the lower branch of Figure 2.39.
Problem 2.17 Consider a real-valued passband signal vp (t) whose Fourier transform for positive
frequencies is given by
Re(Vp(f)) = { 2, 30 ≤ f ≤ 32;   0, 0 ≤ f < 30 or 32 < f < ∞ }
Im(Vp(f)) = { 1 − |f − 32|, 31 ≤ f ≤ 33;   0, 0 ≤ f < 31 or 33 < f < ∞ }
(a) Sketch the real and imaginary parts of Vp (f ) for both positive and negative frequencies.
(b) Specify, in both the time domain and the frequency domain, the waveform that you get when
you pass vp (t) cos(60πt) through a low pass filter.
Problem 2.18 The passband signal u(t) = I[−1,1] (t) cos 100πt is passed through the passband
filter h(t) = I[0,3] (t) sin 100πt. Find an explicit time domain expression for the filter output.
Problem 2.19 Consider the passband signal up (t) = sinc(t) cos 20πt, where the unit of time is
in microseconds.
(a) Use Matlab to plot the signal (plot over a large enough time interval so as to include “most”
of the signal energy). Label the units on the time axis.
Remark: Since you will be plotting a discretized version, the sampling rate you should choose
should be large enough that the carrier waveform looks reasonably smooth (e.g., a rate of at least
10 times the carrier frequency).
(b) Write a Matlab program to implement a simple downconverter as follows. Pass x(t) = 2up(t) cos 20πt through a lowpass filter which consists of computing a sliding window average over a window of 1 microsecond. That is, the LPF output is given by y(t) = ∫_{t−1}^{t} x(τ) dτ. Plot
the output and comment on whether it is what you expect to see.
and
vp(t) = sinc(t) sin(101πt + π/4)
(a) Find the complex envelopes u(t) and v(t) for up and vp , respectively, with respect to the
frequency reference fc = 50.
(b) What is the bandwidth of up (t)? What is the bandwidth of vp (t)?
(c) Find the inner product ⟨up, vp⟩, using the result in (a).
(d) Find the convolution yp (t) = (up ∗ vp )(t), using the result in (a).
Problem 2.21 Consider the two-ray wireless channel model in Example 2.9.1.
(a) Show that, as long as the range R ≫ ht, hr, the delay spread is well approximated as
τd ≈ 2 ht hr / (Rc)
where c denotes the propagation speed. We assume free space propagation with c = 3 × 10^8 m/s.
(b) Compare the approximation in (a) with the actual value of the delay spread for R = 200m,
ht = 2m, hr = 10m. (e.g., modeling an outdoor link with LOS and single ground bounce).
(c) What is the coherence bandwidth for the numerical example in (b).
(d) Redo (b) and (c) for R = 10m, ht = hr = 2m (e.g., a model for an indoor link modeling LOS
plus a single wall bounce).
Problem 2.22 Consider R = 200m, ht = 2m, hr = 10m in the two-ray wireless channel model
in Example 2.9.1. Assume A1 = 1 and φ1 = 0, set A2 = 0.95 and φ2 = π, and assume that the
carrier frequency is 5 GHz.
(a) Specify the channel impulse response, normalizing the LOS path to unit gain and zero delay.
Make sure you specify the unit of time being used.
(b) Plot the magnitude and phase of the channel transfer function over [−3Bc , 3Bc ], where Bc
denotes the channel coherence bandwidth.
(c) Plot the frequency selective fading gain in dB over [−3Bc , 3Bc ], using a LOS channel as
nominal. Comment on the fading depth.
(d) As in Problem 2.13, compute the frequency-averaged power gain Ḡ(W ) and plot it as a
function of W/Bc . How much bandwidth is needed to average out the effects of frequency-
selective fading?
Laboratory Assignment
That is, given an input vector of time points, the function should give an output vector with the
values of x evaluated at those time points. For time points falling outside [−3, 4], the function
should return the value zero.
(b) Use the function signalx to plot x(t) versus t, for −6 ≤ t ≤ 6. To do this, create a vector
of sampling times spaced closely enough to get a smooth plot. Generate a corresponding vector
using signalx. Then plot one against the other.
(c) Use the function signalx to plot x(t − 3) versus t.
(d) Use the function signalx to plot x(3 − t) versus t.
(e) Use the function signalx to plot x(2t) versus t.
Convolution
2(a) Write a Matlab function contconv that computes an approximation to continuous-time
convolution as follows.
Inputs: Vectors x1 and x2 representing samples of two signals to be convolved. Scalars t1 , t2
and dt, representing the starting time for the samples of x1 , the starting time for the samples in
x2 , and the spacing of the samples.
Outputs: Vectors y and t, corresponding to the samples of the convolution output and the
sampling times.
(b) Check that your function works by using it to convolve two boxes, 3I[−2,−1] and 4I[1,3] , to get
a trapezoid (e.g., using the following code fragment):
dt=0.01;%sample spacing
s1 = -2:dt:-1; %sampling times over the interval [-2,-1]
s2= 1:dt:3; %sampling times over the interval [1,3]
x1=3*ones(length(s1),1); %samples for first box
x2=4*ones(length(s2),1); %samples for second box
[y,t]= contconv(x1,x2,s1(1),s2(1),dt);
figure(1);
plot(t,y);
Check that the trapezoid you get spans the correct interval (based on the analytical answer) and
has the correct scaling.
Matched filter
3(a) Consider the signal u(t) = 2I[1,3] (t)−3I[2,4] (t). Plot u(t) and its matched filter umf (t) = u(−t)
on the same plot.
(b) Use the function contconv to convolve u(t) and umf (t). Plot the result of the convolution.
Where is the peak of the signal?
(c) Now, consider a complex-valued signal s(t) = u(t) + jv(t), where v(t) = I[−1,2] (t) + 2I[0,1] (t).
The matched filter is given by smf (t) = s∗ (−t). Plot the real parts of s(t) and smf (t) on one
plot, and the imaginary parts on another.
(d) Use the function contconv to convolve s(t) and smf (t). Plot the real part, the imaginary part,
and the magnitude of the output. Do you see a peak?
(e) Now, use the function contconv to convolve s1(t) = s(t − t0)ejθ and smf(t), for t0 = 2 and θ = π/4. Plot the real part, the imaginary part, and the magnitude of the output. Do you see a peak?
(f) If you did not know t0 and θ, could you estimate them from the output of the convolution in (e)?
Try out some ideas and report on the results.
Fourier transform
The following Matlab function is a modification of Code Fragment 2.5.1.
%phase shift associated with start time
X=X.*exp(-j*2*pi*f*tstart);
end
4(a) Use the function contFT to compute the Fourier transform of s(t) = 3sinc(2t − 3), where
the unit of time is a microsecond, the signal is sampled at the rate of 16 MHz, and truncated
to the range [−8, 8] microseconds. We wish to attain a frequency resolution of 1 KHz or better.
Plot the magnitude of the Fourier transform versus frequency, making sure you specify the units
on the frequency axis. Check that the plot conforms to your expectations.
(b) Plot the phase of the Fourier transform obtained in (a) versus frequency (again, make sure
the units on the frequency axis are specified). What is the range of frequencies over which the
phase plot has meaning?
Matched filter in frequency domain
5(a) Consider the signal s(t) in 3(c). Assuming that the unit of time is a millisecond and the
desired frequency resolution is 1 Hz, use the function contFT to compute and plot |S(f )|.
(b) Use the function contFT to compute and plot the magnitude of the Fourier transform of the
convolution s ∗ smf numerically computed in 3(d). Also plot for comparison |S(f )|2, using the
output of 5(a). The two plots should match.
(c) Plot the phase of the Fourier transform of s ∗ smf obtained in 5(b). Comment on whether
the plot matches your expectations.
Lab Report
• Discuss the results you obtain, answer any specific questions that are asked, and print out
the most useful plots to support your answers.
• Append your programs to the report. Make sure you comment them in enough detail so
they are easy to understand. In addition to the functions you are asked to write, label the
code fragments used for each assigned segment (1 through 5) separately.
• Write a paragraph about any questions or confusions that you may have experienced with
this lab.
Laboratory Assignment
Consider a pair of independently modulated signals, uc(t) = Σ_{n=1}^{N} bc[n] p(t − n) and us(t) = Σ_{n=1}^{N} bs[n] p(t − n), where the symbols bc[n], bs[n] are chosen with equal probability to be +1 and −1, and p(t) = I[0,1](t) is a rectangular pulse. Let N = 100.
(1.1) Use Matlab to plot a typical realization of uc (t) and us (t) over 10 symbols. Make sure you
sample fast enough for the plot to look reasonably “nice.”
(1.2) Upconvert the baseband waveform uc(t) to get
up,1(t) = uc(t) cos 40πt
This is a so-called binary phase shift keyed (BPSK) signal, since the changes in phase are due to the changes in the signs of the transmitted symbols. Plot the passband signal up,1(t) over four
symbols (you will need to sample at a multiple of the carrier frequency for the plot to look nice,
which means you might have to go back and increase the sampling rate beyond what was required
for the baseband plots to look nice).
(1.3) Now, add in the Q component to obtain the passband signal
up (t) = uc (t) cos 40πt − us (t) sin 40πt
Plot the resulting Quaternary Phase Shift Keyed (QPSK) signal up (t) over four symbols.
(1.4) Downconvert up (t) by passing 2up (t) cos(40πt + θ) and 2up (t) sin(40πt + θ) through crude
lowpass filters with impulse response h(t) = I[0,0.25] (t). Denote the resulting I and Q components
by vc (t) and vs (t), respectively. Plot vc and vs for θ = 0 over 10 symbols. How do they compare
to uc and us ? Can you read off the corresponding bits bc [n] and bs [n] from eyeballing the plots
for vc and vs ?
(1.5) Plot vc and vs for θ = π/4. How do they compare to uc and us ? Can you read off the
corresponding bits bc [n] and bs [n] from eyeballing the plots for vc and vs ?
(1.6) Figure out how to recover uc and us from vc and vs if a genie tells you the value of θ (we are
looking for an approximate reconstruction–the LPFs used in downconversion are non-ideal, and
the original waveforms are not exactly bandlimited). Check whether your method for undoing
the phase offset works for θ = π/4, the scenario in (1.5). Plot the resulting reconstructions ũc and
ũs , and compare them with the original I and Q components. Can you read off the corresponding
bits bc [n] and bs [n] from eyeballing the plots for ũc and ũs ?
Lab Report
• Answer all questions and print out the most useful plots to support your answers.
• Write a paragraph about any questions or confusions that you may have experienced with
this lab.
Laboratory Assignment
Figure 2.40: Geometry for the software lab: a lamppost-to-lamppost link (direct path of 200 m between antennas at 10 m height, plus a ground reflection), and an access link to a car antenna at 2 m height, located at distance D from lamppost 1.
First, let us explore the sensitivity of the lamppost to lamppost link to variations in range and
height. Fix the height of the transmitter on lamppost 1 at 10 m. Vary the height of the receiver
on lamppost 2 from 9.5 to 10.5 m.
(2.3) Letting hnom denote the nominal channel gain between two lampposts if you only consider the direct path, and h the net complex gain including the reflected path, plot the normalized power gain in dB, 20 log10( |h|/|hnom| ), as a function of the variation in the receiver height. Comment on the sensitivity of channel quality to variations in the receiver height.
(2.4) Modeling the variations in receiver height as coming from a uniform distribution over [9.5, 10.5], find the probability that the normalized power gain is smaller than −20 dB (i.e., that we have a fade in signal power of 20 dB or worse).
(2.5) Now, suppose that the transmitter has two antennas, vertically spaced by 25 cm, with the lower one at a height of 10 m. Let h1 and h2 denote the channels from the two antennas to the receiver. Let hnom be defined as in item (2.3). Plot the normalized power gains in dB, 20 log10( |hi|/|hnom| ), i = 1, 2. Comment on whether or not both gains dip or peak at the same time.
(2.6) Plot 20 log10( max(|h1|, |h2|)/|hnom| ), which is the normalized power gain you would get if you switched to the transmit antenna which has the better channel. This strategy is termed switched diversity.
(2.7) Find the probability that the normalized power gain of the switched diversity scheme is
smaller than -20 dB.
(2.8) Comment on whether, and to what extent, diversity helped in combating fading.
Fading on the access link
Consider the access channel from lamppost 1 to the car. Let hnom (D) denote the nominal channel
gain from the lamppost to the car, ignoring the ground reflection. Taking into account the ground
reflection, let the channel gain be denoted as h(D). Here D is the distance of the car from the
bottom of lamppost 1, as shown in Figure 2.40.
(2.9) Plot |hnom | and |h| as a function of D on a dB scale (an amplitude α is expressed on the dB
scale as 20 log10 α). Comment on the “long-term” variation due to range, and the “short-term”
variation due to multipath fading.
Lab Report
• Answer all questions and print out the most useful plots to support your answers.
• Write a paragraph about any questions or confusions that you may have experienced with
this lab.
Chapter 3
Modulation is the process of encoding information into a signal that can be transmitted (or
recorded) over a channel of interest. In analog modulation, a baseband message signal, such
as speech, audio or video, is directly transformed into a signal that can be transmitted over
a designated channel, typically a passband radio frequency (RF) channel. Digital modulation
differs from this only in the following additional step: bits are encoded into baseband message
signals, which are then transformed into passband signals to be transmitted. Thus, despite
the relentless transition from analog to digital modulation, many of the techniques developed for
analog communication systems remain important for the digital communication systems designer,
and our goal in this chapter is to study an important subset of these techniques, using legacy
analog communication systems as examples to reinforce concepts.
From Chapter 2, we know that passband signals carry information in their complex envelope,
and that the complex envelope can be represented either in terms of I and Q components, or in
terms of envelope and phase. We study two broad classes of techniques: amplitude modula-
tion, in which the analog message signal appears directly in the I and/or Q components; and
angle modulation, in which the analog message signal appears directly in the phase or in the
instantaneous frequency (i.e., in the derivative of the phase), of the transmitted signal. Examples
of analog communication in space include AM radio, FM radio, and broadcast television, as well
as a variety of specialized radios. Examples of analog communication in time (i.e., for storage)
include audiocassettes and VHS videotapes.
The analog-centric techniques covered in this chapter include envelope detection, superhetero-
dyne reception, limiter discriminators, and phase locked loops. At a high level, these techniques
tell us how to go from baseband message signals to passband transmitted signals, and back
from passband received signals to baseband message signals. For analog communication, this
is enough, since we consider continuous time message signals which are directly transformed to
passband through amplitude or angle modulation. For digital communication, we need to also
figure out how to decode the encoded bits from the received passband signal, typically after down-
conversion to baseband; this is a subject discussed in later chapters. However, between encoding
at the transmitter and decoding at the receiver, a number of analog communication techniques
are relevant: for example, we need to decide between direct and superheterodyne architectures
for upconversion and downconversion, and tailor our frequency planning appropriately; we may
use a PLL to synthesize the local oscillator frequencies at the transmitter and receiver; and
the basic techniques for mapping baseband signals to passband remain the same (amplitude
and/or angle modulation). In addition, while many classical analog processing functionalities
are replaced by digital signal processing in modern digital communication transceivers, when we
push the limits of digital communication systems, in terms of lowering power consumption or
increasing data rates, it is often necessary to fall back on analog-centric, or hybrid digital-analog,
techniques. This is because the analog-to-digital conversion required for digital transceiver im-
plementations may often be too costly or power-hungry for ultra high-speed, or ultra low-power,
implementations.
Chapter Plan: After a quick discussion of terminology and notation in Section 3.1, we discuss
various forms of amplitude modulation in Section 3.2, including bandwidth requirements and the
tradeoffs between power efficiency and simplicity of demodulation. We discuss angle modulation
in Section 3.3, including the relation between phase and frequency modulation, the bandwidth
of angle modulated signals, and simple suboptimal demodulation strategies.
The superheterodyne up/downconversion architecture is discussed in Section 3.4, and the design
considerations illustrated via the example of analog AM radio. The phase locked loop (PLL)
is discussed in Section 3.5, including discussion of applications such as frequency synthesis and
FM demodulation, linearized modeling and analysis, and a glimpse of the insights provided by
nonlinear models. Finally, as a historical note, we discuss some legacy analog communication
systems in Section 3.6, mainly to highlight some of the creative design choices that were made
in times when sophisticated digital signal processing techniques were not available. This last
section can be skipped if the reader’s interest is limited to learning analog-centric techniques for
digital communication system design.
Software: Software Lab 3.1 reinforces concepts in amplitude modulation, and shows how en-
velope detection, used for analog amplitude modulation, actually remains relevant for downcon-
version for systems where we are pushing the limits in terms of carrier frequency (e.g., coherent
optical communication). Angle modulation is explored further in Software Lab 3.2, which in-
cludes an introduction to a digital communication technique based on angle modulation.
Figure 3.1: A sinusoidal message in the time domain, plotted as m(t)/Am versus fm t, and its spectrum M(f), which consists of impulses of strength Am/2 at ±fm.
A passband transmitted signal can be written as
up(t) = uc(t) cos 2πfc t − us(t) sin 2πfc t = e(t) cos(2πfc t + θ(t))
where fc is a carrier frequency, uc(t) is the I component, us(t) is the Q component, e(t) ≥ 0 is the envelope, and θ(t) is the phase. Modulation consists of encoding the message in uc(t) and us(t), or
equivalently, in e(t) and θ(t). In most of the analog amplitude modulation schemes considered,
the message modulates the I component (with the Q component occasionally playing a “sup-
porting role”) as discussed in Section 3.2. The exception is quadrature amplitude modulation, in
which both I and Q components carry separate messages. In phase and frequency modulation,
or angle modulation, the message directly modulates the phase θ(t) or its derivative, keeping the
envelope e(t) unchanged.
The time domain and frequency domain DSB signals for a sinusoidal message are shown in Figure
3.2.
As another example, consider the finite-energy message whose spectrum is shown in Figure 3.3.
Since the time domain message m(t) is real-valued, its spectrum exhibits conjugate symmetry
(we have chosen a complex-valued message spectrum to emphasize the latter property). The
message bandwidth is denoted by B. The bandwidth of the DSB-SC signal is 2B, which is twice
the message bandwidth. This indicates that we are being redundant in our use of spectrum. To
see this, consider the upper sideband (USB) and lower sideband (LSB) depicted in Figure 3.4.
The shape of the signal in the USB (i.e., Up (f ) for fc < f ≤ fc + B) is the same as that of the
message for positive frequencies (i.e., M(f ), f > 0). The shape of the signal in the LSB (i.e.,
Figure 3.2: DSB-SC signal in the time and frequency domains for the sinusoidal message m(t) =
Am cos 2πfm t of Figure 3.1.
Figure 3.3: Spectrum M(f) of an example real-valued, finite-energy message, showing the real part Re(M(f)) and the imaginary part Im(M(f)); the message bandwidth is B.
Figure 3.4: The spectrum of the passband DSB-SC signal for the example message in Figure 3.3.
Up (f ) for fc − B ≤ f < fc ) is the same as that of the message for negative frequencies (i.e.,
M(f ), f < 0). Since m(t) is real-valued, we have M(−f ) = M ∗ (f ), so that we can reconstruct
the message if we know its content at either positive or negative frequencies. Thus, the USB and
LSB of u(t) each contain enough information to reconstruct the message. The term DSB refers to
the fact that we are sending both sidebands. Doing this, of course, is wasteful of spectrum. This
motivates single sideband (SSB) and vestigial sideband (VSB) modulation, which are discussed
a little later.
The term suppressed carrier is employed because, for a message with no DC component, we
see from (3.2) that the transmitted signal does not have a discrete component at the carrier
frequency (i.e., Up (f ) does not have impulses at ±fc ).
Figure 3.5: Coherent demodulator for DSB-SC: the received passband signal is multiplied by 2 cos 2πfc t and then lowpass filtered.
The received signal, ignoring noise, is yp(t) = A m(t) cos(2πfc t + θr), where θr is the phase of the received carrier relative to the local copy of the carrier produced
by the receiver’s local oscillator (LO), and A is the received amplitude, taking into account the
propagation channel from the transmitter to the receiver. The demodulator is shown in Figure
3.5. In order for this demodulator to work well, we must have θr as close to zero as possible;
that is, the carrier produced by the LO must be coherent with the received carrier. To see the
effect of phase mismatch, let us compute the demodulator output for arbitrary θr . Using the
trigonometric identity 2 cos θ1 cos θ2 = cos(θ1 − θ2 ) + cos(θ1 + θ2 ), we have
2yp(t) cos(2πfc t) = 2A m(t) cos(2πfc t + θr) cos(2πfc t) = A m(t) cos θr + A m(t) cos(4πfc t + θr)
We recognize the second term on the extreme right-hand side as being a passband signal at 2fc
(since it is a baseband message multiplied by a carrier whose frequency exceeds the message
bandwidth). It is therefore rejected by the lowpass filter. The first term is a baseband signal
proportional to the message, which appears unchanged at the output of the LPF (except possibly
for scaling), as long as the LPF response has been designed to be flat over the message bandwidth.
The output of the demodulator is therefore given by
A m(t) cos θr    (3.4)
We can also infer this using the complex baseband representation, which is what we prefer to
employ instead of unwieldy trigonometric identities. The coherent demodulator in Figure 3.5
extracts the I component relative to the receiver’s LO. The received signal can be written as
yp(t) = Re( A m(t) ejθr ej2πfc t )
from which we can read off the complex envelope y(t) = A m(t) ejθr . The real part yc(t) =
Am(t) cos θr is the I component extracted by the demodulator.
The demodulator output (3.4) is proportional to the message, which is what we want, but
the proportionality constant varies with the phase of the received carrier relative to the LO.
In particular, the signal gets significantly attenuated as the phase mismatch increases, and gets
completely wiped out for θr = π/2. Note that, if the carrier frequency of the LO is not synchronized
with that of the received carrier (say with frequency offset ∆f ), then θr (t) = 2π∆f t+φ is a time-
varying phase that takes all values in [0, 2π), which leads to time-varying signal degradation in
amplitude, as well as unwanted sign changes. Thus, for coherent demodulation to be successful,
we must drive ∆f to zero, and make φ as small as possible; that is, we must synchronize to
the received carrier. One possible approach is to use feedback-based techniques such as the phase locked loop, discussed later in this chapter.
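The effect of the phase offset θr is easy to see in a few lines of Matlab; the message, carrier frequency, and the crude moving-average lowpass filter below are arbitrary choices made for illustration.

% Coherent DSB-SC demodulation with a phase offset theta_r (illustrative parameters).
fs = 10000; fc = 100; A = 1; t = 0:1/fs:2;
m = cos(2*pi*t) + 0.5*cos(6*pi*t);                 % example message, bandwidth << fc
L = round(fs/fc); h = ones(1,L)/L;                 % crude LPF: one-carrier-period average
for thr = [0, pi/4, pi/2]                          % phase of received carrier relative to LO
    yp = A*m.*cos(2*pi*fc*t + thr);                % received DSB-SC signal
    mhat = conv(2*yp.*cos(2*pi*fc*t), h, 'same');  % coherent demodulator of Figure 3.5
    fprintf('theta_r = %4.2f: output/message ratio = %5.2f (cos theta_r = %5.2f)\n', ...
        thr, mhat(round(end/2))/m(round(end/2)), cos(thr));
end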
3.2.2 Conventional AM
In conventional AM, we add a large carrier component to a DSB-SC signal, so that the passband
transmitted signal is of the form:
uAM(t) = A m(t) cos 2πfc t + Ac cos 2πfc t = ( Ac + A m(t) ) cos 2πfc t    (3.5)
Figure 3.6: The spectrum of a conventional AM signal for the example message in Figure 3.3.
design approach that has been adopted for broadcast AM radio. A more detailed discussion
follows.
The envelope of the AM signal in (3.5) is given by
e(t) = |Am(t) + Ac |
If the term inside the magnitude operation is always nonnegative, we have e(t) = Am(t) + Ac .
In this case, we can read off the message signal directly from the envelope, using AC coupling to
get rid of the DC offset due to the second term. For this to happen, we must have
A m(t) + Ac ≥ 0 for all t    (3.6)
Let min_t m(t) = −M0, where M0 = |min_t m(t)|. (Note that the minimum value of the message must be negative if the message has zero DC value.) Equation (3.6) reduces to −AM0 + Ac ≥ 0, or Ac ≥ AM0. Let us define the modulation index amod as the ratio of the size of the biggest negative incursion due to the message term to the size of the unmodulated carrier term:
amod = A M0 / Ac
The condition (3.6) for accurately recovering the message using envelope detection can now be
rewritten as
amod ≤ 1 (3.7)
It is also convenient to define a normalized version of the message as follows:
mn(t) = m(t)/M0 = m(t)/|min_t m(t)|    (3.8)
which satisfies
min_t mn(t) = ( min_t m(t) ) / M0 = −1
It is easy to see that the AM signal (3.5) can be rewritten as
uAM(t) = Ac ( 1 + amod mn(t) ) cos 2πfc t
which clearly brings out the role of modulation index in ensuring that envelope detection works.
Figure 3.7: Time domain AM waveforms for a sinusoidal message. The envelope no longer follows
the message for modulation index larger than one.
Figure 3.8: Envelope detector demodulation of AM. The envelope detector output is typically
passed through a DC blocking capacitance (not shown) to eliminate the DC offset due to the
carrier component of the AM signal.
(Plot sketch: vout(t) charges to the carrier peaks v1, v2, . . . and decays as v1 e^{−(t−t1)/RC}, v2 e^{−(t−t2)/RC} between them, closely tracking the envelope.)
Figure 3.9: The relation between the envelope detector output vout (t) (shown in bold) and input
vin (t) (shown as dashed line). The output closely follows the envelope (shown as dotted line).
Figure 3.7 illustrates the impact of modulation index on the viability of envelope detection, where
the message signal is the sinusoidal message in Figure 3.1. For amod = 0.5 and amod = 1, we see
that the envelope equals a scaled and DC-shifted version of the message. For amod = 1.5, we see that
the envelope no longer follows the shape of the message.
Demodulation of Conventional AM: Ignoring noise, the received signal is given by
Ac (1 + amod mn(t)) cos(2πfc t + θr)
where θr is a phase offset which is unknown a priori, if we do not perform carrier synchronization.
However, as long as amod ≤ 1, we can recover the message without knowing θr using envelope
detection, since the envelope is still just a scaled and DC-shifted version of the message. Of
course, the message can also be recovered by coherent detection, since the I component of the
received carrier equals a scaled and DC-shifted version of the message. However, by doing enve-
lope detection instead, we can avoid carrier synchronization, thus reducing receiver complexity
drastically. An envelope detector is shown in Figure 3.8, and an example (where the envelope
is a straight line) showing how it works is depicted in Figure 3.9. The diode (we assume that it
is ideal) conducts only in the forward direction, i.e., when the input voltage vin(t) of the passband
signal is larger than the output voltage vout (t) across the RC filter. When this happens, the
output voltage becomes equal to the input voltage instantaneously (under the idealization that
the diode has zero resistance). In this regime, we have vout (t) = vin (t). When the input voltage is
smaller than the output voltage, the diode does not conduct, and the capacitor starts discharging
through the resistor with time constant RC. As shown in Figure 3.9, in this regime, starting at time t1, we have vout(t) = v1 e^{−(t−t1)/RC}, where v1 = vout(t1).
Roughly speaking, the capacitor gets charged at each carrier peak, and discharges between peaks.
The time interval between successive charging episodes is therefore approximately equal to 1/fc,
the time between successive carrier peaks. The factor by which the output voltage is reduced
during this period due to capacitor discharge is exp (−1/(fc RC)). This must be close to one in
order for the voltage to follow the envelope, rather than the variations in the sinusoidal carrier.
That is, we must have fc RC ≫ 1. On the other hand, the decay in the envelope detector output
must be fast enough (i.e., the RC time constant must be small enough) so that it can follow
changes in the envelope. Since the time constant for envelope variations is inversely proportional
to the message bandwidth B, we must have RC ≪ 1/B. Combining these two conditions for
envelope detection to work well, we have
1/fc ≪ RC ≪ 1/B    (3.11)
This of course requires that fc ≫ B (carrier frequency much larger than message bandwidth),
which is typically satisfied in practice. For example, the carrier frequencies in broadcast AM
radio are over 500 KHz, whereas the message bandwidth is limited to 5 KHz. Applying (3.11),
the RC time constant for an envelope detector should be chosen so that
2 µs ≪ RC ≪ 200 µs
In this case, a good choice of parameters would be RC = 20µs, for example, with R = 50 ohms,
and C = 400 nanofarads.
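The design rule (3.11) can be checked with a direct simulation of the ideal-diode RC detector: charge instantly to the input whenever the input exceeds the output, otherwise discharge exponentially. The sketch below (Python with numpy; the carrier, message, and the three RC values are illustrative assumptions) shows that an RC constant violating either side of (3.11) fails to track the envelope, while a value in between tracks it up to the small ripple between carrier peaks.

# Minimal sketch of the ideal-diode RC envelope detector of Figure 3.8.
# Carrier, message, and RC values are illustrative assumptions.
import numpy as np

fs = 20e6                    # sampling rate (Hz)
fc = 600e3                   # carrier frequency (Hz)
B = 5e3                      # message bandwidth (Hz)
t = np.arange(0, 2e-3, 1/fs)
m_n = np.cos(2*np.pi*B*t)    # normalized message
a_mod = 0.5
v_in = (1 + a_mod*m_n)*np.cos(2*np.pi*fc*t)   # conventional AM with A_c = 1

def envelope_detector(v_in, RC, fs):
    # Ideal diode plus parallel RC: charge instantly, discharge as exp(-dt/RC).
    decay = np.exp(-1/(fs*RC))
    v_out = np.zeros_like(v_in)
    v = 0.0
    for k, x in enumerate(v_in):
        v = x if x > v else v*decay
        v_out[k] = v
    return v_out

for RC in (0.1e-6, 20e-6, 2000e-6):           # too small, well chosen, too large
    v_out = envelope_detector(v_in, RC, fs)
    err = np.max(np.abs(v_out - (1 + a_mod*m_n)))
    print(f"RC = {RC*1e6:7.1f} us, max deviation from envelope = {err:.3f}")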
Software Lab 3.1 introduces a different application of envelope detection. Adding a strong carrier
component at the receiver, followed by envelope detection, provides an alternative approach to
downconversion that avoids the use of mixers, which are difficult to implement at very high
carrier frequencies (e.g., for coherent optical communication).
Power efficiency of conventional AM: The price we pay for the receiver simplicity of conven-
tional AM is power inefficiency: in (3.5) the unmodulated carrier Ac cos(2πfc t) is not carrying
any information regarding the message. We now compute the power efficiency ηAM , which is
defined as the ratio of the transmitted power due to the message-bearing term Am(t) cos(2πfc t)
to the total power of uAM (t). In order to express the result in terms of the modulation index,
let us use the expression (3.9).
\overline{u_{AM}^2(t)} = \overline{A_c^2 (1 + a_{mod} m_n(t))^2 \cos^2(2\pi f_c t)} = \frac{A_c^2}{2} \overline{(1 + a_{mod} m_n(t))^2} + \frac{A_c^2}{2} \overline{(1 + a_{mod} m_n(t))^2 \cos(4\pi f_c t)}
The second term on the right-hand side is the DC value of a passband signal at 2fc , which is
zero. Expanding out the first term, we have
\overline{u_{AM}^2(t)} = \frac{A_c^2}{2} \left( 1 + a_{mod}^2 \overline{m_n^2} + 2 a_{mod} \overline{m_n} \right) = \frac{A_c^2}{2} \left( 1 + a_{mod}^2 \overline{m_n^2} \right) \quad (3.12)
assuming that the message has zero DC value. The power of the message-bearing term can be
similarly computed as
\overline{(A_c a_{mod} m_n(t))^2 \cos^2(2\pi f_c t)} = \frac{A_c^2}{2} a_{mod}^2 \overline{m_n^2}
so that the power efficiency is given by
\eta_{AM} = \frac{a_{mod}^2 \overline{m_n^2}}{1 + a_{mod}^2 \overline{m_n^2}} \quad (3.13)
Noting that mn is normalized so that its most negative value is −1, for messages which have comparable positive and negative excursions around zero, we expect |mn(t)| ≤ 1, and hence average power \overline{m_n^2} ≤ 1 (typical values are much smaller than one). Since amod ≤ 1 for envelope detection to work, the power efficiency of conventional AM is at best 50%. For a sinusoidal message, for example, it is easy to see that \overline{m_n^2} = 1/2, so that the power efficiency is at most 33%. For speech signals, which have significantly higher peak-to-average ratio, the power efficiency is even smaller.
Example 3.2.1 (AM power efficiency computation): The message m(t) = 2 sin 2000πt −
3 cos 4000πt is used in an AM system with a modulation index of 70% and carrier frequency
of 580 KHz. What is the power efficiency? If the net transmitted power is 10 watts, find the
magnitude spectrum of the transmitted signal.
We need to find M0 = |mint m(t)| in order to determine the normalized form mn (t) = m(t)/M0 .
To simplify notation, let x = 2000πt, and minimize g(x) = 2 sin x − 3 cos 2x. Since g is periodic
with period 2π, we can minimize it numerically over a period. However, we can perform the
minimization analytically in this case. Differentiating g, we obtain
g ′(x) = 2 cos x + 6 sin 2x = 0
This gives
2 cos x + 12 sin x cos x = 2 cos x(1 + 6 sin x) = 0
There are two solutions: cos x = 0 and sin x = −1/6. The first solution gives cos 2x = 2 cos² x − 1 = −1 and sin x = ±1, which gives g(x) = 1 or 5. The second solution gives cos 2x = 1 − 2 sin² x = 1 − 2/36 = 17/18, which gives g(x) = 2(−1/6) − 3(17/18) = −19/6. We therefore obtain
M0 = |mint m(t)| = 19/6
This gives
mn(t) = m(t)/M0 = (12/19) sin 2000πt − (18/19) cos 4000πt
This gives
\overline{m_n^2} = (12/19)²(1/2) + (18/19)²(1/2) ≈ 0.65
Substituting in (3.13), setting amod = 0.7, we obtain a power efficiency ηAM = 0.24, or 24%.
To figure out the spectrum of the transmitted signal, we must find Ac in the formula (3.9). The
power of the transmitted signal is given by (3.12) to be
10 = \frac{A_c^2}{2}\left(1 + a_{mod}^2 \overline{m_n^2}\right) = \frac{A_c^2}{2}\left(1 + (0.7^2)(0.65)\right)
which yields Ac ≈ 3.9. The overall AM signal is given by
uAM (t) = Ac (1 + amod mn (t)) cos 2πfc t = Ac (1 + a1 sin 2πf1 t + a2 cos 4πf1 t) cos 2πfc t
where a1 = 0.7(12/19) = 0.44, a2 = 0.7(−18/19) = −0.66, f1 = 1 KHz and fc = 580 KHz. The
magnitude spectrum is given by
|UAM (f )| = Ac /2 (δ(f − fc ) + δ(f + fc ))
+ Ac |a1 |/4 (δ(f − fc − f1 ) + δ(f − fc + f1 ) + δ(f + fc + f1 ) + δ(f + fc − f1 ))
+ Ac |a2 |/4 (δ(f − fc − 2f1 ) + δ(f − fc + 2f1 ) + δ(f + fc + 2f1 ) + δ(f + fc − 2f1 ))
(Sketch: |UAM(f)| consists of carrier impulses of strength 1.95 at ±580 KHz, together with smaller sideband impulses at ±579, ±581 KHz and ±578, ±582 KHz.)
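The quantities in this example are easily verified numerically; the sketch below (Python with numpy, not part of the text) evaluates M0, \overline{m_n^2}, ηAM and Ac on a dense grid over one period of the message.

# Numerical check of Example 3.2.1 (printed values match the rounded text values).
import numpy as np

t = np.linspace(0, 1e-3, 200001)                  # one 1 ms period of the message
m = 2*np.sin(2000*np.pi*t) - 3*np.cos(4000*np.pi*t)

M0 = np.abs(m.min())                              # |min_t m(t)|
print(M0, 19/6)                                   # both ~3.1667

m_n = m/M0
msq = np.mean(m_n**2)                             # average power of normalized message
print(msq)                                        # ~0.65

a = 0.7                                           # modulation index
eta = a**2*msq/(1 + a**2*msq)
print(eta)                                        # ~0.24

Ac = np.sqrt(2*10/(1 + a**2*msq))                 # from 10 = (Ac^2/2)(1 + a^2 msq)
print(Ac)                                         # ~3.9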
(Sketches: real and imaginary parts of UUSB(f) and ULSB(f); each SSB signal occupies a band of width B on one side of ±fc.)
Figure 3.11: Spectra for SSB signaling for the example message in Figure 3.3.
Since the coherent demodulator simply extracts the I component of the passband signal, the I component of the
SSB signal must be the message. In order to understand the structure of an SSB signal, it remains
to identify the Q component. This is most easily done by considering the complex envelope of
the passband transmitted signal. Consider again the example USB signal in Figure 3.11(a). The
spectrum U(f ) of its complex envelope relative to fc is shown in Figure 3.12. Now, the spectra
of I and Q components can be inferred as follows:
Figure 3.12: Complex envelope for the USB signal in Figure 3.11(a).
Figure 3.13: I and Q components for the USB signal in Figure 3.11(a).
That is, the Q component is a filtered version of the message, where the filter transfer function
is H(f ) = −jsgn(f ). This transformation is given a special name, the Hilbert transform.
Hilbert transform: The Hilbert transform of a signal x(t) is denoted by x̌(t), and is specified
in the frequency domain as
X̌(f ) = (−jsgn(f )) X(f )
This corresponds to passing x through a filter with transfer function
H(f) = −j sgn(f) ↔ h(t) = 1/(πt)
where the derivation of the impulse response is left as an exercise.
Figure 3.14: Spectrum of the Hilbert transform of the example message in Figure 3.3.
Figure 3.14 shows the spectrum of the Hilbert transform of the example message in Figure 3.3.
We see that it is the same (up to scaling) as the Q component of the USB signal, shown in Figure
3.13.
Physical interpretation of the Hilbert transform: If x(t) is real-valued, then so is its
Hilbert transform x̌(t). Thus, the Fourier transforms X(f ) and X̌(f ) must both satisfy conjugate
symmetry, and we only need to discuss what happens at positive frequencies. For f > 0, we have
X̌(f) = −j sgn(f) X(f) = −j X(f) = e^{−jπ/2} X(f). That is, the Hilbert transform simply imposes
a π/2 phase lag at all (positive) frequencies, leaving the magnitude of the Fourier transform
unchanged.
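The frequency domain specification X̌(f) = −j sgn(f) X(f) translates directly into a discrete-time approximation: take an FFT, multiply by −j sgn(f), and invert. The sketch below (Python; the test signal and sampling rate are arbitrary choices) does this and cross-checks against scipy.signal.hilbert, which returns the analytic signal x(t) + j x̌(t), so that its imaginary part is the Hilbert transform.

# Minimal sketch: Hilbert transform by multiplying the FFT by -j sgn(f).
import numpy as np
from scipy.signal import hilbert

fs = 8000.0
t = np.arange(0, 1.0, 1/fs)                 # an integer number of cycles below
x = np.cos(2*np.pi*100*t)                   # test signal (arbitrary choice)

X = np.fft.fft(x)
f = np.fft.fftfreq(x.size, d=1/fs)
X_hat = -1j*np.sign(f)*X                    # apply H(f) = -j sgn(f)
x_hat = np.fft.ifft(X_hat).real             # Hilbert transform of x

print(np.allclose(x_hat, np.sin(2*np.pi*100*t), atol=1e-9))   # cos -> sin (pi/2 lag)
print(np.allclose(x_hat, np.imag(hilbert(x)), atol=1e-9))     # matches scipy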
Equation (3.14) shows that the Q component of the USB signal is m̌(t), the Hilbert transform
of the message. Thus, the passband USB signal can be written as
uUSB(t) = m(t) cos 2πfc t − m̌(t) sin 2πfc t    (3.15)
Similarly, we can show that the Q component of an LSB signal is −m̌(t), so that the passband
LSB signal is given by
uLSB(t) = m(t) cos 2πfc t + m̌(t) sin 2πfc t    (3.16)
Figure 3.15: SSB modulation using the Hilbert transform of the message.
SSB modulation: Conceptually, an SSB signal can be generated by filtering out one of the
sidebands of a DSB-SC signal. However, it is difficult to implement the required sharp cutoff
at fc , especially if we wish to preserve the information contained at the boundary of the two
sidebands, which corresponds to the message information near DC. Thus, an implementation of
SSB based on sharp bandpass filters runs into trouble when the message has significant frequency
content near DC. The representations in (3.15) and (3.16) provide an alternative approach to
generating SSB signals, as shown in Figure 3.15. We have emphasized the role of 90◦ phase lags
in generating the I and Q components, as well as the LO signals used for upconversion.
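A minimal discrete-time version of Figure 3.15 is sketched below (Python; the message, carrier frequency, and sampling rate are arbitrary choices): compute m̌(t), form m(t) cos 2πfc t ∓ m̌(t) sin 2πfc t, and confirm that essentially all of the power lies above fc for USB and below fc for LSB.

# Minimal sketch of SSB generation using the Hilbert transform (Figure 3.15).
# Message, carrier frequency, and sampling rate are arbitrary choices.
import numpy as np
from scipy.signal import hilbert

fs = 100e3
fc = 20e3
t = np.arange(0, 0.1, 1/fs)
m = np.cos(2*np.pi*1e3*t) + 0.5*np.sin(2*np.pi*2e3*t)   # example message

m_hat = np.imag(hilbert(m))                              # Hilbert transform of m
u_usb = m*np.cos(2*np.pi*fc*t) - m_hat*np.sin(2*np.pi*fc*t)
u_lsb = m*np.cos(2*np.pi*fc*t) + m_hat*np.sin(2*np.pi*fc*t)

# Check sideband occupancy: fraction of signal power above the carrier frequency.
f = np.fft.rfftfreq(t.size, d=1/fs)
for name, u in (("USB", u_usb), ("LSB", u_lsb)):
    P = np.abs(np.fft.rfft(u))**2
    frac_above = P[f > fc].sum()/P[f > 0].sum()
    print(name, "fraction of power above fc:", round(frac_above, 6))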
(Figure 3.16: spectra UDSB(f), UUSB(f) and ULSB(f) for a sinusoidal message: impulses of strength 1/2 at ±(fc − fm) and ±(fc + fm) for DSB, at ±(fc + fm) only for USB, and at ±(fc − fm) only for LSB.)
Example 3.2.3 (SSB waveforms for a sinusoidal message): For a sinusoidal message
m(t) = cos 2πfm t, we have m̌(t) = sin 2πfm t from Example 3.2.2. Consider the DSB signal
uDSB(t) = 2 cos 2πfm t cos 2πfc t = cos 2π(fc − fm)t + cos 2π(fc + fm)t
where we have normalized the signal power to one: \overline{u_{DSB}^2} = 1. The DSB, USB and LSB spectra are shown in Figure 3.16. From the SSB spectra shown, we can immediately write down the
following time domain expressions:
uU SB (t) = cos 2π(fc + fm )t = cos 2πfm t cos 2πfc t − sin 2πfm t sin 2πfc t
uLSB (t) = cos 2π(fc − fm )t = cos 2πfm t cos 2πfc t + sin 2πfm t sin 2πfc t
The preceding equations are consistent with (3.15) and (3.16). For both the USB and LSB
signals, the I component equals the message: uc (t) = m(t) = cos 2πfm t. The Q component
for the USB signal is us (t) = m̌(t) = sin 2πfm t, and the Q component for the LSB signal is
us (t) = −m̌(t) = − sin 2πfm t.
SSB demodulation: We know now that the message can be recovered from an SSB signal
by extracting its I component using a coherent demodulator as in Figure 3.5. The difficulty of
coherent demodulation lies in the requirement for carrier synchronization, and we have discussed
the adverse impact of imperfect synchronization for DSB-SC signals. We now show that the
performance degradation is even more significant for SSB signals. Consider a USB received
signal of the form (ignoring scale factors):
m(t) cos(2πfc t + θr) − m̌(t) sin(2πfc t + θr)
where θr is the phase offset with respect to the receiver LO. The complex envelope with respect to the receiver LO is given by
(m(t) + j m̌(t)) e^{jθr}
Taking the real part, we obtain that the I component extracted by the coherent demodulator is
m(t) cos θr − m̌(t) sin θr
Thus, as the phase error θr increases, not only do we get an attenuation in the first term corre-
sponding to the desired message (as in DSB), but we also get interference due to the second term
from the Hilbert transform of the message. Thus, for coherent demodulation, accurate carrier
synchronization is even more crucial for SSB than for DSB.
Noncoherent demodulation is also possible for SSB if we add a strong carrier term, as in conven-
tional AM. Specifically, for a received signal given by
(A + m(t)) cos 2πfc t − m̌(t) sin 2πfc t
the envelope is
e(t) = ( (A + m(t))² + m̌²(t) )^{1/2} ≈ A + m(t)    (3.18)
if |A + m(t)| ≫ |m̌(t)|. Subject to the approximation in (3.18), an envelope detector works just
as in conventional AM.
(Figure 3.17: sketch of a VSB filter response Hp(f), shown together with M(f + fc), M(f − fc), and the message spectrum M(f).)
A coherent demodulator extracting the I component passes 2uVSB(t) cos 2πfc t through a lowpass filter. But
2uVSB(t) cos 2πfc t ↔ UVSB(f − fc) + UVSB(f + fc)
which, substituting from (3.19), equals
Hp(f − fc)M(f − 2fc) + Hp(f + fc)M(f + 2fc) + M(f) (Hp(f − fc) + Hp(f + fc))    (3.20)
The 2fc term, Hp(f − fc)M(f − 2fc) + Hp(f + fc)M(f + 2fc), is filtered out by the lowpass filter. The output of the LPF consists of the lowpass terms in (3.20), which correspond to the I component and are given by
M(f) (Hp(f − fc) + Hp(f + fc))
In order for this to equal (a scaled version of) the desired message, we must have
Hp(f + fc) + Hp(f − fc) = constant, |f| < W    (3.21)
as shown in the example in Figure 3.17. To understand what this implies about the structure
of the passband VSB filter, note that the filter impulse response can be written as hp (t) =
hc (t) cos 2πfc t − hs (t) sin 2πfc t, where hc (t) is obtained by passing 2hp (t) cos(2πfc t) through a
lowpass filter. But 2hp (t) cos(2πfc t) ↔ Hp (f − fc ) + Hp (f + fc ). Thus, the Fourier transform
involved in (3.21) is precisely the lowpass restriction of 2hp (t) cos(2πfc t), i.e., it is Hc (f ). Thus,
the correct demodulation condition for VSB in (3.21) is equivalent to requiring that Hc (f ) be
constant over the message band. Further discussion of the structure of VSB signals is provided
via problems.
As with SSB, if we add a strong carrier component to the VSB signal, we can demodulate it
noncoherently using an envelope detector, again at the cost of some distortion from the presence
of the Q component.
(Figure 3.18: coherent QAM demodulator: the passband QAM signal is mixed with 2 cos 2πfc t and −2 sin 2πfc t, and each branch is lowpass filtered to produce m̂c(t) and m̂s(t).)
Demodulation is achieved using a coherent receiver which extracts both the I and Q components,
as shown in Figure 3.18. If the received signal has a phase offset θ relative to the receiver’s LO,
then we get both attenuation in the desired message and interference from the undesired message,
as follows. Ignoring noise and scale factors, the reconstructed complex baseband message is given
by
m̂(t) = m̂c(t) + j m̂s(t) = (mc(t) + j ms(t)) e^{jθ(t)} = m(t) e^{jθ(t)}
from which we conclude that
m̂c (t) = mc (t) cos θ(t) − ms (t) sin θ(t)
m̂s (t) = ms (t) cos θ(t) + mc (t) sin θ(t)
Thus, accurate carrier synchronization (θ(t) as close to zero as possible) is important for QAM
demodulation to function properly.
Figure 3.19: Spectrum of message and the corresponding AM signal in Example 3.2.4. Axes are
not to scale.
Figure 3.20: Passband output of bandpass filter and its complex envelope with respect to 600
KHz reference, for Example 3.2.4. Axes are not to scale.
Example 3.2.4 The signal m(t) = 2 cos 20πt − cos 40πt, where the unit of time is millisec-
onds, is amplitude modulated using a carrier frequency fc of 600 KHz. The AM signal is given
by
x(t) = 5 cos 2πfc t + m(t) cos 2πfc t
(a) Sketch the magnitude spectrum of x. What is its bandwidth?
(b) What is the modulation index?
(c) The AM signal is passed through an ideal highpass filter with cutoff frequency 595 KHz (i.e.,
the filter passes all frequencies above 595 KHz, and cuts off all frequencies below 595 KHz). Find
an explicit time domain expression for the Q component of the filter output with respect to a
600 KHz frequency reference.
Solution: (a) The message spectrum is M(f) = δ(f − 10) + δ(f + 10) − (1/2) δ(f − 20) − (1/2) δ(f + 20), with f in KHz.
The spectrum of the AM signal is given by
X(f) = (5/2) δ(f − fc) + (5/2) δ(f + fc) + (1/2) M(f − fc) + (1/2) M(f + fc)
These spectra are sketched in Figure 3.19. Since the highest message frequency is 20 KHz, the bandwidth of x is 2 × 20 = 40 KHz.
(b) The modulation index amod = M0 /Ac , where −M0 = mint m(t). To simplify notation, let us
minimize g(x) = 2 cos x − cos 2x. We can actually do this by inspection: for x = π, cos x = −1
and cos 2x = 1, so that minx g(x) = −3. Alternatively, we could set the derivative to zero:
g′(x) = −2 sin x + 2 sin 2x = −2 sin x + 4 sin x cos x = 2 sin x(2 cos x − 1), which vanishes if sin x = 0
(i.e., cos x = ±1) or cos x = 1/2. We can check that the first solution, with cos x = −1, minimizes
g(x). Thus, we obtain M0 = 3 and hence amod = M0 /Ac = 3/5 or 60%.
(c) From Figure 3.19, it is clear that a highpass filter with cutoff at 595 KHz selects the USB
signal plus the carrier. The passband output has spectrum as shown in Figure 3.20(a), and the
complex envelope with respect to 600 KHz is shown in Figure 3.20(b). Taking the inverse Fourier
transform, the time domain complex envelope is given by
ỹ(t) = 5 + e^{j20πt} − (1/2) e^{j40πt}
We can now find the Q component to be
ys(t) = Im(ỹ(t)) = sin 20πt − (1/2) sin 40πt
where t is in milliseconds. Another approach is to recognize that the Q component is the Q
component of the USB signal, which is known to be the Hilbert transform of the message.
Yet another approach is to find the Q component in the frequency domain using jYs(f) = ( Ỹ(f) − Ỹ*(−f) )/2 and then take the inverse Fourier transform. In this particular example, the first approach is probably the simplest.
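Part (c) can also be verified by brute force, as in the sketch below (Python; the sampling rate and duration are arbitrary choices, picked so that all tones fall exactly on FFT bins): synthesize x(t), apply an ideal highpass filter in the FFT domain, form the complex envelope with respect to 600 KHz via the analytic signal, and read off the Q component.

# Numerical check of Example 3.2.4(c). Sampling rate and duration are arbitrary
# choices such that the 580-620 kHz tones fall exactly on FFT bins.
import numpy as np
from scipy.signal import hilbert

fs = 10e6                                   # 10 MHz sampling rate
t = np.arange(0, 1e-3, 1/fs)                # 1 ms record -> 1 kHz frequency resolution
fc = 600e3
m = 2*np.cos(2*np.pi*10e3*t) - np.cos(2*np.pi*20e3*t)
x = (5 + m)*np.cos(2*np.pi*fc*t)            # AM signal of Example 3.2.4

# Ideal highpass at 595 kHz: zero out all FFT bins with |f| < 595 kHz.
X = np.fft.fft(x)
f = np.fft.fftfreq(x.size, d=1/fs)
X[np.abs(f) < 595e3] = 0
y = np.fft.ifft(X).real

# Complex envelope with respect to 600 kHz, and its Q component.
y_tilde = hilbert(y)*np.exp(-2j*np.pi*fc*t)
y_s = np.imag(y_tilde)
expected = np.sin(2*np.pi*10e3*t) - 0.5*np.sin(2*np.pi*20e3*t)
print(np.max(np.abs(y_s - expected)))       # essentially zero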
θ(t) = kp m(t) ,    Phase Modulation    (3.22)
and
(1/2π) dθ(t)/dt = f(t) = kf m(t) ,    Frequency Modulation    (3.23)
where kp , kf are constants. Integrating (3.23), the phase of the FM waveform is given by:
θ(t) = θ(0) + 2πkf ∫_0^t m(τ) dτ    (3.24)
Comparing (3.24) with (3.22), we see that FM is equivalent to PM with the integral of the
message. Similarly, for differentiable messages, PM can be interpreted as FM, with the input
to the FM modulator being the derivative of the message. Figure 3.21 provides an example
illustrating this relationship; this is actually a digital modulation scheme called continuous phase
modulation, as we shall see when we study digital communication. In this example, the digital
(Figure 3.21: (a) messages used for angle modulation: a ±1 digital waveform and its integral; (b) the resulting angle modulated signal u(t).)
(Figure 3.22: (a) the ±1 digital input to a phase modulator; (b) the resulting phase shift keyed signal u(t).)
message +1, −1, −1, +1 is the input to an FM modulator: the instantaneous frequency switches
from fc + kf (for one time unit) to fc − kf (for two time units) and then back to fc + kf again.
The same waveform is produced when we feed the integral of the message into a PM modulator,
as shown in the figure.
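The "integrate, then phase modulate" view of FM is a one-liner in discrete time, with the running integral approximated by a cumulative sum. The sketch below (Python; the carrier frequency, deviation, and sampling rate are arbitrary choices) reproduces the flavor of Figures 3.21 and 3.22 for the digital message +1, −1, −1, +1, contrasting the continuous phase of FM with the phase jumps of direct phase modulation.

# Minimal sketch: FM as phase modulation of the integrated message (Figure 3.21).
# Carrier frequency, deviation, and sampling rate are arbitrary choices.
import numpy as np

fs = 1000.0                                  # samples per time unit
kf = 2.0                                     # frequency deviation (cycles per time unit)
fc = 10.0                                    # carrier frequency (cycles per time unit)
t = np.arange(0, 4, 1/fs)

# Digital message +1, -1, -1, +1, one symbol per time unit.
m = np.repeat(np.array([+1.0, -1.0, -1.0, +1.0]), int(fs))

# FM: theta(t) = 2 pi kf * integral of m, approximated by a cumulative sum.
theta = 2*np.pi*kf*np.cumsum(m)/fs
u_fm = np.cos(2*np.pi*fc*t + theta)          # continuous phase

# PM applied directly to the discontinuous message gives phase jumps instead.
kp = np.pi/2
u_pm = np.cos(2*np.pi*fc*t + kp*m)

print(np.max(np.abs(np.diff(theta))))        # small: the FM phase changes smoothly
print(np.max(np.abs(np.diff(kp*m))))         # ~pi: abrupt phase jumps for direct PM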
When the digital message of Figure 3.21 is input directly to a phase modulator, we get a modulated waveform (Figure 3.22) with phase discontinuities where the message changes sign. This is in contrast to the
output in Figure 3.21, where the phase is continuous. That is, if we compare FM and PM
for the same message, we infer that FM waveforms should have less abrupt phase transitions
due to the smoothing resulting from integration: compare the expressions for the phases of the
modulated signals in (3.22) and (3.24) for the same message m(t). Thus, for a given level of
message variations, we expect FM to have smaller bandwidth. FM is therefore preferred to
PM for analog modulation, where the communication system designer does not have control
over the properties of the message signal (e.g., the system designer cannot require the message
to be smooth). For this reason, and also given the basic equivalence of the two formats, we
restrict the discussion in the remainder of this section to FM for the most part. PM, however,
is extensively employed in digital communication, where the system designer has significant
flexibility in shaping the message signal. In this context, we use the term Phase Shift Keying
(PSK) to denote the discrete nature of the information encoded in the message. Figure 3.22 is
actually a simple example of PSK, although in practice, the phase of the modulated signal is
shaped to be smoother in order to improve bandwidth efficiency.
Frequency Deviation and Modulation Index: The maximum deviation in instantaneous
frequency due to a message m(t) is given by
Δfmax = kf maxt |m(t)|
to the carrier frequency, but this operation does not change the frequency modulation. Direct
FM modulation may be employed for both narrowband and wideband modulation.
An alternative approach to wideband modulation is to first generate a narrowband FM signal
(typically using a phase modulator), and to then multiply the frequency (often over multiple
stages) using nonlinearities, thus increasing the frequency deviation as well as the carrier fre-
quency. This method, which is termed indirect FM modulation, is of historical importance, but is
not used in present-day FM systems because direct modulation for wideband FM is now feasible
and cost-effective.
Demodulation: Many different approaches to FM demodulation have evolved over the past cen-
tury. Here we discuss two important classes of demodulators: the limiter-discriminator demodulator
in Section 3.3.1, and the phase locked loop in Section 3.5.
(Figure: limiter-discriminator demodulation; the limiter output is a constant-envelope signal A cos(2πfc t + θ(t)).)
where θ(t) may include contributions due to channel and noise impairments (to be discussed
later), as well as the angle modulation due to the message. An ideal discriminator now produces
the output dθ(t)/dt (where we ignore scaling factors).
A crude realization of a discriminator, which converts fluctuations in frequency to fluctuations
in envelope, is shown in Figure 3.24. Taking the derivative of the FM signal
uFM(t) = Ac cos( 2πfc t + 2πkf ∫_0^t m(τ) dτ + θ0 )    (3.26)
(Figure 3.24: a crude discriminator based on differentiation, which converts fluctuations in the instantaneous (angular) frequency 2πfc + dθ(t)/dt into fluctuations in envelope.)
we have
v(t) = duFM(t)/dt = −Ac (2πfc + 2πkf m(t)) sin( 2πfc t + 2πkf ∫_0^t m(τ) dτ + θ0 )
The envelope of v(t) is 2πAc |fc + kf m(t)|. Noting that kf m(t) is the instantaneous frequency
deviation from the carrier, whose magnitude is much smaller than fc for a properly designed
system, we realize that fc + kf m(t) > 0 for all t. Thus, the envelope equals 2πAc (fc + kf m(t)),
so that passing the discriminator output through an envelope detector yields a scaled and DC-
shifted version of the message. Using AC coupling to reject the DC term, we obtain a scaled
version of the message m(t), just as in conventional AM.
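In discrete time, the ideal discriminator output dθ/dt can be emulated by differentiating the unwrapped phase of the analytic signal; subtracting the carrier term and dividing by kf then recovers the message. The sketch below (Python; the message, deviation, and carrier are arbitrary choices) is one way to do this; it is only approximate near the ends of the record.

# Minimal sketch: discriminator-style FM demodulation via the derivative of the
# phase of the analytic signal. All parameters are arbitrary choices.
import numpy as np
from scipy.signal import hilbert

fs = 1e6
fc = 100e3
kf = 10e3                                     # Hz per unit message amplitude
t = np.arange(0, 10e-3, 1/fs)
m = np.cos(2*np.pi*1e3*t)                     # 1 kHz message

theta = 2*np.pi*kf*np.cumsum(m)/fs            # 2 pi kf * integral of m
u = np.cos(2*np.pi*fc*t + theta)              # FM signal

phase = np.unwrap(np.angle(hilbert(u)))       # instantaneous phase
f_inst = np.diff(phase)*fs/(2*np.pi)          # instantaneous frequency (Hz)
m_hat = (f_inst - fc)/kf                      # demodulated message estimate

sl = slice(100, -100)                         # ignore edge effects of hilbert/diff
print(np.max(np.abs(m_hat[sl] - m[:-1][sl]))) # small demodulation error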
Figure 3.25: Slope detector using a tuned circuit offset from resonance.
The discriminator as described above corresponds to the frequency domain transfer function
H(f ) = j2πf , and can therefore be approximated (up to DC offsets) by transfer functions that
are approximately linear over the FM band of interest. An example of such a slope detector is
given in Figure 3.25, where the carrier frequency fc is chosen at an offset from the resonance
frequency f0 of a tuned circuit.
One problem with the simple discriminator and its approximations is that the envelope detector
output has a significant DC component: when we get rid of this using AC coupling, we also
attenuate low frequency components near DC. This limitation can be overcome by employing
circuits that rely on the approximately linear variations in amplitude and phase of tuned circuits
around resonance to synthesize approximations to an ideal discriminator whose output is the
derivative of the phase. These include the Foster-Seely detector and the ratio detector. Circuit
level details of such implementations are beyond our scope.
3.3.2 FM Spectrum
We first consider a naive but useful estimate of FM bandwidth termed Carson’s rule. We
then show that the spectral properties of FM are actually quite complicated, even for a simple
sinusoidal message, and outline methods of obtaining more detailed bandwidth estimates.
Consider an angle modulated signal, up (t) = Ac cos (2πfc t + θ(t)), where θ(t) contains the mes-
sage information. For a baseband message m(t) of bandwidth B, the phase θ(t) for PM is also
a baseband signal with the same bandwidth. The phase θ(t) for FM is the integral of the mes-
sage. Since integration smooths out the time domain signal, or equivalently, attenuates higher
frequencies, θ(t) is a baseband signal with bandwidth at most B. We therefore loosely think of
θ(t) as having a bandwidth equal to B, the message bandwidth, for the remainder of this section.
The complex envelope of up with respect to fc is given by
u(t) = Ac e^{jθ(t)} = Ac ( cos θ(t) + j sin θ(t) )
Now, if |θ(t)| is small, as is the case for narrowband angle modulation, then cos θ(t) ≈ 1 and
sin θ(t) ≈ θ(t), so that the complex envelope is approximately given by
u(t) ≈ Ac ( 1 + jθ(t) )
Thus, the I component has a large unmodulated carrier contribution as in conventional AM, but
the message information is now in the Q component instead of in the I component, as in AM.
The Fourier transform is given by
Up(f) = (Ac/2) (δ(f − fc) + δ(f + fc)) − (Ac/2j) (Θ(f − fc) − Θ(f + fc))
where Θ(f ) denotes the Fourier transform of θ(t). The magnitude spectrum is therefore given
by
|Up(f)| = (Ac/2) (δ(f − fc) + δ(f + fc)) + (Ac/2) (|Θ(f − fc)| + |Θ(f + fc)|)    (3.27)
Thus, the bandwidth of a narrowband FM signal is 2B, or twice the message bandwidth, just as
in AM. For example, narrowband angle modulation with a sinusoidal message m(t) = cos 2πfm t occupies a bandwidth of 2fm: θ(t) = (kf/fm) sin 2πfm t for FM, and θ(t) = kp cos 2πfm t for PM.
For wideband FM, we would expect the bandwidth to be dominated by the frequency deviation
kf m(t). For messages that have positive and negative peaks of similar size, the frequency devia-
tion ranges between −∆fmax and ∆fmax , where ∆fmax = kf maxt |m(t)|. In this case, we expect
the bandwidth to be dominated by the instantaneous deviations around the carrier frequency,
which spans an interval of length 2∆fmax .
Carson’s rule: This is an estimate for the bandwidth of a general FM signal, based on simply
adding up the estimates from our separate discussion of narrowband and wideband modulation:
BFM ≈ 2Δfmax + 2B = 2(β + 1)B
where β = ∆fmax /B is the modulation index, also called the FM deviation ratio, defined earlier.
FM Spectrum for a Sinusoidal Message: In order to get more detailed insight into what
the spectrum of an FM signal looks like, let us now consider the example of a sinusoidal message,
for which the phase deviation is given by θ(t) = β sin 2πfm t, from (3.25). The complex envelope
of the FM signal with respect to fc is given by
u(t) = e^{jβ sin 2πfm t}
where we have normalized Ac = 1.
Since the sinusoid in the exponent is periodic with period 1/fm, so is u(t). It can therefore be expanded into a Fourier series of the form
u(t) = \sum_{n=-\infty}^{\infty} u[n] e^{j2\pi n f_m t}
where the Fourier coefficients are given by u[n] = Jn(β), with Jn(·) the Bessel function of the first kind of order n, specified by the integral in (3.29) below. While the integrand in (3.29) is complex-valued, the integral is real-valued. To see this, use Euler's formula:
e^{j(β sin x − nx)} = cos(β sin x − nx) + j sin(β sin x − nx)
Since β sin x − nx and the sine function are both odd, the imaginary term sin(β sin x − nx) above
is an odd function, and integrates out to zero over [−π, π]. The real part is even, hence the
integral over [−π, π] is simply twice that over [0, π]. We summarize as follows:
u[n] = J_n(\beta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j(\beta \sin x - nx)} dx = \frac{1}{\pi} \int_0^{\pi} \cos(\beta \sin x - nx) dx \quad (3.29)
Figure 3.26: Bessel functions of the first kind, Jn (β) versus β, for n = 0, 1, 2, 3.
Bessel functions are available in mathematical software packages such as Matlab and Mathemat-
ica. Figure 3.26 shows some Bessel function plots. Some properties of Bessel functions worth
noting are as follows:
• For n integer, Jn(β) = (−1)^n J_{−n}(β) = (−1)^n Jn(−β).
• For fixed β, Jn (β) tends to zero fast as n gets large, so that the complex envelope is well ap-
proximated by a finite number of Fourier series components. In particular, a good approximation
is that Jn (β) is small for |n| > β + 1. This leads to an approximation for the bandwidth of the
FM signal given by 2(β + 1)fm , which is consistent with Carson’s rule.
• For fixed n, Jn (β) vanishes for specific values of β, a fact that can be used for spectral shaping.
Noting that |J−n (β)| = |Jn (β)|, the complex envelope has discrete frequency components at ±nfm
of strength |Jn (β)|: these correspond to frequency components at fc ± nfm in the passband FM
signal.
Fractional power containment bandwidth: By Parseval’s identity for Fourier series, the
power of the complex envelope is given by
1 = \overline{|u(t)|^2} = \sum_{n=-\infty}^{\infty} J_n^2(\beta) = J_0^2(\beta) + 2 \sum_{n=1}^{\infty} J_n^2(\beta)
We can therefore compute the fractional power containment bandwidth as 2Kfm, where K ≥ 1 is the smallest integer such that
J_0^2(\beta) + 2 \sum_{n=1}^{K} J_n^2(\beta) \geq \alpha
where α is the desired fraction of power within the band (e.g., α = 0.99 for the 99% power
containment bandwidth). For integer values of β = 1, ..., 10, we find that K = β + 1 provides a
good approximation to the 99% power containment bandwidth, which is again consistent with
Carson’s formula.
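This is easy to check with scipy.special.jv, which evaluates Jn(β). The sketch below computes the smallest K meeting the 99% power containment condition for integer β and lists it next to β + 1.

# 99% power containment bandwidth check using scipy.special.jv for J_n(beta).
from scipy.special import jv

alpha = 0.99
for beta in range(1, 11):
    total = jv(0, beta)**2
    K = 0
    while total < alpha:
        K += 1
        total += 2*jv(K, beta)**2
    print(f"beta = {beta:2d}: 99% containment K = {K}, compare beta + 1 = {beta + 1}")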
(Figure 3.27: the periodic message a(t), which alternates between +2 mV and −2 mV; the time axis is in microseconds, with ticks at 100 and 200.)
The following worked problem brings together some of the concepts we have discussed regarding
FM.
Example 3.3.1 The signal a(t) shown in Figure 3.27 is fed to a VCO with quiescent frequency
of 5 MHz and frequency deviation of 25 KHz/mV. Denote the output of the VCO by y(t).
(a) Provide an estimate of the bandwidth of y. Clearly state the assumptions that you make.
(b) The signal y(t) is passed through an ideal bandpass filter of bandwidth 5 KHz, centered at
5.005 MHz. Provide the simplest possible expression for the power at the filter output (if you
can give a numerical answer, do so).
Solution: (a) The VCO output is an FM signal with
Δfmax = kf maxt |m(t)| = 25 KHz/mV × 2 mV = 50 KHz
The message is periodic with period 100 microseconds, hence its fundamental frequency is 10
KHz. Approximating its bandwidth by its first harmonic, we have B ≈ 10 KHz. Using Carson’s
formula, we can approximate the bandwidth of the FM signal at the VCO output as
BF M ≈ 2∆fmax + 2B ≈ 120 KHz
(b) The complex envelope of the VCO output is given by e^{jθ(t)}, where
θ(t) = 2πkf ∫ m(τ) dτ
For periodic messages with zero DC value (as is the case for m(t) here), θ(t), and hence e^{jθ(t)},
has the same period as the message. We can therefore express the complex envelope as a Fourier
series with complex exponentials at frequencies nfm , where fm = 10 KHz is the fundamental
frequency for the message, and where n takes integer values. Thus, the FM signal has discrete
components at fc + nfm , where fc = 5 MHz in this example. A bandpass filter at 5.005 MHz
with bandwidth 5 KHz does not capture any of these components, since it spans the interval
[5.0025, 5.0075] MHz, whereas the nearest Fourier components are at 5 MHz and 5.01 MHz.
Thus, the power at the output of the bandpass filter is zero.
translating the received signal to a fixed IF. Radio receivers built with discrete components often
take advantage of the widespread availability of inexpensive filters at certain commonly used
IF frequencies, such as 455 KHz (used for AM radio) and 10.7 MHz (used for FM radio). As
carrier frequencies scale up to the GHz range (as is the case for modern digital cellular and
wireless local area network transceivers), circuit components shrink with the carrier wavelength,
and it becomes possible to implement RF amplifiers and filters using integrated circuits. In such
settings, a direct conversion architecture, in which the passband signal is directly translated to
baseband, becomes increasingly attractive, as discussed later in this section.
The key element in frequency translation is a mixer, which multiplies two input signals. For our
present purpose, one of these inputs is a passband received signal A cos(2πfRF t + θ), where the
envelope A(t) and phase θ(t) are baseband signals that contain message information. The second
input is a local oscillator (LO) signal, which is a locally generated sinusoid cos(2πfLO t) (we set
the LO phase to zero without loss of generality, effectively adopting it as our phase reference).
The output of the mixer is therefore given by
A cos(2πfRF t + θ) cos(2πfLO t) = (A/2) cos( 2π(fRF − fLO)t + θ ) + (A/2) cos( 2π(fRF + fLO)t + θ )
Thus, there are two frequency components at the output of the mixer, fRF + fLO and |fRF −
fLO | (remember that we only need to talk about positive frequencies when discussing physically
realizable signals, due to the conjugate symmetry of the Fourier transform of real-valued time
signals). In the superhet receiver, we set one of these as our IF, typically the difference frequency:
fIF = |fRF − fLO |.
(Block diagrams: a generic superheterodyne front end (image reject filter, mixer, local oscillator, channel select filtering at IF), and Figure 3.29, the superheterodyne AM receiver: an antenna receiving the entire AM band, a tunable RF amplifier centered at fRF, a mixer driven by a tunable local oscillator for station selection, a 455 KHz IF amplifier with automatic gain control, an envelope detector, an audio amplifier, and a speaker.)
For a given RF and a fixed IF, we therefore have two choices of LO frequency when fIF = |fRF − fLO|: fLO = fRF − fIF and fLO = fRF + fIF. To continue the discussion, let us consider
the example of AM broadcast radio, which operates over the band from 540 to 1600 KHz, with
10 KHz spacing between the carrier frequencies for different stations. The audio message signal
is limited to 5 KHz bandwidth, modulated using conventional AM to obtain an RF signal of
bandwidth 10 KHz. Figure 3.29 shows a block diagram for the superhet architecture commonly
used in AM receivers. The RF bandpass filter must be tuned to the carrier frequency for the
desired station, and at the same time, the LO frequency into the mixer must be chosen so that
the difference frequency equals the IF frequency of 455 KHz. If fLO = fRF + fIF , then the
LO frequency ranges from 995 to 2055 KHz, corresponding to an approximately 2-fold variation
in tuning range. If fLO = fRF − fIF , then the LO frequency ranges from 85 to 1145 KHz,
corresponding to more than 13-fold variation in tuning range. The first choice is therefore
preferred, because it is easier to implement a tunable oscillator over a smaller tuning range.
Figure 3.30: The role of image rejection and channel selection in superhet receivers.
Having fixed the LO frequency, we have a desired signal at fRF = fLO − fIF that leads to a
component at IF, and potentially an undesired image frequency at fIM = fLO + fIF = fRF + 2fIF
that also leads to a component at IF. The job of the RF bandpass filter is to block this image
frequency. Thus, the filter must let in the desired signal at fRF (so that its bandwidth must be
larger than 10 KHz), but severely attenuate the image frequency which is 910 KHz away from
the center frequency. It is therefore termed an image reject filter. We see that, for the AM
broadcast radio application, a superhet architecture allows us to design the tunable image reject
filter to somewhat relaxed specifications. However, the image reject filter does let in not only
the signal from the desired station, but also those from adjacent stations. It is the job of the
IF filter, which is tuned to the fixed frequency of 455 KHz, to filter out these adjacent stations.
For this purpose, we use a highly selective filter at IF with a bandwidth of 10 KHz. Figure 3.30
illustrates these design considerations more generally.
Receivers for FM broadcast radio also commonly use a superhet architecture. The FM broadcast
band ranges from 88-108 MHz, with carrier frequency separation of 200 KHz between adjacent
stations. The IF is chosen at 10.7 MHz, so that the LO is tuned from 98.7 to 118.7 MHz for the
choice fLO = fRF + fIF . The RF filter specifications remain relaxed: it has to let in the desired
signal of bandwidth 200 KHz, while rejecting an image frequency which is 2fIF = 21.4 MHz away
from its center frequency. We discuss the structure of the FM broadcast signal, particularly the
way in which stereo FM is transmitted, in more detail in Section 3.6.
Roughly indexing the difficulty of implementing a filter by the ratio of its center frequency to its
bandwidth, or its Q factor, with high Q being more difficult to implement, we have the following
fundamental tradeoff for superhet receivers. If we use a large IF, then the Q needed for the
image reject filter is smaller. On the other hand, the Q needed for the IF filter to reject an
interfering signal whose frequency is near that of the desired signal becomes higher. In modern
digital communication applications, superheterodyne reception with multiple IF stages may be
used to work around this tradeoff: to achieve the desired gain for the signal of interest, to sufficiently attenuate interference from other signals, and to achieve an adequate degree of image rejection. Image rejection can be enhanced by employing appropriately designed
image-reject mixer architectures.
Direct conversion receivers: With the trend towards increasing monolithic integration of
digital communication transceivers for applications such as cellular telephony and wireless local
area networks, the superhet architecture is often being supplanted by direct conversion (or zero
IF) receivers, in which the passband received signal is directly converted down to baseband
using a quadrature mixer at the RF carrier frequency. In this case, the desired signal is its
own image, which removes the necessity for image rejection. Moreover, interfering signals can be
filtered out at baseband, often using sophisticated digital signal processing after analog-to-digital
conversion (ADC), provided that there is enough dynamic range in the circuitry to prevent a
strong interferer from swamping the desired signal prior to the ADC. In contrast, the high Q
bandpass filters required for image rejection and interference suppression in the superhet design
must often be implemented off-chip using, for example, surface acoustic wave (SAW) devices,
an approach that is bulky and costly. Thus, direct conversion is in some sense the “obvious” thing to do,
except that historically, people were unable to make it work, leading to the superhet architecture
serving as the default design through most of the twentieth century.
A key problem with direct conversion is that LO leakage into the RF input of the mixer causes
self-mixing, leading to a DC offset. While a DC offset can be calibrated out, the main problem
is that it can saturate the amplifiers following the mixer, thus swamping out the contribution of
the weaker received signal. Note that the DC offset due to LO leakage is not a problem with a
superhet architecture, since the DC term gets filtered out by the passband IF filter. Other prob-
lems with direct conversion include 1/f noise and susceptibility to second order nonlinearities,
but discussion of these issues is beyond our current scope. However, since the 1990s, integrated
circuit designers have managed to overcome these and other obstacles, and direct conversion re-
ceivers have become the norm for monolithic implementations of modern digital communication
transceivers. These include cellular systems in various licensed bands ranging from 900 MHz to
2 GHz, and WLANs in the 2.4 GHz and 5 GHz unlicensed bands.
The insatiable demand for communication bandwidth virtually assures us that we will seek to
exploit frequency bands well beyond 5 GHz, and circuit designers will be making informed choices
between the superhet and direct conversion architectures for radios at these higher frequencies.
For example, the 60 GHz band in the United States has 7 GHz of unlicensed spectrum; this
band is susceptible to oxygen absorption, and is ideally suited for short range (e.g., 10-500 meters) communication both indoors and outdoors. Similarly, the 71-76 GHz and 81-
86 GHz bands, which avoid oxygen absorption loss, are available for semi-unlicensed point-to-
point “last mile” links. Just as with cellular and WLAN applications in lower frequency bands,
we expect that proliferation of applications using these “millimeter (mm) wave” bands would
require low-cost integrated circuit transceiver implementations. Based on the trends at lower
frequencies, one is tempted to conjecture that initial circuit designs might be based on superhet
architectures, with direct conversion receivers becoming subsequently more popular as designers
become more comfortable with working at these higher frequencies. It is interesting to note
that the design experience at lower carrier frequencies does not go to waste; for example, direct
conversion receivers at, say, 5 GHz, can serve as the IF stage for superhet receivers for mm wave
communication.
(Figure 3.31: phase locked loop: a phase detector compares the phase of the PLL input with the phase θo of the VCO output; the phase difference, smoothed by a loop filter, drives the VCO.)
The key idea behind the PLL, depicted in Figure 3.31, is as follows: we would like to lock on to
the phase of the input to the PLL. We compare the phase of the input with that of the output of
a voltage controlled oscillator (VCO) using a phase detector. The difference between the phases
drives the input of the VCO. If the VCO output is ahead of the PLL input in phase, then we
would like to retard the VCO output phase. If the VCO output is behind the PLL input in phase,
we would like to advance the VCO output phase. This is done by using the phase difference to
control the VCO input. Typically, rather than using the output of the phase detector directly
for this purpose, we smooth it out using a loop filter in order to reduce the effect of noise.
(Figure 3.32: mixer-based realization of the PLL: the PLL input is multiplied by the VCO output −Av sin(2πfc t + θo(t)); the loop filter output x(t) drives the VCO.)
Mixer as phase detector: The classical analog realization of the PLL is based on using a
mixer (i.e., a multiplier) as a phase detector. To see how this works, consider the product of two
sinusoids whose phases we are trying to align:
cos(2πfc t + θ1) cos(2πfc t + θ2) = (1/2) cos(θ1 − θ2) + (1/2) cos(4πfc t + θ1 + θ2)
The second term on the right-hand side is a passband signal at 2fc which can be filtered out
by a lowpass filter. The first term contains the phase difference θ1 − θ2 , and is to be used to
drive the VCO so that we eventually match the phases. Thus, the first term should be small
when we are near a phase match. Since the driving term is the cosine of the phase difference,
the phase match condition is θ1 − θ2 = π/2. That is, using a mixer as our phase detector means
that, when the PLL is locked, the phase at the VCO output is 90◦ offset from the phase of the
PLL input. Now that we know this, we adopt a more convenient notation, changing variables
to define a phase difference whose value at the desired matched state is zero rather than π/2.
Let the PLL input be denoted by Ac cos(2πfc t + θi(t)), and let the VCO output be denoted by Av cos(2πfc t + θo(t) + π/2) = −Av sin(2πfc t + θo(t)). The output of the mixer is now given by
Ac cos(2πfc t + θi(t)) × ( −Av sin(2πfc t + θo(t)) ) = (Ac Av/2) sin(θi(t) − θo(t)) − (Ac Av/2) sin(4πfc t + θi(t) + θo(t))
The second term on the right-hand side is a passband signal at 2fc which can be filtered out
as before. The first term is the desired driving term, and with the change of notation, we note
that the desired state, when the driving term is zero, corresponds to θi = θo . The mixer based
realization of the PLL is shown in Figure 3.32.
The instantaneous frequency of the VCO is proportional to its input. Thus, the phase θo(t) of the VCO output −Av sin(2πfc t + θo(t)) is given by
θo(t) = Kv ∫_0^t x(τ) dτ
ignoring integration constants. Taking Laplace transforms, we have Θo (s) = Kv X(s)/s. The
reference frequency fc is chosen as the quiescent frequency of the VCO, which is the frequency it
would produce when its input voltage is zero.
(Figure 3.33: PLL with an XOR gate as phase detector: the square wave PLL input and VCO output, offset in phase by γ, are applied to an XOR gate whose output toggles between VHI and VLO and is passed to the loop filter.)
Mixed signal phase detectors: Modern hardware realizations of the PLL, particularly for
applications involving digital waveforms (e.g., a clock signal), often realize the phase detector
using digital logic. The most rudimentary of these is an exclusive or (XOR) gate, as shown in
Figure 3.33. For the scenario depicted in the figure, we see that the average value of the output
(Figure 3.34: (a) DC value V′ of the XOR gate output versus the phase offset γ; (b) the XOR phase detector response V = V′ − (VLO + VHI)/2 after translating the axes.)
of the XOR gate is linearly related to the phase offset γ. Normalizing a period of the square
wave to length 2π, this DC value V ′ is related to γ as shown in Figure 3.34(a). Note that, for
zero phase offset, we have V ′ = VHI , and that the response is symmetric around γ = 0. In order
to get a linear phase detector response going through the origin, we translate this curve along
both axes: we define V = V ′ − (VLO + VHI ) /2 as a centered response, and we define the phase
offset θ = γ − π2 . Thus, the lock condition (θ = 0) corresponds to the square waves being 90◦ out
of phase. This translation gives us the phase response shown in Figure 3.34(b), which looks like
a triangular version of the sinusoidal response for the mixer-based phase detector.
The simple XOR-based phase detector has the disadvantage of requiring that the waveforms
have 50% duty cycle. In practice, more sophisticated phase detectors, often based on edge
detection, are used. These include “phase-frequency detectors” that directly provide information
on frequency differences, which is useful for rapid locking. While discussion of the many phase
detector variants employed in hardware design is beyond our scope, references for further study
are provided at the end of this chapter.
(Figure 3.35: PLL as FM demodulator: the FM signal is the PLL input, and the demodulated output is taken at the VCO input.)
If θo ≈ θi, then dθo/dt ≈ dθi/dt, so that
Kv x(t) ≈ 2πkf m(t)
That is, the VCO input is approximately equal to a scaled version of the message. Thus, the
PLL is an FM demodulator, where the FM signal is the input to the PLL, and the demodulator
output is the VCO input, as shown in Figure 3.35.
Figure 3.36: Frequency synthesis using a PLL by inserting a frequency divider into the loop.
PLL as frequency synthesizer: The PLL is often used to synthesize the local oscillators used
in communication transmitters and receivers. In a typical scenario, we might have a crystal
oscillator which provides an accurate frequency reference at a relatively low frequency, say 40
MHz. We wish to use this to derive an accurate frequency reference at a higher frequency, say 1
GHz, which might be the local oscillator used at an IF or RF stage in the transceiver. We have
a VCO that can produce frequencies around 1 GHz (but is not calibrated to produce the exact
value of the desired frequency), and we wish to use it to obtain a frequency f0 that is exactly
K times the crystal frequency fcrystal . This can be achieved by adding a frequency divider into
the PLL loop, as shown in Figure 3.36. Such frequency dividers can be implemented digitally
by appropriately skipping pulses. Many variants of this basic concept are possible, such as using
multiple frequency dividers, frequency multipliers, or multiple interacting loops.
All of these applications rely on the basic property that the VCO output phase successfully tracks
some reference phase using the feedback in the loop. Let us now try to get some insight into how
this happens, and into the impact of various parameters on the PLL’s performance.
In deriving this model, we can ignore the passband term at 2fc , which will get rejected by the
integration operation due to the VCO, as well as by the loop filter (if a nontrivial lowpass loop
(Figure 3.37: nonlinear PLL model: the phase difference θi − θo passes through a sin(·) nonlinearity, the loop gain and filter K G(s), and an integrator 1/s representing the (normalized) VCO functionality.)
filter is employed). From Figure 3.32, the sine of the phase difference is amplified by (1/2) Ac Av due
to the amplitudes of the PLL input and VCO output. This is passed through the loop filter,
which has transfer function G(s), and then through the VCO, which has a transfer function Kv /s.
The loop gain K shown in Figure 3.37 is set to be the product K = (1/2) Ac Av Kv (in addition, the
loop gain also includes additional amplification or attenuation in the loop that is not accounted
for in the transfer function G(s)).
(Figure 3.38: linearized PLL model: the phase difference θi − θo passes through the loop gain and filter K G(s) and the integrator 1/s representing the (normalized) VCO functionality.)
The model in Figure 3.37 is difficult to analyze because of the sin(·) nonlinearity after the phase
difference operation. One way to avoid this difficulty is to linearize the model by simply dropping
the nonlinearity. The motivation is that, when the input and output phases are close, as is the
case when the PLL is in tracking mode, then
sin(θi − θo ) ≈ θi − θo
Applying this approximation, we obtain the linearized model of Figure 3.38. Note that, for the
XOR-based response shown in Figure 3.34(b), the response is exactly linear for |θ| ≤ π/2.
from which we infer the input-output relationship
H(s) = Θo(s)/Θi(s) = KG(s)/(s + KG(s))    (3.32)
It is also useful to express the phase error θe in terms of the input θi, as follows:
He(s) = Θe(s)/Θi(s) = s/(s + KG(s))    (3.33)
For this LTI model, the same transfer functions also govern the relationships between the input
and output instantaneous frequencies: since Fi(s) = (s/2π) Θi(s) and Fo(s) = (s/2π) Θo(s), we obtain
Fo (s)/Fi (s) = Θo (s)/Θi (s). Thus, we have
Fo(s)/Fi(s) = H(s) = KG(s)/(s + KG(s))    (3.34)
(Fi(s) − Fo(s))/Fi(s) = He(s) = s/(s + KG(s))    (3.35)
First order PLL: When we have a trivial loop filter, G(s) = 1, we obtain the first order response
H(s) = K/(s + K) ,    He(s) = s/(s + K)
which is a stable response for loop gain K > 0, with a single pole at s = −K. It is interesting to
see what happens when the input phase is a step function, θi (t) = ∆θI[0,∞) (t), or Θi (s) = ∆θ/s.
We obtain
Θo(s) = H(s)Θi(s) = KΔθ/( s(s + K) ) = Δθ/s − Δθ/(s + K)
Taking the inverse Laplace transform, we obtain
θo(t) = Δθ ( 1 − e^{−Kt} ) I_{[0,∞)}(t)
so that θo (t) → ∆θ as t → ∞. Thus, the first order PLL can track a sudden change in phase,
with the output phase converging to the input phase exponentially fast. The residual phase error
is zero. Note that we could also have inferred this quickly from the final value theorem, without
taking the inverse Laplace transform:
lim_{t→∞} θe(t) = lim_{s→0} s Θe(s) = lim_{s→0} s He(s) Θi(s)    (3.36)
We now examine the response of the first order PLL to a frequency step ∆f , so that the instanta-
neous input frequency is fi(t) = Δf I_{[0,∞)}(t). The corresponding Laplace transform is Fi(s) = Δf/s.
The input phase is the integral of the instantaneous frequency:
θi(t) = 2π ∫_0^t fi(τ) dτ
The Laplace transform of the input phase is therefore given by
Θi(s) = 2πFi(s)/s = 2πΔf/s²
Given that the input-output relationships are identical for frequency and phase, we can reuse
the computations we did for the phase step input, replacing phase by frequency, to conclude that
fo(t) = Δf ( 1 − e^{−Kt} ) I_{[0,∞)}(t) → Δf as t → ∞, so that the steady-state frequency error is zero.
The corresponding output phase trajectory is left as an exercise, but we can use the final value
theorem to compute the limiting value of the phase error:
lim_{t→∞} θe(t) = lim_{s→0} s · ( s/(s + K) ) · ( 2πΔf/s² ) = 2πΔf/K
Thus, the first order PLL can adapt its frequency to track a step frequency change, but there is
a nonzero steady-state phase error. This can be fixed by increasing the order of the PLL, as we
now show below.
Second order PLL: We now introduce a loop filter which feeds back both the phase error and
the integral of the phase error to the VCO input (in control theory terminology, we are using
“proportional plus integral” feedback). That is, G(s) = 1 + a/s, where a > 0. This yields the
second order response
H(s) = KG(s)/(s + KG(s)) = K(s + a)/(s² + Ks + Ka)
He(s) = s/(s + KG(s)) = s²/(s² + Ks + Ka)
The poles of the response are at s = ( −K ± √(K² − 4Ka) )/2. It is easy to check that the response is stable (i.e., the poles are in the left half plane) for K > 0. The poles form a complex conjugate pair if K² − 4Ka < 0, i.e., K < 4a; otherwise they are both real-valued. Note
that the phase error due to a step frequency input does go to zero. This is easily seen by invoking the final value theorem (3.36):
lim_{t→∞} θe(t) = lim_{s→0} s · ( s²/(s² + Ks + Ka) ) · ( 2πΔf/s² ) = 0
Thus, the second order PLL has zero steady state frequency and phase errors when responding
to a constant frequency offset.
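The limiting phase errors derived above can be confirmed with a quick linear simulation: drive He(s) with the ramp phase input θi(t) = 2πΔf t corresponding to a frequency step and look at the steady-state output. The sketch below (Python with scipy.signal; the loop gain K, integrator gain a, and frequency step are arbitrary choices) does this for the first and second order loops.

# Linearized PLL phase error for a frequency step: first order versus second order.
# Loop gain K, integrator gain a, and the frequency step are arbitrary choices.
import numpy as np
from scipy.signal import lti, lsim

K = 100.0            # loop gain (1/s)
a = 25.0             # integral term in G(s) = 1 + a/s
df = 10.0            # frequency step (Hz)

t = np.linspace(0, 1.0, 10001)
theta_i = 2*np.pi*df*t                         # phase ramp for the frequency step

He_1 = lti([1, 0], [1, K])                     # He(s) = s/(s + K)
He_2 = lti([1, 0, 0], [1, K, K*a])             # He(s) = s^2/(s^2 + Ks + Ka)

_, e1, _ = lsim(He_1, theta_i, t)
_, e2, _ = lsim(He_2, theta_i, t)

print(e1[-1], 2*np.pi*df/K)   # first order: settles at 2*pi*df/K (~0.63 rad)
print(e2[-1])                 # second order: settles at (essentially) zero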
We have seen now that the first order PLL can handle step phase changes, and the second order
PLL can handle step frequency changes, while driving the steady-state phase error to zero. This
pattern continues as we keep increasing the order of the PLL: for example, a third order PLL
can handle a linear frequency ramp, which corresponds to Θi(s) being proportional to 1/s³.
Linearized analysis provides quick insight into the complexity of the phase/frequency variations
that the PLL can track, as a function of the choice of loop filter and loop gain. We now take
another look at the first order PLL, accounting for the sin(·) nonlinearity in Figure 3.37, in
order to provide a glimpse of the approach used for handling the nonlinear differential equations
involved, and to compare the results with the linearized analysis.
Nonlinear model for the first order PLL: Let us try to express the phase error θe in terms of
the input phase for a first order PLL, with G(s) = 1. The model of Figure 3.37 can be expressed
in the time domain as: Z t
K sin(θe (τ ))dτ = θo (t) = θi (t) − θe (t)
0
Differentiating with respect to t, we obtain
K sin θe = dθi/dt − dθe/dt    (3.37)
(Both θe and θi are functions of t, but we suppress the dependence for notational simplicity.)
Let us now specialize to the specific example of a step frequency input, for which
dθi/dt = 2πΔf
Plugging into (3.37) and rearranging, we get
dθe/dt = 2πΔf − K sin θe    (3.38)
(Figure 3.39: phase plane plots of dθe/dt = 2πΔf − K sin θe versus θe; the levels 2πΔf and 2πΔf − K are marked, along with the equilibria θe^{(0)} and θe^{(1)}.)
We cannot solve the nonlinear differential equation (3.38) for θe analytically, but we can get useful insight from a “phase plane plot” of dθe/dt against θe, as shown in Figure 3.39. Since sin θe ≤ 1, we have dθe/dt ≥ 2πΔf − K, so that, if Δf > K/(2π), then dθe/dt > 0 for all t. Thus, for large enough frequency offset, the loop never locks. On the other hand, if Δf < K/(2π), then the loop does lock. In this case, starting from an initial error, the phase error follows the trajectory to the right (if the derivative is positive) or left (if the derivative is negative) until it hits a point at which dθe/dt = 0. From (3.38), this happens when
sin θe = 2πΔf/K    (3.39)
Due to the periodicity of the sine function, if θ is a solution to the preceding equation, so is
θ + 2π. Thus, if the equation has a solution, there must be at least one solution in the basic
interval [−π, π]. Moreover, since sin θ = sin(π − θ), if θ is a solution, so is π − θ, so that there
are actually two solutions in [−π, π]. Let us denote by θe(0) = sin⁻¹(2π∆f/K) the solution that
lies in the interval [−π/2, π/2]. This forms a stable equilibrium: from (3.38), we see that the
derivative is negative for a phase error slightly above θe(0), and positive for a phase error
slightly below θe(0), so that the phase error is driven back to θe(0) in each case. Using exactly
the same argument, we see that the points θe(0) + 2nπ are also stable equilibria, where n takes
integer values. However, another solution to (3.39) is θe(1) = π − θe(0), along with its translations
by multiples of 2π. It is easy to see that this is an unstable equilibrium: when there is a slight
perturbation, the sign of the derivative is such that it drives the phase error away from θe(1).
In general, θe(1) + 2nπ are unstable equilibria, where n takes integer values. Thus, if the frequency
offset is within the “pull-in range” K/(2π) of the first order PLL, then the steady state phase offset
(modulo 2π) is θe(0) = sin⁻¹(2π∆f/K), which, for small values of 2π∆f/K, is approximately equal to
the value 2π∆f/K predicted by the linearized analysis.
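The lock/no-lock behavior predicted by the phase plane argument can also be seen by integrating (3.38) numerically. In the following MATLAB sketch, the loop gain is an illustrative choice, and the two frequency offsets are picked on either side of the pull-in range K/(2π): in the first case the phase error settles near sin⁻¹(2π∆f/K), while in the second it grows without bound.

% Minimal sketch: nonlinear first order PLL, d(theta_e)/dt = 2*pi*df - K*sin(theta_e)
% (illustrative parameters, not from the text)
K = 50;                      % loop gain (1/sec); pull-in range is K/(2*pi), about 8 Hz
dt = 1e-4; t = 0:dt:1;
for df = [2 10]              % one offset inside the pull-in range, one outside
    theta_e = zeros(size(t));
    theta_e(1) = 1;          % initial phase error (radians)
    for k = 1:length(t)-1
        theta_e(k+1) = theta_e(k) + (2*pi*df - K*sin(theta_e(k)))*dt;
    end
    figure; plot(t, theta_e); xlabel('t (sec)'); ylabel('\theta_e(t)');
    title(sprintf('df = %g Hz, pull-in range %.1f Hz', df, K/(2*pi)));
end
% For df = 2 Hz the error settles near asin(2*pi*2/50); for df = 10 Hz it keeps growing.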
Linear versus nonlinear model: Roughly speaking, the nonlinear model (which we simply
simulate when phase-plane plots get too complicated) tells us when the PLL locks, while the
linearized analysis provides accurate estimates when the PLL does lock. The linearized model
also tells us something about scenarios when the PLL does not lock: when the phase error blows
up for the linearized model, it indicates that the PLL will perform poorly. This is because the
linearized model holds under the assumption that the phase error is small; if the phase error
under this optimistic assumption turns out not to be small, then our initial assumption must
have been wrong, and the phase error must be large.
Figure 3.40: PLL for Example 3.5.1, with a VCO gain of 10 KHz/V.
Example 3.5.1 Consider the PLL shown in Figure 3.40, assumed to be locked at time zero.
(a) Suppose that the input phase jumps by e = 2.72 radians at time zero (set the phase just
before the jump to zero, without loss of generality). How long does it take for the difference
between the PLL input phase and VCO output phase to shrink to 1 radian? (Make sure you
specify the unit of time that you use.)
(b) Find the limiting value of the phase error (in radians) if the frequency jumps by 1 KHz just
after time zero.
Solution: Let θe (t) = θi (t) − θo (t) denote the phase error. In the s domain, it is related to the
input phase as follows:
Θi(s) − Θe(s) = (K/s) Θe(s)
so that
Θe(s)/Θi(s) = s/(s + K)
(a) For a phase jump of e radians at time zero, we have Θi(s) = e/s, which yields
Θe(s) = [s/(s + K)] Θi(s) = e/(s + K)
Going to the time domain, we have θe(t) = e·e^{−Kt} I[0,∞)(t), so that the phase error decays from
e ≈ 2.72 radians to 1 radian at the time t0 satisfying e^{−Kt0} = 1/e, that is, at t0 = 1/K.
(b) For a frequency jump of ∆f just after time zero, the input phase is a ramp, so that
Θi(s) = 2π∆f/s²
and the phase error is given by
Θe(s) = [s/(s + K)] Θi(s) = 2π∆f/(s(s + K))
Applying the final value theorem,
lim_{t→∞} θe(t) = lim_{s→0} s Θe(s) = 2π∆f/K
For ∆f = 1 KHz and K = 5 KHz/radian, this yields a phase error of 2π/5 radians, or 72◦ .
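A one-line numerical check of the steady-state value quoted in part (b):

% steady-state phase error 2*pi*df/K for df = 1 KHz, K = 5 KHz/radian
df = 1; K = 5;                      % both in KHz, so the ratio is dimensionless
theta_ss = 2*pi*df/K;               % radians
disp([theta_ss, theta_ss*180/pi])   % prints 1.2566 (= 2*pi/5) and 72 degrees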
3.6.1 FM radio
Figure 3.41: Spectrum of the baseband signal into the FM modulator for stereo broadcast: the L + R signal occupies frequencies up to 15 KHz, the pilot sits at 19 KHz, and the DSB-SC modulated L − R signal occupies 23 to 53 KHz (carrier at 38 KHz).
FM mono radio employs a peak frequency deviation of 75 KHz, with the baseband audio message
signal bandlimited to 15 KHz; this corresponds to a modulation index of 5. Using Carson’s
formula, the bandwidth of the FM radio signal can be estimated as 180 KHz. The separation
between adjacent radio stations is 200 KHz. FM stereo broadcast transmits two audio channels,
“left” and “right,” in a manner that is backwards compatible with mono broadcast, in that
a standard mono receiver can extract the sum of the left and right channels, while remaining
oblivious to whether the broadcast signal is mono or stereo. The structure of the baseband signal
into the FM modulator is shown in Figure 3.41. The sum of the left and right channels, or the
L + R signal, occupies a band from 30 Hz to 15 KHz. The difference, or the L − R signal (which
also has a bandwidth of 15 KHz), is modulated using DSB-SC, using a carrier frequency of 38
KHz, and hence occupies a band from 23 KHz to 53 KHz. A pilot tone at 19 KHz, at half the
carrier frequency for the DSB signal, is provided to enable coherent demodulation of the DSB-SC
signal. The spacing between adjacent FM stereo broadcast stations is still 200 KHz, which makes
it a somewhat tight fit (if we apply Carson’s formula with a maximum frequency deviation of 75
KHz, we obtain an RF bandwidth of 256 KHz).
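The Carson's rule numbers quoted above are easy to reproduce; a small MATLAB sketch:

% Carson's formula: B ~ 2*(peak frequency deviation + message bandwidth)
carson = @(df_peak, B_m) 2*(df_peak + B_m);   % all quantities in KHz
B_mono   = carson(75, 15)    % 180 KHz (mono: audio bandlimited to 15 KHz)
B_stereo = carson(75, 53)    % 256 KHz (stereo: composite baseband extends to 53 KHz)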
Figure 3.42: FM stereo modulator: the L and R channel signals are switched using a 38 KHz clock, the 19 KHz pilot (obtained by dividing the clock frequency by two) is added in, and the resulting composite message signal is fed to the FM modulator to produce the transmitted signal.
The format of the baseband signal in Figure 3.41 (in particular, the DSB-SC modulation of the
difference signal) seems rather contrived, but the corresponding modulator can be implemented
quite simply, as sketched in Figure 3.42: we simply switch between the L and R channel audio
signals using a 38 KHz clock. As we show in one of the problems, this directly yields the L + R
signal, plus the DSB-SC modulated L − R signal. It remains to add in the 19 KHz pilot before
feeding the composite baseband signal to the FM modulator.
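To see why the switching works, note that switching between L and R with a ±1 square wave q(t) at 38 KHz produces (L + R)/2 + q(t)(L − R)/2, i.e., the sum signal at baseband plus DSB-SC copies of the difference signal around 38 KHz (and around the odd harmonics of 38 KHz, which must be filtered out). A minimal MATLAB sketch using two illustrative test tones for the L and R channels:

% Minimal sketch of the switching modulator for FM stereo (illustrative test tones)
fs = 1e6; t = 0:1/fs:0.01;            % 1 MHz sampling, 10 ms
L = cos(2*pi*1000*t);                 % "left" test tone at 1 KHz
R = cos(2*pi*2000*t);                 % "right" test tone at 2 KHz
q = sign(cos(2*pi*38e3*t));           % +/-1 switching waveform at 38 KHz
x = L.*(1+q)/2 + R.*(1-q)/2;          % switch between L and R
% x = (L+R)/2 + q.*(L-R)/2: the spectrum shows (L+R)/2 at baseband and
% DSB-SC copies of (L-R)/2 around 38 KHz (and around odd harmonics of 38 KHz)
X = abs(fft(x))/length(x);
f = (0:length(x)-1)*fs/length(x);
plot(f(f<60e3), X(f<60e3)); xlabel('Frequency (Hz)');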
The receiver employs an FM demodulator to obtain an estimate of the baseband transmitted
signal. The L + R signal is obtained by bandlimiting the output of the FM demodulator to 15
KHz using a lowpass filter; this is what an oblivious mono receiver would do. A stereo receiver,
in addition, processes the output of the FM demodulator in the band from 15 KHz to 53 KHz.
It extracts the 19 KHz pilot tone, doubles its frequency to obtain a coherent carrier reference,
and uses that to demodulate the L − R signal sent using DSB-SC. It then obtains the L and R
channels by adding and subtracting the L + R and L − R signals from each other, respectively.
simplification provided by the source-channel separation principle in digital communication (men-
tioned in Chapter 1). Indeed, from Chapter 4 onwards, where we restrict attention to digital
communication, we do not need to discuss source characteristics.
(Left: CRT schematic, showing the electron beam, the fluorescent screen, and the magnetic fields controlling the beam trajectory. Right: raster scan pattern, with horizontal line scans and horizontal retrace.)
Figure 3.43: Implementing raster scan in a CRT monitor requires magnetic fields controlled by
sawtooth waveforms.
We first need a quick discussion of CRT TV monitors. An electron beam impinging on a fluores-
cent screen is used to emit the light that we perceive as the image on the TV. The electron beam
is “raster scanned” in horizontal lines moving down the screen, with its horizontal and vertical
location controlled by two magnetic fields created by voltages, as shown in Figure 3.43. We rely
on the persistence of human vision to piece together these discrete scans into a continuous image
in space and time. Black and white TV monitors use a phosphor (or fluorescent material) that
emits white light when struck by electrons. Color TV monitors use three kinds of phosphors,
typically arranged as dots on the screen, which emit red, green and blue light, respectively, when
struck by electrons. Three electron beams are used, one for each color. The intensity of the
emitted light is controlled by the intensity of the electron beam. For historical reasons, the scan
rate is chosen to be equal to the frequency of the AC power (otherwise, for the power supplies
used at the time, rolling bars would appear on the TV screen). In the United States, this means
that the scan rate is set at 60 Hz (the frequency of the AC mains).
In order to enable the TV receiver to control the operation of the CRT monitor, the received signal
must contain not only intensity and color information, but also the timing information required
to correctly implement the raster scan. Figure 3.44 shows the format of the composite video signal
containing this information. In order to reduce flicker (again a historical legacy, since older CRT
monitors could not maintain intensities long enough if the time between refreshes is too long), the
CRT screen is painted in two rounds for each image (or frame): first the odd lines (comprising the
odd field) are scanned, then the even lines (comprising the even field) are scanned. For the NTSC
standard, this is done at a rate of 60 fields per second, or 30 frames per second. A horizontal sync
pulse is inserted between each line. A more complex vertical synchronization waveform is inserted
between each field; this enables vertical synchronization (as well as other functionalities that
we do not discuss here). The receiver can extract the horizontal and vertical timing information
from the composite video signal, and generate the sawtooth waveforms required for controlling
(Odd field: video information for lines 1, 3, ..., 479; even field: video information for lines 2, 4, ..., 480. The vertical sync waveforms between fields are not shown.)
Figure 3.44: The structure of a black and white composite video signal (numbers apply to the
NTSC standard).
the electron beam (one of the first widespread commercial applications of the PLL was for this
purpose). For the NTSC standard, the composite video signal spans 525 lines, about 486 of
which are actually painted (counting both the even and odd fields). The remaining 39 lines
accommodate the vertical synchronization waveforms.
The bandwidth of the baseband video signal can be roughly estimated as follows. Assuming
about 480 lines, with about 640 pixels per line (for an aspect ratio of 4:3), we have about 300,000
pixels, refreshed at the rate of 30 times per second. Thus, our overall sampling rate is about
9 Msamples/second. This can accurately represent a signal of bandwidth 4.5 MHz. For a 6
MHz TV channel bandwidth, DSB and wideband FM are therefore out of the question, and
VSB was chosen to modulate the composite video signal. However, the careful shaping of the
spectrum required for VSB is not carried out at the transmitter, because this would require
the design of high-power electronics with tight specifications. Instead, the transmitter uses a
simple filter, while the receiver, which deals with a low-power signal, accomplishes the VSB
shaping requirement in (3.21). Audio modulation is done using FM in a band adjacent to the
one carrying the video signal.
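The rough bandwidth estimate above amounts to the following back-of-the-envelope arithmetic (a sketch, not a precise NTSC specification):

% Rough video bandwidth estimate: pixels per frame x frame rate / 2
lines = 480; pixels_per_line = 640; frame_rate = 30;
sample_rate = lines*pixels_per_line*frame_rate    % about 9.2e6 samples/sec
bandwidth = sample_rate/2                         % about 4.6 MHz (the text rounds to 4.5 MHz)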
While the signaling for black and white TV is essentially the same for all existing analog TV
standards, the insertion of color differs among standards such as NTSC, PAL and SECAM. We
do not go into details here, but, taking NTSC as an example, we note that the frequency domain
characteristics of the black and white composite video signal are exploited in a rather clever way
to insert color information. The black and white signal exhibits a clustering of power around the
Fourier series components corresponding to the horizontal scan rate, with the power decaying
around the higher order harmonics. The color modulated signal uses the same band as the black
and white signal, but is inserted between two such harmonics, so as to minimize the mutual
interference between the intensity information and the color information. The color information
is encoded in two baseband signals, which are modulated on to the I and Q components using
QAM. Synchronization information that permits coherent recovery of the color subcarrier for
quadrature demodulation is embedded in the vertical synchronization waveform.
waveform.
• A simple-to-understand, suboptimal demodulator for FM is the limiter-discriminator, which
consists of differentiation followed by envelope detection. However, feedback-based techniques
such as the PLL provide superior performance for analog communication, while demodulation
techniques exploiting the structure of the message are preferred for digital communication.
Superheterodyne receiver
• The superhet receiver achieves downconversion in multiple stages, mixing the passband received
waveform at RF down to another passband waveform at IF by beating it against a local oscillator
offset from the desired carrier frequency by fIF . This is followed by demodulation to baseband
by any of a variety of techniques, including coherent downconversion and envelope detection.
• For operation over multiple bands, the RF filter and the LO are tunable. The specifications on
the RF, or image reject, filter can be fairly relaxed, since its key function is to reject the image
frequency (separated from the desired band by 2fIF ).
• The image reject filter typically lets in bands adjacent to the desired frequency, hence the IF
filter must have sharp cutoffs in order to suppress “adjacent channel interference” from these
bands. The fact that the IF is fixed facilitates designing to such tight specifications.
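As a small illustration of the image frequency relationship (using hypothetical numbers in the spirit of an AM broadcast receiver, not values taken from the text):

% Image frequency for a superhet receiver with high-side LO injection
fc  = 1.0e6;          % desired carrier (hypothetical, Hz)
fIF = 455e3;          % intermediate frequency (hypothetical, Hz)
fLO = fc + fIF;       % local oscillator (high-side injection)
fimage = fLO + fIF    % = fc + 2*fIF: this band also mixes down to fIF,
                      % and must be rejected by the RF (image reject) filter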
3.8 Endnotes
A towering figure in the history of analog communication techniques is Edwin Howard Arm-
strong, who invented the regenerative circuit (for feedback-based amplification), the superhet
receiver, and FM. Mentioning Armstrong gives us the opportunity for a quick discussion on
the evolution of design philosophy from analog to digital communication. From what we know
about Armstrong, he was a clear thinker who would use systematic experimentation rather than
trial and error, along with physical intuition, to arrive at his inventions. However, he distrusted
results obtained only using mathematics because of the potential for hidden, and potentially
flawed, assumptions in mathematical models. While some amount of skepticism of this nature
is warranted, it is worth noting that digital communication would not exist today if it were not
for mathematical abstractions of the physical world. Indeed, as mentioned in Chapter 7, Claude
Shannon created information theory as a mathematical framework promising the existence of
reliable and efficient communication systems in 1948, but it took many decades of effort by
communication system designers to build practical systems approaching information-theoretic
limits. Shannon’s promise, based purely on idealized mathematical models (which Armstrong
would perhaps not have approved of) was essential in motivating this effort. Furthermore, as
we gain more confidence in the accuracy of our mathematical models, they play a bigger role in
design, as we shall see when we transition from the piecemeal ideas in this chapter to the more
systematic framework for digital communication in forthcoming chapters. Specifically, the digital
communication system designer today employs sophisticated and accurate mathematical models
for communication channels (developed based on Armstrong’s blend of physical intuition and
experimentation) to establish systematic principles for practical transceiver design and imple-
mentation that approach the performance limits promised by Shannon’s theoretical framework.
Since our treatment of analog communication techniques here emphasizes those that remain
relevant in the digital age, we refer readers interested in a deeper look at analog communication
techniques to the excellent treatment in Ziemer and Tranter [4]. For classic treatments of the
PLL, see Gardner [21] and Viterbi [22] (the latter provides analysis for a nonlinear PLL model).
More recent books include those by Best [23] and Razavi [24].
3.9 Problems
Amplitude modulation
Figure 3.45: Amplitude modulated signal for Problem 3.1 (amplitude ranging between −30 and 30, plotted over 0 to 2 milliseconds).
Problem 3.1 Figure 3.45 shows a signal obtained after amplitude modulation by a sinusoidal
message. The carrier frequency is difficult to determine from the figure, and is not needed for
answering the questions below.
(a) Find the modulation index.
(b) Find the signal power.
(c) Find the bandwidth of the AM signal.
(c) Let vp (t) denote the waveform obtained by high-pass filtering the signal u(t) so as to let
through only frequencies above 200 Hz. Find vc (t) and vs (t) such that we can write
vp (t) = vc (t) cos 400πt − vs (t) sin 400πt
and sketch the envelope of v.
Problem 3.4 Consider a message signal m(t) = cos(2πfm t + φ), and a corresponding DSB-SC
signal up (t) = Am(t) cos 2πfc t, where fc > fm .
(a) Sketch the spectra of the corresponding LSB and USB signals (if the spectrum is complex-
valued, sketch the real and imaginary parts separately).
(b) Find explicit time domain expressions for the LSB and USB signals.
Problem 3.5 One way of avoiding the use of a mixer in generating AM is to pass x(t) =
m(t) + α cos 2πfc t through a memoryless nonlinearity and then a bandpass filter.
(a) Suppose that M(f ) = (1 − |f |/10)I[−10,10] (the unit of frequency is in KHz) and fc is 900
KHz. For a nonlinearity f (x) = βx2 + x, sketch the magnitude spectrum at the output of the
nonlinearity when the input is x(t), carefully labeling the frequency axis.
(b) For the specific settings in (a), characterize the bandpass filter that you should use at the
output of the nonlinearity so as to generate an AM signal carrying the message m(t). That is,
describe the set of frequencies that the BPF must reject, and those that it must pass.
Problem 3.6 Consider a DSB signal corresponding to the message m(t) = sinc(2t) and a carrier
frequency fc which is 100 times larger than the message bandwidth, where the unit of time is
milliseconds.
(a) Sketch the magnitude spectrum of the DSB signal 10m(t) cos 2πfc t, specifying the units on
the frequency axis.
(b) Specify a time domain expression for the corresponding LSB signal.
(c) Now, suppose that the DSB signal is passed through a bandpass filter whose transfer function
is given by
Hp(f) = (f − fc + 1/2) I[fc − 1/2, fc + 1/2](f) + I[fc + 1/2, fc + 3/2](f),   f > 0
Figure 3.46: Block diagram of Weaver’s SSB modulator for Problem 3.7.
Problem 3.7 Figure 3.46 shows a block diagram of Weaver’s SSB modulator, which works if we
choose f1 , f2 and the bandwidth of the lowpass filter appropriately. Let us work through these
choices for a waveform of the form m(t) = AL cos(2πfL t + φL ) + AH cos(2πfH t + φH ), where
fH > fL (the design choices we obtain will work for any message whose spectrum lies in the band
[fL, fH]).
(a) For f1 = (fL + fH )/2 (i.e., choosing the first LO frequency to be in the middle of the message
band), find the time domain waveforms at the outputs of the upper and lower branches after the
first mixer.
(b) Choose the bandwidth of the lowpass filter to be W = (fH + 2fL)/2 (assume the lowpass filter is
ideal). Find the time domain waveforms at the outputs of the upper and lower branches after
the LPF.
(c) Now, assuming that f2 ≫ fH , find a time domain expression for the output waveform, as-
suming that the upper and lower branches are added together. Is this an LSB or USB waveform?
What is the carrier frequency?
(d) Repeat (c) when the lower branch is subtracted from the upper branch.
Remark: Weaver’s modulator does not require bandpass filters with sharp cutoffs, unlike the
direct approach to generating SSB waveforms by filtering DSB-SC waveforms. It is also simpler
than the Hilbert transform method (the latter requires implementation of a π/2 phase shift over
the entire message band).
Figure 3.47: Bandpass filter transfer function Hp(f) for Problem 3.8 (shown for positive frequencies).
Problem 3.8 Consider the AM signal up (t) = 2(10 + cos 2πfm t) cos 2πfc t, where the message
frequency fm is 1 MHz and the carrier frequency fc is 885 MHz.
(a) Suppose that we use superheterodyne reception with an IF of 10.7 MHz, and envelope detec-
tion after the IF filter. Envelope detection is accomplished as in Figure 3.8, using a diode and
an RC circuit. What would be a good choice of C if R = 100 ohms?
(b) The AM signal up (t) is passed through the bandpass filter with transfer function Hp (f ) de-
picted (for positive frequencies) in Figure 3.47. Find the I and Q components of the filter output
with respect to reference frequency fc of 885 MHz. Does the filter output represent a form of
modulation you are familiar with?
Problem 3.9 Consider a message signal m(t) with spectrum M(f ) = I[−2,2] (f ).
(a) Sketch the spectrum of the DSB-SC signal uDSB−SC = 10m(t) cos 300πt. What is the power
and bandwidth of u?
(b) The signal in (a) is passed through an envelope detector. Sketch the output, and comment
on how it is related to the message.
(c) What is the smallest value of A such that the message can be recovered without distortion
from the AM signal uAM = (A + m(t)) cos 300πt by envelope detection?
(d) Give a time-domain expression of the form
obtained by high-pass filtering the DSB signal in (a) so as to let through only frequencies above
150 Hz.
(e) Consider a VSB signal constructed by passing the signal in (a) through a passband filter with
transfer function for positive frequencies specified by:
Hp(f) = (f − 149)/2 for 149 ≤ f ≤ 151, and Hp(f) = 1 for f ≥ 151
(you should be able to sketch Hp (f ) for both positive and negative frequencies.) Find a time
domain expression for the VSB signal of the form
Problem 3.10 Consider Figure 3.17 depicting VSB spectra. Suppose that the passband VSB
filter Hp (f ) is specified (for positive frequencies) as follows:
Hp(f) = 1,              101 ≤ f < 102
Hp(f) = (1/2)(f − 99),  99 ≤ f ≤ 101
Hp(f) = 0,              else
(a) Sketch the passband transfer function Hp (f ) for both positive and negative frequencies.
(b) Sketch the spectrum of the complex envelope H(f ), taking fc = 100 as a reference.
(c) Sketch the spectra (show the real and imaginary parts separately) of the I and Q components
of the impulse response of the passband filter.
(d) Consider a message signal of the form m(t) = 4 sinc(4t) − 2 cos 2πt. Sketch the spectrum of the
DSB signal that results when the message is modulated by a carrier at fc = 100.
(e) Now, suppose that the DSB signal in (d) is passed through the VSB filter in (a)-(c). Sketch the
spectra of the I and Q components of the resulting VSB signal, showing the real and imaginary
parts separately.
(f) Find a time domain expression for the Q component.
Problem 3.12 Find an explicit time domain expression for the Hilbert transform of m(t) =
sinc(2t).
Superheterodyne reception
Problem 3.13 A dual band radio operates at 900 MHz and 1.8 GHz. The channel spacing in
each band is 1 MHz. We wish to design a superheterodyne receiver with an IF of 250 MHz. The
LO is built using a frequency synthesizer that is tunable from 1.9 to 2.25 GHz, and frequency
divider circuits if needed (assume that you can only implement frequency division by an integer).
(a) How would you design a superhet receiver to receive a passband signal restricted to the band
1800-1801 MHz? Specify the characteristics of the RF and IF filters, and how you would choose
and synthesize the LO frequency.
(b) Repeat (a) when the signal to be received lies in the band 900-901 MHz.
Angle modulation
Figure 3.48: Phase deviation (in degrees, ranging between −600 and 600) of the FM signal in Problem 3.14, plotted over 0 to 1 milliseconds.
Problem 3.14 Figure 3.48 shows, as a function of time, the phase deviation of a bandpass FM
signal modulated by a sinusoidal message.
(a) Find the modulation index (assume that it is an integer multiple of π for your estimate).
(b) Find the message bandwidth.
(c) Estimate the bandwidth of the FM signal using Carson’s formula.
Problem 3.15 The input m(t) to an FM modulator with kf = 1 has Fourier transform
M(f) = j2πf for |f| < 1, and M(f) = 0 else
The output of the FM modulator is given by
u(t) = A cos(2πfc t + φ(t))
where fc is the carrier frequency.
(a) Find an explicit time domain expression for φ(t) and carefully sketch φ(t) as a function of
time.
(b) Find the magnitude of the instantaneous frequency deviation from the carrier at time t = 1/4.
(c) Using the result from (b) as an approximation for the maximum frequency deviation, estimate
the bandwidth of u(t).
Problem 3.16 Let p(t) = I[−1/2, 1/2](t) denote a rectangular pulse of unit duration. Construct the
signal
m(t) = Σ_{n=−∞}^{∞} (−1)^n p(t − n)
which is used as the message input to an FM modulator whose output has phase deviation φ(t) from
the carrier, where
φ(t) = 20π ∫_{−∞}^{t} m(τ) dτ + a
Problem 3.17 Let u(t) = 20 cos(2000πt + φ(t)) denote an angle modulated signal.
(a) For φ(t) = 0.1 cos 2πt, what is the approximate bandwidth of u?
(b) Let y(t) = u¹²(t) (i.e., the 12th power of u(t)). Specify the frequency bands spanned by y(t). In particular, specify the
output when y is passed through:
(i) A BPF centered at 12KHz. Using Carson’s formula, determine the bandwidth of the BPF
required to recover most of the information in φ from the output.
(ii) An ideal LPF of bandwidth 200 Hz.
(iii) A BPF of bandwidth 100 Hz centered at 11 KHz.
(c) For φ(t) = 2 Σ_n s(t − 2n), where s(t) = (1 − |t|) I[−1,1](t):
(i) Sketch the instantaneous frequency deviation from the carrier frequency of 1 KHz.
(ii) Show that we can write
u(t) = Σ_n c_n cos(2000πt + nαt)
Problem 3.18 Consider the set-up of Problem 3.16, taking the unit of time in milliseconds for
concreteness. You do not need the value of fc , but you can take it to be 1 MHz.
(a) Numerically (e.g., using Matlab) compute the Fourier series expansion for the complex enve-
lope of the FM waveform, in the same manner as was done for a sinusoidal message. Report the
magnitudes of the Fourier series coefficients for the first 5 harmonics.
(b) Find the 90%, 95% and 99% power containment bandwidths. Compare with the estimate
from Carson’s formula obtained in Problem 3.16(b).
Problem 3.19 A VCO with a quiescent frequency of 1 GHz, with a frequency sweep of 2
MHz/mV produces an angle modulated signal whose phase deviation θ(t) from a carrier frequency
fc of 1 GHz is shown in Figure 3.49.
(a) Sketch the input m(t) to the VCO, carefully labeling both the voltage and time axes.
(b) Estimate the bandwidth of the angle modulated signal at the VCO output. You may ap-
proximate the bandwidth of a periodic signal by that of its first harmonic.
Figure 3.49: VCO with frequency sweep 2 MHz/mV, producing output cos(2πfc t + θ(t)) from input m(t); the phase deviation θ(t) is plotted versus t (microseconds), with the vertical axis marked at 10π and the time axis marked at −3, −1, 1 and 3 microseconds.
Uncategorized problems
Problem 3.20 The signal m(t) = 2 cos 20πt − cos 40πt, where the unit of time is millisec-
onds, and the unit of amplitude is millivolts (mV), is fed to a VCO with quiescent frequency
of 5 MHz and frequency deviation of 100 KHz/mV. Denote the output of the VCO by y(t).
(a) Provide an estimate of the bandwidth of y.
(b) The signal y(t) is passed through an ideal bandpass filter of bandwidth 5 KHz, centered at
5.005 MHz. Describe in detail how you would compute the power at the filter output (if you can
compute the power in closed form, do so).
Figure 3.50: Periodic square wave message signal m(t), alternating between +10 mV and −10 mV (time axis marked at 0, 1, 2, 3 ms).
(In Figure 3.51, up(t) is mixed with 2 cos 402πt and with −2 sin 402πt, and each mixer output is passed through a lowpass filter to produce uc(t) and us(t), respectively.)
Figure 3.51: Downconversion using 201 KHz LO (t in ms in the figure) for Problem 3.21(b)-(c).
Problem 3.21 Consider the AM signal up (t) = (A + m(t)) cos 400πt (t in ms) with message
signal m(t) as in Figure 3.50, where A is 10 mV.
(a) If the AM signal is demodulated using an envelope detector with an RC filter, how should
you choose C if R = 500 ohms? Try to ensure that the first harmonic (i.e., the fundamental)
and the third harmonic of the message are reproduced with minimal distortion.
(b) Now, consider an attempt at synchronous demodulation, where the AM signal is downcon-
verted using a 201 KHz LO, as shown in Figure 3.51. Find and sketch the I and Q components,
uc (t) and us (t), for 0 ≤ t ≤ 2 (t in ms).
(c) Describe how you would recover the original message m(t) from the downconverter outputs
uc (t) and us (t), drawing block diagrams as needed.
Problem 3.22 The square wave message signal m(t) in Figure 3.50 is input to a VCO with
quiescent frequency 200 KHz and frequency deviation 1 KHz/mV. Denote the output of the
VCO by up (t).
(a) Sketch the I and Q components of the FM signal (with respect to a frequency reference of
200 KHz and a phase reference chosen such that the phase is zero at time zero) over the time
interval 0 ≤ t ≤ 2 (t in ms), clearly labeling the axes.
(b) In order to extract the I and Q components using a standard downconverter (mix with LO
and then lowpass filter), how would you choose the bandwidth of the LPFs used at the mixer
outputs?
Figure 3.52: Phase φ(t) for Problem 3.23, plotted versus t (msec) from 0 to 8, with the vertical axis marked at −4π, 4π and 8π.
Problem 3.23 The output of an FM modulator is the bandpass signal y(t) = 10 cos(300πt +
φ(t)), where the unit of time is milliseconds, and the phase φ(t) is as sketched in Figure 3.52.
(a) Suppose that y(t) is the output of a VCO with frequency deviation 1 KHz/mV and quiescent
frequency 149 KHz. Find and sketch the input to the VCO.
(b) Use Carson’s formula to estimate the bandwidth of y(t), clearly stating the approximations
that you make.
Set-up for PLL problems: For the next few problems on PLL modeling and analysis, consider
the linearized model in Figure 3.38, with the following notation: loop filter G(s), loop gain K,
and VCO modeled as 1/s. Recall from your background on signals and systems that a second
order system of the form 1/(s² + 2ζωn s + ωn²) is said to have natural frequency ωn (in
radians/second) and damping factor ζ.
Problem 3.24 Let H(s) denote the gain from the PLL input to the output of the VCO. Let
He (s) denote the gain from the PLL input to the input to the loop filter. Let Hm (s) denote the
gain from the PLL input to the VCO input.
(a) Write down the formulas for H(s), He (s), Hm (s), in terms of K and G(s).
(b) Which is the relevant transfer function if the PLL is being used for FM demodulation?
(c) Which is the relevant transfer function if the PLL is being used for carrier phase tracking?
(d) For G(s) = (s + 8)/s and K = 2, write down expressions for H(s), He(s) and Hm(s). What is the
natural frequency and the damping factor?
Problem 3.25 Suppose the PLL input exhibits a frequency jump of 1 KHz.
(a) How would you choose the loop gain K for a first order PLL (G(s) = 1) to ensure a steady
state error of at most 5 degrees?
(b) How would you choose the parameters a and K for a second order PLL (G(s) = (s + a)/s) to have
a natural frequency of 1.414 KHz and a damping factor of 1/√2? Specify the units for a and K.
(c) For the parameter choices in (b), find and roughly sketch the phase error as a function of
time for a frequency jump of 1 KHz.
Figure 3.53: PLL for Problem 3.27: phase detector with gain 1 volt/radian, loop filter G(s) = (s + a)/s, and VCO with gain 1 KHz/volt; v(t) denotes the VCO input.
Problem 3.27 Consider the PLL depicted in Figure 3.53, with input phase φ(t). The output
signal of interest to us here is v(t), the VCO input. The parameter for the loop filter G(s) is
given by a = 1000π radians/sec.
(a) Assume that the PLL is locked at time 0, and suppose that φ(t) = 1000πtI{t>0} . Find the
limiting value of v(t).
(b) Now, suppose that φ(t) = 4π sin 1000πt. Find an approximate expression for v(t). For full
credit, simplify as much as possible.
(c) For part (b), estimate the bandwidth of the passband signal at the PLL input.
Problem 3.28 Answer the following questions regarding commercial analog communication sys-
tems (some of which may no longer exist in your neighborhood).
(a) (True or False) The modulation format for analog cellular telephony was conventional AM.
(b) (Multiple choice) FM was used in analog TV as follows:
(i) to modulate the video signal
(ii) to modulate the audio signal
(iii) FM was not used in analog TV systems.
(c) A superheterodyne receiver for AM radio employs an intermediate frequency (IF) of 455 KHz,
and has stations spaced at 10 KHz. Comment briefly on each of the following statements:
(i) The AM band is small enough that the problem of image frequencies does not occur.
(ii) A bandwidth of 20 KHz for the RF front end is a good choice.
(iii) A bandwidth of 20 KHz for the IF filter is a good choice.
Laboratory Assignment
1) Generate a message signal m(t) using binary digital modulation with a sine pulse by modi-
fying Code Fragment 2.3.2 to use random bits. Set the symbol time T to 1 millisecond. Take
a waveform segment spanning ns symbols and take its Fourier transform by modifying Code
Fragment 2.5.1 (or using the code from Lab 1), choosing the length of the FFT (and hence ns )
so as to get a frequency resolution of 1 Hz. Plot the magnitude squared of the Fourier transform,
divided by ns T , the length of the observation interval. This is an estimate of the power spectral
density (PSD) Sm (f ), which is formally defined later, in Chapters 4 and 5.
2) Repeat the PSD estimation in 1) over multiple runs and average the estimates, choosing the
number of runs large enough so as to get a smooth estimate of the PSD. Eyeball the PSD to
estimate the bandwidth of the signal (the units should be consistent with our assumption of
T = 1 ms).
3) Now, generate the DSB signal u(t) = m(t) cos 2πfc t, where fc = 10/T . Choose the sampling
rate for generating discrete time samples as 4fc . Plot the DSB signal over 4 symbols.
4) Estimate the PSD Su (f ) of the DSB signal generated in 3) by choosing a large enough number
of symbols as in 1), and averaging over several runs as in 2). What is the relationship with the
PSD obtained in 2)?
5) Repeat 3) and 4) for the AM signal u(t) = (Ac + m(t)) cos 2πfc t, where fc = 10/T and Ac
is chosen to have the smallest possible value that allows envelope detection. As before, choose
the sampling rate for generating discrete time samples as 4fc . Do you run into difficulty when
computing the PSD? Explain.
6) Starting with an AM signal as in 5), implement an envelope detector as follows:
(a) Pass u(t) through an idealized diode to obtain u+(t) = u(t) I{u(t) ≥ 0}.
(b) Pass u+(t) through an RC filter with impulse response h(t) = e^{−t/(RC)} I{t ≥ 0}. You can use the
contconv function in Lab 1 for this purpose. Choose the value of RC based on the design rule of
thumb discussed in Chapter 3.
(c) Implement a DC block simply by subtracting out the empirical mean from the output of (b).
Plot the output of the envelope detector, along with the original message m(t).
7) Repeat 6) for different values of RC (both too large and too small), and comment on how the
resulting message estimate is affected by the value of RC.
Envelope detector based I/Q downconversion
We know from Chapter 2 that a passband signal can be downconverted to complex baseband by
mixing with the cosine and sine of the carrier and then low pass filtering. However, implementing
mixers may not be easy at really high carrier frequencies (e.g., for coherent optical communi-
cation). Envelope detection, after adding strong locally generated carrier components to the
received waveform, provides an alternative approach for downconversion that may be easier to
implement in such scenarios. Consider the QAM received signal
v(t) = vc(t) cos 2πfc t − vs(t) sin 2πfc t    (3.40)
where vc, vs are real baseband messages. The receiver's local oscillator generates Ac cos(2πfc t + θ)
and Ac sin(2πfc t+θ), where θ is the offset between the carrier reference used at the transmitter to
generate the QAM signal and the local copy of the carrier at the receiver. Instead of mixing the
local oscillator outputs against v(t), we add them to v(t) and then perform envelope detection,
as described in 6). That is, we perform the following operations:
• Pass v(t) + Ac cos(2πfc t + θ) through an envelope detector to obtain ṽc (t).
• Pass v(t) + Ac sin(2πfc t + θ) through an envelope detector to obtain ṽs (t).
8) For Ac large enough, can you find simple relationships between ṽc (t), ṽs (t) and the original
messages vc (t), vs (t)? Is there a simple relationship between the complex baseband waveforms
v(t) = vc (t) + jvs (t) and ṽ(t) = ṽc (t) + jṽs (t)?
9) Generate vc and vs as in 1), using different sequences of random bits, and generate v(t) as
in (3.40), setting fc = 10/T as in 3). Implement the preceding envelope detection operations,
first setting θ = 0. Plot ṽc (t) and ṽs (t) for Ac “large enough.” Also plot for reference vc (t) and
vs (t). Comment on whether the results conform to your expectations from 7). How small can
you make Ac while still getting “good” results?
10) Repeat 9) for θ = π/4.
11) Assuming that the receiver knows θ, can you recover estimates of vc (t) and vs (t) from ṽc (t),
ṽs (t)? Implement these operations and show plots of the estimates and the original waveforms.
Lab Report
• Answer all questions and print out the most useful plots to support your answers.
• Write a paragraph about any questions or confusions that you may have experienced with
this lab.
Laboratory Assignment
Consider the message signal m(t) = Σ_n b[n] p(t − nT), where the b[n] are chosen from {−1, +1}, and
T is the symbol interval. You can generate such a signal by modifying Code Fragment 2.3.2.
Define a passband FM signal modulated by the message by
sp(t) = cos(2πfc t + θ(t)),  where θ(t) = 2πkf ∫_0^t m(τ) dτ
(assume t ≥ 0). The complex envelope of sp with respect to the reference 2πfc t is given by s(t) =
e^{jθ(t)}. We consider a digital message signal of the form m(t) = Σ_n b[n] p(t − nT), where the b[n]
are chosen independently and with equal probability from {−1, +1}.
The following code fragment generates the FM waveform for a rectangular pulse I[0,1] .
oversampling_factor = 16;
%for a pulse with amplitude one, the max frequency deviation is given by kf
kf=4;
%increase the oversampling factor if kf (and hence the frequency deviation, and hence the bandwidth of the FM signal) is large
oversampling_factor = ceil(max(kf,1)*oversampling_factor);
ts=1/oversampling_factor;%sampling time
nsamples = ceil(1/ts);
pulse = ones(nsamples,1); %rectangular pulse
nsymbols =10;
symbols=zeros(nsymbols,1);
%random symbol sequence
symbols = sign(rand(nsymbols,1)-0.5);
%generate digitally modulated message
nsymbols_upsampled=1+(nsymbols-1)*nsamples;
symbols_upsampled=zeros(nsymbols_upsampled,1);
symbols_upsampled(1:nsamples:nsymbols_upsampled)=symbols;
message = conv(symbols_upsampled,pulse);
%FM signal phase obtained by integrating the message
theta = 2*pi*kf*ts*cumsum(message);
cenvelope=exp(j*theta);
L=length(cenvelope);
time=(0:L-1)*ts;
Icomponent = real(cenvelope);
Qcomponent= imag(cenvelope);
%plot I component
plot(time,Icomponent);
1) By modifying and enhancing the preceding code fragment as needed, plot the I and Q compo-
nents of the complex envelope for a random sequence of bits as a function of time for kf = 0.25.
Also plot θ(t)/π versus t. How big are the changes in θ(t) corresponding to a given message
bit b[n]? Do you notice a pattern in how the I and Q components depend on the message bits
{b[n]}?
Remark: The special case of kf = 1/4 is a digital modulation scheme known as Minimum Shift
Keying (MSK). It can be viewed as FM modulation using a digital message, but the plots of the I
and Q components should indicate that MSK can also be interpreted as the I and Q components
each being amplitude modulated by a different set of bits, with an offset between the I and Q
components.
2) Now redo (1) for kf = 4. The patterns in the I and Q components are much harder to see now.
We typically do not use such wideband FM for digital modulation, but may use it for analog
messages.
3) For a complex baseband waveform y(t) = yc (t) + jys (t) = e(t)ejθ(t) , we know that
θ(t) = tan⁻¹( ys(t) / yc(t) )
Show that
dθ(t)/dt = [yc(t) ys′(t) − ys(t) yc′(t)] / [yc²(t) + ys²(t)]    (3.41)
For an FM signal, the message can be estimated as (1/(2πkf)) dθ(t)/dt, with the derivative computed using
a highpass filter. Thus, this can be viewed as a baseband version of the limiter-discriminator
demodulator for FM. It can be implemented using the following code fragment.
%baseband discriminator
%differencing operation approximates derivative
Iderivative = [0;diff(Icomponent)]/ts;
Qderivative = [0;diff(Qcomponent)]/ts;
message_estimate = (1/(2*pi*kf))*(Icomponent.*Qderivative - Qcomponent.*Iderivative)./(Icomponent.^2 + Qcomponent.^2);
4) Apply the preceding approach to the noiseless FM signals generated in parts 1) and 2). Plot
the estimated message and the original message on the same plot, and comment on whether you
are getting a good estimate.
5) Add an arbitrary phase to the complex envelope.
Redo 4). What happens to the estimated message? Are you still getting a good estimate of the
original message?
6) Now, add a frequency offset as well as a phase offset to the complex envelope.
Redo 4). What happens to the estimated message? Are you still getting a good estimate of the
original message? If you are not quite getting the original message back, what can you do to fix
the situation?
Remark: You should find that this crude differentiation technique does work for low noise (we
are considering zero noise). However, it is rather fragile when noise is inserted. We do not
explore this in this lab, but you are welcome to try adding Gaussian noise samples to the I and
Q components and see how the discriminator performs for different values of noise variance. At
the very least, one would need to lowpass filter the message estimate obtained above to average
out noise, but it is far better to use feedback-based techniques such as the PLL for general FM
demodulation, or, for digital messages, to use demodulation techniques that use the structure of
the message. We do not discuss such techniques in this lab.
7) We now explore the spectral properties of FM. For the complex envelope s(t) = ejθ(t) , compute
the Fourier transform numerically, choosing the length of the FFT (and hence ns ) so as to get
a frequency resolution of 0.1. You can modify Code Fragment 2.5.1 or reuse code from Lab 1.
Compute the power spectral density (PSD), defined as the magnitude squared of the Fourier
transform divided by the interval over which you are computing it, and then averaged over
multiple runs. Plot the PSD for kf = 1/4 and kf = 4. You can modify the following code
fragment as needed.
nsymbols =1000;
symbols=zeros(nsymbols,1);
nruns=1000;
fs_desired=0.1; %desired frequency resolution
Nmin = ceil(1/(fs_desired*ts)); %minimum length DFT for desired frequency granularity
message_length=1+(nsymbols-1)*nsamples+length(pulse)-1;
Nmin = max(message_length,Nmin);
% %for efficient computation, choose FFT size to be power of 2
Nfft = 2^(nextpow2(Nmin)) %FFT size = the next power of 2 at least as big as Nmin
psd=zeros(Nfft,1);
for runs=1:nruns,
%random symbol sequence
symbols = sign(rand(nsymbols,1)-0.5);
nsymbols_upsampled=1+(nsymbols-1)*nsamples;
symbols_upsampled=zeros(nsymbols_upsampled,1);
symbols_upsampled(1:nsamples:nsymbols_upsampled)=symbols;
message = conv(symbols_upsampled,pulse);
%FM signal phase
theta = 2*pi*kf*ts*cumsum(message);
cenvelope=exp(j*theta);
time=(0:length(cenvelope)-1)*ts;
% %freq domain signal computed using DFT
cenvelope_freq = ts*fft(cenvelope,Nfft); %FFT of size Nfft, automatically zeropads as needed
cenvelope_freq_centered = fftshift(cenvelope_freq); %shifts DC to center of spectrum
psd=psd+abs(cenvelope_freq_centered).^2;
end
psd=psd/(nruns*nsymbols);
fs=1/(Nfft*ts) %actual frequency resolution attained
% %set of frequencies for which Fourier transform has been computed using DFT
freqs = ((1:Nfft)-1-Nfft/2)*fs;
%plot the PSD
plot(freqs,psd);
8) Plot the PSD for kf = 1/4 and kf = 4. Are your results consistent with Carson's formula?
9) Redo 8), replacing the rectangular pulse by a sine pulse: p(t) = sin(πt) I[0,1] (t). Are your results
consistent with Carson’s formula? Compare the spectrum occupancy in 8) and 9), commenting
on the roles of kf and p(t).
pulse =transpose(sin(pi*(0:ts:1)));
10) Now, let us increase the dynamic range of the message by replacing the bits by numbers
drawn from a Gaussian distribution with the same variance.
symbols = randn(nsymbols,1);
Compute and plot the PSD for a sinusoidal pulse, and compare the spectral occupancy with that in 9).
11) Assuming that the unit of time is 1 ms, estimate the bandwidth of the FM signals whose
PSDs you plotted in 9). You can either eyeball it, or estimate the length of the interval over
which 95% of the signal power is contained.
Lab Report
• Answer all questions and print out the most useful plots to support your answers.
• Write a paragraph about any questions or confusions that you may have experienced with
this lab.
Chapter 4
Digital Modulation
Figure 4.1: Running example: Binary antipodal signaling using a timelimited pulse.
Digital modulation is the process of translating bits to analog waveforms that can be sent over
a physical channel. Figure 4.1 shows an example of a baseband digitally modulated waveform,
where bits that take values in {0, 1} are mapped to symbols in {+1, −1}, which are then used
to modulate translates of a rectangular pulse, where the translation corresponding to successive
symbols is the symbol interval T . The modulated waveform can be represented as a sequence of
symbols (taking values ±1 in the example) multiplying translates of a pulse (rectangular in the
example). This is an example of a widely used form of digital modulation termed linear modula-
tion, where the transmitted signal depends linearly on the symbols to be sent. Our treatment of
linear modulation in this chapter generalizes this example in several ways. The modulated signal
in Figure 4.1 is a baseband signal, but what if we are constrained to use a passband channel
(e.g., a wireless cellular system operating at 900 MHz)? One way to handle this is simply to
translate this baseband waveform to passband by upconversion; that is, send up(t) = u(t) cos 2πfc t,
where the carrier frequency fc lies in the desired frequency band. However, what if the frequency
occupancy of the passband signal is strictly constrained? (Such constraints are often the result
of guidelines from standards or regulatory bodies, and serve to limit interference between users
operating in adjacent channels.) Clearly, the timelimited modulation pulse used in Figure 4.1
spreads out significantly in frequency. We must therefore learn to work with modulation pulses
which are better constrained in frequency. We may also wish to send information on both the
I and Q components. Finally, we may wish to pack in more bits per symbol; for example, we
could send 2 bits per symbol by using 4 levels, say {±1, ±3}.
Chapter plan: In Section 4.1, we develop an understanding of the structure of linearly mod-
ulated signals, using the binary modulation in Figure 4.1 to lead into variants of this example,
corresponding to different signaling constellations which can be used for baseband and passband
channels. In Section 4.2, we discuss how to quantify the bandwidth of linearly modulated signals
by computing the power spectral density. With these basic insights in place, we turn in Section
4.3 to a discussion of modulation for bandlimited channels, treating signaling over baseband and
passband channels in a unified framework using the complex baseband representation. We note,
invoking Nyquist’s sampling theorem to determine the degrees of freedom offered by bandlimited
channels, that linear modulation with a bandlimited modulation pulse can be used to fill all of
these degrees of freedom. We discuss how to design bandlimited modulation pulses based on the
Nyquist criterion for intersymbol interference (ISI) avoidance. These concepts are reinforced by
Software Lab 8.1, which provides a hands-on demonstration of Nyquist signaling in the absence
of noise. Finally, we discuss orthogonal and biorthogonal modulation in Section 4.4.
Software: Over the course of this and later chapters, we develop a simulation framework for
simulating linear modulation over noisy dispersive channels. Software Lab 4.1 in this chapter is
a first step in this direction. Appendix 4.B provides guidance for developing the software for this
lab.
Figure 4.2: BPSK illustrated for fc = 4/T and symbol sequence +1, −1, −1. The solid line corre-
sponds to the passband signal up(t), and the dashed line to the baseband signal u(t). Note that,
due to the change in sign between the first and second symbols, there is a phase discontinuity of
π at t = T.
The linearly modulated signal depicted in Figure 4.1 can be written in the following general
form:
u(t) = Σ_n b[n] p(t − nT)    (4.1)
where {b[n]} is a sequence of symbols, and p(t) is the modulating pulse. The symbols take values
in {−1, +1} in our example, and the modulating pulse is a rectangular timelimited pulse. As we
proceed along this chapter, we shall see that linear modulation as in (4.1) is far more generally
applicable, in terms of the set of possible values taken by the symbol sequence, as well as the
choice of modulating pulse.
The modulated waveform (4.1) is a baseband waveform. While it is timelimited in our example,
and hence cannot be strictly bandlimited, it is approximately bandlimited to a band around DC.
Now, if we are given a passband channel over which to send the information encoded in this
waveform, one easy approach is to send the passband signal
up(t) = u(t) cos 2πfc t
where fc is the carrier frequency. That is, the modulated baseband signal is sent as the I
component of the passband signal. To see what happens to the passband signal as a consequence
of the modulation, we plot it in Figure 4.2. For the nth symbol interval nT ≤ t < (n + 1)T , we
have up (t) = cos 2πfc t if b[n] = +1, and up (t) = − cos 2πfc t = cos(2πfc t + π) if b[n] = −1. Thus,
binary antipodal modulation switches the phase of the carrier between two values 0 and π, which
is why it is termed Binary Phase Shift Keying (BPSK) when applied to a passband channel.
We know from Chapter 2 that any passband signal can be represented in terms of two real-valued
baseband waveforms, the I and Q components.
up (t) = uc (t) cos 2πfc t − us (t) sin 2πfc t
The complex envelope of up (t) is given by u(t) = uc (t) + jus (t). For BPSK, the I component is
modulated using binary antipodal signaling, while the Q component is not used, so that u(t) =
uc (t). However, noting that the two signals, uc (t) cos 2πfc t and us (t) sin 2πfc t are orthogonal
regardless of the choice of uc and us , we realize that we can modulate both I and Q components
independently, without affecting their orthogonality. In this case, we have
uc(t) = Σ_n bc[n] p(t − nT),    us(t) = Σ_n bs[n] p(t − nT)
Figure 4.3: QPSK illustrated for fc = 4/T, with symbol sequences {bc[n]} = {+1, −1, −1} and
{bs[n]} = {−1, +1, −1}. The phase of the passband signal is −π/4 in the first symbol interval,
switches to 3π/4 in the second, and to −3π/4 in the third.
Let us see what happens to the passband signal when bc[n], bs[n] each take values in {±1} (i.e.,
b[n] = bc[n] + jbs[n] takes values in {±1 ± j}). For the nth symbol interval nT ≤ t < (n + 1)T:
up(t) = cos 2πfc t − sin 2πfc t = √2 cos(2πfc t + π/4) if bc[n] = +1, bs[n] = +1;
up(t) = cos 2πfc t + sin 2πfc t = √2 cos(2πfc t − π/4) if bc[n] = +1, bs[n] = −1;
up(t) = −cos 2πfc t − sin 2πfc t = √2 cos(2πfc t + 3π/4) if bc[n] = −1, bs[n] = +1;
up(t) = −cos 2πfc t + sin 2πfc t = √2 cos(2πfc t − 3π/4) if bc[n] = −1, bs[n] = −1.
Thus, the modulation causes the passband signal to switch its phase among four possibilities,
{±π/4, ±3π/4}, as illustrated in Figure 4.3, which is why we call it Quadrature Phase Shift
Keying (QPSK).
Equivalently, we could have seen this from the complex envelope. Note that the QPSK symbols
can be written as b[n] = √2 e^{jθ[n]}, where θ[n] ∈ {±π/4, ±3π/4}. Thus, over the nth symbol, we
have
up(t) = Re(b[n] e^{j2πfc t}) = Re(√2 e^{jθ[n]} e^{j2πfc t}) = √2 cos(2πfc t + θ[n]),   nT ≤ t < (n + 1)T
This indicates that it is actually easier to figure out what is happening to the passband signal
by working with the complex envelope. We therefore work in the complex baseband domain for
the remainder of this chapter.
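The following MATLAB sketch (an illustrative construction, using a rectangular pulse and fc = 4/T as in Figures 4.2 and 4.3) generates a QPSK complex envelope from random symbols and upconverts it; this is a convenient way to reproduce plots like Figure 4.3.

% Minimal sketch: QPSK complex envelope and passband waveform (fc = 4/T, T = 1)
T = 1; fc = 4/T; fs = 64*fc;              % oversampled for a smooth plot
nsym = 3;
bc = sign(rand(1,nsym)-0.5);              % I symbols, +/-1
bs = sign(rand(1,nsym)-0.5);              % Q symbols, +/-1
b = bc + 1j*bs;                           % complex symbols in {+/-1 +/- j}
t = 0:1/fs:nsym*T-1/fs;
u = b(1 + floor(t/T));                    % rectangular pulse: hold each symbol for T
up = real(u .* exp(1j*2*pi*fc*t));        % up(t) = Re{u(t) exp(j 2 pi fc t)}
plot(t/T, up); xlabel('t/T'); ylabel('u_p(t)');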
In general, the complex envelope for a linearly modulated signal is given by (4.1), where b[n] =
bc [n] + jbs [n] = r[n]ejθ[n] can be complex-valued. We can view this as bc [n] modulating the
I component and bs [n] modulating the Q component, or as scaling the envelope by r[n] and
switching the phase by θ[n]. The set of values that each symbol can take is called the signaling
alphabet, or constellation. We can plot the constellation in a two-dimensional plot, with the x-
axis denoting the real part bc [n] (corresponding to the I component) and the y-axis denoting the
imaginary part bs [n] (corresponding to the Q component). Indeed, this is why linear modulation
over passband channels is also termed two-dimensional modulation. Note that this provides a
unified description of constellations that can be used over both baseband and passband channels:
for physical baseband channels, we simply constrain b[n] = bc [n] to be real-valued, setting bs [n] =
0.
(Constellations shown: BPSK/2PAM, 4PAM, QPSK/4PSK/4QAM, 8PSK, 16QAM.)
Figure 4.4: Some commonly used constellations. Note that 2PAM and 4PAM can be used over
both baseband and passband channels, while the two-dimensional constellations QPSK, 8PSK
and 16QAM are for use over passband channels.
Figure 4.4 shows some common constellations. Pulse Amplitude Modulation (PAM) corresponds
to using multiple amplitude levels along the I component (setting the Q component to zero).
This is often used for signaling over physical baseband channels. Using PAM along both I and Q
axes corresponds to Quadrature Amplitude Modulation (QAM). If the constellation points lie on
a circle, they only affect the phase of the carrier: such signaling schemes are termed Phase Shift
Keying (PSK). When naming a modulation scheme, we usually indicate the number of points
in the constellations. BPSK and QPSK are special: BPSK (or 2PSK) can also be classified as
2PAM, while QPSK (or 4PSK) can also be classified as 4QAM.
Each symbol in a constellation of size M can be uniquely mapped to log2 M bits. For a symbol
rate of 1/T symbols per unit time, the bit rate is therefore (log2 M)/T bits per unit time. Since the
transmitted bits often contain redundancy due to a channel code employed for error correction or
detection, the information rate is typically smaller than the bit rate. The choice of constellation
for a particular application depends on considerations such as power-bandwidth tradeoffs and
implementation complexity. We shall discuss these issues once we develop more background.
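As a small illustration of these bookkeeping relationships (the bit-to-symbol map below is just one illustrative choice, not a mapping specified in the text):

% Bits per symbol and bit rate for a constellation of size M
M = 16; T = 1e-6;                      % illustrative: 16QAM at 1 Msymbol/sec
bits_per_symbol = log2(M)              % 4
bit_rate = log2(M)/T                   % 4 Mbit/sec
% Illustrative QPSK bit-to-symbol map: one bit on I, one bit on Q
bits = randi([0 1], 2, 5);             % 2 bits per symbol, 5 symbols
b = (1 - 2*bits(1,:)) + 1j*(1 - 2*bits(2,:))   % symbols in {+/-1 +/- j}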
Figure 4.5: Conceptual measurement of the PSD: x(t) is passed through an ideal narrowband filter H(f) of width ∆f centered at ν, and a power meter at the filter output reads Sx(ν)∆f.
We now introduce the important concept of power spectral density (PSD), which specifies how
the power in a signal is distributed in different frequency bands.
Power Spectral Density: The power spectral density (PSD), Sx (f ), for a finite-power signal
x(t) is defined through the conceptual measurement depicted in Figure 4.5. Pass x(t) through
an ideal narrowband filter with transfer function
Hν(f) = 1 for ν − ∆f/2 < f < ν + ∆f/2, and Hν(f) = 0 else
The PSD evaluated at ν, Sx (ν), is defined as the measured power at the filter output, divided
by the filter width ∆f (in the limit as ∆f → 0).
Example (PSD of complex exponentials): Let us now find the PSD of x(t) = A e^{j(2πf0 t+θ)}.
Since the frequency content of x is concentrated at f0, the power meter in Figure 4.5 will have
zero output for ν ≠ f0 (as ∆f → 0, f0 falls outside the filter bandwidth for any such ν). Thus,
S_x(f) = 0 for f ≠ f_0. On the other hand, for ν = f_0, the output of the power meter is the entire power of x, which is
P_x = A^2 = \int_{f_0 - \frac{\Delta f}{2}}^{f_0 + \frac{\Delta f}{2}} S_x(f)\, df
We conclude that the PSD is S_x(f) = A^2 \delta(f - f_0). Extending this reasoning to a sum of complex exponentials, we have
\text{PSD of } \sum_i A_i e^{j(2\pi f_i t + \theta_i)} = \sum_i A_i^2\, \delta(f - f_i)
where fi are distinct frequencies (positive or negative), and Ai , θi are the amplitude and phase,
respectively, of the ith complex exponential. Thus, for a real-valued sinusoid, we obtain
S_x(f) = \frac{1}{4}\delta(f - f_0) + \frac{1}{4}\delta(f + f_0), \quad \text{for } x(t) = \cos(2\pi f_0 t + \theta) = \frac{1}{2} e^{j(2\pi f_0 t + \theta)} + \frac{1}{2} e^{-j(2\pi f_0 t + \theta)} \qquad (4.4)
Periodogram-based PSD estimation: One way to carry out the conceptual measurement in
Figure 4.5 is to limit x(t) to a finite observation interval, compute its Fourier transform and hence
its energy spectral density (which is the magnitude square of the Fourier transform), and then
divide by the length of the observation interval. The PSD is obtained by letting the observation
interval get large. Specifically, define the time-windowed version of x as
x_{T_o}(t) = x(t)\, I_{[-\frac{T_o}{2}, \frac{T_o}{2}]}(t) \qquad (4.5)
where T_o is the length of the observation interval. Since T_o is finite and x(t) has finite power, x_{T_o}(t) has finite energy, and we can compute its Fourier transform
X_{T_o}(f) = \mathcal{F}(x_{T_o})
The energy spectral density of x_{T_o} is given by |X_{T_o}(f)|^2. Averaging this over the observation interval, we obtain the estimated PSD
\hat{S}_x(f) = \frac{|X_{T_o}(f)|^2}{T_o} \qquad (4.6)
The estimate in (4.6), which is termed a periodogram, can typically be obtained by taking the
DFT of a sampled version of the time windowed signal; the time interval T_o must be large enough
to give the desired frequency resolution, while the sampling rate must be large enough to capture
the variations in x(t). The estimated PSDs obtained over multiple observation intervals can then
be averaged further to get smoother estimates.
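As an illustration of this procedure, the following Matlab fragment (a minimal sketch with assumed parameters; it is not one of the code fragments referred to elsewhere in this book) computes a periodogram as in (4.6) for a sampled sinusoid in noise.

    % Periodogram estimate of the PSD, as in (4.6): window the signal to length To,
    % approximate its Fourier transform by a scaled DFT, and divide |X_To(f)|^2 by To.
    fs = 1000;                 % sampling rate (Hz), assumed fast enough for the signal
    To = 2;                    % observation interval (seconds)
    t  = 0:1/fs:To-1/fs;       % sample times within the window
    x  = cos(2*pi*100*t) + 0.1*randn(size(t));  % example finite-power signal
    X  = fft(x)/fs;            % DFT scaled by the sampling interval: samples of X_To(f)
    Shat = abs(X).^2 / To;     % periodogram estimate of Sx(f)
    f  = (0:length(x)-1)*fs/length(x);          % frequency grid from 0 to fs
    plot(f, 10*log10(Shat)); xlabel('f (Hz)'); ylabel('estimated PSD (dB)');
    % Averaging the periodograms of several windows gives a smoother estimate.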
Formally, we can define the PSD in the limit of large time windows as follows:
S_x(f) = \lim_{T_o \to \infty} \frac{|X_{T_o}(f)|^2}{T_o} \qquad (4.7)
Units for PSD: Power per unit frequency has the same units as power multiplied by time, or
energy. Thus, the PSD is expressed in units of Watts/Hertz, or Joules.
Power in terms of PSD: The power P_x of a finite power signal x is given by integrating its PSD:
P_x = \int_{-\infty}^{\infty} S_x(f)\, df \qquad (4.8)
Theorem 4.2.1 (PSD of a linearly modulated signal) Consider a linearly modulated signal
u(t) = \sum_n b[n]\, p(t - nT)
where the symbol sequence {b[n]} is zero mean and uncorrelated, with average symbol energy
\overline{|b[n]|^2} = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} |b[n]|^2 = \sigma_b^2
Then the PSD and power of u are given by
S_u(f) = \frac{\sigma_b^2}{T}\, |P(f)|^2 \qquad (4.9)
P_u = \frac{\sigma_b^2\, \|p\|^2}{T} \qquad (4.10)
where \|p\|^2 denotes the energy of the modulating pulse.
See Appendix 4.A for a proof of (4.9), which follows from specializing a more general expression. The expression for power follows from integrating the PSD:
P_u = \int_{-\infty}^{\infty} S_u(f)\, df = \frac{\sigma_b^2}{T} \int_{-\infty}^{\infty} |P(f)|^2\, df = \frac{\sigma_b^2}{T} \int_{-\infty}^{\infty} |p(t)|^2\, dt = \frac{\sigma_b^2\, \|p\|^2}{T}
(using Parseval's identity for the third equality).
A common definition of the bandwidth of a finite-power signal is the size of the frequency band that contains a given fraction of the power. For example, for symmetric S_u(f), the 99% fractional power containment bandwidth B is defined by
\int_{-B/2}^{B/2} S_u(f)\, df = 0.99\, P_u = 0.99 \int_{-\infty}^{\infty} S_u(f)\, df
(replace 0.99 in the preceding equation by any desired fraction γ to get the corresponding γ
power containment bandwidth).
Time/frequency normalization: Before we discuss examples in detail, let us simplify our
life by making a simple observation on time and frequency scaling. Suppose we have a linearly
modulated system operating at a symbol rate of 1/T , as in (4.1). We can think of it as a
normalized system operating at a symbol rate of one, where the unit of time is T . This implies
that the unit of frequency is 1/T . In terms of these new units, we can write the linearly modulated
signal as
u_1(t) = \sum_n b[n]\, p_1(t - n)
where p1 (t) is the modulation pulse for the normalized system. For example, for a rectangular
pulse timelimited to the symbol interval, we have p1 (t) = I[0,1] (t). Suppose now that the band-
width of the normalized system (computed using any definition that we please) is B1 . Since
the unit of frequency is 1/T , the bandwidth in the original system is B1 /T . Thus, in terms of
determining frequency occupancy, we can work, without loss of generality, with the normalized
system. In the original system, what we are really doing is working with the normalized time
t/T and the normalized frequency f T .
Figure 4.6: PSD corresponding to rectangular and sine timelimited pulses. The main lobe of the
PSD is broader for the sine pulse, but its 99% power containment bandwidth is much smaller.
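A plot like Figure 4.6 can be generated directly from (4.9): for the normalized rectangular and sine pulses considered next, the PSDs work out to sinc^2(f) and 8cos^2(πf)/(π^2(1 − 4f^2)^2), respectively (see the expressions derived below). The following Matlab sketch (the axis range and grid spacing are my own choices, not the text's) plots both on a common frequency grid.

    % PSDs of unit-energy rectangular and sine pulses timelimited to [0,1],
    % from (4.9) with sigma_b^2 = 1 and T = 1 (normalized system).
    df = 0.01;
    f  = -5 + df/2 : df : 5;                     % grid avoids f = 0 and f = +/-0.5 exactly
    S_rect = (sin(pi*f)./(pi*f)).^2;             % sinc^2(f) for the rectangular pulse
    S_sine = 8*cos(pi*f).^2 ./ (pi^2*(1 - 4*f.^2).^2);   % PSD of the sine pulse
    plot(f, S_rect, f, S_sine); xlabel('fT'); legend('rect. pulse', 'sine pulse');
    % Each PSD integrates to the power sigma_b^2 = 1 (up to small tails beyond |f| = 5):
    disp([sum(S_rect)*df, sum(S_sine)*df])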
Rectangular pulse: Without loss of generality, consider a normalized system with p_1(t) = I_{[0,1]}(t), for which P_1(f) = \mathrm{sinc}(f)\, e^{-j\pi f}. For {b[n]} i.i.d., taking values ±1 with equal probability, we have σ_b^2 = 1. Applying (4.9), we obtain
S_{u_1}(f) = |P_1(f)|^2 = \mathrm{sinc}^2(f) \qquad (4.11)
Note that the PSD for the rectangular pulse has much fatter tails, which does not bode well for
its bandwidth efficiency. For fractional power containment bandwidth with fraction γ, we have
the equation
\int_{-B_1/2}^{B_1/2} \mathrm{sinc}^2 f\, df = \gamma \int_{-\infty}^{\infty} \mathrm{sinc}^2 f\, df = \gamma \int_0^1 1^2\, dt = \gamma
using Parseval’s identity. We therefore obtain, using the symmetry of the PSD, that the band-
width is the numerical solution to the equation
\int_0^{B_1/2} \mathrm{sinc}^2 f\, df = \gamma/2 \qquad (4.12)
For example, for γ = 0.99, we obtain B1 = 10.2, while for γ = 0.9, we obtain B1 = 0.85.
Thus, if we wish to be strict about power containment (e.g., in order to limit adjacent channel
interference in wireless systems), the rectangular timelimited pulse is a very poor choice. On the
other hand, in systems where interference or regulation are not significant issues (e.g., low-cost
wired systems), this pulse may be a good choice because of its ease of implementation using
digital logic.
Smoothing out the rectangular pulse: A useful alternative to using the rectangular pulse,
while still keeping the modulating pulse timelimited to a symbol interval, is the sine pulse, which
for the normalized system equals
p_1(t) = \sqrt{2}\, \sin(\pi t)\, I_{[0,1]}(t)
Since the sine pulse does not have the sharp edges of the rectangular pulse in the time domain,
we expect it to be more compact in the frequency domain. Note that we have normalized the
pulse to have unit energy, as we did for the normalized rectangular pulse. This implies that the
power of the modulated signal is the same in the two cases, so that we can compare PSDs under
the constraint that the area under the PSDs remains constant. Setting σ_b^2 = 1 and using (4.9), we obtain (see Problem 4.1):
S_{u_1}(f) = |P_1(f)|^2 = \frac{8\cos^2(\pi f)}{\pi^2 (1 - 4f^2)^2} \qquad (4.13)
Proceeding as we did for obtaining (4.12), the fractional power containment bandwidth for frac-
tion γ is given by the formula:
\int_0^{B_1/2} \frac{8\cos^2(\pi f)}{\pi^2 (1 - 4f^2)^2}\, df = \gamma/2 \qquad (4.14)
For γ = 0.99, we obtain B1 = 1.2, which is an order of magnitude improvement over the
corresponding value of B1 = 10.2 for the rectangular pulse.
While the sine pulse has better frequency domain containment than the rectangular pulse, it is
still not suitable for strictly bandlimited channels. We discuss pulse design for such channels
next.
Theorem 4.3.1 (Nyquist's sampling theorem) Any signal s(t) bandlimited to [−\frac{W}{2}, \frac{W}{2}] can be described completely by its samples \{s(\frac{n}{W})\} at rate W. The signal s(t) can be recovered from its samples using the following interpolation formula:
s(t) = \sum_{n=-\infty}^{\infty} s\left(\frac{n}{W}\right) p\left(t - \frac{n}{W}\right), \quad \text{where } p(t) = \mathrm{sinc}(Wt) \qquad (4.15)
Degrees of freedom: What does the sampling theorem tell us about digital modulation? The
interpolation formula (4.15) tells us that we can interpret s(t) as a linearly modulated signal
with symbol sequence equal to the samples {s(n/W )}, symbol rate 1/T equal to the bandwidth
W, and modulation pulse given by p(t) = \mathrm{sinc}(Wt) ↔ P(f) = \frac{1}{W} I_{[-W/2,W/2]}(f). Thus, linear
modulation with the sinc pulse is able to exploit all the “degrees of freedom” available in a
bandlimited channel.
Signal space: If we signal over an observation interval of length To using linear modulation
according to the interpolation formula (4.15), then we have approximately W To complex-valued
samples. Thus, while the signals we send are continuous-time signals, which in general, lie in an
infinite-dimensional space, the set of possible signals we can send in a finite observation interval
of length To live in a complex-valued vector space of finite dimension W To , or equivalently, a
real-valued vector space of dimension 2W To . Such geometric views of communication signals as
vectors, often termed signal space concepts, are particularly useful in design and analysis, as we
explore in more detail in Chapter 6.
Figure 4.7: Three successive sinc pulses (each pulse is truncated to a length of 10 symbol intervals
on each side) modulated by +1,-1,+1. The actual transmitted signal is the sum of these pulses
(not shown). Note that, while the pulses overlap, the samples at t = 0, T, 2T are equal to the
transmitted bits because only one pulse is nonzero at these times.
The concept of Nyquist signaling: Since the sinc pulse is not timelimited to a symbol interval,
in principle, the symbols could interfere with each other. The time domain signal corresponding
to a bandlimited modulation pulse such as the sinc spans an interval significantly larger than the
symbol interval (in theory, the interval is infinitely large, but we always truncate the waveform
in implementations). This means that successive pulses corresponding to successive symbols
which are spaced by the symbol interval (i.e., b[n]p(t − nT ) as we increment n) overlap with,
and therefore can interfere with, each other. Figure 4.7 shows the sinc pulse modulated by three
bits, +1,-1,+1. While the pulses corresponding to the three symbols do overlap, notice that, by
sampling at t = 0, t = T and t = 2T , we can recover the three symbols because exactly one of the
pulses is nonzero at each of these times. That is, at sampling times spaced by integer multiples of
the symbol time T , there is no intersymbol interference. We call such a pulse Nyquist for signaling
at rate 1/T, and we discuss other examples of such pulses soon. Designing pulses based on the
Nyquist criterion allows us the freedom to expand the modulation pulses in time beyond the
symbol interval (thus enabling better containment in the frequency domain), while ensuring that
there is no ISI at appropriately chosen sampling times despite the significant overlap between
successive pulses.
Figure 4.8: The baseband signal for 10 BPSK symbols of alternating signs, modulated using the
sinc pulse. The first symbol is +1, and the sample at time t = 0, marked with ’x’, equals +1, as
desired (no ISI). However, if the sampling time is off by 0.25T , the sample value, marked by ’+’,
becomes much smaller because of ISI. While it still has the right sign, the ISI causes it to have
significantly smaller noise immunity. See Problem 4.14 for an example in which the ISI due to
timing mismatch actually causes the sign to flip.
The problem with sinc: Are we done then? Should we just use linear modulation with a sinc
pulse when confronted with a bandlimited channel? Unfortunately, the answer is no: just as the
rectangular timelimited pulse decays too slowly in frequency, the rectangular bandlimited pulse,
corresponding to the sinc pulse in the time domain, decays too slowly in time. Let us see what
happens as a consequence. Figure 4.8 shows a plot of the modulated waveform for a bit sequence
of alternating sign. At the correct sampling times, there is no ISI. However, if we consider a small
timing error of 0.25T , the ISI causes the sample value to drop drastically, making the system
more vulnerable to noise. What is happening is that, when there is a small sampling offset,
we can make the ISI add up to a large value by choosing the interfering symbols so that their
contributions all have signs opposite to that of the desired symbol at the sampling time. Since
the sinc pulse decays as 1/t, the ISI created for a given symbol by an interfering symbol which is n symbol intervals away decays as 1/n, so that, in the worst case, the contributions from the interfering symbols roughly have the form \sum_n \frac{1}{n}, a series that is known to diverge. Thus, in
theory, if we do not truncate the sinc pulse, we can make the ISI arbitrarily large when there is
a small timing offset. In practice, we do truncate the modulation pulse, so that we only see ISI
from a finite number of symbols. However, even when we do truncate, as we see from Figure 4.8,
the slow decay of the sinc pulse means that the ISI adds up quickly, and significantly reduces
the margin of error when noise is introduced into the system.
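This worst-case buildup is easy to reproduce numerically. The following Matlab sketch (a minimal illustration in the spirit of Figure 4.8, with my own choice of 10 symbols and no pulse truncation) compares the on-time sample of the first symbol with a sample taken 0.25T late.

    % ISI due to a timing offset with the sinc pulse (normalized symbol interval T = 1).
    N = 10;
    b = (-1).^(0:N-1);                          % alternating symbols +1, -1, +1, ...
    sincf = @(x) sin(pi*x + (x==0)*eps) ./ (pi*x + (x==0)*eps);   % sinc with sinc(0) = 1
    u = @(t) sum(b .* sincf(t - (0:N-1)));      % u(t) = sum_n b[n] sinc(t - n)
    on_time = u(0)                              % = +1: no ISI at the ideal sampling instant
    late    = u(0.25)                           % ~ 0.15: right sign, but far less noise margin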
While the sinc pulse may not be a good idea in practice, the idea of using bandwidth-efficient
Nyquist pulses is a good one, and we now develop it further.
We say that the pulse p(t) is Nyquist (or satisfies the Nyquist criterion) for signaling at rate 1/T
if the symbol-spaced samples of the modulated signal are equal to the symbols (or a fixed scalar
multiple of the symbols); that is, u(kT ) = b[k] for all k. That is, there is no ISI at appropriately
chosen sampling times spaced by the symbol interval.
In the time domain, it is quite easy to see what is required to satisfy the Nyquist criterion. The samples u(kT) = \sum_n b[n]\, p(kT - nT) = b[k] (or a scalar multiple of b[k]) for all k if and only if p(0) = 1 (or some nonzero constant) and p(mT) = 0 for all integers m ≠ 0. However, for
design of bandwidth efficient pulses, it is important to characterize the Nyquist criterion in the
frequency domain. This is given by the following theorem.
Theorem 4.3.2 (Nyquist criterion for ISI avoidance): The pulse p(t) ↔ P(f) is Nyquist for signaling at rate 1/T if
p(mT) = \delta_{m0} = \begin{cases} 1, & m = 0 \\ 0, & m \neq 0 \end{cases} \qquad (4.16)
or equivalently,
\frac{1}{T} \sum_{k=-\infty}^{\infty} P\left(f + \frac{k}{T}\right) = 1 \quad \text{for all } f \qquad (4.17)
The proof of this theorem is given in Section 4.5, where we show that both the Nyquist sampling
theorem, Theorem 4.3.1, and the preceding theorem are based on the same mathematical result,
that the samples of a time domain signal have a one-to-one mapping with the sum of translated
(or aliased) versions of its Fourier transform.
In this section, we explore the design implications of Theorem 4.3.2. In the frequency domain,
the translates of P (f ) by integer multiples of 1/T must add up to a constant. As illustrated by
Figure 4.9, the minimum bandwidth pulse for which this happens is the ideal bandlimited pulse
over an interval of length 1/T .
[Figure 4.9: The translates P(f + 1/T), P(f), P(f − 1/T) for two pulses: one which is not Nyquist, and one which is Nyquist with minimum bandwidth.]
As we have already discussed, the sinc pulse is not a good choice in practice because of its slow
decay in time. To speed up the decay in time, we must expand in the frequency domain, while
conforming to the Nyquist criterion. The trapezoidal pulse depicted in Figure 4.10 is an example of such a pulse.
[Figure 4.10 sketch: P(f) versus f, with band edges at ±(1 − a)/(2T) and ±(1 + a)/(2T).]
Figure 4.10: A trapezoidal pulse which is Nyquist at rate 1/T . The (fractional) excess bandwidth
is a.
The role of excess bandwidth: We have noted earlier that the problem with the sinc pulse arises because of its 1/t decay and the divergence of the harmonic series \sum_{n=1}^{\infty} \frac{1}{n}, which implies that the worst-case contribution from “distant” interfering symbols at a given sampling instant can blow up. Using the same reasoning, however, a pulse p(t) decaying as 1/t^b for b > 1 should work, since the series \sum_{n=1}^{\infty} \frac{1}{n^b} does converge for b > 1. A faster time decay requires a slower
decay in frequency. Thus, we need excess bandwidth, beyond the minimum bandwidth dictated
by the Nyquist criterion, to fix the problems associated with the sinc pulse. The (fractional)
excess bandwidth for a linear modulation scheme is defined to be the fraction of bandwidth
over the minimum required for ISI avoidance at a given symbol rate. In particular, Figure 4.10
shows that a trapezoidal pulse (in the frequency domain) can be Nyquist for suitably chosen
parameters, since the translates {P (f + k/T )} as shown in the figure add up to a constant. Since
trapezoidal P (f ) is the convolution of two boxes in the frequency domain, the time domain pulse
p(t) is the product of two sinc functions, as worked out in the example below. Since each sinc
decays as 1/t, the product decays as 1/t2 , which implies that the worst-case ISI with timing
mismatch is indeed bounded.
Example 4.3.1 Consider the trapezoidal pulse of excess bandwidth a shown in Figure 4.10.
(a) Find an explicit expression for the time domain pulse p(t).
(b) What is the bandwidth required for a passband system using this pulse operating at 120
Mbps using 64QAM, with an excess bandwidth of 25%?
Solution: (a) It is easy to check that the trapezoid is a convolution of two boxes as follows (we assume 0 < a ≤ 1):
P(f) = \frac{T^2}{a}\, I_{[-\frac{1}{2T}, \frac{1}{2T}]}(f) * I_{[-\frac{a}{2T}, \frac{a}{2T}]}(f)
Taking inverse Fourier transforms, we obtain
p(t) = \frac{T^2}{a} \cdot \frac{1}{T}\mathrm{sinc}(t/T) \cdot \frac{a}{T}\mathrm{sinc}(at/T) = \mathrm{sinc}(t/T)\,\mathrm{sinc}(at/T) \qquad (4.18)
The presence of the first sinc provides the zeroes required by the time domain Nyquist criterion: p(mT) = 0 for all integers m ≠ 0. The presence of the second sinc yields a 1/t^2 decay, providing robustness against timing mismatch.
(b) Since 64 = 2^6, the use of 64QAM corresponds to sending 6 bits/symbol, so that the symbol rate is 120/6 = 20 Msymbols/sec. The minimum bandwidth required is therefore 20 MHz, so that 25% excess bandwidth corresponds to a bandwidth of 20 × 1.25 = 25 MHz.
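A quick numerical check of (4.18) (a minimal Matlab sketch, with T and a chosen arbitrarily) confirms that p(t) = sinc(t/T) sinc(at/T) vanishes at nonzero integer multiples of T, and that it decays much faster than the bare sinc pulse.

    % Time domain Nyquist property and decay of the trapezoidal pulse (4.18).
    T = 1; a = 0.25;                                              % example values
    sincf = @(x) sin(pi*x + (x==0)*eps) ./ (pi*x + (x==0)*eps);   % sinc with sinc(0) = 1
    p = @(t) sincf(t/T) .* sincf(a*t/T);
    disp(p((-5:5)*T))        % 1 at m = 0, essentially 0 at all other integer multiples of T
    disp([p(10.5), sincf(10.5)])   % 1/t^2 decay: far smaller than the sinc pulse at the same t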
Raised cosine pulse: Replacing the straight line of the trapezoid with a smoother cosine-
shaped curve in the frequency domain gives us the raised cosine pulse shown in Figure 4.12,
which has a faster, 1/t3 , decay in the time domain.
P(f) = \begin{cases} T, & |f| \le \frac{1-a}{2T} \\ \frac{T}{2}\left[1 + \cos\left(\left(|f| - \frac{1-a}{2T}\right)\frac{\pi T}{a}\right)\right], & \frac{1-a}{2T} \le |f| \le \frac{1+a}{2T} \\ 0, & |f| > \frac{1+a}{2T} \end{cases}
where a is the fractional excess bandwidth, typically chosen in the range 0 ≤ a < 1. As shown in Problem 4.11, the time domain pulse p(t) is given by
p(t) = \mathrm{sinc}\left(\frac{t}{T}\right) \frac{\cos\left(\pi a \frac{t}{T}\right)}{1 - \left(\frac{2at}{T}\right)^2}
This pulse inherits the Nyquist property of the sinc pulse, while having an additional multiplicative factor that gives an overall 1/t^3 decay with time. The faster time decay compared to the sinc pulse is evident from a comparison of Figures 4.12(b) and 4.11(b).
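A direct implementation of this expression needs care at t = ±T/(2a), where both the cosine factor and the denominator vanish and their ratio tends to the finite limit π/4. The following Matlab sketch is a minimal version of such an implementation (the book's own implementation is code fragment 4.B.1 in the appendix, which may differ in its details); the result can be compared visually with Figure 4.12(b).

    % Raised cosine pulse in the time domain, normalized to T = 1:
    %   p(t) = sinc(t) * cos(pi*a*t) / (1 - (2*a*t)^2),
    % with the 0/0 points at t = +/- 1/(2a) replaced by their finite limit.
    a = 0.5;                                       % excess bandwidth, as in Figure 4.12(b)
    t = -5:0.01:5;
    sincf = @(x) sin(pi*x + (x==0)*eps) ./ (pi*x + (x==0)*eps);
    den = 1 - (2*a*t).^2;
    p = sincf(t) .* cos(pi*a*t) ./ den;
    sing = abs(den) < 1e-10;                       % grid points hitting t = +/- 1/(2a)
    p(sing) = sincf(t(sing)) * (pi/4);             % L'Hospital limit of cosine term / denominator
    plot(t, p); xlabel('t/T');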
We now comment on some interesting properties of Nyquist pulses.
For both the trapezoidal and raised cosine waveforms, the time domain pulse has a sinc(at) term which provides zeros at integer multiples of T = 1/a. This means that the pulse is Nyquist at rate 1/T = a. In other words, a time domain factor which provides “zeros at rate a” (i.e., spaced by 1/a) enables Nyquist signaling at rate a. A pulse which is trapezoidal in the frequency domain has a time domain pulse of the form \mathrm{sinc}(at)\,\mathrm{sinc}(bt), which provides zeros at rate a as well as at rate b. Thus, this pulse is Nyquist at rate a and at rate b.
It is also interesting to note that, once we have zeros at integer multiples of T, we also have zeros at integer multiples of KT, where K is any positive integer. In other words, if a pulse is Nyquist at rate 1/T, then it is also Nyquist at integer submultiples of this rate; that is, it is Nyquist for all rates of the form \frac{1}{KT}, for K a positive integer. Thus, a factor sinc(at) in the pulse guarantees the Nyquist property for all rates a/K.
Of course, we are typically only interested in the highest rate for a given bandwidth, but it is
interesting to play with the preceding observations, as we do in the following example involving
a trapezoidal pulse.
Figure 4.11: Sinc pulse for minimum bandwidth ISI-free signaling at rate 1/T . Both time and
frequency axes are normalized to be dimensionless.
(a) Frequency domain raised cosine (b) Time domain pulse (excess bandwidth a = 0.5)
Figure 4.12: Raised cosine pulse for minimum bandwidth ISI-free signaling at rate 1/T , with
excess bandwidth a. Both time and frequency axes are normalized to be dimensionless.
[Figure 4.13: P(f) versus f (MHz): P(f) = 1 for |f| ≤ 4 MHz, decaying linearly to zero at |f| = 10 MHz.]
Figure 4.14: The frequency domain Nyquist criterion is satisfied for 1/T = 14 Msymbols/sec in Example 4.3.2(a).
Example 4.3.2 Consider passband linear modulation using the bandlimited pulse shown in
Figure 4.13. Answer the following True/False questions, clearly stating your reasoning.
(a) True or False: The pulse p(t) can be used for Nyquist signaling at a bit rate of 56 Mbps
using a 16QAM constellation.
(b) True or False: The pulse p(t) can be used for Nyquist signaling at a bit rate of 21 Mbps
using an 8PSK constellation.
(c) True or False: The pulse p(t) can be used for Nyquist signaling at a bit rate of 18 Mbps
using an 8PSK constellation.
(d) True or False: The pulse p(t) can be used for Nyquist signaling at a bit rate of 25 Mbps
using a QPSK constellation.
Solution: (a) The symbol rate is
\frac{1}{T} = \frac{56 \text{ Mbps}}{4 \text{ bits/symbol}} = 14 \text{ Msymbols/sec}
From Figure 4.14, we see that for this rate, the frequency domain Nyquist criterion is satisfied: \sum_k P(f + \frac{k}{T}) is constant. Alternatively, we know that the frequency domain trapezoid corresponds to p(t) = \mathrm{sinc}(at)\,\mathrm{sinc}(bt) in the time domain, where (a − b)/2 = 4, (a + b)/2 = 10. Solving, we obtain a = 14 MHz, b = 6 MHz. Thus, the time domain pulse provides zeros at rates 14 MHz and 6 MHz, hence it is indeed Nyquist at rate 14 Msymbols/sec. The statement is therefore True.
(b) The symbol rate is 1/T = 21 Mbps / 3 bits/symbol = 7 Msymbols/sec. Since the pulse is Nyquist at 14 Msymbols/sec, it is also Nyquist at 14/2 = 7 Msymbols/sec. The statement is therefore True.
(c) The symbol rate is 1/T = 18 Mbps / 3 bits/symbol = 6 Msymbols/sec. As shown in (a), the pulse has a sinc(6t) term that provides zeros at rate 6 MHz, hence the statement is True.
(d) The symbol rate is 1/T = 25 Mbps / 2 bits/symbol = 12.5 Msymbols/sec. This is not an integer submultiple of either 14 MHz or 6 MHz, the rates at which zeros are provided by the two sinc factors. Thus, the Nyquist property does not hold, and the statement is False.
The bandwidth efficiency of linear modulation with an M-ary constellation is defined as
\eta_B = \log_2 M \text{ bits/symbol}
The Nyquist criterion for ISI avoidance says that the minimum bandwidth required for ISI-free
transmission using linear modulation equals the symbol rate, using the sinc as the modulation
pulse. For such an idealized system, we can think of ηB as bits/second per Hertz, since the symbol
rate equals the bandwidth. Thus, knowing the bit rate Rb and the bandwidth efficiency ηB of
the modulation scheme, we can determine the symbol rate, and hence the minimum required bandwidth B_{min}, as follows:
B_{min} = \frac{R_b}{\eta_B}
This bandwidth would then be expanded by the excess bandwidth used in the modulating pulse.
However, this is not included in our definition of bandwidth efficiency, because excess bandwidth
is a highly variable quantity dictated by a variety of implementation considerations. Once we
decide on the fractional excess bandwidth a, the actual bandwidth required is
B = (1 + a)\, B_{min} = (1 + a)\, \frac{R_b}{\eta_B}
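This relation is easy to wrap in a small helper; the following Matlab sketch (the function handle and parameter values are my own choices) reproduces the bandwidths computed in the examples of this chapter.

    % Bandwidth required for Nyquist signaling: B = (1 + a) * Rb / log2(M).
    bw = @(Rb, M, a) (1 + a) * Rb / log2(M);
    bw(120e6, 64, 0.25)    % 64QAM at 120 Mbps, 25% excess -> 25 MHz  (Example 4.3.1(b))
    bw(40e6,  4,  0.25)    % QPSK at 40 Mbps, 25% excess   -> 25 MHz
    bw(40e6,  16, 0.25)    % 16QAM at 40 Mbps, 25% excess  -> 12.5 MHz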
[Figure 4.15: A QPSK constellation (points at ±1 ± j, with minimum distance d_min = 2 and energy per symbol E_s = 2) scaled up by a factor of two (points at ±2 ± 2j, with d_min = 4 and E_s = 8).]
Intuitively speaking, the effect of noise is to perturb constellation points from the nominal loca-
tions shown in Figure 4.4, which leads to the possibility of making an error in deciding which
point was transmitted. For a given noise “strength” (which determines how much movement the
noise can produce), the closer the constellation points, the more the possibility of such errors.
In particular, as we shall see in Chapter 6, the minimum distance between constellation points,
termed d_min, provides a good measure of how vulnerable we are to noise. For a given constellation
shape, we can increase dmin simply by scaling up the constellation, as shown in Figure 4.15, but
this comes with a corresponding increase in energy expenditure. To quantify this, define the
energy per symbol Es for a constellation as the average of the squared Euclidean distances of the
points from the origin. For an M-ary constellation, each symbol carries \log_2 M bits of information, and we can define the average energy per bit E_b as E_b = \frac{E_s}{\log_2 M}. Specifically, d_{min} increases
from 2 to 4 by scaling as shown in Figure 4.15. Correspondingly, E_s = 2 and E_b = 1 are increased to E_s = 8 and E_b = 4 in Figure 4.15(b). Thus, doubling the minimum distance in Figure 4.15 leads to a four-fold increase in E_s and E_b. However, the quantity \frac{d_{min}^2}{E_b} does not change due to scaling; it depends only on the relative geometry of the constellation points. We therefore adopt this scale-invariant measure as our notion of power efficiency for a constellation:
\eta_P = \frac{d_{min}^2}{E_b} \qquad (4.19)
Since this quantity is scale-invariant, we can choose any convenient scaling in computing it: for QPSK, choosing the scaling on the left in Figure 4.15, we have d_{min} = 2, E_s = 2, E_b = 1, which gives \eta_P = 4.
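Since η_P depends only on the constellation geometry, it can be computed directly from a list of constellation points. The following Matlab sketch (a minimal computation; the scalings are the convenient ones used here and in Problem 4.15) verifies η_P = 4 for QPSK and η_P = 8/5 for 16QAM.

    % Power efficiency eta_P = dmin^2/Eb, computed directly from constellation points.
    qpsk = [1+1j, 1-1j, -1+1j, -1-1j];                       % scaling on the left of Fig. 4.15
    [I, Q] = meshgrid([-3 -1 1 3]); qam16 = (I(:) + 1j*Q(:)).';   % 16QAM as in Figure 4.21
    for s = {qpsk, qam16}
        x  = s{1};
        Es = mean(abs(x).^2);                                % energy per symbol
        Eb = Es / log2(numel(x));                            % energy per bit
        d  = abs(x(:) - x(:).');                             % all pairwise distances
        disp(min(d(d > 0))^2 / Eb)                           % 4 for QPSK, 1.6 = 8/5 for 16QAM
    end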
It is important to understand how these quantities relate to physical link parameters. For a given bit rate R_b and received power P_{RX}, the energy per bit is given by E_b = \frac{P_{RX}}{R_b}. It is worth verifying that the units make sense: the numerator has units of Watts, or Joules/sec, while the denominator has units of bits/sec, so that E_b has units of Joules/bit. We shall see in Chapter 6
that the reliability of communication is determined by the power efficiency ηP (a scale-invariant
quantity which is a function of the constellation shape) and the dimensionless signal-to-noise ratio
(SNR) measure E_b/N_0, where N_0 is the noise power spectral density, which has units of Watts/Hz, or Joules. Specifically, the reliability can be approximately characterized by the product \eta_P \frac{E_b}{N_0}, so
that, for a given desired reliability, the required energy per bit (and hence power) scales inversely
as power efficiency for a fixed bit rate. Communication link designers use such concepts as the
basis for forming a “link budget” that can be used to choose link parameters such as transmit
power, antenna gains and range.
Even based on these rather sketchy and oversimplified arguments, we can draw quick conclusions
on the power-bandwidth tradeoffs in using different constellations, as shown in the following
example.
Example 4.3.3 We wish to design a passband communication system operating at a bit rate of
40 Mbps.
(a) What is the bandwidth required if we employ QPSK, with an excess bandwidth of 25%?
(b) What if we now employ 16QAM, again with excess bandwidth of 25%?
(c) Suppose that the QPSK system in (a) attains a desired reliability when the transmit power is
50 mW. Give an estimate of the transmit power needed for the 16QAM system in (b) to attain
a similar reliability.
(d) How do the bandwidth and transmit power required change for the QPSK system if we increase the bit rate to 80 Mbps?
(e) How do the bandwidth and transmit power required change for the 16QAM system if we increase the bit rate to 80 Mbps?
Solution: (a) The bandwidth efficiency of QPSK is 2 bits/symbol, hence the minimum bandwidth
required is 20 MHz. For excess bandwidth of 25%, the bandwidth required is 25 MHz.
(b) The bandwidth efficiency of 16QAM is 4 bits/symbol, hence, reasoning as in (a), the band-
width required is 12.5 MHz.
(c) We wish to set ηP Eb /N0 to be equal for both systems in order to keep the reliability roughly
the same. Assuming that the noise PSD N0 is the same for both systems, the required Eb scales
as 1/ηP . Since the bit rates Rb for both systems are equal, the required received power P = Eb Rb
(and hence the required transmit power, assuming that received power scales linearly with trans-
mit power) also scales as 1/ηP . We already know that ηP = 4 for QPSK. It remains to find ηP for
16QAM, which is shown in Problem 4.15 to equal 8/5. We therefore conclude that the transmit
power for the 16QAM system can be estimated as
P_T(\text{16QAM}) = \frac{\eta_P(\text{QPSK})}{\eta_P(\text{16QAM})}\, P_T(\text{QPSK})
which evaluates to 125 mW.
(d) For fixed bandwidth efficiency, required bandwidth scales linearly with bit rate, hence the
new bandwidth required is 50 MHz. In order to maintain a given reliability, we must maintain
the same value of ηP Eb /N0 as in (c). The power efficiency ηP is unchanged, since we are using
the same constellation. Assuming that the noise PSD N0 is unchanged, the required energy per
bit Eb is unchanged, hence transmit power must scale up linearly with bit rate Rb . Thus, the
power required using QPSK is now 100 mW.
(e) Arguing as in (d), we require a bandwidth of 25 MHz and a power of 250 mW for 16QAM,
using the results in (b) and (c).
Figure 4.16 shows a block diagram for a link using linear modulation, with the entire model expressed in complex baseband. The symbols {b[n]} are passed through the transmit filter to obtain the waveform \sum_n b[n]\, g_{TX}(t - nT). This then goes through the channel filter g_C(t), and then the receive filter g_{RX}(t). Thus, at the output of the receive filter, we have the linearly modulated signal \sum_n b[n]\, p(t - nT), where p(t) = (g_{TX} * g_C * g_{RX})(t) is the cascade of the
transmit, channel and receive filters. We would like the pulse p(t) to be Nyquist at rate 1/T , so
that, in the absence of noise, the symbol rate samples at the output of the receive filter equal
the transmitted symbols. Of course, in practice, we do not have control over the channel, hence
we often assume an ideal channel, and design such that the cascade of the transmit and receive
filter, given by (gT X ∗ gRX ) (t)GT X (f )GRX (f ) is Nyquist. One possible choice is to set GT X to
be a Nyquist pulse, and GRX to be a wideband filter whose response is flat over the band of
interest. Another choice that is even more popular is to set GT X (f ) and GRX (f ) to be square
roots of a Nyquist pulse. In particular, the square root raised cosine (SRRC) pulse is often used
in practice.
A framework for software simulations of linear modulated systems with raised cosine and SRRC
pulses, including Matlab code fragments, is provided in the appendix, and provides a foundation
for Software Lab 4.1.
Square root Nyquist pulses and their time domain interpretation: A pulse g(t) ↔ G(f) is defined to be square root Nyquist at rate 1/T if |G(f)|^2 is Nyquist at rate 1/T. Note that P(f) = |G(f)|^2 ↔ p(t) = (g * g_{MF})(t), where g_{MF}(t) = g^*(-t). The time domain Nyquist condition is given by
p(mT) = (g * g_{MF})(mT) = \int g(t)\, g^*(t - mT)\, dt = \delta_{m0} \qquad (4.20)
That is, a square root Nyquist pulse has an autocorrelation function that vanishes at nonzero
integer multiples of T. In other words, the waveforms {g(t − kT), k = 0, ±1, ±2, ...} are orthonor-
mal, and can be used to provide a basis for constructing more complex waveforms, as we see in
Section 4.3.6.
Food for thought: True or False? Any pulse timelimited to [0, T ] is square root Nyquist at
rate 1/T .
Linear modulation as a building block: Suppose that we wish to map an N-dimensional code vector s = (s[0], ..., s[N − 1]) to a continuous-time waveform. This can be done by linearly modulating a square root Nyquist “chip” waveform ψ at chip rate 1/T_c:
s(t) = \sum_{k=0}^{N-1} s[k]\, \psi(t - kT_c)
Since {ψ(t − kTc )} are orthonormal (see (4.20)), we have simply expressed the code vector in a
continuous time basis. Thus, the continuous time inner product between two symbol waveforms
(which determines their geometric relationships and their performance in noise, as we see in
the next chapter) is equal to the discrete time inner product between the corresponding code
vectors. Specifically, suppose that s1 (t) and s2 (t) are two symbol waveforms corresponding to
code vectors s_1 and s_2, respectively. Then their inner product satisfies
\langle s_1, s_2 \rangle = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} s_1[k]\, s_2^*[l] \int \psi(t - kT_c)\, \psi^*(t - lT_c)\, dt = \sum_{k=0}^{N-1} s_1[k]\, s_2^*[k] = \langle \mathbf{s}_1, \mathbf{s}_2 \rangle
where we have used the orthonormality of the translates {ψ(t − kT_c)}. This means that we can
design discrete time code vectors to have certain desired properties, and then linearly modulate
square root Nyquist chip waveforms to get symbol waveforms that have the same desired prop-
erties. For example, if s1 and s2 are orthogonal, then so are s1 (t) and s2 (t); we use this in the
next section when we discuss orthogonal modulation.
Examples of square root Nyquist chip waveforms include a rectangular pulse timelimited to an
interval of length Tc , as well as bandlimited pulses such as the square root raised cosine. From
Theorem 4.2.1, we see that the PSD of the modulated waveform is proportional to |Ψ(f )|2 (it is
typically a good approximation to assume that the chips {s[k]} are uncorrelated). That is, the
bandwidth occupancy is determined by that of the chip waveform ψ.
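The following Matlab sketch (chip duration, sampling rate, and the particular code vectors are my own choices) carries out this construction with a unit-energy rectangular chip waveform and checks that inner products are preserved.

    % Map two orthogonal code vectors to waveforms using a unit-energy rectangular chip
    % of duration Tc, and check that discrete and continuous inner products agree.
    Tc = 1; fs = 100; dt = 1/fs;            % chip duration and sampling rate (assumed)
    psi = ones(1, Tc*fs)/sqrt(Tc);          % sampled chip: value 1/sqrt(Tc) on [0, Tc)
    s1 = [1  1  1  1];                      % two orthogonal length-4 code vectors
    s2 = [1 -1  1 -1];
    w1 = kron(s1, psi);                     % s1(t) = sum_k s1[k] psi(t - k*Tc), sampled
    w2 = kron(s2, psi);
    disp([s1*s2.', sum(w1.*conj(w2))*dt])   % both inner products are 0 (orthogonal)
    disp([s1*s1.', sum(abs(w1).^2)*dt])     % both energies are 4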
Frequency Shift Keying (FSK): In M-ary FSK, symbol k is conveyed by transmitting a tone at frequency f_0 + k∆f over the symbol interval [0, T]; taking f_0 as the reference frequency, the corresponding complex baseband signal is u_k(t) = e^{j2\pi k \Delta f t} I_{[0,T]}(t), with passband signal u_{p,k}(t) = \cos(2\pi(f_0 + k\Delta f)t), 0 ≤ t ≤ T. Let us now understand how the tones should be chosen in order to ensure orthogonality. Recall that the passband and complex baseband inner products are related as follows:
\langle u_{p,k}, u_{p,l} \rangle = \frac{1}{2}\, \mathrm{Re}\langle u_k, u_l \rangle
so we can develop criteria for orthogonality working in complex baseband. Setting k = l, we see
that
\|u_k\|^2 = T
For two adjacent tones, l = k + 1, we leave it as an exercise to show that
\mathrm{Re}\langle u_k, u_{k+1} \rangle = \frac{\sin(2\pi \Delta f T)}{2\pi \Delta f}
We see that the minimum value of ∆f for which the preceding quantity is zero is given by 2π∆f T = π, or ∆f = \frac{1}{2T}.
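This condition is easy to verify numerically. The following Matlab sketch (a minimal check with T = 1 and my own sampling choices; the complex baseband tones u_k(t) = e^{j2πk∆f t} on [0, T] are taken relative to the reference frequency f_0) evaluates the inner product between adjacent tones for several spacings.

    % Orthogonality of adjacent FSK tones versus tone spacing (complex baseband, T = 1).
    T = 1; fs = 1000; t = 0:1/fs:T-1/fs;
    ip = @(df) sum(exp(1j*2*pi*0*df*t) .* conj(exp(1j*2*pi*1*df*t))) / fs;  % <u_0, u_1>
    real(ip(1/(2*T)))     % ~0: spacing 1/(2T) gives coherent orthogonality
    real(ip(1/(4*T)))     % clearly nonzero: spacing 1/(4T) is too small
    abs(ip(1/(2*T)))      % nonzero: 1/(2T) does NOT give noncoherent orthogonality
    abs(ip(1/T))          % ~0: spacing 1/T is needed for noncoherent orthogonality (see below)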
Thus, from the point of view of the receiver, a tone spacing of \frac{1}{2T} ensures that when there is an
incoming wave at the kth tone, then correlating against the kth tone will give a large output, but
correlating against the (k + 1)th tone will give zero output (in the absence of noise). However,
this assumes a coherent system in which the tones we are correlating against are synchronized in
phase with the incoming wave. What happens if they are 90◦ out of phase? Then correlation of
the kth tone with itself yields
\int_0^T \cos\left(2\pi(f_0 + k\Delta f)t\right)\, \cos\left(2\pi(f_0 + k\Delta f)t + \frac{\pi}{2}\right) dt = 0
(by orthogonality of the cosine and sine), so that the output we desire to be large is actually
zero! Robustness to such variations can be obtained by employing noncoherent reception, which
we describe next.
Noncoherent reception: Let us develop the concept of noncoherent reception in generality,
because it is a concept that is useful in many settings, not just for orthogonal modulation. Sup-
pose that we transmit a passband waveform, and wish to detect it at the receiver by correlating
it against the receiver’s copy of the waveform. However, the receiver’s local oscillator may not
be synchronized in phase with the phase of the incoming wave. Let us denote the receiver’s copy
of the signal as
up (t) = uc (t) cos 2πfc t − us (t) sin 2πfc t
and the incoming passband signal as
yp (t) = yc (t) cos 2πfc t − ys (t) sin 2πfc t = uc (t) cos (2πfc t + θ) − us (t) sin (2πfc t + θ)
Using the receiver's local oscillator as reference, the complex envelope of the receiver's copy is u(t) = u_c(t) + j u_s(t), while that of the incoming wave is y(t) = u(t) e^{j\theta}. Thus, the inner product
\langle y_p, u_p \rangle = \frac{1}{2}\mathrm{Re}\langle y, u \rangle = \frac{1}{2}\mathrm{Re}\langle u e^{j\theta}, u \rangle = \frac{1}{2}\mathrm{Re}\left(\|u\|^2 e^{j\theta}\right) = \frac{\|u\|^2}{2}\cos\theta
Thus, the output of the correlator is degraded by the factor cos θ, and can actually become zero, as we have already observed, if the phase offset θ = π/2. In order to get around this problem, let us look at the complex baseband inner product again:
\langle y, u \rangle = \langle u e^{j\theta}, u \rangle = \|u\|^2 e^{j\theta}
We could ensure that this output remains large regardless of the value of θ if we took its magni-
tude, rather than the real part. Thus, noncoherent reception corresponds to computing |\langle y, u \rangle| or |\langle y, u \rangle|^2. Let us unwrap the complex inner product to see what this entails:
\langle y, u \rangle = \int y(t)\, u^*(t)\, dt = \int \left(y_c(t) + j y_s(t)\right)\left(u_c(t) - j u_s(t)\right) dt = \left(\langle y_c, u_c \rangle + \langle y_s, u_s \rangle\right) + j\left(\langle y_s, u_c \rangle - \langle y_c, u_s \rangle\right)
That is, when the receiver LO is synchronized to the phase of the incoming wave, we can correlate
the I component of the received waveform with the I component of the receiver’s copy, and
similarly correlate the Q components, and sum them up. However, in the presence of phase
asynchrony, the I and Q components get mixed up, and we must compute the magnitude of the
complex inner product to recover all the energy of the incoming wave. Figure 4.17 shows the
receiver operations corresponding to coherent and noncoherent reception.
[Figure 4.17: Receiver processing for coherent and noncoherent reception.]
Back to FSK: Going back to FSK, if we now use noncoherent reception, then in order to
ensure that we get a zero output (in the absence of noise) when receiving the kth tone with a
noncoherent receiver for the (k + 1)th tone, we must ensure that
|\langle u_k, u_{k+1} \rangle| = 0
We leave it as an exercise (Problem 4.18) to show that the minimum tone spacing for noncoherent FSK is \frac{1}{T}, which is double that required for orthogonality in coherent FSK. The bandwidth for coherent M-ary FSK is approximately \frac{M}{2T}, which corresponds to a time-bandwidth product of approximately \frac{M}{2}. This corresponds to a complex vector space of dimension \frac{M}{2}, or a real vector
space of dimension M, in which we can fit M orthogonal signals. On the other hand, M-ary
noncoherent signaling requires M complex dimensions, since the complex baseband signals must
remain orthogonal even under multiplication by complex-valued scalars.
Summarizing the concept of orthogonality: To summarize, when we say “orthogonal”
modulation, we must specify whether we mean coherent or noncoherent reception, because the
concept of orthogonality is different in the two cases. For a signal set {s_k(t)}, orthogonality requires that, for k ≠ l, we have
\mathrm{Re}\left(\langle s_k, s_l \rangle\right) = 0 \quad \text{(coherent orthogonality criterion)}
\langle s_k, s_l \rangle = 0 \quad \text{(noncoherent orthogonality criterion)} \qquad (4.21)
Bandwidth efficiency: We conclude from the example of orthogonal FSK that the bandwidth efficiency of orthogonal signaling is \eta_B = \frac{2\log_2 M}{M} bits/complex dimension for coherent systems, and \eta_B = \frac{\log_2 M}{M} bits/complex dimension for noncoherent systems. This is a general observation
that holds for any realization of orthogonal signaling. In a signal space of complex dimension
D (and hence real dimension 2D), we can fit 2D signals satisfying the coherent orthogonality
criterion, but only D signals satisfying the noncoherent orthogonality criterion. As M gets large,
the bandwidth efficiency tends to zero. In compensation, as we see in Chapter 6, the power
efficiency of orthogonal signaling for large M is the “best possible.”
Orthogonal Walsh-Hadamard codes
Section 4.3.6 shows how to map vectors to waveforms while preserving inner products, by using
linear modulation with a square root Nyquist chip waveform. Applying this construction, the
problem of designing orthogonal waveforms {si } now reduces to designing orthogonal code vectors
{si }. Walsh-Hadamard codes are a standard construction employed for this purpose, and can
be constructed recursively as follows: at the nth stage, we generate 2^n orthogonal vectors, using the 2^{n-1} vectors constructed in the (n − 1)th stage. Let H_n denote a matrix whose rows are 2^n orthogonal codes obtained after the nth stage, with H_0 = (1). Then
H_n = \begin{pmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{pmatrix}
We therefore get
H_1 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad H_2 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}, \quad \text{etc.}
Figure 4.18 depicts the waveforms corresponding to the 4-ary signal set in H2 using a rectangular
timelimited chip waveform to go from sequences to signals, as described in Section 4.3.6.
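The recursion above translates directly into a few lines of Matlab (a minimal sketch; the number of stages n is arbitrary):

    % Recursive construction of Walsh-Hadamard code matrices: H_n = [H H; H -H].
    n = 3;
    H = 1;                      % H_0
    for k = 1:n
        H = [H, H; H, -H];      % each stage doubles the number (and length) of codes
    end
    disp(H*H')                  % = 2^n * identity: the 2^n rows are mutually orthogonal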
The signals {s_i} obtained above can be used for noncoherent orthogonal signaling, since they satisfy the orthogonality criterion \langle s_i, s_j \rangle = 0 for i ≠ j. However, just as for FSK, we can fit twice as many signals into the same number of degrees of freedom if we use the weaker notion of orthogonality required for coherent signaling, namely \mathrm{Re}(\langle s_i, s_j \rangle) = 0 for i ≠ j. It is easy to check that for M-ary Walsh-Hadamard signals {s_i, i = 1, ..., M}, we can get 2M
orthogonal signals for coherent signaling: {si , jsi , i = 1, ..., M}. This construction corresponds
to independently modulating the I and Q components with a Walsh-Hadamard code; that is,
using passband waveforms si (t) cos 2πfc t and −si (t) sin 2πfc t (the negative sign is only to conform
to our convention for I and Q, and can be dropped, which corresponds to replacing jsi by −jsi
in complex baseband), i = 1, ..., M.
Biorthogonal modulation
Given an orthogonal signal set, a biorthogonal signal set of twice the size can be obtained by
including a negated copy of each signal. Since signals s and −s cannot be distinguished in a
noncoherent system, biorthogonal signaling is applicable to coherent systems. Thus, for an M-ary
Walsh-Hadamard signal set {si } with M signals obeying the noncoherent orthogonality criterion,
we can construct a coherent orthogonal signal set {si , jsi } of size 2M, and hence a biorthogonal
signal set of size 4M, e.g., {si , jsi , −si , −jsi }. These correspond to the 4M passband waveforms
±si (t) cos 2πfc t and ±si (t) sin 2πfc t, i = 1, ..., M.
Theorem 4.5.1 (Sampling and Aliasing): Consider a signal s(t), sampled at rate \frac{1}{T_s}. Let S(f) denote the spectrum of s(t), and let
B(f) = \frac{1}{T_s} \sum_{k=-\infty}^{\infty} S\left(f + \frac{k}{T_s}\right) \qquad (4.22)
denote the sum of translates of the spectrum. Then the following observations hold:
(a) B(f) is periodic with period \frac{1}{T_s}.
(b) The samples {s(nT_s)} are the Fourier series for B(f), satisfying
s(nT_s) = T_s \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} B(f)\, e^{j2\pi f n T_s}\, df \qquad (4.23)
B(f) = \sum_{n=-\infty}^{\infty} s(nT_s)\, e^{-j2\pi f n T_s} \qquad (4.24)
Remark: Note that the signs of the exponents for the frequency domain Fourier series in the
theorem are reversed from the convention in the usual time domain Fourier series (analogous to
the reversal of the sign of the exponent for the inverse Fourier transform compared to the Fourier
transform).
Proof of Theorem 4.5.1: The periodicity of B(f) follows by its very construction. To prove (b), apply the inverse Fourier transform to obtain
s(nT_s) = \int_{-\infty}^{\infty} S(f)\, e^{j2\pi f n T_s}\, df
We now write the integral as an infinite sum of integrals over segments of length \frac{1}{T_s}:
s(nT_s) = \sum_{k=-\infty}^{\infty} \int_{\frac{k - \frac{1}{2}}{T_s}}^{\frac{k + \frac{1}{2}}{T_s}} S(f)\, e^{j2\pi f n T_s}\, df
In the integral over the kth segment, make the substitution ν = f − \frac{k}{T_s} and rewrite it as
\int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} S\left(\nu + \frac{k}{T_s}\right) e^{j2\pi\left(\nu + \frac{k}{T_s}\right) n T_s}\, d\nu = \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} S\left(\nu + \frac{k}{T_s}\right) e^{j2\pi \nu n T_s}\, d\nu
Now that the limits of all segments and the complex exponential in the integrand are the same
(i.e., independent of k), we can move the summation inside to obtain
s(nT_s) = \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} \left[\sum_{k=-\infty}^{\infty} S\left(\nu + \frac{k}{T_s}\right)\right] e^{j2\pi \nu n T_s}\, d\nu = T_s \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} B(\nu)\, e^{j2\pi \nu n T_s}\, d\nu
proving (4.23). We can now recognize that this is just the formula for the Fourier series coefficients
of B(f ), from which (4.24) follows.
[Figure 4.19 sketch: translates S(f + 1/T_s), S(f), S(f − 1/T_s) for two cases: sampling rate not high enough, and high enough, to recover S(f) from B(f).]
Figure 4.19: Recovering a signal from its samples requires a high enough sampling rate for
translates of the spectrum not to overlap.
Inferring Nyquist's sampling theorem from Theorem 4.5.1: Suppose that s(t) is bandlimited to [−\frac{W}{2}, \frac{W}{2}]. The samples of s(t) at rate \frac{1}{T_s} can be used to reconstruct B(f), since they are the Fourier series for B(f). But S(f) can be recovered from B(f) if and only if the translates S(f − \frac{k}{T_s}) do not overlap, as shown in Figure 4.19. This happens if and only if \frac{1}{T_s} ≥ W. Once this condition is satisfied, \frac{1}{T_s} S(f) can be recovered from B(f) by passing it through an ideal bandlimited filter H(f) = I_{[-W/2, W/2]}(f). We therefore obtain that
\frac{1}{T_s}\, S(f) = B(f)\, H(f) = \sum_{n=-\infty}^{\infty} s(nT_s)\, e^{-j2\pi f n T_s}\, I_{[-W/2, W/2]}(f) \qquad (4.25)
Taking inverse Fourier transforms, we get the interpolation formula
\frac{1}{T_s}\, s(t) = \sum_{n=-\infty}^{\infty} s(nT_s)\, W\, \mathrm{sinc}\left(W(t - nT_s)\right)
which reduces to (4.15) for \frac{1}{T_s} = W. This completes the proof of the sampling theorem, Theorem 4.3.1.
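The interpolation formula can be checked numerically, as in the following Matlab sketch (the bandlimited test signal, the rate, and the truncation of the infinite sum are my own choices).

    % Reconstruct a bandlimited signal from its samples at rate 1/Ts = W, using (4.15).
    W = 4;  Ts = 1/W;                                 % test signal bandlimited well within W/2
    s = @(t) cos(2*pi*1.3*t) + 0.5*sin(2*pi*0.7*t);
    n = -200:200;  samples = s(n*Ts);                 % (the infinite sum is truncated here)
    sincf = @(x) sin(pi*x + (x==0)*eps) ./ (pi*x + (x==0)*eps);
    t0 = 0.37;                                        % an arbitrary off-grid time
    s_hat = sum(samples .* sincf(W*(t0 - n*Ts)));     % interpolation formula
    disp([s(t0), s_hat])                              % agree up to a small truncation error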
Inferring Nyquist’s criterion for ISI avoidance from Theorem 4.5.1: A Nyquist pulse
p(t) at rate 1/T must satisfy p(nT ) = δn0 . Applying Theorem 4.5.1 with s(t) = p(t) and Ts = T ,
it follows immediately from (4.24) that p(nT ) = δn0 (i.e., the time domain Nyquist criterion
holds) if and only if
B(f) = \frac{1}{T} \sum_{k=-\infty}^{\infty} P\left(f + \frac{k}{T}\right) = 1
In other words, if the Fourier series only has a DC term, then the periodic waveform it corresponds
to must be constant.
in the frequency domain to obtain faster decay in time. The raised cosine pulse is a popular choice, giving a 1/t^3 decay.
• If the receive filter is matched to the transmit filter, each has to be a square root Nyquist pulse,
with their cascade being Nyquist. The SRRC is a popular choice.
Power-bandwidth tradeoffs
• For an M-ary constellation, the bandwidth efficiency is log2 M bits per symbol, so that larger
constellations are more bandwidth-efficient.
• The power efficiency for a constellation is well characterized by the scale-invariant quantity d_{min}^2/E_b. Large constellations are typically less power-efficient.
Beyond linear modulation
• Linear modulation using square root Nyquist pulses can be used to translate signal design from
discrete time to continuous time while preserving geometric relationships such as inner products.
This is because, if ψ(t) is square root Nyquist at rate 1/T_c, then {ψ(t − kT_c)}, its translates by
integer multiples of Tc , form an orthonormal basis.
• Orthogonal modulation can be used with either coherent or noncoherent reception, but the
concept of orthogonality is more stringent (eating up more degrees of freedom) for noncoherent
orthogonal signaling. Waveforms for orthogonal modulation can be constructed in a variety
of ways, including FSK and Walsh-Hadamard sequences modulated onto square root Nyquist
pulses. Biorthogonal signaling doubles the signaling alphabet for coherent orthogonal signaling
by adding the negative of each signal to the constellation.
Sampling and aliasing
• Time domain sampling corresponds to frequency domain aliasing. Specifically, the samples of a waveform x(t) at rate 1/T are the Fourier series for the periodic frequency domain waveform \frac{1}{T}\sum_k X(f - k/T), obtained by summing the frequency domain waveform and its aliases X(f − k/T) (k integer).
• The Nyquist sampling theorem corresponds to requiring that the aliased copies are far enough
apart (i.e., the sampling rate is high enough) that we can recover the original frequency domain
waveform by filtering the sum of the aliased waveforms.
• The Nyquist criterion for interference avoidance requires that the samples of the signaling
pulse form a discrete delta function, or that the corresponding sum of the aliased waveforms is
a constant.
4.7 Endnotes
While we use linear modulation in the time domain for our introduction to modulation, an
alternative frequency domain approach is to divide the available bandwidth into thin slices, or
subcarriers, and to transmit symbols in parallel on each subcarrier. Such a strategy is termed
Orthogonal Frequency Division Multiplexing (OFDM) or multicarrier modulation, and we discuss
it in more detail in Chapter 8. OFDM is also termed multicarrier modulation, while the time
domain linear modulation schemes covered here are classified as singlecarrier modulation. In
addition to the degrees of freedom provided by time and frequency, additional spatial degrees of
freedom can be obtained by employing multiple antennas at the transmitter and receiver, and
we provide a glimpse of such Multiple Input Multiple Output (MIMO) techniques in Chapter 8.
While the basic linear modulation strategies discussed here, in either singlecarrier or multicarrier
modulation formats, are employed in many existing and emerging communication systems, it is
worth mentioning a number of other strategies in which modulation with memory is used to shape
the transmitted waveform in various ways, including insertion of spectral nulls (e.g., line codes,
often used for baseband wireline transmission), avoidance of long runs of zeros and ones which
can disrupt synchronization (e.g., runlength constrained codes, often used for magnetic recording
channels), controlling variations in the signal envelope (e.g., continuous phase modulation), and
controlling ISI (e.g., partial response signaling). Memory can also be inserted in the manner
that bits are encoded into symbols (e.g., differential encoding for alleviating the need to track
a time-varying channel), without changing the basic linear modulation format. The preceding
discussion, while not containing enough detail to convey the underlying concepts, is meant to
provide keywords to facilitate further exploration, with more advanced communication theory
texts such as [5, 7, 8] serving as a good starting point.
Problems
Timelimited pulses
Problem 4.1 (Sine pulse) Consider the sine pulse p(t) = \sin(\pi t)\, I_{[0,1]}(t).
(a) Show that its Fourier transform is given by
P(f) = \frac{2\cos(\pi f)\, e^{-j\pi f}}{\pi(1 - 4f^2)}
(b) Consider the linearly modulated signal u(t) = \sum_n b[n]\, p(t - n), where b[n] are independently
chosen to take values in a QPSK constellation (each point chosen with equal probability), and
the unit of time is in microseconds. Find the 95% power containment bandwidth (specify the
units).
where 0 ≤ a ≤ 1/2.
(a) Sketch p(t) and find its Fourier transform P(f).
(b) Consider the linearly modulated signal u(t) = \sum_n b[n]\, p(t - n), where b[n] take values inde-
pendently and with equal probability in a 4-PAM alphabet {±1, ±3}. Find an expression for the
PSD of u as a function of the pulse shape parameter a.
(c) Numerically estimate the 95% fractional power containment bandwidth for u and plot it as a
function of 0 ≤ a ≤ 1/2. For concreteness, assume the unit of time is 100 picoseconds and specify
the units of bandwidth in your plot.
Problem 4.4 Consider the pulse p(t) whose Fourier transform satisfies:
P(f) = \begin{cases} 1, & 0 \le |f| \le A \\ \frac{B - |f|}{B - A}, & A \le |f| \le B \\ 0, & \text{else} \end{cases}
[Figure 4.20 sketch: P(f) versus f, with breakpoints at ±a and ±b.]
Figure 4.20: Signaling pulse for Problem 4.6.
Problem 4.6 Consider Nyquist signaling at 80 Mbps using a 16QAM constellation with 50%
excess bandwidth. The signaling pulse has spectrum shown in Figure 4.20.
(a) Find the values of a and b in the figure, making sure you specify the units.
(b) True or False The pulse is also Nyquist for signaling at 20 Mbps using QPSK. (Justify your
answer.)
Problem 4.7 Consider linear modulation with a signaling pulse p(t) = sinc(at)sinc(bt), where
a and b are to be determined.
(a) How should a and b be chosen so that p(t) is Nyquist with 50% excess bandwidth for a data
rate of 40 Mbps using 16QAM? Specify the occupied bandwidth.
(b) How should a and b be chosen so that p(t) can be used for Nyquist signaling both for a
16QAM system with 40 Mbps data rate, and for an 8PSK system with 18 Mbps data rate?
Specify the occupied bandwidth.
Problem 4.8 Consider a passband communication link operating at a bit rate of 16 Mbps using
a 256-QAM constellation.
(a) What must we set the unit of time as so that p(t) = sin πtI[0,1] (t) is square root Nyquist for
the system of interest, while occupying the smallest possible bandwidth?
(b) What must we set the unit of time as so that p(t) = sinc(t)sinc(2t) is Nyquist for the system
of interest, while occupying the smallest possible bandwidth?
Problem 4.9 Consider passband linear modulation with a pulse of the form p(t) = sinc(3t)sinc(2t),
where the unit of time is microseconds.
(a) Sketch the spectrum P (f ) versus f . Make sure you specify the units on the f axis.
(b) What is the largest achievable bit rate for Nyquist signaling using p(t) if we employ a 16QAM
constellation? What is the fractional excess bandwidth for this bit rate?
(c) (True or False) The pulse p(t) can be used for Nyquist signaling at a bit rate of 4 Mbps
using a QPSK constellation.
Problem 4.10 (True or False) Any pulse timelimited to duration T is square root Nyquist
(up to scaling) at rate 1/T .
Problem 4.11 (Raised cosine pulse) In this problem, we derive the time domain response of the frequency domain raised cosine pulse. Let R(f) = I_{[-\frac{1}{2},\frac{1}{2}]}(f) denote an ideal boxcar transfer function, and let C(f) = \frac{\pi}{2a}\cos\left(\frac{\pi f}{a}\right) I_{[-\frac{a}{2},\frac{a}{2}]}(f) denote a cosine transfer function.
(a) Sketch R(f ) and C(f ), assuming that 0 < a < 1.
(b) Show that the frequency domain raised cosine pulse can be written as
S(f ) = (R ∗ C)(f )
(c) Find the time domain pulse s(t) = r(t)c(t). Where are the zeros of s(t)? Conclude that
s(t/T ) is Nyquist at rate 1/T .
(d) Sketch an argument that shows that, if the pulse s(t/T ) is used for BPSK signaling at rate
1/T , then the magnitude of the transmitted waveform is always finite.
Problem 4.12 (Software exercise for the raised cosine pulse) Code fragment 4.B.1 in the
appendix implements a discrete time truncated raised cosine pulse.
(a) Run the code fragment for 25%, 50% and 100% excess bandwidths and plot the time domain
waveforms versus normalized time t/T over the interval [−5T, 5T ], sampling fast enough (e.g.,
at rate 32/T or higher) to obtain smooth curves. Comment on the effect of varying the excess
bandwidth on these waveforms.
(b) For excess bandwidth of 50%, numerically explore the effect of time domain truncation on
frequency domain spillage. Specifically, compute the Fourier transform for two cases: truncation
to [−2T, 2T ] and truncation to [−5T, 5T ], using the DFT as described in code fragment 2.5.1 to
obtain a frequency resolution at least as good as \frac{1}{64T}. Plot these Fourier transforms against the
normalized frequency f T , and comment on how much of increase in bandwidth, if any, you see
due to truncation in the two cases.
(c) Numerically compute the 95% bandwidth of the two pulses in (b), and compare it with the
nominal bandwidth without truncation.
Problem 4.13 (Software exercise for the SRRC pulse) (a) Write a function for generating
a sampled SRRC pulse, analogous to code fragment 4.B.1, where you can specify the sampling
rate, the excess bandwidth, and the truncation length. The time domain expression for the
SRRC pulse is given by (4.45) in the appendix.
Remark: The zero in the denominator can be handled either by analytical or numerical imple-
mentation of L’Hospital’s rule. See comments in code fragment 4.B.1.
(b) Plot the SRRC pulses versus normalized time t/T , for excess bandwidths of 25%, 50% and
100%. Comment on the effect of varying excess bandwidth on these waveforms.
(c) For excess bandwidth of 50%, numerically explore the effect of time domain truncation on
frequency domain spillage. Specifically, compute the Fourier transform for two cases: truncation
to [−2T, 2T ] and truncation to [−5T, 5T ], using the DFT as described in code fragment 2.5.1 to
obtain a frequency resolution at least as good as \frac{1}{64T}. Plot these Fourier transforms against the
normalized frequency f T , and comment on how much of increase in bandwidth, if any, you see
due to truncation in the two cases.
(d) Numerically compute the 95% bandwidth of the two pulses in (c), and compare it with the
nominal bandwidth without truncation.
Problem 4.14 (Effect of timing errors) Consider digital modulation at rate 1/T using the
sinc pulse s(t) = sinc(2W t), with transmitted waveform
y(t) = \sum_{n=1}^{100} b_n\, s(t - (n-1)T)
where 1/T is the symbol rate and {bn } is the bit stream being sent (assume that each bn takes
one of the values ±1 with equal probability). The receiver makes bit decisions based on the
samples rn = y((n − 1)T ), n = 1, ..., 100.
(a) For what value of T (as a function of W ) is rn = bn , n = 1, ..., 100?
Remark: In this case, we simply use the sign of the nth sample rn as an estimate of bn .
(b) For the choice of T as in (a), suppose that the receiver sampling times are off by 0.25T. That
is, the nth sample is given by rn = y((n − 1)T + 0.25T), n = 1, ..., 100. In this case, we do have ISI
of different degrees of severity, depending on the bit pattern. Consider the following bit pattern:
b_n = (−1)^(n−1) for 1 ≤ n ≤ 49, and b_n = (−1)^n for 50 ≤ n ≤ 100
Numerically evaluate the 50th sample r50 . Does it have the same sign as the 50th bit b50 ?
Remark: The preceding bit pattern creates the worst possible ISI for the 50th bit. Since the sinc
pulse dies off slowly with time, the ISI contributions due to the 99 other bits to the 50th sample
sum up to a number larger in magnitude, and opposite in sign, relative to the contribution due
to b50 . A decision on b50 based on the sign of r50 would therefore be wrong. This sensitivity to
timing error is why the sinc pulse is seldom used in practice.
(c) Now, consider the digitally modulated signal in (a) with the pulse s(t) = sinc(2W t)sinc(W t).
For ideal sampling as in (a), what are the two values of T such that rn = bn ?
(d) For the smaller of the two values of T found in (c) (which corresponds to faster signaling,
since the symbol rate is 1/T ), repeat the computation in (b). That is, find r50 and compare its
sign with b50 for the bit pattern in (b).
(e) Find and sketch the frequency response of the pulse in (c). What is the excess bandwidth
relative to the pulse in (a), assuming Nyquist signaling at the same symbol rate?
(f) Discuss the impact of the excess bandwidth on the severity of the ISI due to timing mismatch.
Figure 4.21: 16QAM constellation with scaling chosen for convenient computation of power
efficiency.
Power-bandwidth tradeoffs
Problem 4.15 (Power efficiency of 16QAM) In this problem, we sketch the computation
of power efficiency for the 16QAM constellation shown in Figure 4.21.
(a) Note that the minimum distance for the particular scaling chosen in the figure is dmin = 2.
(b) Show that the constellation points divide into 3 categories based on their distance from the
origin, corresponding to squared distances, or energies, of 12 + 12 , 12 + 32 and 32 + 32 . Averaging
over these energies (weighting by the number of points in each category), show that the average
energy per symbol is Es = 10.
(c) Using (a) and (b), and accounting for the number of bits/symbol, show that the power
efficiency is given by η_P = d²_min/E_b = 8/5.
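As a quick worked check of the numbers above (written out in LaTeX; this is not part of the problem statement):
\[
E_s = \frac{4(1^2+1^2) + 8(1^2+3^2) + 4(3^2+3^2)}{16} = \frac{8+80+72}{16} = 10, \qquad
E_b = \frac{E_s}{\log_2 16} = 2.5, \qquad
\eta_P = \frac{d_{\min}^2}{E_b} = \frac{4}{2.5} = \frac{8}{5}.
\]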
Problem 4.17 (OQPSK and MSK) Linear modulation with a bandlimited pulse can perform
poorly over nonlinear passband channels. For example, the output of a passband hardlimiter
(which is a good model for power amplifiers operating in a saturated regime) has constant
envelope, but a PSK signal employing a bandlimited pulse has an envelope that passes through
zero during a 180 degree phase transition, as shown in Figure 4.22. One way to alleviate this
problem is to not allow 180 degree phase transitions. Offset QPSK (OQPSK) is one example of
such a scheme, where the transmitted signal is given by
s(t) = Σ_{n=−∞}^{∞} ( b_c[n] p(t − nT) + j b_s[n] p(t − nT − T/2) )       (4.26)
where {b_c[n]}, {b_s[n]} are ±1 BPSK symbols modulating the I and Q channels, with the I and Q
signals being staggered by half a symbol interval. This leads to phase transitions of at most 90
degrees at integer multiples of the bit time T_b = T/2.
Figure 4.22: The envelope of a PSK signal passes through zero during a 180 degree phase
transition, and gets distorted over a nonlinear channel.
Minimum Shift Keying (MSK) is a special
case of OQPSK with timelimited modulating pulse
p(t) = √2 sin(πt/T) I_[0,T](t)       (4.27)
(a) Sketch the I and Q waveforms for a typical MSK signal, clearly showing the timing relationship
between the waveforms.
(b) Show that the MSK waveform has constant envelope (an extremely desirable property for
nonlinear channels).
(c) Find an analytical expression for the PSD of an MSK signal, assuming that all bits sent are
i.i.d., taking values ±1 with equal probability. Plot the PSD versus normalized frequency f T .
(d) Find the 99% power containment normalized bandwidth of MSK. Compare with the minimum
Nyquist bandwidth, and the 99% power containment bandwidth of OQPSK using a rectangular
pulse.
(e) Recognize that Figure 4.6 gives the PSD for OQPSK and MSK, and reproduce this figure,
normalizing the area under the PSD curve to be the same for both modulation formats.
Orthogonal signaling
Problem 4.18 (FSK tone spacing) Consider two real-valued passband pulses of the form
s_0(t) = cos(2πf_0 t + φ_0),   0 ≤ t ≤ T
s_1(t) = cos(2πf_1 t + φ_1),   0 ≤ t ≤ T
where f_1 > f_0 ≫ 1/T. The pulses are said to be orthogonal if ⟨s_0, s_1⟩ = ∫_0^T s_0(t)s_1(t) dt = 0.
(a) If φ_0 = φ_1 = 0, show that the minimum frequency separation such that the pulses are
orthogonal is f_1 − f_0 = 1/(2T).
(b) If φ0 and φ1 are arbitrary phases, show that the minimum separation for the pulses to be
orthogonal regardless of φ0 , φ1 is f1 − f0 = 1/T .
Remark: The results of this problem can be used to determine the bandwidth requirements for
coherent and noncoherent FSK, respectively.
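A sketch of the computation behind part (a), in LaTeX form, using the product-to-sum identity and dropping the term of order 1/((f_0 + f_1)T) (as justified by f_1 > f_0 ≫ 1/T):
\[
\langle s_0, s_1 \rangle = \int_0^T \cos(2\pi f_0 t)\cos(2\pi f_1 t)\,dt
= \frac{1}{2}\int_0^T \left[\cos\big(2\pi (f_1-f_0)t\big) + \cos\big(2\pi (f_1+f_0)t\big)\right] dt
\approx \frac{T}{2}\,\mathrm{sinc}\big(2(f_1-f_0)T\big),
\]
which vanishes precisely when 2(f_1 − f_0)T is a nonzero integer, so the minimum separation is f_1 − f_0 = 1/(2T).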
Figure 4.23: Two orthogonal baseband signals a(t) and b(t).
Problem 4.20 The two orthogonal baseband signals shown in Figure 4.23 are used as building
blocks for constructing passband signals as follows.
Problem 4.21 We wish to send at a rate of 10 Mbits/sec over a passband channel. Assum-
ing that an excess bandwidth of 50% is used, how much bandwidth is needed for each of the
following schemes: QPSK, 64-QAM, and 64-ary noncoherent orthogonal modulation using a
Walsh-Hadamard code.
Problem 4.22 Consider 64-ary orthogonal signaling using Walsh-Hadamard codes. Assuming
that the chip pulse is square root raised cosine with excess bandwidth 25%, what is the bandwidth
required for sending data at 20 Kbps over a passband channel assuming (a) coherent reception,
(b) noncoherent reception.
Background
Figure 4.24 shows block diagrams corresponding to a typical DSP-centric realization of a com-
munication transceiver employing linear modulation. In the labs, we model the core components
of such a system using the complex baseband representation, as shown in Figure 4.25. Given the
equivalence of passband and complex baseband, we are only skipping the modeling of finite pre-
cision effects due to digital-to-analog conversion (DAC) and analog-to-digital conversion (ADC).
These effects can easily be incorporated into Matlab models such as those we develop, but are
beyond our current scope.
Figure 4.24: Typical DSP-centric transceiver realization. Our model does not include the blocks
shown in dashed lines. Finite precision effects such as DAC and ADC are not considered. The
upconversion and downconversion operations are not modeled. The passband channel is modeled
as an LTI system in complex baseband.
Figure 4.25: Block diagram of a linearly modulated system, modeled in complex baseband.
Noise model: Noise is introduced in a later lab (in Chapter 6).
Receive filter and sampler: The optimal choice of receive filter is actually a filter matched to
the cascade of the transmit filter and the channel. In this case, there is no information loss
in sampling the output of the receive filter at the symbol rate 1/T. Often, however, we use a
suboptimal choice of receive filter (e.g., a wideband filter flat over the signal band, or a filter
matched to the transmit filter). In this case, it is typically advantageous to sample faster than
the symbol rate. In general, we assume that the sampler operates at rate m/T, where m is a positive
integer. The output of the sampler is then processed, typically using digital signal processing
(DSP), to perform receiver functions such as synchronization, equalization and demodulation.
The simulation of a linearly modulated system typically involves the following steps.
Step 1: Generating random symbols to be sent
We restrict attention in this lab to Binary Phase Shift Keying (BPSK). That is, the symbols
{bn } in Figure 1 take values ±1.
Step 2: Implementing the transmit, channel, and receive filters
Since the bandwidth of these filters is of the order of 1/T, they can be accurately implemented in
DSP by using FIR filters operating on samples at a rate which is a suitable multiple of 1/T. The
default choice of sampling rate in the labs is 4/T, unless specified otherwise. If the filter is specified
in continuous time, typically, one simply samples the impulse response at rate 4/T, taking a large
enough filter length to capture most of the energy in the impulse response. Code fragment 4.B.1
in the appendix illustrates generating a discrete time filter corresponding to a truncated raised
cosine pulse.
Step 3: Sending the symbols through the filters.
To send symbols at rate 1/T through filters implemented at rate 4/T, it is necessary to upsample
the symbols before convolving them with the filter impulse response determined in Step 2. Code
fragment 4.B.2 in the appendix illustrates this for a raised cosine pulse.
Step 4: Adding noise
Typically, we add white Gaussian noise (model to be specified in a later lab) at the input to the
receive filter.
Step 5: Processing at the receive filter output
If there is no intersymbol interference (ISI), the processing simply consists of sampling at rate 1/T
to get decision statistics for the symbols of interest. For BPSK, you might simply take the sign
of the decision statistic to make your bit decision.
If the ISI is significant, then channel equalization (discussed in a later lab) is required prior to
making symbol decisions.
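The following fragment strings Steps 1, 2, 3 and 5 together for a noiseless raised cosine system in which the transmit filter output is sampled directly (i.e., a trivial receive filter). This is only an illustrative sketch, not part of the labs: it assumes the raised_cosine function of Code Fragment 4.B.1 is available, and the delay bookkeeping relies on that function truncating the pulse symmetrically around its peak.
%minimal sketch of Steps 1, 2, 3 and 5 (noiseless, trivial receive filter)
m = 4; %oversampling factor (samples per symbol interval T)
a = 0.5; %excess bandwidth
trunc = 5; %one-sided truncation length in symbol intervals
[transmit_filter,dummy] = raised_cosine(a,m,trunc); %Step 2: filter at rate 4/T
nsymbols = 100;
symbols = sign(rand(nsymbols,1)-0.5); %Step 1: random BPSK symbols
symbols_upsampled = zeros(1+(nsymbols-1)*m,1); %Step 3: insert m-1 zeros between symbols
symbols_upsampled(1:m:end) = symbols;
tx_output = conv(symbols_upsampled,transmit_filter); %Step 3: noiseless modulated signal
%Step 5: the nth pulse peaks trunc*m samples after the nth upsampled symbol, and the
%truncated raised cosine keeps its zero crossings at nonzero multiples of T, so
%sampling at the peaks recovers the symbols exactly in this noiseless setting
delay = trunc*m;
decision_stats = tx_output(delay+1:m:delay+1+(nsymbols-1)*m);
decisions = sign(decision_stats);
number_of_errors = sum(decisions ~= symbols) %should be zero
Adding a receive filter and noise (Steps 4 and 5 in full) is exactly what the laboratory assignment below walks through.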
Laboratory Assignment
0) Write a Matlab function analogous to Code Fragment 4.B.1 to generate an SRRC pulse (i.e.,
do Problem 4.13(a)) where you can specify the truncation length and the excess bandwidth.
1) Set the transmit filter to an SRRC pulse with excess bandwidth 22%, sampled at rate 4/T
and truncated to [−5T, 5T ]. Plot the impulse response of the transmit filter versus t/T .
If you have difficulty generating the SRRC pulse, use the following code fragment to generate
the transmit filter:
%first specify half of the filter
hhalf = [-0.025288315;-0.034167931;-0.035752323;-0.016733702;0.021602514;
0.064938487;0.091002137;0.081894974;0.037071157;-0.021998074;-0.060716277 ;
-0.051178658;0.007874526;0.084368728;0.126869306;0.094528345;-0.012839661;
-0.143477028;-0.211829088;-0.140513128;0.094601918;0.441387140;0.785875640;
1.0];
transmit_filter = [hhalf;flipud(hhalf)];
2) Using the DFT (as in Code Fragment 2.5.1 for Example 2.5.4), compute the magnitude of
the transfer function of the transmit filter versus the normalized frequency f T (make sure the
resolution in frequency is good enough to get a smooth plot, e.g., at least as good as 1/(64T)). From
eyeballing the plot, check whether the normalized bandwidth (i.e., bandwidth as a multiple of
1/T) is well predicted by the nominal excess bandwidth.
3) Use the transmit filter in the Code Fragment 4.B.2, which implements upsampling and allows
sending a programmable number of symbols through the system. Set the receive filter to be the
matched filter corresponding to the transmit filter, and plot the response at the output of the
receive filter to a single symbol. Is the cascade of the transmit and receive filters Nyquist at
rate 1/T?
4) Generate 100 random bits {a[n]} taking values in {0, 1}, and map them to symbols {b[n]}
taking values in {−1, +1}, with 0 mapped to +1 and 1 to −1.
5) Send the 100 symbols {b[n]} through the system. What is the length of the corresponding
output of the transmit filter? What is the length of the corresponding output of the receive
filter? Plot separately the input to the receive filter, and the output of the receive filter versus
time, with one unit of time on the x-axis equal to the symbol time T .
6) Do the best job you can in recovering the transmitted bits {a[n]} by directly sampling the
input to the receive filter, and add lines in the matlab code for implementing your idea. That
is, select a set of 100 samples, and estimate the 100 transmitted bits based on the sign of these
samples. (What sampling delay and spacing would you use?). Estimate the probability of error
(note: no noise has been added).
7) Do the best job you can in recovering the transmitted bits by directly sampling the output of
the receive filter, and add lines in the Matlab code for implementing your idea. That is, select a
set of 100 samples, estimate the 100 transmitted bits based on the sign of these samples. (What
sampling delay and spacing would you use?). Estimate the probability of error. Also estimate
the probability of error if you chose an incorrect delay, offset from the correct delay by T /2.
8) Suppose that the receiver LO used for downconversion is ahead in frequency and phase relative
to the incoming wave by ∆f = 1/(40T) and a phase of π/2. Modify your complex baseband model
to include the effects of the carrier phase and frequency offset. When you now sample at the
“correct” delay as determined in 7), do a scatter plot of the complex-valued samples {y[n], n =
1, ..., 100} that you obtain. Can you make correct decisions based on taking the sign of the real
part of the samples, as in 7)?
9) Now consider a differentially encoded system in which we send {a[n], n = 1, ..., 99}, where
a[n] ∈ {0, 1}, by sending the following ±1 bits: b[1] = +1, and for n = 2, ..., 100
b[n] = b[n − 1] if a[n] = 0, and b[n] = −b[n − 1] if a[n] = 1
Devise estimates for the bits {a[n]} from the samples {y[n]} in 8), and estimate the probability
of error.
Hint: What does y[n]y ∗ [n − 1] look like?
Lab Report: Your lab report should answer the preceding questions in order, and should
document the reasoning you used and the difficulties you encountered. Comment on whether
you get better error probability in 6) or 7), and why?
While we model the complex-valued symbol sequence {b[n]} as random, we do not need to
invoke concepts from probability and random processes to compute the PSD, but can simply
model time-averaged quantities for the symbol sequence. For example, the DC value, which is
typically designed to be zero, is defined by
b̄[n] = lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} b[n]       (4.28)
We also define the time-averaged autocorrelation function R̄_b[k] (the time average of b[n]b*[n − k]) for the symbol
sequence as the following limit:
R̄_b[k] = lim_{N→∞} (1/(2N)) Σ_{n=−N}^{N} b[n]b*[n − k]       (4.29)
Note that we are being deliberately sloppy about the limits of summation in n on the right-hand
side to avoid messy notation. Actually, since −N ≤ m = n − k ≤ N, we have the constraint
−N + k ≤ n ≤ N + k in addition to the constraint −N ≤ n ≤ N. Thus, the summation in
n should depend on the delay k at which we are evaluating the autocorrelation function, going
from n = −N to n = N + k for k < 0, and n = −N + k to n = N for k ≥ 0. However, we ignore
these edge effects, since they become negligible when we let N get large while keeping k fixed.
We now compute the time-averaged PSD. As described in Section 4.2.1, the steps for computing
the PSD for a finite-power signal u(t) are as follows:
(a) timelimit to a finite observation interval of length To to get a finite energy signal uTo (t);
(b) compute the Fourier transform UTo (f ), and hence obtain the energy spectral density |UTo (f )|2 ;
(c) estimate the PSD as Ŝ_u(f) = |U_{T_o}(f)|²/T_o, and take the limit T_o → ∞ to obtain S_u(f).
Consider the observation interval [−NT, NT ], which fits roughly 2N symbols. In general, the
modulation pulse p(t) need not be timelimited to the symbol duration T . However, we can
neglect the edge effects caused by this, since we eventually take the limit as the observation
interval gets large. Thus, we can write
u_{T_o}(t) ≈ Σ_{n=−N}^{N} b[n] p(t − nT)
The energy spectral density is therefore given by
|U_{T_o}(f)|² = U_{T_o}(f) U*_{T_o}(f) = ( Σ_{n=−N}^{N} b[n] P(f) e^{−j2πfnT} ) ( Σ_{m=−N}^{N} b*[m] P*(f) e^{j2πfmT} )
where we need to use two different dummy variables, n and m, for the summations corresponding
to UTo (f ) and UT∗o (f ), respectively. Thus,
|U_{T_o}(f)|² = |P(f)|² Σ_{m=−N}^{N} Σ_{n=−N}^{N} b[n] b*[m] e^{−j2πf(n−m)T}
Estimating the PSD by dividing by the length T_o ≈ 2NT of the observation interval, we obtain
Ŝ_u(f) = (|P(f)|²/T) { (1/(2N)) Σ_{m=−N}^{N} Σ_{n=−N}^{N} b[n] b*[m] e^{−j2πf(n−m)T} }       (4.30)
Thus, the PSD estimate factors into two components: the first is a term |P(f)|²/T that depends only on the
spectrum of the modulation pulse p(t), while the second term (in curly brackets) depends only
on the symbol sequence {b[n]}. Let us now work on simplifying the latter. Grouping terms of
the form m = n − k for each fixed k, we can rewrite this term as
(1/(2N)) Σ_{m=−N}^{N} Σ_{n=−N}^{N} b[n]b*[m] e^{−j2πf(n−m)T} = Σ_k { (1/(2N)) Σ_{n=−N}^{N} b[n]b*[n − k] } e^{−j2πfkT}       (4.31)
From (4.29), we see that taking the limit N → ∞ in (4.31) yields Σ_k R̄_b[k] e^{−j2πfkT}. Substituting
into (4.30), we obtain that the PSD is given by
S_u(f) = (|P(f)|²/T) Σ_k R̄_b[k] e^{−j2πfkT}       (4.32)
Thus, we see that the PSD depends both on the modulating pulse p(t) and on the properties
of the symbol sequence {b[n]}. We explore how the dependence on the symbol sequence can
be exploited for shaping the spectrum in the problems. However, for most systems, the symbol
sequence can be modeled as uncorrelated and zero mean. In this case, R̄_b[k] = 0 for k ≠ 0.
Specializing to this important setting yields Theorem 4.2.1.
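In equation form, the specialization reads as follows (written in LaTeX, with σ_b² denoting the time-averaged symbol energy R̄_b[0]; this is simply (4.32) with all other autocorrelation terms set to zero):
\[
\bar{R}_b[k] = \sigma_b^2\,\delta_{k0} \quad\Longrightarrow\quad
S_u(f) = \frac{|P(f)|^2}{T}\sum_k \bar{R}_b[k]\, e^{-j2\pi f kT} = \sigma_b^2\,\frac{|P(f)|^2}{T}.
\]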
Consider the raised cosine pulse, which is the most common choice for bandlimited Nyquist pulses.
Setting the symbol rate 1/T = 1 without loss of generality (this is equivalent to expressing all
results in terms of t/T or f T ), this pulse is given by
P(f) = 1,   0 ≤ |f| ≤ (1 − a)/2
P(f) = (1/2)[1 + cos((π/a)(|f| − (1 − a)/2))],   (1 − a)/2 ≤ |f| ≤ (1 + a)/2       (4.33)
P(f) = 0,   else
%time domain pulse for raised cosine, together with time vector to plot it against
%oversampling factor= how much faster than the symbol rate we sample at
%length=where to truncate response (multiple of symbol time) on each side of peak
%a = excess bandwidth
function [rc,time_axis] = raised_cosine(a,m,length)
length_os = floor(length*m); %number of samples on each side of peak
%time vector (in units of symbol interval) on one side of the peak
z = cumsum(ones(length_os,1))/m;
A= sin(pi*z)./(pi*z); %term 1
B= cos(pi*a*z); %term 2
C= 1 - (2*a*z).^2; %term 3
zerotest = m/(2*a); %location of zero in denominator
%check whether any sample coincides with zero location
if (zerotest == floor(zerotest)),
B(zerotest) = pi*a;
C(zerotest) = 4*a;
%alternative is to perturb around the sample
%(find L’Hospital limit numerically)
%B(zerotest) = cos(pi*a*(z(zerotest)+0.001));
%C(zerotest) = 1-(2*a*(z(zerotest)+0.001))^2;
end
D = (A.*B)./C; %response to one side of peak
rc = [flipud(D);1;D]; %add in peak and other side
time_axis = [flipud(-z);0;z];
This can, for example, be used to generate a plot of the raised cosine pulse, as follows, where we
would typically oversample by a large factor (e.g., m = 32) in order to get a smooth plot.
a = 0.5; % desired excess bandwidth
m = 32; %oversample by a lot to get smooth plot
length = 10; % where to truncate the time domain response
%(one-sided, multiple of symbol time)
[rc,time] = raised_cosine(a,m,length);
plot(time,rc);
The code for the raised cosine function can also be used to generate the coefficients of a dis-
crete time transmit filter. Here, the oversampling factor would be dictated by our DSP-centric
implementation, and would usually be far less than what is required for a smooth plot: the
digital-to-analog converter would perform the interpolation required to provide a smooth analog
waveform for upconversion. A typical choice is m = 4, as in the Matlab code below for generating
a noiseless BPSK modulated signal.
Upsampling: As noted in our preview of digital modulation in Section 2.3.2, the symbols come
in every T seconds, while the samples of the transmit filter are spaced by T /m. For example,
the nth symbol contributes b[n]p(t − nT ) to the transmit filter output, and the (n + 1)st symbol
contributes b[n + 1]p(t − (n + 1)T ). Since p(t − nT ) and p(t − (n + 1)T ) are offset by T , they
must be offset by m samples when sampling at a rate of m/T . Thus, if the symbols are input to
a transmit filter whose discrete time impulse response is expressed at sampling rate m/T , then
successive symbols at the input to the filter must be spaced by m samples. That is, in order to
get the output as a convolution of the symbols with the transmit filter expressed at rate m/T ,
we must insert m − 1 zeros between successive symbols to convert them to a sampling rate of
m/T .
For completeness, we reproduce part of the upsampling Code Fragment 2.3.2 below, implementing
a raised cosine transmit filter.
oversampling_factor = 4;
m = oversampling_factor;
%parameters for sampled raised cosine pulse
a = 0.5;
length = 10;% (truncated outside [-length*T,length*T])
%raised cosine transmit filter (time vector set to a dummy variable which is not used)
[transmit_filter,dummy] = raised_cosine(a,m,length);
%NUMBER OF SYMBOLS
nsymbols = 100;
%BPSK SYMBOL GENERATION
symbols = sign(rand(nsymbols,1) -.5);
%UPSAMPLE BY m
nsymbols_upsampled = 1+(nsymbols-1)*m;%length of upsampled symbol sequence
symbols_upsampled = zeros(nsymbols_upsampled,1);%initialize
symbols_upsampled(1:m:nsymbols_upsampled)=symbols;%insert symbols with spacing m
%NOISELESS MODULATED SIGNAL
tx_output = conv(symbols_upsampled,transmit_filter);
and sketch its derivation. Noting that 1 + cos 2θ = 2cos²θ, we can rewrite the frequency domain
expression (4.33) for the raised cosine pulse as
P(f) = 1,   0 ≤ |f| ≤ (1 − a)/2
P(f) = cos²((π/(2a))(|f| − (1 − a)/2)),   (1 − a)/2 ≤ |f| ≤ (1 + a)/2       (4.35)
P(f) = 0,   else
We can now take the square root to get an analytical expression for the SRRC pulse in the
frequency domain as follows:
G(f) = 1,   0 ≤ |f| ≤ (1 − a)/2
G(f) = cos((π/(2a))(|f| − (1 − a)/2)),   (1 − a)/2 ≤ |f| ≤ (1 + a)/2       (Frequency domain SRRC pulse)   (4.36)
G(f) = 0,   else
Finding the time domain SRRC pulse is now a matter of computing the inverse Fourier transform.
Since it is also an interesting exercise in utilizing Fourier transform properties, we sketch the
derivation. First, we break up the frequency domain pulse into segments whose inverse Fourier
transforms are well known. Setting b = (1 − a)/2, we have
G(f ) = G1 (f ) + G2 (f ) (4.37)
where
G_1(f) = I_[−b,b](f) ↔ g_1(t) = 2b sinc(2bt) = sin(2πbt)/(πt) = sin(π(1 − a)t)/(πt)       (4.38)
and
G2 (f ) = U(f − b) + U(−f − b) (4.39)
with
U(f) = cos((π/(2a))f) I_[0,a](f) = (1/2)( e^{jπf/(2a)} + e^{−jπf/(2a)} ) I_[0,a](f)       (4.40)
To evaluate g2 (t), note first that
Plugging in (4.42), and substituting the value of b = (1 − a)/2, we obtain upon simplification that
2u(t)e^{j2πbt} = (2a/π) [ 2e^{jπ(1+a)t} − j8at e^{jπ(1−a)t} ] / (1 − 16a²t²)
Taking the real part, we obtain g_2(t) = (4a/π)[ cos(π(1 + a)t) + 4at sin(π(1 − a)t) ]/(1 − 16a²t²). Adding g_1(t)
from (4.38) and simplifying yields the time domain SRRC pulse g(t) = [ sin(π(1 − a)t) + 4at cos(π(1 + a)t) ] / [ πt(1 − 16a²t²) ].
We leave it as an exercise to write Matlab code to generate a sampled version of the SRRC pulse
(analogous to Code Fragment 4.B.1), taking into account the zeros in the denominator. This
can then be used to generate a noiseless transmit waveform as in Code Fragment 4.B.2 simply
by replacing the transmit filter by an SRRC pulse.
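A minimal sketch of such a function is given below (this is our own illustration, not one of the numbered code fragments). It assumes the closed form stated above, g(t) = [sin(π(1 − a)t) + 4at cos(π(1 + a)t)]/[πt(1 − 16a²t²)] for 1/T = 1, consistent with the frequency domain normalization (4.36), and it handles the zeros of the denominator at t = 0 and t = ±1/(4a) via the corresponding L'Hospital limits.
%sampled SRRC pulse, analogous to Code Fragment 4.B.1 (illustrative sketch)
%a = excess bandwidth, m = oversampling factor, len = one-sided truncation (symbol times)
function [srrc,time_axis] = srrc_pulse(a,m,len)
length_os = floor(len*m); %number of samples on each side of peak
z = cumsum(ones(length_os,1))/m; %time vector (one side of peak, in units of T)
num = sin(pi*(1-a)*z) + 4*a*z.*cos(pi*(1+a)*z); %numerator
den = pi*z.*(1-(4*a*z).^2); %denominator, zero at z = 1/(4a)
D = num./den;
zerotest = m/(4*a); %sample index hitting the denominator zero, if an integer
if ((zerotest == floor(zerotest)) && (zerotest <= length_os)),
   %L'Hospital limit at t = 1/(4a)
   D(zerotest) = (a/sqrt(2))*((1+2/pi)*sin(pi/(4*a))+(1-2/pi)*cos(pi/(4*a)));
end
peak = 1 - a + 4*a/pi; %L'Hospital limit at t = 0
srrc = [flipud(D);peak;D]; %add in peak and other side
time_axis = [flipud(-z);0;z];
Replacing raised_cosine by srrc_pulse in Code Fragment 4.B.2 then generates a noiseless SRRC-shaped BPSK waveform, as suggested above.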
Chapter 5
Probability theory is fundamental to communication system design, especially for digital commu-
nication. Not only are there uncontrolled sources of uncertainty such as noise, interference, and
other channel impairments that are only amenable to statistical modeling, but the very notion of
information underlying digital communication is based on uncertainty. In particular, the receiver
in a communication system does not know a priori what the transmitter is sending (otherwise
the transmission would be pointless), hence the receiver designer must employ statistical models
for the transmitted signal. In this chapter, we review basic concepts of probability and random
variables with examples motivated by communications applications. We also introduce the con-
cept of random processes, which are used to model both signals and noise in communication
systems.
Chapter Plan: The goal of this chapter is to develop the statistical modeling tools required in
later chapters. For readers who are already comfortable with probability and random processes,
the shortest path to Chapter 6 is to review the material on Gaussian random variables in Section
5.6 and noise modeling in Section 5.8. Sections 5.1 through 5.5 provide a review of background
material on probability and random variables. Section 5.1 discusses basic concepts of probability:
the most important of these for our purpose are the concepts of conditional probability and Bayes’
rule. Sections 5.2 and 5.4 discuss random variables and functions of random variables. Multiple
random variables, or random vectors, are discussed in Section 5.3. Section 5.5 discusses various
statistical averages and their computation. Material which is not part of the assumed background
starts with Section 5.6; this section goes in depth into Gaussian random variables and vectors,
which play a critical role in the mathematical modeling of communication systems. Section 5.7
introduces random processes in sufficient depth that we can describe, and perform elementary
computations with, the classical white Gaussian noise (WGN) model in Section 5.8. At this
point, zealous followers of a “just in time” philosophy can move on to the discussion of optimal
receiver design in Chapter 6. However, many others might wish to go through one more section,
Section 5.9, which provides a more general treatment of the effect of linear operations on random
processes. The results in this section allow us, for example, to model noise correlations and to
compute quantities such as signal-to-noise ratio (SNR). Material which we do not build on in
later chapters, but which may be of interest to some readers, is placed in the appendices: this
includes limit theorems, qualitative discussion of noise mechanisms, discussion of the structure
of passband random processes, and quantification, via SNR computations, of the effect of noise
on analog modulation.
Sample Space: The starting point in probability is the notion of an experiment whose outcome
is not deterministic. The set of all possible outcomes from the experiment is termed the sample
space Ω. For example, the sample space corresponding to the throwing of a six-sided die is
Ω = {1, 2, 3, 4, 5, 6}. An analogous example which is well-suited to our purpose is the sequence
of bits sent by the transmitter in a digital communication system, modeled probabilistically by
the receiver. For example, suppose that the transmitter can send a sequence of seven bits, each
taking the value 0 or 1. Then our sample space consists of the 2⁷ = 128 possible bit sequences.
Event: Events are sets of possible outcomes to which we can assign a probability. That is, an
event is a subset of the sample space. For example, for a six-sided die, the event {1, 3, 5} is the
set of odd-numbered outcomes.
Figure 5.1: The complement of an event A, the union A ∪ B, and the intersection A ∩ B.
We are often interested in probabilities of events obtained from other events by basic set opera-
tions such as complementation, unions and intersections; see Figure 5.1.
Complement of an Event (“NOT”): For an event A, the complement (“not A”), denoted
by Ac , is the set of outcomes that do not belong to A.
Union of Events (“OR”): The union of two events A and B, denoted by A ∪ B, is the set
of all outcomes that belong to either A or B. The term "or" always refers to the inclusive or,
unless we specify otherwise. Thus, outcomes belonging to both events are included in the union.
Intersection of Events (“AND”): The intersection of two events A and B, denoted by A ∩ B,
is the set of all outcomes that belong to both A and B.
Mutually Exclusive, or Disjoint, Events: Events A and B are mutually exclusive, or disjoint,
if their intersection is empty: A ∩ B = ∅.
Difference of Events: The difference A \ B is the set of all outcomes that belong to A but not
to B. In other words, A \ B = A ∩ B c .
Probability Measure: A probability measure is a function that assigns probability to events.
Some properties are as follows.
Range of probability: For any event A, we have 0 ≤ P[A] ≤ 1. The probability of the empty
set is zero: P[∅] = 0. The probability of the entire sample space is one: P[Ω] = 1.
Probabilities of disjoint events add up: If two events A and B are mutually exclusive, then
the probability of their union equals the sum of their probabilities:
P[A ∪ B] = P[A] + P[B], if A ∩ B = ∅       (5.1)
By mathematical induction, we can infer that the probability of the union of a finite number of
pairwise disjoint events also adds up. It is useful to review the principle of mathematical induction
via this example. Specifically, suppose that we are given pairwise disjoint events A1, A2, A3, ....
We wish to prove that, for any n ≥ 2,
P[A1 ∪ A2 ∪ ... ∪ An] = P[A1] + P[A2] + ... + P[An]       (5.2)
(a) The base case n = 2 is simply (5.1). (b) For the inductive step, assume that (5.2) holds for n = k, and write
A1 ∪ A2 ∪ ... ∪ Ak+1 = B ∪ Ak+1
where
B = A1 ∪ A2 ∪ ... ∪ Ak
and Ak+1 are disjoint. We can therefore conclude, using step (a), that
P[A1 ∪ ... ∪ Ak+1] = P[B] + P[Ak+1] = P[A1] + ... + P[Ak] + P[Ak+1]
which establishes (5.2) for n = k + 1.
Probability of the complement: Since A and A^c are disjoint and A ∪ A^c = Ω, (5.1) also gives
P[A^c] = 1 − P[A]       (5.3)
Probabilities of unions and intersections: We can use the property (5.1) to infer the fol-
lowing property regarding the union and intersection of arbitrary events:
P[A1 ∪ A2] = P[A1] + P[A2] − P[A1 ∩ A2]       (5.4)
Let us get a feel for how to use the probability axioms by proving this. We break A1 ∪ A2 into
disjoint events as follows:
A1 ∪ A2 = A2 ∪ (A1 \ A2 )
Applying (5.1), we have
P [A1 ∪ A2 ] = P [A2 ] + P [A1 \ A2 ] (5.5)
Furthermore, since A1 can be written as the disjoint union A1 = (A1 ∩ A2 ) ∪ (A1 \ A2 ), we have
P [A1 ] = P [A1 ∩ A2 ] + P [A1 \ A2 ], or P [A1 \ A2 ] = P [A1 ] − P [A1 ∩ A2 ]. Plugging into (5.5), we
obtain (5.4).
Conditional probability: The conditional probability of A given B is the probability of A
assuming that we already know that the outcome of the experiment is in B. Outcomes corre-
sponding to this probability must therefore belong to the intersection A ∩ B. We therefore define
the conditional probability as
P[A|B] = P[A ∩ B]/P[B]       (5.6)
(We assume that P [B] > 0, otherwise the condition we are assuming cannot occur.)
Conditional probabilities behave just the same as regular probabilities, since all we are doing is
restricting the sample space to the event being conditioned on. Thus, we still have P [A|B] =
1 − P [Ac |B] and
P [A1 ∪ A2 |B] = P [A1 |B] + P [A2 |B] − P [A1 ∩ A2 |B]
Conditioning is a crucial concept in models for digital communication systems. A typical appli-
cation is to condition on which of a number of possible transmitted signals is sent, in order
to describe the statistical behavior of the communication medium. Such statistical models then
form the basis for receiver design and performance analysis.
Figure 5.2: A binary channel: a transmitted 0 is received correctly with probability 1 − a and flipped with probability a; a transmitted 1 is received correctly with probability 1 − b and flipped with probability b.
Figure 5.2 depicts the conditional probabilities for a noisy binary channel. On the left side are
the two possible values of the bit sent, and on the right are the two possible values of the bit
received. The labels on a given arrow are the conditional probability of the received bit, given
the transmitted bit. Thus, the binary channel is defined by means of the following conditional
probabilities:
P[0 received|0 transmitted] = 1 − a,   P[1 received|0 transmitted] = a
P[1 received|1 transmitted] = 1 − b,   P[0 received|1 transmitted] = b
These conditional probabilities are often termed the channel transition probabilities. The proba-
bilities a and b are called the crossover probabilities. When a = b, we obtain the binary symmetric
channel.
Law of total probability: It is often convenient to compute the probability of an event A by conditioning: for any event B, we can write
P[A] = P[A ∩ B] + P[A ∩ B^c] = P[A|B]P[B] + P[A|B^c]P[B^c]       (5.7)
In the above, we have decomposed an event of interest, A, into a disjoint union of two events,
A ∩ B and A ∩ B^c, so that (5.1) applies. The sets B and B^c form a partition of the entire sample
space; that is, they are disjoint, and their union equals Ω. This generalizes to any partition of
the sample space; that is, if B1, B2, ... are mutually exclusive events such that their union covers
the sample space (actually, it is enough if the union contains A), then
P[A] = Σ_i P[A ∩ Bi] = Σ_i P[A|Bi] P[Bi]       (5.8)
Example 5.1.2 (Applying the law of total probability to the binary channel): For the
channel in Figure 5.2, set a = 0.1 and b = 0.25, and suppose that the probability of transmitting
0 is 0.6. This is called the prior, or a priori, probability of transmitting 0, because it is the
statistical information that the receiver has before it sees the received bit. Using (5.3), the prior
probability of 1 being transmitted is
P[1 transmitted] = 1 − P[0 transmitted] = 1 − 0.6 = 0.4
(since sending 0 or 1 are our only options for this particular channel model, the two events are
complements of each other). We can now compute the probability that 0 is received using the
law of total probability, as follows.
P [0 received]
= P [0 received|0 transmitted]P [0 transmitted] + P [0 received|1 transmitted]P [1 transmitted]
= 0.9 × 0.6 + 0.25 × 0.4 = 0.64
We can also compute the probability that 1 is received using the same technique, but it is easier
to infer this from (5.3) as follows:
P[1 received] = 1 − P[0 received] = 1 − 0.64 = 0.36
Bayes' rule: The definition (5.6) of conditional probability also tells us how to compute the conditional probability of B given A from that of A given B:
P[B|A] = P[A ∩ B]/P[A] = P[A|B]P[B]/P[A] = P[A|B]P[B] / ( P[A|B]P[B] + P[A|B^c]P[B^c] )
where we have used (5.7). Similarly, in the setting of (5.8), we can compute P[Bj|A] as follows:
P[Bj|A] = P[A|Bj]P[Bj] / Σ_i P[A|Bi]P[Bi]
Bayes’ rule is typically used as follows in digital communication. The event B might correspond
to which transmitted signal was sent. The event A may describe the received signal, so that
P [A|B] can be computed based on our model for the statistics of the received signal, given
the transmitted signal. Bayes’ rule can then be used to compute the conditional probability
P [B|A] of a given signal having been transmitted, given information about the received signal,
as illustrated in the example below.
Example 5.1.3 (Applying Bayes’ rule to the binary channel): Continuing with the binary
channel of Figure 5.2 with a = 0.1, b = 0.25, let us find the probability that 0 was transmitted,
given that 0 is received. This is called the posterior, or a posteriori, probability of 0 being
transmitted, because it is the statistical model that the receiver infers after it sees the received
bit. As in Example 5.1.2, we assume that the prior probability of 0 being transmitted is 0.6. We
now apply Bayes’ rule as follows:
P[0 transmitted|0 received] = P[0 received|0 transmitted] P[0 transmitted] / P[0 received]
= (0.9 × 0.6)/0.64 = 27/32
where we have used the computation from Example 5.1.2, based on the law of total probability,
for the denominator. We can also compute the posterior probability of the complementary event
as follows:
P[1 transmitted|0 received] = 1 − P[0 transmitted|0 received] = 5/32
These results make sense. Since the binary channel in Figure 5.2 has a small probability of error,
it is much more likely that 0 was transmitted than that 1 was transmitted when we receive 0. The
situation would be reversed if 1 were received. The computation of the corresponding posterior
probabilities is left as an exercise. Note that, for this example, the numerical values for the
posterior probabilities may be different when we condition on 1 being received, since the channel
transition probabilities and prior probabilities are not symmetric with respect to exchanging the
roles of 0 and 1.
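Readers who like to sanity check such computations can do so by simulation; the Matlab fragment below (an illustration, not part of the text) simulates the binary channel of Figure 5.2 with a = 0.1, b = 0.25 and prior P[0 transmitted] = 0.6.
%Monte Carlo check of Examples 5.1.2 and 5.1.3 (illustrative sketch)
ntrials = 1e6;
x = (rand(ntrials,1) > 0.6); %transmitted bits: 1 with probability 0.4
flip0 = (rand(ntrials,1) < 0.1); %crossover events when 0 is sent (probability a)
flip1 = (rand(ntrials,1) < 0.25); %crossover events when 1 is sent (probability b)
y = x; %received bits
y(x==0) = flip0(x==0); %0 sent: received bit is 1 with probability a
y(x==1) = ~flip1(x==1); %1 sent: received bit is 0 with probability b
P0_received = mean(y==0) %should be close to 0.64
P0_given_0_received = mean(x==0 & y==0)/mean(y==0) %should be close to 27/32 = 0.84375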
Two other concepts that we use routinely are independence and conditional independence.
Independence: Events A1 and A2 are independent if
P [A1 ∩ A2 ] = P [A1 ]P [A2 ] (5.11)
Example 5.1.4 (independent bits): Suppose we transmit three bits. Each time, the proba-
bility of sending 0 is 0.6. Assuming that the bits to be sent are selected independently each of
these three times, we can compute the probability of sending any given three-bit sequence using
(5.11).
P[000 transmitted] = P[first bit = 0, second bit = 0, third bit = 0]
= P[first bit = 0] P[second bit = 0] P[third bit = 0] = 0.6³ = 0.216
Let us do a few other computations similarly, where we now use the shorthand P [x1 x2 x3 ] to
denote that x1 x2 x3 is the sequence of three bits transmitted.
P [101] = 0.4 × 0.6 × 0.4 = 0.096
and
P[two ones transmitted] = P[110] + P[101] + P[011] = 3 × (0.4)² × 0.6 = 0.288
The number of ones is actually a binomial random variable (reviewed in Section 5.2).
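Indeed, the same number follows directly from the binomial PMF with n = 3 trials and "success" probability p = P[1 sent] = 0.4 (a quick consistency check, written in LaTeX):
\[
P[\text{two ones}] = \binom{3}{2}(0.4)^2(0.6)^1 = 3 \times 0.16 \times 0.6 = 0.288 .
\]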
Example 5.1.5 (independent channel uses): Now, suppose that we transmit three bits,
with each bit seeing the binary channel depicted in Figure 5.2. We say that the channel is mem-
oryless when the value of the received bit corresponding to a given channel use is conditionally
independent of the other received bits, given the transmitted bits. For the setting of Example
5.1.4, where we choose the transmitted bits independently, the following example illustrates the
computation of conditional probabilities for the received bits.
P [100 received|010 transmitted]
= P [1 received|0 transmitted]P [0 received|1 transmitted]P [0 received|0 transmitted]
= 0.1 × 0.25 × 0.9 = 0.0225
We end this section with a mention of two useful bounding techniques.
Union bound: The probability of a union of events is upper bounded by the sum of the
probabilities of the events.
P [A1 ∪ A2 ] ≤ P [A1 ] + P [A2 ] (5.13)
This follows from (5.4) by noting that P [A1 ∩ A2 ] ≥ 0. This property generalizes to a union of
a collection of events by mathematical induction:
P[ A_1 ∪ A_2 ∪ ... ∪ A_n ] ≤ Σ_{i=1}^{n} P[A_i]       (5.14)
Figure 5.3: A random variable is a mapping from the sample space to the real line.
A random variable assigns a number to each outcome of a random experiment. That is, a
random variable is a mapping from the sample space Ω to the set of real numbers, as shown in
Figure 5.3. The underlying experiment that leads to the outcomes in the sample space can be
quite complicated (e.g., generation of a noise sample in a communication system may involve
the random movement of a large number of charge carriers, as well as the filtering operation
performed by the receiver). However, we do not need to account for these underlying physical
phenomena in order to specify the probabilistic description of the random variable. All we need
to do is to describe how to compute the probabilities of the random variable taking on a particular
set of values. In other words, we need to specify its probability distribution, or probability law.
Consider, for example, the Bernoulli random variable, which may be used to model random bits
sent by a transmitter, or to indicate errors in these bits at the receiver.
Bernoulli random variable: X is a Bernoulli random variable if it takes values 0 or 1. The
probability distribution is specified if we know P [X = 0] and P [X = 1]. Since X can take only
one of these two values, the events {X = 0} and {X = 1} constitute a partition of the sample
space, so that P [X = 0]+P [X = 1] = 1. We therefore can characterize the Bernoulli distribution
by a parameter p ∈ [0, 1], where p = P [X = 1] = 1 − P [X = 0]. We denote this distribution as
Bernoulli(p).
In general, if a random variable takes only a discrete set of values, then its distribution can be
specified simply by specifying the probabilities that it takes each of these values.
Discrete Random Variable, Probability Mass Function: X is a discrete random variable
if it takes a finite, or countably infinite, number of values. If X takes values x1 , x2 , ..., then its
probability distribution is characterized by its probability mass function (PMF), or the probabil-
ities pi = P[X = xi], i = 1, 2, .... These probabilities must add up to one, Σ_i pi = 1, since the
events {X = xi}, i = 1, 2, ... provide a partition of the sample space.
For random variables that take values in a continuum, the probability of taking any particular
value is zero. Rather, we seek to specify the probability that the value taken by the random
variable falls in a given set of interest. By choosing these sets to be intervals whose size shrinks
to zero, we arrive at the notion of probability density function, as follows.
Continuous Random Variable, Probability Density Function: X is a continuous random
variable if the probability P [X = x] is zero for each x. In this case, we define the probability
density function (PDF) as follows:
p_X(x) = lim_{∆x→0} P[x ≤ X ≤ x + ∆x]/∆x       (5.15)
In other words, for small intervals, we have the approximate relationship:
P [x ≤ X ≤ x + ∆x] ≈ pX (x) ∆x
Expressing an event of interest as a disjoint union of such small intervals, the probability of the
event is the sum of the probabilities of these intervals; as we let the length of the intervals shrink,
the sum becomes an integral (with ∆x replaced by dx). Thus, the probability of X taking values
in a set A can be computed by integrating its PDF over A, as follows:
P[X ∈ A] = ∫_A p_X(x) dx       (5.16)
The PDF must integrate to one over the real line, since any value taken by X falls within this
interval:
∫_{−∞}^{∞} p_X(x) dx = 1
Notation: We use the notation pX (x) to denote the density of a random variable X, evaluated
at the point x. Thus, the argument of the density is a dummy variable, and could be denoted
by some other letter: for example, we could use the notation pX (u) as notation for the density
of X, evaluated at the point u. Once we firmly establish these concepts, however, we plan to
allow ourselves to get sloppy. As discussed in the note at the end of Section 5.3, if there is no
scope for confusion, we plan to use the dummy variable to also denote the random variable we
are talking about. For example, we use p(x) as the notation for pX (x) and p(y) as the notation
for pY (y). But for now, we retain the subscripts in the introductory material in Sections 5.2 and
5.3.
Density: We use the generic term “density” to refer to both PDF and PMF (but more often
the PDF), relying on the context to clarify what we mean by the term.
The PMF or PDF cannot be used to describe mixed random variables that are neither discrete
nor continuous. We can get around this problem by allowing PDFs to contain impulses, but a
general description of the probability distribution of any random variable, whether it is discrete,
continuous or mixed, can be provided in terms of its cumulative distribution function, defined
below.
Cumulative distribution function (CDF): The CDF of a random variable X is defined as
FX (x) = P [X ≤ x]
and has the following general properties:
(1) FX (x) is nondecreasing in x.
This is because, for x1 ≤ x2 , we have {X ≤ x1 } ⊆ {X ≤ x2 }, so that P [X ≤ x1 ] ≤ P [X ≤ x2 ].
(2) FX (−∞) = 0 and FX (∞) = 1.
The event {X ≤ −∞} contains no allowable values for X, and is therefore the empty set, which
has probabilty zero. The event {X ≤ ∞} contains all allowable values for X, and is therefore
the entire sample space, which has probabilty one.
(3) FX(x) is right-continuous: FX(x) = lim_{δ→0, δ>0} FX(x + δ). Denoting this right limit as FX(x+),
we can state the property compactly as FX(x) = FX(x+).
The proof is omitted, since it requires going into probability theory at a depth that is unnecessary
for our purpose.
Any function that satisfies (1)-(3) is a valid CDF. The CDFs for discrete and mixed random
variables exhibit jumps. At each of these jumps, the left limit F (x− ) is strictly smaller than the
right limit FX (x+ ) = FX (x). Noting that
P [X = x] = P [X ≤ x] − P [X < x] = FX (x) − FX (x− ) (5.17)
we note that the jumps correspond to the discrete set of points where nonzero probability mass
is assigned. For a discrete random variable, the CDF remains constant between these jumps.
The PMF is given by applying (5.17) for x = xi , i = 1, 2, ..., where {xi } is the set of values taken
by X.
For a continuous random variable, there are no jumps in the CDF, since P [X = x] = 0 for all x.
That is, a continuous random variable can be defined as one whose CDF is a continuous function.
From the definition (5.15) of PDF, it is clear that the PDF of a continuous random variable is
the derivative of the CDF; that is,
p_X(x) = F′_X(x)       (5.18)
Actually, it is possible that the derivative of the CDF for a continuous random variable does not
exist at certain points (i.e., when the slopes of FX (x) approaching from the left and the right are
different). The PDF at these points can be defined as either the left or the right slope; it does not
make a difference in our probability computations, which involve integrating the PDF (which
washes away the effect of individual points). We therefore do not worry about this technicality
any further.
We obtain the CDF from the PDF by integrating the relationship (5.18):
F_X(x) = ∫_{−∞}^{x} p_X(z) dz       (5.19)
Figure 5.4: PDF of an exponential random variable with parameter λ = 1/5 (or mean 1/λ = 5).
See Figure 5.4 for an example PDF. We can write this more compactly using the indicator
function:
pX (x) = λe−λx I[0,∞) (x)
The CDF is given by
FX (x) = (1 − e−λx )I[0,∞)(x)
For x ≥ 0, the CCDF is given by
P[X > x] = 1 − F_X(x) = e^{−λx}
That is, the tail of an exponential distribution decays (as befits its name) exponentially.
Figure 5.5: PDF of a Gaussian random variable with parameters m = 5 and v² = 16. Note the
bell shape for the Gaussian density, with peak around its mean m = 5.
Gaussian (or normal) random variable: The random variable X has a Gaussian distribution
with parameters m and v 2 if its PDF is given by
p_X(x) = (1/√(2πv²)) exp( −(x − m)²/(2v²) )       (5.21)
See Figure 5.5 for an example PDF. As we show in Section 5.5, m is the mean of X and v 2 is
its variance. The PDF of a Gaussian has a well-known bell shape, as shown in Figure 5.5. The
Gaussian random variable plays a very important role in communication system design, hence
we discuss it in far more detail in Section 5.6, as a prerequisite for the receiver design principles
to be developed in Chapter 6.
Example 5.2.1 (Recognizing a Gaussian density): Suppose that a random variable X has
PDF
p_X(x) = c e^{−2x² + x}
where c is an unknown constant, and x ranges over the real line. Specify the distribution of X
and write down its PDF.
Solution: Any PDF with an exponential dependence on a quadratic can be put in the form (5.21)
by completing squares in the exponent.
−2x² + x = −2(x² − x/2) = −2[ (x − 1/4)² − 1/16 ]
Comparing with (5.21), we see that the PDF can be written as an N(m, v²) PDF with m = 1/4
and 1/(2v²) = 2, so that v² = 1/4. Thus, X ∼ N(1/4, 1/4) and its PDF is given by specializing (5.21):
p_X(x) = (1/√(2π/4)) e^{−2(x − 1/4)²} = √(2/π) e^{−2x² + x − 1/8}
We usually do not really care about going back and specifying the constant c, since we already
know the form of the density. But it is easy to check that c = √(2/π) e^{−1/8}.
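A quick numerical check of this constant (our own illustration, not part of the example) uses Matlab's integral function:
c = sqrt(2/pi)*exp(-1/8); %constant found in Example 5.2.1
total = integral(@(x) c*exp(-2*x.^2+x),-Inf,Inf) %should be very close to 1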
Binomial random variable: We say that a random variable Y has a binomial distribution
with parameters n and p, and denote this by Y ∼ Bin(n, p), if Y takes integer values 0, 1, ..., n,
with probability mass function
p_k = P[Y = k] = (n choose k) p^k (1 − p)^{n−k},   k = 0, 1, ..., n
Recall that "n choose k" (the number of ways in which we can choose k items out of n identical
items) is given by the expression
(n choose k) = n!/(k!(n − k)!)
with k! = 1 × 2 × ... × k denoting the factorial operation. The binomial distribution can be
thought of as a discrete time analogue of the Gaussian distribution; as seen in Figure 5.6, the PMF
has a bell shape. We comment in more detail on this when we discuss the central limit theorem
in Appendix 5.B.
Poisson random variable: X is a Poisson random variable with parameter λ > 0 if it takes
values from the nonnegative integers, with pmf given by
P[X = k] = (λ^k/k!) e^{−λ},   k = 0, 1, 2, ...
As shown later, the parameter λ equals the mean of the Poisson random variable.
Figure 5.6: PMF of a binomial random variable, which exhibits a bell shape.
Figure 5.7: Multiple random variables defined on a common probability space are different mappings X1(ω), X2(ω) from the sample space Ω to the real line.
We are often interested in more than one random variable when modeling a particular scenario
of interest. For example, a model of a received sample in a communication link may involve a
randomly chosen transmitted bit, a random channel gain, and a random noise sample. In general,
we are interested in multiple random variables defined on a “common probability space,” where
the latter phrase means simply that we can, in principle, compute the probability of events
involving all of these random variables. Technically, multiple random variables on a common
probability space are simply different mappings from the sample space Ω to the real line, as
depicted in Figure 5.7. However, in practice, we do not usually worry about the underlying
sample space (which can be very complicated), and simply specify the joint distribution of these
random variables, which provides information sufficient to compute the probabilities of events
involving these random variables.
In the following, suppose that X1 , ..., Xn are random variables defined on a common probability
space; we can also represent them as an n-dimensional random vector X = (X1 , ..., Xn )T .
Joint Cumulative Distribution Function: The joint CDF is defined as
FX (x) = FX1 ,...,Xn (x1 , ..., xn ) = P [X1 ≤ x1 , ..., Xn ≤ xn ]
Joint Probability Density Function: When the joint CDF is continuous, we can define the
joint PDF as follows:
p_X(x) = p_{X1,...,Xn}(x1, ..., xn) = (∂/∂x1) ... (∂/∂xn) F_{X1,...,Xn}(x1, ..., xn)
We can recover the joint CDF from the joint PDF by integrating:
F_X(x) = F_{X1,...,Xn}(x1, ..., xn) = ∫_{−∞}^{x1} ... ∫_{−∞}^{xn} p_{X1,...,Xn}(u1, ..., un) du1 ... dun
The joint PDF must be nonnegative and must integrate to one over n-dimensional space. The
probability of a particular subset of n-dimensional space is obtained by integrating the joint PDF
over the subset.
Joint Probability Mass Function (PMF): For discrete random variables, the joint PMF is
defined as
pX (x) = pX1 ,...,Xn (x1 , ..., xn ) = P [X1 = x1 , ..., Xn = xn ]
Marginal distributions: The marginal distribution for a given random variable (or set of
random variables) can be obtained by integrating or summing over all possible values of the
random variables that we are not interested in. For CDFs, this simply corresponds to setting
the appropriate arguments in the joint CDF to infinity. For example,
FX (x) = P [X ≤ x] = P [X ≤ x, Y ≤ ∞] = FX,Y (x, ∞)
For continuous random variables, the marginal PDF is obtained from the joint PDF by “inte-
grating out” the undesired random variable:
p_X(x) = ∫_{−∞}^{∞} p_{X,Y}(x, y) dy,   −∞ < x < ∞
For discrete random variables, we sum over the possible values of the undesired random variable:
p_X(x) = Σ_y p_{X,Y}(x, y)
where the sum ranges over the set of possible values taken by Y.
Example 5.3.1 (Joint and marginal densities): Random variables X and Y have joint
density given by
p_{X,Y}(x, y) = c xy for 0 ≤ x, y ≤ 1,   2c xy for −1 ≤ x, y ≤ 0,   and 0 otherwise
Figure 5.8: Support of the joint density in Example 5.3.1: the density equals cxy on the square 0 ≤ x, y ≤ 1 and 2cxy on the square −1 ≤ x, y ≤ 0. The probability P[X + Y < 1] is obtained by integrating the density over the shaded region below the line x + y = 1.
(a) The constant c is determined by the requirement that the joint density integrate to one: the integral over [0, 1]² equals c/4 and the integral over [−1, 0]² equals c/2, so that c/4 + c/2 = 3c/4 = 1. Thus, c = 4/3.
(b) The required probability is obtained by integrating the joint density over the shaded area in
Figure 5.8. We obtain
P[X + Y < 1] = ∫_{y=0}^{1} ∫_{x=0}^{1−y} c xy dx dy + ∫_{y=−1}^{0} ∫_{x=−1}^{0} 2c xy dx dy
= c ∫_{y=0}^{1} ((1 − y)²/2) y dy + 2c ( x²/2 |_{−1}^{0} )( y²/2 |_{−1}^{0} )
= c/24 + 2c/4 = c/24 + c/2 = 13c/24
= 13/18
We could have computed this probability more quickly in this example by integrating the joint
density over the unshaded area to find P [X + Y ≥ 1], since this area has a simpler shape:
P[X + Y ≥ 1] = ∫_{y=0}^{1} ∫_{x=1−y}^{1} c xy dx dy = c ∫_{y=0}^{1} ( x²/2 |_{1−y}^{1} ) y dy
= (c/2) ∫_{y=0}^{1} y(2y − y²) dy = 5c/24 = 5/18
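The probability computed in (b) can also be checked numerically by summing the joint density over a fine grid (an illustrative Matlab sketch; the grid spacing dx is arbitrary):
dx = 2e-3; x = (-1+dx/2:dx:1)'; %cell midpoints covering [-1,1]
[X,Y] = meshgrid(x,x);
c = 4/3;
p = c*X.*Y.*(X>=0 & Y>=0) + 2*c*X.*Y.*(X<0 & Y<0); %joint density of Example 5.3.1
total_mass = sum(p(:))*dx^2 %should be close to 1
P_sum_less_than_1 = sum(p(X+Y<1))*dx^2 %should be close to 13/18 = 0.7222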
Conditional densities: For continuous random variables, the conditional density of Y given X = x is defined as p_{Y|X}(y|x) = p_{X,Y}(x, y)/p_X(x) (5.24), which is motivated by the approximate relationship P[y ≤ Y ≤ y + ∆y | x ≤ X ≤ x + ∆x] ≈ p_{Y|X}(y|x) ∆y
for ∆x, ∆y small. For discrete random variables, the conditional pmf is simply the following
conditional probability:
pY |X (y|x) = P [Y = y|X = x] (5.26)
Example 5.3.2 Continuing with Example 5.3.1, let us find the conditional density of Y given
X. For X = x ∈ [0, 1], we have pX,Y (x, y) = c xy, with 0 ≤ y ≤ 1 (the joint density is zero for
other values of y, under this conditioning on X). Applying (5.24), and substituting (5.22), we
obtain
p_{Y|X}(y|x) = p_{X,Y}(x, y)/p_X(x) = cxy/(cx/2) = 2y,   0 ≤ y ≤ 1 (for 0 ≤ x ≤ 1)
Similarly, for X = x ∈ [−1, 0], we obtain, using (5.23), that
p_{Y|X}(y|x) = 2cxy/(−cx) = −2y,   −1 ≤ y ≤ 0 (for −1 ≤ x ≤ 0)
We can now compute conditional probabilities using the preceding conditional densities. For
example,
P[Y < −0.5|X = −0.5] = ∫_{−1}^{−0.5} (−2y) dy = −y² |_{−1}^{−0.5} = 3/4
Bayes’ rule for conditional densities: Given the conditional density of Y given X, the
conditional density for X given Y is given by
p_{X|Y}(x|y) = p_{Y|X}(y|x) p_X(x) / p_Y(y) = p_{Y|X}(y|x) p_X(x) / ∫ p_{Y|X}(y|x) p_X(x) dx,   continuous random variables
p_{X|Y}(x|y) = p_{Y|X}(y|x) p_X(x) / p_Y(y) = p_{Y|X}(y|x) p_X(x) / Σ_x p_{Y|X}(y|x) p_X(x),   discrete random variables
We can also mix discrete and continuous random variables in applying Bayes’ rule, as illustrated
in the following example.
Example 5.3.3 (Conditional probability and Bayes’ rule with discrete and continuous
random variables) A bit sent by a transmitter is modeled as a random variable X taking values
0 and 1 with equal probability. The corresponding observation at the receiver is modeled by a
real-valued random variable Y . The conditional distribution of Y given X = 0 is N(0, 4). The
conditional distribution of Y given X = 1 is N(10, 4). This might happen, for example, with
on-off signaling, where we send a signal to send 1, and send nothing when we want to send 0.
The receiver therefore sees signal plus noise if 1 is sent, and sees only noise if 0 is sent, and the
observation Y , presumably obtained by processing the received signal, has zero mean if 0 is sent,
and nonzero mean if 1 is sent.
(a) Write down the conditional densities of Y given X = 0 and X = 1, respectively.
(b) Find P [Y = 7|X = 0], P [Y = 7|X = 1] and P [Y = 7].
(c) Find P [Y ≥ 7|X = 0].
(d) Find P [Y ≥ 7|X = 1].
(e) Find P [X = 0|Y = 7].
Solution to (a): We simply plug in numbers into the expression (5.21) for the Gaussian density
to obtain:
p(y|x = 0) = (1/√(8π)) e^{−y²/8},   p(y|x = 1) = (1/√(8π)) e^{−(y−10)²/8}
Solution to (b): Conditioned on X = 0, Y is a continuous random variable, so the probability
of taking a particular value is zero. Thus, P [Y = 7|X = 0] = 0. By the same reasoning,
P [Y = 7|X = 1] = 0. The unconditional probability is given by the law of total probability:
P[Y = 7] = P[Y = 7|X = 0]P[X = 0] + P[Y = 7|X = 1]P[X = 1] = 0
Solution to (c): Since Y is a continuous random variable conditioned on X = 0, we integrate its conditional density:
P[Y ≥ 7|X = 0] = ∫_7^∞ p(y|x = 0) dy = ∫_7^∞ (1/√(8π)) e^{−y²/8} dy
We shall see in Section 5.6 how to express such probabilities involving Gaussian densities in com-
pact form using standard functions (which can be evaluated using built-in functions in Matlab),
but for now, we leave the desired probability in terms of the integral given above.
Solution to (d): This is analogous to (c), except that we integrate the conditional probability of
Y given X = 1:
P[Y ≥ 7|X = 1] = ∫_7^∞ p(y|x = 1) dy = ∫_7^∞ (1/√(8π)) e^{−(y−10)²/8} dy
Solution to (e): Now we want to apply Bayes' rule to find P[X = 0|Y = 7]. But we know from
(b) that the event {Y = 7} has zero probability. How do we condition on an event that never
happens? The answer is that we define P[X = 0|Y = 7] to be the limit of P[X = 0|Y ∈ (7 −
ε, 7 + ε)] as ε → 0. For any ε > 0, the event that we are conditioning on, {Y ∈ (7 − ε, 7 + ε)}, has
nonzero probability, and we can show by methods beyond our present scope that one does get a well-defined limit as ε
tends to zero. However, we do not need to worry about such technicalities when computing this
conditional probability: we can simply compute it (for an arbitrary value of Y = y) as
P[X = 0|Y = y] = pY|X(y|0)P[X = 0]/pY(y) = pY|X(y|0)P[X = 0] / (pY|X(y|0)P[X = 0] + pY|X(y|1)P[X = 1])
Plugging in y = 7 and the conditional densities from (a), we obtain P[X = 0|Y = 7] = e^{−49/8}/(e^{−49/8} + e^{−9/8}) = 1/(1 + e^{5}) ≈ 0.0067.
Before seeing Y , we knew only that 0 or 1 were sent with equal probability. After seeing Y = 7,
however, our model tells us that 1 was far more likely to have been sent. This is of course what we
want in a reliable communication system: we begin by not knowing the transmitted information
at the receiver (otherwise there would be no point in sending it), but after seeing the received
signal, we can infer it with high probability. We shall see many more such computations in the
next chapter: conditional distributions and probabilities are fundamental to principled receiver
design.
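For concreteness, the posterior computation above is easy to carry out numerically. The following Matlab fragment (our sketch, with variable names of our own choosing) evaluates Bayes' rule for the model of Example 5.3.3:
%Posterior probability P[X=0|Y=y] for the on-off signaling model of Example 5.3.3
y = 7; v2 = 4;                               %observed value and conditional variance
p0 = exp(-y^2/(2*v2))/sqrt(2*pi*v2);         %p(y|x=0), N(0,4) density at y
p1 = exp(-(y-10)^2/(2*v2))/sqrt(2*pi*v2);    %p(y|x=1), N(10,4) density at y
posterior0 = p0*0.5/(p0*0.5 + p1*0.5)        %P[X=0|Y=7], approximately 0.0067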
Independent Random Variables: Random variables X1 , ..., Xn are independent if
P [X1 ∈ A1 , ..., Xn ∈ An ] = P [X1 ∈ A1 ]...P [Xn ∈ An ]
for any subsets A1 , ..., An . That is, events defined in terms of values taken by these random vari-
ables are independent of each other. This implies, for example, that the conditional probability
of an event defined in terms of one of these random variables, conditioned on events defined in
terms of the other random variables, equals the unconditional probability:
P [X1 ∈ A1 |X2 ∈ A2 , ..., Xn ∈ An ] = P [X1 ∈ A1 ]
In terms of distributions and densities, independence means that joint distributions are products
of marginal distributions, and joint densities are products of marginal densities.
Joint distribution is product of marginals for independent random variables: If
X1 , ..., Xn are independent, then their joint CDF is a product of the marginal CDFs:
FX1 ,...,Xn (x1 , ..., xn ) = FX1 (x1 )...FXn (xn )
and their joint density (PDF or PMF) is a product of the marginal densities:
pX1 ,...,Xn (x1 , ..., xn ) = pX1 (x1 )...pXn (xn )
Independent and identically distributed (i.i.d.) random variables: We are often inter-
ested in collections of independent random variables in which each random variable has the same
marginal distribution. We call such random variables independent and identically distributed.
Example 5.3.4 (A sum of i.i.d. Bernoulli random variables is a Binomial random variable):
Let X1 , ..., Xn denote i.i.d. Bernoulli random variables with P [X1 = 1] = 1 − P [X1 = 0] = p, and
let Y = X1 + ... + Xn denote their sum. We could think of Xi as denoting whether the ith coin flip (of
a possibly biased coin, if p ≠ 1/2) yields heads, where successive flips have independent outcomes,
so that Y is the number of heads obtained in n flips. For communications applications, Xi could
denote whether the ith bit in a sequence of n bits is incorrectly received, with successive bit
errors modeled as independent, so that Y is the total number of bit errors. The random variable
Y takes discrete values in {0, 1, ..., n}. Its PMF is given by
P[Y = k] = C(n, k) p^k (1 − p)^{n−k} , k = 0, 1, ..., n
where C(n, k) = n!/(k!(n − k)!) denotes the binomial coefficient ("n choose k").
That is, Y ∼ Bin(n, p). To see why, note that Y = k requires that exactly k of the {Xi } take
value 1, with the remaining n − k taking value 0. Let us compute the probability of one such
outcome, {X1 = 1, ..., Xk = 1, Xk+1 = 0, ..., Xn = 0}:
P [X1 = 1, ..., Xk = 1, Xk+1 = 0, ..., Xn = 0] = P [X1 = 1]...P [Xk = 1]P [Xk+1 = 0]...P [Xn = 0]
= pk (1 − p)n−k
Clearly, any other outcome with exactly k ones has the same probability, given the i.i.d. nature
of the {Xi }. We can now sum over the probabilities of these mutually exclusive events, noting
that there are exactly “n choose k” such outcomes (the number of ways in which we can choose
the k random variables {Xi } which take the value one) to obtain the desired PMF.
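A quick simulation (our sketch, not part of the text) confirms that the empirical distribution of a sum of i.i.d. Bernoulli random variables matches the Bin(n, p) PMF:
%Empirical check that a sum of n i.i.d. Bernoulli(p) variables is Bin(n,p)
n = 10; p = 0.3; ntrials = 100000;
X = (rand(ntrials,n) < p);            %each row is one realization of X1,...,Xn
Y = sum(X,2);                         %row sums are Binomial(n,p) samples
k = 0:n;
pmf_emp = zeros(1,n+1);
for i = 1:n+1
    pmf_emp(i) = mean(Y == k(i));     %relative frequency of Y = k
end
pmf_theory = arrayfun(@(kk) nchoosek(n,kk), k).*(p.^k).*((1-p).^(n-k));
[pmf_emp; pmf_theory]                 %the two rows should be close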
Density of sum of independent random variables: Suppose that X1 and X2 are indepen-
dent continuous random variables, and let Y = X1 + X2 . Then the PDF of Y is a convolution
of the PDFs of X1 and X2 :
pY(y) = (pX1 ∗ pX2)(y) = ∫_{−∞}^{∞} pX1(x1) pX2(y − x1) dx1
For discrete random variables, the same result holds, except that the PMF is given by a discrete-
time convolution of the PMFs of X1 and X2 .
Figure 5.9: The sum of two independent uniform random variables has a PDF with trapezoidal
shape, obtained by convolving two boxcar-shaped PDFs.
Example 5.3.5 (Sum of two uniform random variables) Suppose that X1 is uniformly
distributed over [0, 1], and X2 is uniformly distributed over [−1, 1]. Then Y = X1 + X2 takes
values in the interval [−1, 2], and its density is the convolution shown in Figure 5.9.
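The convolution in Figure 5.9 is easy to reproduce numerically; the following fragment (our sketch) approximates the two densities on a grid and convolves them:
%Numerical convolution of the two uniform densities in Example 5.3.5
dx = 0.001; x = -2:dx:2;              %common grid for both densities
p1 = double(x >= 0 & x <= 1);         %uniform over [0,1]
p2 = 0.5*double(x >= -1 & x <= 1);    %uniform over [-1,1]
pY = conv(p1,p2)*dx;                  %approximates the density of Y = X1 + X2
y = 2*x(1) + (0:length(pY)-1)*dx;     %abscissa for the convolution output
plot(y,pY); xlabel('y'); ylabel('p_Y(y)');   %trapezoid supported on [-1,2]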
Of particular interest to us are jointly Gaussian random variables, which we discuss in more
detail in Section 5.6.
Notational simplification: In the preceding definitions, we have distinguished between differ-
ent random variables by using subscripts. For example, the joint density of X and Y is denoted
by pX,Y (x, y), where X, Y denote the random variables, and x, y, are dummy variables that we
might, for example, integrate over when evaluating a probability. We could easily use some other
notation for the dummy variables, e.g., the joint density could be denoted as pX,Y (u, v). After
all, we know that we are talking about the joint density of X and Y because of the subscripts.
However, carrying around the subscripts is cumbersome. Therefore, from now on, when there
is no scope for confusion, we drop the subscripts and use the dummy variables to also denote
the random variables we are talking about. For example, we now use p(x, y) as shorthand for
pX,Y (x, y), choosing the dummy variables to be lower case versions of the random variables they
are associated with. Similarly, we use p(x) to denote the density of X, p(y) to denote the density
of Y , and p(y|x) to denote the conditional density of Y given X. Of course, we revert to the
subscript-based notation whenever there is any possibility of confusion.
(Figure: a function of a random variable: each outcome ω in the sample space Ω maps to X(ω), which in turn maps to Y(ω) = g(X(ω)).)
FY(y) = P[Y ≤ y] = P[g(X) ≤ y] = P[X ∈ A(y)]
where A(y) = {x : g(x) ≤ y}. We can now use the CDF or density of X to evaluate the extreme
right-hand side. Once we find the CDF of Y , we can find the PMF or PDF as usual.
(Figure: y = x² together with the density p(x); the range of X corresponding to {Y ≤ y} is the interval [−√y, √y].)
(The CDF and PDF are zero for y < 0, since Y only takes nonnegative values.)
Method 2 (find the PDF directly): For differentiable g(x) and continuous random variables,
we can compute the PDF directly. Suppose that g(x) = y is satisfied for x = x1, ..., xm. We
can then express xi as a function of y: xi = hi(y). For g(x) = x², this corresponds to x1 = √y
and x2 = −√y. The probability of X lying in a small interval [xi, xi + ∆x] is approximately
pX (xi )∆x, where we take the increment ∆x > 0. For smooth g, this corresponds to Y lying in a
small interval around y, where we need to sum up the probabilities corresponding to all possible
values of x that get us near the desired value of y. We therefore get
pY(y) |∆y| = Σ_{i=1}^{m} pX(xi) ∆x
where we take the magnitude of the Y increment ∆y because a positive increment in x can cause
a positive or negative increment in g(x), depending on the slope at that point. We therefore
obtain
pY(y) = Σ_{i=1}^{m} pX(xi)/|dy/dx| , evaluated at xi = hi(y)   (5.27)
Example 5.4.2 (application of Method 2) For the setting of Example 5.4.1, we wish to find
the PDF using Method 2. For y = g(x) = x², we have x = ±√y (we only consider y ≥ 0, since
the PDF is zero for y < 0), with derivative dy/dx = 2x. We can now apply (5.27) to obtain:
pY(y) = pX(√y)/|2√y| + pX(−√y)/|−2√y| = e^{−√y}/(2√y) , y ≥ 0
as before.
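The result can also be checked by simulation; a minimal sketch (ours), assuming X is exponential with unit rate as in Example 5.4.1, is:
%Simulation check of p_Y(y) = exp(-sqrt(y))/(2*sqrt(y)) for Y = X^2, X exponential(1)
nsamples = 100000;
X = -log(rand(nsamples,1));           %exponential(1) samples via the inverse CDF
Y = X.^2;
centers = 0.1:0.2:9.9;                %histogram bins covering [0,10]
counts = hist(Y(Y <= 10),centers);    %discard the few samples beyond 10 for plotting
pY_emp = counts/(nsamples*0.2);       %normalize counts to a density estimate
pY_theory = exp(-sqrt(centers))./(2*sqrt(centers));
plot(centers,pY_emp,'o',centers,pY_theory,'-'); xlabel('y'); ylabel('p_Y(y)');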
Since Method 1 starts from the definition of CDF, it generalizes to multiple random variables
(i.e., random vectors) in a straightforward manner, at least in principle. For example, suppose
that Y1 = g1 (X1 , X2 ) and Y2 = g2 (X1 , X2 ). Then the joint CDF of Y1 and Y2 is given by
FY1 ,Y2 (y1 , y2 ) = P [Y1 ≤ y1 , Y2 ≤ y2 ] = P [g1 (X1 , X2 ) ≤ y1 , g2 (X1 , X2 ) ≤ y2 ] = P [(X1 , X2 ) ∈ A(y1 , y2)]
where A(y1 , y2 ) = {(x1 , x2 ) : g1 (x1 , x2 ) ≤ y1 , g2 (x1 , x2 ) ≤ y2 }. In principle, we can now use the
joint distribution to compute the preceding probability for each possible value of (y1 , y2 ). In
general, Method 1 works for Y = g(X), where Y is an n-dimensional random vector which is
a function of an m-dimensional random vector X (in the preceding, we considered m = n =
2). However, evaluating probabilities involving m-dimensional random vectors can get pretty
complicated even for m = 2. A generalization of Method 2 is often preferred as a way of directly
obtaining PDFs when the functions involved are smooth enough, and when m = n. We review
this next.
Method 2 for random vectors: Suppose that Y = (Y1 , ..., Yn )T is an n × 1 random vector
which is a function of another n × 1 vector X = (X1 , ..., Xn )T . That is, Y = g(X), or Yk =
gk (X1 , ..., Xn ), k = 1, .., n. As before, suppose that y = g(x) has m solutions, x1 , ..., xm , with the
ith solution written in terms of y as xi = hi (y). The probability of Y lying in an infinitesimal
volume is now given by
pY(y) |dy| = Σ_{i=1}^{m} pX(xi) |dx|
In order to relate the lengths of the vector increments |dy| and |dx|, it no longer suffices to
consider a scalar derivative. We now need the Jacobian matrix of partial derivatives of y = g(x)
with respect to x, defined as:
J(y; x) = [ ∂y1/∂x1 ··· ∂y1/∂xn ; ... ; ∂yn/∂x1 ··· ∂yn/∂xn ]   (5.28)
that is, the (i, j)th entry of J(y; x) is ∂yi/∂xj. The infinitesimal volumes are related as |dy| = |det(J(y; x))| |dx|, where det(M) denotes the determinant of a square matrix M. Thus, if y = g(x) has m solutions,
x1 , ..., xm , with the ith solution written in terms of y as xi = hi (y), then the density at y is
given by
pY(y) = Σ_{i=1}^{m} pX(xi)/|det(J(y; x))| , evaluated at xi = hi(y)   (5.29)
Depending on how the functional relationship between X and Y is specified, it might sometimes
be more convenient to find the Jacobian of x with respect to y:
J(x; y) = [ ∂x1/∂y1 ··· ∂x1/∂yn ; ... ; ∂xn/∂y1 ··· ∂xn/∂yn ]
We can use this in (5.29) by noting that the two Jacobian matrices for a given pair of values (x, y)
are inverses of each other:
J(x; y) = (J(y; x))−1
This implies that their determinants are reciprocals of each other:
det(J(x; y)) = 1/det(J(y; x))
x1 = r cos φ , x2 = r sin φ
so that
J(rect; polar) = [ ∂x1/∂r  ∂x1/∂φ ; ∂x2/∂r  ∂x2/∂φ ] = [ cos φ  −r sin φ ; sin φ  r cos φ ]
We see that
det(J(rect; polar)) = r cos²φ + r sin²φ = r
Noting that the rectangular-polar transformation is one-to-one, we have from (5.31) that
pR,Φ(r, φ) = pX1,X2(x1, x2) |det(J(rect; polar))| evaluated at x1 = r cos φ, x2 = r sin φ   (5.33)
= r pX1,X2(r cos φ, r sin φ) , r ≥ 0, 0 ≤ φ ≤ 2π
For X1, X2 i.i.d. N(0, 1), this evaluates to pR,Φ(r, φ) = (r/(2π)) e^{−r²/2}, which factors into the product of the marginal densities
pR(r) = r e^{−r²/2} , r ≥ 0
and
pΦ(φ) = 1/(2π) , 0 ≤ φ ≤ 2π
The amplitude R in this case follows a Rayleigh distribution, while the phase Φ is uniformly
distributed over [0, 2π].
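The Rayleigh amplitude and uniform phase are easy to verify by simulation. A minimal Matlab sketch (ours) is:
%Rectangular-to-polar transformation of i.i.d. N(0,1) samples
nsamples = 100000;
x1 = randn(nsamples,1); x2 = randn(nsamples,1);
r = sqrt(x1.^2 + x2.^2);                         %amplitude
phi = mod(atan2(x2,x1),2*pi);                    %phase folded into [0,2*pi]
centers = 0.05:0.1:3.95;                         %bins for the amplitude histogram
pR_emp = hist(r(r <= 4),centers)/(nsamples*0.1); %empirical amplitude density
pR_theory = centers.*exp(-centers.^2/2);         %Rayleigh density
subplot(1,2,1); plot(centers,pR_emp,'o',centers,pR_theory,'-'); xlabel('r');
pcenters = pi/20:pi/10:2*pi-pi/20;               %bins for the phase histogram
pPhi_emp = hist(phi,pcenters)/(nsamples*pi/10);  %empirical phase density
subplot(1,2,2); plot(pcenters,pPhi_emp,'o',pcenters,ones(size(pcenters))/(2*pi),'-'); xlabel('phi');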
5.5 Expectation
We now discuss computation of statistical averages, which are often the performance measures
based on which a system design is evaluated.
Expectation: The expectation, or statistical average, of a function of a random variable X is
defined as
E[g(X)] = ∫ g(x) p(x) dx , continuous random variable
E[g(X)] = Σ_x g(x) p(x) , discrete random variable   (5.34)
Note that the expectation of a deterministic constant, therefore, is simply the constant itself.
Expectation is a linear operator: We have
E[a1 g1(X) + a2 g2(X)] = a1 E[g1(X)] + a2 E[g2(X)]
for any constants a1, a2 and functions g1, g2.
Mean: The mean of a random variable X is E[X].
Variance: The variance of a random variable X is a measure of how much it fluctuates around
its mean:
var(X) = E[(X − E[X])²]   (5.35)
Expanding out the square, we have
var(X) = E[X² − 2X E[X] + (E[X])²]
Using the linearity of expectation (and noting that E[X] is a constant), we can simplify to obtain the following alternative formula
for variance:
for variance:
var(X) = E[X 2 ] − (E[X])2 (5.36)
The square root of the variance is called the standard deviation.
Effect of Scaling and Translation: For Y = aX + b, it is left as an exercise to show that
E[Y] = aE[X] + b , var(Y) = a² var(X)   (5.37)
Normalizing to zero mean and unit variance: We can specialize (5.37) to Y = (X − E[X])/√var(X), to
see that E[Y] = 0 and var(Y) = 1.
Example 5.5.1 (PDF after scaling and translation): If X has density pX (x), then Y =
(X − a)/b has density
pY (y) = |b|pX (by + a) (5.38)
This follows from a straightforward application of Method 2 in Section 5.4. Specializing to a
Gaussian random variable X ∼ N(m, v²) with mean m and variance v² (we verify in Example 5.5.3 that these are indeed the mean and variance), consider a normalized version Y = (X − m)/v. Applying (5.38) to the Gaussian
density, we obtain:
pY(y) = (1/√(2π)) e^{−y²/2}
which can be recognized as an N(0, 1) density. Thus, if X ∼ N(m, v²), then Y = (X − m)/v ∼ N(0, 1)
is a standard Gaussian random variable. This enables us to express probabilities involving
Gaussian random variables compactly in terms of the CDF and CCDF of a standard Gaussian
random variable, as we see later when we deal extensively with Gaussian random variables when
modeling digital communication systems.
Moments: The nth moment of a random variable X is defined as E[X n ]. From (5.36), we see
that specifying the mean and variance is equivalent to specifying the first and second moments.
Indeed, it is worth rewriting (5.36) as an explicit reminder that the second moment is the sum
of the mean and variance:
E[X 2 ] = (E[X])2 + var(X) (5.39)
Similarly, using integration by parts twice, we can show that
E[X²] = 2/λ²
Using (5.36), we obtain
var(X) = E[X²] − (E[X])² = 1/λ²   (5.41)
In general, we can use repeated integration by parts to evaluate higher moments of the exponential
random variable to obtain
E[Xⁿ] = ∫_0^∞ xⁿ λ e^{−λx} dx = n!/λⁿ , n = 1, 2, 3, ...
As a natural follow-up to the computations in the preceding example, let us introduce the gamma
function, which is useful for evaluating integrals associated with expectation computations for
several important random variables.
Gamma function: The Gamma function, Γ(x), is defined as
Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt , x > 0
Using integration by parts, it can be shown that
Γ(x + 1) = x Γ(x)   (5.42)
Noting that Γ(1) = 1, we can now use induction to specify the Gamma function for integer
arguments:
Γ(n) = (n − 1)! , n = 1, 2, 3, ...   (5.43)
This is exactly the same computation as we did in Example 5.5.2: Γ(n) equals the (n − 1)th
moment of an exponential random variable with λ = 1 (and hence mean 1/λ = 1).
The Gamma function can also be computed for non-integer arguments. Just as integer arguments
of the Gamma function are useful for exponential random variables, "integer-plus-half" arguments
are useful for evaluating the moments of Gaussian random variables. We can evaluate these using
(5.42) given the value of the gamma function at x = 1/2.
Γ(1/2) = ∫_0^∞ t^{−1/2} e^{−t} dt = √π   (5.44)
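These values are easy to check numerically using the built-in gamma function in Matlab (a quick sketch of ours):
%Numerical checks of the Gamma function identities using Matlab's gamma()
n = 1:6;
[gamma(n); factorial(n-1)]        %Gamma(n) versus (n-1)!, per (5.43)
[gamma(0.5) sqrt(pi)]             %Gamma(1/2) versus sqrt(pi), per (5.44)
[gamma(1.5) 0.5*gamma(0.5)]       %Gamma(3/2) = (1/2)Gamma(1/2)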
Example 5.5.3 (Mean and variance of a Gaussian random variable): We now show that
X ∼ N(m, v 2 ) has mean m and variance v 2 . The mean of X is given by the following expression:
E[X] = ∫_{−∞}^{∞} x (1/√(2πv²)) e^{−(x−m)²/(2v²)} dx
Let us first consider the change of variables t = (x − m)/v, so that dx = v dt. Then
E[X] = ∫_{−∞}^{∞} (tv + m) (1/√(2πv²)) e^{−t²/2} v dt
Note that t e^{−t²/2} is an odd function, and therefore integrates out to zero over the real line. We
therefore obtain
E[X] = m ∫_{−∞}^{∞} (1/√(2πv²)) e^{−t²/2} v dt = m ∫_{−∞}^{∞} (1/√(2π)) e^{−t²/2} dt = m
recognizing that the integral on the extreme right-hand side is the N(0, 1) PDF, which must
integrate to one. The variance is given by
var(X) = E[(X − m)²] = ∫_{−∞}^{∞} (x − m)² (1/√(2πv²)) e^{−(x−m)²/(2v²)} dx
With a change of variables t = (x − m)/v as before, we obtain
var(X) = v² ∫_{−∞}^{∞} t² (1/√(2π)) e^{−t²/2} dt = 2v² ∫_0^∞ t² (1/√(2π)) e^{−t²/2} dt
since the integrand is an even function of t. Substituting z = t²/2, so that dz = t dt = √(2z) dt,
we obtain
var(X) = 2v² ∫_0^∞ 2z (1/√(2π)) e^{−z} dz/√(2z) = 2v² (1/√π) ∫_0^∞ z^{1/2} e^{−z} dz
= 2v² (1/√π) Γ(3/2) = v²
since Γ(3/2) = (1/2)Γ(1/2) = √π/2.
The change of variables in the computations in the preceding example is actually equivalent
to transforming the N(m, v 2 ) random variable that we started with to a standard Gaussian
N(0, 1) random variable as in Example 5.5.1. As we mentioned earlier (this is important enough
to be worth repeating), when we handle Gaussian random variables more extensively in later
chapters, we prefer making the transformation up front when computing probabilities, rather
than changing variables inside integrals.
As a final example, we show that the mean of a Poisson random variable with parameter λ is
equal to λ.
E[X] = Σ_{k=0}^{∞} k (λᵏ/k!) e^{−λ} = Σ_{k=1}^{∞} k (λᵏ/k!) e^{−λ}
where we have dropped the k = 0 term from the extreme right-hand side, since it does not
contribute to the mean. Noting that k/k! = 1/(k − 1)!, we have
E[X] = Σ_{k=1}^{∞} (λᵏ/(k − 1)!) e^{−λ} = λ e^{−λ} Σ_{k=1}^{∞} λ^{k−1}/(k − 1)! = λ
since
Σ_{k=1}^{∞} λ^{k−1}/(k − 1)! = Σ_{l=0}^{∞} λˡ/l! = e^{λ}
where we set l = k − 1 to get an easily recognized form for the series expansion of an exponential.
E[g1 (X1 )...gn (Xn )] = E[g1 (X1 )]...E[gn (Xn )] , X1 , ..., Xn independent (5.46)
E[(X1 + X2 )2 ] = 2 + 13 − 6 = 9
Variance is a measure of how a random variable fluctuates around its means. Covariance, defined
next, is a measure of how the fluctuations of two random variables around their means are
correlated.
Covariance: The covariance of X1 and X2 is defined as
cov(X1, X2) = E[(X1 − E[X1])(X2 − E[X2])] = E[X1 X2] − E[X1]E[X2]
Variance is the covariance of a random variable with itself: It is immediate from the
definition that
var(X) = cov(X, X)
Uncorrelated random variables need not be independent: Consider X ∼ N(0, 1) and
Y = X². We see that E[XY] = E[X³] = 0 by the symmetry of the N(0, 1) density around
the origin, so that
cov(X, Y ) = E[XY ] − E[X]E[Y ] = 0
Clearly, X and Y are not independent, since knowing the value of X determines the value of Y .
As we discuss in the next section, uncorrelated jointly Gaussian random variables are indeed
independent. The joint distribution of such random variables is determined by means and co-
variances, hence we also postpone more detailed discussion of covariance computation until our
study of joint Gaussianity.
Gaussian random variable: X is a Gaussian random variable if its density takes the form
p(x) = (1/√(2πv²)) exp(−(x − m)²/(2v²)) , −∞ < x < ∞   (5.49)
where m = E[X] is the mean of X, and v 2 = var(X) is the variance of X. The Gaussian density
is therefore completely characterized by its mean and variance.
Notation for Gaussian distribution: We use N(m, v 2 ) to denote a Gaussian distribution
with mean m and variance v 2 , and use the shorthand X ∼ N(m, v 2 ) to denote that a random
variable X follows this distribution.
We have already noted the characteristic bell shape of the Gaussian PDF in the example plotted
in Figure 5.5: the bell is centered around the mean, and its width is determined by the variance.
We now develop a detailed framework for efficient computations involving Gaussian random
variables.
Standard Gaussian random variable: A zero mean, unit variance Gaussian random variable,
X ∼ N(0, 1), is termed a standard Gaussian random variable.
An important property of Gaussian random variables is that they remain Gaussian under scaling
and translation. Suppose that X ∼ N(m, v 2 ). Define Y = aX + b, where a, b are constants
(assume a ≠ 0 to avoid triviality). The density of Y can be found as follows:
p(y) = p(x)/|dy/dx| evaluated at x = (y − b)/a
Noting that dy/dx = a, and plugging in (5.49), we obtain
p(y) = (1/(|a|√(2πv²))) exp(−((y − b)/a − m)²/(2v²))
= (1/√(2πa²v²)) exp(−(y − (am + b))²/(2a²v²))
Comparing with (5.49), we can see that Y is also Gaussian, with mean mY = am + b and variance
vY² = a²v². This is important enough to summarize and restate.
Gaussianity is preserved under scaling and translation
If X ∼ N(m, v 2 ), then Y = aX + b ∼ N(am + b, a2 v 2 ).
As a consequence of the preceding result, any Gaussian random variable can be scaled and
translated to obtain a “standard” Gaussian random variable with zero mean and unit variance.
For X ∼ N(m, v²), Y = aX + b ∼ N(0, 1) requires am + b = 0 and a²v² = 1, which is satisfied by a = 1/v and b = −m/v.
That is, Y = (X − m)/v ∼ N(0, 1).
Standard Gaussian random variable
A standard Gaussian random variable N(0, 1) has mean zero and variance one.
Conversion of a Gaussian random variable into standard form
If X ∼ N(m, v²), then (X − m)/v ∼ N(0, 1).
As the following example illustrates, this enables us to express probabilities involving any Gaus-
sian random variable as probabilities involving a standard Gaussian random variable.
Example 5.6.1 Suppose that X ∼ N(5, 9). Then (X − 5)/√9 = (X − 5)/3 ∼ N(0, 1). Any
probability involving X can now be expressed as a probability involving a standard Gaussian
random variable. For example,
P [X > 11] = P [(X − 5)/3 > (11 − 5)/3] = P [N(0, 1) > 2]
We therefore set aside special notation for the cumulative distribution function (CDF) Φ(x) and
complementary cumulative distribution function (CCDF) Q(x) of a standard Gaussian random
variable. By virtue of the standard form conversion, we can now express probabilities involving
any Gaussian random variable in terms of the Φ or Q functions. The definitions of these functions
are illustrated in Figure 5.12, and the corresponding formulas are specified below.
Figure 5.12: The Φ and Q functions are obtained by integrating the N(0, 1) density over appro-
priate intervals.
Φ(x) = P[N(0, 1) ≤ x] = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt   (5.50)
Q(x) = P[N(0, 1) > x] = ∫_{x}^{∞} (1/√(2π)) exp(−t²/2) dt   (5.51)
See Figure 5.13 for a plot of these functions. By definition, Φ(x)+Q(x) = 1. Furthermore, by the
symmetry of the Gaussian density around zero, Q(−x) = Φ(x). Combining these observations,
we note that Q(−x) = 1 − Q(x), so that it suffices to consider only positive arguments for the
Q function in order to compute probabilities of interest.
Let us now consider a few more Gaussian probability computations.
(Figure 5.13: plots of the Φ and Q functions versus x.)
Example 5.6.2 X is a Gaussian random variable with mean m = −5 and variance v 2 = 4. Find
expressions in terms of the Q function with positive arguments for the following probabilities:
P [X > 3], P [X < −8], P [X < −1], P [3 < X < 6], P [X 2 − 2X > 15].
Solution: We solve this problem by normalizing X to a standard Gaussian random variable (X − m)/v = (X + 5)/2:
P[X > 3] = P[(X + 5)/2 > (3 + 5)/2 = 4] = Q(4)
P[X < −8] = P[(X + 5)/2 < (−8 + 5)/2 = −1.5] = Φ(−1.5) = Q(1.5)
P[X < −1] = P[(X + 5)/2 < (−1 + 5)/2 = 2] = Φ(2) = 1 − Q(2)
P[3 < X < 6] = P[4 = (3 + 5)/2 < (X + 5)/2 < (6 + 5)/2 = 5.5]
= Φ(5.5) − Φ(4) = (1 − Q(5.5)) − (1 − Q(4)) = Q(4) − Q(5.5)
Computation of the probability that X 2 − 2X > 15 requires that we express this event in terms
of simpler events by factorization:
X 2 − 2X − 15 = X 2 − 5X + 3X − 15 = (X − 5)(X + 3)
This shows that X 2 − 2X > 15, or X 2 − 2X − 15 > 0, if and only if X − 5 > 0 and X + 3 > 0,
or X − 5 < 0 and X + 3 < 0. The first event simplifies to X > 5, and the second to X < −3, so
that the desired probability is a union of two mutually exclusive events. We therefore have
P[X² − 2X > 15] = P[X > 5] + P[X < −3] = Q((5 + 5)/2) + Φ((−3 + 5)/2) = Q(5) + 1 − Q(1)
This normalization also leads to the observation that the probability of an infinitesimal interval [x, x + ∆x] depends only on its
normalized distance from the mean, (x − m)/v, and its normalized length ∆x/v:
P[x ≤ X ≤ x + ∆x] ≈ p(x) ∆x = (1/√(2π)) exp(−((x − m)/v)²/2) (∆x/v)
Relating the Q function to the error function: Mathematical software packages such as
Matlab often list the error function and the complementary error function, defined for x ≥ 0 by
erf(x) = (2/√π) ∫_0^x e^{−t²} dt
erfc(x) = 1 − erf(x) = (2/√π) ∫_x^∞ e^{−t²} dt
Recognizing the form of the N(0, 1/2) density, given by (1/√π) e^{−t²}, we see that
erfc(x) = 2 P[N(0, 1/2) > x] = 2 Q(√2 x)
We can invert this to compute the Q function for positive arguments in terms of the complemen-
tary error function, as follows:
1 x
Q(x) = erfc √ , x≥0 (5.52)
2 2
For x < 0, we can compute Q(x) = 1 − Q(−x) using the preceding equation to evaluate the
right-hand side. While the Communications System Toolbox in Matlab has the Q function built
in as qfunc(·), we provide a Matlab code fragment for computing the Q function based on the
complementary error function (available without subscription to separate toolboxes) below.
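One way to write such a fragment (our sketch of the promised function; save it in a file named qfunction.m so that the later code fragments can call it) is:
function y = qfunction(x)
%Q function computed from the complementary error function, as in (5.52)
y = 0.5*erfc(x/sqrt(2));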
Example 5.6.3 (Binary on-off keying in Gaussian noise) A received sample Y in a com-
munication system is modeled as follows: Y = m + N if 1 is sent, and Y = N if 0 is sent, where
N ∼ N(0, v 2 ) is the contribution of the receiver noise to the sample, and where |m| is a measure
of the signal strength. Assuming that m > 0, suppose that we use the simple decision rule that
splits the difference between the average values of the observation under the two scenarios: say
that 1 is sent if Y > m/2, and say that 0 is sent if Y ≤ m/2. Assuming that both 0 and 1
are equally likely to be sent, the signal power is (1/2)m2 + (1/2)02 = m2 /2. The noise power is
E[N²] = v². Thus, SNR = m²/(2v²).
(a) What is the conditional probability of error, conditioned on 0 being sent?
(b) What is the conditional probability of error, conditioned on 1 being sent?
(c) What is the (unconditional) probability of error if 0 and 1 are equally likely to have been
sent?
(d) What is the error probability for SNR of 13 dB?
Solution:
(a) Since Y ∼ N(0, v 2 ) given that 0 is sent, the conditional probability of error is given by
Pe|0 = P[say 1|0 sent] = P[Y > m/2|0 sent] = Q((m/2 − 0)/v) = Q(m/(2v))
(b) Since Y ∼ N(m, v 2 ) given that 1 is sent, the conditional probability of error is given by
Pe|1 = P[say 0|1 sent] = P[Y ≤ m/2|1 sent] = Φ((m/2 − m)/v) = Φ(−m/(2v)) = Q(m/(2v))
(c) If π0 is the probability of sending 0, then the unconditional error probability is given by
Pe = π0 Pe|0 + (1 − π0) Pe|1 = Q(m/(2v)) = Q(√(SNR/2))
regardless of π0 for this particular decision rule.
(d) For an SNR of 13 dB, we have SNR(raw) = 10^{SNR(dB)/10} = 10^{1.3} ≈ 20, so that the error
probability evaluates to Pe = Q(√(SNR/2)) ≈ Q(√10) ≈ 7.8 × 10⁻⁴.
Figure 5.14 shows the probability of error on a log scale, plotted against the SNR in dB. This
is the first example of the many error probability plots that we will see in this chapter.
A Matlab code fragment (cosmetic touches omitted) for generating Figure 5.14 in Example 5.6.3 is given below.
%Plot of error probability versus SNR for on-off keying
snrdb = -5:0.1:15; %vector of SNRs (in dB) for which to evaluate error prob
snr = 10.^(snrdb/10); %vector of raw SNRs
pe = qfunction(sqrt(snr/2)); %vector of error probabilities
%plot error prob on log scale versus SNR in dB
semilogy(snrdb,pe);
ylabel('Error Probability');
xlabel('SNR (dB)');
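As a sanity check (our addition, not part of the original fragment), the analytical formula can be compared against a Monte Carlo simulation of the decision rule at a single SNR:
%Monte Carlo check of the on-off keying error probability at SNR = 13 dB
snr = 10^(13/10); m = 1; v = sqrt(m^2/(2*snr));   %fix m = 1 and solve for the noise std
nbits = 10^6;
bits = (rand(1,nbits) > 0.5);                     %equally likely 0s and 1s
y = m*bits + v*randn(1,nbits);                    %received samples: m + N if 1 sent, N if 0 sent
bitest = (y > m/2);                               %"split the difference" decision rule
pe_sim = mean(bitest ~= bits)                     %simulated error probability
pe_formula = 0.5*erfc(sqrt(snr/2)/sqrt(2))        %Q(sqrt(SNR/2)), about 7.8e-4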
The preceding example illustrates a more general observation for signaling in AWGN: the proba-
bility of error involves terms such as Q(√(a SNR)), where the scale factor a depends on properties
of the signal constellation, and SNR is the signal-to-noise ratio. It is therefore of interest to un-
derstand how the error probability decays with SNR. As shown in Appendix 5.A, there are tight
analytical bounds for the Q function which can be used to deduce that it decays exponentially
with its argument, as stated in the following.
Asymptotics of Q(x) for large arguments: For large x > 0, the exponential decay of the Q
function dominates. We denote this by
Q(x) ≐ e^{−x²/2} , x → ∞   (5.53)
which is shorthand for the following limiting result:
lim_{x→∞} log Q(x)/(−x²/2) = 1   (5.54)
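A short numerical check of this limit (ours):
%Ratio log Q(x)/(-x^2/2) approaches 1 as x grows, consistent with (5.54)
x = [2 4 8 16 32];
qx = 0.5*erfc(x/sqrt(2));        %Q(x) via the complementary error function
ratio = log(qx)./(-x.^2/2)       %approximately 1.89, 1.29, 1.09, 1.03, 1.01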
where we use the linearity of the expectation operator to pull out constants.
Uncorrelatedness: X1 and X2 are said to be uncorrelated if cov(X1 , X2 ) = 0.
Independent random variables are uncorrelated: If X1 and X2 are independent, then
cov(X1 , X2 ) = E[X1 X2 ] − E[X1 ]E[X2 ] = E[X1 ]E[X2 ] − E[X1 ]E[X2 ] = 0
The converse is not true in general; that is, uncorrelated random variables need not be inde-
pendent. However, we shall see that jointly Gaussian uncorrelated random variables are indeed
independent.
Variance: Note that the variance of a random variable is its covariance with itself:
var(X) = cov(X, X) = E[(X − E[X])²] = E[X²] − (E[X])²
The use of matrices and vectors provides a compact way of representing and manipulating means
and covariances, especially using software programs such as Matlab. Thus, for random variables
X1 , ..., Xm , we define the random vector X = (X1 , ..., Xm )T , and arrange the means and pairwise
covariances in a vector and matrix, respectively, as follows.
Mean vector and covariance matrix: Consider an arbitrary m-dimensional random vector
X = (X1 , ..., Xm )T . The m × 1 mean vector of X is defined as mX = E[X] = (E[X1 ], ..., E[Xm ])T .
The m × m covariance matrix CX has (i, j)th entry given by the covariance between the ith and
jth random variables:
CX (i, j) = cov(Xi , Xj ) = E [(Xi − E[Xi ])(Xj − E[Xj ])] = E [Xi Xj ] − E[Xi ]E[Xj ]
More compactly,
CX = E[(X − E[X])(X − E[X])T ] = E[XXT ] − E[X](E[X])T
By Property 1, it is clear that we can always consider zero mean, or centered, versions of random
variables when computing the covariance. An example that frequently arises in performance
analysis of communication systems is a random variable which is a sum of a deterministic term
(e.g., due to a signal), and a zero mean random term (e.g. due to noise). In this case, dropping
the signal term is often convenient when computing variance or covariance.
Affine transformations: For a random vector X, the analogue of scaling and translating a
random variable is a linear transformation using a matrix, together with a translation. Such a
transformation is called an affine transformation. That is, Y = AX+b is an affine transformation
of X, where A is a deterministic matrix and b a deterministic vector.
Example 5.6.4 (Mean and variance after an affine transformation): Let Y = X1 −
2X2 + 4, where X1 has mean -1 and variance 4, X2 has mean 2 and variance 9, and the covariance
cov(X1 , X2 ) = −3. Find the mean and variance of Y .
Solution: The mean is given by
E[Y] = E[X1] − 2E[X2] + 4 = −1 − 2(2) + 4 = −1
The variance is given by
var(Y) = cov(X1 − 2X2 + 4, X1 − 2X2 + 4) = cov(X1 − 2X2, X1 − 2X2)
where the constant drops out because of Property 1. We therefore obtain that
var(Y) = var(X1) + 4 var(X2) − 4 cov(X1, X2) = 4 + 4(9) − 4(−3) = 52
Computations such as those in the preceding example can be compactly represented in terms
of matrices and vectors, which is particularly useful for computations for random vectors. In
general, an affine transformation maps one random vector into another (of possibly different
dimension), and the mean vector and covariance matrix evolve as follows.
Mean and covariance evolution under affine transformation
If X has mean m and covariance C, and Y = AX + b,
then Y has mean mY = Am + b and covariance CY = ACAT .
To see this, first compute the mean vector of Y using the linearity of the expectation operator:
mY = E[Y] = E[AX + b] = A E[X] + b = A mX + b   (5.55)
The covariance matrix then follows by working with the zero mean versions:
CY = E[(Y − mY)(Y − mY)^T] = E[A(X − mX)(X − mX)^T A^T] = A CX A^T   (5.56)
Note that the dimensions of X and Y can be different: X can be m × 1, A can be n × m, and Y,
b can be n × 1, where m, n are arbitrary. We also note below that mean and covariance evolve
separately under such transformations.
Mean and covariance evolve separately under affine transformations: The mean of Y
depends only on the mean of X, and the covariance of Y depends only on the covariance of X.
Furthermore, the additive constant b in the transformation does not affect the covariance, since
it influences only the mean of Y.
Example 5.6.4 redone: We can check that we get the same result as before by setting
mX = [−1; 2] , CX = [4  −3; −3  9]   (5.57)
A = (1  −2) , b = 4
and applying (5.55) and (5.56).
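In Matlab, this check is a one-liner per quantity (our sketch):
%Mean and variance of Y = X1 - 2*X2 + 4 via the affine transformation formulas
mX = [-1; 2]; CX = [4 -3; -3 9];      %mean vector and covariance matrix of (X1,X2)
A = [1 -2]; b = 4;                    %affine transformation Y = A*X + b
mY = A*mX + b                         %should equal -1
CY = A*CX*A'                          %should equal 52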
Jointly Gaussian random variables, or Gaussian random vectors: Random variables
X1 , ..., Xm defined on a common probability space are said to be jointly Gaussian, or the m × 1
random vector X = (X1 , ..., Xm )T is termed a Gaussian random vector, if any linear combination
of these random variables is a Gaussian random variable. That is, for any scalar constants
a1 , ..., am , the random variable a1 X1 + ... + am Xm is Gaussian.
A Gaussian random vector is completely characterized by its mean vector and co-
variance matrix: This is a generalization of the observation that a Gaussian random variable is
completely characterized by its mean and variance. We derive this in Problem 5.47, but provide
an intuitive argument here. The definition of joint Gaussianity only requires us to characterize
the distribution of an arbitrarily chosen linear combination of X1 , ..., Xm . For a Gaussian random
vector X = (X1 , ..., Xm )T , consider Y = a1 X1 + ... + am Xm , where a1 , ..., am can be any scalar
constants. By definition, Y is a Gaussian random variable, and is completely characterized by
its mean and variance. We can compute these in terms of mX and CX using (5.55) and (5.56)
by noting that Y = aT X, where a = (a1 , ..., am )T . Thus,
mY = aT mX
CY = var(Y ) = aT CX a
We have therefore shown that we can characterize the mean and variance, and hence the density,
of an arbitrarily chosen linear combination Y if and only if we know the mean vector mX and
covariance matrix CX . As we see in Problem 5.47, this is the basis for the desired result that
the distribution of Gaussian random vector X is completely characterized by mX and CX .
Notation for joint Gaussianity: We use the notation X ∼ N(m, C) to denote a Gaussian
random vector X with mean vector m and covariance matrix C.
The preceding definitions and observations regarding joint Gaussianity apply even when the
random variables involved do not have a joint density. For example, it is easy to check that,
according to this definition, X1 and X2 = 4X1 −1 are jointly Gaussian. However, the joint density
of X1 and X2 is not well-defined (unless we allow delta functions), since all of the probability
mass in the two-dimensional (x1 , x2 ) plane is collapsed onto the line x2 = 4x1 − 1. Of course,
since X2 is completely determined by X1 , any probability involving X1 , X2 can be expressed in
terms of X1 alone. In general, when the m-dimensional joint density does not exist, probabilities
involving X1 , ..., Xm can be expressed in terms of a smaller number of random variables, and
can be evaluated using a joint density over a lower-dimensional space. A necessary and sufficient
condition for the joint density to exist is that the covariance matrix is invertible.
Joint Gaussian density exists if and only if the covariance matrix is invertible: We
do not prove this result, but discuss it in the context of the two-dimensional density in Example
5.6.5.
Joint Gaussian density: For X = (X1 , ..., Xm ) ∼ N(m, C), if C is invertible, the joint density
exists and takes the following form (we skip the derivation, but see Problem 5.47):
p(x1, ..., xm) = p(x) = (1/√((2π)^m |C|)) exp(−(1/2)(x − m)^T C⁻¹ (x − m))   (5.58)
Example 5.6.5 (Two-dimensional joint Gaussian density) In order to visualize the joint
Gaussian density (this is not needed for the remainder of the development, hence this example
can be skipped), let us consider two jointly Gaussian random variables X and Y . In this case, it
is convenient to define the normalized correlation between X and Y as
ρ(X, Y) = cov(X, Y)/√(var(X) var(Y))   (5.59)
Figure 5.15: Joint Gaussian density and its contours for σX² = 1, σY² = 4 and ρ = −0.5.
Thus, cov(X, Y) = ρ σX σY, where var(X) = σX², var(Y) = σY², and the covariance matrix for the
random vector (X, Y)^T is given by
C = [σX²  ρσXσY; ρσXσY  σY²]   (5.60)
It is shown in Problem 5.46 that |ρ| ≤ 1. For |ρ| = 1, it is easy to check that the covariance matrix
has determinant zero, hence the joint density formula (5.58) cannot be applied. As shown in
Problem 5.46, this has a simple geometric interpretation: |ρ| = 1 corresponds to a situation when
X and Y are affine functions of each other, so that all of the probability mass is concentrated
on a line, hence a two-dimensional density does not exist. Thus, we need the strict inequality
|ρ| < 1 for the covariance matrix to be invertible. Assuming that |ρ| < 1, we plug (5.60) into
(5.58), setting the mean vector to zero without loss of generality (a nonzero mean vector simply
shifts the density). We get the joint density shown in Figure 5.15 for σX² = 1, σY² = 4 and
ρ = −0.5. Since Y has larger variance, the density decays more slowly in Y than in X. The
negative normalized correlation leads to contour plots given by tilted ellipses, corresponding to
setting the quadratic function x^T C⁻¹ x in the exponent of the density to different constants.
Exercise: Show that the ellipses shown in Figure 5.15(b) can be described as
x2 + ay 2 + bxy = c
specifying the values of a and b.
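Plots like those in Figure 5.15 can be reproduced with a few lines of Matlab (our sketch, using the parameters quoted in the caption):
%Contours of the zero-mean two-dimensional Gaussian density of Figure 5.15
sx2 = 1; sy2 = 4; rho = -0.5;
C = [sx2 rho*sqrt(sx2*sy2); rho*sqrt(sx2*sy2) sy2];      %covariance matrix (5.60)
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
Cinv = inv(C);
q = Cinv(1,1)*x.^2 + 2*Cinv(1,2)*x.*y + Cinv(2,2)*y.^2;  %quadratic form in the exponent
p = exp(-q/2)/(2*pi*sqrt(det(C)));                       %joint density (5.58) with zero mean
contour(x,y,p); xlabel('x'); ylabel('y');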
While we hardly ever integrate the joint Gaussian density to compute probabilities, we use its
form to derive many important results. One such result is stated below.
Uncorrelated jointly Gaussian random variables are independent: This follows from
the form of the joint Gaussian density (5.58). If X1 , ..., Xm are pairwise uncorrelated, then the
off-diagonal entries of the covariance matrix C are zero: C(i, j) = 0 for i ≠ j. Thus, C and
C⁻¹ are both diagonal matrices, with diagonal entries given by C(i, i) = vᵢ² and C⁻¹(i, i) = 1/vᵢ²,
i = 1, ..., m, and determinant |C| = v1² ⋯ vm². In this case, we see that the joint density (5.58)
decomposes into a product of marginal densities:
p(x1, ..., xm) = (1/√(2πv1²)) e^{−(x1−m1)²/(2v1²)} ⋯ (1/√(2πvm²)) e^{−(xm−mm)²/(2vm²)} = p(x1) ⋯ p(xm)
so that X1 , ..., Xm are independent.
Recall that, while independent random variables are uncorrelated, the converse need not be true.
However, when we put the additional restriction of joint Gaussianity, uncorrelatedness does imply
independence.
We can now characterize the distribution of affine transformations of jointly Gaussian random
variables. If X is a Gaussian random vector, then Y = AX + b is also Gaussian. To see this,
note that any linear combination of Y1 , ..., Yn equals a linear combination of X1 , ..., Xm (plus a
constant), which is a Gaussian random variable by the Gaussianity of X. Since Y is Gaussian,
its distribution is completely characterized by its mean vector and covariance matrix, which we
have just computed. We can now state the following result.
Joint Gaussianity is preserved under affine transformations
If X ∼ N(m, C), then Y = AX + b ∼ N(Am + b, ACA^T)   (5.61)
We can now apply (5.61) to obtain the mean vector and covariance matrix for Y:
mY = A mX + b = [−4; −1]
CY = A CX A^T = [108  −9; −9  7]
Solution to (b): Since Y1 = 3X1 − 2X2 + 3 ∼ N(−4, 108), the required probability can be written
as
P[3X1 − 2X2 < 5] = P[Y1 < 8] = Φ((8 − (−4))/√108) = Φ(2/√3) = 1 − Q(2/√3)
Solution to (c): Since Z = aX1 + X2 and X1 are jointly Gaussian, they are independent if they
are uncorrelated. The covariance is given by
Discrete time WGN: The noise model N ∼ N(0, σ 2 I) is called discrete time white Gaussian
noise (WGN). The term white refers to the noise samples being uncorrelated and having equal
variance. We will see how such discrete time WGN arises from continuous-time WGN, which we
discuss during our coverage of random processes later in this chapter.
Example 5.6.7 (Binary on-off keying in discrete time WGN) Let us now revisit on-off
keying, explored for scalar observations in Example 5.6.3, for vector observations. The receiver
processes a vector Y = (Y1 , ..., Yn )T of samples modeled as follows: Y = s + N if 1 is sent, and
Y = N if 0 is sent, where s = (s1, ..., sn)^T is the signal, and the noise N = (N1, ..., Nn)^T ∼
N(0, σ 2 I). That is, the noise samples N1 , ..., Nn are i.i.d. N(0, σ 2 ) random variables. Suppose
we use the following correlator-based decision statistic:
Z = s^T Y = Σ_{k=1}^{n} sk Yk
Thus, we have reduced the vector observation to a single number based on which we will make
our decision. The hypothesis framework developed in Chapter 6 will be used to show that this
decision statistic is optimal, in a well-defined sense. For now, we simply accept it as given.
(a) Find the conditional distribution of Z given that 0 is sent.
(b) Find the conditional distribution of Z given that 1 is sent.
(c) Observe from (a) and (b) that we are now back to the setting of Example 5.6.3, with Z now
playing the role of Y. Specify the values of m and v², and the SNR = m²/(2v²), in terms of s and σ².
(d) As in Example 5.6.3, consider the simple decision rule that 1 is sent if Z > m/2, and say
that 0 is sent if Z ≤ m/2. Find the error probability (in terms of the Q function) as a function
of s and σ 2 .
(e) Evaluate the error probability for s = (−2, 2, 1)T and σ 2 = 1/4.
Solution:
(a) If 0 is sent, then Y = N ∼ N(0, σ²I). Applying (5.61) with m = 0, A = s^T, C = σ²I, we
obtain Z = s^T Y ∼ N(0, σ²||s||²).
(b) If 1 is sent, then Y = s + N ∼ N(s, σ 2 I). Applying (5.61) with m = s, A = sT , C = σ 2 I,
we obtain Z = sT Y ∼ N(||s||2 , σ 2 ||s||2 ). Alternatively, sT Y = sT (s + N) = ||s||2 + sT N. Since
sT N ∼ N(0, σ 2 ||s||2) from (a), we simply translate the mean by ||s||2.
(c) Comparing with Example 5.6.3, we see that m = ||s||², v² = σ²||s||², and SNR = m²/(2v²) = ||s||²/(2σ²).
(d) From Example 5.6.3, we know that the decision rule that splits the difference between the
means has error probability
Pe = Pe|0 = Pe|1 = Q(m/(2v)) = Q(||s||/(2σ))
plugging in the expressions for m and v² from (c).
(e) We have ||s||² = 9. Using (d), we obtain Pe = Q(3) = 0.0013.
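The answer to (e) can be verified by simulating the correlator receiver; a minimal sketch (ours) is:
%Monte Carlo check of Example 5.6.7(e): correlator detection of on-off keying
s = [-2; 2; 1]; sigma2 = 1/4; nbits = 10^6;
bits = (rand(1,nbits) > 0.5);                 %equally likely 0s and 1s
Y = s*bits + sqrt(sigma2)*randn(3,nbits);     %each column is s + N if 1 sent, N if 0 sent
Z = s'*Y;                                     %correlator decision statistic
bitest = (Z > norm(s)^2/2);                   %"split the difference" rule with m = ||s||^2
pe_sim = mean(bitest ~= bits)                 %should be close to Q(3) = 0.0013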
Noise is termed colored when it is not white; that is, when the noise samples are correlated and/or
have different variances. We will see later how colored noise arises from linear transformations
on white noise. Let us continue our sequence of examples regarding on-off keying, but now with
colored noise.
Example 5.6.8 (Binary on-off keying in discrete time colored Gaussian noise) As in
the previous example, we have a vector observation Y = (Y1 , ..., Yn )T , with Y = s + N if 1 is
sent, and Y = N if 0 is sent, where s = (s1, ..., sn)^T is the signal. However, we now allow the
noise covariance matrix to be arbitrary: N = (N1 , ..., Nn )T ∼ N(0, CN ).
(a) Consider the decision statistic Z1 = sT Y. Find the conditional distributions of Z1 given 0
sent, and given 1 sent.
(b) Show that Z1 follows the scalar on-off keying model of Example 5.6.3, specifying the parameters
m1 and v1², and SNR1 = m1²/(2v1²), in terms of s and CN.
(c) Find the error probability of the simple decision rule comparing Z1 to the threshold m1 /2.
Pe1 = Q(m1/(2v1)) = Q(||s||²/(2√(s^T CN s)))
(e) For the given example, we find Z1 = s^T Y = 4Y1 − 2Y2 and Z2 = s^T CN⁻¹ Y = (2/3)(7Y1 + Y2).
We can see that the relative weights of the two observations are quite different in the two cases.
Numerical computations using the Matlab script below yield SNRs of 6.2 dB and 9.4 dB, and
error probabilities of 0.07 and 0.02 in the two cases, so that Z2 provides better performance than
Z1 . It can be shown, using the methods of Chapter 6, that Z2 is actually the optimal decision
statistic, both in terms of maximizing SNR and minimizing error probability.
A Matlab code fragment for generating the numerical results in Example 5.6.8(e) is given below.
s = [4; -2];                   %signal vector, so that Z1 = s'*Y = 4*Y1 - 2*Y2
Cn = [1 -1; -1 4];             %noise covariance matrix
m1 = s'*s;                     %mean of Z1 if 1 sent
variance1 = s'*Cn*s;           %variance of Z1
v1 = sqrt(variance1);          %standard deviation
SNR1 = m1^2/(2*variance1);
Pe1 = qfunction(m1/(2*v1));    %error prob for "split the difference" rule using Z1
m2 = s'*inv(Cn)*s; %mean if 1 sent
variance2=s'*inv(Cn)*s; %variance=mean in this case
v2=sqrt(variance2); %standard deviation
SNR2 = m2^2/(2*variance2); %reduces to SNR2= m2/2 in this case
Pe2 = qfunction(m2/(2*v2)); %error prob for "split the difference" rule using Z2
%Compare performance of the two rules
10*log10([SNR1 SNR2]) %SNRs in dB
[Pe1 Pe2] %error probabilities
X(t) = X1 cos 2πfc t − X2 sin 2πfc t   (5.62)
where fc > 0 is a fixed frequency. The waveform X(t) is not a deterministic signal, since
X1 and X2 can take random values on the real line. Indeed, for each time t, X(t) is a random
variable, since it is a linear combination of two random variables X1 and X2 defined on a common
probability space. Moreover, if we pick a number of times t1 , t2 , ..., then the corresponding
samples X(t1 ), X(t2 ), ... are random variables on a common probability space.
Another interpretation of X(t) is obtained by converting (X1 , X2 ) to polar form:
X1 = A cos Θ , X2 = A sin Θ
For X1 , X2 i.i.d. N(0, 1), we know from Problem 5.21 that A is Rayleigh, Θ is uniform over
[0, 2π], and A, Θ are independent. The random process X(t) can be rewritten as
X(t) = A cos(2πfc t + Θ)   (5.63)
random variable. Its distribution is therefore specified by computing its mean and variance, as
follows:
E [X(t)] = E[X1 ] cos 2πfc t − E[X2 ] sin 2πfc t = 0 (5.64)
var (X(t)) = cov (X1 cos 2πfc t − X2 sin 2πfc t, X1 cos 2πfc t − X2 sin 2πfc t)
= cov(X1 , X1 ) cos2 2πfc t + cov(X2 , X2 ) sin2 2πfc t − 2cov(X1 , X2 ) cos 2πfc t sin 2πfc t (5.65)
= cos2 2πfc t + sin2 2πfc t = 1
using cov(Xi , Xi ) = var(Xi ) = 1, i = 1, 2, and cov(X1 , X2 ) = 0 (since X1 , X2 are independent).
Thus, we have X(t) ∼ N(0, 1) for any t.
In this particular example, we can also easily specify the joint distribution of any set of n samples,
X(t1 ), ..., X(tn ), where n can be arbitrarily chosen. The samples are jointly Gaussian, since they
are linear combinations of the jointly Gaussian random variables X1 , X2 . Thus, we only need to
specify their means and pairwise covariances. We have just shown that the means are zero, and
that the diagonal entries of the covariance matrix are one. More generally, the covariance of any
two samples can be computed as follows:
cov (X(ti ), X(tj )) = cov (X1 cos 2πfc ti − X2 sin 2πfc ti , X1 cos 2πfc tj − X2 sin 2πfc tj )
= cov(X1 , X1 ) cos 2πfc ti cos 2πfc tj + cov(X2 , X2 ) sin 2πfc ti sin 2πfc tj
− 2cov(X1 , X2 ) cos 2πfc ti sin 2πfc tj (5.66)
= cos 2πfc ti cos 2πfc tj + sin 2πfc ti sin 2πfc tj
= cos 2πfc (ti − tj )
While we have so far discussed the random process X(t) from a statistical point of view, for
fixed values of X1 and X2 , we see that X(t) is actually a deterministic signal. Specifically, if the
random vector (X1 , X2 ) is defined over a probability space Ω, a particular outcome ω ∈ Ω maps
to a particular realization (X1 (ω), X2(ω)). This in turn maps to a deterministic “realization,” or
“sample path,” of X(t), which we denote as X(t, ω):
X(t, ω) = X1(ω) cos 2πfc t − X2(ω) sin 2πfc t
To see what these sample paths look like, it is easiest to refer to the polar form (5.63):
X(t, ω) = A(ω) cos(2πfc t + Θ(ω))
Thus, as shown in Figure 5.16, different sample paths have different amplitudes, drawn from a
Rayleigh distribution, along with phase shifts drawn from a uniform distribution.
Figure 5.16: Two sample paths for a sinusoid with random amplitude and phase.
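Sample paths like those in Figure 5.16 can be generated with a few lines of Matlab (our sketch; the carrier frequency fc is an arbitrary choice, since it is not specified in the figure):
%Two sample paths of X(t) = X1 cos(2 pi fc t) - X2 sin(2 pi fc t), with X1, X2 i.i.d. N(0,1)
fc = 1; t = 0:0.01:4;
for k = 1:2                                   %one independent draw of (X1,X2) per sample path
    x1 = randn; x2 = randn;
    plot(t, x1*cos(2*pi*fc*t) - x2*sin(2*pi*fc*t)); hold on;
end
xlabel('t'); hold off;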
For the sinusoid with random amplitude and phase, the sample space only needs to be rich
enough to support the two random variables X1 and X2 (or A and Θ), from which we can create
a continuum of random variables X(t, ω), −∞ < t < ∞:
ω → (X1 (ω), X2(ω)) → X(t, ω)
In general, however, the source of randomness can be much richer. Noise in a receiver circuit is
caused by random motion of a large number of charge carriers. A digitally modulated waveform
depends on a sequence of randomly chosen bits. The preceding conceptual framework is general
enough to cover all such scenarios.
Sample paths: We can also interpret a random process as a signal drawn at random from an
ensemble, or collection, of possible signals. The signal we get at a particular random draw is
called a sample path, or realization, of the random process. Once we fix a sample path, it can be
treated like a deterministic signal. Specifically, for each fixed outcome ω ∈ Ω, the sample path
is X(t, ω), which varies only with t. We have already seen examples of samples paths for our
running example in Figure 5.16.
Finite-dimensional distributions: As indicated in Figure 5.17, the samples X(t1 ), ..., X(tn )
from a random process X are mappings from a common sample space to the real line, with
X(ti , ω) denoting the value of the random variable X(ti ) for outcome ω ∈ Ω. The joint distribution
of these random variables depends on the underlying probability measure on the sample space
Ω. We say that we “know” the statistics of a random process if we know the joint statistics of
an arbitrarily chosen finite collection of samples. That is, we know the joint distribution of the
samples X(t1 ), ..., X(tn ), regardless of the number of samples n, and the sampling times t1 , ..., tn .
These joint distributions are called the finite-dimensional distributions of the random process,
with the joint distribution of n samples called an nth order distribution. Thus, while a random
process may be comprised of infinitely many random variables, when we specify its statistics, we
focus on a finite subset of these random variables.
For our running example (5.62), we observed that the samples are jointly Gaussian, and specified
the joint distribution by computing the means and covariances. This is a special case of a broader
class of Gaussian random processes (to be defined shortly) for which it is possible to characterize
finite-dimensional distributions compactly in this fashion. Often, however, it is not possible to
explicitly specify such distributions, but we can still compute useful quantities averaged across
Figure 5.17: Samples of a random process are random variables defined on a common probability
space.
sample paths.
Ensemble averages: Knowing the finite-dimensional distributions enables us to compute sta-
tistical averages across the collection, or ensemble, of sample paths. Such averages are called
ensemble averages. We will be mainly interested in “second order” statistics (involving expecta-
tions of products of at most two random variables), such as means and covariances. We define
these quantities in sufficient generality that they apply to complex-valued random processes, but
specialize to real-valued random processes in most of our computations.
Mean function and autocorrelation function: The mean function of a random process X is defined as mX(t) = E[X(t)], and its autocorrelation function as RX(t1, t2) = E[X(t1)X*(t2)].
Note that RX(t, t) = E[|X(t)|²] is the instantaneous power at time t. The autocovariance function
of X is the autocorrelation function of the zero mean version of X, and is given by
CX (t1 , t2 ) = E[(X(t1 ) − E[X(t1 )])(X(t2 ) − E[X(t2 )])∗ ] = RX (t1 , t2 ) − mX (t1 )m∗X (t2 ) (5.69)
Second order statistics for running example: We have from (5.64) and (5.66) that
mX(t) ≡ 0 , RX(t1, t2) = CX(t1, t2) = cos 2πfc(t1 − t2)
It is interesting to note that the mean function does not depend on t, and that the autocorrelation
and autocovariance functions depend only on the difference of the times t1 − t2 . This implies
that if we shift X(t) by some time delay d, the shifted process X̃(t) = X(t − d) would have the
same mean and autocorrelation functions. Such translation invariance of statistics is interesting
and important enough to merit a formal definition, which we provide next.
Wide sense stationary (WSS) random process: A random process X is said to be WSS if its mean function does not depend on time, mX(t) ≡ mX(0), and its autocorrelation function depends on its arguments only through their difference: RX(t1, t2) = RX(t1 − t2, 0). In this case, we change notation and express the autocorrelation and autocovariance functions in terms of the time difference τ = t1 − t2, writing RX(τ) = E[X(t)X*(t − τ)] and CX(τ).
Second order statistics for running example (new notation): With this new notation,
we have
mX ≡ 0 , RX (τ ) = CX (τ ) = cos 2πfc τ (5.73)
A WSS random process has shift-invariant second order statistics. An even stronger notion of
shift-invariance is stationarity.
Stationary random process: A random process X(t) is said to be stationary if it is statistically
indistinguishable from a delayed version of itself. That is, X(t) and X(t − d) have the same
statistics for any delay d ∈ (−∞, ∞).
Running example: The sinusoid with random amplitude and phase in our running example is
stationary. To see this, it is convenient to consider the polar form in (5.63): X(t) = A cos(2πfc t+
Θ), where Θ is uniformly distributed over [0, 2π]. Note that
Y (t) = X(t − d) = A cos(2πfc (t − d) + Θ) = A cos(2πfc t + Θ′ )
where Θ′ = Θ − 2πfc d modulo 2π is uniformly distributed over [0, 2π]. Thus, X and Y are
statistically indistinguishable.
Stationarity implies wide sense stationarity: For a stationary random process X, the mean
function satisfies
mX (t) = mX (t − d)
for any t, regardless of the value of d. Choosing d = t, we infer that
mX (t) = mX (0) (5.74)
That is, the mean function is a constant. Similarly, the autocorrelation function satisfies
RX (t1 , t2 ) = RX (t1 − d, t2 − d)
for any t1 , t2 , regardless of the value of d. Setting d = t2 , we have that
RX (t1 , t2 ) = RX (t1 − t2 , 0) (5.75)
Thus, a stationary process is also WSS.
While our running example was easy to analyze, in general, stationarity is a stringent requirement
that is not easy to verify. For our needs, the weaker concept of wide sense stationarity typically
suffices. Further, we are often interested in Gaussian random processes (defined shortly), for
which wide sense stationarity actually implies stationarity.
(Figure 5.18: conceptual measurement of the PSD: x(t) is passed through an ideal narrowband filter H(f) of width ∆f centered at ν, followed by a power meter whose output is Sx(ν)∆f.)
For deterministic finite-energy signals, we introduced the concept of energy spectral density,
which specifies how the energy in a signal is distributed in different frequency bands, in Chapter
2. Similarly, we defined power spectral density (PSD) for finite-power deterministic signals in
Chapter 4, “just in time” to characterize the spectral occupancy of digital communication signals.
This deterministic framework directly applies to a given sample path of a random process, and
indeed, this is what we did when we computed the PSD of linearly modulated signals in Chapter
4. While we did not mention the term “random process” then (for the good reason that we had
not introduced it yet), if we model the information encoded into a digitally modulated signal as
random, then the latter is indeed a random process. Let us now begin by restating the definition
of PSD in Chapter 4.
Power Spectral Density: The power spectral density (PSD), Sx (f ), for a finite-power signal
x(t), which we can now think of as a sample path of a random process, is defined through the
conceptual measurement depicted in Figure 5.18. Pass x(t) through an ideal narrowband filter
with transfer function
Hν(f) = 1 for ν − ∆f/2 < f < ν + ∆f/2, and Hν(f) = 0 else
The PSD evaluated at ν, Sx (ν), is defined as the measured power at the filter output, divided
by the filter width ∆f (in the limit as ∆f → 0).
The power meter in Figure 5.18 is averaging over time to estimate the power in a frequency slice
of a particular sample path. Let us review how this is done before discussing how to average
across sample paths to define PSD in terms of an ensemble average.
Periodogram-based PSD estimation: The PSD can be estimated by computing Fourier
transform over a finite observation interval, and dividing its magnitude squared (which is the
energy spectral density) by the length of the observation interval. The time-windowed version of
x is defined as
xTo(t) = x(t) I_{[−To/2, To/2]}(t)   (5.76)
where To is the length of the observation interval. The Fourier transform of xTo (t) is denoted as
XTo (f ) = F (xTo )
The energy spectral density of xTo is therefore |XTo (f )|2, and the PSD estimate is given by
Ŝx(f) = |XTo(f)|²/To   (5.77)
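The estimate (5.77) is easy to compute for a sampled sample path; the sketch below (ours, with arbitrary choices for the sampling rate, window length, and fc) applies it to the running example:
%Periodogram-style PSD estimate (5.77) for one sample path of the running example
fs = 64; To = 32; fc = 4;                     %sampling rate, observation interval, frequency
t = 0:1/fs:To-1/fs;
x1 = randn; x2 = randn;                       %one realization of (X1,X2)
x = x1*cos(2*pi*fc*t) - x2*sin(2*pi*fc*t);    %time-windowed sample path
X = fft(x)/fs;                                %approximates the Fourier transform of x_To
f = (0:length(x)-1)*fs/length(x);             %frequency grid from 0 to fs
Shat = abs(X).^2/To;                          %PSD estimate; peaks near f = fc and f = fs - fc
plot(f,Shat); xlabel('f'); ylabel('PSD estimate');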
PSD for a sample path: Formally, we define the PSD for a sample path in the limit of large
time windows as follows:
Sx(f) = lim_{To→∞} |XTo(f)|²/To   (PSD for sample path)   (5.78)
The preceding definition involves time averaging across a sample path, and can be related to the
time-averaged autocorrelation function, defined as follows.
Time-averaged autocorrelation function for a sample path: For a sample path x(t), we
define the time-averaged autocorrelation function as
Rx(τ) = x(t)x*(t − τ) (time average) = lim_{To→∞} (1/To) ∫_{−To/2}^{To/2} x(t) x*(t − τ) dt
For a WSS random process, the ensemble-averaged PSD is the Fourier transform of the ensemble-averaged autocorrelation function: SX(f) = F(RX)(f). This result is called the Wiener-Khintchine theorem, and can be proved under mild conditions on the autocorrelation function (the area under |RX(τ)| must be finite and its Fourier transform must exist). The proof requires advanced probability concepts beyond our scope here, and is omitted.
Ensemble-averaged PSD for running example: For our running example, the PSD is
obtained by taking the Fourier transform of (5.73):
SX(f) = (1/2) δ(f − fc) + (1/2) δ(f + fc)    (5.84)
That is, the power in X is concentrated at ±fc , as we would expect for a sinusoidal signal at
frequency fc .
Power: It follows from the Wiener-Khintchine theorem that the power of X can be obtained
either by integrating the PSD or evaluating the autocorrelation function at τ = 0:
PX = E[|X(t)|^2] = RX(0) = ∫_{−∞}^{∞} SX(f) df    (5.85)
On the other hand, the time-averaged PSD computed from a single sample path of the running example, whose realized amplitude is A, is
Sx(f) = (A^2/2) δ(f − fc) + (A^2/2) δ(f + fc)    (5.86)
Comparing with (5.84), we see that the time-averaged PSD varies across sample paths due to
amplitude variations, with A2 replaced by its expectation in the ensemble-averaged PSD.
Intuitively speaking, ergodicity requires sufficient richness of variation across time and sample
paths. While this is not present in our simple running example (a randomly chosen amplitude
which is fixed across the entire sample path is the culprit), it is often present in the more
complicated random processes of interest to us, including receiver noise and digitally modulated
signals (under appropriate conditions on the transmitted symbol sequences). When ergodicity
holds, we have our choice of using either time averaging or ensemble averaging for computations,
depending on which is most convenient or insightful.
The autocorrelation function and PSD must satisfy the following structural properties (these
apply to ensemble averages for WSS processes, as well as to time averages, although our notation
corresponds to ensemble averages).
Structural properties of PSD and autocorrelation function
(P1) SX (f ) ≥ 0 for all f .
This follows from the sample path based definition in Figure 5.18, since the output of the power
meter is always nonnegative. Averaging across sample paths preserves this property.
(P2a) The autocorrelation function is conjugate symmetric: RX(τ) = RX*(−τ).
This follows quite easily from the definition (5.71). By setting t = u + τ, we have
RX(τ) = E[X(u + τ)X*(u)] = (E[X(u)X*(u + τ)])* = RX*(−τ)
(P2b) For real-valued X, both the autocorrelation function and PSD are symmetric and real-
valued. SX (f ) = SX (−f ) and RX (τ ) = RX (−τ ).
(This is left as an exercise.)
Any function g(τ ) ↔ G(f ) must satisfy these properties in order to be a valid autocorrelation
function/PSD.
Example 5.7.1 (Which function is an autocorrelation?) For each of the following func-
tions, determine whether it is a valid autocorrelation function.
(a) g1 (τ ) = sin(τ ), (b) g2 (τ ) = I[−1,1] (τ ), (c) g3 (τ ) = e−|τ |
Solution
(a) This is not a valid autocorrelation function, since it is not symmetric and violates property
(P2b).
(b) This satisfies Property (P2b). However, I[−1,1] (τ ) ↔ 2sinc(2f ), so that Property (P1) is
violated, since the sinc function can take negative values. Hence, the boxcar function cannot be
a valid autocorrelation function. This example shows that non-negativity Property P1 places a
stronger constraint on the validity of a proposed function as an autocorrelation function than
the symmetry Property P2.
(c) The function g3 (τ ) is symmetric and satisfies Property (P2b). It is left as an exercise to check
that G3 (f ) ≥ 0, hence Property (P1) is also satisfied.
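A quick numerical check of these conclusions can be made by evaluating the Fourier transforms of the candidates on a grid and inspecting their sign. The MATLAB fragment below (not from the text; grid spans and spacings are illustrative choices) does this for g2 and g3:

    % Numerical check of property (P1) for Example 5.7.1 (minimal sketch)
    dtau = 0.01; tau = -20:dtau:20;       % lag grid (e^{-|tau|} is negligible beyond this)
    g2 = double(abs(tau) <= 1);           % boxcar I_[-1,1](tau)
    g3 = exp(-abs(tau));                  % e^{-|tau|}
    f  = -3:0.01:3;                       % frequency grid
    E  = exp(-1j*2*pi*tau(:)*f);          % Fourier kernel, one column per frequency
    G2 = real(g2*E)*dtau;                 % numerical Fourier transform of g2 (real by symmetry)
    G3 = real(g3*E)*dtau;
    fprintf('min G2(f) = %.3f  (negative, so (P1) fails)\n', min(G2));
    fprintf('min G3(f) = %.3f  (nonnegative, consistent with (P1))\n', min(G3));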
Units for PSD: Power per unit frequency has the same units as power multiplied by time, or
energy. Thus, the PSD is expressed in units of Watts/Hertz, or Joules.
Figure 5.19: Operational definition of the one-sided PSD for real-valued x(t): the signal is passed through a real-valued filter with passbands of width ∆f around ±ν, and the power meter reads Sx+(ν)∆f = Sx(−ν)∆f + Sx(ν)∆f.
One-sided PSD: The PSD that we have talked about so far is the two-sided PSD, which spans
both positive and negative frequencies. For a real-valued X, we can restrict attention to positive
frequencies alone in defining the PSD, by virtue of property (P2b). This yields the one-sided PSD SX+(f), defined as
SX+(f) = SX(f) + SX(−f) = 2SX(f),  f ≥ 0  (X(t) real)    (5.87)
It is useful to interpret this in terms of the sample path based operational definition shown in
Figure 5.19. The signal is passed through a physically realizable filter (i.e., with real-valued
impulse response) of bandwidth ∆f , centered around ν. The filter transfer function must be
conjugate symmetric, hence
Hν(f) = 1,  ν − ∆f/2 < f < ν + ∆f/2
        1,  −ν − ∆f/2 < f < −ν + ∆f/2
        0,  else
The one-sided PSD is defined as the limit of the power of the filter output, divided by ∆f , as
∆f → 0. Comparing Figures 5.18 and 5.19, we have that the sample path based one-sided PSD
is simply twice the two-sided PSD: Sx+ (f ) = (Sx (f ) + Sx (−f )) I{f ≥0} = 2Sx (f )I{f ≥0} .
One-sided PSD for running example: From (5.84), we obtain that
SX+(f) = δ(f − fc)    (5.88)
with all the power concentrated at fc , as expected.
Power in terms of PSD: We can express the power of a real-valued random process in terms
of either the one-sided or two-sided PSD:
E[X^2(t)] = RX(0) = ∫_{−∞}^{∞} SX(f) df = ∫_{0}^{∞} SX+(f) df  (for X real)    (5.89)
Baseband and passband random processes: A random process X is baseband if its PSD
is baseband, and is passband if its PSD is passband. Thinking in terms of time averaged PSDs,
which are based on the Fourier transform of time windowed sample paths, we see that a random
process is baseband if its sample paths, time windowed over a large enough observation interval,
are (approximately) baseband. Similarly, a random process is passband if its sample paths,
time windowed over a large enough observation interval, are (approximately) passband. The
caveat of “large enough observation interval” is inserted because of the following consideration:
timelimited signals cannot be strictly bandlimited, but as long as the observation interval is large
enough, the time windowing (which corresponds to convolving the spectrum with a sinc function)
does not spread out the spectrum of the signal significantly. Thus, the PSD (which is obtained by taking the limit of large observation intervals) also defines the frequency occupancy of the sample
paths over large enough observation intervals. Note that these intuitions, while based on time
averaged PSDs, also apply when bandwidth occupancy is defined in terms of ensemble-averaged
PSDs, as long as the random process is ergodic in PSD.
[Figure panels: the message PSD Sm(f), of height C over |f| ≤ W, and the DSB-SC signal PSD Sup(f), of height C/4 over bands of width 2W centered at ±f0.]
Figure 5.20: The relation between the PSDs of a message and the corresponding DSB-SC signal.
Thus, we start with the formula (5.90) relating the Fourier transform for a given sample path,
which is identical to what we had in Chapter 2 (except that we now need to time limit the finite
power message to obtain a finite energy signal), and obtain the relation (5.91) relating the PSDs.
An example is shown in Figure 5.20. We can now integrate the PSDs to get
Pu = (1/4)(Pm + Pm) = Pm/2
A Gaussian random process is one for which any finite collection of samples X(t1), ..., X(tn) is jointly Gaussian. Since a jointly Gaussian distribution is completely specified by its mean vector and covariance matrix, it therefore also follows that a Gaussian random process is completely specified by its mean and autocorrelation functions.
WSS Gaussian random processes are stationary: We know that a stationary random
process is WSS. The converse is not true in general, but Gaussian WSS processes are indeed
stationary. This is because the statistics of a Gaussian random process are characterized by its
first and second order statistics, and if these are shift-invariant (as they are for WSS processes),
the random process is statistically indistinguishable under a time shift.
Example 5.7.2 Suppose that Y is a Gaussian random process with mean function mY (t) = 3t
and autocorrelation function RY (t1 , t2 ) = 4e−|t1 −t2 | + 9t1 t2 .
(a) Find the probability that Y (2) is bigger than 10.
(b) Specify the joint distribution of Y (2) and Y (3).
(c) True or False Y is stationary.
(d) True or False The random process Z(t) = Y (t) − 3t is stationary.
Solution: (a) Since Y is a Gaussian random process, the sample Y (2) is a Gaussian random
variable with mean mY(2) = 6 and variance CY(2, 2) = RY(2, 2) − (mY(2))^2 = 4. More generally, note that the autocovariance function of Y is given by
CY(t1, t2) = RY(t1, t2) − mY(t1)mY(t2) = 4e^{−|t1−t2|} + 9t1t2 − (3t1)(3t2) = 4e^{−|t1−t2|}
so that var(Y(t)) = CY(t, t) = 4 for any sampling time t.
We have shown that Y(2) ∼ N(6, 4), so that
P[Y(2) > 10] = Q((10 − 6)/√4) = Q(2)
(b) Since Y is a Gaussian random process, Y (2) and Y (3) are jointly Gaussian, with distribution
specified by the mean vector and covariance matrix given by
m = (mY(2), mY(3))^T = (6, 9)^T
C = [CY(2,2)  CY(2,3); CY(3,2)  CY(3,3)] = [4  4e^{−1}; 4e^{−1}  4]
(c) Y has time-varying mean, and hence is not WSS. This implies it is not stationary. The
statement is therefore False.
(d) Z(t) = Y(t) − 3t = Y(t) − mY(t) is the zero mean version of Y. It inherits the Gaussianity of Y. The mean function is mZ(t) ≡ 0, and the autocorrelation function, given by
RZ(t1, t2) = E[(Y(t1) − mY(t1))(Y(t2) − mY(t2))] = CY(t1, t2) = 4e^{−|t1−t2|}
depends on the time difference t1 − t2 alone. Thus, Z is WSS. Since it is also Gaussian, this implies that Z is stationary. The statement is therefore True.
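The numbers in this example are easy to verify in MATLAB. The sketch below (not from the text; it uses only base MATLAB, with Q(x) written in terms of erfc) evaluates Q(2) and assembles the mean vector and covariance matrix of (Y(2), Y(3)):

    % Numerical companion to Example 5.7.2 (minimal sketch)
    Qfun = @(x) 0.5*erfc(x/sqrt(2));            % Q(x) in terms of erfc
    mY = @(t) 3*t;                              % mean function
    RY = @(t1,t2) 4*exp(-abs(t1-t2)) + 9*t1.*t2;% autocorrelation function
    CY = @(t1,t2) RY(t1,t2) - mY(t1).*mY(t2);   % autocovariance function
    pa = Qfun((10 - mY(2))/sqrt(CY(2,2)));      % part (a): P[Y(2) > 10] = Q(2)
    m  = [mY(2); mY(3)];                        % part (b): mean vector
    C  = [CY(2,2) CY(2,3); CY(3,2) CY(3,3)];    %           covariance matrix
    fprintf('P[Y(2) > 10] = Q(2) = %.4f\n', pa);
    disp('mean vector:'); disp(m); disp('covariance matrix:'); disp(C);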
[Figure panels: the two-sided PSD Snp(f), of height N0/2 over bands of width B around ±fc, and the one-sided PSD Snp+(f), of height N0 over a band of width B around fc.]
Figure 5.21: The PSD of passband white noise is flat over the band of interest.
We now have the background required to discuss mathematical modeling of noise in communi-
cation systems. A generic model for receiver noise is that it is a random process with zero DC
value, and with PSD which is flat, or white, over a band of interest. The key noise mechanisms
in a communication receiver, thermal and shot noise, are both white, as discussed in Appendix
5.C. For example, Figure 5.21 shows the two-sided PSD of passband white noise np (t), which is
given by
Snp(f) = N0/2,  |f − fc| ≤ B/2
         N0/2,  |f + fc| ≤ B/2
         0,     else
Since np (t) is real-valued, we can also define the one-sided PSD as follows:
Snp+(f) = N0,  |f − fc| ≤ B/2
          0,   else
That is, white noise has two-sided PSD N0/2, and one-sided PSD N0, over the band of interest. The power of the white noise is given by
Pnp = E[np^2(t)] = ∫_{−∞}^{∞} Snp(f) df = (N0/2)(2B) = N0 B
Figure 5.22: The two-sided PSD (height N0/2 over |f| ≤ B) and one-sided PSD (height N0 over 0 ≤ f ≤ B) of baseband white noise.
Similarly, Figure 5.22 shows the one-sided and two-sided PSDs for real-valued white noise in a
physical baseband system with bandwidth B. The power of this baseband white noise is again
N0 B. As we discuss in Section 5.D, as with deterministic passband signals, passband random
processes can also be represented in terms of I and Q components. We note in Section 5.D that
the I and Q components of passband white noise are baseband white noise processes, and that
the corresponding complex envelope is complex-valued white noise.
Noise Figure: The value of N0 summarizes the net effects of white noise arising from various
devices in the receiver. Comparing the noise power N0 B with the nominal figure of kT B for
thermal noise of a resistor with matched impedance, we define the noise figure as
F = N0 / (k Troom)
where k = 1.38 × 10^{−23} Joules/Kelvin is Boltzmann’s constant, and the nominal “room temperature” is taken by convention to be Troom = 290 Kelvin (the product kTroom ≈ 4 × 10^{−21} Joules, so that the numbers work out well for this slightly chilly choice of room temperature at 62.6° Fahrenheit). Noise figure is usually expressed in dB.
The noise power for a bandwidth B is given by
Pn = N0 B = F k Troom B
dBW and dBm: It is customary to express power on the decibel (dB) scale: a power P is expressed as 10 log10(P/(1 Watt)) dBW, or as 10 log10(P/(1 mW)) dBm.
On the dB scale, the noise power over 1 Hz is therefore given by
10 log10(k Troom/(1 mW)) + FdB ≈ −174 + FdB dBm
Example 5.8.1 (Noise power computation) A 5 GHz Wireless Local Area Network (WLAN)
link has a receiver bandwidth B of 20 MHz. If the receiver has a noise figure of 6 dB, what is
the receiver noise power Pn ?
Solution: The noise power is
Pn = F k Troom B = 10^{0.6} × (4 × 10^{−21}) × (20 × 10^{6}) ≈ 3.2 × 10^{−13} Watts
The noise power is often expressed in dBm, which is obtained by converting the raw number in milliWatts (mW) into dB. We therefore get
Pn,dBm = 10 log10(Pn/(1 mW)) = 10 log10(3.2 × 10^{−10}) ≈ −95 dBm
Let us now redo this computation in the “dB domain,” where the contributions to the noise power due to the various system parameters simply add up. Using (5.93), the noise power in our system can be calculated as follows:
Pn,dBm ≈ 10 log10(k Troom/(1 mW)) + FdB + 10 log10 B ≈ −174 + 6 + 73 ≈ −95 dBm
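The MATLAB sketch below (not from the text) carries out this computation both directly in Watts and in the dB domain; the noise figure, bandwidth, and k·Troom ≈ 4 × 10^{−21} Joules are the values used in Example 5.8.1:

    % Noise power computation for Example 5.8.1 (minimal sketch)
    k = 1.38e-23; Troom = 290;        % Boltzmann constant, nominal room temperature
    F_dB = 6; B = 20e6;               % noise figure (dB) and bandwidth (Hz)
    F = 10^(F_dB/10);                 % noise figure as a ratio
    N0 = F*k*Troom;                   % one-sided noise PSD (Watts/Hz)
    Pn = N0*B;                        % noise power in Watts
    Pn_dBm = 10*log10(Pn/1e-3);       % noise power in dBm
    % dB-domain version: kTroom in dBm/Hz, plus noise figure, plus 10*log10(B)
    Pn_dBm_check = 10*log10(k*Troom/1e-3) + F_dB + 10*log10(B);
    fprintf('Pn = %.3g W = %.1f dBm (dB-domain check: %.1f dBm)\n', Pn, Pn_dBm, Pn_dBm_check);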
[Figure panels: the PSD of white Gaussian noise in a passband system of bandwidth B (height N0/2 around ±f0) and in a baseband system of bandwidth B (height N0/2 over |f| ≤ B) are each “simplified” to infinite-power WGN with PSD SWGN(f) ≡ N0/2 for all f.]
Figure 5.23: Since receiver processing always involves some form of band limitation, it is not
necessary to impose band limitation on the WGN model.
We now add two more features to our noise model that greatly simplify computations. First, we
assume that the noise is a Gaussian random process. The physical basis for this is that noise
arises due to the random motion of a large number of charge carriers, which leads to Gaussian
statistics based on the central limit theorem (see Section 5.B). The mathematical consequence
of Gaussianity is that we can compute probabilities based only on knowledge of second order
statistics. Second, we remove band limitation, implicitly assuming that it will be imposed later
by filtering at the receiver. That is, we model noise n(t) (where n can be real-valued passband
or baseband white noise) as a zero mean WSS random process with PSD flat over the entire real line, Sn(f) ≡ N0/2. The corresponding autocorrelation function is Rn(τ) = (N0/2)δ(τ). This
model is clearly physically unrealizable, since the noise power is infinite. However, since receiver
processing in bandlimited systems always involves filtering, we can assume that the receiver noise
prior to filtering is not bandlimited and still get the right answer. Figure 5.23 shows the steps
we use to go from receiver noise in bandlimited systems to infinite-power White Gaussian Noise
(WGN), which we formally define below.
White Gaussian Noise: Real-valued WGN n(t) is a zero mean, WSS, Gaussian random process with Sn(f) ≡ N0/2 = σ^2. Equivalently, Rn(τ) = (N0/2)δ(τ) = σ^2 δ(τ). The quantity N0/2 = σ^2 is often termed the two-sided PSD of WGN, since we must integrate over both positive and negative frequencies in order to compute power using this PSD. The quantity N0 is therefore referred to as the one-sided PSD, and has the dimension of Watts/Hertz, or Joules.
The following example provides a preview of typical computations for signaling in WGN, and
illustrates why the model is so convenient.
Example 5.8.2 (On-off keying in continuous time): A receiver in an on-off keyed system
receives the signal y(t) = s(t) + n(t) if 1 is sent, and receives y(t) = n(t) if 0 is sent, where n(t)
is WGN with PSD σ^2 = N0/2. The receiver computes the following decision statistic:
Y = ∫ y(t)s(t) dt
(We shall soon show that this is actually the best thing to do.)
(a) Find the conditional distribution of Y if 0 is sent.
(b) Find the conditional distribution of Y if 1 is sent.
(c) Compare with the on-off keying model in Example 5.6.3.
Solution:
(a) Conditioned on 0 being sent, y(t) = n(t) and hence Y = ∫ n(t)s(t)dt. Since n is Gaussian, and Y is obtained from it by linear processing, Y is a Gaussian random variable (conditioned on 0 being sent). Thus, the conditional distribution of Y is completely characterized by its mean and variance, which we now compute.
E[Y] = E[∫ n(t)s(t) dt] = ∫ s(t)E[n(t)] dt = 0
where we can interchange expectation and integration because both are linear operations. Actu-
ally, there are some mathematical conditions (beyond our scope here) that need to be satisfied for
such “natural” interchanges to be permitted, but these conditions are met for all the examples
that we consider in this text. Since the mean is zero, the variance is given by
var(Y) = E[Y^2] = E[ ∫ n(t)s(t)dt ∫ n(u)s(u)du ]
Notice that we have written out Y 2 = Y × Y as the product of two identical integrals, but
with the “dummy” variables of integration chosen to be different. This is because we need to
consider all possible cross terms that could result from multiplying the integral with itself. We
now interchange expectation and integration again, noting that all random quantities must be
grouped inside the expectation. This gives us
var(Y) = ∫∫ E[n(t)n(u)] s(t)s(u) dt du    (5.96)
Now this is where the WGN model makes our life simple. The autocorrelation function is
E[n(t)n(u)] = σ^2 δ(t − u)
Plugging into (5.96), the delta function collapses the two integrals into one, and we obtain
var(Y) = σ^2 ∫∫ δ(t − u) s(t)s(u) dt du = σ^2 ∫ s^2(t) dt = σ^2 ||s||^2
Thus, conditioned on 0 being sent, Y ∼ N(0, σ^2||s||^2).
(b) Conditioned on 1 being sent, y(t) = s(t) + n(t), so that
Y = ∫ (s(t) + n(t)) s(t) dt = ||s||^2 + ∫ n(t)s(t) dt
We already know that the second term on the extreme right hand side has distribution N(0, σ^2||s||^2). The distribution remains Gaussian when we add a constant to it, with the mean being translated by this constant. We therefore conclude that Y ∼ N(||s||^2, σ^2||s||^2), conditioned on 1 being sent.
(c) The decision statistic Y obeys exactly the same model as in Example 5.6.3, with m = ||s||^2 and v^2 = σ^2||s||^2. Applying the intuitive decision rule in that example, we guess that 1 is sent if Y > ||s||^2/2, and that 0 is sent otherwise. The probability of error for that decision rule equals
Pe|0 = Pe|1 = Pe = Q(m/(2v)) = Q(||s||^2/(2σ||s||)) = Q(||s||/(2σ))
Remark: The preceding example illustrates that, for linear processing of a received signal
corrupted by WGN, the signal term contributes to the mean, and the noise term to the variance, of
the resulting decision statistic. The resulting Gaussian distribution is a conditional distribution,
because it is conditioned on which signal is actually sent (or, for on-off keying, whether a signal
is sent).
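The conclusions of Example 5.8.2 can be checked by Monte Carlo simulation. In the MATLAB sketch below (not from the text; the pulse shape and noise level are illustrative choices), continuous-time WGN with PSD σ^2 = N0/2 is approximated on a grid of spacing dt by i.i.d. N(0, σ^2/dt) samples, so that the Riemann sum approximating ∫ n(t)s(t)dt has variance σ^2||s||^2:

    % Monte Carlo check of Example 5.8.2 (minimal sketch)
    dt = 0.01; t = 0:dt:1-dt;                 % time grid
    s  = ones(size(t));                       % illustrative pulse s(t) = I_[0,1](t)
    Es = sum(s.^2)*dt;                        % ||s||^2
    sigma2 = 0.25;                            % noise PSD sigma^2 = N0/2
    Ntrials = 1e4;
    n  = sqrt(sigma2/dt)*randn(Ntrials, numel(t));   % WGN sample paths on the grid
    Y0 = (n*s')*dt;                           % Y = int n(t)s(t)dt  (0 sent)
    Y1 = Es + Y0;                             % 1 sent: the signal adds ||s||^2 to the mean
    fprintf('0 sent: mean %6.3f (theory 0),     var %.4f (theory %.4f)\n', ...
            mean(Y0), var(Y0), sigma2*Es);
    fprintf('1 sent: mean %6.3f (theory %.3f), var %.4f (theory %.4f)\n', ...
            mean(Y1), Es, var(Y1), sigma2*Es);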
Complex baseband WGN: Based on the definition of complex envelope that we have used
so far (in Chapters 2 through 4), the complex envelope has twice the energy/power of the
corresponding passband signal (which may be a sample path of a passband random process). In
order to get a unified description of WGN, however, let us now scale the complex envelope of both signal and noise by 1/√2. This cannot change the performance of the system, but leads to the complex envelope now having the same energy/power as the corresponding passband signal. Effectively, we are switching from defining the complex envelope via up(t) = Re(u(t)e^{j2πfc t}) to defining it via up(t) = Re(√2 u(t)e^{j2πfc t}). This convention reduces the PSDs of the I and Q components by a factor of two: we now model them as independent real WGN processes, with Snc(f) = Sns(f) ≡ N0/2 = σ^2.
We now have the noise modeling background needed for Chapter 6, where we develop a framework
for optimal reception, based on design criteria such as the error probability. The next section
discusses linear processing of random processes, which is useful background for modeling the effect of filtering on noise, as well as for computing quantities such as signal-to-noise ratio
(SNR). It can be skipped by readers anxious to get to Chapter 6, since the latter includes a
self-contained exposition of the effects of the relevant receiver operations on WGN.
[Figure panels: passband WGN with PSD N0/2 over bands of width B around ±f0 is downconverted to I and Q components nc, ns, each baseband WGN with PSD N0/2 over |f| ≤ B/2, with cross-spectrum Snc,ns(f) = 0.]
Figure 5.24: We scale the complex envelope for both signal and noise by 1/√2, so that the I and Q components of passband WGN can be modeled as independent WGN processes with PSD N0/2.
5.9.1 Filtering
Suppose that a random process x(t) is passed through a filter, or an LTI system, with transfer
function G(f ) and impulse response g(t), as shown in Figure 5.25.
The PSD of the output y(t) is related to that of the input as follows:
Sy(f) = Sx(f) |G(f)|^2    (5.97)
Figure 5.25: A random process x(t) is passed through an LTI system with transfer function G(f) and impulse response g(t), producing the output y(t) = (x ∗ g)(t).
This follows immediately from the operational definition of PSD in Figure 5.18, since the power gain due to the filter at frequency f is |G(f)|^2. Now,
|G(f)|^2 = G(f)G*(f) ↔ (g ∗ gMF)(τ)
where gMF(t) = g*(−t). Thus, taking the inverse Fourier transform on both sides of (5.97), we obtain the following relation between the input and output autocorrelation functions:
Ry(τ) = (Rx ∗ g ∗ gMF)(τ)
Let us now derive analogous results for ensemble averages for filtered WSS processes.
Suppose that a WSS random process X is passed through an LTI system with impulse response
g(t) (which we allow to be complex-valued) to obtain an output Y (t) = (X ∗ g)(t). We wish to
characterize the joint second order statistics of X and Y .
Defining the crosscorrelation function of Y and X as
RYX(t + τ, t) = E[Y(t + τ)X*(t)]
we have
RYX(t + τ, t) = E[ ∫ X(t + τ − u)g(u)du X*(t) ] = ∫ RX(τ − u)g(u) du    (5.99)
interchanging expectation and integration. Thus, RY X (t + τ, t) depends only on the time differ-
ence τ. We therefore denote it by RYX(τ). From (5.99), we see that
RY X (τ ) = (RX ∗ g)(τ )
Finally, we note that the mean function of Y is a constant given by
mY = mX ∗ g = mX ∫ g(u) du = mX G(0)
Thus, X and Y are jointly WSS: X is WSS, Y is WSS, and their crosscorrelation function depends only on the time difference. The formulas for the second order statistics, including the corresponding power spectral densities obtained by taking Fourier transforms, are collected below:
RYX(τ) = (RX ∗ g)(τ) ↔ SYX(f) = SX(f)G(f)
RY(τ) = (RX ∗ g ∗ gMF)(τ) ↔ SY(f) = SX(f)|G(f)|^2    (5.101)
mY = mX G(0)
Example 5.9.1 (white noise through an LTI system–general formulas) White noise with PSD Sn(f) ≡ N0/2 is passed through an LTI system with impulse response g(t). We wish to find the PSD, autocorrelation function, and power of the output y(t) = (n ∗ g)(t). The PSD is given by
Sy(f) = Sn(f)|G(f)|^2 = (N0/2)|G(f)|^2    (5.102)
We can compute the autocorrelation function directly or take the inverse Fourier transform of
the PSD to obtain
Ry(τ) = (Rn ∗ g ∗ gmf)(τ) = (N0/2)(g ∗ gmf)(τ) = (N0/2) ∫_{−∞}^{∞} g(s)g*(s − τ) ds    (5.103)
The output power is given by
E[y^2] = ∫_{−∞}^{∞} Sy(f) df = (N0/2) ∫_{−∞}^{∞} |G(f)|^2 df = (N0/2) ∫_{−∞}^{∞} |g(t)|^2 dt = (N0/2)||g||^2    (5.104)
where the time domain expression follows from Parseval’s identity, or from setting τ = 0 in
(5.103). Thus, the output noise power equals the noise PSD times the energy of the filter impulse
response. It is worth noting that the PSD of y is the same as what we would have gotten if the
input were bandlimited white noise, as long as the band is large enough to encompass frequencies
where G(f ) is nonzero. Even if G(f ) is not strictly bandlimited, we get approximately the right
answer if the input noise bandwidth is large enough so that most of the energy in G(f ) falls
within it.
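The output power formula (5.104) is easily checked by simulation. The MATLAB sketch below (not from the text) approximates white noise with PSD N0/2 on a grid by i.i.d. N(0, (N0/2)/dt) samples and passes it through an illustrative filter g(t) = I[0,2](t) (the same filter used in Example 5.9.2 below):

    % Simulation check of Example 5.9.1 (minimal sketch)
    dt = 0.01; N0 = 0.5;                      % grid spacing and noise level (N0/2 = 1/4)
    n  = sqrt((N0/2)/dt)*randn(1, 2e5);       % discrete-time surrogate for white noise
    g  = ones(1, round(2/dt));                % impulse response I_[0,2](t) on the grid
    y  = conv(n, g)*dt;                       % y = (n*g)(t) via a Riemann-sum convolution
    y  = y(numel(g):end-numel(g));            % discard edge transients
    fprintf('output power %.4f, theory (N0/2)||g||^2 = %.4f\n', ...
            mean(y.^2), (N0/2)*sum(g.^2)*dt);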
When the input random process is Gaussian as well as WSS, the output is also WSS and Gaus-
sian, and the preceding computations of second order statistics provide a complete statistical
characterization of the output process. This is illustrated by the following example, in which
WGN is passed through a filter.
Example 5.9.2 (WGN through a boxcar impulse response) Suppose that WGN n(t) with PSD σ^2 = N0/2 = 1/4 is passed through an LTI system with impulse response g(t) = I[0,2](t) to obtain the output y(t) = (n ∗ g)(t).
(a) Find the autocorrelation function and PSD of y.
(b) Find E[y 2 (100)].
(c) True or False y is a stationary random process.
(d) True or False: y(100) and y(102) are independent random variables.
(e) True or False: y(100) and y(101) are independent random variables.
(f) Compute the probability P[y(100) − 2y(101) + 3y(102) > 5].
(g) Which of the preceding results rely on the Gaussianity of n?
Solution
(a) Since n is WSS, so is y. The filter matched to g is a boxcar as well: gmf(t) = I[−2,0](t). Their convolution is a triangular pulse centered at the origin: (g ∗ gmf)(τ) = 2(1 − |τ|/2) I[−2,2](τ). We therefore have
Ry(τ) = (N0/2)(g ∗ gmf)(τ) = (1/2)(1 − |τ|/2) I[−2,2](τ) = Cy(τ)
(since y is zero mean). The PSD is given by
Sy(f) = (N0/2)|G(f)|^2 = sinc^2(2f)
since |G(f)| = |2 sinc(2f)|. Note that these results do not rely on Gaussianity.
(b) The power E[y^2(100)] = Ry(0) = 1/2.
(c) The output y is a Gaussian random process, since it obtained by a linear transformation of
the Gaussian random process n. Since y is WSS and Gaussian, it is stationary. True.
(d) The random variables y(100) and y(102) are jointly Gaussian with zero mean and covariance
cov(y(100), y(102)) = Cy (2) = Ry (2) = 0. Since they are jointly Gaussian and uncorrelated,
they are independent. True.
(e) In this case, cov(y(100), y(101)) = Cy(1) = Ry(1) = 1/4 ≠ 0, so that y(100) and y(101) are not independent. False.
(f) The random variable Z = y(100) − 2y(101) + 3y(102) is zero mean and Gaussian, with
var(Z) = cov(y(100) − 2y(101) + 3y(102), y(100) − 2y(101) + 3y(102))
       = cov(y(100), y(100)) + 4 cov(y(101), y(101)) + 9 cov(y(102), y(102))
         − 4 cov(y(100), y(101)) + 6 cov(y(100), y(102)) − 12 cov(y(101), y(102))
       = Cy(0) + 4Cy(0) + 9Cy(0) − 4Cy(1) + 6Cy(2) − 12Cy(1)
       = 14Cy(0) − 16Cy(1) + 6Cy(2) = 3
substituting Cy(0) = 1/2, Cy(1) = 1/4, Cy(2) = 0. Thus, Z ∼ N(0, 3), and the required probability can be evaluated as
P[Z > 5] = Q(5/√3) ≈ 0.0019
(g) We invoke Gaussianity in (c), (d), and (f).
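A simulation along the same lines as the sketch after Example 5.9.1 confirms the correlation structure used in parts (d) and (e): samples of y two seconds apart are uncorrelated (hence independent, by Gaussianity), while samples one second apart are not. The MATLAB fragment below (not from the text) estimates Ry(1) and Ry(2) from a long sample path:

    % Simulation check of Example 5.9.2(d),(e) (minimal sketch)
    dt = 0.01; sigma2 = 0.25;                 % sigma^2 = N0/2 = 1/4
    n  = sqrt(sigma2/dt)*randn(1, 4e5);       % white noise surrogate on the grid
    g  = ones(1, round(2/dt));                % g(t) = I_[0,2](t)
    y  = conv(n, g)*dt; y = y(numel(g):end-numel(g));
    lag1 = round(1/dt); lag2 = round(2/dt);
    R1 = mean(y(1:end-lag1).*y(1+lag1:end));  % estimate of Ry(1), theory 1/4
    R2 = mean(y(1:end-lag2).*y(1+lag2:end));  % estimate of Ry(2), theory 0
    fprintf('Ry(1) estimate %.3f (theory 0.25), Ry(2) estimate %.3f (theory 0)\n', R1, R2);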
5.9.2 Correlation
As we shall see in Chapter 6, a typical operation in a digital communication receiver is to correlate
a noisy received waveform against one or more noiseless templates. Specifically, the correlation
of y(t) (e.g., a received signal) against g(t) (e.g., a noiseless template at the receiver) is defined as the inner product between y and g, given by
⟨y, g⟩ = ∫_{−∞}^{∞} y(t) g*(t) dt    (5.105)
(We restrict attention to real-valued signals in example computations provided here, but the
preceding notation is general enough to include complex-valued signals.)
Signal-to-Noise Ratio and its Maximization
If y(t) is a random process, we can compute the mean and variance of hy, gi given the second
order statistics (i.e., mean function and autocorrelation function) of y, as shown in Problem 5.50.
However, let us consider here a special case of particular interest in the study of communication
systems:
y(t) = s(t) + n(t)
where we now restrict attention to real-valued signals for simplicity, with s(t) denoting a deter-
ministic signal (e.g., corresponding to a specific choice of transmitted symbols) and n(t) zero mean white noise with PSD Sn(f) ≡ N0/2. The output of correlating y against g is given by
Z = ⟨y, g⟩ = ⟨s, g⟩ + ⟨n, g⟩ = ∫_{−∞}^{∞} s(t)g(t) dt + ∫_{−∞}^{∞} n(t)g(t) dt
Since both the signal and noise terms scale up by identical factors if we scale up g, a performance
metric of interest is the ratio of the signal power to the noise power at the output of the correlator,
defined as follows
SNR = |⟨s, g⟩|^2 / E[|⟨n, g⟩|^2]
How should we choose g in order to maximize SNR? In order to answer this, we need to compute
the noise power in the denominator. We can rewrite it as
E[|⟨n, g⟩|^2] = E[ ∫ n(t)g(t)dt ∫ n(s)g(s)ds ]
where we need to use two different dummy variables of integration to make sure we capture all
the cross terms in the two integrals. Now, we take the expectation inside the integrals, grouping
all random quantities together inside the expectation:
E[|⟨n, g⟩|^2] = ∫∫ E[n(t)n(s)] g(t)g(s) dt ds = ∫∫ Rn(t − s) g(t)g(s) dt ds
This is where the infinite power white noise model becomes useful: plugging in Rn (t − s) =
N0
2
δ(t − s), we find that the two integrals collapse into one, and obtain that
N0 N0 N0
Z Z Z
2
E|hn, gi| ] = δ(t − s)g(t)g(s)dtds = |g(t)|2dt = ||g||2 (5.106)
2 2 2
Thus, the SNR can be rewritten as
SNR = |⟨s, g⟩|^2 / ((N0/2)||g||^2) = (2/N0) |⟨s, g/||g||⟩|^2
Drawing on the analogy between signals and vectors, note that g/||g|| is the “unit vector” pointing
along g. We wish to choose g such that the size of the projection of the signal s along this unit
vector is maximized. Clearly, this is accomplished by choosing the unit vector along the direction
of s. (A formal proof using the Cauchy-Schwarz inequality is provided in Problem 5.49.) That
is, we must choose g to be a scalar multiple of s (any scalar multiple will do, since SNR is a
scale-invariant quantity). In general, for complex-valued signals in complex-valued white noise (useful for modeling in complex baseband), it can be shown that g must be a scalar multiple of s*(t). When we plug this in, the maximum SNR we obtain is 2||s||^2/N0. These results are
important enough to state formally, and we do this below.
Theorem 5.9.1 For linear processing of a signal s(t) corrupted by white noise, the output SNR
is maximized by correlating against s(t). The resulting SNR is given by
SNRmax = 2||s||^2/N0    (5.107)
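The SNR penalty for a mismatched correlator is easy to quantify numerically. The MATLAB sketch below (not from the text; the pulse, the mismatched template, and N0 are illustrative choices) evaluates SNR = 2|⟨s, g⟩|^2/(N0||g||^2) for g = s and for a half-width template:

    % SNR for matched versus mismatched correlators (minimal sketch)
    dt = 1e-3; t = 0:dt:1-dt; N0 = 0.5;
    s  = ones(size(t));                      % signal s(t) = I_[0,1](t)
    g1 = s;                                  % matched choice g = s
    g2 = double(t < 0.5);                    % mismatched choice: half-width boxcar
    snr = @(g) 2*(sum(s.*g)*dt)^2 / (N0*sum(g.^2)*dt);   % 2|<s,g>|^2 / (N0 ||g||^2)
    fprintf('SNR with g = s        : %.2f (equals 2||s||^2/N0 = %.2f)\n', ...
            snr(g1), 2*sum(s.^2)*dt/N0);
    fprintf('SNR with mismatched g : %.2f\n', snr(g2));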
The expression (5.106) for the noise power at the output of a correlator is analogous to the
expression (5.104) (Example 5.9.1) for the power of white noise through a filter. This is no
coincidence. Any correlation operation can be implemented using a filter and sampler, as we
discuss next.
Matched Filter
Correlation with a waveform g(t) can be achieved using a filter h(t) = g(−t) and sampling at
time t = 0. To see this, note that
z(0) = (y ∗ h)(0) = ∫_{−∞}^{∞} y(τ)h(−τ) dτ = ∫_{−∞}^{∞} y(τ)g(τ) dτ
Comparing with the correlator output (5.105), we see that Z = z(0). Now, applying Theorem
5.9.1, we see that the SNR is maximized by choosing the filter impulse response as s∗ (−t). As
we know, this is called the matched filter for s, and we denote its impulse response as sM F (t) =
s∗ (−t). We can now restate Theorem 5.9.1 as follows.
Theorem 5.9.2 For linear processing of a signal s(t) corrupted by white noise, the output SNR
is maximized by employing a matched filter with impulse response sM F (t) = s∗ (−t), sampled at
time t = 0.
[Figure panels: s(t) supported on [0, T]; its matched filter smf(t) supported on [−T, 0]; and the matched filter output, peaking at t = 0, supported on [−T, T].]
Figure 5.26: A signal passed through its matched filter gives a peak at time t = 0. When the signal is delayed by t0, the peak occurs at t = t0.
The statistics of the noise contribution to the matched filter output do not depend on the
sampling time (WSS noise into an LTI system yields a WSS random process), hence the optimum
sampling time is determined by the peak of the signal contribution to the matched filter output.
The signal contribution to the output of the matched filter at time t is given by
z(t) = ∫ s(τ)sMF(t − τ) dτ = ∫ s(τ)s*(τ − t) dτ
This is simply the correlation of the signal with itself at delay t. Thus, the matched filter enables
us to implement an infinite bank of correlators, each corresponding to a version of our signal
template at a different delay. Figure 5.26 shows a rectangular pulse passed through its matched
filter. For received signal y(t) = s(t) + n(t), we have observed that the optimum sampling time
(i.e. the correlator choice maximizing SNR) is t = 0. More generally, when the received signal is
given by y(t) = s(t − t0 ) + n(t), the peak of the signal contribution to the matched filter shifts
to t = t0 , which now becomes the optimum sampling time.
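This statement about the peak location is easy to verify numerically. In the MATLAB sketch below (not from the text; the rectangular pulse and the delay t0 = 0.3 are illustrative), a delayed copy of the pulse is passed through the matched filter for the undelayed pulse, and the output peaks at t = t0:

    % Matched filter peak location for a delayed pulse (minimal sketch)
    dt = 1e-3; T = 1; t0 = 0.3;
    t  = 0:dt:5;                              % time grid for the received signal
    s  = double(t <= T);                      % s(t) = I_[0,T](t)
    y  = double(t >= t0 & t <= t0 + T);       % received (noiseless) signal s(t - t0)
    smf = fliplr(s);                          % samples of s_mf(t) = s(-t), time-reversed on the grid
    z  = conv(y, smf)*dt;                     % matched filter output
    tz = (0:numel(z)-1)*dt - t(end);          % time axis of the convolution output
    [~, imax] = max(z);
    fprintf('peak of matched filter output at t = %.3f (expected t0 = %.3f)\n', tz(imax), t0);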
While the preceding computations rely only on second order statistics, once we invoke the Gaus-
sianity of the noise, as we do in Chapter 6, we will be able to compute probabilities (a preview
of such computations is provided by Examples 5.8.2 and 5.9.2(f)). This will enable us to develop
a framework for receiver design for minimizing the probability of error.
5.10 Concept Summary
• Receiver noise is modeled as white Gaussian noise (WGN), with the noise figure summarizing the value of the noise PSD over the band of interest. In complex baseband, noise is modeled as I and Q components which are independent real-valued WGN processes.
• A WSS random process X passed through an LTI system with impulse response g(t) yields a WSS random process Y. X and Y are also jointly WSS. We have SY(f) = SX(f)|G(f)|^2 ↔ RY(τ) = (RX ∗ g ∗ gmf)(τ).
• The statistics of WGN after linear operations such as correlation and filtering are easy to
compute because of its impulsive autocorrelation function.
• When the received signal equals signal plus WGN, the SNR is maximized by matched filtering
against the signal.
5.11 Endnotes
There are a number of textbooks on probability and random processes for engineers that can
be used to supplement the brief communications-centric exposition here, including Yates and
Goodman [25], Woods and Stark [26], Leon-Garcia [27], and Papoulis and Pillai [28].
A more detailed treatment of the noise analysis for analog modulation provided in Appendix 5.E
can be found in a number of communication theory texts, with Ziemer and Tranter [4] providing
a sound exposition.
As a historical note, thermal noise, which plays such a crucial role in communications systems
design, was first experimentally characterized in 1928 by Johnson [29]. Johnson discussed his
results with Nyquist, who quickly came up with a theoretical characterization [30]. See [31] for a
modern re-derivation of Nyquist’s formula, and [32] for a discussion of noise in transistors. These
papers and the references therein are good resources for further exploration into the physical basis
for noise, which we can only hint at here in Appendix 5.C. Of course, as discussed in Section
5.8, from a communication systems designer’s point of view, it typically suffices to abstract away
from such physical considerations, using the noise figure as a single number summarizing the
effect of receiver circuit noise.
5.12 Problems
Problem 5.2 A student who studies for an exam has a 90% chance of passing. A student who
does not study for the exam has a 90% chance of failing. Suppose that 70% of the students
studied for the exam.
(a) What is the probability that a student fails the exam?
(b) What is the probability that a student who fails studied for the exam?
(c) What is the probability that a student who fails did not study for the exam?
(d) Would you expect the probabilities in (b) and (c) to add up to one?
Problem 5.3 A receiver decision statistic Y in a communication system is modeled as expo-
nential with mean 1 if 0 is sent, and as exponential with mean 10 if 1 is sent. Assume that we
send 0 with probability 0.6.
(a) Find the conditional probability that Y > 5, given that 0 is sent.
(b) Find the conditional probability that Y > 5, given that 1 is sent.
(c) Find the unconditional probability that Y > 5.
(d) Given that Y > 5, what is the probability that 0 is sent?
(e) Given that Y = 5, what is the probability that 0 is sent?
Problem 5.4 Channel codes are constructed by introducing redundancy in a structured fashion.
A canonical means of doing this is by introducing parity checks. In this problem, we see how
one can make inferences based on three bits b1 , b2 , b3 which satisfy a parity check equation:
b1 ⊕ b2 ⊕ b3 = 0. Here ⊕ denotes an exclusive or (XOR) operation.
(a) Suppose that we know that P [b1 = 0] = 0.8 and P [b2 = 1] = 0.9, and model b1 and b2 as
independent. Find the probability P [b3 = 0].
(b) Define the log likelihood ratio (LLR) for a bit b as LLR(b) = log(P[b = 0]/P[b = 1]). Setting Li = LLR(bi), i = 1, 2, 3, find an expression for L3 in terms of L1 and L2, again modeling b1 and b2 as independent.
Problem 5.5 A bit X ∈ {0, 1} is repeatedly transmitted using n independent uses of a binary
symmetric channel (i.e., the binary channel in Figure 5.2 with a = b) with crossover probability
a = 0.1. The receiver uses a majority rule to make a decision on the transmitted bit. Derive
general expressions as a function of n (assume that n is odd, so there are no ties in the majority
rule), and substitute n = 5 for numerical results and plots.
(a) Let Z denote the number of ones at the channel output. (Z takes values 0, 1, ..., n.) Specify
the probability mass function of Z, conditioned on X = 0.
(b) Conditioned on X = 0, what is the probability of deciding that one was sent (i.e., what is
the probability of making an error)?
(c) Find the posterior probabilities P [X = 0|Z = m], m = 0, 1, ..., n, assuming that 0 or 1 are
equally likely to be sent. Do a stem plot against m.
(d) Repeat (c) assuming that the 0 is sent with probability 0.9.
(e) As an alternative visualization, plot the LLR log(P[X = 0|Z = m]/P[X = 1|Z = m]) versus m for (c) and (d).
[Figure: binary-input, four-output channel with inputs {0, 1} and outputs {+3, +1, −1, −3}; the transition probabilities are given in terms of p, q, r, with each input producing its “own” extreme output (+3 for input 0, −3 for input 1) with probability 1 − p − q − r.]
Figure 5.27: Two-input four-output channel for Problem 5.6.
Problem 5.6 Consider the two-input, four-output channel with transition probabilities shown
in Figure 5.27. In your numerical computations, take p = 0.05, q = 0.1, r = 0.3. Denote the
channel input by X and the channel output by Y .
(a) Assume that 0 and 1 are equally likely to be sent. Find the conditional probability of 0
being sent, given each possible value of the output. That is, compute P [X = 0|Y = y] for each
y ∈ {−3, −1, +1, +3}.
(b) Express the results in (a) as log likelihood ratios (LLRs). That is, compute L(y) = log(P[X = 0|Y = y]/P[X = 1|Y = y]) for each y ∈ {−3, −1, +1, +3}.
(c) Assume that a bit X, chosen equiprobably from {0, 1}, is sent repeatedly, using three indepen-
dent uses of the channel. The channel outputs can be represented as a vector Y = (Y1 , Y2 , Y3 )T .
For channel outputs y = (+1, +3, −1)T , find the conditional probabilities P [Y = y|X = 0] and
P [Y = y|X = 1].
(d) Use Bayes’ rule and the result of (c) to find the posterior probability P [X = 0|Y = y] for
y = (+1, +3, −1)^T. Also compute the corresponding LLR L(y) = log(P[X = 0|Y = y]/P[X = 1|Y = y]).
(e) Would you decide 0 or 1 was sent when you see the channel output y = (+1, +3, −1)T ?
Random variables
Problem 5.7 Let X denote an exponential random variable with mean 10.
(a) What is the probability that X is bigger than 20?
(b) What is the probability that X is smaller than 5?
(c) Suppose that we know that X is bigger than 10. What is the conditional probability that it
is bigger than 20?
(d) Find E[e−X ].
(e) Find E[X 3 ].
Problem 5.8 Let U1, ..., Un denote i.i.d. random variables with CDF FU(u). (a) Let X = max(U1, ..., Un). Show that
P[X ≤ x] = (FU(x))^n
(b) Let Y = min(U1, ..., Un). Show that
P[Y ≤ y] = 1 − (1 − FU(y))^n
(c) Suppose that U1 , ...Un are uniform over [0, 1]. Plot the CDF of X for n = 1, n = 5 and
n = 10, and comment on any trends that you notice.
(d) Repeat (c) for the CDF of Y .
Problem 5.9 True or False The minimum of two independent exponential random variables
is exponential.
True or False: The maximum of two independent exponential random variables is exponential.
Problem 5.10 Let U and V denote independent and identically distributed random variables,
uniformly distributed over [0, 1].
(a) Find and sketch the CDF of X = min(U, V ).
Hint: It might be useful to consider the complementary CDF.
(b) Find and sketch the CDF of Y = V /U. Make sure you specify the range of values taken by
Y.
Hint: It is helpful to draw pictures in the (u, v) plane when evaluating the probabilities of interest.
Problem 5.11 (Relation between Gaussian and exponential) Suppose that X1 and X2
are i.i.d. N(0, 1).
(a) Show that Z = X1^2 + X2^2 is exponential with mean 2.
(b) True or False: Z is independent of Θ = tan^{−1}(X2/X1).
Hint: Use the results from Example 5.4.3, which tell us the joint distribution of √Z and Θ.
Problem 5.12 (The role of the uniform random variable in simulations) Let U denote a random variable which is uniformly distributed over [0, 1]. (a) Let F(x) denote an arbitrary CDF (assume for simplicity that it is continuous). Defining X = F^{−1}(U), show that X has CDF F(x).
Remark: This gives us a way of generating random variables with arbitrary distributions, assum-
ing that we have a random number generator for uniform random variables. The method works
even if X is a discrete or mixed random variable, as long as F −1 is defined appropriately.
(b) Find a function g such that Y = g(U) is exponential with mean 2, where U is uniform over
[0, 1].
(c) Use the result in (b) and Matlab’s rand() function to generate an i.i.d. sequence of 1000
exponential random variables with mean 2. Plot the histogram and verify that it has the right
shape.
Problem 5.13 (Generating Gaussian random variables) Suppose that U1 , U2 are i.i.d.
and uniform over [0, 1].
(a) What is the joint distribution of Z = −2 ln U1 and Θ = 2πU2?
(b) Show that X1 = √Z cos Θ and X2 = √Z sin Θ are i.i.d. N(0, 1) random variables.
Hint: Use Example 5.4.3 and Problem 5.11.
(c) Use the result of (b) to generate 2000 i.i.d. N(0, 1) random variables from 2000 i.i.d. ran-
dom variables uniformly distributed over [0, 1], using Matlab’s rand() function. Check that the
histogram has the right shape.
(d) Use simulations to estimate E[X 2 ], where X ∼ N(0, 1), and compare with the analytical
result.
(e) Use simulations to estimate P [X 3 + X > 3], where X ∼ N(0, 1).
Problem 5.14 (Generating discrete random variables) Let U1 , ..., Un denote i.i.d. random
variables uniformly distributed over [0, 1] (e.g., generated by the rand() function in Matlab).
Define, for i = 1, ..., n,
Yi = 1 if Ui > 0.7,  Yi = 0 if Ui ≤ 0.7
(a) Sketch the CDF of Y1 .
(b) Find (analytically) and plot the PMF of Z = Y1 + ... + Yn , for n = 20.
(c) Use simulation to estimate and plot the histogram of Z, and compare against the PMF in
(b).
(d) Estimate E[Z] by simulation and compare against the analytical result.
(e) Estimate E[Z 3 ] by simulation.
(a) Find K.
(b) Show that X and Y are each Gaussian random variables.
(c) Express the probability P [X 2 + X > 2] in terms of the Q function.
(d) Are X and Y jointly Gaussian?
(e) Are X and Y independent?
(f) Are X and Y uncorrelated?
(g) Find the conditional density pX|Y (x|y). Is it Gaussian?
Problem 5.16 (computations involving joint Gaussianity) The random vector X = (X1 X2 )T
is Gaussian with mean vector m = (2, 1)T and covariance matrix C given by
C = [  1  −1
      −1   4 ]
(a) Let Y1 = X1 + 2X2 , Y2 = −X1 + X2 . Find cov(Y1 , Y2 ).
(b) Write down the joint density of Y1 and Y2 .
(c) Express the probability P [Y1 > 2Y2 + 1] in terms of the Q function.
Problem 5.17 (computations involving joint Gaussianity) The random vector X = (X1 X2 )T
is Gaussian with mean vector m = (−3, 2)T and covariance matrix C given by
C = [  4  −2
      −2   9 ]
(a) Let Y1 = 2X1 − X2 , Y2 = −X1 + 3X2 . Find cov(Y1 , Y2 ).
(b) Write down the joint density of Y1 and Y2 .
(c) Express the probability P [Y2 > 2Y1 − 1] in terms of the Q function with positive arguments.
(d) Express the probability P [Y12 > 3Y1 + 10] in terms of the Q function with positive arguments.
Problem 5.18 (plotting the joint Gaussian density) For jointly Gaussian random variables
X and Y , plot the density and its contours as in Figure 5.15 for the following parameters:
(a) σX^2 = 1, σY^2 = 1, ρ = 0.
(b) σX^2 = 1, σY^2 = 1, ρ = 0.5.
(c) σX^2 = 4, σY^2 = 1, ρ = 0.5.
(d) Comment on the differences between the plots in the three cases.
Problem 5.19 (computations involving joint Gaussianity) In each of the three cases in
Problem 5.18,
(a) specify the distribution of X − 2Y ;
(b) determine whether X − 2Y is independent of X?
Problem 5.20 (computations involving joint Gaussianity) X and Y are jointly Gaussian,
each with variance one, and with normalized correlation −3/4. The mean of X equals one, and
the mean of Y equals two.
(a) Write down the covariance matrix.
(b) What is the distribution of Z = 2X + 3Y ?
(c) Express the probability P [Z 2 − Z > 6] in terms of Q function with positive arguments, and
then evaluate it numerically.
Problem 5.21 (From Gaussian to Rayleigh, Rician, and Exponential Random Vari-
ables) Let X1 , X2 be iid Gaussian random variables, each with mean zero and variance v 2 .
Define (R, Φ) as the polar representation of the point (X1 , X2 ), i.e.,
X1 = R cos Φ, X2 = R sin Φ
where R ≥ 0 and Φ ∈ [0, 2π].
(a) Find the joint density of R and Φ.
(b) Observe from (a) that R, Φ are independent. Show that Φ is uniformly distributed in [0, 2π],
and find the marginal density of R.
(c) Find the marginal density of R2 .
(d) What is the probability that R2 is at least 20 dB below its mean value? Does your answer
depend on the value of v 2 ?
Remark: The random variable R is said to have a Rayleigh distribution. Further, you should
recognize that R2 has an exponential distribution.
Random Processes
Problem 5.22 Let X(t) = 2 sin (20πt + Θ), where Θ takes values with equal probability in the
set {0, π/2, π, 3π/2}.
(a) Find the ensemble-averaged mean function and autocorrelation function of X.
(b) Is X WSS?
(c) Is X stationary?
(d) Find the time-averaged mean and autocorrelation function of X. Do these depend on the
realization of Θ?
(e) Is X ergodic in mean and autocorrelation?
Problem 5.23 For each of the following functions, sketch it and state whether it can be a valid
autocorrelation function. Give reasons for your answers.
(a) f1 (τ ) = (1 − |τ |) I[−1,1] (τ ).
(b) f2 (τ ) = f1 (τ − 1).
(c) f3(τ) = f1(τ) − (1/2)(f1(τ − 1) + f1(τ + 1)).
Problem 5.24 Consider the random process Xp (t) = Xc (t) cos 2πfc t − Xs (t) sin 2πfc t, where
Xc , Xs are random processes defined on a common probability space.
(a) Find conditions on Xc and Xs such that Xp is WSS.
(b) Specify the (ensemble averaged) autocorrelation function and PSD of Xp under the conditions
in (a).
(c) Assuming that the conditions in (a) hold, what are the additional conditions for Xp to be a
passband random process?
Problem 5.26 Consider again the square wave x(t) = Σ_{n=−∞}^{∞} (−1)^n p(t − n), where p(t) = I[−1/2,1/2](t). Define the random process X(t) = x(t − D), where D is a random variable which
is uniformly distributed over the interval [0, 1].
(a) Find the ensemble averaged autocorrelation function of X.
(b) Is X WSS?
(c) Is X stationary?
(d) Is X ergodic in mean and autocorrelation function?
Problem 5.27 Let n(t) denote a zero mean baseband random process with PSD Sn (f ) =
I[−1,1] (f ). Find and sketch the PSD of the following random processes.
(a) x1(t) = dn(t)/dt.
(b) x2(t) = (n(t) − n(t − d))/d, for d = 1/2.
(c) Find the powers of x1 and x2 .
Problem 5.28 Consider a WSS random process with autocorrelation function RX (τ ) = e−a|τ | ,
where a > 0.
(a) Find the output power when X is passed through an ideal LPF of bandwidth W .
(b) Find the 99% power containment bandwidth of X. How does it scale with the parameter a?
[Figure 5.28: Baseband communication system for Problem 5.29: the message passes through a channel, noise is added at the channel output, and an equalizer removes the channel distortion to produce the estimated message; the channel and equalizer frequency responses are specified over the band |f| ≤ 2.]
Problem 5.29 Consider the baseband communication system depicted in Figure 5.28, where the message is modeled as a random process with PSD Sm(f) = 2(1 − |f|/2) I[−2,2](f). Receiver noise is modeled as bandlimited white noise with two-sided PSD Sn(f) = (1/4) I[−3,3](f). The equalizer
removes the signal distortion due to the channel.
(a) Find the signal power at the channel input.
(b) Find the signal power at the channel output.
(c) Find the SNR at the equalizer input.
(d) Find the SNR at the equalizer output.
Problem 5.30 A zero mean WSS random process X has power spectral density SX (f ) = (1 −
|f |)I[−1,1](f ).
(a) Find E[X(100)X(100.5)], leaving your answer in as explicit a form as you can.
(b) Find the output power when X is passed through a filter with impulse response h(t) = sinc(t).
Problem 5.31 A signal s(t) in a communication system is modeled as a zero mean random
process with PSD Ss (f ) = (1 − |f |)I[−1,1] (f ). The received signal is given by y(t) = s(t) + n(t),
where n is WGN with PSD Sn (f ) ≡ 0.001. The received signal is passed through an ideal lowpass
filter with transfer function H(f ) = I[−B,B] (f ).
(a) Find the SNR (ratio of signal power to noise power) at the filter input.
(b) Is the SNR at the filter output better for B = 1 or B = 1/2? Give a quantitative justification for your answer.
Problem 5.32 White noise n with PSD N0/2 is passed through an RC filter with impulse response h(t) = e^{−t/T0} I[0,∞)(t), where T0 is the RC time constant, to obtain the output y = n ∗ h.
(a) Find the autocorrelation function, PSD and power of y.
(b) Assuming now that the noise is a Gaussian random process, find a value of t0 such that y(t0) − (1/2)y(0) is independent of y(0), or say why such a t0 cannot be found.
Problem 5.33 Find the noise power at the output of the filter for the following two scenarios:
(a) Baseband white noise with (two-sided) PSD N0/2 is passed through a filter with impulse response h(t) = sinc^2(t).
(b) Passband white noise with (two-sided) PSD N0/2 is passed through a filter with impulse response h(t) = sinc^2(t) cos(100πt).
Problem 5.34 Suppose that WGN n(t) with PSD σ^2 = N0/2 = 1 is passed through a filter with
impulse response h(t) = I[−1,1] (t) to obtain the output y(t) = (n ∗ h)(t).
(a) Find and sketch the output power spectral density Sy (f ), carefully labeling the axes.
(b) Specify the joint distribution of the three consecutive samples y(1), y(2), y(3).
(c) Find the probability that y(1) − 2y(2) + y(3) exceeds 10.
Problem 5.35 (computations involving deterministic signal plus WGN) Consider the
noisy received signal
y(t) = s(t) + n(t)
where s(t) = I[0,3] (t) and n(t) is WGN with PSD σ 2 = N0 /2 = 1/4. The receiver computes the
following statistics:
Y1 = ∫_0^2 y(t) dt ,   Y2 = ∫_1^3 y(t) dt
(a) Specify the joint distribution of Y1 and Y2 .
(b) Compute the probability P [Y1 +Y2 < 2], expressing it in terms of the Q function with positive
arguments.
Problem 5.36 (filtered WGN) Let n(t) denote WGN with PSD Sn (f ) ≡ σ 2 . We pass n(t)
through a filter with impulse response h(t) = I[0,1] (t) − I[1,2] (t) to obtain z(t) = (n ∗ h)(t).
(a) Find and sketch the autocorrelation function of z(t).
(b) Specify the joint distribution of z(49) and z(50).
(c) Specify the joint distribution of z(49) and z(52).
(d) Evaluate the probability P [2z(50) > z(49) + z(51)]. Assume σ 2 = 1.
(e) Evaluate the probability P [2z(50) > z(49) + z(51) + 2]. Assume σ 2 = 1.
Problem 5.37 (filtered WGN) Let n(t) denote WGN with PSD Sn (f ) ≡ σ 2 . We pass n(t)
through a filter with impulse response h(t) = 2I[0,2] (t) − I[1,2] (t) to obtain z(t) = (n ∗ h)(t).
(a) Find and sketch the autocorrelation function of z(t).
(b) Specify the joint distribution of z(0), z(1), z(2).
(c) Compute the probability P [z(0) − z(1) + z(2) > 4] (assume σ 2 = 1).
Problem 5.38 (filtered and sampled WGN) Let n(t) denote WGN with PSD Sn (f ) ≡ σ 2 .
We pass n(t) through a filter with impulse response h(t) to obtain z(t) = (n ∗ h)(t), and then
sample it at rate 1/Ts to obtain the sequence z[n] = z(nTs ), where n takes integer values.
(a) Show that
cov(z[n], z[m]) = E[z[n]z*[m]] = (N0/2) ∫ h(t) h*(t − (n − m)Ts) dt
(We are interested in real-valued impulse responses, but we continue to develop a framework
general enough to encompass complex-valued responses.)
(b) For h(t) = I[0,1](t), specify the joint distribution of (z[1], z[2], z[3])^T for a sampling rate of 2 (Ts = 1/2).
(c) Repeat (b) for a sampling rate of 1.
(d) For a general h sampled at rate 1/Ts , show that the noise samples are independent if h(t) is
square root Nyquist at rate 1/Ts .
Problem 5.39 Consider the signal s(t) = I[0,2] (t) − 2I[1,3] (t).
(a) Find and sketch the impulse response smf (t) of the matched filter for s.
(b) Find and sketch the output when s(t) is passed through its matched filter.
(c) Suppose that, instead of the matched filter, all we have available is a filter with impulse
response h(t) = I[0,1] (t). For an arbitrary input signal x(t), show how z(t) = (x ∗ smf )(t) can be
synthesized from y(t) = (x ∗ h)(t).
Problem 5.40 (Correlation via filtering and sampling) A signal x(t) is passed through a
filter with impulse response h(t) = I[0,2] (t) to obtain an output y(t) = (x ∗ h)(t).
(a) Find and sketch a signal g1(t) such that
y(2) = ⟨x, g1⟩ = ∫ x(t) g1(t) dt
Problem 5.41 (Correlation via filtering and sampling) Let us generalize the result we
were hinting at in Problem 5.40. Suppose an arbitrary signal x is passed through an arbitrary
filter h(t) to obtain output y(t) = (x ∗ h)(t).
(a) Show that taking a linear combination of samples at the filter output is equivalent to a correlation operation on x. That is, show that
Σ_{i=1}^{n} αi y(ti) = ⟨x, g⟩ = ∫ x(t) g(t) dt
where
g(t) = Σ_{i=1}^{n} αi h(ti − t) = Σ_{i=1}^{n} αi hmf(t − ti)    (5.108)
That is, taking a linear combination of samples is equivalent to correlating against a signal which
is a linear combination of shifted versions of the matched filter for h.
(b) The preceding result can be applied to approximate a correlation operation by taking linear
combinations at the output of a filter. Suppose that we wish to perform a correlation against a
triangular pulse g(t) = (1 − |t|)I[−1,1] (t). How would you approximate this operation by taking a
linear combination of samples at the output of a filter with impulse response h(t) = I[0,1] (t).
(b) Can you improve the SNR by modifying the integration in (a), while keeping the processing
linear? If so, say how. If not, say why not.
(c) Now, suppose that y(t) is passed through a filter with impulse response h(t) = I[0,1] (t) to
obtain z(t) = (y ∗ h)(t). If you were to sample the filter output at a single time t = t0 , how
would you choose t0 so as to maximize the SNR?
(d) In the setting of (c), if you were now allowed to take two samples at times t1 , t2 and t3 and
generate a linear combination a1 z(t1 ) + a2 z(t2 ) + a3 z(t3 ), how would you choose {ai }, {ti }, to
improve the SNR relative to (c). (We are looking for intuitively sensible answers rather than a
provably optimal choice.)
Hint: See Problem 5.41. Taking linear combinations of samples at the output of a filter is
equivalent to correlation with an appropriate waveform, which we can choose to approximate the
optimal correlator.
Mathematical derivations
Problem 5.43 (Bounds on the Q function) We derive the bounds (5.117) and (5.116) for
Q(x) = ∫_x^∞ (1/√(2π)) e^{−t^2/2} dt    (5.109)
(a) Show that, for x ≥ 0, the following upper bound holds:
Q(x) ≤ (1/2) e^{−x^2/2}
Hint: Try pulling out a factor of e^{−x^2/2} from (5.109), and then bounding the resulting integrand.
Observe that t ≥ x ≥ 0 in the integration interval.
(b) For x ≥ 0, derive the following upper and lower bounds for the Q function:
(1 − 1/x^2) e^{−x^2/2}/(√(2π) x) ≤ Q(x) ≤ e^{−x^2/2}/(√(2π) x)
Hint: Write the integrand in (5.109) as a product of 1/t and t e^{−t^2/2} and then integrate by parts
to get the upper bound. Integrate by parts once more using a similar trick to get the lower
bound. Note that you can keep integrating by parts to get increasingly refined upper and lower
bounds.
Problem 5.44 (Geometric derivation of Q function bound) Let X1 and X2 denote inde-
pendent standard Gaussian random variables.
(a) For a > 0, express P [|X1| > a, |X2 | > a] in terms of the Q function.
(b) Find P [X12 + X22 > 2a2 ].
Hint: Transform to polar coordinates. Or use the results of Problem 5.21.
(c) Sketch the regions in the (x1 , x2 ) plane corresponding to the events considered in (a) and (b).
(d) Use (a)-(c) to obtain an alternative derivation of the bound Q(x) ≤ (1/2)e^{−x^2/2} for x ≥ 0 (i.e.,
the bound in Problem 5.43(a)).
Problem 5.45 (Cauchy-Schwarz inequality for random variables) For random variables
X and Y defined on a common probability space, define the mean squared error in approximating
X by a multiple of Y as
J(a) = E[(X − aY)^2]
where a is a scalar. Assume that both random variables are nontrivial (i.e., neither of them is
zero with probability one).
(a) Show that
J(a) = E[X^2] + a^2 E[Y^2] − 2aE[XY]
(b) Since J(a) is quadratic in a, it has a global minimum (corresponding to the best approximation of X by a multiple of Y). Show that this is achieved for aopt = E[XY]/E[Y^2].
(c) Show that the mean squared error in the best approximation found in (b) can be written as
J(aopt) = E[X^2] − (E[XY])^2/E[Y^2]
(d) Since the approximation error is nonnegative, conclude that
(E[XY])^2 ≤ E[X^2]E[Y^2]    (Cauchy-Schwarz inequality for random variables)    (5.110)
This is the Cauchy-Schwarz inequality for random variables.
(e) Conclude also that equality is achieved in (5.110) if and only if X and Y are scalar multiples
of each other.
Hint: Equality corresponds to J(aopt ) = 0.
Problem 5.46 (Normalized correlation) (a) Apply the Cauchy-Schwarz inequality in the
previous problem to “zero mean” versions of the random variables, X1 = X −E[X], Y1 = Y −E[Y ]
to obtain that
|cov(X, Y)| \le \sqrt{var(X)\, var(Y)}    (5.111)
(b) Conclude that the normalized correlation ρ(X, Y ) defined in (5.59) lies in [−1, 1].
(c) Show that |ρ| = 1 if and only if we can write X = aY + b. Specify the constants a and b in
terms of the means and covariances associated with the two random variables.
Problem 5.48 Consider a zero mean WSS random process X with autocorrelation function
RX (τ ). Let Y1 (t) = (X ∗ h1 )(t) and Y2 (t) = (X ∗ h2 )(t) denote random processes obtained by
passing X through LTI systems with impulse responses h1 and h2 , respectively.
(a) Find the crosscorrelation function RY1 ,Y2 (t1 , t2 ).
Hint: You can use the approach employed to obtain (5.101), first finding RY1 ,X and then RY1 ,Y2 .
(b) Are Y1 and Y2 jointly WSS?
(c) Suppose that X is white noise with PSD SX (f ) ≡ 1, h1 (t) = I[0,1] (t) and h2 (t) = e−t I[0,∞) (t).
Find E[Y1 (0)Y2 (0)] and E[Y1 (0)Y2 (1)].
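A small Monte Carlo sketch of part (c) is given below (an editorial illustration, not part of the original problem). It approximates white noise on a fine time grid, filters it through h1 and h2, and estimates the two expectations; the closed-form values shown for comparison follow from E[Y1(t1)Y2(t2)] = ∫ h1(a) h2(a − (t1 − t2)) da for white noise of unit PSD.

```python
import numpy as np

# Monte Carlo sketch for Problem 5.48(c): white noise with S_X(f) = 1 passed
# through h1(t) = I_[0,1](t) and h2(t) = exp(-t) I_[0,inf)(t) (truncated here).
rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0, 10, dt)
h1 = np.ones(int(1.0 / dt))        # indicator of [0, 1]
h2 = np.exp(-t)                    # exp(-t), truncated at 10 s

num_trials = 1000
acc_00, acc_01 = 0.0, 0.0
k0, k1 = int(5 / dt), int(6 / dt)  # "t = 0" and "t = 1" taken after filter transients die out
for _ in range(num_trials):
    x = rng.standard_normal(len(t)) / np.sqrt(dt)   # discrete-time approximation of white noise
    y1 = np.convolve(x, h1)[:len(t)] * dt
    y2 = np.convolve(x, h2)[:len(t)] * dt
    acc_00 += y1[k0] * y2[k0]
    acc_01 += y1[k0] * y2[k1]

print("E[Y1(0)Y2(0)] ~", acc_00 / num_trials, " (integral value: 1 - e^-1 =", 1 - np.exp(-1), ")")
print("E[Y1(0)Y2(1)] ~", acc_01 / num_trials, " (integral value: e^-1 - e^-2 =", np.exp(-1) - np.exp(-2), ")")
```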
Problem 5.49 (Cauchy-Schwarz inequality for signals) Consider two signals (assume real-
valued for simplicity, although the results we are about to derive apply for complex-valued signals
as well) u(t) and v(t).
(a) We wish to approximate u(t) by a scalar multiple of v(t) so as to minimize the norm of the
error. Specifically, we wish to minimize
J(a) = \int |u(t) - a v(t)|^2\, dt = ||u - av||^2 = \langle u - av, u - av \rangle
Show that
J(a) = ||u||^2 + a^2 ||v||^2 - 2a \langle u, v \rangle
(b) Show that the quadratic function J(a) is minimized by choosing a = aopt , given by
a_{opt} = \frac{\langle u, v \rangle}{||v||^2}
Show that the corresponding approximation a_{opt} v can be written as a projection of u along a unit vector in the direction of v:
a_{opt} v = \left\langle u, \frac{v}{||v||} \right\rangle \frac{v}{||v||}
(c) Show that the error due to the optimal setting is given by
J(a_{opt}) = ||u||^2 - \frac{|\langle u, v \rangle|^2}{||v||^2}
(d) Since the minimum error is non-negative, conclude that
|\langle u, v \rangle| \le ||u||\, ||v|| \qquad \text{(Cauchy-Schwarz inequality for signals)}    (5.115)
This is the Cauchy-Schwarz inequality, which applies to real- and complex-valued signals or
vectors.
(e) Conclude also that equality in (5.115) occurs if and only if u is a scalar multiple of v or if v
is a scalar multiple of u. (We need to say it both ways in case one of the signals is zero.)
var(Z) = \int\!\!\int C_X(t_1, t_2)\, g(t_1) g(t_2)\, dt_1\, dt_2
(b) Suppose now that X is zero mean and WSS with autocorrelation RX (τ ). Show that the
variance of the correlator output can be written as
var(Z) = \int R_X(\tau) R_g(\tau)\, d\tau = \langle R_X, R_g \rangle
where R_g(\tau) = (g * g_{MF})(\tau) = \int g(t) g(t - \tau)\, dt is the "autocorrelation" of the waveform g.
Hint: An alternative to doing this from scratch is to use the equivalence of correlation and
matched filtering. You can then employ (5.101), which gives the output autocorrelation function
when a WSS process is sent through an LTI system, evaluate it at zero lag to find the power,
and use the symmetry of autocorrelation functions.
where n(t) is WGN with power spectral density N_0/2 = 10^{-5}, and φ_s(t) is the instantaneous phase
deviation of the noiseless FM signal. Assume that the bandwidth of the noiseless FM signal is
100 KHz.
(a) The noisy signal v(t) is passed through an ideal BPF which exactly spans the 100 KHz
frequency band occupied by the noiseless signal. What is the SNR at the output of the BPF?
(b) The output of the BPF is passed through an ideal phase detector, followed by a differentiator
which is normalized to give unity gain at 10 KHz, and an ideal (unity gain) LPF of bandwidth
10 KHz.
(i) Sketch the noise PSD at the output of the differentiator.
(ii) Find the noise power at the output of the LPF.
Problem 5.52 An FM signal of bandwidth 210 KHz is received at a power of -90 dBm, and is
corrupted by bandpass AWGN with two-sided PSD 10^{-22} watts/Hz. The message bandwidth is
5 KHz, and the peak-to-average power ratio for the message is 10 dB.
(a) What is the SNR (in dB) for the received FM signal? (Assume that the noise is bandlimited
to the band occupied by the FM signal.)
(b) Estimate the peak frequency deviation.
(c) The noisy FM signal is passed through an ideal phase detector. Estimate and sketch the
noise PSD at the output of the phase detector, carefully labeling the axes.
(d) The output of the phase detector is passed through a differentiator with transfer function
H(f ) = jf , and then an ideal lowpass filter of bandwidth 5 kHz. Estimate the SNR (in dB) at
the output of the lowpass filter.
Problem 5.53 A message signal m(t) is modeled as a zero mean random process with PSD
Sm (f ) = |f |I[−2,2] (f )
We generate an SSB signal as follows:
u(t) = 20[m(t) cos 200πt − m̌(t) sin 200πt]
where m̌ denotes the Hilbert transform of m.
(a) Find the power of m and the power of u.
(b) The noisy received signal is given by y(t) = u(t) + n(t), where n is passband AWGN with PSD N_0/2 = 1, and is independent of u. Draw the block diagram for an ideal synchronous demodulator
for extracting the message m from y, specifying the carrier frequency as well as the bandwidth
of the LPF, and find the SNR at the output of the demodulator.
(c) Find the signal-to-noise-plus-interference ratio if the local carrier for the synchronous demod-
ulator has a phase error of π/8.
\left(1 - \frac{1}{x^2}\right) \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x} \le Q(x) \le \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x}, \quad x > 0    (5.116)
Q(x) \le \frac{1}{2} e^{-x^2/2}, \quad x \ge 0    (5.117)
Figure 5.29: The Q function and its bounds, plotted on a logarithmic scale for 0 ≤ x ≤ 6: Q(x), the asymptotically tight upper and lower bounds of (5.116), and the upper bound (5.117), which is most useful for small x.
Figure 5.29 plots Q(x) and its bounds for positive x. A logarithmic scale is used for the values
of the function in order to demonstrate the rapid decay with x. The bounds (5.116) are seen to
be tight even at moderate values of x (say x ≥ 2), while the bound (5.117) captures the right rate
of decay for large x and also remains useful for small x.
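The comparison in Figure 5.29 is easy to regenerate numerically; the sketch below (using scipy's norm.sf for Q(x)) is one way to do so, and can also be used to check the bounds derived in Problem 5.43.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Regenerate the comparison of Q(x) with its bounds, as in Figure 5.29.
x = np.linspace(0.05, 6, 500)
Q = norm.sf(x)                                    # Q(x) = P[N(0,1) > x]
ub_tight = np.exp(-x**2 / 2) / (np.sqrt(2 * np.pi) * x)
lb_tight = (1 - 1 / x**2) * ub_tight              # negative for x < 1; clipped below for plotting
ub_small = 0.5 * np.exp(-x**2 / 2)

plt.semilogy(x, Q, label='Q(x)')
plt.semilogy(x, ub_tight, '--', label='asymptotically tight upper bound')
plt.semilogy(x, np.maximum(lb_tight, 1e-12), ':', label='asymptotically tight lower bound')
plt.semilogy(x, ub_small, '-.', label='upper bound (1/2) exp(-x^2/2)')
plt.xlabel('x'); plt.legend(); plt.grid(True)
plt.show()
```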
Notice that the sum S_n = X_1 + ... + X_n has mean nm and variance nv^2. Thus, the CLT is telling
us that Y_n, a normalized, zero mean, unit variance version of S_n, has a distribution that tends to
N(0, 1) as n gets large. In practical terms, this translates to using the CLT to approximate Sn
as a Gaussian random variable with mean nm and variance nv 2 , for “large enough” n. In many
scenarios, the CLT kicks in rather quickly, and the Gaussian approximation works well for values
of n as small as 6-10.
Figure 5.30: A binomial pmf with parameters n = 20 and p = 0.3, and its N(6, 4.2) Gaussian
approximation.
Consider a binomial random variable with parameters n and p. We know that we can write it as Sn = X1 +...+Xn ,
where Xi are i.i.d. Bernoulli(p). Note that E[Xi ] = p and var(Xi ) = p(1−p), so that Sn has mean
np and variance np(1 − p). We can therefore approximate Binomial(n, p) by N(np, np(1 − p))
according to the CLT. The CLT tells us that we can approximate the CDF of a binomial by a
Gaussian: thus, the integral of the Gaussian density from (−∞, k] should approximate the sum
of the binomial pmf from 0 to k. The plot in Figure 5.30 shows that the Gaussian density itself
(with mean np = 6 and variance np(1 − p) = 4.2) approximates the binomial pmf quite well
around the mean, so that we do expect the corresponding CDFs to be close.
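The following sketch (an editorial illustration) compares the Binomial(20, 0.3) pmf and CDF with the N(6, 4.2) approximation at a few values of k.

```python
import numpy as np
from scipy.stats import binom, norm

# Compare the Binomial(n, p) pmf and CDF with the CLT-based N(np, np(1-p))
# approximation, for the parameters of Figure 5.30.
n, p = 20, 0.3
mean, std = n * p, np.sqrt(n * p * (1 - p))       # mean 6, variance 4.2
k = np.arange(n + 1)

pmf, cdf = binom.pmf(k, n, p), binom.cdf(k, n, p)
gauss_pdf, gauss_cdf = norm.pdf(k, mean, std), norm.cdf(k, mean, std)

for kk in (2, 4, 6, 8, 10):
    print(f"k={kk:2d}:  pmf={pmf[kk]:.4f} vs Gaussian density {gauss_pdf[kk]:.4f},  "
          f"CDF={cdf[kk]:.4f} vs Gaussian CDF {gauss_cdf[kk]:.4f}")
```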
Thermal noise: The mean squared value of the thermal noise voltage generated by a resistance R at temperature T, measured over a bandwidth B, is given by
\overline{v_n^2} = 4RkTB
Now, if we connect the noise source to a matched load of impedance R, the average power delivered to the load is
P_n = \frac{(v_n/2)^2}{R} = kTB    (5.121)
The preceding calculation provides a valuable benchmark, giving the communication link designer
a ballpark estimate of how much noise power to expect in a receiver operating over a bandwidth B.
Of course, the noise for a particular receiver is typically higher than this benchmark, and must be
calculated based on detailed modeling and simulation of internal and external noise sources, and
the gains, input impedances, and output impedances for various circuit components. However,
while the circuit designer must worry about these details, once the design is complete, he or she
can supply the link designer with a single number for the noise power at the receiver output,
referred to the benchmark (5.121).
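As a quick illustration of the benchmark (5.121), the sketch below evaluates kTB in dBm for a few bandwidths; the reference temperature T = 290 K is the conventional choice (an assumption here, since the text does not fix T), and the bandwidths are arbitrary examples.

```python
import numpy as np

# Evaluate the kTB benchmark of (5.121) for a few example bandwidths,
# at the conventional reference temperature T = 290 K.
k_B = 1.380649e-23        # Boltzmann constant, J/K
T = 290.0                 # reference temperature, K (assumed; not specified in the text)

for B in (1e3, 1e6, 20e6):                  # Hz; arbitrary illustrative bandwidths
    P = k_B * T * B                         # noise power in watts
    print(f"B = {B:10.0f} Hz :  kTB = {P:.3e} W  =  {10*np.log10(P/1e-3):6.1f} dBm")
```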
Shot noise: Shot noise occurs because of the discrete nature of the charge carriers. When a
voltage applied across a device causes current to flow, if we could count the number of charge
carriers going from one point in the device to the other (e.g., from the source to the drain of
a transistor) over a time period τ , we would see a random number N(τ ), which would vary
independently across disjoint time periods. Under rather general assumptions, N(τ ) is well
modeled as a Poisson random variable with mean λτ , where λ scales with the DC current. The
variance of a Poisson random variable equals its mean, so that the variance of the rate of charge
carrier flow equals
var\left(\frac{N(\tau)}{\tau}\right) = \frac{1}{\tau^2}\, var(N(\tau)) = \frac{\lambda}{\tau}
We can think of this as the power of the shot noise. Thus, increasing the observation interval
τ smooths out the variations in charge carrier flow, and reduces the shot noise power. If we
now think of the device being operated over a bandwidth B, we know that we are effectively
observing the device at a temporal resolution τ ∼ 1/B. Thus, shot noise power scales linearly with
B.
The preceding discussion indicates that both thermal noise and shot noise are white, in that
their power scales linearly with the system bandwidth B, independent of the frequency band of
operation. Indeed, both phenomena involve random motions of a large number of charge carriers, and can be analyzed together in a statistical mechanics framework. This is well beyond our scope; for our purpose, it suffices to model the aggregate system noise due to these two phenomena as a single white noise process.
Flicker noise: Another commonly encountered form of noise is 1/f noise, also called flicker
noise, whose power increases as the frequency of operation gets smaller. The sources of 1/f
noise are poorly understood, and white noise dominates in the typical operating regimes for
communication receivers. For example, in an RF system, the noise in the front end (antenna,
low noise amplifier, mixer) dominates the overall system noise, and 1/f noise is negligible at
these frequencies. We therefore ignore 1/f noise in our noise modeling.
the band of interest, any sample path xp (t) for a passband random process can be written as
x_p(t) = Re\left(x(t) e^{j 2\pi f_c t}\right) = x_c(t) \cos 2\pi f_c t - x_s(t) \sin 2\pi f_c t = e(t) \cos(2\pi f_c t + \theta(t))
where x(t) = x_c(t) + j x_s(t) = e(t) e^{j\theta(t)} is the complex envelope, x_c(t), x_s(t) are the I and Q components, respectively, and e(t), θ(t) are the envelope and phase, respectively.
PSD of complex envelope: Applying the standard frequency domain relationship to the
time-windowed sample paths, we have the frequency domain relationship
X_{p,T_o}(f) = \frac{1}{2} X_{T_o}(f - f_c) + \frac{1}{2} X_{T_o}^*(-f - f_c)
We therefore have (the cross term vanishes, since the two terms on the right-hand side do not overlap in frequency)
|X_{p,T_o}(f)|^2 = \frac{1}{4} |X_{T_o}(f - f_c)|^2 + \frac{1}{4} |X_{T_o}^*(-f - f_c)|^2 = \frac{1}{4} |X_{T_o}(f - f_c)|^2 + \frac{1}{4} |X_{T_o}(-f - f_c)|^2
Dividing by T_o and letting T_o \to \infty, we obtain
S_{x_p}(f) = \frac{1}{4} S_x(f - f_c) + \frac{1}{4} S_x(-f - f_c)    (5.122)
where Sx (f ) is baseband. Using (5.87), the one-sided passband PSD is given by
S_{x_p}^+(f) = \frac{1}{2} S_x(f - f_c)    (5.123)
Similarly, we can go from passband to complex baseband using the formula
S_x(f) = 2 S_{x_p}^+(f + f_c)    (5.124)
What about the I and Q components? Consider the complex envelope x(t) = xc (t) + jxs (t). Its
autocorrelation function R_x(\tau) = E[x(t+\tau) x^*(t)] works out to
R_x(\tau) = (R_{x_c}(\tau) + R_{x_s}(\tau)) + j\,(R_{x_s,x_c}(\tau) - R_{x_c,x_s}(\tau)) = (R_{x_c}(\tau) + R_{x_s}(\tau)) + j\,(R_{x_s,x_c}(\tau) - R_{x_s,x_c}(-\tau))    (5.125)
Taking the Fourier transform, and using S_{x_s,x_c}(-f) = S_{x_s,x_c}^*(f) (the cross-correlation of real processes is real), we obtain
S_x(f) = S_{x_c}(f) + S_{x_s}(f) - 2\,\text{Im}\left(S_{x_s,x_c}(f)\right)    (5.126)
For simplicity, we henceforth consider situations in which Sxs ,xc (f ) ≡ 0 (i.e., the I and Q com-
ponents are uncorrelated). Actually, for a given passband random process, even if the I and
Q components for a given frequency reference are uncorrelated, we can make them correlated
by shifting the frequency reference. However, such subtleties are not required for our purpose,
which is to model digitally modulated signals and receiver noise.
The complex envelope n(t) = n_c(t) + j n_s(t) of passband white noise over a band of width B has PSD
S_n(f) = 2N_0, \quad |f| \le B/2
Figure 5.31: PSD of I and Q components, and complex envelope, of passband white noise.
Note that the power of the complex envelope is 2N0 B, which is twice the power of the correspond-
ing passband noise np . This is consistent with the convention in Chapter 2 for deterministic,
finite-energy signals, where the complex envelope has twice the energy of the corresponding pass-
band signal. Later, when we discuss digital communication receivers and their performance in
Chapter 6, we find it convenient to scale signals and noise in complex baseband such that we get
rid of this factor of two. In that case, the PSDs of the I and Q components are given by
S_{n_c}(f) = S_{n_s}(f) = N_0/2.
Figure 5.32: Downconversion of passband noise using the local oscillator 2 cos(2πf_c t − θ); circular symmetry implies that the PSD of the baseband noise n_b(t) is independent of θ.
An important property of passband white noise is its circular symmetry: the statistics of the I
and Q components are unchanged if we change the phase reference. To understand what this
means in practical terms, consider the downconversion operation shown in Figure 5.32, which
yields a baseband random process nb (t). Circular symmetry corresponds to the assumption that
the PSD of nb does not depend on θ. Thus, it immediately implies that the statistics of the I and Q components obtained using any choice of phase reference are the same.
where θr is the phase offset between the incoming carrier and the LO. The received signal power
is given by
P_r = \overline{(A_c m(t) \cos(2\pi f_c t + \theta_r))^2} = A_c^2 P_m / 2
A coherent demodulator extracts the I component, which is given by
This coincides with the baseband benchmark (5.128) for ideal coherent demodulation (i.e., θr = 0). However, for θr ≠ 0, even when the received signal power Pr gets arbitrarily large relative to the noise power, the SINR cannot be larger than 1/tan²θr, which shows the importance of making the phase error as small as possible.
SNR for AM: Now, consider conventional AM. While we would typically use envelope detection
rather than coherent demodulation in this setting, it is instructive to compute SNR for both
methods of demodulation. For message bandwidth Bm , the bandwidth of the passband received
signal is B = 2Bm . The received signal is given by
where m_n(t) is the normalized version of the message (with min_t m_n(t) = −1), and where θ_r is the phase offset between the incoming carrier and the LO. The received signal power is given by
P_r = \frac{A_c^2 \left(1 + a_{mod}^2 P_{m_n}\right)}{2}    (5.132)
where P_{m_n} = \overline{m_n^2(t)} is the power of the normalized message. A coherent demodulator extracts the I component, which is given by
The power of the information-bearing part of the signal (the DC term due to the carrier carries
no information, and is typically rejected using AC coupling) is given by
Recall that the AM power efficiency is defined as the ratio of the power of the message-bearing part of the signal to the power of the overall signal (which includes an unmodulated carrier), and is given by
\eta_{AM} = \frac{a_{mod}^2 P_{m_n}}{1 + a_{mod}^2 P_{m_n}}
We can therefore write the signal power (5.133) at the output of the coherent demodulator in
terms of the received power in (5.132) as:
Thus, even with ideal coherent demodulation (θ_r = 0), the SNR obtained with AM is less than that of the baseband benchmark, since η_AM < 1 (typically much smaller than one). Of course, the reason we incur this power inefficiency is to simplify the receiver, recovering the message using an envelope detector. Let us now compute the SNR for the latter.
Figure 5.33: At high SNR, the envelope e(t) of an AM signal is approximately equal to its I component relative to the received carrier phase reference θ_r, namely y_c(t) = A_c(1 + a_{mod} m_n(t)) + n_c(t); the Q component is y_s(t) = n_s(t).
Expressing the passband noise in the received signal (5.131) with the incoming carrier as the reference, we have
n_p(t) = n_c(t) \cos(2\pi f_c t + \theta_r) - n_s(t) \sin(2\pi f_c t + \theta_r)
where, by virtue of circular symmetry, n_c, n_s have the PSDs and cross-spectra as in Figure 5.31, regardless of θ_r. That is, S_{n_c}(f) = S_{n_s}(f) = N_0 I_{[-B/2,B/2]}(f) and S_{n_s,n_c}(f) \equiv 0.
At high SNR, the signal term is dominant, so that y_c(t) \gg y_s(t). Furthermore, since the AM signal is positive (assuming a_{mod} < 1), we have y_c > 0 "most of the time," even though n_c can be negative. We therefore obtain that
e(t) = \sqrt{y_c^2(t) + y_s^2(t)} \approx |y_c(t)| \approx y_c(t)
That is, the output of the envelope detector is approximated, for high SNR, as
e(t) \approx A_c(1 + a_{mod} m_n(t)) + n_c(t)
The right-hand side is what we would get from ideal coherent detection. We can reuse our SNR
computation for coherent detection to conclude that the SNR at the envelope detector output is
given by
SNRAM,envdet = SNRb ηAM (5.135)
Thus, for a properly designed (amod < 1) AM system operating at high SNR, the envelope
detector approximates the performance of ideal coherent detection, without requiring carrier
synchronization.
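The sketch below evaluates these AM relations numerically; the modulation index, normalized message power, and benchmark SNR are illustrative values chosen for the example, not values taken from the text.

```python
import numpy as np

# Evaluate the AM power efficiency and the envelope-detector SNR of (5.135)
# relative to the baseband benchmark. All parameter values are illustrative.
a_mod = 0.8                 # modulation index, kept below 1
P_mn = 0.5                  # power of the normalized message m_n(t)
SNR_b_dB = 20.0             # baseband benchmark SNR, in dB

eta_AM = a_mod**2 * P_mn / (1 + a_mod**2 * P_mn)
SNR_envdet_dB = SNR_b_dB + 10 * np.log10(eta_AM)   # SNR_AM,envdet = eta_AM * SNR_b (high SNR)

print(f"eta_AM = {eta_AM:.3f}  ({10*np.log10(eta_AM):.1f} dB)")
print(f"envelope detector output SNR ~ {SNR_envdet_dB:.1f} dB "
      f"(baseband benchmark {SNR_b_dB:.1f} dB)")
```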
The noisy received signal can be written as
y_p(t) = A_c \cos(2\pi f_c t + \theta(t)) + n_p(t)    (5.137)
where n_p(t) is passband white noise with one-sided PSD N_0 over the signal band of interest, and where the message is encoded in the phase θ(t). For example,
\theta(t) = k_p m(t)
Figure 5.34: I and Q components of a noisy angle modulated signal with the phase reference chosen as the phase of the noiseless signal: the I component is A_c + n_c(t), the Q component is n_s(t), and θ_n(t) is the resulting phase perturbation.
Expanding the noise with respect to the phase reference of the noiseless signal, we can write
y_p(t) = (A_c + n_c(t)) \cos(2\pi f_c t + \theta(t)) - n_s(t) \sin(2\pi f_c t + \theta(t))
where n_c, n_s have PSDs as in Figure 5.31 (with cross-spectrum S_{n_s,n_c}(f) \equiv 0), thanks to circular symmetry (we assume that it applies approximately even though the phase reference θ(t) is time-varying). The I and Q components with respect to this phase reference are shown in Figure 5.34, so that the corresponding complex envelope can be written as
y(t) = (A_c + n_c(t)) + j\, n_s(t) = e(t) e^{j\theta_n(t)}
where
e(t) = \sqrt{(A_c + n_c(t))^2 + n_s^2(t)}    (5.138)
and
\theta_n(t) = \tan^{-1} \frac{n_s(t)}{A_c + n_c(t)}    (5.139)
The passband signal in (5.137) can now be rewritten as
y_p(t) = Re\left(y(t)\, e^{j(2\pi f_c t + \theta(t))}\right) = Re\left(e(t) e^{j\theta_n(t)} e^{j(2\pi f_c t + \theta(t))}\right) = e(t) \cos(2\pi f_c t + \theta(t) + \theta_n(t))
At high SNR, we have
\left| \frac{n_s(t)}{A_c + n_c(t)} \right| \ll 1
and
\frac{n_s(t)}{A_c + n_c(t)} \approx \frac{n_s(t)}{A_c}
For |x| small, tan x ≈ x, and hence x ≈ tan−1 x. We therefore obtain the following high SNR
approximation for the phase perturbation due to the noise:
\theta_n(t) = \tan^{-1} \frac{n_s(t)}{A_c + n_c(t)} \approx \frac{n_s(t)}{A_c}, \quad \text{high SNR approximation}    (5.140)
so that
y_p(t) \approx A_c \cos\left(2\pi f_c t + \theta(t) + \frac{n_s(t)}{A_c}\right), \quad \text{high SNR approximation}    (5.141)
Thus, the Q component (relative to the desired signal’s phase reference) of the passband white
noise appears as phase noise, but is scaled down by the signal amplitude.
FM Noise Analysis
Let us apply the preceding to develop an analysis of the effects of white noise on FM. It is
helpful, but not essential, to have read Chapter 3 for this discussion. Suppose that we have an
ideal detector for the phase of the noisy signal in (5.141), and that we differentiate it to recover
a message encoded in the frequency. (For those who have read Chapter 3, we are talking about
an ideal limiter-discriminator). The output is the instantaneous frequency deviation, given by
z(t) = \frac{1}{2\pi} \frac{d}{dt}\left(\theta(t) + \theta_n(t)\right) \approx k_f m(t) + \frac{n_s'(t)}{2\pi A_c}    (5.142)
using the high SNR approximation (5.140).
Figure 5.35: Block diagram of an FM system: message → FM modulator → channel → RF front end (bandwidth of order B_RF) → limiter-discriminator → estimated message.
Figure 5.36: PSDs of signal and noise before and after limiter-discriminator.
We now analyze the performance of an FM system whose block diagram is shown in Figure
5.35. For wideband FM, the bandwidth BRF of the received signal yp (t) is significantly larger
than the message bandwidth Bm : BRF ≈ 2(β + 1)Bm by Carson’s formula, where β > 1. Thus,
the RF front end in Figure 5.35 lets in passband white noise np (t) of bandwidth of the order
of BRF , as shown in Figure 5.36. Figure 5.36 also shows the PSDs once we have passed the
received signal through the limiter-discriminator. The estimated message at the output of the
limiter-discriminator is a baseband signal which we can limit to the message bandwidth Bm ,
which significantly reduces the noise that we see at the output of the limiter-discriminator. Let
us now compute the output SNR. From (5.142), the signal power is given by
P_s = k_f^2 P_m    (5.143)
The noise contribution at the output of the limiter-discriminator is
z_n(t) = \frac{n_s'(t)}{2\pi A_c}
Since d/dt \leftrightarrow j2\pi f, z_n(t) is obtained by passing n_s(t) through an LTI system with transfer function G(f) = \frac{j2\pi f}{2\pi A_c} = jf/A_c. Thus, the noise PSD at the output of the limiter-discriminator is given by
S_{z_n}(f) = |G(f)|^2 S_{n_s}(f) = f^2 N_0/A_c^2, \quad |f| \le B_{RF}/2
Once we limit the bandwidth to the message bandwidth Bm after the discriminator, the noise
power is given by
P_n = \int_{-B_m}^{B_m} S_{z_n}(f)\, df = \int_{-B_m}^{B_m} \frac{f^2 N_0}{A_c^2}\, df = \frac{2 B_m^3 N_0}{3 A_c^2}    (5.145)
From (5.143) and (5.145), we obtain that the SNR is given by
SNR_{FM} = \frac{P_s}{P_n} = \frac{3 k_f^2 P_m A_c^2}{2 B_m^3 N_0}    (5.146)
The received FM signal power is P_R = A_c^2/2, so that the baseband benchmark is SNR_b = P_R/(N_0 B_m) = A_c^2/(2N_0 B_m). The maximum frequency deviation is \Delta f_{max} = k_f \max_t |m(t)|, and the modulation index is defined as the ratio between the maximum frequency deviation and the message bandwidth:
\beta = \frac{\Delta f_{max}}{B_m}
Thus, we have
\frac{k_f^2 P_m}{B_m^2} = \frac{(\Delta f_{max})^2 P_m}{B_m^2 (\max_t |m(t)|)^2} = \beta^2 / PAR
defining the peak-to-average power ratio (PAR) of the message as
PAR = \frac{(\max_t |m(t)|)^2}{\overline{m^2(t)}} = \frac{(\max_t |m(t)|)^2}{P_m}
Substituting into (5.146), we obtain that
SNR_{FM} = \frac{3\beta^2}{PAR}\, SNR_b    (5.147)
Thus, FM can improve upon the baseband benchmark by increasing the modulation index β.
This is an example of a power-bandwidth tradeoff: by increasing the bandwidth beyond that
strictly necessary for sending the message, we have managed to improve the SNR compared to the
baseband benchmark. However, the quadratic power-bandwidth tradeoff offered by FM is highly
suboptimal compared to the best possible tradeoffs in digital communication systems, where
one can achieve exponential tradeoffs. Another drawback of the FM power-bandwidth tradeoff
is that the amount of SNR improvement depends on the PAR of the message: messages with
larger dynamic range, and hence larger PAR, will see less improvement. This is in contrast to
digital communication, where message characteristics do not affect the power-bandwidth tradeoffs
over the communication link, since messages are converted to bits via source coding before
transmission. Of course, messages with larger dynamic range may well require more bits to
represent them accurately, and hence a higher rate on the communication link, but such design
choices are decoupled from the parameters governing reliable link operation.
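A short sketch of the tradeoff (5.147), together with Carson's-rule bandwidth and the above-threshold condition (5.149) discussed next, is given below; the message PAR, benchmark SNR, and threshold γ are illustrative choices, not values from the text.

```python
import numpy as np

# Evaluate the FM power-bandwidth tradeoff (5.147) and the above-threshold
# condition (5.149) for a few modulation indices.
B_m = 15e3            # message bandwidth (Hz)
PAR_dB = 10.0         # peak-to-average power ratio of the message (dB)
SNR_b_dB = 30.0       # baseband benchmark SNR (dB)
gamma = 10.0          # rule-of-thumb threshold on SNR at the discriminator input

for beta in (1, 2, 5):
    B_RF = 2 * (beta + 1) * B_m                                  # Carson's formula
    SNR_FM_dB = SNR_b_dB + 10 * np.log10(3 * beta**2) - PAR_dB   # (5.147), in dB
    SNR_b_required_dB = 10 * np.log10(2 * gamma * (beta + 1))    # condition (5.149)
    print(f"beta={beta}:  B_RF={B_RF/1e3:6.1f} kHz,  SNR_FM={SNR_FM_dB:5.1f} dB,  "
          f"need SNR_b >= {SNR_b_required_dB:4.1f} dB (have {SNR_b_dB:.1f} dB)")
```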
Threshold effect: It appears from (5.147) that the output SNR can be improved simply by
increasing β. This is somewhat misleading. For a given message bandwidth Bm , increasing β
corresponds to increasing the RF bandwidth: BRF ≈ 2(β + 1)Bm by Carson’s formula. Thus,
an increase in β corresponds to an increase in the power of the passband white noise at the input of the limiter-discriminator, which is given by N_0 B_{RF} = 2N_0 B_m(β + 1). Thus, if we increase β, the high SNR approximation underlying (5.140), and hence the model (5.142) for the output of the limiter-discriminator, breaks down. It is easy to see this from the equation (5.139) for the phase perturbation due to noise: θ_n(t) = \tan^{-1} \frac{n_s(t)}{A_c + n_c(t)}. When A_c is small, variations in
nc (t) can change the sign of the denominator, which leads to phase changes of π, over a small
time interval. This leads to impulses in the output of the discriminator. Indeed, as we start
reducing the SNR at the input to the discriminator for FM audio below the threshold where the
approximation (5.140) holds, we can actually hear these peaks as “clicks” in the audio output.
As we reduce the SNR further, the clicks swamp out the desired signal. This is called the FM
threshold effect.
To avoid this behavior, we must operate in the high-SNR regime where Ac ≫ |nc |, |ns |, so that
the approximation (5.140) holds. In other words, the SNR for the passband signal at the input
to the limiter-discriminator must be above a threshold, say γ (e.g., γ = 10 might be a good rule
of thumb), for FM demodulation to work well. This condition can be expressed as follows:
\frac{P_R}{N_0 B_{RF}} \ge \gamma    (5.148)
Thus, in order to utilize a large RF bandwidth to improve SNR at the output of the limiter-
discriminator, the received signal power must also scale with the available bandwidth. Using
Carson’s formula, we can rewrite (5.148) in terms of the baseband benchmark as follows:
SNR_b = \frac{P_R}{N_0 B_m} \ge 2\gamma(\beta + 1), \quad \text{condition for operation above threshold}    (5.149)
To summarize, the power-bandwidth tradeoff (5.147) applies only when the received power (or
equivalently, the baseband benchmark SNR) is above a threshold that scales with the bandwidth,
as specified by (5.149).
Figure 5.37: FM system with preemphasis and deemphasis: message → preemphasis → FM modulator → channel → RF front end → limiter-discriminator → deemphasis → estimated message.
A typical choice for the preemphasis filter is a highpass filter with a single zero, with transfer
function of the form
H_{PE}(f) = 1 + j2\pi f \tau_1
The corresponding deemphasis filter is a single pole lowpass filter with transfer function
H_{DE}(f) = \frac{1}{1 + j2\pi f \tau_1}
For FM audio broadcast, τ1 is chosen in the range 50-75 µs (e.g., 75 µs in the United States, 50 µs
in Europe). The f 2 noise scaling at the output of the limiter-discriminator is compensated by the
(approximately) 1/f 2 scaling provided by |HDE (f )|2 beyond the cutoff frequency fpd = 1/(2πτ1 )
(the subscript indicates the use of preemphasis and deemphasis), which evaluates to 2.1 KHz for
τ1 = 75 µs.
Let us compute the SNR improvement obtained using this strategy. Assuming that the pre-
emphasis and deemphasis filters compensate each other exactly, the signal contribution to the
estimated message at the output of the deemphasis filter in Figure 5.37 is kf m(t), which equals
the signal contribution to the estimated message at the output of the limiter-discriminator in
Figure 5.35, which shows a system not using preemphasis/deemphasis. Since the signal contri-
butions in the estimated messages in both systems are the same, any improvement in SNR must
come from a reduction in the output noise. Thus, we wish to characterize the noise PSD and
power at the output of the deemphasis filter in Figure 5.37. To do this, note that the noise at
the output of the limiter-discriminator is the same as before:
z_n(t) = \frac{n_s'(t)}{2\pi A_c}
with PSD
S_{z_n}(f) = |G(f)|^2 S_{n_s}(f) = f^2 N_0/A_c^2, \quad |f| \le B_{RF}/2
The noise vn obtained by passing zn through the deemphasis filter has PSD
S_{v_n}(f) = |H_{DE}(f)|^2 S_{z_n}(f) = \frac{N_0}{A_c^2} \frac{f^2}{1 + (f/f_{pd})^2} = \frac{N_0 f_{pd}^2}{A_c^2} \left(1 - \frac{1}{1 + (f/f_{pd})^2}\right)
Integrating over the message bandwidth, we find that the noise power in the estimated message
in Figure 5.37 is given by
P_n = \int_{-B_m}^{B_m} S_{v_n}(f)\, df = \frac{2 N_0 f_{pd}^3}{A_c^2} \left( \frac{B_m}{f_{pd}} - \tan^{-1} \frac{B_m}{f_{pd}} \right)    (5.150)
where we have used the substitution tan x = f /fpd to evaluate the integral. As we have already
mentioned, the signal power is unchanged from the earlier analysis, so that the improvement in
SNR is given by the reduction in noise power compared with (5.145), which gives
SNR_{gain} = \frac{\frac{2 B_m^3 N_0}{3 A_c^2}}{\frac{2 N_0 f_{pd}^3}{A_c^2}\left(\frac{B_m}{f_{pd}} - \tan^{-1}\frac{B_m}{f_{pd}}\right)} = \frac{1}{3} \cdot \frac{\left(B_m/f_{pd}\right)^3}{\frac{B_m}{f_{pd}} - \tan^{-1}\frac{B_m}{f_{pd}}}    (5.151)
For fpd = 2.1 KHz, corresponding to the United States guidelines for FM audio broadcast, and
an audio bandwidth Bm = 15 KHz, the SNR gain in (5.151) evaluates to more than 13 dB.
For completeness, we give the formula for the SNR obtained using preemphasis and deemphasis
as
SNR_{FM,pd} = \frac{\left(B_m/f_{pd}\right)^3}{\frac{B_m}{f_{pd}} - \tan^{-1}\frac{B_m}{f_{pd}}} \cdot \frac{\beta^2}{PAR}\, SNR_b    (5.152)
which is obtained by taking the product of the SNR gain (5.151) and the SNR without preem-
phasis/deemphasis given by (5.147).
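As a numerical check of (5.151) for the FM broadcast numbers quoted above (f_pd = 2.1 kHz, B_m = 15 kHz), the following sketch evaluates the SNR gain:

```python
import numpy as np

# Evaluate the preemphasis/deemphasis SNR gain (5.151) for f_pd = 2.1 kHz
# and an audio bandwidth B_m = 15 kHz, as quoted in the text.
B_m, f_pd = 15e3, 2.1e3
r = B_m / f_pd
gain = (r**3 / 3) / (r - np.arctan(r))
print(f"SNR gain = {gain:.1f}  ({10*np.log10(gain):.1f} dB)")   # a bit over 13 dB
```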
Chapter 6
Optimal Demodulation
As we saw in Chapter 4, we can send bits over a channel by choosing one of a set of waveforms to
send. For example, when sending a single 16QAM symbol, we are choosing one of 16 passband
waveforms:
s_{b_c,b_s}(t) = b_c\, p(t) \cos 2\pi f_c t - b_s\, p(t) \sin 2\pi f_c t
where bc , bs each take values in {±1, ±3}. We are thus able to transmit log2 16 = 4 bits of
information. In this chapter, we establish a framework for recovering these 4 bits when the
received waveform is a noisy version of the transmitted waveform. More generally, we consider
the fundamental problem of M-ary signaling in additive white Gaussian noise (AWGN): one of
M signals, s1 (t), ..., sM (t) is sent, and the received signal equals the transmitted signal plus white
Gaussian noise (WGN).
At the receiver, we are faced with a hypothesis testing problem: we have M possible hypotheses
about which signal was sent, and we have to make our “best” guess as to which one holds, based
on our observation of the received signal. We are interested in finding a guessing strategy, more
formally termed a decision rule, which is the “best” according to some criterion. For communi-
cations applications, we are typically interested in finding a decision rule which minimizes the
probability of error (i.e., the probability of making a wrong guess). We can now summarize the
goals of this chapter as follows.
Goals: We wish to design optimal receivers when the received signal is modeled as follows:
H_i : y(t) = s_i(t) + n(t), \quad i = 1, ..., M
where Hi is the ith hypothesis, corresponding to signal si (t) being transmitted, and where n(t)
is white Gaussian noise. We then wish to analyze the performance of such receivers, to see how
performance measures such as the probability of error depend on system parameters. It turns
out that, for the preceding AWGN model, the performance depends only on the received signal-
to-noise ratio (SNR) and on the “shape” of the signal constellation {s1 (t), ..., sM (t)}. Underlying
both the derivation of the optimal receiver and its analysis is a geometric view of signals and
noise as vectors, which we term signal space concepts. Once we have this background, we are in
a position to discuss elementary power-bandwidth tradeoffs. For example, 16QAM has higher
bandwidth efficiency than QPSK, so it makes sense that it has lower power efficiency; that is, it
requires higher SNR, and hence higher transmit power, for the same probability of error. We will
be able to quantify this intuition, previewed in Chapter 4, based on the material in this chapter.
We will also be able to perform link budget calculations: for example, how much transmit power
is needed to attain a given bit rate using a given constellation as a function of range, and transmit
and receive antenna gains?
Chapter Plan: The prerequisites for this chapter are Chapter 4 (digital modulation) and the
material on Gaussian random variables (Section 5.6) and noise modeling (Section 5.8) in Chap-
ter 5. We build up the remaining background required to attain our goals in this chapter in a
step-by-step fashion, as follows.
Hypothesis testing: In Section 6.1, we establish the basic framework for hypothesis testing, de-
rive the form of optimal decision rules, and illustrate the application of this framework for
finite-dimensional observations.
Signal space concepts: In Section 6.2, we show that continuous time M-ary signaling in AWGN
can be reduced to an equivalent finite-dimensional system, in which transmitted signal vectors
are corrupted by vector WGN. This is done by projecting the continuous time signal into the
finite-dimensional signal space spanned by the set of possible transmitted signals, s1 , ..., sM . We
apply the hypothesis testing framework to derive the optimal receiver for the finite-dimensional
system, and from this we infer the optimal receiver in continuous time.
Performance analysis: In Section 6.3, we analyze the performance of optimal reception. We show
that performance depends only on SNR and the relative geometry of the signal constellation. We
provide exact error probability expressions for binary signaling. While the probability of error for
larger signal constellations must typically be computed by simulation or numerical integration,
we obtain bounds and approximations, building on the analysis for binary signaling, that provide
quick insight into power-bandwidth tradeoffs.
Link budget analysis: In Section 6.5, we illustrate how performance analysis is applied to obtain-
ing the “link budget” for a typical radio link, which is the tool used to obtain coarse guidelines
for the design of hardware, including transmit power, transmit and receive antennas, and receiver
noise figure.
Software: Software Lab 6.1 in this chapter builds on Software Lab 4.1, providing a hands-on
feel for Nyquist signaling over an AWGN channel. In turn, we build on this lab in Software Lab
8.1, which adds in channel dispersion to the model.
Notational shortcut: In this chapter, we make extensive use of the notational simplification
discussed at the end of Section 5.3. Given a random variable X, a common notation for probabil-
ity density function or probability mass function is pX (x), with X denoting the random variable,
and x being a dummy variable which we might integrate out when computing probabilities.
However, when there is no scope for confusion, we use the less cumbersome (albeit incomplete)
notation p(x), using the dummy variable x not only as the argument of the density, but also
to indicate that the density corresponds to the random variable X. (Similarly, we would use
p(y) to denote the density for a random variable Y .) The same convention is used for joint and
conditional densities as well. For random variables X and Y , we use the notation p(x, y) in-
stead of pX,Y (x, y), and p(y|x) instead of pY |X (y|x), to denote the joint and conditional densities,
respectively.
different under the two hypotheses, then it is no longer clear that splitting the difference between
the means is the right thing to do. We therefore need a systematic framework for hypothesis
testing, which allows us to derive good decision rules for a variety of statistical models.
In this section, we consider the general problem of M-ary hypothesis testing, in which we must
decide which of M possible hypotheses, H0 , ..., HM −1, “best explains” an observation Y . For
our purpose, the observation Y can be a scalar or vector, and takes values in an observation
space Γ. The link between the hypotheses and observation is statistical: for each hypothesis
Hi , we know the conditional distribution of Y given Hi . We denote the conditional density
of Y given Hi as p(y|i), i = 0, 1, ..., M − 1. We may also know the prior probabilities of the
hypotheses (i.e., the probability of each hypothesis prior to seeing the observation), denoted by
π_i = P[H_i], i = 0, 1, ..., M − 1, which satisfy \sum_{i=0}^{M-1} \pi_i = 1. The final ingredient of the hypothesis
testing framework is the decision rule: for each possible value Y = y of the observation, we must
decide which of the M hypotheses we will bet on. Denoting this guess as δ(y), the decision rule
δ(·) is a mapping from the observation space Γ to {0, 1, ..., M − 1}, where δ(y) = i means that
we guess that Hi is true when we see Y = y. The decision rule partitions the observation space
into decision regions, with Γi denoting the set of values of Y for which we guess Hi . That is,
Γi = {y ∈ Γ : δ(y) = i}, i = 0, 1, ..., M − 1. We summarize these ingredients of the hypothesis
testing framework as follows.
Ingredients of hypothesis testing framework
• Hypotheses H0 , H1 , ..., HM −1
• Observation Y ∈ Γ
• Conditional densities p(y|i), for i = 0, 1, ..., M − 1
• Prior probabilities π_i = P[H_i], i = 0, 1, ..., M − 1, with \sum_{i=0}^{M-1} \pi_i = 1
• Decision rule δ : Γ → {0, 1, ..., M − 1}
• Decision regions Γi = {y ∈ Γ : δ(y) = i}, i = 0, 1, ..., M − 1
To make the concepts concrete, let us quickly recall Example 5.6.3, where we have M = 2
hypotheses, with H0 : Y ∼ N(0, v 2 ) and H1 : Y ∼ N(m, v 2 ). The “sensible” decision rule in this
example can be written as
\delta(y) = \begin{cases} 0, & y \le m/2 \\ 1, & y > m/2 \end{cases}
so that Γ0 = (−∞, m/2] and Γ1 = (m/2, ∞). Note that this decision rule need not be optimal
if we know the prior probabilities. For example, if we know that π0 = 1, we should say that H0
is true, regardless of the value of Y: this would reduce the probability of error from Q\left(\frac{m}{2v}\right) (for the "sensible" rule) to zero!
Conditional Probabilities of Correct Decision: These are defined as
Pc|i = 1 − Pe|i = P [Y ∈ Γi |Hi ] (6.2)
Average Error Probability: This is given by averaging the conditional error probabilities
using the priors:
P_e = \sum_{i=0}^{M-1} \pi_i P_{e|i}    (6.3)
Properties of the MAP rule:
• The MAP rule reduces to the ML rule for equal priors.
• The MAP rule minimizes the probability of error. In other words, it is also the Minimum
Probability of Error (MPE) rule.
The first property follows from (6.6) by setting πi ≡ 1/M: in this case πi does not depend on i
and can therefore be dropped when maximizing over i. The second property is important enough
to restate and prove as a theorem.
Theorem 6.1.1 The MAP rule (6.6) minimizes the probability of error.
Proof of Theorem 6.1.1: We show that the MAP rule maximizes the probability of correct
decision. To do this, consider an arbitrary decision rule δ, with corresponding decision regions
{Γi }. The conditional probabilities of correct decision are given by
P_{c|i} = P[Y \in \Gamma_i | H_i] = \int_{\Gamma_i} p(y|i)\, dy, \quad i = 0, 1, ..., M-1
so that the overall probability of correct decision is
P_c = \sum_{i=0}^{M-1} \pi_i P_{c|i} = \sum_{i=0}^{M-1} \int_{\Gamma_i} \pi_i\, p(y|i)\, dy
Any point y ∈ Γ can belong in exactly one of the M decision regions. If we decide to put it in
Γi , then the point contributes the term πi p(y|i) to the integrand. Since we wish to maximize
the overall integral, we choose to put y in the decision region for which it makes the largest
contribution to the integrand. Thus, we put it in Γi so as to maximize πi p(y|i), which is precisely
the MAP rule (6.6).
Figure 6.1: Hypothesis testing with exponentially distributed observations: the conditional densities p(y|0) and p(y|1) cross at y ≈ 1.85, which is the ML threshold.
Example 6.1.1 (Hypothesis testing with exponentially distributed observations): A
binary hypothesis problem is specified as follows:
H0 : Y ∼ Exp(1) , H1 : Y ∼ Exp(1/4)
where Exp(µ) denotes an exponential distribution with density µe−µy , CDF 1 − e−µy and com-
plementary CDF e−µy , where y ≥ 0 (all the probability mass falls on the nonnegative numbers).
Note that the mean of an Exp(µ) random variable is 1/µ. Thus, in our case, the mean under H0
is 1, while the mean under H1 is 4.
(a) Find the ML rule and the corresponding conditional error probabilities.
(b) Find the MPE rule when the prior probability of H1 is 1/5. Also find the conditional and
average error probabilities.
Solution:
(a) As shown in Figure 6.1, we have
y \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \frac{4}{3} \log 4 = 1.8484
(b) The MPE rule compares π_1 p(y|1) against π_0 p(y|0), which reduces to
\frac{1}{5} \cdot \frac{1}{4} e^{-y/4} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \frac{4}{5} e^{-y}
This gives
y \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \frac{4}{3} \log 16 = 3.6968
Proceeding as in (a), we can compute the conditional and average error probabilities; a numerical sketch is given at the end of this example.
Since the prior probability of H1 is small, the MPE rule is biased towards guessing that H0 is
true. In this case, the decision rule is so skewed that the conditional probability of error under
H1 is actually worse than a random guess. Taking this one step further, if the prior probability
of H1 actually becomes zero, then the MPE rule would always guess that H0 is true. In this case,
the conditional probability of error under H1 would be one! This shows that we must be careful
about modeling when applying the MAP rule: if we are wrong about our prior probabilities, and
H1 does occur with nonzero probability, then our performance would be quite poor.
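The numerical sketch below evaluates the thresholds and error probabilities of this example, and cross-checks the MPE rule by simulation (an editorial illustration, not part of the original text).

```python
import numpy as np

# Numerical evaluation and simulation for Example 6.1.1: Exp(1) vs Exp(1/4)
# observations, with prior P[H1] = 1/5 for the MPE rule.
rng = np.random.default_rng(1)
pi1 = 1 / 5
pi0 = 1 - pi1
thresholds = {"ML": (4 / 3) * np.log(4), "MPE": (4 / 3) * np.log(16)}

for name, thr in thresholds.items():
    Pe0 = np.exp(-thr)             # P[Y > thr | H0], complementary CDF of Exp(1)
    Pe1 = 1 - np.exp(-thr / 4)     # P[Y <= thr | H1], CDF of Exp(1/4)
    print(f"{name}:  threshold={thr:.4f}  Pe|0={Pe0:.4f}  Pe|1={Pe1:.4f}  "
          f"average Pe={pi0 * Pe0 + pi1 * Pe1:.4f}")

# Monte Carlo cross-check of the MPE rule under the given priors
N = 200_000
h1_true = rng.random(N) < pi1
y = np.where(h1_true, rng.exponential(4.0, N), rng.exponential(1.0, N))
decide_h1 = y > thresholds["MPE"]
print("Monte Carlo average Pe (MPE):", np.mean(decide_h1 != h1_true))
```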
Both the ML and MAP rules involve comparison of densities, and it is convenient to express
them in terms of a ratio of densities, or likelihood ratio, as discussed next.
Binary hypothesis testing and the likelihood ratio: For binary hypothesis testing, the ML
rule (6.5) reduces to
p(y|1) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; p(y|0), \quad \text{or} \quad \frac{p(y|1)}{p(y|0)} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; 1    (6.7)
The ratio of conditional densities appearing above is defined to be the likelihood ratio (LR) L(y), a function of fundamental importance in hypothesis testing. Formally, we define the likelihood
ratio as
L(y) = \frac{p(y|1)}{p(y|0)}, \quad y \in \Gamma    (6.8)
Likelihood ratio test: A likelihood ratio test (LRT) is a decision rule in which we compare the
likelihood ratio to a threshold.
L(y) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \gamma
where the choice of γ depends on our performance criterion. An equivalent form is the log
likelihood ratio test (LLRT), where the log of the likelihood ratio is compared with a threshold.
We have already shown in (6.7) that the ML rule is an LRT with threshold γ = 1. From (6.6),
we see that the MAP, or MPE, rule is also an LRT:
\pi_1 p(y|1) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \pi_0 p(y|0), \quad \text{or} \quad \frac{p(y|1)}{p(y|0)} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \frac{\pi_0}{\pi_1}
In terms of the likelihood ratio,
L(y) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; 1 \quad \text{or} \quad \log L(y) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; 0 \qquad \text{ML rule}    (6.9)
L(y) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \frac{\pi_0}{\pi_1} \quad \text{or} \quad \log L(y) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \log\frac{\pi_0}{\pi_1} \qquad \text{MAP/MPE rule}    (6.10)
We now specialize further to the setting of Example 5.6.3. The conditional densities are as shown
in Figure 6.2. Since this example is fundamental to our understanding of signaling in AWGN,
let us give it a name, the basic Gaussian example, and summarize the set-up in the language of
hypothesis testing.
Figure 6.2: Conditional densities p(y|0) and p(y|1) for the basic Gaussian example, centered at 0 and m respectively, with the ML threshold at m/2.
H_0: Y \sim N(0, v^2), \quad H_1: Y \sim N(m, v^2), \quad \text{or}
p(y|0) = \frac{\exp\left(-\frac{y^2}{2v^2}\right)}{\sqrt{2\pi v^2}}; \quad p(y|1) = \frac{\exp\left(-\frac{(y-m)^2}{2v^2}\right)}{\sqrt{2\pi v^2}}    (6.11)
Likelihood ratio for basic Gaussian example: Substituting (6.11) into (6.8) and simplifying
(this is left as an exercise), we obtain that the likelihood ratio for the basic Gaussian example is
L(y) = \exp\left(\frac{1}{v^2}\left(my - \frac{m^2}{2}\right)\right), \qquad \log L(y) = \frac{1}{v^2}\left(my - \frac{m^2}{2}\right)    (6.12)
ML and MAP rules for basic Gaussian example: Using (6.12) in (6.9), we leave it as an
exercise to check that the ML rule reduces to
Y \;\overset{H_1}{\underset{H_0}{\gtrless}}\; m/2, \qquad \text{ML rule } (m > 0)    (6.13)
(check that the inequalities get reversed for m < 0). This is exactly the “sensible” rule that we
analyzed in Example 5.6.3. Using (6.12) in (6.10), we obtain the MAP rule:
Y \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \frac{m}{2} + \frac{v^2}{m} \log\frac{\pi_0}{\pi_1}, \qquad \text{MAP rule } (m > 0)    (6.14)
Example 6.1.2 (ML versus MAP for the basic Gaussian example): For the basic Gaus-
sian example, we now know that the decision rule in Example 5.6.3 is the ML rule, and we
showed in that example that the performance of this rule is given by
P_{e|0} = P_{e|1} = P_e = Q\left(\frac{m}{2v}\right) = Q\left(\sqrt{SNR/2}\right)
We also saw that at 13 dB SNR, the error probability for the ML rule is
Pe,M L = 7.8 × 10−4
regardless of the prior probabilities. For equal priors, the ML rule is also MPE, and we cannot
hope to do better than this. Let us now see what happens when the prior probability of H0 is
π0 = 13 . The ML rule is no longer MPE, and we should be able to do better by using the MAP
rule. We leave it as an exercise to show that the conditional error probabilities for the MAP rule
are given by
P_{e|0} = Q\left(\frac{m}{2v} + \frac{v}{m}\log\frac{\pi_0}{\pi_1}\right), \quad P_{e|1} = Q\left(\frac{m}{2v} - \frac{v}{m}\log\frac{\pi_0}{\pi_1}\right)    (6.15)
Plugging in the numbers for SNR of 13 dB and π0 = 1/3 into (6.15), we can compute the conditional and average error probabilities for the MAP rule; a numerical sketch is given below, and Figure 6.3 illustrates how these quantities vary with SNR and with the priors.
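The following sketch evaluates (6.15) for these parameters (an editorial illustration; Q(x) is computed as scipy's norm.sf):

```python
import numpy as np
from scipy.stats import norm

# ML versus MAP error probabilities for the basic Gaussian example at 13 dB SNR
# with pi_0 = 1/3 (Example 6.1.2), using Pe_ML = Q(sqrt(SNR/2)) and (6.15).
Q = norm.sf
snr = 10 ** (13.0 / 10)
arg = np.sqrt(snr / 2)                        # plays the role of m/(2v)
pi0, pi1 = 1 / 3, 2 / 3
bias = np.log(pi0 / pi1) / (2 * arg)          # (v/m) log(pi0/pi1)

Pe0 = Q(arg + bias)
Pe1 = Q(arg - bias)
print(f"ML :  Pe = {Q(arg):.2e}")
print(f"MAP:  Pe|0 = {Pe0:.2e}  Pe|1 = {Pe1:.2e}  average Pe = {pi0*Pe0 + pi1*Pe1:.2e}")
```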
(a) Dependence on SNR (π0 = 0.3); (b) Dependence on priors (SNR = 10 dB)
Figure 6.3: Conditional and average error probabilities for the MAP receiver compared to the
error probability for the ML receiver. We consider the basic Gaussian example, fixing the priors
and varying SNR in (a), and fixing SNR and varying the priors in (b). For the MAP rule,
the conditional error probability given a hypothesis increases as the prior probability of the
hypothesis decreases. The average error probability for the MAP rule is always smaller than that of the ML rule (which is the MAP rule for equal priors) when π0 ≠ 1/2. The MAP error probability tends towards zero as π0 → 0 or π0 → 1.
but we would be a lot more confident about our guess in the latter instance. Rather than
throwing away this information, we can employ soft decisions that convey reliability information
which could be used at a higher layer, for example, by a decoder which is processing a codeword
consisting of many bits.
Actually, we already know how to compute soft decisions: the posterior probabilities P [Hi |Y = y],
i = 0, 1, ..., M − 1, that appear in the MAP rule are actually the most information that we can
hope to get about the hypotheses from the observation. For notational compactness, let us
denote these by πi (y). The posterior probabilities can be computed using Bayes’ rule as follows:
\pi_i(y) = P[H_i | Y = y] = \frac{\pi_i p(y|i)}{p(y)} = \frac{\pi_i p(y|i)}{\sum_{j=0}^{M-1} \pi_j p(y|j)}    (6.16)
In practice, we may settle for quantized soft decisions which convey less information than the
posterior probabilities due to tradeoffs in precision or complexity versus performance.
Example 6.1.3 (Soft decisions for 4PAM in AWGN): Consider a 4-ary hypothesis testing
problem modeled as follows:
H_i: Y \sim N(m_i, \sigma^2), \; i = 0, 1, 2, 3, \quad \text{where } m_0 = -3A, \; m_1 = -A, \; m_2 = A, \; m_3 = 3A
This is a model that arises for 4PAM signaling in AWGN, as we see later. For σ 2 = 1, A = 1
and Y = −1.5, find the posterior probabilities if π0 = 0.4 and π1 = π2 = π3 = 0.2.
Solution: The posterior probability for the ith hypothesis is of the form
\pi_i(y) = c\, \pi_i\, e^{-\frac{(y - m_i)^2}{2\sigma^2}}
where mi ∈ {±A, ±3A} is the conditional mean under Hi , and where c is a constant that does
not depend on i. Since the posterior probabilities must sum to one, we have
\sum_{j=0}^{3} \pi_j(y) = c \sum_{j=0}^{3} \pi_j\, e^{-\frac{(y - m_j)^2}{2\sigma^2}} = 1
The MPE hard decision in this case is δM P E (−1.5) = 1, but note that the posterior probability
for H0 is also quite high, which is information which would have been thrown away if only
hard decisions were reported. However, if the noise strength is reduced, then the hard decision
becomes more reliable. For example, for σ 2 = 0.1, we obtain
π0 (−1.5) = 9.08 × 10−5 , π1 (−1.5) = 0.9999, π2 (−1.5) = 9.36 × 10−14 , π3 (−1.5) = 3.72 × 10−44
where it is not wise to trust some of the smaller numbers. Thus, we can be quite confident about
the hard decision from the MPE rule in this case.
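The posterior computation of this example is easy to reproduce; the sketch below evaluates (6.16) for both noise levels considered above.

```python
import numpy as np

# Posterior probabilities for Example 6.1.3: 4PAM means (-3A, -A, A, 3A),
# priors (0.4, 0.2, 0.2, 0.2), A = 1, observation y = -1.5.
def posteriors(y, sigma2, A=1.0, priors=(0.4, 0.2, 0.2, 0.2)):
    means = np.array([-3 * A, -A, A, 3 * A])
    w = np.array(priors) * np.exp(-(y - means) ** 2 / (2 * sigma2))
    return w / w.sum()                     # normalize so the posteriors sum to one

for sigma2 in (1.0, 0.1):
    post = posteriors(-1.5, sigma2)
    values = "  ".join(f"pi_{i}(y)={p:.3g}" for i, p in enumerate(post))
    print(f"sigma^2 = {sigma2}:  {values}   (MPE decision: H{np.argmax(post)})")
```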
For binary hypothesis testing, it suffices to output one of the two posterior probabilities, since
they sum to one. However, it is often more convenient to output the log of the ratio of the
posteriors, termed the log likelihood ratio (LLR):
\log \frac{\pi_1(y)}{\pi_0(y)} = \log\frac{\pi_1}{\pi_0} + \log\frac{p(y|1)}{p(y|0)}    (6.17)
Notice how the information from the priors and the information from the observations, each of
which also takes the form of an LLR, add up in the overall LLR. This simple additive combining of
information is exploited in sophisticated decoding algorithms in which information from one part
of the decoder provides priors for another part of the decoder. Note that the LLR contribution
due to the priors is zero for equal priors.
Example 6.1.4 (LLRs for binary antipodal signaling): Consider H1 : Y ∼ N(A, σ 2 ) versus
H0 : Y ∼ N(−A, σ 2 ). We shall see later how this model arises for binary antipodal signaling in
AWGN. We leave it as an exercise to show that the LLR is given by
LLR(y) = \frac{2Ay}{\sigma^2}
Figure 6.4: For linear modulation with no intersymbol interference, the complex symbols them-
selves provide a two-dimensional signal space representation. Three different constellations are
shown here.
Example 6.2.1 (Signal space for two-dimensional modulation): Consider a single complex-
valued symbol b = bc + jbs (assume that there is no intersymbol interference) sent using two-
dimensional passband linear modulation. The set of possible transmitted signals are given by
sbc ,bs (t) = bc p(t) cos 2πfc t − bs p(t) sin 2πfc t
where (bc , bs ) takes M possible values for an M-ary constellation (e.g., M = 4 for QPSK, M = 16
for 16QAM), and where p(t) is a baseband pulse of bandwidth smaller than the carrier frequency
fc . Setting φ_c(t) = p(t) cos 2πf_c t and φ_s(t) = −p(t) sin 2πf_c t, we see that we can write the set of transmitted signals as a linear combination of these signals as follows:
s_{b_c,b_s}(t) = b_c\, \phi_c(t) + b_s\, \phi_s(t)
so that the signal space has dimension at most 2. From Chapter 2, we know that φc and φs
are orthogonal (I-Q orthogonality), and hence linearly independent. Thus, the signal space has
dimension exactly 2. Noting that ||φ_c||^2 = ||φ_s||^2 = \frac{1}{2}||p||^2, the normalized versions of φ_c and φ_s
provide an orthonormal basis for the signal space:
\psi_c(t) = \frac{\phi_c(t)}{||\phi_c||}, \qquad \psi_s(t) = \frac{\phi_s(t)}{||\phi_s||}
We can now write
s_{b_c,b_s}(t) = \frac{1}{\sqrt{2}}||p||\, b_c\, \psi_c(t) + \frac{1}{\sqrt{2}}||p||\, b_s\, \psi_s(t)
With respect to this basis, the signals can be represented as two dimensional vectors:
s_{b_c,b_s}(t) \leftrightarrow \mathbf{s}_{b_c,b_s} = \frac{||p||}{\sqrt{2}} \begin{pmatrix} b_c \\ b_s \end{pmatrix}
That is, up to scaling, the signal space representations of the transmitted signals are simply the
two-dimensional symbols (bc , bs )T . Indeed, while we have been careful about keeping track of
the scaling factor in this example, we shall drop it henceforth, because, as we shall soon see,
what matters in performance is the signal-to-noise ratio, rather than the absolute signal or noise
strength.
Orthogonal modulation provides another example where an orthonormal basis for the signal
space is immediately obvious. For example, if s_1, ..., s_M are orthogonal signals with equal energy ||s_i||^2 ≡ E_s, then ψ_i(t) = s_i(t)/\sqrt{E_s} provide an orthonormal basis for the signal space, and the vector representation of the ith signal is the scaled unit vector \sqrt{E_s}\,(0, ..., 0, 1 \text{ (in ith position)}, 0, ..., 0)^T. Yet another example where an orthonormal basis can be determined by inspection is shown in Figures 6.5 and 6.6, and discussed in Example 6.2.2.
Figure 6.5: Four signals s_0(t), s_1(t), s_2(t), s_3(t), each supported on the interval [0, 3], used in Example 6.2.2 to develop a signal space representation.
Figure 6.6: An orthonormal basis for the signal set in Figure 6.5, obtained by inspection.
Example 6.2.2 (Developing a signal space representation for a 4-ary signal set): Con-
sider the example depicted in Figure 6.5, where there are 4 possible transmitted signals, s0 , ..., s3 .
It is clear from inspection that these span a three-dimensional signal space, with a convenient
choice of basis signals
ψ0 (t) = I[0,1] (t), ψ1 (t) = I[1,2] (t), ψ2 (t) = I[2,3] (t)
as shown in Figure 6.6. Let si = (si [1], si [2], si [3])T denote the vector representation of the signal
si with respect to the basis, for i = 0, 1, 2, 3. That is, the coefficients of the vector si are such
that
X 2
si (t) = si [k]ψk (t)
k=0
Now that we have seen some examples, it is time to be more precise about what we mean
by the “signal space.” The signal space S is the finite-dimensional subspace (of dimension
n ≤ M) spanned by s0 (t), ..., sM −1 (t). That is, S consists of all signals of the form a0 s0 (t) +
... + aM −1 sM −1 (t), where a0 , ..., aM −1 are arbitrary scalars. Let ψ0 (t), ..., ψn−1 (t) denote an or-
thonormal basis for S. We have seen in the preceding examples that such a basis can often be
determined by inspection. In general, however, given an arbitrary set of signals, we can always
construct an orthonormal basis using the Gram-Schmidt procedure described below. We do not
need to use this procedure often–in most settings of interest, the way to go from continuous to
discrete time is clear–but state it below for completeness.
Gram-Schmidt orthogonalization: The idea is to build up an orthonormal basis step by
step, with the basis after the mth step spanning the first m signals. The first basis function is
a scaled version of the first signal (assuming this is nonzero–otherwise we proceed to the second
signal without adding a basis function). We then consider the component of the second signal
orthogonal to the first basis function. This projection is nonzero if the second signal is linearly
independent of the first; in this case, we introduce a basis function that is a scaled version of
the projection. See Figure 6.7. This procedure goes on until we have covered all M signals. The
number of basis functions n equals the dimension of the signal space, and satisfies n ≤ M. We
can summarize the procedure as follows.
Letting Sk−1 denote the subspace spanned by s0 , ..., sk−1 , the Gram-Schmidt algorithm proceeds
iteratively: given an orthonormal basis for Sk−1 , it finds an orthonormal basis for Sk . The
procedure stops when k = M. The method is identical to that used for finite-dimensional
vectors, except that the definition of the inner product involves an integral, rather than a sum,
for the continuous-time signals considered here.
Figure 6.7: Gram-Schmidt orthogonalization for the first two signals: ψ_0(t) = φ_0(t)/||φ_0|| with φ_0 = s_0, and ψ_1(t) = φ_1(t)/||φ_1||, where φ_1(t) is the component of s_1(t) orthogonal to ψ_0.
The signal φ_k(t) is the component of s_k(t) orthogonal to the subspace S_{k−1}:
\phi_k(t) = s_k(t) - \sum_{i=0}^{m-1} \langle s_k, \psi_i \rangle \psi_i(t)
where B_{k−1} = \{\psi_0, ..., \psi_{m-1}\} is the orthonormal basis constructed so far. If φ_k ≠ 0, define a new basis function ψ_m(t) = φ_k(t)/||φ_k||, and update the basis as B_k = \{\psi_0, ..., \psi_{m-1}, \psi_m\}. If φ_k = 0, then s_k ∈ S_{k−1}, and it is not necessary to update the basis; in this case, we set B_k = B_{k−1} = \{\psi_0, ..., \psi_{m-1}\}.
The procedure terminates at step M, which yields a basis B = {ψ0 , ..., ψn−1 } for the signal space
S = SM −1 . The basis is not unique, and may depend (and typically does depend) on the order in
which we go through the signals in the set. We use the Gram-Schmidt procedure here mainly as
a conceptual tool, in assuring us that there is indeed a finite-dimensional vector representation
for a finite set of continuous-time signals.
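A minimal numerical sketch of the Gram-Schmidt procedure is given below. The four piecewise-constant waveforms used are illustrative placeholders (they are not the signals of Figure 6.5), chosen so that one of them lies in the span of the others and therefore does not add a basis function.

```python
import numpy as np

# Gram-Schmidt on finely sampled waveforms. The four piecewise-constant signals
# are illustrative placeholders, not the waveforms of Figure 6.5.
dt = 1e-3
t = np.arange(0, 3, dt)
inner = lambda u, v: np.sum(u * v) * dt          # <u, v> = integral of u(t)v(t) dt

signals = [
    np.where(t < 3, 1.0, 0.0),                   # s0: 1 on [0,3)
    np.where(t < 1, 1.0, -1.0),                  # s1: 1 on [0,1), -1 on [1,3)
    np.where(t < 2, 2.0, 0.0),                   # s2: 2 on [0,2)
    np.where((t >= 1) & (t < 2), 1.0, 0.0),      # s3: 1 on [1,2), lies in span of s0, s1, s2
]

basis = []
for s in signals:
    phi = s.copy()
    for psi in basis:                            # remove the projection onto the current basis
        phi = phi - inner(s, psi) * psi
    norm_phi = np.sqrt(inner(phi, phi))
    if norm_phi > 1e-6:                          # phi != 0: this signal adds a new dimension
        basis.append(phi / norm_phi)

print("signal space dimension n =", len(basis))
for i, s in enumerate(signals):                  # vector representation w.r.t. the basis
    print(f"s{i} ->", np.round([inner(s, psi) for psi in basis], 3))
```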
Figure 6.8: An orthonormal basis for the signal set in Figure 6.5, obtained by applying the
Gram-Schmidt procedure. The unknowns A, B, and C are to be determined in Exercise 6.2.1.
Exercise 6.2.1 asks you to fill in the missing numbers. While the basis thus obtained is not as "nice" as the one obtained by
inspection in Figure 6.6, the Gram-Schmidt procedure has the advantage of general applicability.
Inner products are preserved: We shall soon see that the performance of M-ary signaling
in AWGN depends only on the inner products between the signals, if the noise PSD is fixed.
Thus, an important observation when mapping the continuous time hypothesis testing problem
to discrete time is to check that these inner products are preserved when projecting onto the
signal space. Consider the continuous time inner products
\langle s_i, s_j \rangle = \int s_i(t) s_j(t)\, dt = \sum_{k=0}^{n-1} s_i[k]\, s_j[k] = \langle \mathbf{s}_i, \mathbf{s}_j \rangle, \quad i, j = 0, 1, ..., M-1    (6.19)
where the extreme right-hand side is the inner product of the signal vectors si = (si [0], ..., si [n −
1])T and sj = (sj [0], ..., sj [n − 1])T . This makes sense: the geometric relationship between signals
(which is what the inner products capture) should not depend on the basis with respect to which
they are expressed.
Then we can write the noise n(t) as follows:
n(t) = \sum_{i=0}^{n-1} N_i\, \psi_i(t) + n^{\perp}(t)
where n⊥ (t) is the projection of the noise orthogonal to the signal space. Thus, we can decom-
pose the noise into two parts: a noise vector N = (N0 , ..., Nn−1 )T corresponding to the projection
onto the signal space, and a component n⊥ (t) orthogonal to the signal space. In order to charac-
terize the statistics of these quantities, we need to consider random variables obtained by linear
processing of WGN. Specifically, consider random variables generated by passing WGN through
correlators:
Z_1 = \int_{-\infty}^{\infty} n(t) u_1(t)\, dt = \langle n, u_1 \rangle, \qquad Z_2 = \int_{-\infty}^{\infty} n(t) u_2(t)\, dt = \langle n, u_2 \rangle
where u1 and u2 are deterministic, finite energy signals. We can now state the following result.
Theorem 6.2.1 (WGN through correlators): The random variables Z_1 = \langle n, u_1 \rangle and Z_2 = \langle n, u_2 \rangle are zero mean, jointly Gaussian, with
cov(Z_1, Z_2) = \sigma^2 \langle u_1, u_2 \rangle, \qquad var(Z_1) = \sigma^2 ||u_1||^2, \qquad var(Z_2) = \sigma^2 ||u_2||^2
Proof of Theorem 6.2.1: The random variables Z1 = hn, u1 i and Z2 = hn, u2 i are zero mean
and jointly Gaussian, since n is zero mean and Gaussian. Their covariance is computed as
R R
cov (hn, u 1 i, hn, u 2 i) = E [hn, u 1 ihn, u 2 i] = E n(t)u 1 (t)dt n(s)u 2 (s)ds
u1 (t)u2 (s)σ 2 δ(t − s)dt ds
RR RR
= R u1(t)u2 (s)E[n(t)n(s)]dt ds =
= σ 2 u1 (t)u2 (t)dt = σ 2 hu1 , u2 i
The preceding computation is entirely analogous to the ones we did in Example 5.8.2 and in
Section 5.10, but it is important enough that we repeat some points that we had mentioned
then. First, we need to use two different variables of integration, t and s, in order to make sure
we capture all the cross terms. Second, when we take the expectation inside the integrals, we
must group all random terms inside it. Third, the two integrals collapse into one because the
autocorrelation function of WGN is impulsive. Finally, specializing the covariance to get the
variance leads to the remaining results stated in the theorem.
We can now provide the following geometric interpretation of WGN.
Remark 6.2.1 (Geometric interpretation of WGN): Theorem 6.2.1 implies that the pro-
jection of WGN along any “direction” in the space of signals (i.e., the result of correlating WGN
with a unit energy signal) has variance σ 2 = N0 /2. Also, its projections in orthogonal directions
are jointly Gaussian and uncorrelated random variables, and are therefore independent.
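Theorem 6.2.1 is easy to check empirically. The sketch below approximates WGN by i.i.d. samples of variance σ²/dt on a fine time grid (so that integrals become Riemann sums) and compares the sample covariance of two correlator outputs with σ²⟨u1, u2⟩; the signals u1, u2 and all numerical values are illustrative choices.
%Empirical check of Theorem 6.2.1: correlate approximate WGN against two
%deterministic signals and compare the sample statistics with sigma^2*<u1,u2>
%and sigma^2*||u1||^2. (Illustrative sketch.)
dt = 0.01; t = (0:dt:1-dt)'; %time grid on [0,1)
sigma2 = 0.5; %noise PSD sigma^2 = N0/2
u1 = cos(2*pi*t); u2 = ones(size(t)); %two finite-energy test signals
ntrials = 50000;
Z1 = zeros(ntrials,1); Z2 = zeros(ntrials,1);
for m = 1:ntrials,
    n = sqrt(sigma2/dt)*randn(size(t)); %WGN samples
    Z1(m) = sum(n.*u1)*dt; %<n,u1> approximated as a sum
    Z2(m) = sum(n.*u2)*dt; %<n,u2>
end
disp([mean(Z1.*Z2), sigma2*sum(u1.*u2)*dt]); %sample covariance vs sigma^2*<u1,u2>
disp([var(Z1), sigma2*sum(u1.^2)*dt]); %sample variance vs sigma^2*||u1||^2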
Noise projection on the signal space is discrete time WGN: It follows from the preceding remark that the noise projections Ni = ⟨n, ψi⟩ along the orthonormal basis functions {ψi} for the signal space are i.i.d. N(0, σ²) random variables. In other words, the noise vector N = (N0, ..., Nn−1)^T ∼ N(0, σ²I); that is, the components of N constitute discrete time white Gaussian noise (“white” in this case means uncorrelated, with equal variance across all components).
Figure 6.9: Illustration of signal space concepts. The noise projection n⊥ (t) orthogonal to the
signal space is irrelevant. The relevant part of the received signal is the projection onto the signal
space, which equals the vector Y = si + N under hypothesis Hi .
Now that we have the signal and noise models, we can put them together in our hypothesis
testing framework. Let us condition on hypothesis Hi . The received signal is given by
y(t) = si (t) + n(t) (6.22)
Projecting this onto the signal space by correlating against the orthonormal basis functions, we get
Y[k] = ⟨y, ψk⟩ = ⟨si + n, ψk⟩ = si[k] + N[k],   k = 0, 1, ..., n − 1
Collecting these into an n-dimensional vector, we get the model
Hi : Y = si + N
Note that the vector Y = (Y[0], ..., Y[n − 1])^T completely describes the component of the received signal y(t) in the signal space, given by
yS(t) = Σ_{j=0}^{n−1} ⟨y, ψj⟩ ψj(t) = Σ_{j=0}^{n−1} Y[j] ψj(t)
The component of the received signal orthogonal to the signal space is given by
y ⊥ (t) = y(t) − yS (t)
It is shown in Appendix 6.A that this component is irrelevant to our decision. There are two
reasons for this, as elaborated in the appendix: first, there is no signal contribution orthogonal
to the signal space (by definition); second, for the WGN model, the noise component orthogonal
to the signal space carries no information regarding the noise vector in the signal space. As illus-
trated in Figure 6.9, this enables us to reduce our infinite-dimensional problem to the following
finite-dimensional vector model, without loss of optimality.
Model for received vector in signal space
Hi : Y = si + N , i = 0, 1, ..., M − 1 (6.23)
Figure 6.10: A signal space view of QPSK. In the scenario shown, s0 is the transmitted vector,
and Y = s0 + N is the received vector after noise is added. The noise components Nc , Ns are
i.i.d. N(0, σ 2 ) random variables.
Two-dimensional modulation (Example 6.2.1 revisited): For a single symbol sent using
two-dimensional modulation, we have the hypotheses
Hbc,bs : y(t) = sbc,bs(t) + n(t)
where
sbc,bs(t) = bc p(t) cos 2πfc t − bs p(t) sin 2πfc t
Restricting attention to the two-dimensional signal space identified in the example, we obtain the model
Hbc,bs : Y = (Yc, Ys)^T = (bc, bs)^T + (Nc, Ns)^T
where we have absorbed scale factors into the symbol (bc, bs), and where the I and Q noise components Nc, Ns are i.i.d. N(0, σ²). This is illustrated for QPSK in Figure 6.10. Thus, conditioned on Hbc,bs, Yc ∼ N(bc, σ²) and Ys ∼ N(bs, σ²), and Yc, Ys are conditionally independent. The conditional density of Y = (Yc, Ys)^T conditioned on Hbc,bs is therefore given by
pY|bc,bs(yc, ys) = (1/(2πσ²)) exp( −[(yc − bc)² + (ys − bs)²]/(2σ²) )
We can now infer the ML and MPE rules using our hypothesis testing framework. However, since
the same reasoning applies to signal spaces of arbitrary dimensions, we provide a more general
discussion in the next section, and then return to examples of two-dimensional modulation.
Hi : Y = si + N i = 0, 1, ..., M − 1 (6.24)
where N ∼ N(0, σ 2 I) is discrete time WGN. The ML and MPE rules for this problem are given
as follows. As usual, we denote the prior probabilities required to specify the MPE rule by {πi, i = 0, 1, ..., M − 1}, where Σ_{i=0}^{M−1} πi = 1.
ML rule
δML(y) = arg min_{0≤i≤M−1} ||y − si||² = arg max_{0≤i≤M−1} [ ⟨y, si⟩ − ||si||²/2 ]   (6.25)
MPE rule
δMPE(y) = arg min_{0≤i≤M−1} [ ||y − si||² − 2σ² log πi ] = arg max_{0≤i≤M−1} [ ⟨y, si⟩ − ||si||²/2 + σ² log πi ]   (6.26)
Interpretation of optimal decision rules: The ML rule can be interpreted in two ways.
The first is as a minimum distance rule, choosing the transmitted signal which has minimum
Euclidean distance to the noisy received signal. The second is as a “template matcher”: choosing
the transmitted signal with highest correlation with the noisy received signal, while adjusting
for the fact that the energies of different transmitted signals may be different. The MPE rule
adjusts the ML cost function to reflect prior information: the adjustment term depends on the
noise level and the prior probabilities. The MPE cost functions decompose neatly into a sum of
the ML cost function (which depends on the observation) and a term reflecting prior knowledge
(which depends on the prior probabilities and the noise level). The latter term scales with the
noise variance σ 2 . Thus, we rely more on the observation at high SNR (small σ), and more on
prior knowledge at low SNR (large σ).
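As a concrete illustration, the sketch below evaluates the vector forms of the ML and MPE rules, (6.25) and (6.26), for a single received vector; the example constellation, priors, noise level and observation are made up for illustration, and are chosen so that the two rules can disagree.
%Sketch of the ML and MPE rules (6.25)-(6.26) for the vector model Y = s_i + N.
%Rows of S are the signal vectors; priors is a length-M vector of priors.
%(All numbers are illustrative.)
S = [1 1; -1 1; -1 -1; 1 -1]; %example: QPSK signal vectors as rows
priors = [0.7 0.1 0.1 0.1]; %unequal priors so that MPE can differ from ML
sigma2 = 1; %noise variance per dimension
y = [-0.05 0.9]; %received vector, close to a decision boundary
corr = S*y'; %<y,s_i> for each i
energies = sum(S.^2,2)/2; %||s_i||^2/2
[~, i_ml]  = max(corr - energies); %ML rule (6.25)
[~, i_mpe] = max(corr - energies + sigma2*log(priors')); %MPE rule (6.26)
disp([i_ml-1, i_mpe-1]); %decisions as indices 0,...,M-1 (here ML picks 1, MPE picks 0)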
Derivation of optimal receiver structures (6.25) and (6.26): Under hypothesis Hi , Y is
a Gaussian random vector with mean si and covariance matrix σ 2 I (the translation of the noise
vector N by the deterministic signal vector si does not change the covariance matrix), so that
pY|i(y|Hi) = (2πσ²)^{−n/2} exp( −||y − si||²/(2σ²) )   (6.27)
Plugging (6.27) into the ML rule (6.5), we obtain the rule (6.25) upon simplification. Similarly,
we obtain (6.26) by substituting (6.27) in the MPE rule (6.6).
We now map the optimal decision rules in discrete time back to continuous time to obtain optimal
detectors for the original continuous-time model (6.18), as follows.
ML rule
δML(y) = arg max_{0≤i≤M−1} [ ⟨y, si⟩ − ||si||²/2 ]   (6.28)
MPE rule
δMPE(y) = arg max_{0≤i≤M−1} [ ⟨y, si⟩ − ||si||²/2 + σ² log πi ]   (6.29)
Derivation of optimal receiver structures (6.28) and (6.29): Due to the irrelevance of y ⊥ ,
the continuous time model (6.18) reduces to the discrete time model (6.24) by projecting onto
the signal space. It remains to map the optimal decision rules (6.25) and (6.26) for discrete time
observations, back to continuous time. These rules involve correlation between the received and
transmitted signals, and the transmitted signal energies. It suffices to show that these quantities
are the same for both the continuous time model and the equivalent discrete time model. We
know now that signal inner products are preserved, so that the energy ||si||² of the continuous-time signal si(t) equals the squared norm of its vector representation si. Further, the continuous-time correlator output can be written as
⟨y, si⟩ = ⟨yS + y⊥, si⟩ = ⟨yS, si⟩ + ⟨y⊥, si⟩ = ⟨yS, si⟩ = ⟨Y, si⟩
where the last equality follows because the inner product between the signals yS and si (which both lie in the signal space) is the same as the inner product between their vector representations Y and si.
Why don’t we have a “minimum distance” rule in continuous time? Notice that the
optimal decision rules for the continuous time model do not contain the continuous time version
of the minimum distance rule for discrete time. This is because of a technical subtlety. In
continuous time, the squares of the distances would be
||y − si ||2 = ||yS − si ||2 + ||y ⊥||2 = ||yS − si ||2 + ||n⊥ ||2
Under the AWGN model, the noise power orthogonal to the signal space is infinite, hence from
a purely mathematical point of view, the preceding quantities are infinite for each i (so that we
cannot minimize over i). Hence, it only makes sense to talk about the minimum distance rule
in a finite-dimensional space in which the noise power is finite. The correlator based form of
the optimal detector, on the other hand, automatically achieves the projection onto the finite-
dimensional signal space, and hence does not suffer from this technical difficulty. Of course, in
practice, even the continuous time received signal may be limited to a finite-dimensional space by
filtering and time-limiting, but correlator-based detection still has the practical advantage that
only components of the received signal which are truly useful appear in the decision statistics.
Bank of Correlators or Matched Filters: The optimal receiver involves computation of the decision statistics
⟨y, si⟩ = ∫ y(t) si(t) dt
and can therefore be implemented using a bank of correlators, as shown in Figure 6.11. Of course, any correlation operation can also be implemented using a matched filter, sampled at the appropriate time. Defining si,mf(t) = si(−t) as the impulse response of the filter matched to si, we have
⟨y, si⟩ = ∫ y(t) si(t) dt = ∫ y(t) si,mf(−t) dt = (y ∗ si,mf)(0)
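The following discrete-time sketch (with an arbitrary example signal) illustrates this equivalence: correlating the received signal against s gives the same number as convolving with the time-reversed signal and sampling at the appropriate instant.
%Discrete-time illustration that a correlator output equals the matched filter
%output sampled at the right instant. (Illustrative sketch.)
s = [1 2 0 -1]'; %an example signal
y = s + 0.3*randn(size(s)); %noisy received signal
corr_out = sum(y.*s); %correlator: <y,s>
mf_out = conv(y, flipud(s)); %convolve with matched filter s_mf(t) = s(-t)
disp([corr_out, mf_out(length(s))]); %matched filter output at "time 0" (index N for length-N signals)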
Figure 6.11: The optimal receiver for an AWGN channel can be implemented using a bank of
correlators. For the ML rule, the constants ai = ||si ||2/2; for the MPE rule, ai = ||si ||2 /2 −
σ 2 log πi .
Figure 6.12: An alternative implementation for the optimal receiver using a bank of matched
filters. For the ML rule, the constants ai = ||si ||2 /2; for the MPE rule, ai = ||si ||2 /2 − σ 2 log πi .
Figure 6.12 shows an alternative implementation for the optimal receiver using a bank of matched
filters.
Figure 6.13: The passband correlations required by the optimal receiver can be implemented in
complex baseband. Since the I and Q components are lowpass waveforms, correlation with them
is an implicit form of lowpass filtering. Thus, the LPFs after the mixers could potentially be
eliminated, which is why they are shown within dashed boxes.
Figure 6.14: The ML decision boundary when testing between si and sj is the perpendicular
bisector of the line joining the signal points, which is an (n − 1)-dimensional hyperplane for an
n-dimensional signal space.
The decision boundary is a line for a two-dimensional signal space, and we can visualize a plane containing the decision boundary coming out of the paper for a three-
dimensional signal space. While it is hard to visualize signal spaces of more than 3 dimensions,
the computation for deciding which side of the ML decision boundary the received vector y lies
on is straightforward: simply compare the Euclidean distances ||y − si || and ||y − sj ||.
Figure 6.15: Construction of the ML decision region Γ1 in a two-dimensional signal space with six signal points s1, ..., s6: the lines L1i are the perpendicular bisectors of the segments joining s1 to the other signal points.
The ML decision regions are constructed by drawing these pairwise decision boundaries. For any given i, draw a line between si and sj for all j ≠ i. The perpendicular bisector of the line between
si and sj defines two half-spaces (half-planes for n = 2), one in which we choose si over sj , the
other in which we choose sj over si . The intersection of the half-spaces in which si is chosen over
sj , for j 6= i, defines the decision region Γi . This procedure is illustrated for a two-dimensional
signal space in Figure 6.15. The line L1i is the perpendicular bisector of the line between s1 and
si . The intersection of these lines defines Γ1 as shown. Note that L16 plays no role in determining
Γ1 , since signal s6 is “too far” from s1 , in the following sense: if the received signal is closer to s6
than to s1 , then it is also closer to si than to s1 for some i = 2, 3, 4, 5. This kind of observation
plays an important role in the performance analysis of ML reception in Section 6.3.
Figure 6.16: ML decision regions for QPSK, 8PSK and 16QAM.
The preceding procedure can now be applied to the simpler scenario of two-dimensional constel-
lations to obtain ML decision regions as shown in Figure 6.16. For QPSK, the ML regions are
simply the four quadrants. For 8PSK, the ML regions are sectors of a circle. For 16QAM, the
ML regions take a rectangular form.
Now, let us apply the same reasoning to the decision boundary corresponding to making an
ML decision between two signals s0 and s1 , as shown in Figure 6.18. Suppose that s0 is sent.
Figure 6.17: Only the component of noise perpendicular to the decision boundary, Nperp , can
cause the received vector to cross the decision boundary, starting from the signal point s.
Figure 6.18: When making an ML decision between s0 and s1 , the decision boundary is at
distance D = d/2 from each signal point, where d = ||s1 − s0 || is the Euclidean distance between
the two points.
What is the probability that the noise vector N, when added to it, sends the received vector into
the wrong region by crossing the decision boundary? We know from (6.31) that the answer is
Q(D/σ), where D is the distance between s0 and the decision boundary. For ML reception, the
decision boundary is the plane that is the perpendicular bisector of the line between s0 and s1 ,
whose length equals d = ||s1 − s0 ||, the Euclidean distance between the two signal vectors. Thus,
D = d/2 = ||s1 − s0||/2. The probability of crossing the ML decision boundary between the two signal vectors (starting from either of the two signal points) is therefore
P[cross ML boundary between s0 and s1] = Q( ||s1 − s0||/(2σ) )   (6.32)
where we note that the Euclidean distance between the signal vectors and the corresponding
continuous time signals is the same.
Notation: Now that we have established the equivalence between working with continuous time
signals and the vectors that represent their projections onto signal space, we no longer need to
be careful about distinguishing between them. Accordingly, we drop the use of boldface notation
henceforth, using the notation y, si and n to denote the received signal, the transmitted signal,
and the noise, respectively, in both settings.
Geometric computation of error probability: The ML decision boundary for this problem
is as in Figure 6.18. The conditional error probability is simply the probability that, starting from
one of the signal points, the noise makes us cross the boundary to the wrong side, the probability
of which we have already computed in (6.32). Since the conditional error probabilities are equal,
they also equal the average error probability regardless of the priors. We therefore obtain the
following expression.
Error probability for binary signaling with ML reception
Pe,ML = Pe|1 = Pe|0 = Q( ||s1 − s0||/(2σ) ) = Q( d/(2σ) )   (6.34)
where d = ||s1 − s0 || is the distance between the two possible received signals.
Algebraic computation: While this geometric computation is intuitively pleasing, it is impor-
tant to also master algebraic approaches to computing the probabilities of errors due to WGN.
It is easiest to first consider on-off keying, for which the hypotheses are
H1 : y(t) = s(t) + n(t),   H0 : y(t) = n(t)   (6.35)
The ML rule (6.28) then reduces to the threshold test
⟨y, s⟩ ≷ ||s||²/2   (choose H1 if >, H0 if <)   (6.36)
Setting Z = ⟨y, s⟩, we wish to compute the conditional error probabilities given by
Pe|1 = P[Z < ||s||²/2 | H1],   Pe|0 = P[Z > ||s||²/2 | H0]   (6.37)
We have actually already done these computations in Example 5.8.2, but it pays to review them
quickly. Note that, conditioned on either hypothesis, Z is a Gaussian random variable. The
conditional mean and variance of Z under H0 (where y = n) are given by
E[Z|H0] = E[⟨n, s⟩] = 0,   var(Z|H0) = cov(⟨n, s⟩, ⟨n, s⟩) = σ²||s||²
where we have used Theorem 6.2.1, and the fact that n(t) has zero mean. The corresponding computation under H1 (where y = s + n) is as follows:
E[Z|H1] = E[⟨s + n, s⟩] = ||s||²,   var(Z|H1) = cov(⟨n, s⟩, ⟨n, s⟩) = σ²||s||²
noting that covariances do not change upon adding constants. Thus, Z ∼ N(0, v²) under H0 and Z ∼ N(m, v²) under H1, where m = ||s||² and v² = σ²||s||². Substituting in (6.37), it is easy to check that
Pe|1 = Pe|0 = Q( ||s||/(2σ) )   (6.38)
Going back to the more general binary signaling problem (6.33), the ML rule is given by (6.28)
to be
⟨y, s1⟩ − ||s1||²/2  ≷  ⟨y, s0⟩ − ||s0||²/2   (choose H1 if >, H0 if <)
We can analyze this system by considering the joint distribution of the correlator statistics ⟨y, s1⟩ and ⟨y, s0⟩, which are jointly Gaussian conditioned on each hypothesis. However, it is simpler and more illuminating to rewrite the ML decision rule as
⟨y, s1 − s0⟩  ≷  (||s1||² − ||s0||²)/2   (choose H1 if >, H0 if <)
This is consistent with the geometry depicted in Figure 6.18: only the projection of the received
signal along the line joining the signals matters in the decision, and hence only the noise along
this direction can produce errors. The analysis now involves the conditional distributions of the
single decision statistic Z = ⟨y, s1 − s0⟩, which is conditionally Gaussian under either hypothesis. The computation of the conditional error probabilities is left as an exercise, but we already know
that the answer should work out to (6.34).
A quicker approach is to consider a transformed system with received signal ỹ(t) = y(t) − s0 (t).
Since this transformation is invertible, the performance of an optimal rule is unchanged under
it. But the transformed received signal ỹ(t) falls under the on-off signaling model (6.35), with
s(t) = s1 (t) − s0 (t). The ML error probability formula (6.34) therefore follows from the formula
(6.38).
Scale Invariance: The formula (6.34) illustrates that the performance of the ML rule is scale-
invariant: if we scale the signals and noise by the same factor α, the performance does not
change, since both ||s1 − s0|| and σ scale by α. Thus, the performance is determined by the ratio of signal and noise strengths, rather than by the signal and noise strengths individually. We now
define some standard measures for these quantities, and then express the performance of some
common binary signaling schemes in terms of them.
Energy per bit, Eb: For binary signaling, this is given by
Eb = ( ||s0||² + ||s1||² )/2
assuming that 0 and 1 are equally likely to be sent.
Scale-invariant parameters: If we scale up both s1 and s0 by a factor A, Eb scales up by a
factor A2 , while the distance d scales up by a factor A. We can therefore define the scale-invariant
parameter
ηP = d²/Eb   (6.39)
Now, substituting d = √(ηP Eb) and σ = √(N0/2) into (6.34), we obtain that the ML performance is given by
Pe,ML = Q( √(ηP Eb/(2N0)) ) = Q( √( (d²/Eb)(Eb/(2N0)) ) )   (6.40)
Figure 6.19: Signal space representations with conveniently chosen scaling for three binary sig-
naling schemes.
On-off keying: Here s1(t) = s(t) and s0(t) = 0. As shown in Figure 6.19, the signal space is one-dimensional. For the scaling in the figure, we have d = 1 and Eb = (1² + 0²)/2 = 1/2, so that ηP = d²/Eb = 2. Substituting into (6.40), we obtain Pe,ML = Q(√(Eb/N0)).
Antipodal signaling: Here s1(t) = −s0(t), leading again to a one-dimensional signal space representation. One possible realization of antipodal signaling is BPSK, discussed in the previous chapter. For the scaling chosen, d = 2 and Eb = (1² + (−1)²)/2 = 1, which gives ηP = d²/Eb = 4. Substituting into (6.40), we obtain Pe,ML = Q(√(2Eb/N0)).
Equal-energy, orthogonal signaling: Here s1 and s0 are orthogonal, with ||s1||² = ||s0||². This is a two-dimensional signal space. As discussed in the previous chapter, possible realizations of orthogonal signaling include FSK and Walsh-Hadamard codes. From Figure 6.19, we have d = √2 and Eb = 1, so that ηP = d²/Eb = 2. This gives Pe,ML = Q(√(Eb/N0)).
Thus, on-off keying (which is orthogonal signaling with unequal energies) and equal-energy or-
thogonal signaling have the same power efficiency, while the power efficiency of antipodal signaling
is a factor of two (i.e., 3 dB) better.
In plots of error probability versus SNR, we typically express the error probability on a log scale (in order to capture its rapid decay with SNR) and the SNR in decibels (in order to span a large range). We provide such a plot for antipodal and orthogonal signaling in Figure 6.20.
Figure 6.20: Error probability versus Eb /N0 (dB) for binary antipodal and orthogonal signaling
schemes.
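A plot in the style of Figure 6.20 can be generated with a few lines of code, evaluating Pe,ML = Q(√(ηP Eb/(2N0))) with ηP = 4 for antipodal signaling and ηP = 2 for orthogonal signaling/OOK; the Q function is written out via erfc so that no extra helper functions are assumed.
%Error probability versus Eb/N0 (dB) for binary antipodal and orthogonal/OOK
%signaling (sketch reproducing a plot in the style of Figure 6.20).
q_function = @(x) 0.5*erfc(x/sqrt(2)); %Q function via erfc
ebnodb = 0:0.1:20;
ebno = 10.^(ebnodb/10);
pe_antipodal = q_function(sqrt(2*ebno)); %etaP = 4: Q(sqrt(2 Eb/N0))
pe_orthogonal = q_function(sqrt(ebno)); %etaP = 2: Q(sqrt(Eb/N0))
semilogy(ebnodb, pe_antipodal, ebnodb, pe_orthogonal, '--');
xlabel('E_b/N_0 (dB)'); ylabel('Probability of error (log scale)');
legend('Antipodal (BPSK)', 'Orthogonal (FSK)/OOK');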
Before doing detailed computations, let us discuss some general properties that greatly simplify
the framework for performance analysis.
Scale Invariance: For binary signaling, we have observed through explicit computation of
the error probability that performance depends only on signal-to-noise ratio (Eb /N0 ) and the
geometry of the signal set (which determines the power efficiency d2 /Eb ). Actually, we can make
such statements in great generality for M-ary signaling without explicit computations. First, let
us note that the performance of an optimal receiver does not change if we scale both signal and
noise by the same factor. Specifically, optimal reception for the model
Hi : y(t) = A si(t) + A n(t),   i = 0, 1, ..., M − 1
(where A > 0 is an arbitrary scale factor) does not depend on A. This is inferred from the following general observation: the performance
of an optimal receiver is unchanged when we pass the observation through an invertible transfor-
mation. Specifically, suppose z(t) = F (y(t)) is obtained by passing y(t) through an invertible
transformation F . If the optimal receiver for z does better than the optimal receiver for y, then
we could apply F to y to get z, then do optimal reception for z. This would perform better
than the optimal receiver for y, which is a contradiction. Similarly, if the optimal receiver for y does better than the optimal receiver for z, then we could apply F⁻¹ to z to get y, and then do optimal reception for y to perform better than the optimal receiver for z, again a contradiction.
The preceding argument implies that performance depends only on the signal-to-noise ratio,
once we have fixed the signal constellation. Let us now figure out what properties of the signal
constellation are relevant in determining performance. For M = 2, we have seen that all that
matters is the scale-invariant quantity d2 /Eb . What are the analogous quantities for M > 2? To
determine these, let us consider the conditional error probabilities for the ML rule.
Conditional error probability: The conditional error probability, conditioned on Hi , is given
by
Pe|i = P[y ∉ Γi | i sent] = P[Zi < Zj for some j ≠ i | i sent]   (6.43)
While computation of the conditional error probability in closed form is typically not feasible,
we can actually get significant insight on what parameters it depends on by examining the
conditional distributions of the decision statistics. Since y = si + n conditioned on Hi , the
decision statistics are given by
Zj = ⟨y, sj⟩ − ||sj||²/2 = ⟨si + n, sj⟩ − ||sj||²/2 = ⟨n, sj⟩ + ⟨si, sj⟩ − ||sj||²/2,   0 ≤ j ≤ M − 1
By the Gaussianity of n(t), the decision statistics {Zj } are jointly Gaussian (conditioned on Hi ).
Their joint distribution is therefore completely characterized by their means and covariances.
Since the noise is zero mean, we obtain
E[Zj | Hi] = ⟨si, sj⟩ − ||sj||²/2
Using Theorem 6.2.1, and noting that covariance is unaffected by translation, we obtain that
cov(Zj, Zk | Hi) = cov(⟨n, sj⟩, ⟨n, sk⟩) = σ² ⟨sj, sk⟩
Thus, conditioned on Hi, the joint distribution of {Zj} depends only on the noise variance σ² and the signal inner products {⟨si, sj⟩, 0 ≤ i, j ≤ M − 1}. Now that we know the joint distribution, we can in principle compute the conditional error probabilities Pe|i. In practice, this is often difficult, and we often resort to Monte Carlo simulations. However, what we have found out about the joint distribution can now be used to refine our concept of scale-invariance.
Performance only depends on normalized inner products: Let us replace Zj by Zj /σ 2 .
Clearly, since we are simply picking the maximum among the decision statistics, scaling by a
common factor does not change the decision (and hence the performance). However, we now obtain that
E[Zj/σ² | Hi] = ( ⟨si, sj⟩ − ||sj||²/2 )/σ²
and
cov( Zj/σ², Zk/σ² | Hi ) = (1/σ⁴) cov(Zj, Zk | Hi) = ⟨sj, sk⟩/σ²
Thus, the joint distribution of the normalized decision statistics {Zj/σ²}, conditioned on any of the hypotheses, depends only on the normalized inner products {⟨si, sj⟩/σ², 0 ≤ i, j ≤ M − 1}. Of course, this means that the performance also depends only on these normalized inner products.
Let us now carry these arguments further, still without any explicit computations. We define
energy per symbol and energy per bit for M-ary signaling as follows.
Energy per symbol, Es: For M-ary signaling with equal priors, the energy per symbol Es is given by
Es = (1/M) Σ_{i=0}^{M−1} ||si||²
Energy per bit, Eb: Since M-ary signaling conveys log2 M bits/symbol, the energy per bit is given by
Eb = Es / log2 M
If all signals in an M-ary constellation are scaled up by a factor A, then Es and Eb get scaled up by A², as do all inner products {⟨si, sj⟩}. Thus, we can define scale-invariant inner products {⟨si, sj⟩/Eb} which depend only on the shape of the signal constellation. Indeed, we can define the shape of a constellation as these scale-invariant inner products. Setting σ² = N0/2, we can now write the normalized inner products determining performance as follows:
⟨si, sj⟩/σ² = ( ⟨si, sj⟩/Eb ) (2Eb/N0)   (6.44)
We can now make the following statement.
Performance depends only on Eb/N0 and constellation shape (as specified by the scale-invariant inner products): We have shown that the performance depends only on the normalized inner products {⟨si, sj⟩/σ²}. From (6.44), we see that these in turn depend only on Eb/N0 and the scale-invariant inner products {⟨si, sj⟩/Eb}. The latter depend only on the shape of the signal constellation, and are completely independent of the signal and noise strengths. What this means is that we can choose any convenient scaling for the signal constellation when investigating its performance, as long as we keep track of the signal-to-noise ratio. We illustrate this via an example where we determine the error probability by simulation.
Typically, Eb /N0 is specified in dB, so we need to convert it to the “raw” Eb /N0 . We now have
a simulation consisting of the following steps, repeated over multiple symbol transmissions:
Step 1: Choose a symbol s at random from A. For this symmetric constellation, we can actually
keep sending the same symbol in order to compute the performance of the ML rule, since the
conditional error probabilities are all equal. For example, set s = (1, 0)T .
Step 2: Generate two i.i.d. N(0, 1) random variables Uc and Us . The I and Q noises can now be
set as Nc = σUc and Ns = σUs , so that N = (Nc , Ns )T .
Step 3: Set the received vector y = s + N.
Step 4: Compute the ML decision arg maxi hy, si i (the energy terms can be dropped, since the
signals are of equal energy) or arg mini ||y − si ||2 .
Step 5: If there is an error, increment the error count.
The error probability is estimated as the error count, divided by the number of symbols trans-
mitted. We repeat this simulation over a range of Eb /N0 , and typically plot the error probability
on a log scale versus Eb /N0 in dB.
These steps are carried out in the following code fragment, which generates Figure 6.21 comparing
a simulation-based estimate of the error probability for 8PSK against the intelligent union bound,
an analytical estimate that we develop shortly. The analytical estimate requires very little
computation (evaluation of a single Q function), but its agreement with simulations is excellent.
As we shall see, developing such analytical estimates also gives us insight into how errors are
most likely to occur for M-ary signaling in AWGN.
The code fragment is written for transparency rather than computational efficiency. The code
contains an outer for-loop for varying SNR, and an inner for-loop for computing minimum dis-
tances for the symbols sent at each SNR. The inner loop can be avoided and the program sped up
considerably by computing all minimum distances for all symbols at once using matrix operations
(try it!). We use a less efficient program here to make the operations easy to understand.
Figure 6.21: Symbol error probability for 8PSK versus Eb/N0 (dB): simulation compared with the intelligent union bound.
%setup (assumed here for completeness; not shown in the original fragment)
nsymbols = 20000; %number of symbols simulated per SNR point
constellation = exp(j*2*pi*(0:7)/8); %8PSK points on the unit circle (Es = 1)
q_function = @(x) 0.5*erfc(x/sqrt(2)); %Q function (assumed helper)
ebnodb = 0:0.1:10;
number_snrs = length(ebnodb);
perr_estimate = zeros(number_snrs,1);
for k=1:number_snrs, %SNR for loop
    ebnodb_now = ebnodb(k);
    ebno=10^(ebnodb_now/10);
    sigma=sqrt(1/(6*ebno)); %Es = 1, Eb = 1/3, so sigma^2 = N0/2 = 1/(6 Eb/N0)
    %send first symbol without loss of generality, add 2d Gaussian noise
    received = 1 + sigma*randn(nsymbols,1)+j*sigma*randn(nsymbols,1);
    decisions=zeros(nsymbols,1);
    for n=1:nsymbols, %Symbol for loop (can/should be avoided for fast implementation)
        distances = abs(received(n)-constellation);
        [min_dist,decisions(n)] = min(distances);
    end
    errors = (decisions ~= 1);
    perr_estimate(k) = sum(errors)/nsymbols;
end
semilogy(ebnodb,perr_estimate);
hold on;
%COMPARE WITH INTELLIGENT UNION BOUND
etaP = 6-3*sqrt(2); %power efficiency d_min^2/Eb for 8PSK
Ndmin = 2; %number of nearest neighbors
ebno = 10.^(ebnodb/10);
perr_union = Ndmin*q_function(sqrt(etaP*ebno/2));
semilogy(ebnodb,perr_union,':r');
xlabel('Eb/N0 (dB)');
ylabel('Symbol error probability');
legend('Simulation','Intelligent Union Bound','Location','NorthEast');
Figure 6.22: If s0 is sent, an error occurs if Nc or Ns is negative enough to make the received
vector fall out of the first quadrant.
Exact analysis for QPSK: Let us find Pe|0 , the conditional error probability for the ML rule
conditioned on s0 being sent. For the scaling shown in Figure 6.22, s0 = (d/2, d/2)^T, so that, conditioned on s0 being sent, Yc = d/2 + Nc and Ys = d/2 + Ns. The ML decision regions are the four quadrants, so a correct decision is made if and only if Nc > −d/2 and Ns > −d/2. Since Nc and Ns are i.i.d. N(0, σ²), the conditional probability of correct reception is
Pc|0 = ( 1 − Q(d/(2σ)) )²
so that, using d²/Eb = 4,
Pe = Pe|0 = 2Q(d/(2σ)) − Q²(d/(2σ)) = 2Q(√(2Eb/N0)) − Q²(√(2Eb/N0))   (6.47)
where the conditional error probabilities are equal by symmetry, so that (6.47) is also the average error probability.
Figure 6.23: The noise random variables N1 , N2 , N3 which can drive the received vector outside
the decision region Γ0 are correlated, which makes it difficult to find an exact expression for Pe|0
.
Applying (6.49) to (6.48), we obtain that, for the scenario depicted in Figure 6.23, the conditional
error probability can be upper bounded as follows:
Pe|0 ≤ P[N1 > ||s1 − s0||/2] + P[N2 > ||s2 − s0||/2] + P[N3 > ||s3 − s0||/2]
     = Q( ||s1 − s0||/(2σ) ) + Q( ||s2 − s0||/(2σ) ) + Q( ||s3 − s0||/(2σ) )   (6.50)
Thus, the conditional error probability is upper bounded by a sum of probabilities, each of which
corresponds to the error probability for a binary decision: s0 versus s1 , s0 versus s2 , and s0 versus
s3 . This approach applies in great generality, as we show next.
Union Bound and variants: Pictures such as the one in Figure 6.23 typically cannot be
drawn when the signal space dimension is high. However, we can still find union bounds on error
probabilities, as long as we can enumerate all the signals in the constellation. To do this, let us
rewrite (6.43), the conditional error probability, conditioned on Hi , as a union of M − 1 events
as follows:
Pe|i = P[ ∪_{j≠i} {Zi < Zj} | i sent ]
where {Zj} are the decision statistics. Using the union bound (6.49), we obtain
Pe|i ≤ Σ_{j≠i} P[Zi < Zj | i sent]   (6.51)
But the jth term on the right-hand side above is simply the error probability of ML reception
for binary hypothesis testing between the signals si and sj . From the results of Section 6.3.2, we
therefore obtain the following pairwise error probability:
P[Zi < Zj | i sent] = Q( ||sj − si||/(2σ) )
Substituting into (6.51), we obtain upper bounds on the conditional error probabilities and the
average error probability as follows.
Union Bound on conditional error probabilities: The conditional error probabilities for the ML rule are bounded as
Pe|i ≤ Σ_{j≠i} Q( ||sj − si||/(2σ) ) = Σ_{j≠i} Q( dij/(2σ) )   (6.52)
where dij = ||si − sj|| denotes the distance between the signals si and sj. Averaging over the priors yields the corresponding union bound on the average error probability:
Pe = Σ_i πi Pe|i ≤ Σ_i πi Σ_{j≠i} Q( dij/(2σ) )   (6.53)
We can now rewrite the union bound in terms of Eb/N0 and the scale-invariant squared distances d²ij/Eb as follows:
Pe|i ≤ Σ_{j≠i} Q( √( (d²ij/Eb)(Eb/(2N0)) ) )   (6.54)
Pe = Σ_i πi Pe|i ≤ Σ_i πi Σ_{j≠i} Q( √( (d²ij/Eb)(Eb/(2N0)) ) )   (6.55)
For the scenario of Figure 6.23, the generic union bound (6.52) gives
Pe|0 ≤ Q( ||s1 − s0||/(2σ) ) + Q( ||s2 − s0||/(2σ) ) + Q( ||s3 − s0||/(2σ) ) + Q( ||s4 − s0||/(2σ) )
Notice that this answer is different from the one we had in (6.50). This is because the fourth
term corresponds to the signal s4 , which is “too far away” from s0 to play a role in determining
the decision region Γ0 . Thus, when we do have a more detailed geometric understanding of the
decision regions, we can do better than the generic union bound (6.52) and get a tighter bound,
as in (6.50). We term this the intelligent union bound, and give a general formulation in the
following.
Denote by Nml (i) the indices of the set of neighbors of signal si (we exclude i from Nml (i) by
definition) that characterize the ML decision region Γi . That is, the half-planes that we intersect
to obtain Γi correspond to the perpendicular bisectors of lines joining si and sj , j ∈ Nml (i). For
example, in Figure 6.23, Nml (0) = {1, 2, 3}; s4 is excluded from this set, since it does not play a
role in determining Γ0 . The decision region in (6.41) can now be expressed as
Γi = { y : Zi ≥ Zj for all j ∈ Nml(i) }   (6.56)
We can now say the following: y falls outside Γi if and only if Zi < Zj for some j ∈ Nml (i). We
can therefore write
Pe|i = P[y ∉ Γi | i sent] = P[Zi < Zj for some j ∈ Nml(i) | i sent]   (6.57)
and from there, following the same steps as in the union bound, get a tighter bound, which we
express as follows.
Intelligent Union Bound: A better bound on Pe|i is obtained by considering only
the neighbors of si that determine its ML decision region, as follows:
Pe|i ≤ Σ_{j ∈ Nml(i)} Q( ||sj − si||/(2σ) )   (6.58)
(the bound on the average error probability Pe is computed as before by averaging the
bounds on Pe|i using the priors).
Union Bound for QPSK: For QPSK, we infer from Figure 6.22 that the union bound for Pe|0 is given by
Pe = Pe|0 ≤ Q( d01/(2σ) ) + Q( d02/(2σ) ) + Q( d03/(2σ) ) = 2Q( d/(2σ) ) + Q( √2 d/(2σ) )
Using d²/Eb = 4, we obtain the union bound in terms of Eb/N0 to be
Pe ≤ 2Q( √(2Eb/N0) ) + Q( √(4Eb/N0) )   QPSK union bound   (6.60)
For moderately large Eb /N0 , the dominant term in terms of the decay of the error probability is
the first one, since Q(x) falls off rapidly as x gets large. Thus, while the union bound (6.60) is
larger than the exact error probability (6.47), as it must be, it gets the multiplicity and argument
of the dominant term right. Tightening the analysis using the intelligent union bound, we get
Pe|0 ≤ Q( d01/(2σ) ) + Q( d02/(2σ) ) = 2Q( √(2Eb/N0) )   QPSK intelligent union bound   (6.61)
since Nml (0) = {1, 2} (the decision region for s0 is determined by the neighbors s1 and s2 ).
Another common approach for getting a better (and quicker to compute) estimate than the
original union bound is the nearest neighbors approximation. This is a loose term employed to
describe a number of different methods for pruning the terms in the summation (6.52). Most
commonly, it refers to regular signal sets in which each signal point has a number of nearest
neighbors at distance dmin from it, where dmin = min_{i≠j} ||si − sj||. Letting Ndmin(i) denote the
number of nearest neighbors of si , we obtain the following approximation.
Nearest Neighbors Approximation
Pe|i ≈ Ndmin(i) Q( dmin/(2σ) )   (6.62)
Averaging over i, we obtain that
Pe ≈ N̄dmin Q( dmin/(2σ) )   (6.63)
where N̄dmin denotes the average number of nearest neighbors for a signal point. The rationale for the nearest neighbors approximation is that, since Q(x) decays rapidly as x gets large (Q(x) ∼ e^{−x²/2}), the terms in the union bound corresponding to the smallest arguments of the Q function dominate at high SNR.
The corresponding formulas as a function of scale-invariant quantities and Eb/N0 are:
Pe|i ≈ Ndmin(i) Q( √( (d²min/Eb)(Eb/(2N0)) ) )   (6.64)
It is also worth explicitly writing down an expression for the average error probability, averaging the preceding over i:
Pe ≈ N̄dmin Q( √( (d²min/Eb)(Eb/(2N0)) ) )   (6.65)
where
N̄dmin = (1/M) Σ_{i=0}^{M−1} Ndmin(i)
is the average number of nearest neighbors for the signal points in the constellation.
For QPSK, we have from Figure 6.22 that each signal point has Ndmin(i) = 2 nearest neighbors at distance dmin, with d²min/Eb = 4, yielding
Pe ≈ 2 Q( √(2Eb/N0) )
In this case, the nearest neighbors approximation coincides with the intelligent union bound
(6.61). This happens because the ML decision region for each signal point is determined by its
nearest neighbors for QPSK. Indeed, the latter property holds for many regular constellations,
including all of the PSK and QAM constellations whose ML decision regions are depicted in
Figure 6.16.
Power Efficiency: While exact performance analysis for M-ary signaling can be computation-
ally demanding, we have now obtained simple enough estimates that we can define concepts such
as power efficiency, analogous to the development for binary signaling. In particular, comparing
the nearest neighbors approximation (6.63) with the error probability for binary signaling (6.40),
we define in analogy the power efficiency of an M-ary signaling scheme as
ηP = d²min/Eb   (6.66)
so that the nearest neighbors approximation (6.65) can be rewritten in terms of ηP as
Pe ≈ N̄dmin Q( √(ηP Eb/(2N0)) )   (6.67)
Since the argument of the Q function in (6.67) plays a bigger role than the multiplicity N̄dmin for
moderately large SNR, ηP offers a means of quickly comparing the power efficiency of different
signaling constellations, as well as for determining the dependence of performance on Eb /N0 .
Figure 6.24: ML decision regions for 16QAM with scaling chosen for convenience in computing
power efficiency.
Performance analysis for 16QAM: We now apply the preceding performance analysis to the
16QAM constellation depicted in Figure 6.24, where we have chosen a convenient scale for the
constellation. We now compute the nearest neighbors approximation, which coincides with the
intelligent union bound, since the ML decision regions are determined by the nearest neighbors.
Noting that the number of nearest neighbors is four for the four innermost signal points, two for
the four outermost signal points, and three for the remaining eight signal points, we obtain upon
averaging
N̄dmin = 3 (6.68)
It remains to compute the power efficiency ηP and apply (6.67). We had done this in the preview
in Chapter 4, but we repeat it here. For the scaling shown, we have dmin = 2. The energy per
symbol is obtained as follows:
Es = (average energy of I component) + (average energy of Q component) = 2 × (average energy of I component)
by symmetry. Since the I component is equally likely to take the four values ±1 and ±3, we have
average energy of I component = (1² + 3²)/2 = 5
and
Es = 10
We therefore obtain
Eb = Es/log2 M = 10/log2 16 = 5/2
The power efficiency is therefore given by
ηP = d²min/Eb = 2²/(5/2) = 8/5   (6.69)
Substituting (6.68) and (6.69) into (6.67), we obtain
Pe ≈ 3 Q( √(4Eb/(5N0)) )
as the nearest neighbors approximation and intelligent union bound for 16QAM. The bandwidth
efficiency for 16QAM is 4 bits/2 dimensions, which is twice that of QPSK, whose bandwidth
efficiency is 2 bits/2 dimensions. It is not surprising, therefore, that the power efficiency of
16QAM (ηP = 1.6) is smaller than that of QPSK (ηP = 4). We often encounter such tradeoffs
between power and bandwidth efficiency in the design of communication systems, including when
the signaling waveforms considered are sophisticated codes that are constructed from multiple
symbols drawn from constellations such as PSK and QAM.
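The quantities entering the nearest neighbors approximation can also be computed directly from the constellation, as in the following sketch for the 16QAM alphabet of Figure 6.24 (the variable names are illustrative, and the Q function is again written out via erfc).
%Nearest neighbors approximation for 16QAM, computed from the constellation of
%Figure 6.24 (alphabet {-3,-1,1,3} on each axis). (Illustrative sketch.)
q_function = @(x) 0.5*erfc(x/sqrt(2));
[I, Q] = meshgrid([-3 -1 1 3]);
constellation = I(:) + j*Q(:); %16 points as a column vector
M = length(constellation);
dists = abs(repmat(constellation,1,M) - repmat(constellation.',M,1)); %pairwise distances
dists(1:M+1:end) = inf; %ignore the diagonal
dmin = min(dists(:)); %= 2 for this scaling
Nbar = mean(sum(abs(dists - dmin) < 1e-9, 2)); %average number of nearest neighbors (= 3)
Eb = mean(abs(constellation).^2)/log2(M); %= 10/4 = 5/2
etaP = dmin^2/Eb; %= 8/5
ebnodb = 0:0.5:12; ebno = 10.^(ebnodb/10);
pe_nn = Nbar*q_function(sqrt(etaP*ebno/2)); %nearest neighbors approximation (6.67)
semilogy(ebnodb, pe_nn); xlabel('E_b/N_0 (dB)'); ylabel('Symbol error probability');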
Figure 6.25: Symbol error probabilities for QPSK and 16QAM: intelligent union bounds (IUB) compared with exact results.
Figure 6.25 shows the symbol error probabilities for QPSK and 16QAM, comparing the intelligent
union bounds (which coincide with nearest neighbors approximations) with exact results. The
exact computations for 16QAM use the closed form expression (6.70) derived in Problem 6.21. We
see that the exact error probability and intelligent union bound are virtually indistinguishable.
The power efficiencies of the constellations (which determine the argument of the Q function) accurately predict the gap between the curves: ηP(QPSK)/ηP(16QAM) = 4/1.6 = 2.5, which corresponds to about 4 dB.
From Figure 6.25, we see that the distance between the QPSK and 16QAM curves at small error
probabilities (high SNR) is indeed about 4 dB.
Figure 6.26: Signal space picture for BPSK when the receiver LO has a phase offset θ relative to the incoming carrier, but the receiver retains the decision boundary corresponding to no offset.
The performance analysis techniques developed here can also be applied to suboptimal receivers.
Suppose, for example, that the receiver LO in a BPSK system is offset from the incoming carrier
by a phase shift θ, but that the receiver uses decision regions corresponding to no phase offset.
The signal space picture is now as in Figure 6.26. The error probability is now given by
Pe = Pe|0 = Pe|1 = Q( D/σ ) = Q( √( (D²/Eb)(2Eb/N0) ) ) = Q( cos θ √(2Eb/N0) )
where D = √Eb cos θ is the distance from either (rotated) signal point to the decision boundary, so that there is a loss of 10 log10 cos²θ dB in performance due to the phase offset (e.g. θ = 10◦ leads to a loss of 0.13 dB, while θ = 30◦ leads to a loss of 1.25 dB).
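A quick computation of this loss for a few offsets (illustrative sketch):
%SNR penalty due to a receiver LO phase offset theta for BPSK with the
%mismatched (zero-offset) decision regions: loss = -10*log10(cos(theta)^2) dB.
theta_deg = [5 10 20 30];
loss_db = -10*log10(cosd(theta_deg).^2);
disp([theta_deg; loss_db]); %about 0.13 dB at 10 degrees, 1.25 dB at 30 degrees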
Let us first quickly derive the union bound. Without loss of generality, take the M orthogonal signals as unit vectors along the M axes in our signal space. With this scaling, we have ||si||² ≡ 1, so that Es = 1 and Eb = 1/log2 M. Since the signals are orthogonal, the squared distance between any two signals is
d²ij = ||si − sj||² = ||si||² + ||sj||² − 2⟨si, sj⟩ = 2Es = 2,   i ≠ j
Thus, dmin ≡ dij (i ≠ j) and the power efficiency is
ηP = d²min/Eb = 2 log2 M
The union bound, intelligent union bound and nearest neighbors approximation all coincide, and we get
Pe ≡ Pe|i ≤ Σ_{j≠i} Q( dij/(2σ) ) = (M − 1) Q( √( (d²min/Eb)(Eb/(2N0)) ) )
Exact expressions: By symmetry, the error probability equals the conditional error probability conditioned on any one of the hypotheses; similarly, the probability of correct decision equals the probability of correct decision given any one of the hypotheses. Let us therefore condition on hypothesis H0 (i.e., that s0 is sent), so that the received signal y = s0 + n. The decision statistics are
Zi = ⟨s0 + n, si⟩ = Es δ0i + Ni,   i = 0, 1, ..., M − 1
where {Ni = ⟨n, si⟩} are jointly Gaussian, zero mean, with
cov(Ni, Nj) = σ²⟨si, sj⟩ = σ² Es δij
Thus, the Ni ∼ N(0, σ²Es) are i.i.d. We therefore infer that, conditioned on s0 sent, the {Zi} are conditionally independent, with Z0 ∼ N(Es, σ²Es), and Zi ∼ N(0, σ²Es) for i = 1, ..., M − 1.
Let us now express the decision statistics in scale-invariant terms, by replacing Zi by Zi/(σ√Es). This gives Z0 ∼ N(m, 1) and Z1, ..., ZM−1 ∼ N(0, 1), conditionally independent, where
m = Es/(σ√Es) = √(Es/σ²) = √(2Es/N0) = √(2Eb log2 M/N0)
The conditional probability of correct reception is now given by
Pc|0 = P[Z1 ≤ Z0, ..., ZM−1 ≤ Z0 | H0] = ∫ P[Z1 ≤ x, ..., ZM−1 ≤ x | Z0 = x, H0] pZ0|H0(x|H0) dx
     = ∫ P[Z1 ≤ x | H0] ··· P[ZM−1 ≤ x | H0] pZ0|H0(x|H0) dx
where we have used the conditional independence of the {Zi}. Plugging in the conditional distributions, we get the following expression for the probability of correct reception.
Probability of correct reception for M-ary orthogonal signaling
Pc = Pc|i = ∫_{−∞}^{∞} [Φ(x)]^{M−1} (1/√(2π)) e^{−(x−m)²/2} dx   (6.72)
where m = √(2Es/N0) = √(2Eb log2 M/N0).
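The integral (6.72) is easy to evaluate numerically; the sketch below does so for illustrative values of M and Eb/N0, writing Φ and the Gaussian density in terms of erfc so that no toolbox functions are needed.
%Symbol error probability for M-ary orthogonal signaling by numerical
%integration of (6.72), with m = sqrt(2*Eb*log2(M)/N0). (Illustrative values.)
Phi = @(x) 0.5*erfc(-x/sqrt(2)); %standard Gaussian CDF
gpdf = @(x) exp(-x.^2/2)/sqrt(2*pi); %standard Gaussian density
M = 16; ebnodb = 8; ebno = 10^(ebnodb/10);
m = sqrt(2*ebno*log2(M)); %mean of the normalized Z_0
Pc = integral(@(x) Phi(x).^(M-1).*gpdf(x-m), -inf, inf);
Pe = 1 - Pc %symbol error probability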
The probability of error is, of course, one minus the preceding expression. But for small error
probabilities, the probability of correct reception is close to one, and it is difficult to get good
estimates of the error probability using (6.72). We therefore develop an expression for the error
probability that can be directly computed, as follows:
Pe|0 = Σ_{j≠0} P[Zj = max_i Zi | H0] = (M − 1) P[Z1 = max_i Zi | H0]
Figure 6.27: Symbol error probability versus Eb/N0 (dB) for M-ary orthogonal signaling, M = 2, 4, 8, 16; the asymptotic limit of −1.6 dB is also marked.
Asymptotics for large M: The error probability for M-ary orthogonal signaling exhibits an interesting thresholding effect as M gets large:
lim_{M→∞} Pe = 0 if Eb/N0 > ln 2,   lim_{M→∞} Pe = 1 if Eb/N0 < ln 2   (6.74)
That is, by letting M get large, we can get arbitrarily reliable performance as long as Eb /N0
exceeds -1.6 dB (ln 2 expressed in dB). This result is derived in one of the problems. Actually, we
can show using the tools of information theory that this is the best we can do over the AWGN
channel in the limit of bandwidth efficiency tending to zero. That is, M-ary orthogonal signaling
is asymptotically optimum in terms of power efficiency.
Figure 6.27 shows the probability of symbol error as a function of Eb /N0 for several values of M.
We see that the performance is quite far away from the asymptotic limit of -1.6 dB (also marked
on the plot) for the moderate values of M considered. For example, the Eb /N0 required for
achieving an error probability of 10−6 for M = 16 is more than 9 dB away from the asymptotic
limit.
Figure 6.28: Gray coded bit mapping for QPSK: the quadrant labels 00, 10, 11, 01 are assigned so that neighboring symbols differ in exactly one bit.
QPSK with Gray coding: We begin with the example of QPSK, with the bit mapping shown
in Figure 6.28. This bit mapping is an example of a Gray code, in which the bits corresponding
to neighboring symbols differ by exactly one bit (since symbol errors are most likely going to
occur by decoding into neighboring decision regions, this reduces the number of bit errors). Let
us denote the symbol labels as b[1]b[2] for the transmitted symbol, where b[1] and b[2] each take
values 0 and 1. Letting b̂[1]b̂[2] denote the label for the ML symbol decision, the probabilities of bit error are given by p1 = P[b̂[1] ≠ b[1]] and p2 = P[b̂[2] ≠ b[2]]. The average probability of bit error, which we wish to estimate, is given by pb = (p1 + p2)/2. Conditioned on 00 being sent, the probability of making an error on b[1] is as follows:
P[b̂[1] = 1 | 00 sent] = P[ML decision is 10 or 11 | 00 sent] = P[Nc < −d/2] = Q( d/(2σ) ) = Q( √(2Eb/N0) )
where, as before, we have expressed the result in terms of Eb/N0 using the power efficiency d²/Eb = 4. We also note, by the symmetry of the constellation and the bit map, that the conditional
probability of error of b[1] is the same, regardless of which symbol we condition on. Moreover,
exactly the same analysis holds for b[2], except that errors are caused by the noise random
variable Ns . We therefore obtain that
pb = p1 = p2 = Q( √(2Eb/N0) )   (6.75)
The fact that this expression is identical to the bit error probability for binary antipodal signaling
is not a coincidence. QPSK with Gray coding can be thought of as two independent BPSK
systems, one signaling along the I component, and the other along the Q component.
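This observation is easy to check by simulation: the sketch below generates Gray coded QPSK as two independent BPSK streams on the I and Q rails and compares the measured BER with (6.75); the parameter values are illustrative.
%BER simulation for Gray coded QPSK, viewed as independent BPSK on the I and Q
%components, compared with the analytical value Q(sqrt(2*Eb/N0)) in (6.75).
q_function = @(x) 0.5*erfc(x/sqrt(2));
nsymbols = 100000; ebnodb = 6; ebno = 10^(ebnodb/10);
bits = randi([0 1], nsymbols, 2); %two bits per symbol
symbols = (1-2*bits(:,1)) + j*(1-2*bits(:,2)); %bit 0 -> +1, bit 1 -> -1 on each rail
sigma = sqrt(1/(2*ebno)); %Es = 2, Eb = 1, so sigma^2 = N0/2 = 1/(2 Eb/N0)
received = symbols + sigma*(randn(nsymbols,1) + j*randn(nsymbols,1));
bithats = [real(received) < 0, imag(received) < 0]; %per-rail sign decisions
ber_sim = mean(bits(:) ~= bithats(:));
ber_theory = q_function(sqrt(2*ebno));
disp([ber_sim, ber_theory]); %simulated and analytical BER should agree closely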
Gray coding is particularly useful at low SNR (e.g., for heavily coded systems), where symbol
errors happen more often. For example, in a coded system, we would pass up fewer bit errors to
the decoder for the same number of symbol errors. We define it in general as follows.
Gray Coding: Consider a 2^n-ary constellation in which each point is represented by a binary string b = (b1, ..., bn). The bit assignment is said to be Gray coded if, for any two constellation points b and b′ which are nearest neighbors, the bit representations b and b′ differ in exactly one bit location.
Nearest neighbors approximation for BER with Gray coded constellation: Consider the ith bit bi in an n-bit Gray code for a regular constellation with minimum distance dmin. For a Gray code, there is at most one nearest neighbor which differs in the ith bit, and the pairwise error probability of decoding to that neighbor is Q(dmin/(2σ)). We therefore have
P(bit error) ≈ Q( √(ηP Eb/(2N0)) )   with Gray coding   (6.76)
where ηP = d²min/Eb is the power efficiency.
Figure 6.29: BER for 16QAM and 16PSK with Gray coding.
Figure 6.29 shows the BER of 16QAM and 16PSK with Gray coding, comparing the nearest
neighbors approximation with exact results (obtained analytically for 16QAM, and by simulation
for 16PSK). The slight pessimism and ease of computation of the nearest neighbors approximation
imply that it is an excellent tool for link design.
Gray coding may not always be possible. Indeed, for an arbitrary set of M = 2n signals, we may
not understand the geometry well enough to assign a Gray code. In general, a necessary (but
not sufficient) condition for an n-bit Gray code to exist is that the number of nearest neighbors
for any signal point should be at most n.
BER for orthogonal modulation: For M = 2^m-ary equal-energy orthogonal modulation, each of the m bits splits the signal set into two halves. By the symmetric geometry of the signal set, any of the M − 1 wrong symbols is equally likely to be chosen, given a symbol error, and M/2 of these will correspond to an error in a given bit. We therefore have
P(bit error) = (M/2)/(M − 1) × P(symbol error),   BER for M-ary orthogonal signaling   (6.77)
Note that Gray coding is out of the question here, since there are only m bits and 2m − 1
neighbors, all at the same distance.
(c) Given the receiver noise figure F (dB), we can infer the noise power Pn = N0 B = N0,nom 10^{F/10} B, and hence the minimum required received signal power is given by
PRX(min) = SNRreqd Pn = (Eb/N0)reqd (Rb/B) N0 B = Rb N0,nom 10^{F/10} (Eb/N0)reqd   (6.79)
This is called the required receiver sensitivity, and is usually quoted in dBm, as PRX,dBm(min) = 10 log10 PRX(min)(mW). Using (5.93), we obtain that
PRX,dBm(min) = (Eb/N0)reqd,dB + 10 log10 Rb − 174 + F   (6.80)
where Rb is in bits per second. Note that dependence on bandwidth B (and hence on excess
bandwidth) cancels out in (6.79), so that the final expression for receiver sensitivity depends
only on the required Eb /N0 (which depends on the signaling scheme and target BER), the bit
rate Rb , and the noise figure F .
Once we know the receiver sensitivity, we need to determine the link parameters (e.g., transmitted
power, choice of antennas, range) such that the receiver actually gets at least that much power,
plus a link margin (typically expressed in dB). We illustrate such considerations via the Friis
formula for propagation loss in free space, which we can think of as modeling a line-of-sight
wireless link. While deriving this formula from basic electromagnetics is beyond our scope here,
let us provide some intuition before stating it.
Suppose that a transmitter emits power PTX that radiates uniformly in all directions. The power per unit area at a distance R from the transmitter is PTX/(4πR²), where we have divided by the area of a sphere of radius R. The receive antenna may be thought of as providing an effective area, termed the antenna aperture, for catching a portion of this power. (The aperture of an antenna is related to its size, but the relation is not usually straightforward.) If we denote the receive antenna aperture by ARX, the received power is given by
PRX = ARX PTX/(4πR²)
Now, if the transmitter can direct power selectively in the direction of the receiver rather than radiating it isotropically, we get
PRX = GTX ARX PTX/(4πR²)   (6.81)
where GT X is the transmit antenna’s gain towards the receiver, relative to a hypothetical isotropic
radiator. We now have a formula for received power in terms of transmitted power, which depends
on the gain of the transmit antenna and the aperture of the receive antenna. We would like to
express this formula solely in terms of antenna gains or antenna apertures. To do this, we need
to relate the gain of an antenna to its aperture. To this end, we state without proof that the aperture of an isotropic antenna is given by A = λ²/(4π). Since the gain of an antenna is the ratio of its aperture to that of an isotropic antenna, the relation between gain and aperture can be written as
G = A/(λ²/(4π)) = 4πA/λ²   (6.82)
Assuming that the aperture A scales up in some fashion with antenna size, this implies that, for
a fixed form factor, we can get higher antenna gains as we decrease the carrier wavelength, or
increase the carrier frequency.
Using (6.82) in (6.81), we get two versions of the Friis formula:
Friis formula for free space propagation
PRX = PTX GTX GRX λ²/(16π²R²),   in terms of antenna gains   (6.83)
PRX = PTX ATX ARX/(λ²R²),   in terms of antenna apertures   (6.84)
where
• GTX, ATX are the gain and aperture, respectively, of the transmit antenna,
• GRX, ARX are the gain and aperture, respectively, of the receive antenna,
• λ = c/fc is the carrier wavelength (c = 3 × 10^8 meters/sec is the speed of light, and fc is the carrier frequency),
• R is the range (line-of-sight distance between transmitter and receiver).
The first version (6.83) of the Friis formula tells us that, for antennas with fixed gain, we should
try to use as low a carrier frequency (as large a wavelength) as possible. On the other hand,
the second version tells us that, if we have antennas of a given form factor, then we can get
better performance as we increase the carrier frequency (decrease the wavelength), assuming of
course that we can “point” these antennas accurately at each other. Of course, higher carrier
frequencies also have the disadvantage of incurring more attenuation from impairments such as
obstacles, rain, fog. Some of these tradeoffs are explored in the problems.
In order to apply the Friis formula (let us focus on version (6.83) for concreteness) to link budget
analysis, it is often convenient to take logarithms, converting the multiplications into addition.
On a logarithmic scale, antenna gains are expressed in dBi, where GdBi = 10 log10 G for an
antenna with raw gain G. Expressing powers in dBm, we have
PRX,dBm = PTX,dBm + GTX,dBi + GRX,dBi + 10 log10( λ²/(16π²R²) )   (6.85)
More generally, we have the link budget equation
PRX,dBm = PT X,dBm + GT X,dBi + GRX,dBi − Lpathloss,dB (R) (6.86)
where Lpathloss,dB (R) is the path loss in dB. For free space propagation, we have from the Friis
formula (6.85) that
Lpathloss,dB(R) = 10 log10( 16π²R²/λ² ),   path loss in dB for free space propagation   (6.87)
While the Friis formula is our starting point, the link budget equation (6.86) applies more gen-
erally, in that we can substitute other expressions for path loss, depending on the propagation
environment. For example, for wireless communication in a cluttered environment, the signal
power may decay as 1/R⁴ rather than the free space decay of 1/R². A mixture of empirical mea-
surements and statistical modeling is typically used to characterize path loss as a function of
range for the environments of interest. For example, the design of wireless cellular systems is
accompanied by extensive “measurement campaigns” and modeling. Once we decide on the path
loss formula (Lpathloss,dB (R)) to be used in the design, the transmit power required to attain a
given receiver sensitivity can be determined as a function of range R. Such a path loss formula
typically characterizes an “average” operating environment, around which there might be sig-
nificant statistical variations that are not captured by the model used to arrive at the receiver
sensitivity. For example, the receiver sensitivity for a wireless link may be calculated based on the
AWGN channel model, whereas the link may exhibit rapid amplitude variations due to multipath
fading, and slower variations due to shadowing (e.g., due to buildings and other obstacles). Even
if fading/shadowing effects are factored into the channel model used to compute BER, and the
model for path loss, the actual environment encountered may be worse than that assumed in
the model. In general, therefore, we add a link margin Lmargin,dB , again expressed in dB, in an
attempt to budget for potential performance losses due to unmodeled or unforeseen impairments.
The size of the link margin depends, of course, on the confidence of the system designer in the
models used to arrive at the rest of the link budget.
Putting all this together, if PRX,dBm (min) is the desired receiver sensitivity (i.e., the minimum
required received power), then we compute the transmit power for the link to be
Required transmit power
PTX,dBm = PRX,dBm(min) + Lpathloss,dB(R) + Lmargin,dB − GTX,dBi − GRX,dBi     (6.88)
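As an illustration, the link budget arithmetic in (6.86)-(6.88) is easily scripted. The following MATLAB fragment is a minimal sketch with made-up numbers; the variable names and the assumed sensitivity value are ours, not part of the text.

% Free-space link budget sketch (powers in dBm, gains/losses in dB).
c = 3e8; fc = 5e9; lambda = c/fc;              % carrier wavelength
R = 100;                                       % range in meters
Lpath_dB = 10*log10(16*pi^2*R^2/lambda^2);     % free-space path loss, (6.87)
Prx_min_dBm = -80;                             % receiver sensitivity (illustrative value)
Gtx_dBi = 2; Grx_dBi = 2;                      % antenna gains
Lmargin_dB = 20;                               % link margin
Ptx_dBm = Prx_min_dBm + Lpath_dB + Lmargin_dB - Gtx_dBi - Grx_dBi   % required transmit power, (6.88)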
Example 6.5.1 Consider again the 5 GHz WLAN link of Example 5.8.1. We wish to utilize a
20 MHz channel, using Gray coded QPSK and an excess bandwidth of 33 %. The receiver has
a noise figure of 6 dB.
(a) What is the bit rate?
(b) What is the receiver sensitivity required to achieve a BER of 10−6?
(c) Assuming transmit and receive antenna gains of 2 dBi each, what is the range achieved for
100 mW transmit power, using a link margin of 20 dB? Use link budget analysis based on free
space path loss.
Solution (a) For bandwidth B and fractional excess bandwidth a, the symbol rate is
Rs = 1/T = B/(1 + a) = 20/(1 + 0.33) ≈ 15 Msymbols/sec
and the bit rate for an M-ary constellation is Rb = Rs log2 M, which gives 30 Mbps for QPSK.
We can now invert the formula for free space loss, (6.87), noting that fc = 5 GHz, which implies
λ = c/fc = 0.06 m. We get a range R of 107 meters, which is of the order of the advertised ranges
for WLANs under nominal operating conditions. The range decreases, of course, for higher bit
rates using larger constellations. What happens, for example, when we use 16QAM or 64QAM?
Example 6.5.2 Consider an indoor link at 10 meters range using unlicensed spectrum at 60
GHz. Suppose that the transmitter and receiver each use antennas with horizontal beamwidths
of 60◦ and vertical beamwidths of 30◦ . Use the following approximation to calculate the resulting
antenna gains:
G ≈ 41000/(θhoriz θvert)
where G denotes the antenna gain (linear scale), θhoriz and θvert denote horizontal and vertical
beamwidths (in degrees). Set the noise figure to 8 dB, and assume a link margin of 10 dB at
BER of 10−6 .
(a) Calculate the bandwidth and transmit power required for a 2 Gbps link using Gray coded
QPSK and 50% excess bandwidth.
(b) How do your answers change if you change the signaling scheme to Gray coded 16QAM,
keeping the same bit rate as in (a)?
(c) If you now employ Gray coded 16QAM keeping the same symbol rate as in (a), what is the
bit rate attained and the transmit power required?
(d) How do the answers in the setting of (a) change if you increase the horizontal beamwidth to
120◦ , keeping all other parameters fixed?
Solution: (a) A 2 Gbps link using QPSK corresponds to a symbol rate of 1 Gsymbols/sec.
Factoring in the 50% excess bandwidth, the required bandwidth is B = 1.5 GHz. The target
BER and constellation are as in the previous example, hence we still have (Eb /N0 )reqd,dB ≈ 10.2
dB. Plugging in Rb = 2 Gbps and F = 8 dB in (6.80), we obtain that the required receiver
sensitivity is PRX,dBm (min) = −62.8 dBm.
The antenna gains at each end are given by
G ≈ 41000/(60 × 30) = 22.78
Converting to dB scale, we obtain GTX,dBi = GRX,dBi = 13.58 dBi.
The transmit power for a range of 10 m can now be obtained using (6.88) to be 8.1 dBm.
(b) For the same bit rate of 2 Gbps, the symbol rate for 16QAM is 0.5 Gsymbols/sec, so that
the bandwidth required is 0.75 GHz, factoring in 50% excess bandwidth. The nearest neighbors
approximation to BER for Gray coded 16QAM is given by Q( √( 4Eb/(5N0) ) ). Using this, we find that
a target BER of 10⁻⁶ requires (Eb/N0)reqd,dB ≈ 14.54 dB, an increase of 4.34 dB relative to (a).
This leads to a corresponding increase in the receiver sensitivity to -58.45 dBm, which leads to
the required transmit power increasing to 12.4 dBm.
(c) If we keep the symbol rate fixed at 1 Gsymbols/sec, the bit rate with 16QAM is Rb = 4 Gbps.
As in (b), (Eb /N0 )reqd,dB ≈ 14.54 dB. The receiver sensitivity is therefore given by -55.45 dBm,
a 3 dB increase over (b), corresponding to the doubling of the bit rate. This translates directly
to a 3 dB increase, relative to (b), in transmit power to 15.4 dBm, since the path loss, antenna
gains, and link margin are as in (b).
(d) We now go back to the setting of (a), but with different antenna gains. The bandwidth is,
of course, unchanged from (a). The new antenna gains are 3 dB smaller because of the doubling
of horizontal beamwidth. The receiver sensitivity, path loss and link margin are as in (a), thus
the 3 dB reduction in antenna gains at each end must be compensated for by a 6 dB increase in
transmit power relative to (a). Thus, the required transmit power is 14.1 dBm.
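The arithmetic in part (a) of Example 6.5.2 can be checked with a few lines of MATLAB. The sensitivity step below uses the standard relation of receiver sensitivity to the thermal noise floor (−174 dBm/Hz), bit rate, noise figure, and required Eb/N0, which is what (6.80), not reproduced in this excerpt, expresses; the script itself is our sketch.

% Example 6.5.2(a): 2 Gbps Gray coded QPSK at 60 GHz, 10 m range.
c = 3e8; fc = 60e9; lambda = c/fc;             % 5 mm wavelength
Rb = 2e9; F_dB = 8; EbN0_req_dB = 10.2;        % bit rate, noise figure, required Eb/N0
Prx_min_dBm = -174 + 10*log10(Rb) + F_dB + EbN0_req_dB    % about -62.8 dBm
G_dBi = 10*log10(41000/(60*30))                % about 13.58 dBi per antenna
R = 10;
Lpath_dB = 10*log10(16*pi^2*R^2/lambda^2);     % free-space path loss, (6.87)
Lmargin_dB = 10;
Ptx_dBm = Prx_min_dBm + Lpath_dB + Lmargin_dB - 2*G_dBi   % about 8.1 dBm

Changing the constellation, bit rate, or antenna beamwidths as in parts (b)-(d) amounts to changing the corresponding inputs above.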
Discussion: The parameter choices in the preceding examples illustrate how physical character-
istics of the medium change with choice of carrier frequency, and affect system design tradeoffs.
The 5 GHz system in Example 6.5.1 employs essentially omnidirectional antennas with small
gains of 2 dBi, whereas it is possible to realize highly directional yet small antennas (e.g., using
electronically steerable printed circuit antenna arrays) for the 60 GHz system in Example 6.5.2
by virtue of the small (5 mm) wavelength. 60 GHz waves are easily blocked by walls, hence the
range in Example 6.5.2 corresponds to in-room communication. We have also chosen parameters
such that the transmit power required for 60 GHz is smaller than that at 5 GHz, since it is
more difficult to produce power at higher radio frequencies. Finally, the link margin for 5 GHz
is chosen higher than for 60 GHz: propagation at 60 GHz is near line-of-sight, whereas fading
due to multipath propagation at 5 GHz can be more significant, and hence may require a higher
link margin relative to the AWGN benchmark which provides the basis for our link budget.
For equal priors, the MPE rule coincides with the ML rule.
• For binary hypothesis testing, ML and MPE rules can be written as likelihood, or log likelihood,
ratio tests:
L(y) = p1(y)/p0(y) ≷ 1,   or   log L(y) ≷ 0        ML rule
L(y) ≷ π0/π1,   or   log L(y) ≷ log(π0/π1)        MPE/MAP rule
where, in each test, we decide H1 if the left hand side exceeds the threshold, and H0 otherwise.
Signal space
• M-ary signaling in AWGN in continuous time can be reduced, without loss of information,
to M-ary signaling in finite-dimensional vector space with each dimension seeing i.i.d. N(0, σ 2 )
noise, which corresponds to discrete time WGN. This is accomplished by projecting the received
signal onto the signal space spanned by the M possible signals.
• Decision rules derived using hypothesis testing in the finite-dimensional signal space map
directly back to continuous time because of two key reasons: signal inner products are preserved,
and the noise component orthogonal to the signal space is irrelevant. Because of this equivalence,
we can stop making a distinction between continuous time signals and finite-dimensional vector
signals in our notation.
Optimal demodulation
• For the model Hi : y = si + n, 0 ≤ i ≤ M − 1, optimum demodulation involves computation of
the correlator outputs Zi = ⟨y, si⟩. This can be accomplished by using a bank of correlators or
matched filters, but any other receiver structure that yields the statistics {Zi} would also
preserve all of the relevant information.
• The ML and MPE rules are given by
δML(y) = arg max_{0 ≤ i ≤ M−1} [ ⟨y, si⟩ − ||si||²/2 ]
δMPE(y) = arg max_{0 ≤ i ≤ M−1} [ ⟨y, si⟩ − ||si||²/2 + σ² log πi ]
When the received signal lies in a finite-dimensional space in which the noise has finite energy,
the ML rule can be written as a minimum distance rule (and the MPE rule as a variant thereof)
as follows:
δML(y) = arg min_{0 ≤ i ≤ M−1} ||y − si||²
δMPE(y) = arg min_{0 ≤ i ≤ M−1} [ ||y − si||² − 2σ² log πi ]
Geometry of ML rule: ML decision boundaries are formed from hyperplanes that bisect lines
connecting signal points.
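As a small illustration of the minimum distance rule (our own sketch, not from the text), ML decisions for any two-dimensional (complex-valued) constellation can be computed in a vectorized fashion in MATLAB; the signal points and observations below are purely illustrative.

% Minimum distance (ML) decisions for a complex-valued constellation.
s = [1+1j, -1+1j, -1-1j, 1-1j];                % signal points (QPSK as an example)
y = [0.2+0.9j, -1.3-0.4j, 0.8-1.1j];           % noisy observations (illustrative)
[~, ihat] = min(abs(y(:) - s).^2, [], 2);      % index of nearest signal point for each observation
shat = s(ihat)                                 % ML decisions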
Performance analysis
• For binary signaling, the error probability for the ML rule is given by
Pe = Q( d/(2σ) ) = Q( √( (d²/Eb) · (Eb/(2N0)) ) )
where d = ||s1 − s0|| is the Euclidean distance between the signals. The performance therefore
depends on the power efficiency ηP = d²/Eb and the SNR Eb/N0. Since the power efficiency is
scale-invariant, we may choose any convenient scaling when computing it for a given constellation.
• For M-ary signaling, closed form expressions for the error probability may not be available,
but we know that the performance depends only on the scale-invariant inner products {⟨si, sj⟩/Eb},
which depend on the constellation “shape” alone, and on Eb/N0.
• The conditional error probabilities for M-ary signaling can be bounded using the union bound
(these can then be averaged to obtain an upper bound on the average error probability):
Pe|i ≤ Σ_{j≠i} Q( dij/(2σ) ) = Σ_{j≠i} Q( √( (dij²/Eb) · (Eb/(2N0)) ) )
where dij = ||si − sj|| are the pairwise distances between signal points.
• When we understand the shape of the decision regions, we can tighten the union bound into
an intelligent union bound:
Pe|i ≤ Σ_{j ∈ Nml(i)} Q( dij/(2σ) ) = Σ_{j ∈ Nml(i)} Q( √( (dij²/Eb) · (Eb/(2N0)) ) )
where Nml(i) denotes the set of neighbors of si which define the decision region Γi.
• For regular constellations, the nearest neighbors approximation is given by
Pe|i ≈ Ndmin(i) Q( dmin/(2σ) ) = Ndmin(i) Q( √( (dmin²/Eb) · (Eb/(2N0)) ) )
Pe ≈ N̄dmin Q( dmin/(2σ) ) = N̄dmin Q( √( (dmin²/Eb) · (Eb/(2N0)) ) )
with ηP = dmin²/Eb providing a measure of power efficiency which can be used to compare across
constellations.
• If Gray coding is possible, the bit error probability can be estimated as
P(bit error) ≈ Q( √( ηP Eb/(2N0) ) )
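These approximations are easy to evaluate numerically. The sketch below (ours) plots the nearest neighbors approximation to the bit error probability for Gray coded QPSK (ηP = 4) and Gray coded 16QAM (ηP = 8/5, as implied by the Q(√(4Eb/(5N0))) approximation used earlier in this chapter).

% Nearest neighbors approximation: P(bit error) ~ Q(sqrt(etaP*Eb/(2*N0))).
Q = @(x) 0.5*erfc(x/sqrt(2));                  % Q function via erfc (no toolbox needed)
EbN0_dB = 0:0.5:15; EbN0 = 10.^(EbN0_dB/10);
semilogy(EbN0_dB, Q(sqrt(4*EbN0/2)), EbN0_dB, Q(sqrt((8/5)*EbN0/2)));
grid on; xlabel('E_b/N_0 (dB)'); ylabel('P(bit error)');
legend('Gray coded QPSK', 'Gray coded 16QAM');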
Link budget: This relates (e.g., using the Friis formula for free space propagation) the per-
formance of a communication link to physical parameters such as transmit power, transmit and
receive antenna gains, range, and receiver noise figure. A link margin is typically introduced to
account for unmodeled impairments.
6.7 Endnotes
The geometric signal space approach for deriving and analyzing optimal receivers is now standard in textbooks
on communication theory, such as [7, 8]. It was first developed by Russian pioneer Vladimir
Kotelnikov [33], and presented in a cohesive fashion in the classic textbook by Wozencraft and
Jacobs [9].
A number of details of receiver design have been swept under the rug in this chapter. Our
model for the received signal is that it equals the transmitted signal plus WGN. In practice,
the transmitted signal can be significantly distorted by the channel (e.g., scaling, delay, multi-
path propagation). However, the basic M-ary signaling model is still preserved: if M possible
signals are sent, then, prior to the addition of noise, M possible signals are received after the
deterministic (but a priori unknown) transformations due to channel impairments. The receiver
can therefore estimate noiseless copies of the latter and then apply the optimum demodula-
tion techniques developed here. This approach leads, for example, to the optimal equalization
strategies developed by Forney [34] and Ungerboeck [35]; see Chapter 5 of [7] for a textbook
exposition. Estimation of the noiseless received signals involves tasks such as carrier phase and
frequency synchronization, timing synchronization, and estimation of the channel impulse re-
sponse or transfer function. In modern digital communication transceivers, these operations
are typically all performed using DSP on the complex baseband received signal. Perhaps the
best approach for exploring further is to acquire a basic understanding of the relevant estima-
tion techniques, and to then go to technical papers of specific interest (e.g., IEEE conference
and journal publications). Classic texts covering estimation theory include Kay [36], Poor [37]
and Van Trees [38]. Several graduate texts in communications contain a brief discussion of the
modern estimation-theoretic approach to synchronization that may provide a helpful orientation
prior to going to the research literature; for example, see [7] (Chapter 4) and [11, 39] (Chapter
8).
6.8 Problems
Hypothesis Testing
Problem 6.1 The received signal in a digital communication system is given by
y(t) = s(t) + n(t),   1 sent
y(t) = n(t),          0 sent
where n is AWGN with PSD σ² = N0/2 and s(t) is as shown in Figure 6.30. The received signal is passed
through a filter, and the output is sampled to yield a decision statistic. An ML decision rule is
employed based on the decision statistic. The set-up is shown in Figure 6.30.
[Figure 6.30: the pulse s(t), and the receiver chain: y(t) is passed through a filter h(t), sampled at t = t0, and fed to an ML decision rule.]
(a) For h(t) = s(−t), find the error probability as a function of Eb /N0 if t0 = 1.
(b) Can the error probability in (a) be improved by choosing the sampling time t0 differently?
(c) Now, find the error probability as a function of Eb /N0 for h(t) = I[0,2] and the best possible
choice of sampling time.
(d) Finally, comment on whether you can improve the performance in (c) by using a linear com-
bination of two samples as a decision statistic, rather than just using one sample.
Problem 6.2 Consider binary hypothesis testing based on the decision statistic Y , where Y ∼
N(2, 9) under H1 and Y ∼ N(−2, 4) under H0 .
(a) Show that the optimal (ML or MPE) decision rule is equivalent to comparing a function of
the form ay² + by to a threshold.
(b) Specify the MPE rule explicitly (i.e., specify a, b and the threshold) when π0 = 1/4.
(c) Express the conditional error probability Pe|0 for the decision rule in (b) in terms of the Q
function with positive arguments. Also provide a numerical value for this probability.
Problem 6.3 Find and sketch the decision regions for a binary hypothesis testing problem with
observation Z, where the hypotheses are equally likely, and the conditional distributions are
given by
H0 : Z is uniform over [−2, 2]
H1 : Z is Gaussian with mean 0 and variance 1.
Problem 6.4 The receiver in a binary communication system employs a decision statistic Z
which behaves as follows:
Z = N if 0 is sent
Z = 4 + N if 1 is sent
where N is modeled as Laplacian with density
pN(x) = (1/2) e^{−|x|},   −∞ < x < ∞
Note: Parts (a) and (b) can be done independently.
(a) Find and sketch, as a function of z, the log likelihood ratio
K(z) = log L(z) = log [ p(z|1)/p(z|0) ]
where p(z|i) denotes the conditional density of Z given that i is sent (i = 0, 1).
(b) Find Pe|1, the conditional error probability given that 1 is sent, for the decision rule
δ(z) = 0 for z < 1,   δ(z) = 1 for z ≥ 1
(c) Is the rule in (b) the MPE rule for any choice of prior probabilities? If so, specify the prior
probability π0 = P [ 0 sent] for which it is the MPE rule. If not, say why not.
Problem 6.5 Consider the MAP/MPE rule for the hypothesis testing problem in Example
6.1.1.
(a) Show that the MAP rule always says H1 if the prior probability of H0 is smaller than some
positive threshold. Specify this threshold.
(b) Compute and plot the conditional probabilities Pe|0 and Pe|1 , and the average error proba-
bility Pe , versus π0 as the latter varies in [0, 1].
(c) Discuss any trends that you see from the plots in (b).
Problem 6.6 Consider a MAP receiver for the basic Gaussian example, as discussed in Example
6.1.2. Fix SNR at 13 dB. We wish to explore the effect of prior mismatch, by quantifying the
performance degradation of a MAP receiver if the actual priors are different from the priors for
which it has been designed.
(a) Plot the average error probability for a MAP receiver designed for π0 = 0.2, as π0 varies
from 0 to 1. As usual, use a log scale for the probabilities. On the same plot, also plot the error
probability of the ML receiver as a benchmark.
(b) From the plot in (a), comment on how much error you can tolerate in the prior probabilities
before the performance of the MAP receiver designed for the given prior becomes unacceptable.
(c) Repeat (a) and (b) for a MAP receiver designed for π0 = 0.4. Is the performance more or
less sensitive to errors in the priors?
Problem 6.7 Consider binary hypothesis testing in which the observation Y is modeled as uni-
formly distributed over [−2, 2] under H0 , and has conditional density p(y|1) = c(1−|y|/3)I[−3,3](y)
under H1 , where c > 0 is a constant to be determined.
(a) Find c.
(b) Find and sketch the decision regions Γ0 and Γ1 corresponding to the ML decision rule.
(c) Find the conditional error probabilities.
Problem 6.8 Consider binary hypothesis testing with scalar observation Y . Under hypothesis
H0 , Y is modeled as uniformly distributed over [−5, 5]. Under H1 , Y has conditional density
p(y|1) = (1/8) e^{−|y|/4},   −∞ < y < ∞.
(a) Specify the ML rule and clearly draw the decision regions Γ0 and Γ1 on the real line.
(b) Find the conditional probabilities of error for the ML rule under each hypothesis.
Problem 6.9 For the setting of Problem 6.8, suppose that the prior probability of H0 is 1/3.
(a) Specify the MPE rule and draw the decision regions.
(b) Find the conditional error probabilities and the average error probability. Compare with the
corresponding quantities for the ML rule considered in Problem 6.8.
Problem 6.10 The receiver output Z in an on-off keyed optical communication system is mod-
eled as a Poisson random variable with mean m0 = 1 if 0 is sent, and mean m1 = 10 if 1 is sent.
(a) Show that the ML rule consists of comparing Z to a threshold, and specify the numerical
value of the threshold. Note that Z can only take nonnegative integer values.
(b) Compute the conditional error probabilities for the ML rule (compute numerical values in
addition to deriving formulas).
(c) Find the MPE rule if the prior probability of sending 1 is 0.1.
(d) Compute the average error probability for the MPE rule.
Problem 6.11 The received sample Y in a binary communication system is modeled as follows:
Y = A + N if 0 is sent, and Y = −A + N if 1 is sent, where N is Laplacian noise with density
pN(x) = (λ/2) e^{−λ|x|},   −∞ < x < ∞
(a) Find the ML decision rule. Simplify as much as possible.
(b) Find the conditional error probabilities for the ML rule.
(c) Now, suppose that the prior probability of sending 0 is 1/3. Find the MPE rule, simplifying
as much as possible.
(d) In the setting of (c), find the LLR log( P[0|Y = A/2] / P[1|Y = A/2] ).
Problem 6.12 Consider binary hypothesis testing with scalar observation Y . Under hypothesis
H0 , Y is modeled as an exponential random variable with mean 5. Under hypothesis H1 , Y is
modeled as uniformly distributed over the interval [0, 10].
(a) Specify the ML rule and clearly draw the decision regions Γ0 and Γ1 on the real line.
(b) Find the conditional probability of error for the ML rule, given that H0 is true.
(c) Suppose that the prior probability of H0 is 1/3. Compute the posterior probability of H0
given that we observe Y = 4 (i.e., find P [H0 |Y = 4]).
Problem 6.13 Consider hypothesis testing in which the observation Y is given by the following
model:
H1 : Y = 6 + N
H0 : Y = N
where the noise N has density pN(x) = (1/10)(1 − |x|/10) I[−10,10](x).
(a) Find the conditional error probability given H1 for the following decision rule: decide H1 if
Y > 4, and decide H0 if Y < 4.
(b) Are there a set of prior probabilities for which the decision rule in (a) minimizes the error
probability? If so, specify them. If not, say why not.
(b) Specify the conditional distribution of y = (y0 , y1 , y2, y3 )T , conditioned on s0 being sent.
(c) Specify the ML rule when the observation is y. What is its conditional error probability
given that s0 is sent?
(d) Specify the ML rule when the observation is y0 + y1 + y2 + y3 . What is its conditional error
probability, given that s0 is sent?
(e) Among the error probabilities in (a), (c) and (d), which is the smallest? Which is the biggest?
Could you have rank ordered these error probabilities without actually computing them?
Problem 6.15 The received signal in an on-off keyed digital communication system is given by
y(t) = s(t) + n(t),   1 sent
y(t) = n(t),          0 sent
where n is AWGN with PSD σ 2 = N0 /2, and s(t) = A(1−|t|)I[−1,1] (t), where A > 0. The received
signal is passed through a filter with impulse response h(t) = I[0,1] (t) to obtain z(t) = (y ∗ h)(t).
Remark: It would be helpful to draw a picture of the system before you start doing the calculations.
(a) Consider the decision statistic Z = z(0) + z(1). Specify the conditional distribution of Z
given that 0 is sent, and the conditional distribution of Z given that 1 is sent.
(b) Assuming that the receiver must make its decision based on Z, specify the ML rule and its
error probability in terms of Eb /N0 (express your answer in terms of the Q function with positive
arguments).
(c) Find the error probability (in terms of Eb /N0 ) for ML decisions based on the decision statistic
Z2 = z(0) + z(0.5) + z(1).
Problem 6.16 Consider binary signaling in AWGN using the signals depicted in Figure 6.31.
The received signal is given by
y(t) = s1(t) + n(t),   1 sent
y(t) = s0(t) + n(t),   0 sent
where n(t) is WGN with PSD σ² = N0/2.
(a) Show that the ML decision rule can be implemented by comparing Z = ∫ y(t)a(t) dt to a
threshold γ. Sketch a(t) and specify the corresponding value of γ.
(b) Specify the error probability of the ML rule as a function of Eb /N0 .
(c) Can the MPE rule, assuming that the prior probability of sending 0 is 1/3, be implemented
using the same receiver structure as in (a)? What would need to change? (Be specific.)
(d) Consider now a suboptimal receiver structure in which y(t) is passed through a filter with
impulse response h(t) = I[0,1] (t), and we take three samples: Z1 = (y ∗ h)(1), Z2 = (y ∗ h)(2),
Z3 = (y ∗ h)(3). Specify the conditional distribution of Z = (Z1 , Z2 , Z3 )T given that 0 is sent.
(e) (more challenging) Specify the ML rule based on Z and the corresponding error probability
as a function of Eb /N0 .
[Figure 6.31: the signals s1(t) and s0(t) used in Problem 6.16.]
Problem 6.17 Let p1 (t) = I[0,1] (t) denote a rectangular pulse of unit duration. Consider two
4-ary signal sets as follows:
Signal Set A: si (t) = p1 (t − i), i = 0, 1, 2, 3.
Signal Set B: s0 (t) = p1 (t) + p1 (t − 3), s1 (t) = p1 (t − 1) + p1 (t − 2), s2 (t) = p1 (t) + p1 (t − 2),
s3 (t) = p1 (t − 1) + p1 (t − 3).
(a) Find signal space representations for each signal set with respect to the orthonormal basis
{p1 (t − i), i = 0, 1, 2, 3}.
(b) Find union bounds on the average error probabilities for both signal sets as a function of
Eb /N0. At high SNR, what is the penalty in dB for using signal set B?
(c) Find an exact expression for the average error probability for signal set B as a function of
Eb /N0.
[Figure 6.32: the 4-ary signal set a(t), b(t), c(t), d(t) used in Problem 6.18.]
Problem 6.18 Consider the 4-ary signaling set shown in Figure 6.32, to be used over an AWGN
channel.
(a) Find a union bound, as a function of Eb /N0 , on the conditional probability of error given
that c(t) is sent.
(b) True or False: This constellation is more power efficient than QPSK. Justify your answer.
[Figure 6.33: three 8-ary constellations, QAM1, QAM2, and 8-PSK, for Problem 6.19.]
Problem 6.19 Three 8-ary signal constellations are shown in Figure 6.33.
(a) Express R and dmin^(2) in terms of dmin^(1) so that all three constellations have the same Eb.
(b) For a given Eb /N0 , which constellation do you expect to have the smallest bit error probability
over a high SNR AWGN channel?
(c) For each constellation, determine whether you can label signal points using 3 bits so that the
label for nearest neighbors differs by at most one bit. If so, find such a labeling. If not, say why
not and find some “good” labeling.
(d) For the labelings found in part (c), compute nearest neighbors approximations for the average
bit error probability as a function of Eb /N0 for each constellation. Evaluate these approximations
for Eb /N0 = 15dB.
Problem 6.20 Consider the signal constellation shown in Figure 6.34, which consists of two
QPSK constellations of different radii, offset from each other by π/4. The constellation is to be
used to communicate over a passband AWGN channel.
(a) Carefully redraw the constellation (roughly to scale, to the extent possible) for r = 1 and
R = √2. Sketch the ML decision regions.
(b) For r = 1 and R = √2, find an intelligent union bound for the conditional error probability,
given that a signal point from the inner circle is sent, as a function of Eb/N0.
(c) How would you choose the parameters r and R so as to optimize the power efficiency of the
constellation (at high SNR )?
Problem 6.21 (Exact symbol error probabilities for rectangular constellations) As-
suming each symbol is equally likely, derive the following expressions for the average error prob-
ability for 4PAM and 16QAM:
Pe = (3/2) Q( √( 4Eb/(5N0) ) ),   symbol error probability for 4PAM      (6.90)
Pe = 3 Q( √( 4Eb/(5N0) ) ) − (9/4) Q²( √( 4Eb/(5N0) ) ),   symbol error probability for 16QAM      (6.91)
(Assume 4PAM with equally spaced levels symmetric about the origin, and rectangular 16QAM
equivalent to two 4PAM constellations independently modulating the I and Q components.)
[Figure 6.35: modified 16QAM constellation for Problem 6.22, with spacing d along the I and Q axes.]
Problem 6.22 The signal constellation shown in Figure 6.35 is obtained by moving the outer
corner points in rectangular 16QAM to the I and Q axes.
(a) Sketch the ML decision regions.
(b) Is the constellation more or less power-efficient than rectangular 16QAM?
Problem 6.23 Consider a 16-ary signal constellation with 4 signals with coordinates (±1, ±1),
four others with coordinates (±3, ±3), and two each having coordinates (±3, 0), (±5, 0), (0, ±3),
and (0, ±5), respectively.
(a) Sketch the signal constellation and indicate the ML decision regions.
(b) Find an intelligent union bound on the average symbol error probability as a function of
Eb /N0.
(c) Find the nearest neighbors approximation to the average symbol error probability as a func-
tion of Eb /N0 .
(d) Find the nearest neighbors approximation to the average symbol error probability for 16QAM
as a function of Eb /N0 .
(e) Comparing (c) and (d) (i.e., comparing the performance at high SNR), which signal set is
more power efficient?
Problem 6.24 A QPSK demodulator is designed to put out an erasure when the decision is
ambivalent. Thus, the decision regions are modified as shown in Figure 6.36, where the cross-
hatched region corresponds to an erasure. Set α = d1/d, where 0 ≤ α ≤ 1.
(a) Use the intelligent union bound to find approximations to the probability p of symbol error
and the probability q of symbol erasure in terms of Eb /N0 and α.
(b) Find exact expressions for p and q as functions of Eb /N0 and α.
[Figure 6.36: QPSK decision regions with an erasure zone, parametrized by d1 and d (Problem 6.24).]
(c) Using the approximations in (a), find an approximate value for α such that q = 2p for
Eb /N0 = 4dB.
Remark: The motivation for (c) is that a typical error-correcting code can correct twice as many
erasures as errors.
[Figure 6.37: two concentric QPSK constellations, with inner radius r and outer radius R (Problem 6.25).]
Problem 6.25 The constellation shown in Figure 6.37 consists of two QPSK constellations lying
on concentric circles, with inner circle of radius r and outer circle of radius R.
(a) For r = 1 and R = 2, redraw the constellation, and carefully sketch the ML decision regions.
(b) Still keeping r = 1 and R = 2, find an intelligent union bound for the symbol error probability
as a function of Eb /N0 .
(c) For r = 1, find the best choice of R in terms of high SNR performance. Compute the gain in
power efficiency (in dB), if any, over the setting in (a)-(b).
Problem 6.26 Consider the constant modulus constellation shown in Figure 6.38, where θ ≤
π/4. Each symbol is labeled by 2 bits (b1, b2) as shown. Assume that the constellation is used
over a complex baseband AWGN channel with noise Power Spectral Density (PSD) N0/2 in each
dimension. Let (b̂1, b̂2) denote the maximum likelihood (ML) estimates of (b1, b2).
(a) Find Pe1 = P[b̂1 ≠ b1] and Pe2 = P[b̂2 ≠ b2] as a function of Es/N0, where Es denotes the
signal energy.
(b) Assume now that the transmitter is being heard by two receivers, R1 and R2, and that R2 is
twice as far away from the transmitter as R1. Assume that the received signal energy falls off as
1/r 4 , where r is the distance from the transmitter, and that the noise PSD for both receivers is
[Figure 6.38: Signal constellation with unequal error protection (Problem 6.26); the four signal points are labeled (0,0), (1,0), (0,1), (1,1), with the angle θ indicated.]
identical. Suppose that R1 can demodulate both bits b1 and b2 with error probability at least as
good as 10−3 , i.e., so that max{Pe1(R1), Pe2 (R1)} = 10−3 . Design the signal constellation (i.e.,
specify θ) so that R2 can demodulate at least one of the bits with the same error probability,
i.e., such that min{Pe1 (R2), Pe2 (R2)} = 10−3 .
Remark: You have designed an unequal error protection scheme in which the receiver that sees
a poorer channel can still extract part of the information sent.
[Figure 6.39: two-dimensional constellation with signal points s0, s1, s2, s3 in the I-Q plane (Problem 6.27).]
Problem 6.27 The 2-dimensional constellation shown in Figure 6.39 is to be used for signaling
over an AWGN channel.
(a) Specify the ML decision if the observation is (I, Q) = (1, −1).
(b) Carefully redraw the constellation and sketch the ML decision regions.
(c) Find an intelligent union bound for the symbol error probability conditioned on s0 being sent,
as a function of Eb /N0 .
Problem 6.28 (Demodulation with amplitude mismatch) Consider a 4PAM system us-
ing the constellation points {±1, ±3}. The receiver has an accurate estimate of its noise level.
An automatic gain control (AGC) circuit is supposed to scale the decision statistics so that the
noiseless constellation points are in {±1, ±3}. ML decision boundaries are set according to this
nominal scaling.
(a) Suppose that the AGC scaling is faulty, and the actual noiseless signal points are at {±0.9, ±2.7}.
Sketch the points and the mismatched decision regions. Find an intelligent union bound for the
symbol error probability in terms of the Q function and Eb /N0 .
(b) Repeat (a), assuming that faulty AGC scaling puts the noiseless signal points at {±1.1, ±3.3}.
(c) AGC circuits try to maintain a constant output power as the input power varies, and can be
viewed as imposing a scale factor on the input inversely proportional to the square root of the
input power. In (a), does the AGC circuit overestimate or underestimate the input power?
Problem 6.29 (Demodulation with phase mismatch) Consider a BPSK system in which
the receiver’s estimate of the carrier phase is off by θ.
(a) Sketch the I and Q components of the decision statistic, showing the noiseless signal points
and the decision region.
(b) Derive the BER as a function of θ and Eb/N0 (assume that θ < π/2).
(c) Assuming now that θ is a random variable taking values uniformly in [−π/4, π/4], numerically
compute the BER averaged over θ, and plot it as a function of Eb /N0 . Plot the BER without
phase mismatch as well, and estimate the dB degradation due to the phase mismatch.
Problem 6.30 (Simplex signaling set) Let s0(t), ..., sM−1(t) denote a set of equal energy,
orthogonal signals. Construct a new M-ary signal set from these by subtracting out the average
of the M signals from each signal, as follows:
uk(t) = sk(t) − (1/M) Σ_{j=0}^{M−1} sj(t),   k = 0, 1, ..., M − 1
Problem 6.31 (Soft decisions for BPSK) Consider a BPSK system in which 0 and 1 are
equally likely to be sent, with 0 mapped to +1 and 1 to -1 as usual. Thus, the decision statistic
Y = A + N if 0 is sent, and Y = −A + N if 1 is sent, where A > 0 and N ∼ N(0, σ 2 ).
(a) Show that the LLR is conditionally Gaussian given the transmitted bit, and that the condi-
tional distribution is scale-invariant, depending only on Eb /N0 .
(b) If the BER for hard decisions is 10%, specify the conditional distribution of the LLR, given
that 0 is sent.
Problem 6.32 (Soft decisions for PAM) Consider soft decisions for 4PAM signaling as in
Example 6.1.3. Assume that the signals have been scaled to ±1, ±3 (i.e., set A = 1 in Example
6.1.3). The system is operating at Eb/N0 of 6 dB. Bits b1, b2 ∈ {0, 1} are mapped to the symbols
using Gray coding. Assume that (b1 , b2 ) = (0, 0) for symbol -3, and (1, 0) for symbol +3.
(a) Sketch the constellation, along with the bit maps. Indicate the ML hard decision boundaries.
(b) Find the posterior symbol probability P [−3|y] as a function of the noisy observation y. Plot
it as a function of y.
Hint: The noise variance σ 2 can be inferred from the signal levels and SNR.
(c) Find P [b1 = 1|y] and P [b2 = 1|y], and plot as a function of y.
Remark: The posterior probability of b1 = 1 equals the sum of the posterior probabilities of all
symbols which have b1 = 1 in their labels.
(d) Display the results of part (c) in terms of LLRs.
LLR1(y) = log( P[b1 = 0|y] / P[b1 = 1|y] ),    LLR2(y) = log( P[b2 = 0|y] / P[b2 = 1|y] )
Plot the LLRs as a function of y, saturating the values at ±50.
(e) Try other values of Eb /N0 (e.g., 0 dB, 10 dB). Comment on any trends you notice. How do
the LLRs vary as a function of distance from the noiseless signal points? How do they vary as
you change Eb/N0?
(f) In order to characterize the conditional distribution of the LLRs, simulate the system over
multiple symbols at Eb /N0 such that the BER is about 5%. Plot the histograms of the LLRs
for each of the two bits, and comment on whether they look Gaussian. What happens as you
increase or decrease Eb /N0 ?
Hint: Use L’Hospital’s rule on the log of the expression whose limit is to be evaluated.
(c) Substitute (b) into the integral in (a) to infer the desired result.
Problem 6.34 (Effect of Rayleigh fading) Constructive and destructive interference between
multiple paths in wireless systems lead to large fluctuations in received amplitude, modeled as a
Rayleigh random variable A (see Problem 5.21 for a definition). The energy per bit is therefore
proportional to A2 , which, using Problem 5.21(c), is an exponential random variable. Thus,
we can model Eb/N0 as an exponential random variable with mean Ēb/N0, where Ēb is the
average energy per bit. Simplify notation by setting Eb/N0 = X, and the mean Ēb/N0 = 1/µ, so that
X ∼ Exp(µ).
(a) Show that the average error probability for BPSK with Rayleigh fading can be written as
Pe = ∫_0^∞ Q( √(2x) ) µ e^{−µx} dx
Hint: The error probability for BPSK is given by Q( √(2Eb/N0) ), where Eb/N0 is a random variable.
We now find the expected error probability by averaging over the distribution of Eb/N0.
(b) Integrating by parts and simplifying, show that the average error probability can be written as
Pe = (1/2) [ 1 − (1 + µ)^{−1/2} ] = (1/2) [ 1 − (1 + N0/Ēb)^{−1/2} ]
Hint: Q(x) is defined via an integral, so we can find its derivative (when integrating by parts)
using the fundamental theorem of calculus.
(c) Using the approximation that (1 + a)^b ≈ 1 + ba for |a| small, show that
Pe ≈ 1/(4(Ēb/N0))
at high SNR. Comment on how this decay of error probability with the reciprocal of SNR
compares with the decay for the AWGN channel.
(b) Plot the error probability versus Ēb/N0 for BPSK over the AWGN and Rayleigh fading channels
(BER on log scale, Ēb/N0 in dB). Note that Ēb = Eb for the AWGN channel. At a BER of 10⁻³, what
is the degradation in dB due to Rayleigh fading?
Problem 6.37 Consider a line-of-sight communication link operating in the 60 GHz band (where
large amounts of unlicensed bandwidth have been set aside by regulators). From version 1 of
the Friis formula (6.83), we see that the received power scales as λ2 , and hence as the inverse
square of the carrier frequency, so that 60 GHz links have much worse propagation than, say, 5
GHz links when antenna gains are fixed. However, from (6.82), we see that we can get much
better antenna gains at small carrier wavelengths for a fixed form factor, and version 2 of the
Friis formula (6.84) shows that the received power scales as 1/λ2 , which improves with increasing
carrier frequency. Furthermore, electronically steerable antenna arrays with high gains can be
implemented with compact form factor (e.g., patterns of metal on circuit board) at higher carrier
frequencies such as 60 GHz. Suppose, now, that we wish to design a 2 Gbps link using QPSK
with an excess bandwidth of 50%. The receiver noise figure is 8 dB, and the desired link margin
is 10 dB.
(a) What is the transmit power in dBm required to attain a range of 10 meters (e.g., for in-room
communication), assuming that the transmit and receive antenna gains are each 10 dBi?
(b) For a transmit power of 20 dBm, what are the antenna gains required at the transmitter and
receiver (assume that the gains at both ends are equal) to attain a range of 200 meters (e.g., for
an outdoor last-hop link)?
(c) For the antenna gains found in (b), what happens to the attainable range if you account for
additional path loss due to oxygen absorption (typical in the 60 GHz band) of 16 dB/km?
(d) In (c), what happens to the attainable range if there is a further path loss of 30 dB/km due
to heavy rain (on top of the loss due to oxygen absorption)?
dB, and the transmit and receive antenna gains are 10 dBi each. This is the baseline scenario
against which each of the scenarios in (a)-(c) is to be compared.
(a) Suppose that you change the carrier frequency to 5 GHz, keeping all other link parameters
the same. What is the new range?
(b) Suppose that you change the carrier frequency to 5 GHz and increase the transmit and receive
antenna gains by 3 dBi each, keeping all other link parameters the same. What is the new range?
(c) Suppose you change the carrier frequency to 5 GHz, increase the transmit and receive antenna
directivities by 3 dBi each, and increase the data rate to 40 Mbps, still using 16QAM with excess
bandwidth of 25%. All other link parameters are the same. What is the new range?
Laboratory Assignment
3) BPSK symbol generation: Use part 1 to generate 12000 0/1 bits. Map these to BPSK (±1)
bits using bpskmap. Pass these through the transmit and receive filter in lab 1 to get noiseless
received samples at rate 4/T , as before.
4) Adding noise: We consider discrete time additive white Gaussian noise (AWGN). At the input
to the receive filter, add independent and identically distributed (iid) complex Gaussian noise,
such that the real and imaginary parts of each sample are iid N(0, σ²) (you will choose σ² = N0/2
corresponding to a specified value of Eb/N0, as described in part 5). Pass these (rate 4/T) noise
samples through the receive filter, and add the result to the output of part 3.
Remark: If the nth transmitted symbol is b[n], the average received energy per symbol is
Es = E[|b[n]|2 ]||gT ∗ gC ||2 . Divide that by the number of bits per symbol to get Eb . The
noise variance per dimension is σ² = N0/2. This enables you to compute Eb/N0 for your simula-
tion model. The signal-to-noise ratio Eb /N0 is usually expressed in decibels (dB): Eb /N0 (dB) =
10 log10 Eb /N0 (raw). Thus, if you fix the transmit and channel filter coefficients, then you can
simulate any given value of Eb /N0 in dB by varying the value of the noise variance σ 2 .
5) Plot the ideal bit error probability for BPSK, which is given by Q( √(2Eb/N0) ), on a log scale
as a function of Eb/N0 in dB over the range 0-10 dB. Find the value of Eb/N0 that corresponds
to an error probability of 10⁻².
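One way to carry out this step (a sketch; the plotting details are up to you):

% Ideal BPSK BER versus Eb/N0 (dB), and the Eb/N0 at which it crosses 1e-2.
Q = @(x) 0.5*erfc(x/sqrt(2));                  % Q function via erfc (no toolbox needed)
EbN0_dB = 0:0.1:10; EbN0 = 10.^(EbN0_dB/10);
Pb = Q(sqrt(2*EbN0));
semilogy(EbN0_dB, Pb); grid on; xlabel('E_b/N_0 (dB)'); ylabel('BER');
EbN0_at_1e2 = EbN0_dB(find(Pb <= 1e-2, 1))     % roughly 4.3 dB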
6) For the value of Eb /N0 found in part 5, choose the corresponding value of σ 2 in part 1. Find
the decision statistics corresponding to the transmitted symbols at the input and output of the
receive filter, as in lab 1 (parts 5 and 6). Plot the imaginary versus the real parts of the decision
statistics; you should see a noisy version of the constellation.
7) Using an appropriate decision rule, make decisions on the 12000 transmitted bits based on
the 12000 decision statistics, and measure the error probability obtained at the input and the
output. Compare the results with the ideal error probability from part 5. You should find that
the error probability based on the receiver input samples is significantly worse than that based
on the receiver output, and that the latter is a little worse than the ideal performance because
of the ISI in the decision statistics.
8) Now, map 12000 0/1 bits into 6000 4PAM symbols using function fourpammap (use as input
2 parallel vectors of 6000 bits). As shown in Chapter 6, a good approximation (the nearest
neighbors approximation) to the ideal bit error probability for Gray coded 4PAM is given by
Q( √( 4Eb/(5N0) ) ). As in part 5), plot this on a log scale as a function of Eb/N0 in dB over the range
0-10 dB. What is the value of Eb/N0 (dB) corresponding to a bit error probability of 10⁻²?
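The lab provides fourpammap; a possible Gray coded 4PAM map, consistent with the labeling used in Problem 6.32 ((b1, b2) = (0,0) for −3 and (1,0) for +3), together with the corresponding BER approximation, might look as follows. This is our sketch, and my4pammap is a hypothetical stand-in for the supplied function.

% Gray coded 4PAM map: (0,0)->-3, (0,1)->-1, (1,1)->+1, (1,0)->+3.
my4pammap = @(b1, b2) (2*b1 - 1).*(3 - 2*b2);  % b1, b2 are 0/1 vectors of equal length
symbols = my4pammap([0 1 1 0], [0 0 1 1])      % returns [-3 3 1 -1]
% Nearest neighbors approximation to the Gray coded 4PAM bit error probability:
Q = @(x) 0.5*erfc(x/sqrt(2));
EbN0_dB = 0:0.1:10; EbN0 = 10.^(EbN0_dB/10);
Pb4 = Q(sqrt(4*EbN0/5));
semilogy(EbN0_dB, Pb4); grid on;
EbN0_4pam_1e2 = EbN0_dB(find(Pb4 <= 1e-2, 1))  % roughly 8.3 dB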
9) Choose the value of the noise variance σ 2 corresponding to the Eb /N0 found in part 7. Now,
find decision statistics for the 6000 transmitted symbols based on the receive filter output only.
(a) Plot the imaginary versus the real parts of the decision statistics, as before.
(b) Determine an appropriate decision rule for estimating the two parallel bit streams of 6000
bits from the 6000 complex decision statistics.
(c) Measure the bit error probability, and compare it with the ideal bit error probability.
10) Repeat parts 8 and 9 for QPSK, the ideal bit error probability for which, as a function of
Eb /N0, is the same as for BPSK.
11) Repeat parts 8 and 9 for 16QAM (4 bit streams of length 3000 each), the ideal bit error
probability for which, as a function of Eb /N0 , is the same as for 4PAM.
12) Repeat parts 8 and 9 for 8PSK (3 bit streams of length 4000 each). The ideal bit error
probability for Gray coded 8PSK is approximated by (using the nearest neighbors approximation)
Q( √( (6 − 3√2)Eb/(2N0) ) ).
13) Since all your answers above will be off from the ideal answers because of some ISI, run a
simulation with 12000 bits sent using Gray-coded 16-QAM with no ISI. To do this, generate the
decision statistics by adding noise directly to the transmitted symbols, setting the noise variance
appropriately to operate at the required Eb /N0 . Do this for two different values of Eb /N0 , the one
in part 11 and a value 3 dB higher. In each case, compare the nearest neighbors approximation
to the measured bit error probability, and plot the imaginary versus real part of the decision
statistics.
Lab Report: Your lab report should document the results of the preceding steps in order.
Describe the reasoning you used and the difficulties you encountered.
Tips: Vectorize as many of the functions as possible, including both the bit-to-symbol maps and
the decision rules. Do BPSK and 4-PAM first, where you will only use the real part of the complex
decision statistics. Leverage this for QPSK and 16-QAM, by replicating what you did for the
imaginary part of the decision statistics as well. To avoid confusion, keep different matlab files
for simulations regarding different signal constellations, and keep the analytical computations
and plots separate from the simulations.
Laboratory Assignment
Let us consider the following simple model of a wireless channel (obtained after filtering and
sampling at the symbol rate, and assuming that there is no ISI). If {b[n]} is the transmitted
symbol sequence, then the complex-valued received sequence is given by
y[n] = h[n] b[n] + w[n]     (6.93)
where {w[n] = wc[n] + jws[n]} is an iid complex Gaussian noise sequence with wc[n], ws[n] i.i.d.
N(0, σ² = N0/2) random variables. We say that w[n] has variance σ² per dimension. The channel
sequence {h[n]} is a time-varying sequence of complex gains.
Equation (6.93) models the channel at a given time as a simple scalar gain h[n]. On the other
hand, as discussed in Example 2.5.6, a multipath wireless channel cannot be modeled as a simple
scalar gain: it is dispersive in time, and exhibits frequency selectivity. However, it is shown in
Chapter 8 that we can decompose complicated dispersive channels into scalar models by using
frequency-domain modulation, or OFDM, which transmits data in parallel over narrow enough
frequency slices such that the channel over each slice can be modeled as a complex scalar.
Equation (6.93) could therefore be interpreted as modeling time variations in such scalar gains.
Rayleigh fading: The channel gain sequence {h[n] = hc[n] + jhs[n]}, where {hc[n]} and {hs[n]}
are zero mean, independent and identically distributed colored Gaussian random processes. The
reason this is called Rayleigh fading is that |h[n]| = √( hc²[n] + hs²[n] ) is a Rayleigh random variable.
Remark: The Gaussianity arises because the overall channel gain results from a superposition of
gains from multiple reflections off scatterers.
Simulation of Rayleigh fading: We will use a simple model wherein the colored channel gain
sequence {h[n]} is obtained by passing white Gaussian noise through a first-order recursive filter,
as follows:
hc [n] = ρhc [n − 1] + u[n]
(6.94)
hs [n] = ρhs [n − 1] + v[n]
where {u[n]} and {v[n]} are independent real-valued white Gaussian sequences, with i.i.d. N(0, β 2 )
elements. The parameter ρ (0 < ρ < 1) determines how rapidly the channel varies. The model for
I and Q gains in (6.94) are examples of first-order autoregressive (AR(1)) random processes: au-
toregressive because future values depend on the past in a linear fashion, and first order because
only the immediately preceding value affects the current one.
Setting up the fading simulator
1) Set up the AR(1) Rayleigh fading model in matlab, with ρ and β² as programmable parameters.
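A minimal sketch of such a simulator (our own; it uses filter to implement the recursion (6.94) and discards an initial transient so that the retained samples are close to stationary):

% AR(1) Rayleigh fading simulator, per (6.94).
rho = 0.99; beta = 0.01;                       % programmable parameters
N = 10000; Nburn = 1000;                       % samples kept, transient discarded
u = beta*randn(1, N + Nburn);                  % driving noise for the I gain, N(0, beta^2)
v = beta*randn(1, N + Nburn);                  % driving noise for the Q gain
hc = filter(1, [1 -rho], u);                   % hc[n] = rho*hc[n-1] + u[n]
hs = filter(1, [1 -rho], v);                   % hs[n] = rho*hs[n-1] + v[n]
h = hc(Nburn+1:end) + 1j*hs(Nburn+1:end);      % complex channel gain sequence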
2) Calculate E[|h[n]|²] = 2E[hc²[n]] = 2v² analytically as a function of ρ and β². Use simulation
to verify your results, setting ρ = .99 and β = .01. You may choose to initialize hc[0] and hs[0]
as iid N(0, v²) in your simulation. Use at least 10,000 samples.
3) Plot the instantaneous channel power relative to the average channel power, |h[n]|²/(2v²), in dB
as a function of n. Thus, 0 dB corresponds to the average value of 2v². You will occasionally see
sharp dips in the power, which are termed deep fades.
4) Define the channel phase θ[n] = angle(h[n]) = tan⁻¹( hs[n]/hc[n] ). Plot θ[n] versus n. Compare with
the plot in part 3; you should see sharp phase changes corresponding to deep fades.
QPSK in Rayleigh fading
Now, implement the model (6.93), where {b[n]} correspond to Gray coded QPSK, using an AR(1)
simulation of Rayleigh fading as in (a). Assume that the receiver has perfect knowledge of the
channel gains {h[n]}, and employs the decision statistic Z[n] = h∗ [n]y[n].
Remark: In practice, the channel estimation required for implementing this is achieved by insert-
ing pilot symbols periodically into the data stream. The performance will, of course, be worse
than with the ideal channel estimates considered here.
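A sketch of how the pieces might fit together (ours; it assumes the channel sequence h from the fading simulator above, QPSK points at ±1 ± j, and an illustrative noise level):

% Gray coded QPSK over the scalar fading channel (6.93), ideal channel knowledge.
N = length(h);                                 % h from the AR(1) fading simulator
bits = randi([0 1], 2, N);                     % two parallel bit streams
b = (2*bits(1,:) - 1) + 1j*(2*bits(2,:) - 1);  % QPSK symbols at +-1 +- j
sigma = 0.01;                                  % noise std per dimension; adjust to sweep Eb/N0
w = sigma*(randn(1,N) + 1j*randn(1,N));
y = h.*b + w;                                  % received sequence, (6.93)
Z = conj(h).*y;                                % derotated decision statistic
bitshat = [real(Z) > 0; imag(Z) > 0];          % sign decisions on I and Q
ber = mean(bitshat(:) ~= bits(:))              % simulated bit error rate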
5) Do scatter plots of the two-dimensional received symbols {y[n]}, and of the decision statistics
{Z[n]}. What does multiplying by h∗ [n] achieve?
6) Implement a decision rule for the bits encoded in the QPSK symbols based on the statistics
{Z[n]}. Estimate by simulation, and plot, the bit error probability (log scale) as a function of
the average Eb /N0 (dB), where Eb /N0 ranges from 0 to 30 dB. Use at least 10,000 symbols for
your estimate. On the same plot, also plot the analytical bit error probability as a function of
Eb /N0 when there is no fading. You should see a marked degradation due to fading. How do
you think the error probability in fading varies with Eb /N0 ?
Relating simulation parameters to Eb/N0: The average symbol energy is Es = E[|b[n]|²] E[|h[n]|²],
and Eb = Es/log2 M. This is a function of the constellation scaling and the parameters β² and ρ in
the fading simulator (see (b)). You can therefore fix Es, and hence Eb, by fixing β, ρ (e.g., as in
part 2), and fix the scaling of the {b[n]} (e.g., keep the constellation points as ±1 ± j). Eb /N0
can now be varied by varying the variance σ 2 of the noise in (6.93).
Diversity
The severe degradation due to Rayleigh fading can be mitigated by using diversity: the proba-
bility that two paths are simultaneously in a deep fade is less likely than the probability that a
single path is in a deep fade. Consider a receive antenna diversity system, where the received
signals y1 and y2 at the two antennas are given by
y1[n] = h1[n] b[n] + w1[n],    y2[n] = h2[n] b[n] + w2[n]     (6.95)
Thus, you get two looks at the data stream, through two different channels.
Implement the two-fold diversity system in (6.95) as you implemented (6.93), keeping the fol-
lowing in mind:
• The noises w1 and w2 are independent white noise sequences with variance σ² = N0/2 per di-
mension as before.
• The channels h1 and h2 are generated by passing independent white noise streams through a
first-order recursive filter. In relating the simulation parameters to Eb /N0 , keep in mind that the
average symbol energy now is Es = E[|b[n]|2 ]E[|h1 [n]|2 + |h2 [n]|2 ].
• Use the following maximal ratio combining rule to obtain the decision statistic:
Z2[n] = h1∗[n] y1[n] + h2∗[n] y2[n]
The decision statistic above can be written as
Z2[n] = (|h1[n]|² + |h2[n]|²) b[n] + w̃[n]
where w̃[n] is zero mean complex Gaussian with variance σ²(|h1[n]|² + |h2[n]|²) per dimension.
Thus, the instantaneous SNR is given by
SNR[n] = E[ |(|h1[n]|² + |h2[n]|²) b[n]|² ] / E[ |w̃[n]|² ] = (|h1[n]|² + |h2[n]|²) E[|b[n]|²] / (2σ²)
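A sketch of the two-branch combiner (ours; h1 and h2 are two independently generated fading sequences, and b, bits, sigma, N are as in the single-antenna sketch above):

% Two-branch receive diversity with maximal ratio combining, per (6.95).
w1 = sigma*(randn(1,N) + 1j*randn(1,N));       % independent noise at antenna 1
w2 = sigma*(randn(1,N) + 1j*randn(1,N));       % independent noise at antenna 2
y1 = h1.*b + w1; y2 = h2.*b + w2;              % received sequences, (6.95)
Z2 = conj(h1).*y1 + conj(h2).*y2;              % maximal ratio combining
bitshat = [real(Z2) > 0; imag(Z2) > 0];        % sign decisions on I and Q
ber_mrc = mean(bitshat(:) ~= bits(:))          % should improve on the no-diversity BER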
7) Plot |h1 [n]|2 + |h2 [n]|2 in dB as a function of n, with 0 dB representing the average value as
before. You should find that the fluctuations around the average are less than in part 3.
8) Implement a decision rule for the bits encoded in the QPSK symbols based on the statistics
{Z2 [n]}. Estimate by simulation, and plot (on the same plot as in part 5), the bit error proba-
bility (log scale) as a function of the average Eb /N0 (dB), where Eb /N0 ranges from 0 to 30 dB.
Use at least 10,000 symbols for your estimate. You should see an improvement compared to the
situation with no diversity.
Lab Report: Your lab report should document the results of the preceding steps in order.
Describe the reasoning you used and the difficulties you encountered.
Bonus: A Glimpse of differential modulation and demodulation
Throughout this chapter, we have assumed that a noiseless “template” for the set of possible
transmitted signals is available at the receiver. In the present context, it means assuming that
estimates for the time-varying fading channel are available. But what if these estimates, which
we used to generate the decision statistics earlier in this lab, are not available? One approach that
avoids the need for explicit channel estimation is based on exploiting the fact that the channel
does not change much from symbol to symbol. Let us illustrate this for the case of QPSK. The
model is exactly as in (6.93) or (6.95), but the channel sequence(s) is(are) unknown a priori. This
necessitates encoding the data in a different way. Specifically, let d[n] be a Gray coded QPSK
information sequence, which contains information about the bits of interest. Instead of sending
d[n] directly, we generate the transmitted sequence b[n] by differential encoding as follows:
b[n] = d[n] b[n − 1]
(You can initialize b(0) as any element of the constellation, known by agreement to both trans-
mitter and receiver. Or, just ignore the first information symbol in your demodulation). At
the receiver, use differential demodulation to generate the decision statistic for the information
symbol d[n] as follows:
Znc[n] = y[n] y∗[n − 1]     (single path)
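A sketch of differential encoding and demodulation for the single path model (ours; h, sigma, N are as in the earlier sketches, and the information symbols are unit-modulus Gray coded QPSK points):

% Differential QPSK over the fading channel (6.93), no channel estimates used.
k = randi([0 3], 1, N);                            % information phase indices
d = exp(1j*(pi/4 + (pi/2)*k));                     % QPSK information symbols, unit modulus
b = zeros(1, N); b(1) = exp(1j*pi/4);              % agreed-upon reference symbol
for n = 2:N
    b(n) = d(n)*b(n-1);                            % differential encoding
end
y = h.*b + sigma*(randn(1,N) + 1j*randn(1,N));     % received sequence
Znc = y(2:end).*conj(y(1:end-1));                  % differential demodulation statistic
khat = mod(round((angle(Znc) - pi/4)/(pi/2)), 4);  % decided phase indices for n = 2,...,N
ser = mean(khat ~= k(2:end))                       % symbol error rate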
What we have just shown is that the component of the received signal orthogonal to the signal
space contains the noise component n⊥ only, and thus does not depend on which signal is sent
under a given hypothesis. Since n⊥ is independent of N, the noise vector in the signal space,
knowing n⊥ does not provide any information about N. These two observations imply that y ⊥
is irrelevant for our hypothesis problem. The preceding discussion is illustrated in Figure 6.9,
and enables us to reduce our infinite-dimensional problem to a finite-dimensional vector model
restricted to the signal space.
Note that our irrelevance argument depends crucially on the property of WGN that its projec-
tions along orthogonal directions are independent. Even though y ⊥ does not contain any signal
component (since these by definition fall into the signal space), if n⊥ and N exhibited statis-
tical dependence, one could hope to learn something about N from n⊥ , and thereby improve
performance compared to a system in which y ⊥ is thrown away. However, since n⊥ and N are
independent for WGN, we can restrict attention to the signal space for our hypothesis testing
problem.
Chapter 7
Channel Coding
We have seen in Chapter 6 that, for signaling over an AWGN channel, the error probability
decays exponentially with SNR, with the rate of decay determined by the power efficiency of
the constellation. For example, for BPSK or Gray coded QPSK, the error probability is given
by p = Q( √(2Eb/N0) ). We have also seen in Chapter 6 how to engineer the link budget so as
to guarantee a certain desired performance. So far, however, we have only considered uncoded
systems, in which bits to be sent are directly mapped to symbols sent over the channel. We
now indicate how it is possible to improve performance by channel coding, which corresponds to
inserting redundancy strategically prior to transmission over the channel.
A bit of historical perspective is in order. As mentioned in Chapter 1, Shannon showed the
optimality of separate source and channel coding back in 1948. Shannon also provided a theory
for computing the limits of communication performance over any channel (given constraints
such as power and bandwidth). He did not provide a constructive means of attaining these
limits; his proofs employed randomized constructions. For reasons of computational complexity,
it was assumed that such strategies could never be practical. Hence, for decades after Shannon’s
1948 publication, researchers focused on algebraic constructions (for which decoding algorithms
of reasonable complexity could be devised) to create powerful channel codes, but never quite
succeeded in attaining Shannon’s benchmarks. This changed with the invention of turbo codes
by Berrou et al in 1993: their conference paper laid out a simple coding strategy that got to within
a dB of Shannon capacity. They took codes which were easy to encode, and used scramblers to
make them random-like. Maximum likelihood decoding for such codes is too computationally
complex, but Berrou et al showed that suboptimal iterative decoding methods provide excellent
performance with reasonable complexity. It was then realized that a different class of random-
like codes, called low density parity check (LDPC) codes, along with an appropriate iterative
decoding procedure, had actually been invented by Gallager in the 1960s. Since then, there has
been a massive effort to devise and implement a wide variety of “turbo-like” codes (i.e., random-
like codes amenable to iterative decoding), with the result that we can now approach Shannon’s
performance benchmarks over almost any channel.
In this chapter, we provide a glimpse of how Shannon’s performance benchmarks are computed,
how channel codes are constructed, and how iterative decoding works. A systematic and com-
prehensive treatment of information theory and channel coding would take up entire textbooks
in itself, hence our goal is to provide just enough exposure to some of the key ideas to encourage
further exploration.
Chapter Plan: In Section 7.1, we discuss two extreme examples, uncoded transmission and
repetition coding, in order to motivate the need for more sophisticated channel coding strategies.
A generic model for channel coding is discussed in Section 7.2. Section 7.3 introduces Shannon’s
information-theoretic framework, which provides fundamental performance limits for any chan-
nel coding scheme, and discusses its practical implications. Linear codes, which are the most
prevalent class of codes used in practice, are introduced in Section 7.4. Finally, we discuss belief
propagation decoding, which has been crucial for approaching Shannon performance limits in
practice, in Section 7.5.
Software: Concepts in belief propagation decoding are reinforced by Software Lab 7.1.
7.1 Motivation
Figure 7.1: Block error probability versus bit error probability for uncoded transmission (block
size is 1500 bytes).
Uncoded transmission: First, let us consider what happens without channel coding. Suppose
that we are sending a data block of 1500 bytes (i.e., n = 12000 bits, since 1 byte comprises 8
bits) over a binary symmetric channel (see Chapter 5) with bit error probability p, where errors
occur independently for each bit. Such a BSC could be induced, for example, by making hard decisions for Gray coded QPSK over an AWGN channel; in this case, we have $p = Q\left(\sqrt{2E_b/N_0}\right)$.
Let us now define block error as the event that one or more bits in the block are in error. The
probability that all of the bits get through correctly is given by (1 − p)n , so that the probability
of block error is given by
PB = 1 − (1 − p)n
Figure 7.1 plots the probability of block error versus the probability of bit error on a log-log
scale. Despite its simplicity, this computation leads to some useful observations.
(a) For $p > 10^{-4}$, the probability of block error is essentially one. This is because the expected
number of errors in the block is given by np, and when this is of the order of one, the probability
of making at least one error is very close to one, because of the law of large numbers. Using this
reasoning, we see that it becomes harder and harder to guarantee reliability as the block size
increases, since p must scale as 1/n. Clearly, this is not a sustainable approach. For example,
even the corruption of a single bit in a large computer file can cause chaos, so we must find more
sophisticated means of protecting the data than just trying to drive the raw bit error probability
to zero.
(b) It is often possible to efficiently detect block errors with very high probability. In practice,
this might be achieved by using a cyclic redundancy check (CRC) code, but we do not discuss the
specific error detection mechanism here. If a block error is detected, then the receiver may ask
the transmitter to retransmit the packet, if such retransmissions are supported by the underlying
protocols. The link efficiency in this case becomes 1 − PB . Thus, if we can do retransmissions,
uncoded transmission may actually not be a terrible idea. In our example, the link is 90% efficient
($P_B = 10^{-1}$) for bit error probability p around $10^{-6}$ to $10^{-5}$, and 99% efficient ($P_B = 10^{-2}$) for p around $10^{-7}$ to $10^{-6}$.
(c) For Gray coded QPSK, $p = Q\left(\sqrt{2E_b/N_0}\right)$, so that $p = 10^{-6}$ requires $E_b/N_0$ of about 10.55 dB.
This is exactly the scenario in the link budget example modeling a 5 GHz WLAN link in Chapter
6. We see, therefore, that uncoded transmission, along with retransmissions, is a viable option
in that setting.
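As a quick sanity check on the numbers in (b) and (c), the following MATLAB fragment (not from the text; the Q function is evaluated via erfc) computes the block error probability and the bit error probability at the quoted operating point:
n = 12000;                                % block size of 1500 bytes
p = 1e-5;                                 % bit error probability
PB = 1 - (1-p)^n                          % block error probability, roughly 0.11
EbN0 = 10^(10.55/10);                     % Eb/N0 of 10.55 dB from (c)
p_qpsk = 0.5*erfc(sqrt(2*EbN0)/sqrt(2))   % Q(sqrt(2 Eb/N0)), roughly 1e-6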
Figure 7.2: Error probability decays rapidly as a function of blocklength for a repetition code.
Repetition coding: Next, let us consider the other extreme, in which we send n copies of a single
bit over a BSC with error probability p. That is, we either send a string of n zeros, or a string of
n ones. The channel may flip some of these bits. Since the errors are independent, the number of
errors is a binomial random variable, Bin(n, p). For $p < \frac{1}{2}$, the average number of bits in error,
np < n/2, hence a natural decoding rule is to employ majority logic: decide on 0 if the majority
of received bits is zero, and on 1 otherwise. Taking n to be odd for simplicity (otherwise we
need to specify a tiebreaker when there are an equal number of zeros and ones), a block error
occurs if the number of errors is ⌈n/2⌉ or more. Using the binomial PMF, we have the following
expression for the block error probability:
$$P_B = \sum_{m=\lceil n/2 \rceil}^{n} \binom{n}{m} p^m (1-p)^{n-m}$$
Figure 7.2 plots the probability of block error versus n for $p = 10^{-1}$ and $p = 10^{-2}$. Clearly,
PB → 0 as n → ∞, so we are doing well in terms of reliability. To see why, let us invoke the LLN
again: the average number of errors is np < ⌈n/2⌉, so that, as n → ∞, the number of errors is
smaller than ⌈n/2⌉ with probability one. However, we are only sending one bit of information for
every n bits that we send over the channel, corresponding to a code rate of 1/n (one information
bit for every n transmitted bits), which tends to zero as n → ∞.
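The block error probability of the repetition code can be evaluated directly from the binomial sum above; a minimal MATLAB sketch (not from the text) is as follows:
p = 0.01;                                 % BSC crossover probability
for n = 3:2:21                            % odd blocklengths, as in Figure 7.2
    m = ceil(n/2):n;                      % error counts that defeat majority logic
    PB = sum(arrayfun(@(k) nchoosek(n,k), m) .* p.^m .* (1-p).^(n-m));
    fprintf('n = %2d: P(block error) = %.3g\n', n, PB);
end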
We have invoked the LLN to explain the performance of both uncoded transmission and repetition
coding for large n, but neither of these approaches provides reliable performance at nonzero
coding rates. As n → ∞, the block error rate PB → 1 for uncoded transmission, while the
code rate tends to zero for the repetition code. However, it is possible to design channel coding
schemes between these two extremes which provide arbitrarily reliable communication (PB → 0
as n → ∞) at non-vanishing code rates. The existence of such codes is guaranteed by LLN-style
arguments. For example, for a BSC with crossover probability p, as n gets large, the number
of errors clusters around np. Thus, the basic intuition is that, if we are able to insert enough
redundancy to correct a number of errors of the order of np, then we should be able to approach
zero block error probability. Giving precise form to such existence arguments is the realm of
information theory, which can be used to establish fundamental performance limits for almost
any reasonable channel model, while coding theory concerns itself with constructing practical
coding schemes that approach these performance limits. A detailed exposition of information
and coding theory is well beyond our scope, but our goal here is to provide just enough exposure
to stimulate and guide further exploration.
[Figure 7.3: Channel coding abstraction: information bits u (k bits) → Encoder → codeword x (n bits) → Channel → hard or soft decisions y (n outputs) → Decoder → estimated information û (k bits).]
Figure 7.3 provides a high-level view of how a binary channel code can be used over a com-
munication link. The encoder maps the k-bit information word u to an n-bit codeword x. As
discussed shortly, the “channel” shown in the figure is an abstraction that includes operations
at the transmitter and the receiver, in addition to the physical channel. The output y of the
channel is a length n vector of hard decisions (bits) or soft decisions (real numbers) on the coded
bits. These are then used by the decoder to provide an estimate û of the information bits. We
declare a block error if û ≠ u.
[Figure 7.4: BICM example: information bits u (k bits) → Encoder → codeword x (n bits) → Interleaver → bit-to-symbol mapping → Modulator (n/4 16QAM symbols) → physical channel → Equalizer/demodulator → symbol-to-bit decisions (hard or soft) → Deinterleaver → Decoder (n outputs) → estimated information û (k bits).]
Figure 7.4 provides a specific example illustrating how the preceding abstraction connects to
the transceiver design framework developed in earlier chapters. It shows a binary code used
for signaling over an AWGN channel using Gray coded 16QAM. We see that n coded bits are
mapped to n/4 complex-valued symbols at the transmitter. Since channel codes are typically
designed for random errors, we have inserted an interleaver between the channel encoder and
the modulator in order to disperse potential correlations in errors among bits. The modulator
could employ linear modulation as described in Chapter 4, with demodulation as in Chapter
6 for an ideal AWGN channel, or more sophisticated equalization strategies for handling the
intersymbol interference due to channel dispersion (see Chapter 8). An alternative frequency
domain modulation strategy, termed Orthogonal Frequency Division Multiplexing (OFDM), for
handling channel dispersion is also discussed in Chapter 8. However, for our present purpose
of discussing channel coding, we abstract all of these details away. Indeed, as shown in Figure
7.4, the “channel” from Figure 7.3 includes all of these operations, with the final output being
the hard or soft decisions supplied to the decoder. Problem 7.4 explores the nature of this
equivalent channel for some example constellations. Often, even if the physical channel has
memory, the interleaving and deinterleaving operations allow us to model the equivalent channel
as memoryless: the output yi depends only on coded bit xi , and the channel is completely
characterized by the conditional density p(yi |xi ). For example, for hard decisions, we may model
the equivalent channel as a binary symmetric channel with error probability p. For soft decisions,
yi may be a real number, or may comprise several bits, hence the channel model would be a little
more complicated.
The preceding approach, which neatly separates out the binary channel code from the signal
processing related to transmitting and receiving over a physical channel, is termed bit interleaved
coded modulation (BICM). If we use a binary code of rate Rc = k/n and a symbol alphabet of
size M, then the overall rate of communication over the channel is given by Rc log2 M bits per
symbol. From Chapter 4, we know that, using ideal Nyquist signaling, we can signal at rate W
complex-valued symbols/sec over a bandlimited passband channel of bandwidth W . Thus, the
rate of communication in bits per second (bps) is given by Rb = Rc W log2 M. The bandwidth
efficiency, or spectral efficiency, can now be defined as
$$r = \frac{R_b}{W} = R_c \log_2 M = \frac{k}{n} \log_2 M \quad (7.1)$$
in bps/Hz, or bits/symbol (for ideal Nyquist signaling). Comparing with Chapter 4 (where we
termed this quantity ηW ), what has changed is that we must now account for the rate of the
binary code that we have wrapped around our communication link.
We also need to revisit our SNR concepts and carefully keep track of information bits, coded
bits, and modulated symbols, when computing signal power or energy. The quantity Eb refers to
energy per information bit. When we encode these bits using a binary code of rate Rc , the energy
per coded bit is Ec = Rc Eb (information bits per coded bit, times energy per information bit).
When we then put the coded bits through a modulator that outputs M-ary symbols (log2 M
coded bits per symbol), we obtain that the energy per modulated symbol is given by
Es = Ec log2 M = Rc Eb log2 M
In short, we have
Es = rEb (7.2)
which makes sense: energy per symbol equals the number of information bits per symbol, times
the energy per information bit. While we have established (7.2) for BICM, it holds generally,
since it is just a matter of energy bookkeeping.
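For concreteness, here is the energy bookkeeping for a hypothetical BICM configuration, a rate-1/2 binary code with Gray coded 16QAM (the numbers are chosen purely for illustration):
Eb = 1; Rc = 1/2; M = 16;    % energy per information bit, code rate, constellation size
Ec = Rc*Eb;                  % energy per coded bit
Es = Ec*log2(M);             % energy per modulated symbol
r  = Rc*log2(M);             % spectral efficiency in bits/symbol
% note that Es equals r*Eb, consistent with (7.2)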
BICM is a practical approach which applies to any physical communication channel, and the
significant advances in channel coding over the past two decades ensure that there is little loss in
optimality due to this decoupling of coding and modulation. In the preceding example, we have
used it in conjunction with Nyquist sampling, which transforms the continuous time channel
into a discrete time channel carrying complex-valued symbols. However, we can also view the
Nyquist sampled channel in greater generality, in which the inputs to the effective channel are
complex-valued symbols, and the outputs are the noisy received samples at the output of the
equalizer/demodulator. A code of rate R bits/channel use over this channel is simply a collection
of $2^{NR}$ discrete time complex-valued vectors of length N, where N is the number of symbols sent
over the channel. In our BICM example with 16 QAM, we have R = Rc log2 M = 4Rc and
N = n/4, but this framework also accommodates approaches which tie coding and modulation
more closely together. The tools of information theory can be used to provide fundamental
performance limits for any such coded modulation strategy. We provide a glimpse of such results
in the next section.
The inputs may be constrained in some manner (e.g., to take values from a finite alphabet, or to
be limited in average or peak power). A channel code of length n and rate R bits per channel use
contains $2^{nR}$ codewords. That is, we employ M-ary signaling with $M = 2^{nR}$, where each signal, or codeword, is a vector of length n, with the jth codeword denoted by $\mathbf{X}^{(j)} = (X_1^{(j)}, \ldots, X_n^{(j)})^T$, $j = 1, \ldots, 2^{nR}$.
Shannon’s channel coding theorem gives us a compact characterization of the channel capacity
C (in bits per channel use) for a DMC. It states that, for any code rate below capacity (R < C),
and for large enough block length n, there exist codes and decoding strategies such that the
block error probability can be made arbitrarily small. The converse of this result also holds: for
code rates above capacity (R > C), the block error probability is bounded away from zero for
any coding strategy. The fundamental intuition is that, for large block lengths, events that cause
errors cluster around some well-defined patterns with very high probability (because of the law
of large numbers), hence it is possible to devise channel codes that can correct these patterns as
long as we are not trying to fit in too many codewords.
Giving the expression for the Shannon capacity of a general DMC is beyond our scope, but
we do provide intuitive derivations of the channel capacity for the two DMC models of greatest
importance to us: the discrete time AWGN channel and the BSC. We then discuss, via numerical
examples, how these capacity computations can be used to establish design guidelines.
Discrete time AWGN channel: Let us consider the following real-valued discrete time AWGN
channel model, where we send a codeword consisting of a sequence of real numbers {Xi , i =
1, ..., n}, and obtain the noisy outputs
Yi = Xi + Ni , i = 1, ..., n (7.3)
where $N_i \sim N(0, N)$ are i.i.d. Gaussian noise samples. We impose a power constraint $E[X_i^2] \leq S$.
This model is called the discrete time AWGN channel. For Nyquist signaling over a continuous-
time bandlimited AWGN channel with bandwidth W , we can signal at the rate of W complex-
valued symbols per second, or 2W real-valued symbols/second. This can be interpreted as getting
to use the discrete time AWGN channel (7.3) 2W times per second. Thus, once we figure out
the capacity for the discrete time AWGN channel in bits per channel use, we will be able to
specify the maximum rate at which information can be transmitted reliably over a bandlimited
continuous time channel.
A channel code over the discrete time channel (7.3) of rate R bits/channel use contains $2^{nR}$ codewords, where the jth codeword $\mathbf{X}^{(j)} = (X_1^{(j)}, \ldots, X_n^{(j)})^T$ satisfies the average power constraint if
$$||\mathbf{X}^{(j)}||^2 = \sum_{i=1}^{n} |X_i^{(j)}|^2 \leq nS$$
When codeword j is sent, the received vector is $\mathbf{Y} = \mathbf{X}^{(j)} + \mathbf{N}$, where $\mathbf{N} = (N_1, \ldots, N_n)^T$ is the noise vector. The expected energy of the noise vector equals
$$E\left[||\mathbf{N}||^2\right] = \sum_{i=1}^{n} E[N_i^2] = nN \quad (7.5)$$
while the expected energy of the received vector satisfies
$$E\left[||\mathbf{Y}||^2\right] = \sum_{i=1}^{n} E\left[|X_i^{(j)} + N_i|^2\right] = \sum_{i=1}^{n} \left( |X_i^{(j)}|^2 + E[N_i^2] + 2E[X_i^{(j)} N_i] \right) = ||\mathbf{X}^{(j)}||^2 + nN \leq n(S+N) \quad (7.6)$$
(The cross term involving signal and noise drops away, since they are independent and the noise
is zero mean.)
We now provide a heuristic argument as to how reliable performance can be achieved by letting
the code block length n get large. Invoking the law of large numbers, random quantities cluster around their averages with high probability, so that the received vector $\mathbf{Y}$ lies inside an n-dimensional sphere of radius $\sqrt{n(S+N)}$, and the noise vector $\mathbf{N}$ lies inside an n-dimensional sphere of radius $\sqrt{nN}$. Consider now a "decoding sphere" around each codeword with radius just a little larger than $\sqrt{nN}$. Then we make correct decisions with high probability: if we send codeword j, the noise vector $\mathbf{N}$ is highly unlikely to push us outside the decoding sphere centered around $\mathbf{X}^{(j)}$. The question then is: what is the largest number of decoding spheres of radius $\sqrt{nN}$ that we can pack inside the n-dimensional sphere of radius $\sqrt{n(S+N)}$ in which the received vector $\mathbf{Y}$ lives? This sphere packing argument, depicted in Figure 7.5, provides an estimate of the largest number of codewords $2^{nR}$ that we can accommodate while guaranteeing reliable communication.
We now invoke the result that the volume of an n-dimensional sphere of radius r is $K_n r^n$, where $K_n$ is a constant depending on n whose explicit form we do not need. We can now estimate the maximum achievable rate R as follows:
$$2^{nR} \leq \frac{K_n \left(\sqrt{n(S+N)}\right)^n}{K_n \left(\sqrt{nN}\right)^n} = \left(1 + \frac{S}{N}\right)^{n/2}$$
Taking logarithms and dividing by n, we obtain $R \leq \frac{1}{2}\log_2\left(1 + \frac{S}{N}\right)$. It can be shown rigorously that this is indeed the capacity of the discrete time AWGN channel, $C_{d-AWGN} = \frac{1}{2}\log_2\left(1 + \frac{S}{N}\right)$ bits per channel use.
Figure 7.5: Sphere packing argument for characterizing rate of reliable communication.
Continuous time bandlimited AWGN channel: We now use this result to compute the
maximum spectral efficiency attainable over a continuous time bandlimited AWGN channel.
The complex baseband channel corresponding to a passband channel of physical bandwidth W
spans [−W/2, W/2] (taking the reference frequency at the center of the passband). Thus, Nyquist
signaling over this channel corresponds to W complex-valued symbols per second, or 2W uses per
second of the real-valued channel (7.3). Since each complex-valued symbol corresponds to two
uses of the real discrete time AWGN channel, the capacity of the bandlimited channel is given by $2W C_{d-AWGN}$ bits per second. We still need to specify $\frac{S}{N}$. For each complex-valued sample, the energy per symbol is $E_s = r E_b$ (bits/symbol, times energy per bit, gives energy per complex-valued symbol). The noise variance per real dimension is $\sigma^2 = \frac{N_0}{2}$, hence the noise variance seen by a complex symbol is $2\sigma^2 = N_0$. We obtain
$$\frac{S}{N} = \frac{E_s}{N_0} \quad (7.8)$$
Putting these observations together, we can now state the following formula for the capacity of
the bandlimited AWGN channel.
Capacity of the bandlimited AWGN channel:
$$C_{BL}\left(W, \frac{E_s}{N_0}\right) = W \log_2\left(1 + \frac{E_s}{N_0}\right) \text{ bits per second} \quad (7.9)$$
It can be checked that we get exactly the same result for a physical (real-valued) baseband
channel of physical bandwidth W . Such a channel spans [−W, W ], but the transmitted signal is
constrained to be real-valued. Signals over such a channel can therefore be represented by 2W
real-valued samples per second, which is the same as for a passband channel of bandwidth W .
For a system communicating reliably at a bit rate of Rb bps over such a bandlimited channel,
we must have $R_b < C_{BL}$. Using (7.9), we see that the spectral efficiency $r = \frac{R_b}{W}$ in bps/Hz of the system must therefore satisfy
$$r < \log_2\left(1 + \frac{E_s}{N_0}\right) = \log_2\left(1 + r\frac{E_b}{N_0}\right) \;\; \text{bps/Hz or bits/complex symbol} \quad (7.10)$$
Rearranging, the $E_b/N_0$ required for reliable communication at spectral efficiency r must satisfy
$$\frac{E_b}{N_0} > \frac{2^r - 1}{r} \quad (7.11)$$
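The minimum Eb/N0 implied by (7.11) is easily evaluated numerically; the following MATLAB sketch (not from the text) reproduces the tradeoff shown in the figure below:
r = 0.1:0.1:7;                         % spectral efficiency in bps/Hz
EbN0_min = (2.^r - 1)./r;              % Shannon limit on Eb/N0 from (7.11)
EbN0_min_dB = 10*log10(EbN0_min);      % approaches -1.59 dB as r -> 0
plot(r, EbN0_min_dB);
xlabel('Spectral efficiency r (bps/Hz)'); ylabel('Minimum required E_b/N_0 (dB)');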
[Figure 7.6: Minimum required Eb/N0 (dB) versus spectral efficiency r (bps/Hz).]
was defined as $d_{min}^2/E_b$. Numerical examples showing how channel coding fundamentally changes the
achievable power-bandwidth tradeoffs are explored in more detail in Section 7.3.1. We provide
here a quick example that illustrates how (7.11) relates to real world scenarios.
Example 7.3.1 (evaluating system feasibility using Shannon limits) A company claims
to have developed a wireless modem with a receiver sensitivity of -82 dBm and a noise figure of
6 dB, operating at a rate of 100 Mbps over a bandwidth of 20 MHz. Do you believe their claim?
Shannon limit calculations: Modeling the channel as an ideal bandlimited AWGN channel, the
proposed modem must satisfy (7.11). Assuming no excess bandwidth, $r = \frac{100 \text{ Mbps}}{20 \text{ MHz}} = 5$ bps/Hz or bits/symbol. From (7.11), we know that we must have $(E_b/N_0)_{required} > \frac{2^5 - 1}{5} = 6.2$, or 7.9
dB. The noise PSD N0 is given by −174 + 6 = −168 dBm over 1 Hz. The energy per bit equals
the received power divided by the bit rate, so that the actual Eb /N0 for the advertised receiver
sensitivity (i.e., the receive power at which the modem can operate) is given by $(E_b/N_0)_{actual} = -82 - 10\log_{10}(10^8) + 168 = 6$ dB. This is 1.9 dB short of the Shannon limit, hence our first instinct
is not to believe them.
Tweaking the channel model: What if the channel was not a single AWGN channel, but two
AWGN channels in parallel? As we shall see when we discuss multiple antenna systems in
Chapter 8, it is possible to use multiple antennas at the transmitter and receiver to obtain spatial
degrees of freedom in addition to those in time and frequency. If there are indeed two spatial
channels that are created using multiple antennas and we can model each of them as AWGN,
then the spectral efficiency per channel is 5/2 = 2.5 bps/Hz, and $(E_b/N_0)_{required} > \frac{2^{2.5}-1}{2.5} = 1.86$, or 2.7 dB. Since the actual $E_b/N_0$ is 6 dB, the system is operating more than 3 dB away from
the Shannon limit. Since we do have practical channel codes that get to within a dB or less of
Shannon capacity, the claim now becomes believable.
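The arithmetic in the example is easy to script; a quick check (not from the text) of the single-channel case:
r = 100e6/20e6;                                % spectral efficiency of 5 bps/Hz
EbN0_req_dB = 10*log10((2^r - 1)/r);           % Shannon limit: about 7.9 dB
N0_dBm = -174 + 6;                             % noise PSD with a 6 dB noise figure
EbN0_act_dB = -82 - 10*log10(100e6) - N0_dBm;  % about 6 dB at -82 dBm sensitivity
shortfall_dB = EbN0_req_dB - EbN0_act_dB       % about 1.9 dB short of the limit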
[Figure 7.7: Binary symmetric channel: a transmitted bit is received correctly with probability 1 − p and flipped with probability p.]
Binary symmetric channel: We now turn our attention to the BSC with crossover probability
p shown in Figure 7.7, which might, for example, be induced by hard decisions on an AWGN
channel. Note that we are only interested in $0 \leq p \leq \frac{1}{2}$. If $p > \frac{1}{2}$, then we can switch zeros and ones at the output of the channel to get back to a BSC with crossover probability $\tilde{p} = 1 - p < \frac{1}{2}$.
The BSC can also be written as an additive noise channel, analogous to the discrete time AWGN
channel (7.3):
Yi = Xi ⊕ Ni , i = 1, ..., n (7.12)
where the exclusive or (XOR) symbol ⊕ corresponds to addition modulo 2, which follows the
rules:
1⊕1= 0⊕0 =0
(7.13)
1⊕0= 0⊕1 =1
Thus, we can flip a bit by adding (modulo 2) a 1 to it. The probability of a bit flip is p. Thus,
the noise variables Ni are i.i.d. Bernoulli random variables with P [Ni = 1] = p = 1 − P [Ni = 0].
Just as with the AWGN channel, we now develop a sphere packing argument to provide an
intuitive derivation of the BSC channel capacity. Of course, our concept of distance must be
different from the Euclidean distance considered for the AWGN channel. Define the Hamming
distance between two binary vectors of equal length to be the number of places in which they
differ. For a codeword of length n, the average number of errors equals np. Assuming that
the number of errors clusters around this average for large n, define a decoding sphere around a
codeword as all sequences which are at Hamming distance of np or less from it. The number of
such sequences is called the volume of the decoding sphere. By virtue of (7.12) and (7.13), we
see that this volume is exactly equal to the number of noise vectors N = (N1 , ..., Nn )T with np
or fewer ones (the number of ones
in a sequence is called its weight). The number of n-length vectors with weight m equals $\binom{n}{m}$, hence the number of vectors with weight at most np is given by
$$V_n = \sum_{m=0}^{np} \binom{n}{m} \quad (7.14)$$
We state without proof the following asymptotic approximation for Vn for large n:
$$V_n \approx 2^{n H_B(p)}, \quad 0 \leq p < \frac{1}{2} \quad (7.15)$$
where $H_B(\cdot)$ is the binary entropy function, defined by
$$H_B(p) = -p\log_2 p - (1-p)\log_2(1-p), \quad 0 \leq p \leq 1 \quad (7.16)$$
We plot the binary entropy function in Figure 7.8(a). Note the symmetry around $p = \frac{1}{2}$. This is because, as mentioned earlier, we can map $p > \frac{1}{2}$ to $1 - p < \frac{1}{2}$ by switching the roles of 0 and 1 at the output.
Figure 7.8: The binary entropy function HB (p) and the capacity of a BSC with crossover prob-
ability p, given by 1 − HB (p).
For a length n code of rate R bits/channel use, the number of codewords equals $2^{nR}$. The total number of binary sequences of length n, or the entire volume of the space we are working in, is $2^n$. Thus, if we wish to put a decoding sphere of volume $V_n$ around each codeword, the maximum number of codewords we can fit must satisfy
$$2^{nR} \leq \frac{2^n}{V_n} \approx \frac{2^n}{2^{nH_B(p)}} = 2^{n(1 - H_B(p))}$$
which gives
$$R \leq 1 - H_B(p)$$
It can be rigorously demonstrated that the right-hand side actually equals the capacity of the
BSC. We therefore state this result formally.
Capacity of BSC: The capacity of a BSC with crossover probability p is given by $C_{BSC} = 1 - H_B(p)$ bits per channel use.
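A minimal MATLAB sketch (not from the text) that evaluates the binary entropy function (7.16) and the BSC capacity curve shown in Figure 7.8:
p = 0.001:0.001:0.999;
HB = -p.*log2(p) - (1-p).*log2(1-p);   % binary entropy function (7.16)
C  = 1 - HB;                           % BSC capacity in bits/channel use
plot(p, HB, p, C);
xlabel('Crossover probability p'); legend('H_B(p)', '1 - H_B(p)');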
or broadcast), we may use longer block lengths, and may even layer an outer code to clean up the
residual errors from an inner turbo-like coded modulation scheme. Another common feature of
many systems is the use of adaptive coded modulation, in which the spectral efficiency is varied
as a function of the channel quality. BICM is particularly convenient for this purpose, since it
allows us to mix and match a menu of well-optimized binary codes at different rates (e.g., ranging from $\frac{1}{4}$ to $\frac{15}{16}$) with a menu of standard constellations (e.g., QPSK, 8PSK, 16QAM, 64QAM) to
provide a large number of options.
A detailed description of turbo-like codes is beyond our present scope, but we do provide a
discussion of decoding via message passing after a basic exposition of linear codes.
Notational convention: While we have preferred working with column vectors thus far, in defer-
ence to the convention in most coding theory texts, we express codewords as row vectors. Letting
u and v denote two binary vectors of the same length, we denote by u ⊕ v their component by
component addition over the binary field. For example, (00110) ⊕ (10101) = (10011).
Example 7.4.1 (Repetition code) An (n, 1) repetition code has only two codewords, the all-
one codeword x1 = (1, ..., 1) and the all-zero codeword x0 = (0, ..., 0). We see that x1 ⊕ x1 =
x0 ⊕ x0 = x0 and that x1 ⊕ x0 = x0 ⊕ x1 = x1 , so that this is indeed a linear code. There are
only $2^1 = 2$ codewords, so that the dimension k = 1. Thus, the code, when viewed as a vector space
over the binary field, is spanned by a single basis vector, x1 . While the encoding operation is
trivial (just repeat the information bit n times) for this code, let us write it in a manner that
leads into a more general formalism. For example, for the (5, 1) repetition code, the information
bit u ∈ {0, 1} is mapped to codeword x as follows:
x = uG
where
G = (1 1 1 1 1) (7.18)
is a matrix whose rows (just one row in this case) provide a basis for the code.
Example 7.4.2 (Single parity check code) An (n, n − 1) single parity check code takes as
input n − 1 unconstrained information bits u = (u1 , ..., un−1), maps them unchanged to n − 1 bits
in the codeword, and adds a single parity check bit to obtain a codeword x = (x1 , ..., xn−1 , xn ).
For example, we can set the first n−1 code bits to the information bits (x1 = u1 , ..., xn−1 = un−1 )
and append a parity check bit as follows:
xn = x1 ⊕ x2 ... ⊕ xn−1
Here, the code dimension k = n − 1, so that we can describe the code using n − 1 linearly
independent basis vectors. For example, for the (5, 4) single parity check code, putting a particular choice of basis vectors as the rows of a matrix gives
$$G = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \quad (7.19)$$
We can now check that any codeword can be written as
x = uG
where u = (u1 , ..., u4 ) is the information bit sequence.
Generator matrix: While the preceding examples are very simple, they provide insight into
the general structure of linear codes. An (n, k) linear code can be represented by a basis with
k linearly independent vectors, each of length n. Putting these k basis vectors as the rows of a
k × n matrix G, we can then define a mapping from k information bits, represented as a 1 × k
row vector u, to n code bits, represented as a 1 × n row vector x, as follows:
x = uG (7.20)
The matrix G is called the generator matrix for the code, since it can be used to generate all $2^k$
codewords by cycling through all possible values of the information vector u.
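As a small illustration (not from the text), encoding with a generator matrix reduces to a binary matrix multiplication; for the (5, 4) code of (7.19):
G = [1 0 0 0 1; 0 1 0 0 1; 0 0 1 0 1; 0 0 0 1 1];   % generator matrix (7.19)
u = [1 0 1 1];                                       % information bits
x = mod(u*G, 2)                                      % codeword: 1 0 1 1 1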
Dual codes: Drawing again on our experience with real-valued vector spaces, we know that,
for any k-dimensional subspace C in an n-dimensional vector space, we can find an orthogonal
n − k dimensional subspace C ⊥ such that every vector in C is orthogonal to every vector in C ⊥ .
The subspace C ⊥ is itself an (n, n − k) code, and C and C ⊥ are said to be duals of each other.
Example 7.4.3 (Duality of repetition and single parity check codes) It can be checked
that the (5, 4) single parity check code and (5, 1) repetition codes are duals of each other. That
is, each codeword in the (5, 4) code is orthogonal to each codeword in the (5, 1) code. Since
codewords are linear combinations of rows of the generator matrix, it suffices to check that each
row of a generator matrix for the (5, 4) code is orthogonal to each row of a generator matrix for
the (5, 1) code. Specifically,
$$G_{(5,1)} G_{(5,4)}^T = (1\;1\;1\;1\;1)\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} = \mathbf{0}$$
Parity check matrix: The preceding discussion shows that we can describe an (n, k) linear
code C by specifying its dual code C ⊥ . In particular, a generator matrix for the dual code serves
as a parity check matrix H for C, in the sense that an n-dimensional binary vector x lies in C if
and only if it is orthogonal to each row of H. That is,
$$H\mathbf{x}^T = \mathbf{0} \quad (7.21)$$
Each row of the parity check matrix defines a parity check equation. Thus, for a parity check
matrix H of dimension (n − k) × n, each codeword must satisfy n − k parity check equations.
Equivalently, if G is a generator matrix for C, then it must satisfy
HGT = 0 (7.22)
In our examples, the generator matrix for the (5, 1) repetition code is a parity check matrix for
the (5, 4) code, and vice versa.
For an (n, k) code with large n and k, it is clearly difficult to check by brute force enumeration over the $2^k$ codewords whether a particular n-dimensional vector y is a valid codeword.
However, for a linear code, it becomes straightforward to verify this using only n − k parity check
equations, as in (7.21). These parity check equations, which provide the redundancy required
to overcome channel errors, are important not only for verification of correct termination of
decoding, but also play a crucial role during the decoding process, as we illustrate shortly.
Non-uniqueness: An (n, k) linear code C is a unique subspace consisting of a set of $2^k$ codewords, and its dual (n, n−k) code C⊥ is a unique subspace comprising $2^{n-k}$ codewords. However,
in general, neither the generator nor the parity check matrix for a code are unique, since the
choice of basis for a nontrivial subspace is not unique. Thus, while the generator matrix for
the (5, 1) code is unique because of its trivial nature (one dimension, binary field), the generator
matrix for the (5, 4) code is not. For example, by taking linear combinations of the rows in (7.19),
we obtain another linearly independent basis that provides an alternative generator matrix for
the (5, 4) code:
$$\tilde{G} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \end{pmatrix} \quad (7.23)$$
From (7.20), we see that different choices of generator matrices correspond to different ways of
encoding a k-dimensional information vector u into an n-dimensional codeword x ∈ C.
Systematic encoding: A systematic encoding is one in which the information vector u appears
directly in x (without loss of generality, we can take the bits of u to be the first k bits in x), so
that there is a clear separation between “information bits” and “parity check” bits. In this case,
the generator matrix can be written as
G = [Ik P] systematic encoding (7.24)
where Ik denotes the k × k identity matrix, and P is a k × (n − k) matrix specifying how the n − k
parity bits depend on the input. The identity matrix ensures that the k rows of G are linearly
independent, so this does represent a valid generator matrix for an (n, k) code. The ith row of
the generator matrix (7.24) corresponds to an information vector u = (u1 , ..., uk ) with ui = 1
and uj = 0, j ≠ i. Note that, even when we restrict the encoding to be systematic, the generator
matrix is not unique in general. The generator matrices (7.18) and (7.19) for the (5, 1) and (5, 4)
codes correspond to systematic encoding. The encoding of the (5, 4) code corresponding to the
generator matrix in (7.23) is not systematic.
Reading off a parity check matrix from a systematic generator matrix: If we are given
a systematic encoding of the form (7.24), we can easily read off a parity check matrix as follows:
$$H = [-P^T \;\; I_{n-k}] \quad (7.25)$$
where the negative sign can be dropped for the binary field. The identity matrix ensures that
n − k rows of H are linearly independent, hence this is a valid parity check matrix for an (n, k)
linear code. We leave it as an exercise to verify, by directly substituting from (7.24) and (7.25),
that HGT = 0.
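The verification suggested above is a one-liner; a quick check (not from the text) for the (5, 4) code:
k = 4; n = 5;
P = ones(k, n-k);            % parity part of the (5,4) generator matrix (7.19)
G = [eye(k) P];              % systematic generator matrix, as in (7.24)
H = [P' eye(n-k)];           % parity check matrix read off as in (7.25)
all(all(mod(H*G', 2) == 0))  % returns 1 (true)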
Example 7.4.4 (running example: a (5, 2) linear code) Let us now construct a somewhat
less trivial linear code which will serve as a running example for illustrating some basic concepts.
Suppose that we have k = 2 information bits u1 , u2 ∈ {0, 1} that we wish to protect. We map
this (using a systematic encoding) to a codeword of length 5 using a combination of repetition
and parity check, as follows:
x = (u1 , u2 , u1, u2 , u1 ⊕ u2 ) (7.26)
A systematic generator matrix for this (5, 2) code can be constructed by considering the two
codewords corresponding to u = (1, 0) and u = (0, 1), respectively, which gives:
$$G = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{pmatrix} \quad (7.27)$$
We can read off a parity check matrix using (7.24) and (7.25) to obtain:
$$H = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix} \quad (7.28)$$
Any codeword x = (x1 , ..., x5 ) must satisfy HxT = 0, which corresponds to the following parity
check equations:
x1 ⊕ x3 = 0
x2 ⊕ x4 = 0
x1 ⊕ x2 ⊕ x5 = 0
Suppose, now, that we transmit the (5, 2) code that we have just constructed over a BSC with
crossover probability p. That is, we send a codeword x = (x1 , ..., x5 ) using the channel n = 5
times. According to our discrete memoryless channel model, errors occur independently for each
of the code symbols, and we get the output y = (y1 , ..., y5 ), where yi = xi with probability
1 − p, and yi = xi ⊕ 1 (i.e., the bit is flipped) with probability p. How should we try to
decode (i.e., estimate which codeword x was sent from the noisy output y)? And how do we
evaluate the performance of our decoding rule? In order to relate these to the structure of the
code, it is useful to reiterate the notion of Hamming distance, and to introduce the concept of
Hamming weight.
Hamming distance: The Hamming distance dH (u, v) between two binary vectors u and v of
equal length is the number of places in which they differ.
For example, the Hamming distance between the two rows of the generator matrix G in (7.27)
is given by dH (g1 , g2 ) = 4.
Hamming weight: The Hamming weight wH (u) of a binary vector u equals the number of
ones it contains.
For example, the Hamming weight of each row of the generator matrix G in (7.27) is 3.
The Hamming distance between two vectors u and v is the Hamming weight of their binary sum:
dH (u, v) = wH (u ⊕ v) (7.29)
The minimum distance dmin of a code is the smallest Hamming distance between distinct codewords; for a linear code, since the difference of two codewords is itself a codeword, dmin equals the minimum Hamming weight wmin among the nonzero codewords. The (5, 2) code is small enough that we can simply list all four codewords: 00000, 10101, 01011, and 11110, from which we see that wmin = dmin = 3.
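A small check (not from the text) that enumerates the codewords from (7.27) and confirms wmin = dmin = 3:
G = [1 0 1 0 1; 0 1 0 1 1];               % generator matrix (7.27)
U = [0 0; 0 1; 1 0; 1 1];                 % all information vectors
X = mod(U*G, 2)                           % the four codewords
wmin = min(sum(X(2:end,:), 2))            % minimum nonzero weight = 3 = dmin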
Guarantees on error correction: A code is guaranteed to correct t errors if
2t + 1 ≤ dmin (7.31)
It is quite easy to see why: we can set up non-overlapping “decoding spheres” of radius t around
any codeword. The decoding sphere of radius t around a codeword x is defined as the set of
vectors y within Hamming distance t of the codeword, as follows:
Dt (x) = {y : dH (y, x) ≤ t}
The condition (7.31) guarantees that these decoding spheres do not overlap. Thus, if we make at
most t errors, we are guaranteed that the received vector falls into the unique decoding sphere
corresponding to the transmitted codeword.
Erasures: There are some scenarios for which it is useful to introduce the concept of erasures,
which correspond to assigning a “don’t know” to a symbol rather than making a hard decision.
Using a similar argument as before, we can state that a code is guaranteed to correct t errors
and e erasures if
2t + e + 1 ≤ dmin (7.32)
Since it is “twice as easy” to correct erasures as it is to correct errors, we may choose to design a demodulator that puts out erasures in regions where we are uncertain about our hard decision.
For a binary channel, this means that our input alphabet is {0, 1} but our output alphabet is
{0, 1, ǫ}, where ǫ denotes erasure. As we see in Section 7.5, we can go further down this path in
hedging our bets, with the decoder using soft decisions which take values in a real-valued output
alphabet.
Running example: Our (5, 2) code has dmin = 3, and hence can correct 1 error or 2 erasures (but
not both). Let us see how we would structure brute force decoding of a single error, by writing
down which vectors fall within decoding spheres of unit radius around each codeword, and also
pointing out which vectors are left over. This is done by writing all $2^5 = 32$ possible binary vectors in what is termed a standard array, shown in Table 7.1.
Table 7.1: Standard array for the (5, 2) code
00000 10101 01011 11110
-----------------------
10000 00101 11011 01110
01000 11101 00011 10110
00100 10001 01111 11010
00010 10111 01001 11100
00001 10100 01010 11111
=======================
11000 01101 10011 00110
01100 11001 00111 10010
Let us take advantage of this example to describe the general structure of a standard array
for an (n, k) linear code. The array has $2^{n-k}$ rows and $2^k$ columns, and contains all possible binary vectors of length n. The first row of the array consists of the $2^k$ codewords, starting with the all-zero codeword. The first column consists of error patterns ordered by weight (ties broken arbitrarily), starting with no errors in the first row, e1 = 0. In general, denoting the first element of the ith row as the error pattern ei, the jth element in the ith row is ai,j = ei + xj, where xj denotes the jth codeword, j = 1, ..., $2^k$. That is, the (i, j)th element in the standard array is the
jth codeword translated by the ith error pattern. For the standard array in Table 7.1 for the (5, 2)
code, the first row consists of the four codewords. We demarcate it from all the other entries in
the table, which are not codewords, by a horizontal line. The next five rows correspond to the
five possible one-bit error patterns, which we know can be corrected. Thus, for the jth column,
the first six rows correspond to the decoding sphere of Hamming radius one around codeword xj .
We demarcate this by drawing a double line under the sixth row. Beyond these, the first entries
of the remaining rows are arbitrarily set to be minimum weight binary vectors that have not
appeared yet. We cannot guarantee that we can correct these error patterns. For example, the
first and fourth entries in rows 7 and 8 are both equidistant from the first and fourth codewords,
hence neither of these patterns can be mapped unambiguously to a decoding sphere.
Bounded distance decoding: For a code capable of correcting at least t errors, bounded distance
decoding at radius t corresponds to the following rule: decode a received word to the nearest
codeword (in terms of Hamming distance), as long as the distance is at most t, and declare decod-
ing failure if there is no such codeword. A conceptually simple, but computationally inefficient,
way to think about this is in terms of the standard array. For our running example in Table 7.1,
bounded distance decoding with t = 1 could be implemented by checking if the received word is
anywhere in the first six rows, and if so, decode it to the first element of the column it falls in.
For example, the received word 10001 is in the fourth row and second column, and is therefore
decoded to the second codeword 10101. If the received word is not in the first six rows, then we declare decoding failure. For example, the received word 01101 is in the seventh row, and hence does not fall in the decoding sphere of radius one of any codeword; we would therefore declare decoding failure if we received it.
Each row of the standard array is the translation of the code C by its first entry, ei , and is called
a coset of the code. The first entry ei is called the coset leader for the ith coset, i = 1, ..., 2n−k .
We now note that a coset can be described far more economically than by listing all its elements.
Applying a parity check matrix to the jth element of the ith coset, $H(x_j \oplus e_i)^T = H e_i^T$, we get an answer that depends only on the coset leader, since $Hx^T = 0$ for any codeword x. We therefore define the syndrome for the ith coset as $s_i = H e_i^T$. The syndrome is a binary vector of length n − k, and takes $2^{n-k}$ possible values. The coset leaders and syndromes corresponding to Table 7.1, using the parity check matrix (7.28), are listed in Table 7.2.
Table 7.2: Mapping between coset leaders and syndromes for the (5, 2) code using (7.28)
Coset leader   Syndrome
00000          000
10000          101
01000          011
00100          100
00010          010
00001          001
11000          110
01100          111
Bounded distance decoding using syndromes: Consider a received word y. Compute the syndrome
s = HyT . If the syndrome corresponds to a coset leader e that is within the decoding sphere of
interest, then we estimate the transmitted codeword as x̂ = y + e. Consider again the received
word y = 10001 and compute its syndrome s = HyT = 100. This corresponds to the fourth
row in Table 7.2, which we know is within a decoding sphere of radius one. The coset leader is
e = 00100. Adding this to the received word, we obtain x̂ = y + e = 10101, which is the same
result that we obtained by direct look-up in the standard array.
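Syndrome decoding of the received word considered above, as a minimal MATLAB sketch (not from the text):
H = [1 0 1 0 0; 0 1 0 1 0; 1 1 0 0 1];    % parity check matrix (7.28)
y = [1 0 0 0 1];                           % received word
s = mod(H*y', 2)'                          % syndrome: 1 0 0
e = [0 0 1 0 0];                           % coset leader with syndrome 100 (Table 7.2)
xhat = mod(y + e, 2)                       % decoded codeword: 1 0 1 0 1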
Performance of bounded distance decoding: Correct decoding occurs if the received word is
mapped to the transmitted word. For bounded distance decoding with t = 1 for the (5, 2)
code, this happens if and only if there is at most one error. Thus, when a codeword for the (5, 2)
code is sent over a BSC with crossover probability p, the probability of correct decoding is given
by
$$P_c = (1-p)^5 + \binom{5}{1} p (1-p)^4$$
If the decoding is not correct, let us term the event incorrect decoding. One of two things happens when the decoding is incorrect: the received word falls outside the decoding sphere of all
codewords, hence we declare decoding failure, or the received word falls inside the decoding sphere
of one of the incorrect codewords, and we have an undetected error. The sum of the probabilities
of these two events is Pe = 1 − Pc . Since decoding failure (where we know something has gone
wrong) is less damaging than decoding error (where we do not realize that we have made errors),
we would like its probability Pdf to be much larger than the probability Pue of undetected error.
For large block lengths n, we can typically design codes for which this is possible, hence we
often take Pe as a proxy for decoding failure. For our simple running example, we compute the
probabilities of decoding failure and decoding error in Problem 7.13. Exact computations of Pdf
and Pue are difficult for more complex codes, hence we typically resort to bounds and simulations.
Even when we use syndromes to infer coset leaders rather than searching the entire standard
array, look-up based approaches to decoding do not scale well as we increase the code block
length n and the decoding radius. A significant achievement of classical coding theory has
been to construct codes whose algebraic structure can be exploited to devise efficient means
of mapping syndromes to coset leaders for bounded distance decoding (such methods typically
involve finding roots of polynomials over finite fields). However, much of the recent progress
in coding has resulted from the development of iterative decoding algorithms based on message
passing architectures, which permit efficient decoding of very long, random-looking codes which
can approach Shannon limits. We now provide a simple illustration of message passing via our
running example of the (5, 2) code.
Figure 7.9: Tanner graph for (5, 2) code with parity check matrix given by (7.28).
Tanner graph: A binary linear code with parity check matrix H can be represented as a Tanner
graph, with variable nodes representing the coded bits, and check nodes representing the parity
check equations. A variable node is connected to a parity check node by an edge if it appears
in that parity check equation. A Tanner graph for our running example (5, 2) code, based on
the parity check matrix (7.28), is shown in Figure 7.9. Check node c1 corresponds to the parity
check equation specified by the first row of (7.28), x1 ⊕ x3 = 0, and is therefore connected to x1
and x3 . Check node c2 corresponds to the second row, x2 ⊕ x4 = 0, and is therefore connected to
x2 and x4 . Check node c3 corresponds to the third row, x1 ⊕ x2 ⊕ x5 = 0, and is connected to x1 ,
x2 , and x5 . The degree of a node is defined to be the number of edges incident on it. The variable
nodes x1 , ..., x5 have degrees 2, 2, 1, 1, and 1, respectively. The check nodes c1 , c2 , c3 have degrees
2, 2 and 3, respectively. The success of message passing on Tanner graphs is sensitive to these
degrees, as we shall see shortly.
[Figure 7.10: Check node message generation for a degree 3 check node: the message sent back to each variable node is the binary sum (XOR) of the incoming bits from the other variable nodes.]
Bit flipping based decoding: Let us now consider the following simple message passing
algorithm, illustrated via the example in Figure 7.11. As shown in the example, each variable
node maintains an estimate of the associated bit, initialized by what was received from the
channel. In the particular example we consider, the received sequence is 10000. We know from
Table 7.1 that a bounded distance decoder would map this to the codeword 00000. In message
passing for bit flipping, each variable node sends out its current bit estimate on all outgoing
edges. Each check node uses these incoming messages to generate new messages back to the
variable nodes, as illustrated in Figure 7.10, which shows a check node of degree 3. That is, the
message sent back to a variable node is the value that bit should take in order to satisfy that
particular parity check, assuming that the messages coming in from the other variable nodes
are correct. When the variable nodes get these messages, they flip their bits if "enough" check
node messages tell them to. In our example of a (5, 2) code, let us employ the following rule: a
variable node flips its channel bit if (a) all the check messages coming into it tell it to, and (b)
the number of check messages is more than one (so as to provide enough evidence to override
the current estimate).
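The following is a minimal MATLAB sketch (not from the text or the software lab) of the bit flipping rule just described, applied to the (5, 2) code; the flips are computed simultaneously for all variable nodes in each iteration:
H = [1 0 1 0 0; 0 1 0 1 0; 1 1 0 0 1];   % parity check matrix (7.28)
x = [1 0 0 0 0];                          % received bits (initial estimates)
nvar = size(H, 2);
for iter = 1:5
    flip = false(1, nvar);
    for v = 1:nvar
        checks = find(H(:, v))';          % check nodes connected to variable v
        msgs = zeros(1, numel(checks));
        for k = 1:numel(checks)
            others = setdiff(find(H(checks(k), :)), v);
            msgs(k) = mod(sum(x(others)), 2);   % value that satisfies this check
        end
        % flip only if more than one check message arrives and all disagree
        flip(v) = numel(msgs) > 1 && all(msgs ~= x(v));
    end
    x(flip) = 1 - x(flip);
    if all(mod(H*x', 2) == 0), break; end     % stop once all checks are satisfied
end
x                                          % decodes 10000 to the codeword 00000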
Figure 7.11 shows how bit flipping can be used to correct the one-bit error pattern 10000. Both
check node messages to variable node x1 say that it should take value 0, and cause it to flip
to the correct value. On the other hand, Figure 7.12 shows that bit flipping gets stuck for the
one bit error pattern 00001, because there is only one check message coming into variable node
x5 , which is not enough to flip it. Note that both of these error patterns are correctable using
bounded distance decoding, using Table 7.1 or Table 7.2. This reveals an important property of
iterative decoding on Tanner graphs: its success depends critically on the node degrees, which of
course depend on the particular choice of parity check matrix used to specify the Tanner graph.
Figure 7.11: Bit flipping based decoding for the (5, 2) code is successful for this error pattern.
Can we fix the problem revealed by the example in Figure 7.12? Perhaps we can choose a different
parity check matrix for which the Tanner graph has variable nodes of degree at least 2, so that
bit flipping has a chance of working? For codes over large block lengths, it is actually possible
to use a randomized approach for the design of parity check matrices yielding desirable degree
distributions, enabling spectacular performance approaching Shannon limits. In these regimes,
iterative decoding goes well beyond the error correction capability guarantees associated with
the code’s minimum distance. However, such results do not apply to the simple example we are
considering, where iterative decoding is having trouble decoding even up to the guarantee of t = 1
associated with a minimum distance dmin = 3. However, this gives us the opportunity to present
a trick that can be useful even for large block lengths: use redundant parity check nodes, adding
one or more rows to the parity check matrix that are linearly dependent on other rows. Figure
7.13 shows a Tanner graph for the (5, 2) code with a redundant check node c4 corresponding to
x3 ⊕ x4 ⊕ x5 = 0. That is, we have added a fourth row 00111 to the parity check matrix (7.28).
This row is actually a sum of the first three rows, and hence would add no further information
if we were just performing look-up based bounded distance decoding. However, revisiting the
troublesome error pattern 00001, we see that this redundant check makes all the difference in
the performance of bit flipping based decoding; as Figure 7.14 shows, the pattern can now be
Figure 7.12: Bit flipping based decoding for the (5, 2) code is unsuccessful for this error pattern,
even though it is correctable using bounded distance decoding.
corrected.
Figure 7.13: Tanner graph for (5, 2) code with one redundant parity check.
Figure 7.14: Bit flipping based decoding for the (5, 2) code using a redundant parity check is
now successful for the 00001 error pattern.
Recall the BPSK map from bits to symbols, $b = (-1)^x$, which takes 0 to +1 and 1 to −1. The advantage of this map is that it transforms binary addition into real-valued multiplication. That is, $x[m_1] \oplus x[m_2]$ maps to $b[m_1] b[m_2]$. The BPSK symbols are transmitted over a discrete time AWGN channel, with received symbols given by $y = Ab + n$, where A is the amplitude and $n \sim N(0, \sigma^2)$. The log likelihood ratio (LLR) for a bit x, conditioned on the channel output, is defined as
$$L(x) = \log \frac{P[x=0]}{P[x=1]} = \log \frac{P[b=+1]}{P[b=-1]} \quad (7.35)$$
where we omit the conditioning on y to simplify notation. We can go from LLRs to bit proba-
bilities as follows:
$$P[x=0] = \frac{e^{L(x)}}{e^{L(x)}+1}, \quad P[x=1] = \frac{1}{e^{L(x)}+1} \quad (7.36)$$
We can go from LLRs to hard decisions as follows:
$$\hat{x} = 0 \text{ if } L(x) \geq 0, \quad \hat{x} = 1 \text{ if } L(x) < 0 \quad (7.37)$$
as its prior probability by another decoder component. Denoting by $\pi_0(x)$ the prior probability that x = 0, we can now apply Bayes' rule to show
(see Problem 7.17) that the LLR decomposes as follows:
$$L(x) = L_{prior}(x) + L_{channel}(x) \quad (7.38)$$
where
$$L_{prior}(x) = \log \frac{\pi_0(x)}{1 - \pi_0(x)} \quad (7.39)$$
and
$$L_{channel}(x) = \frac{2Ay}{\sigma^2} \quad (7.40)$$
Thus, the use of the logarithm enables an additive decomposition of information from independent
sources, which is both intuitively pleasing and computationally useful. For our present purpose,
we can assume that x takes values from {0, 1} with equal probability, so that Lprior = 0. The
LLR L(x) represents the strength of our belief in whether the bit is 0 or 1, and LLR-based
message passing for iterative decoding is referred to as belief propagation.
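For concreteness, here is a small sketch (not from the text; the amplitude, noise variance and received samples are made up for illustration) of the LLR computations in (7.36)-(7.40):
A = 1; sigma2 = 0.5;                     % hypothetical amplitude and noise variance
y = [0.9 -0.3 0.1];                      % hypothetical received samples
Lchannel = 2*A*y/sigma2                  % channel LLRs, as in (7.40)
P0 = exp(Lchannel)./(exp(Lchannel) + 1)  % P[x = 0], as in (7.36)
xhat = (Lchannel < 0)                    % hard decisions, as in (7.37)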
[Figure 7.15: Variable node update: with channel LLR Lc and incoming check node messages u1, u2, u3, the outgoing messages are v1 = u2 + u3 + Lc, v2 = u1 + u3 + Lc, v3 = u1 + u2 + Lc.]
Belief propagation: We describe belief propagation over a Tanner graph for a linear block code
by specifying message generation at a generic variable node and a generic check node. In belief
propagation, the message going out on an edge is a function of the messages coming in on all of
the other edges. At a variable node, all of the LLRs involved refer to a given bit, with information
coming in from the channel and from check nodes. A key approximation in belief propagation
is to treat all of these as independent sources of information, so that the corresponding LLRs add up; this is an excellent approximation for large block lengths that may not really apply
to our small running example, but we will go ahead and use it anyway in our numerical examples.
Figure 7.15 shows generation of an outgoing message from a variable node: the outgoing message
on an edge is the sum of the incoming message from all other edges (including from the channel
as well as from the check nodes). Thus, the outgoing message on a given edge equals the sum
of all incoming messages, minus the incoming message on that edge, and this is the way we
implement it in the code fragment below. For simplicity, a node of degree three (not counting
the edge coming from the channel) is shown in Figure 7.15, but the computation (and the code
fragment implementing it) applies to variable nodes of arbitrary degrees.
%outgoing message on an edge = sum of incoming messages on all other edges
%(including LLR from channel)
%Efficient computation: sum over all edges and subtract incoming message for each edge
Lout = sum(Lin) + Lchannel - Lin; %vector of the same dimension as Lin
Exercise: A variable node of degree 3 has channel LLR 0.25, and incoming LLR messages from
check nodes −1.5, 0.5, −2.
(a) If you had to make a hard decision on the variable based on this information, what would it
be?
(b) What are the outgoing messages back to the check nodes?
Answers: (a) The hard decision would be x̂ = 1 (b̂ = −1). (b) The outgoing messages to the
check nodes are −1.25, −3.25, −0.75.
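A quick check of part (b) using the variable node update rule above (not from the text):
Lin = [-1.5 0.5 -2]; Lchannel = 0.25;
Lout = sum(Lin) + Lchannel - Lin        % returns -1.25  -3.25  -0.75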
Figure 7.16: Check node update. Incoming messages v1, v2, v3 from the variable nodes produce outgoing messages u1, u2, u3, computed using the tanh rule: $\tanh(u_k/2) = \prod_{i \neq k} \tanh(v_i/2)$.
Message generation for check nodes, depicted in Figure 7.16, is more complicated. Consider
a check node of degree three, corresponding to the parity check equation x1 ⊕ x2 ⊕ x3 = 0.
Suppose that the incoming messages tells us that the LLRs for these three bits are v1 = Lin (x1 ),
v2 = Lin (x2 ), and v3 = Lin (x3 ). Let us compute the outgoing message u3 = Lout (x3 ) on the edge
corresponding to variable x3 . We have
tanh (u3 /2) = tanh (v1 /2) tanh (v2 /2) (7.42)
We can decompose the preceding into (intermediate) hard decisions and reliabilities as follows:
$$\mathrm{sign}(u_3) = \mathrm{sign}(v_1)\,\mathrm{sign}(v_2), \quad \log|\tanh(u_3/2)| = \log|\tanh(v_1/2)| + \log|\tanh(v_2/2)|$$
Code Fragment 7.5.2 Check Node Update
function Lout = check_update( Lin )
%computes messages going out from a check node
%Lin = vector of messages coming in from variable nodes
%Lout = vector of messages going out to variable nodes
%convert LLRs to reliabilities and signs
reliabilities_in = log(abs(tanh(Lin/2)));
signs_in = sign(Lin);
%compute check update
reliabilities_out = sum(reliabilities_in) - reliabilities_in;
sign_product = prod(signs_in);
signs_out = sign_product.*signs_in;
%convert reliabilities and signs back to LLRs
Lout = 2*atanh(exp(reliabilities_out)).*signs_out;
Exercise: A check node of degree 4 has incoming LLRs −3.5, 2.2, 0.25, 1.3.
(a) Is the check satisfied by the incoming messages? That is, if we made hard decisions based on
the incoming LLRs, would they satisfy the parity check equation corresponding to this node?
(b) Use code fragment 7.5.2 to determine the corresponding outgoing LLRs. How are the signs
and reliabilities of the outgoing LLRs related to those of the incoming messages?
Answers: (a) No. (b) The outgoing LLRs are 0.1139, −0.1340, −0.9217, −0.1880. The signs are
flipped, and the larger reliabilities become smaller, while the smallest reliability increases. Why
does this make sense?
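Part (b) can be reproduced directly with Code Fragment 7.5.2:
Lout = check_update([-3.5 2.2 0.25 1.3])   % returns 0.1139 -0.1340 -0.9217 -0.1880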
Once we have defined the computations at the variable and check nodes, all that is needed to
implement belief propagation is to route messages according to the edges defined by a given
parity check matrix (of course, the choice of code and parity check matrix determines whether
iterative decoding is effective). At any stage of iterative decoding, we can make hard decisions
at a variable node using (7.37), where the LLR is the sum of all incoming LLRs, including
the channel LLR. If the resulting estimated vector x̂ satisfies Hx̂ = 0, then we know that we
have obtained a valid codeword and we can terminate the decoding. Typically, if we do not
obtain a valid codeword after a specified number of iterations, then we declare decoding failure.
We implement belief propagation based iterative decoding in Software Lab 7.1; while we use our
running example (5, 2) code, the software developed in this lab provides a generic implementation
of belief propagation for any linear block code once the parity check matrix is specified.
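As a concrete illustration, the hard decision and termination check at the end of an iteration might look as follows, where total_LLR is an assumed name for the vector containing, for each bit, the channel LLR plus all incoming check-to-variable messages:
xhat = (total_LLR(:) < 0);             %hard decisions: a negative LLR means bit 1 is more likely
if all(mod(H*double(xhat),2) == 0),    %H xhat = 0 (mod 2): a valid codeword has been found
    %terminate decoding and return xhat
end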
7.6 Concept Summary

Shannon limits
• A central result of information theory is that there is a well-defined maximum possible rate of reliable communication, termed the channel
capacity. For a passband bandlimited AWGN channel with bandwidth W (Hz), the capacity is
given by W log2 (1 + Es /N0), which translates to the following fundamental power-bandwidth
tradeoff:
Es/N0 > 2^r − 1 ,      Eb/N0 > (2^r − 1)/r
where Es is the energy per transmitted symbol, Eb is the energy per information bit, and r is the
spectral efficiency (the information bit rate normalized by the bandwidth). These results were
derived after first showing that the capacity of a discrete time real AWGN channel is given by
(1/2) log2 (1 + S/N) bits per channel use.
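For example, the minimum Eb/N0 implied by this tradeoff for an assumed spectral efficiency can be computed in a couple of lines:
r = 4/5;                           %example spectral efficiency (bps/Hz), e.g., QPSK with a rate 2/5 code
EbN0_min = (2^r - 1)/r;            %minimum Eb/N0 from the power-bandwidth tradeoff
EbN0_min_dB = 10*log10(EbN0_min)   %about -0.3 dB for r = 4/5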
• The channel capacity for a BSC with crossover probability p is 1 − HB (p) = 1 + p log2 p +
(1 − p) log2 (1 − p) bits per channel use. For BICM, such a channel is obtained, for example, by
making hard decisions on Gray coded constellations.
• Shannon limits can be used as guidelines for system sizing: for example, for choosing the combination of code rate and constellation size that is appropriate for a given SNR.
• The performance of a given coded modulation strategy can be compared to fundamental limits
by comparing the SNR at which it attains a certain performance (e.g., a BER of 10−5 ) with the
minimum SNR required for reliable communication at that spectral efficiency.
Linear codes
• Linear codes are a popular and well-understood design choice in modern communication sys-
tems. The 2^k codewords in an (n, k) binary linear code C form a k-dimensional subspace of the
space of n-dimensional binary vectors, under addition and multiplication over the binary field.
The dual code C ⊥ is an (n, n − k) linear code such that each codeword in C is orthogonal (under
binary inner products) to each codeword in C ⊥ .
• A basis for an (n, k) linear code C can be used to form a generator matrix G. A k-dimensional
information vector u can be encoded into an n-dimensional codeword x using the generator ma-
trix: x = uG.
• A basis for the dual code C⊥ can be used to form a parity check matrix H satisfying Hx^T = 0
for any x ∈ C.
• The choices for G and H are not unique, since the choice of basis for a linear vector space is
not unique. Furthermore, we may add redundant rows to H to aid in decoding.
• The number of errors t that a code can be guaranteed to correct satisfies 2t + 1 ≤ dmin , where
dmin is the minimum Hamming distance between codewords. For a linear code, dmin equals the
minimum weight among nonzero codewords, since the all-zero vector is always a codeword, and
since the difference between codewords is a codeword.
• The translation of codewords by error vectors can be enumerated in a standard array, whose
rows correspond to translations of the entire code, termed cosets, by a given error pattern, termed
coset leader. A more compact representation lists only coset leaders and syndromes, obtained
by operating the parity check matrix on a given received word. These can be used to carry out
a look-up based implementation of bounded distance decoding.
Tanner graphs
• An (n, k) linear code with an r × n (r ≥ n − k) parity check matrix H can be represented by a
Tanner graph, with n variable nodes on one side, and r check nodes on the other side, with an
edge between the jth variable and ith check node if and only if H(i, j) = 1.
• Message passing on the Tanner graph can be used for iterative decoding, which scales well
to very large code block lengths. One approach is to employ bit flipping algorithms with hard
decision inputs and binary messages, but a more powerful approach is to use soft decisions and
belief propagation.
Soft decisions and belief propagation
• The messages passed between the variable and check nodes are the bit LLRs. The message
going out on an edge depends on the messages coming in on all the other edges.
• Outgoing messages from a variable node are generated simply by summing LLRs. Outgoing
messages from a check node are more complicated, but can be viewed as a product of signs, and
a sum of reliabilities.
7.7 Endnotes
The material in this chapter has been selected to make two points: (a) information theory
provides fundamental performance benchmarks that can be used to guide parameter selection
for communication links; (b) coding theory provides constructive strategies for approaching these
fundamental benchmarks. We now list some keywords associated with topics that a systematic
exposition of information and coding theory might cover, and then provide some references for
further study.
Keywords: A systematic study of information theory, and its application to derive theorems in
source and channel coding, includes concepts such as entropy, mutual information, divergence
and typicality. A systematic study of the structure of algebraic codes, such as BCH and RS codes,
is required to understand their construction, their distance properties and decoding algorithms
such as the Berlekamp-Massey algorithm. A study of convolutional codes, their decoding using
the Viterbi algorithm, and their performance analysis, is another important component of a
study of channel coding. Tight integration of convolutional codes with modulation leads to
trellis coded modulation. Suitably interleaving convolutional codes leads to turbo codes, which
can be decoded iteratively using the forward-backward, or BCJR, algorithm. LDPC codes, which
can be decoded iteratively by message passing over a Tanner graph (as described here and in
software lab 7.1), are of course an indispensable component in modern communication design.
Further reading: One level up from the glimpse provided here is a self-contained introduction
to “just enough” information theory to compute performance benchmarks for communication
systems, and a selection of constructive coding and decoding strategies (including convolutional,
turbo, and LDPC codes), in the author’s graduate text [7] (Chapters 6 and 7). The textbook
by Cover and Thomas [40] is highly recommended for a systematic and lucid exposition of
information theory. Shannon’s beautifully written work [41] establishing the field is also highly
recommended as a source of inspiration. The textbook by McEliece [42] is a good source for a first
exposure to information theory and algebraic coding. A detailed treatment of algebraic coding
is provided by the textbook by Blahut [43], while comprehensive treatments of channel coding,
including both algebraic and turbo-like codes, are provided in the texts by Lin and Costello [44]
and Moon [45].
7.8 Problems
Shannon limits
Problem 7.1 Consider a coded modulation strategy pairing a rate 1/4 binary code with QPSK.
Assuming that this scheme performs 1.5 dB away from the Shannon limit, what are the minimum
values of Es /N0 (dB) and Eb /N0 (dB) required for the scheme to work?
Problem 7.2 At BER of 10−5 , how far away are the following uncoded constellations from
the corresponding Shannon limits: QPSK, 8PSK, 16QAM, 64QAM. Use the nearest neighbors
approximation for BER of Gray coded constellations in Section 6.4.
Problem 7.3 Consider Gray coded QPSK, 8PSK, 16QAM, and 64QAM.
(a) Assuming that we make ML hard decisions, use the nearest neighbors approximation for
BER of Gray coded constellations in Section 6.4 to plot the BER as a function of Es /N0 (dB)
for each of these constellations.
(b) The hard decisions induce a BSC with crossover probability given by the BERs computed in
(a). Using the BSC capacity formula (7.17), plot the capacity in bits per symbol as a function
of Es /N0 (dB) for each constellation. Also plot for comparison the capacity of the bandlimited
AWGN channel given by (). Comment on the penalty for hard decisions, as well as any other
trends that you see.
Problem 7.4 Consider a BICM system employing a rate 2/3 binary code with Gray coded QPSK
modulation.
(a) What is Es /N0 in terms of Eb /N0 ?
(b) Based on the AWGN capacity region (7.11), what is the Shannon limit for this system (i.e.,
the minimum required Eb /N0 in dB)?
(c) Now, consider the suboptimal strategy of making hard decisions, thus inducing a BSC. What
is the Shannon limit for the system? What is the degradation in dB due to making hard decisions?
Hint: Hard decisions on Gray coded QPSK symbols induce a BSC with crossover probability
p = Q(√(Es/N0)), whose capacity is given by (7.17). The Shannon limit is the minimum value
of Eb/N0 for the capacity to be larger than the code rate being used.
Problem 7.5 A rate 1/2 binary code is employed using bit interleaved coded modulation with
QPSK, 16QAM, and 64QAM.
(a) What are the bit rates attained by these three schemes when operating over a passband
channel of bandwidth 10 MHz (ignore excess bandwidth).
(b) Assuming that each coded modulation scheme operates 2 dB from the Shannon limit, what
is the minimum value of Es /N0 (dB) required for each of the three schemes to provide reliable
communication?
(c) Suppose that these three schemes are employed in an adaptive modulation strategy which
adapts the data rate as a function of range, and that the largest attainable range among the three
schemes is 10 km. Assuming inverse square path loss, what are the ranges corresponding to the
other two schemes?
(d) Now, if we add binary codes of rate 2/3 and 3/4, plot the attainable bit rate versus range for
an adaptive modulation scheme allowing all possible pairings of code rates and constellations.
Assume that each scheme is 2 dB away from the corresponding Shannon limit.
Problem 7.6 (a) Apply L’Hospital’s rule to evaluate the limit of the right-hand side of (7.11) as
r → 0. What is the minimum possible Eb /N0 in dB at which reliable communication is possible
over the AWGN channel?
(b) Re-plot the region for reliable communication shown in Figure 7.6, but this time with spectral
efficiency r (bps/Hz) versus the SNR Es /N0 (dB). Is there any lower limit to Es /N0 below which
reliable communication is not possible? If so, what is it? If not, why not?
Problem 7.7 A parity check matrix for the (7, 4) Hamming code is given by
H = [ 1 0 0 1 0 1 1
      0 1 0 1 1 0 1
      0 0 1 0 1 1 1 ]                                                          (7.45)
(a) Find a generator matrix for the code.
(b) Find the minimum distance of the code. How many errors can be corrected using bounded
distance decoding?
Answer: dmin = 3, hence a bounded distance decoder can correct one error.
(c) Write down the standard array. Comment on any structural differences you see between
this and the standard array for the (5, 2) code in Table 7.1.
Answer: Unlike in Table 7.1, no binary vectors are “left over” after running through the single
error patterns. The Hamming code is a perfect code: the decoding spheres of radius one cover
the entire space of length-7 binary vectors. “Perfect” in this case just refers to how well decoding
spheres can be packed into the available space; it definitely does not mean “good,” since the
Hamming code is a weak code.
(d) Write down the mapping between coset leaders and syndromes for the given parity check
matrix (as done in Table 7.2 for the (5, 2) code).
Problem 7.8 Suppose that the (7, 4) Hamming code is used over a BSC with crossover proba-
bility p = 0.01. Assuming that bounded distance decoding with decoding radius one is employed,
find the probability of correct decoding, the probability of decoding failure, and the probability
of undetected error.
Problem 7.9 Append a single parity check to the (7, 4) Hamming code. That is, given a
codeword x = (x1 , ..., x7 ) for the (7, 4) code, define a new codeword z = (x1 , ..., x7 , x8 ) by
appending a parity check on the existing code bits:
x8 = x1 ⊕ x2 ⊕ ... ⊕ x7
This new code is called an extended Hamming code.
(a) What are n and k for the new code?
(b) What is the minimum distance for the new code?
Problem 7.10 Hamming codes of different lengths can be constructed using the following pre-
scription: the parity check matrix has as its columns all nonzero binary vectors of length m,
where m is a positive integer.
(a) What is the value of m for the (7, 4) Hamming code?
(b) For arbitrary m, what are the values of code block length n and the number of information
bits k as a function of m?
Hint: The code block length is the number of columns in the parity check matrix. The dimension
of the dual code is the rank of the parity check matrix. Remember that row rank equals the
column rank. Which is easier to find in this case?
Problem 7.11 BCH codes (named after their discoverers, Bose, Ray-Chaudhuri, and Hocquenghem)
are a popular class of linear codes with a well-defined algebraic structure and well-understood
algorithms for bounded distance decoding. For a positive integer m, we can construct a binary
BCH code which can correct at least t errors with the following parameters:
n = 2^m − 1,   k ≥ n − mt,   dmin ≥ 2t + 1                                       (7.46)
so that the code rate R = k/n ≥ 1 − mt/(2^m − 1), where the inequality for k is often tight for small
values of t. For example, Hamming codes are actually (2^m − 1, 2^m − 1 − m) BCH codes with t = 1.
Remark: The price of increasing the block length of a BCH code is decoding complexity. Algebraic
decoding of a code of length n = 2^m − 1 requires operations over GF(2^m).
(a) Consider a (1023, 923) BCH code. Assuming that the inequality for k is tight, how many
errors can it correct?
Answer: t = 10.
(b) Assuming that the inequality for k in (7.46) is tight, what is the rate of a BCH code with
n = 511 and t = 10?
Problem 7.12 Consider an (n, k) linear code used over a BSC channel with crossover probability
p. The number of errors among n code bits is X ∼ Bin(n, p). A bounded distance decoder of
radius t is used to decode it (assume that the code is capable of correcting at least t errors). The
probability of incorrect decoding is therefore given by
Pe = P[X > t] = Σ_{k=t+1}^{n} (n choose k) p^k (1 − p)^{n−k}                     (7.47)
Problem 7.13 Use the standard array in Table 7.1 for an exact computation of the probabilities
of decoding failure and decoding error for the (5, 2) code, for bounded distance decoding with
t = 1 over a BSC with crossover probability p. Plot these probabilities as a function of p on a
log-log scale.
Hint: Assume that the all-zero codeword is sent. Find the number and weight of error patterns
resulting in decoding failure and decoding error, respectively
Problem 7.14 For the binomial tail probability (7.47) associated with the probability of incor-
rect decoding, we are often interested in large n and relatively small t; for example, consider the
(1023, 923) BCH code in Problem 7.11, for which t = 10. While recursive computations as in
Problem 7.12 are relatively numerically stable, we are often interested in quick approximations
that do not require the evaluation of a large summation with (n − t) terms. In this problem, we
discuss some simple approximations.
(a) We are interested in designing systems to obtain small values of Pe, hopefully significantly
smaller than the input BER p. Argue that p ≥ t/n is an uninteresting regime from this point of
view. What is the uninteresting regime for the (1023, 923) BCH code?
(b) For p ≪ t/n, argue that the sum in (7.47) is well approximated by its first term.
(c) Since X is a sum of n i.i.d. Bernoulli random variables, show that the CLT can be used to
approximate its distribution by a Gaussian: X ∼ N(np, np(1 − p)).
(d) For the (1023, 923) BCH code, compute a numerical estimate of the probability of incorrect
decoding for t = 10 and p = 10−3 in three different ways: (i) direct computation, (ii) estimation
by the first term as in (b), (iii) estimation using the Gaussian approximation.
(e) Repeat (d) for p = 10−4 .
(f) Comment on the match (or otherwise) between the three estimates in (d) and (e). What
happens with smaller p?
Problem 7.15 Here are the (n, k, t) parameters for some other binary BCH codes for which the
computations of Problem 7.12 can be repeated: (1023, 863, 16), (511, 421, 10), (255, 215, 5).
Problem 7.16 Reed-Solomon (RS) codes are a widely used class of codes on non-binary alpha-
bets. While we do not discuss the algebraic structure of any of the codes we have mentioned, we
state in passing that RS codes can be viewed as a special class of BCH codes. The symbols in
an RS code come from GF(2^m) (a finite field with 2^m elements, where m is a positive integer),
hence each symbol can be represented by m bits. The code block length equals n = 2^m − 1. The
minimum distance is given by
dmin = n − k + 1 (7.49)
This is actually the best possible minimum distance attainable for an (n, k) code. It is possible
to extend the RS code by one symbol to obtain n = 2^m, and to shorten the code to obtain
n < 2^m − 1, all the while maintaining the minimum distance relationship (7.49). Bounded
distance decoding can be used to correct up to ⌊(dmin − 1)/2⌋ = ⌊(n − k)/2⌋ errors, or up to
dmin − 1 = n − k erasures, or any pattern of t errors and e erasures satisfying 2t + e + 1 ≤
dmin = n − k + 1. One drawback of RS codes: it is not possible to obtain code block lengths
larger than 2^m, the alphabet size.
(a) What is the maximum number of symbol errors that a (255, 235) RS code can correct? How
many bits does each symbol represent? In the worst case, how many bits can the code correct?
How about in the best case?
(b) The (255, 235) RS code is used as an outer code in a system in which the inner code produces
a BER of 10−3 . What is the symbol error probability, assuming that the bit errors are i.i.d.?
Assuming bounded distance decoding up to the maximum possible number of correctable errors,
find the probability of incorrect decoding.
Note: The symbol error probability p = 1 − (1 − pb )m , where pb is the BER and m the number
of bits per symbol.
(c) What is the BER that the inner code must produce in order for the (255, 235) RS code to
attain a decoding failure probability of less than 10−12 ?
(d) If the BER of the inner code is fixed at 10−3 and the block length and alphabet size of the
RS code are as in (b)-(c), what is the value of k for which the decoding failure probability is less
than 10−12 ?
Remark: While we consider random bit errors in this problem, inner decoders may often output
a burst of errors, and this is where outer RS codes become truly valuable. For example, a burst of
errors spanning 30 bits corresponds to at most 5 symbol errors in an RS code with 8-bit symbols.
On the other hand, correcting up to 30 errors using, say, a binary BCH code would cost a lot in
terms of redundancy.
LLR computations
Problem 7.17 Consider a BPSK system with a typical received sample given by
Y = A (−1)^x + N                                                                 (7.50)
where A > 0 is the amplitude, x ∈ {0, 1} is the transmitted bit, and N ∼ N(0, σ^2) is the noise.
Let π0 = P[x = 0] denote the prior probability that x = 0.
(a) Show that the LLR
L(x) = log( P[x = 0|y] / P[x = 1|y] ) = log( π0 p(y|0) / ((1 − π0) p(y|1)) )
Conclude that
L(x) = Lchannel(x) + Lprior(x)
where Lchannel(x) = log( p(y|0)/p(y|1) ) and Lprior(x) = log( π0/(1 − π0) ).
(b) Write down the conditional densities p(y|x = 0) and p(y|x = 1).
(c) Show that the channel LLR Lchannel(x) is given by
Lchannel(x) = log( p(y|0)/p(y|1) ) = 2Ay/σ^2
(d) Specify (in terms of the parameters A and σ) the conditional distribution of Lchannel , condi-
tioned on x = 0 and x = 1.
(e) Suppose that the preceding is used to model either the I sample or Q sample of a Gray coded
QPSK system. Express Es /N0 for the system in terms of A and σ.
Answer: Es/N0 = A^2/σ^2.
(f) Suppose that we use BICM using a binary code of rate Rcode prior to QPSK modulation.
Express Es /N0 for the QPSK symbols in terms of Eb /N0 .
Answer: Es/N0 = 2 Rcode Eb/N0.
(g) For Eb/N0 of 3 dB and a rate 2/3 binary code, what is the value of A if the noise variance
per dimension is scaled to σ^2 = 1?
(h) For the system parameters in (g), specify numerical values for the parameters governing the
conditional distributions of the LLR found in (c).
(i) For the system parameters in (g), specify the probability of bit error for hard decisions based
on Y.
Problem 7.18 In this problem, we derive the tanh rule (7.42) for the check update, hopefully
in a way that provides some insight into where the tanh comes from.
(a) For any bit x with LLR L, we have observed that P[x = 0] = e^L/(e^L + 1). Now, show that
δ = P[x = 0] − 1/2 = (1/2) tanh(L/2)                                             (7.51)
Thus, the tanh provides a measure of how much the distribution of x deviates from an equiprob-
able distribution.
Now, suppose that x3 = x1 ⊕ x2, where x1 and x2 are modeled as independent for the purpose of
belief propagation. Let Li denote the LLR for xi, and set P[xi = 0] − 1/2 = δi, i = 1, 2, 3. (Note
that P[xi = 1] = 1/2 − δi.) Under our model,
P[x3 = 0] = P[x1 = 0] P[x2 = 0] + P[x1 = 1] P[x2 = 1]
(b) Plug in expressions for these probabilities in terms of the δi and simplify to show that
δ3 = 2 δ1 δ2
Figure 7.17: Gray coded 4PAM constellation, with the points −3A, −A, +A, +3A labeled 00, 10, 11, 01, respectively.
Problem 7.19 Consider the Gray coded 4PAM constellation depicted in Figure 7.17. Denote
the label for each constellation point by x1 x2 , where x1 , x2 ∈ {0, 1}. The received sample is given
by
Y =s+N
where s ∈ {−3A, −A, A, 3A} is the transmitted symbol, and N ∼ N(0, σ^2) is noise.
(a) Find expressions for the channel LLRs for the two bits:
Lchannel(x1) = log( p(y|x1 = 0)/p(y|x1 = 1) ) ,    Lchannel(x2) = log( p(y|x2 = 0)/p(y|x2 = 1) )
Hint: Note that
p(y|x1 = 0) = p(y|x1 x2 = 00) + p(y|x1 x2 = 01)
(b) Simulate the system for A = 2, normalizing σ = 1, and choosing the bits x1 and x2 inde-
pendently and with equal probability from {0, 1}. Plot the histogram for LLR1 conditioned on
x1 = 0 and conditioned on x1 = 1 on the same plot. Plot the histogram for LLR2 conditioned
on x2 = 0 and conditioned on x2 = 1 on the same plot. Are the conditional distributions in each
case well separated?
(c) You wish to design a BICM system with a binary code of rate Rcode to be used with 4PAM
modulation, with A and σ as in (b). Use the formula (7.7) for the discrete time AWGN channel
to estimate the code rate to be used, assuming that you can operate 3 dB from the Shannon
limit.
Hint: Compute the SNR in terms of A and σ, but then reduce by 3 dB before plugging into (7.7)
to find the bits per channel use.
(d) Repeat (b) and (c) for A = 1, σ = 1.
Lab Assignment
1) Write a function implementing belief propagation. The inputs are the parity check matrix, the
channel LLRs, and the maximum number of iterations. The outputs are a binary vector which
is an estimate of the transmitted codeword, a bit indicating whether this binary vector is a
valid codeword, and the number of iterations actually taken. To be concrete, we start defining
the function below.
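One possible shape for such a function (the function and variable names here are only illustrative) is:
function [xhat, valid, niter] = ldpc_decode_bp(H, channel_LLRs, max_iterations)
%H = parity check matrix; channel_LLRs = vector of channel LLRs, one per code bit
%xhat = estimated codeword (binary vector); valid = 1 if xhat satisfies all parity checks
%niter = number of iterations actually taken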
One possible approach to filling in the function is to take the following steps:
(a) Build the Tanner graph: Given the parity check matrix H, find and store the neighbors for
each variable node and each check node. This can be done using cell arrays, as in the sketch following these steps.
(b) Build the message data structure: We can maintain messages (LLRs) in a matrix of the
same dimension as H, with nonzero entries only where H is nonzero. The jth variable node
will read/write its messages from/to the jth column, while the ith check node will read/write
its messages from/to the ith row. Initialize messages from variable nodes to the channel LLRs,
and from check nodes to zeros. We maintain two matrices, one corresponding to messages from
variable nodes, and one corresponding to messages from check nodes.
(c) Implement message passing: We can now use the variable update and check update func-
tions (code fragments 7.5.1 and 7.5.2 respectively), along with the preceding data structure, to
implement message passing.
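A minimal sketch of steps (a) and (b), assuming H is a binary (0/1) matrix and channel_LLRs is the vector of channel LLRs, is as follows:
[ncheck,nvar] = size(H);
%(a) neighbor lists stored in cell arrays
var_neighbors = cell(nvar,1);        %check nodes connected to each variable node
check_neighbors = cell(ncheck,1);    %variable nodes connected to each check node
for j = 1:nvar,
    var_neighbors{j} = find(H(:,j))';
end
for i = 1:ncheck,
    check_neighbors{i} = find(H(i,:));
end
%(b) message matrices with nonzero entries only where H is nonzero
messages_var_to_check = H.*repmat(channel_LLRs(:)',ncheck,1);   %initialized to the channel LLRs
messages_check_to_var = zeros(ncheck,nvar);                     %initialized to zero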
Putting (a)-(c) together gives the desired function.
2) Write a program to check that the preceding belief propagation function works for our example
(5, 2) code, using the parity check matrix corresponding to the Tanner graph in Figure 7.13, and
generating the channel LLRs as described in Problem 7.17. Specifically, consider Gray coded
QPSK modulation, where the I and Q components follow the BPSK model in Problem 7.17.
Note that A and σ in Problem 7.17 must be chosen appropriately (fix one, say σ = 1, and scale
the other) based on the spectral efficiency r (r = 4/5 for QPSK with the (5, 2) code) and Eb /N0 .
Assume, without loss of generality, that the all-zero codeword is sent. Decoding error therefore
occurs when the belief propagation function returns a nonzero codeword, or reports that a valid
codeword was not found after the maximum allowed number of iterations.
3) Use simulations to estimate and plot the probability of decoding error (log scale) with BP
as a function of Eb /N0 (dB). On the same graph, also plot the probability of decoding error for
bounded distance decoding with hard decisions (this can be computed analytically, as described
in Problem 7.12), and the probability of error for uncoded QPSK. Comment on the results.
Does BP with soft decisions provide an improvement over bounded distance decoding? Is the
performance better than that of uncoded QPSK? For your reference, an example unlabeled plot
is provided in Figure 7.18. Guess the labels for the three plots before verifying them using your
own computations and simulations.
(Plot: P[decoding error] on a log scale, from 10^0 down to 10^−3, versus Eb/N0 (dB) from 1 to 6.)
Figure 7.18: Performance of the (5, 2) code with QPSK modulation, comparing belief propagation
with soft decisions against bounded distance decoding with hard decisions. Also plotted for
comparison is the performance of uncoded QPSK. Which curve is which?
Array codes: We now introduce the class of array codes, whose parity check matrix is charac-
terized by three positive integers (p, J, L), and is of the following form:
H = [ I    I         I          ...   I
      I    P         P^2        ...   P^(L−1)
      I    P^2       P^4        ...   P^(2(L−1))
      ...
      I    P^(J−1)   P^(2(J−1)) ...   P^((J−1)(L−1)) ]                           (7.52)
where I denotes a p × p identity matrix, and p is a prime number. The matrix P is obtained by
cyclically shifting the rows of I by one. Thus, for p = 3, we have
I = [ 1 0 0 ; 0 1 0 ; 0 0 1 ] ,    P = [ 0 1 0 ; 0 0 1 ; 1 0 0 ]     for p = 3
The matrix P is a permutation matrix, in the sense that for any p × 1 vector u = (u1, ..., up)^T,
the vector Pu is a permutation of u. For this choice of P, the vector Pu = (up, u1, ..., up−1)^T
is a cyclic shift of u by one. Raising P to an integer power k simply corresponds to applying k
successive cyclic shifts, so that P^k is a cyclic shift of the rows of I by k, and P^k u is a cyclic shift
of u by k.
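For example, P and its powers can be generated using cyclic shifts of the identity matrix (a minimal sketch; the shift direction is a matter of convention):
p = 3;
P = circshift(eye(p),-1)     %cyclic shift of the rows of the identity matrix by one
P2 = P^2;                    %P^2 corresponds to two successive cyclic shifts
u = (1:p)';
P*u                          %a cyclic shift of the entries of u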
The parity check matrix H in (7.52) consists of JL blocks of size p × p, with the (j, l)th block being
P^{(j−1)(l−1)}, 1 ≤ j ≤ J, 1 ≤ l ≤ L. The code length is n = pL, the number of columns (or variable
nodes). The column weight equals J for each column (make sure you check this); that is, each
variable node has degree J. The number of rows (or check nodes) equals pJ, but some of these
rows may be redundant, so the dimension of the dual code satisfies n − k ≤ pJ. In fact, it can be shown
that exactly J − 1 rows are redundant. (To see why this might be true, add the first p rows, and
then the next p rows. What answers do you get? What does this tell you about the number of
linearly independent rows?) Thus, the rank of H equals n − k = pJ − (J − 1), so that the code
dimension k = p(L − J) + J − 1. We therefore summarize as follows:
n = pL ,    n − k = pJ − J + 1 ,    k = p(L − J) + J − 1
Popular choices of the variable node degree are J = 3, 4. Analysis of code properties shows that we
should restrict L ≤ p. We can, for example, use a large prime p and moderate sized L, or set
L = p for a relatively small value of p.
4) Write a function to generate the parity check matrix of an array code, whose inputs are p, J, L
and outputs are H, n, k.
5) Consider an array code with p = 11, with L = p and J = 4, used as before with Gray coded
QPSK and BICM. As before, use simulations to estimate and plot the probability of decoding
error (log scale) with BP as a function of Eb /N0 (dB) for a BICM system employing QPSK.
Compare the performance (Eb /N0 for decoding error probability of 10−4 ) with the Shannon
limit for that spectral efficiency. To limit the simulation cost, you may wish to use a relatively
small number of simulation runs to generate your plots, and to estimate the value of Eb/N0 at which
the probability of decoding error starts falling below, say, 10−2, and then use a larger number of runs
for a few carefully chosen values of Eb /N0 to see when the decoding error probability hits 10−4 .
How does this Eb /N0 compare with that required for 10−4 BER with uncoded QPSK?
6) Repeat 5) for larger values of the prime number p (still keeping L = p and J = 4), within the
limits of your computational infrastructure. For example, try p = 47.
7) Repeat 5) with large p and relatively small L; for example, p = 911 and L = 8, still keeping
J = 4. How do the code rate and spectral efficiency (with QPSK) compare with 5) and 6)?
Lab Report: Your lab report should answer the preceding questions in order, and should document
the reasoning you used and the difficulties you encountered. Comment on the decoding error
probability trends as you vary the code parameters.
Chapter 8

Dispersive Channels and MIMO
From the material in Chapters 4-6, we now have an understanding of commonly used modulation
formats, noise models, and optimum demodulation for the AWGN channel model. Chapter 7
discusses channel coding strategies for these idealized models. In this final chapter, we discuss
more sophisticated channel models, and the corresponding signal processing schemes required at
the demodulator.
We first consider the following basic model for a dispersive channel: the transmitted signal passes
through a linear time-invariant system, and is then corrupted by white Gaussian noise. The LTI
model is broadly applicable to wireline channels, including copper wires, cable and fiber optic
communication (at least over shorter distances, over which fiber nonlinearities can be neglected),
as well as to wireless channels with quasi-stationary transmitters and receivers. For wireless
mobile channels, the LTI model is a good approximation over durations that are small compared
to the time constants of mobility, but still fairly long on an electronic timescale (e.g., of the order
of milliseconds). Methods for compensating for the effects of a dispersive channel are generically
termed equalization. We introduce two common design approaches for this purpose.
The first approach is singlecarrier modulation, which refers to the linear modulation schemes
discussed in Chapter 4, where the symbol sequence modulates a transmit pulse occupying the
entire available bandwidth. We discuss linear zero forcing (ZF) and Minimum Mean Squared
Error (MMSE) equalization techniques, which are suboptimal from the point of view of mini-
mizing error probability, but are intuitively appealing and less computationally complex than
optimum equalization. (We refer the reader to more advanced texts for discussion of optimum
equalization and its performance analysis.) We discuss adaptive implementation and geometric
interpretation for linear equalizers.
The second approach to channel dispersion is Orthogonal Frequency Division Multiplexing (OFDM),
where linear modulation is applied in parallel to a number of subcarriers, each of which occupies
a bandwidth which is small compared to the overall bandwidth. OFDM may be viewed as a
mechanism for ISI avoidance. It is based on the observation that any complex exponential e^{j2πf0 t}
passes through an LTI system with transfer function H(f) unchanged except for multiplication
by H(f0). Thus, we can send a number of complex exponentials {e^{j2πfi t}}, termed subcarriers,
in parallel through the channel, each multiplied by an information-bearing symbol, such that
interference across subcarriers is avoided. The task of channel equalization therefore reduces to
compensating separately for the channel gains H(fi ) for each such subcarrier. Parallelizing the
problem of equalization in this manner is particularly attractive when the underlying time domain
impulse response h(t) is complicated (e.g., an indoor wireless channel where there are a large
number of paths with multiple bounces off walls and ceilings between transmitter and receiver).
We discuss how this intuition is translated into practice using transceiver implementations using
digital signal processing (DSP).
Finally, we discuss multiple antenna communication, also popularly known as Multiple Input
Multiple Output (MIMO), or space-time, communication. There is a great deal of commonality
between signal processing for dispersive channels and for MIMO, which is why we treat these
topics within the same chapter. Furthermore, the combination of OFDM with MIMO allows
parallelization of transceiver signal processing for complicated channels, and has become the
architecture of choice for both WiFi (the IEEE 802.11n standard) and for fourth generation
cellular systems (LTE, or long term evolution). Three key concepts for MIMO are covered:
beamforming (directing energy towards a desired communication partner), diversity (combating
fading by using multiple paths from transmitter to receiver), and spatial multiplexing (using
multiple antennas to support parallel data streams).
Chapter Plan: Compared to the earlier chapters, this chapter has a somewhat unusual orga-
nization. For dispersive channels, a key goal is to provide hands-on exposure via software labs.
A model for singlecarrier linear modulation over a dispersive channel, including code fragments
for modeling the transmitter and the channel, is presented in Section 8.1. Linear equalization
is discussed in Section 8.2. Sections 8.1 and 8.2.1 provide just enough background, including
code fragments, for Software Lab 8.1 on adaptive implementation of linear equalization. Section
8.2.2 provides geometric insight into why the implementation in Software Lab 8.1 works, and
provides a framework for analytical computations related to MMSE equalization and the closely
related notion of zero-forcing (ZF) equalization. It is not required for actually doing Software
Lab 8.1. The key concepts behind OFDM and its DSP-centric implementation are discussed in
Section 8.3, whose entire focus is to provide background for developing a simplified simulation
model for an OFDM link in Software Lab 8.2. Finally, MIMO is discussed in Section 8.4, with
the signal processing concepts for MIMO communication reinforced by Software Lab 8.3. The
problems at the end of this chapter focus on linear equalization concepts discussed in Section
8.2.2, and on performance evaluation of core MIMO techniques (beamsteering, diversity and
spatial multiplexing) discussed in Section 8.4.
Software: As already mentioned, this chapter is structured to give an exposure to advanced
concepts through the associated software labs: Software Lab 8.1 for singlecarrier modulation
over dispersive channels, Software Lab 8.2 for OFDM, and Software Lab 8.3 for MIMO signal
processing.
(Block diagram: symbols at rate 1/T pass through the transmit filter (implemented at rate m/T), I and Q DACs, and the upconverter; at the receiver, the downconverter, I and Q ADCs (preceded by coarse analog passband filtering), and the receive filter (implemented at rate m/T) feed DSP for receiver functions (synchronization, equalization, demodulation), which produce the estimated symbols.)
Figure 8.1: Typical DSP-centric transceiver realization. Our model does not include the blocks
shown in dashed lines. Finite precision effects due to digital to analog conversion (DAC) and
analog to digital conversion (ADC) are not considered. The upconversion and downconversion
operations are not modeled. The passband channel is modeled as an LTI system in complex
baseband.
(Block diagram: symbols {b[n]} at rate 1/T → transmit filter gTX(t) → channel filter gC(t) → noise added → receive filter gRX(t) → sampler at rate m/T → receiver signal processing (synchronization, equalization, demodulation) → estimated symbols.)
Figure 8.2: Block diagram of a linearly modulated system, modeled in complex baseband.
waveform shown in Figure 8.3(a). The Matlab code used for generating this plot is given below.
We have sampled much faster than the symbol rate (at 32/T ) in order to obtain a smooth plot.
In practice, we would typically sample at a smaller multiple of the symbol rate (e.g. at 4/T ) to
generate the input to the DAC in Figure 8.1.
Figure 8.3: The outputs of the transmit and receive filters without channel dispersion. The
symbols can be read off by sampling each waveform at the times indicated by the stem plots.
We provide Matlab code fragments that convey the concepts underlying discrete-time modeling
and implementation. The code fragments also show how some of the plots here are generated,
with cosmetic touches omitted.
The following code fragment shows how to work with discrete time samples using oversampling
at rate m/T , including how to generate the plot of the transmitted waveform in Figure 8.3(a).
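A minimal sketch of the setup portion of this fragment, with assumed parameter choices (number of symbols, oversampling factor, and a rectangular transmit pulse time limited to T), is given next; the plotting commands that follow operate on these variables.
nsymbols = 10;                        %number of symbols (assumed value)
m = 32;                               %oversampling factor: samples per symbol interval T
symbols = sign(randn(nsymbols,1));    %random BPSK (+1/-1) symbols
transmit_filter = ones(m,1);          %rectangular pulse time limited to T (square root Nyquist)
%upsample the symbols to rate m/T and pass them through the transmit filter
symbols_upsampled = zeros(nsymbols*m,1);
symbols_upsampled(1:m:(nsymbols-1)*m+1) = symbols;
transmit_output = conv(symbols_upsampled,transmit_filter);
%plot the transmitted waveform against normalized time t/T
time_axis = ((1:length(transmit_output))-1)/m;
figure; plot(time_axis,transmit_output); xlabel('t/T');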
hold on;
%choose sampling times in accordance with peak of transmit filter response
[maxval maxloc] = max(transmit_filter); %find peak location
sampling_times = maxloc:m:(nsymbols-1)*m+maxloc;
sampled_outputs = transmit_output(sampling_times);
stem((sampling_times-1)/m,sampled_outputs,’r’)
hold off;
If this waveform now goes through an ideal channel, and we use a receive filter with impulse
response matched to the transmitted pulse, then the waveform we obtain is shown in Figure
8.3(b). The transmit filter impulse response is time limited to length T and hence square root
Nyquist (see Chapter 4), hence the net response to a single symbol, which is a cascade of the
transmit filter with its matched filter, is Nyquist. It follows that, by sampling at the right
moments (as marked on the plot), we can recover the symbols exactly.
We now provide a code fragment to model the channel and receive filter; it can be employed for
modeling both ideal and dispersive channels. Appending it to code fragment 8.1.1 generates and
plots the noiseless received waveform.
dispersive = 0; %set this to 0 for ideal channel, and to 1 for dispersive channel
if dispersive == 0,
channel = 1;
else
channel = [0.8;zeros(m/2,1);-0.7;zeros(m,1);-0.6];
%(or substitute your favorite choice of dispersive channel)
end
%noiseless receiver input
receive_input = conv(transmit_output,channel);
t2 = (cumsum(ones(length(receive_input),1))-1)/m;
figure;
plot(t2,receive_input);
xlabel(’t/T’);
%receive filter matched to transmit filter
%(would also need to conjugate if complex-valued)
receive_filter = flipud(transmit_filter);
%receive filter output (normalized to account for oversampling)
receive_output = (1/m)*conv(receive_input,receive_filter);
t3 = (cumsum(ones(length(receive_output),1))-1)/m;
%plot receive filter output together with sample locations chosen based on peak of net response
figure;
plot(t3,receive_output,’b’);
xlabel(’t/T’);
hold on;
%effective pulse at channel output
pulse = conv(transmit_filter,channel);
%effective pulse at receive filter output (normalized to account for oversampling)
rx_pulse = conv(pulse,receive_filter)/m;
[maxval maxloc] = max(rx_pulse);
rx_sampling_times = maxloc:m:(nsymbols-1)*m+maxloc;
rx_sampled_outputs = receive_output(rx_sampling_times);
stem((rx_sampling_times-1)/m,rx_sampled_outputs,’r’);
hold off;
Figure 8.4: When the transmitted waveform passes through the dispersive channel shown, we
can no longer read off the symbols reliably by sampling the output of the receive filter. For this
particular set of symbols, one of the symbols is estimated incorrectly, even though there is no
noise.
Figure 8.4 shows a dispersive channel and the corresponding noiseless receive filter output. The
effective pulse given by the cascade of the transmit, channel and receive filters is no longer
Nyquist, hence we do not expect a symbol decision based on a single sample to be reliable.
Figure 8.4(b) shows the severe distortion due to ISI with a “best effort” choice of sampling times
(chosen based on the peak of the effective pulse). In particular, for the specific symbol sequence
shown, one (out of ten) of the symbol estimates obtained by taking the signs of these samples is
incorrect.
Figure 8.5: Eye diagrams with and without channel dispersion. The eye is closed for the channel
considered, which means that reliable symbol decisions are not possible without equalization.
Eye diagrams: A classical technique for visualizing the effect of ISI is the eye diagram. It
is constructed by overlapping multiple segments of the received waveform over a fixed window,
which tells us how different combinations of symbols could potentially create ISI. For an ideal
channel and square root Nyquist pulses at either end, the eye is open, as shown in Figure 8.5(a).
However, for the dispersive channel in Figure 8.4(b), we see from Figure 8.5(b) that the eye is closed. An open
eye implies that, by an appropriate choice of sampling times, we can make reliable single-sample
symbol decisions, while a closed eye means that more sophisticated equalization techniques are
needed for symbol recovery.
Physically, an eye diagram can be generated using an oscilloscope with the baseband modulated
signal as the vertical input, with horizontal sweep triggered at the symbol rate. A code fragment
for generating the eye pattern from discrete time samples at rate m/T is given below. (While
Matlab has its own eye diagram routine, this code fragment is provided in order to clearly convey
the concept.) The output of the receive filter generated in code fragment 8.1.2 is the input to
this fragment, but in general, we could plot an eye diagram based on the baseband waveform at
any stage in the system. For complex baseband signals, we would plot the eye diagrams for the
I and Q components separately.
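A minimal sketch of such a fragment, assuming the receive filter output receive_output and the oversampling factor m from code fragment 8.1.2, and an illustrative window of two symbol intervals, is:
eye_window = 2*m;                                  %horizontal sweep spanning two symbol intervals
ntraces = floor((length(receive_output)-eye_window)/m);
t_eye = (0:eye_window-1)/m;                        %time axis in units of t/T
figure; hold on;
for k = 1:ntraces,
    segment = receive_output((k-1)*m+1:(k-1)*m+eye_window);  %segments offset by one symbol interval
    plot(t_eye,segment,'b');
end
hold off;
xlabel('t/T');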
The complex baseband received signal at the input to the receive filter can be written as
y(t) = Σ_n b[n] p(t − nT) + n(t)                                                 (8.1)
where p(t) = (gTX ∗ gC)(t) is the "effective pulse" given by the cascade of the transmit pulse
and the channel filter, {b[n]} is the symbol sequence, which is in general complex-valued, and
n(t) is complex WGN with PSD σ^2 = N0/2. We translate this model directly into discrete time
by constraining t = kT/m + τ, where m/T is the sampling rate (m a positive integer) and τ
equals the sampling offset. The noise at the input to the receive filter is now modeled as discrete
time white Gaussian noise (WGN) with variance σ^2 = N0/2 per dimension. As we well know from
Chapter 6, the absolute value of the noise variance is meaningless unless we also specify the signal
scaling, hence we fix either the signal or noise strength, and set the other based on SNR measures
such as Eb/N0 or Es/N0. Here Es = E[|b[n]|^2] ||p||^2 for the model (8.1), and Eb = Es/log2 M
as usual, where M is the constellation size. Inner products and norms are computed in discrete
time.
Note that, with the preceding convention, the noise energy in a fixed time interval scales up with
the sampling rate, and so does the signal energy (since we have more samples whose energies
we are adding up), with the SNR converging to the continuous-time SNR as the sampling rate
gets large. However, for a sampling rate that is a small multiple of the symbol rate, the SNR for
the discrete time system can, in general, be different from that in the original continuous time
system. We do not worry about this distinction here. The following code fragment illustrates
adding noise to our simulation model.
We now provide a code fragment which adds discrete time WGN to the receive filter input,
resulting in colored noise at the output. We add this to the signal component already computed
in code fragment 8.1.2.
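A minimal sketch of this step, reusing the variable names from code fragments 8.1.1 and 8.1.2 (pulse, receive_input, receive_filter, m) and an assumed target Eb/N0, is:
EbN0dB = 7;                                   %assumed target Eb/N0 in dB
EbN0 = 10^(EbN0dB/10);
Es = sum(pulse.^2);                           %energy per symbol: ||p||^2 in discrete time, unit-energy BPSK symbols
Eb = Es;                                      %one bit per symbol for BPSK
N0 = Eb/EbN0;
sigma = sqrt(N0/2);                           %noise standard deviation per dimension
noise = sigma*randn(size(receive_input));     %real-valued WGN (add 1i*sigma*randn(...) for complex signals)
noisy_receive_input = receive_input + noise;
%receive filter output including noise (the filtering colors the noise)
receive_output = (1/m)*conv(noisy_receive_input,receive_filter);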
(Figure: the received waveform is the sum of shifted copies of the symbol response scaled by b[−1] (interfering symbol), b[0] (desired symbol), b[1] (interfering symbol), and b[2] (falls outside the observation interval).)
Figure 8.6: The observation interval used to make a decision on b[0] sees contributions from the
desired symbol b[0] and interfering symbols b[−1] and b[1].
(Figure: the received waveform is the sum of shifted copies of the symbol response scaled by b[−1] (falls outside the observation interval), b[0] (interfering symbol), b[1] (desired symbol), and b[2] (interfering symbol).)
Figure 8.7: The observation interval used to make a decision on b[1] sees contributions from the
desired symbol b[1] and interfering symbols b[0] and b[2]. Comparing with Figure 8.6, the roles
of the symbols has shifted by one.
by integer multiples of T . Such periodicity in the statistics is termed cyclostationarity. This
implies that the statistics of the noise and ISI seen in different observation intervals are identical:
the only change is in which symbol plays the role of desired symbol. In particular, comparing
Figures 8.6 and 8.7, we see that the roles of desired and interfering symbols shift by one as we go
from the observation interval for b[0] to that for b[1]. Thus, an appropriately designed strategy
for handling ISI over a given observation interval should work for other observation intervals as
well. This opens up the possibility of realizing adaptive equalizers which can learn enough about
the statistics of the ISI and noise to compensate for them.
We focus here on linear equalization, which corresponds to using the decision statistic cT r[n] to
estimate b[n], where c is an appropriately chosen correlator. The choice of c can be independent
of n, by virtue of cyclostationarity. For BPSK signaling, for example, this leads to a decision
rule
b̂[n] = sign( c^T r[n] )                                                          (8.2)
A natural criterion for choosing c is to minimize the mean squared error (MSE)
MSE = J(c) = E[ ( c^T r[n] − b[n] )^2 ]                                          (8.3)
Minimizing the MSE in this fashion leads to minimizing the contribution due to ISI and noise
at the correlator output, which is clearly a desirable outcome.
The MSE is a quadratic function of c, and can therefore be minimized by setting its gradient
with respect to c to zero. Due to linearity, the gradient can be taken inside the expectation, and
we obtain
∇c J(c) = 2 E[ r[n] ( c^T r[n] − b[n] ) ] = 2 E[ r[n] ( r^T[n] c − b[n] ) ]
Defining
R = E[ r[n] r^T[n] ] ,    p = E[ b[n] r[n] ]                                     (8.4)
we can rewrite the gradient of the MSE as
∇c J(c) = 2 ( Rc − p )                                                           (8.5)
Setting the gradient to zero yields the following expression for the MMSE correlator:
cMMSE = R^{−1} p                                                                 (8.6)
In order to compute this, we must know, or be able to estimate, the expectations in (8.4). If we
know the transmit filter, the channel filter, the receive filter, the sampling times, and the noise
PSD, we can compute these expectations using a model such as (8.13). However, we often do
not have explicit knowledge of one or more of these quantities. Thus, an attractive approach in
practice is to exploit the stationarity of the model as we vary n to estimate expectations using
their empirical averages. These expectations involve the received vectors r[n], which we of course
have access to, and the symbols b[n], which we assume we have access to over a training period in
which a known sequence of symbols is transmitted. This approach leads to adaptive equalization
techniques that do not require explicit knowledge or estimates of the model parameters.
Least Squares Adaptation: Assuming that the first ntraining symbols are known, least
squares adaptation corresponds to replacing the expectations in (8.4) by their empirical averages
as follows:
R̂ = (1/ntraining) Σ_{n=1}^{ntraining} r[n] r^T[n] ,    p̂ = (1/ntraining) Σ_{n=1}^{ntraining} b[n] r[n]        (8.7)
where the normalization by 1/ntraining is not needed, but is put in to make the averaging interpre-
tation transparent. The MMSE correlator is now approximated by the least squares solution:
ĉ_LS = R̂^{−1} p̂
This correlator can now be used to make decisions on the unknown symbols following the training
period. It can be checked that the preceding solution minimizes the empirical MSE over the
training period:
MSE_hat = Σ_{n=1}^{ntraining} ( c^T r[n] − b[n] )^2
%assumed available from earlier in this fragment: received samples r at rate q/T, net response h,
%equalizer length L, alignment offset, number of training symbols ntraining, and the symbols
Rhat = zeros(L,L); %empirical autocorrelation matrix
phat = zeros(L,1); %empirical crosscorrelation vector
for n = 1:ntraining,
rn=r(1+q*(n-1)+offset:L+q*(n-1)+offset); %current received vector r[n]
phat = phat + symbols(n)*rn;
Rhat = Rhat + rn*rn’;
end
%least squares estimate of MMSE correlator
cLS = Rhat\phat; %often more stable computation than inv(Rhat)*phat
%implement equalizer as filter
h_equalizer = flipud(cLS); %would also need conjugation for complex signals
equalizer_output = conv(r,h_equalizer);
%sample filter output at symbol rate after appropriate delay
delay = length(h_equalizer)+offset;
%symbol decision statistics
decision_stats = equalizer_output(delay:q:delay+(nsymbols-1)*q);
%payload = non-training symbols
payload = symbols(ntraining+1:nsymbols);
%estimate of payload (for BPSK)
payload_estimate = sign(decision_stats(ntraining+1:nsymbols));
%number of errors
nerrors = sum(ne(payload,payload_estimate))
%COMPARE WITH UNEQUALIZED ESTIMATES
%unequalized estimates obtained by sampling at peaks of effective response
[maxval maxloc] = max(h);
sampling_times = maxloc:q:(nsymbols-1)*q+maxloc;
unequalized_decision_stats = r(sampling_times);
sampled_outputs = transmit_output(sampling_times);
%estimate of payload (for BPSK)
payload_estimate_unequalized = sign(unequalized_decision_stats(ntraining+1:nsymbols));
%number of errors
nerrors_unequalized = sum(ne(payload,payload_estimate_unequalized))
Putting code fragments 8.1.1, 8.1.2, 8.1.4 and 8.2.1 together, we obtain a simulation model for
adaptive linear equalization over a dispersive channel. As a quick example, for the dispersive
channel considered, at Eb /N0 of 7 dB, we estimate (using nsymbols = 10000, ntraining = 100)
the error probability after equalization at rate 2/T (q = 2) to be about 3.5 × 10−3 and the
unequalized error probability to be about 0.16. Linear equalization is quite effective in this case,
although it exhibits some degradation relative to the ideal BPSK error probability of 7.7 × 10−4 .
We can now build on this code base to run a variety of experiments, as suggested in Software
Lab 8.1: for example, probability of error as a function of Eb /N0 for different equalizer lengths,
for different channel models, and for different choices of the transmit and receive filters. Our
model extends easily to complex-valued constellations, as discussed below.
Extension to complex-valued signals: All of the preceding development goes through for
complex-valued constellations and signals, except that vector transposes xT are replaced by
conjugate transposes xH . Indeed, the Matlab code fragments we provide here already include
this level of generality, since we use the conjugate transpose operation x′ when computing the
transpose for real-valued x. All that is needed to employ these code fragments is to make the
symbols complex-valued, and to add an imaginary component to the noise model in code fragment
8.1.4. We skip derivations, and state that the decision statistics are given by c^H r[n], and that the MSE
expression is
MSE = J(c) = E[ |c^H r[n] − b[n]|^2 ]
As before, these statistical expectations can be replaced by empirical averages for a least squares
implementation.
We now have the background required for a hands-on exposure to equalization through Software
Lab 8.1.
Consider an observation interval (i.e., equalizer length) of length L = 4, aligned with the response
to the desired symbol as depicted in Figures 8.6 and 8.7. As shown in code fragment 8.2.1, we
can also choose smaller or larger observation intervals, and optimize their alignment using some
criterion (in the code fragment, the criterion is maximizing the energy of the desired response
falling into the observation interval). In addition to the contribution to r[n] due to b[n], we also
have contributions from other symbols before and after it in the sequence, corresponding to parts
of appropriately shifted versions of the response h. For example, the response to b[n + 1] falling
in the nth observation interval is obtained by shifting h by q = 2 and then windowing. The
received vector r[n] can therefore be written as follows.
Model for L = 4: Two interfering symbols fall into the observation interval. The observation
interval is large enough to accommodate the entire response due to the desired symbol.
             [ −0.5  ]             [  0   ]             [  0.5  ]
r[n] = b[n]  [  1    ]  + b[n + 1] [  0   ]  + b[n − 1] [ −0.25 ]  + w[n]        (8.10)
             [  0.5  ]             [ −0.5 ]             [  0    ]
             [ −0.25 ]             [  1   ]             [  0    ]
where our convention is that time progresses downward, and where w[n] denotes noise. The
vector multiplying b[n] is the desired vector, while the others are interference vectors. Figure 8.6
corresponds to n = 0, while Figure 8.7 corresponds to n = 1.
To obtain the preceding model, the vector corresponding to a given symbol is obtained
by appropriately shifting h and then windowing to the observation interval. To make sure
the modeling approach is clear, we also provide the model for L = 3, where the observation
interval is lined up with the first three elements of the response to the desired symbol, and for L = 6,
where the observation interval contains one additional sample on either side of the response to
the desired symbol.
Model for L = 3: The observation interval is smaller than the desired symbol response. Two
interfering symbols fall in the interval.
             [ −0.5 ]             [  0   ]             [  0.5  ]
r[n] = b[n]  [  1   ]  + b[n + 1] [  0   ]  + b[n − 1] [ −0.25 ]  + w[n]         (8.11)
             [  0.5 ]             [ −0.5 ]             [  0    ]
Model for L = 6: The observation interval is larger than the desired symbol response. Four
interfering symbols fall in the interval.
             [  0    ]             [  0   ]             [  0   ]
             [ −0.5  ]             [  0   ]             [  0   ]
r[n] = b[n]  [  1    ]  + b[n + 1] [  0   ]  + b[n + 2] [  0   ]
             [  0.5  ]             [ −0.5 ]             [  0   ]
             [ −0.25 ]             [  1   ]             [  0   ]
             [  0    ]             [  0.5 ]             [ −0.5 ]

            [  1    ]             [ −0.25 ]
            [  0.5  ]             [  0    ]
 + b[n − 1] [ −0.25 ]  + b[n − 2] [  0    ]  + w[n]                              (8.12)
            [  0    ]             [  0    ]
            [  0    ]             [  0    ]
            [  0    ]             [  0    ]
Vector model for ISI: In general, we can write the received vector over observation interval n
as follows:
r[n] = b[n] u0 + Σ_{k≠0} b[n + k] uk + w[n]                                      (8.13)
where b[n], u0 are the desired symbol and vector, respectively; b[n + k], uk for k ≠ 0 are
interference symbols and vectors, respectively; and w[n] ∼ N(0, Cw) denotes the vector of noise
samples at the output of the receive filter, windowed to the current observation interval. For
an equalizer working with rate q/T samples, we have already noted that successive observation
intervals are offset by q samples. Clearly, the structure of the ISI remains the same as we go from
observation interval n to n + 1, but the roles of the symbols are shifted by one: for the n + 1st
observation interval, b[n + 1] is the desired symbol multiplying u0, while b[n + 1 + k] for k ≠ 0
is the interfering symbol multiplying uk.
Modeling the output of a linear correlator: A linear correlator c operating on the received
vector produces the following output:
c^T r[n] = b[n] c^T u0 + Σ_{k≠0} b[n + k] c^T uk + c^T w[n]                      (8.14)
where the first term is the desired term, the second term is the ISI at the correlator output,
and the third term is the noise at the correlator output. While the ultimate performance metric
of interest is the error probability, a convenient metric that is easy to compute is the signal-to-
interference-plus-noise ratio (SINR) at the output of the linear correlator, defined as the ratio of
the average energy of the desired term, to those of the undesired terms:
SINR = \frac{E\left[ |b[n]\, c^T u_0|^2 \right]}{E\left[ \left| \sum_{k \neq 0} b[n+k]\, c^T u_k + c^T w[n] \right|^2 \right]}   (8.15)
Assuming that the symbols are uncorrelated with E[|b[n]|^2] ≡ σ_b^2 and are independent of the noise, we obtain the following expression for the SINR:
SINR = \frac{\sigma_b^2 |c^T u_0|^2}{\sigma_b^2 \sum_{k \neq 0} |c^T u_k|^2 + c^T C_w c}   (8.16)
Choosing c to minimize the MSE (8.3) means that we would like to have c^T r[n] ≈ b[n]. This means that, if the linear MMSE equalizer is working, then c^T u_0 ≈ 1, and the ISI terms c^T u_k, k ≠ 0, and the output noise variance c^T C_w c, are small. The MMSE criterion represents a tradeoff between ISI and noise at the output. To see why, let us consider the closely related criterion of zero-forcing equalization. While the noise in the example considered in our code fragments is colored, let us first consider white noise for simplicity: w[n] ∼ N(0, σ^2 I), so that the output noise c^T w[n] ∼ N(0, σ^2 ||c||^2).
Figure 8.8: The zero-forcing correlator projects the received signal along P^{\perp}_I u_0, the projection of the desired signal vector u_0 orthogonal to the interference subspace spanned by the interference vectors (\ldots, u_{-1}, u_1, \ldots).
linearly independent of the interference vectors, and we expect the ZF correlator to exist. For the model (8.11) for L = 3, we again have 2 interference vectors, and it again appears that the ZF correlator should exist, although we would expect the performance to be poorer because
the relative length of the orthogonal projection can be expected to be smaller. Of course, such
intuition must be quantified by explicit computation of the ZF correlator and its performance,
which we discuss next.
Computation of the ZF correlator: Let us now obtain an explicit expression for the ZF correlator given the vector ISI model (8.13). Suppose that the signal vectors {u_k} are written as columns in a matrix U as follows:
U = [\,\ldots\ u_{-1}\ u_0\ u_1\ \ldots\,]   (8.19)
The ZF conditions (8.17)-(8.18) can be compactly written as
U^T c_{ZF} = e   (8.20)
where e = (\ldots, 0, 1, 0, \ldots)^T is a unit vector with one corresponding to the column u_0 and zeros corresponding to the columns u_k, k ≠ 0. Further, we can write the ZF correlator as a linear combination of the signal vectors (any component orthogonal to all of the {u_k} can only add noise):
c_{ZF} = U a   (8.21)
against. In particular, the noise enhancement ζ can be defined as the ratio by which the ZF SNR
is smaller than the MF benchmark:
\zeta = \frac{SNR_{MF}}{SNR_{ZF}} = ||u_0||^2\, ||c_{ZF}||^2, \quad noise enhancement for white noise   (8.25)
Let us first interpret this geometrically. Setting c_{ZF} = \alpha P^{\perp}_I u_0, the condition (8.18) corresponds to
1 = \langle c_{ZF}, u_0 \rangle = \alpha \langle P^{\perp}_I u_0, u_0 \rangle = \alpha ||P^{\perp}_I u_0||^2   (8.26)
The last equality follows because u_0 decomposes into its projection onto the interference subspace, P_I u_0, and its orthogonal projection P^{\perp}_I u_0. Since these two components are orthogonal by definition, we have
\langle P^{\perp}_I u_0, u_0 \rangle = \langle P^{\perp}_I u_0, P_I u_0 \rangle + \langle P^{\perp}_I u_0, P^{\perp}_I u_0 \rangle = 0 + ||P^{\perp}_I u_0||^2
In other words, a ZF correlator satisfying (8.17)-(8.18) can be written in terms of the projection of the desired vector orthogonal to the interference subspace as follows:
c_{ZF} = \frac{P^{\perp}_I u_0}{||P^{\perp}_I u_0||^2}   (8.27)
from which it follows that
||c_{ZF}||^2 = \frac{1}{||P^{\perp}_I u_0||^2}   (8.28)
Thus, the smaller the orthogonal projection P^{\perp}_I u_0, the more we must scale up the correlator in order to maintain the normalization (8.18) of the contribution of the desired symbol at the output. Plugging into (8.25), we obtain the following geometric interpretation for the noise enhancement:
\zeta = \frac{SNR_{MF}}{SNR_{ZF}} = \frac{||u_0||^2}{||P^{\perp}_I u_0||^2}   (8.29)
This is intuitively reasonable: the noise enhancement is the inverse of the factor by which the ef-
fective signal energy is reduced because of looking along the orthogonal projection P⊥I u0 , instead
of along the desired vector u0 .
The following code fragment computes the ZF correlator and the noise enhancement for the
model (8.10). We find that the noise enhancement is 4.4 dB.
Code Fragment 8.2.2 Computing the ZF solution and its noise enhancement
%ZF example
%matrix with signal vectors as columns
U=transpose([0.5 -0.25 0 0;-0.5 1 0.5 -0.25;0 0 -0.5 1]);
%unit vector with one corresponding to u0
e=transpose([0 1 0]);
%coeffs of linear combination
a=(U’*U)\e;
%ZF correlator: linear comb of cols of U
czf=U*a;
%desired vector is second column
u0=U(:,2);
%check that ZF equations are satisfied
U’*czf %should be equal to the vector e
%noise_enhancement
noise_enhancement = (u0’*u0)*(czf’*czf)
%in dB
noise_enhancement_db = 10*log10(noise_enhancement)
While the matrix U is specified manually in the preceding code fragment, for longer channels, we
would typically automate the generation of U given the channel impulse response h, the equalizer
length L, the oversampling factor q, and the specification of how the observation interval lines
up with the response to the desired symbol (i.e., how to generate u0 from h).
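For longer channels, a helper along the following lines could be used; the function name, argument list, and the convention that offset is the 1-based index at which the desired response starts within the observation interval are illustrative assumptions rather than part of the code fragments in this chapter.
%Sketch (illustrative): build the matrix of signal vectors for the vector ISI model (8.13)
%h: channel response at rate q/T, L: equalizer length, q: oversampling factor,
%offset: 1-based index at which the response to the desired symbol starts in the window
function [U,i0] = build_signal_matrix(h,L,q,offset)
h = h(:); M = length(h);
kmax = ceil((L+M)/q);                  %generous range of symbol shifts to consider
U = []; i0 = [];
for k = -kmax:kmax                     %response to b[n+k] is h delayed by k*q samples
    u = zeros(L,1);
    idx = offset + (0:M-1) + k*q;      %sample positions within the observation interval
    keep = (idx >= 1) & (idx <= L);    %window to the observation interval
    u(idx(keep)) = h(keep);
    if any(u ~= 0) || k == 0           %drop interferers that miss the window entirely
        U = [U u];                     %#ok<AGROW>
        if k == 0, i0 = size(U,2); end %column index of the desired vector u0
    end
end
end
With h = [-0.5;1;0.5;-0.25], L = 4, q = 2 and offset = 1, this reproduces the matrix U (and the column index of u0) used in Code Fragment 8.2.2.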
ZF correlator for colored noise: Let us now discuss how to generalize the expressions for
the ZF correlator and its noise enhancement for colored noise, where w[n] has covariance matrix
Cw (assumed to be strictly positive definite, and hence invertible). We limit ourselves here to
stating the results; guidance for deriving these results is provided in Problem 8.9. The optimal
ZF solution, in terms of maximizing the output SNR while satisfying (8.17)-(8.18), is given by
c_{ZF} = C_w^{-1} U \left( U^T C_w^{-1} U \right)^{-1} e, \quad ZF correlator for colored noise   (8.30)
The corresponding SNR is given by
SNR_{ZF} = \frac{\sigma_b^2}{c_{ZF}^T C_w c_{ZF}}, \quad ZF SNR for colored noise   (8.31)
If there were no ISI, then the optimal correlator would be the whitened matched filter c = C_w^{-1} u_0, and the corresponding matched filter bound on SNR is given by
SNR_{MF} = \sigma_b^2\, u_0^T C_w^{-1} u_0, \quad matched filter bound for colored noise   (8.32)
Proceeding as before, the noise enhancement is given by
\zeta = \frac{SNR_{MF}}{SNR_{ZF}} = u_0^T C_w^{-1} u_0 \; c_{ZF}^T C_w c_{ZF}, \quad noise enhancement for colored noise   (8.33)
The reader is encouraged to check that, when we set C_w = σ^2 I in the preceding expressions, we recover the expressions derived earlier for white noise.
MMSE correlator: While we have seen how to adaptively implement the MMSE equalizer, if
we are given the vector ISI model (8.13), then we can compute the MMSE solution analytically
(see Problem 8.8 for the derivation) as follows:
c_{MMSE} = R^{-1} p, \quad where \;\; R = \sigma_b^2 U U^T + C_w, \quad p = \sigma_b^2 u_0   (8.34)
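Given the vector model, the MMSE correlator (8.34) is just as easy to compute numerically as the ZF correlator. The following sketch does this for the model (8.10), using the white noise simplification discussed earlier; the noise level chosen below is an illustrative assumption, and the SINR is evaluated using (8.16).
%Sketch: MMSE correlator (8.34) for model (8.10), assuming white noise Cw = sigma2*I
U = transpose([0.5 -0.25 0 0;-0.5 1 0.5 -0.25;0 0 -0.5 1]); %signal vectors as columns
u0 = U(:,2);                               %desired vector
sigma2b = 1; sigma2 = 0.1;                 %unit symbol energy; illustrative noise variance
Cw = sigma2*eye(4);
R = sigma2b*(U*U') + Cw; p = sigma2b*u0;
cmmse = R\p;                               %MMSE correlator (8.34)
czf = U*((U'*U)\[0;1;0]);                  %ZF correlator, as in Code Fragment 8.2.2
%output SINR (8.16) for an arbitrary correlator c
sinr = @(c) sigma2b*abs(c'*u0)^2/(sigma2b*(norm(c'*U)^2-abs(c'*u0)^2) + c'*Cw*c);
sinr_mmse_db = 10*log10(sinr(cmmse))
sinr_zf_db = 10*log10(sinr(czf))           %never larger than the MMSE SINR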
We summarize as follows. The zero-forcing equalizer drives the ISI to zero, while the linear
MMSE equalizer trades off ISI and noise at its output so as to maximize the SINR. For large
SNR, the contribution of the ISI is dominant, and the MMSE equalizer tends in the limit to the
zero-forcing equalizer (if it exists), and hence pays the same asymptotic penalty in terms of noise
enhancement. In practice, the MMSE equalizer often performs significantly better than the ZF
equalizer at moderate SNRs, but in order to improve equalization performance at high SNR, one
must look to nonlinear equalization strategies, which are beyond our present scope.
Extension to complex-valued signals: All of the preceding development applies to complex-
valued constellations and signals, except that vector transposes xT are replaced by conjugate
transposes xH , and the noise covariance matrix must include the effect of both the real and
imaginary parts of the noise.
Noise model: In order to model complex-valued WGN, we set Cw = 2σ 2 I. This can be gener-
ated by setting Re(w) and Im(w) to be i.i.d. N(0, σ 2 ). More generally, we consider circularly
symmetric, zero mean, complex Gaussian noise vectors w, which are completely characterized
by their complex covariance matrix,
C_w = E\left[ (w - E[w])(w - E[w])^H \right] = E\left[ w w^H \right]
We use the notation w ∼ CN(0, Cw ). Detailed discussion of circularly symmetric Gaussian ran-
dom vectors would distract us from our present purpose. Suffice it to say that circular symmetry
and Gaussianity is preserved under linear transformations. The covariance matrix evolves as
follows: if w = Bw̃, then Cw = BCw̃ BH . Thus, we can generate colored circularly symmetric
Gaussian noise w by passing complex WGN through a linear transformation. Specifically, if we
can write Cw = BBH (this can always be done for a positive definite matrix), we can generate
w as w = Bw̃, where w̃ ∼ CN(0, I).
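A small numerical illustration of this recipe is sketched below; the covariance matrix used is an arbitrary positive definite example, not one arising from a particular receive filter.
%Sketch: generating colored circularly symmetric Gaussian noise w ~ CN(0,Cw)
N = 4; nsamples = 10000;
Cw = toeplitz([2 0.8 0.2 0]);              %example positive definite covariance (assumed)
B = chol(Cw,'lower');                      %factor Cw = B*B'
wtilde = (randn(N,nsamples)+1j*randn(N,nsamples))/sqrt(2);  %i.i.d. CN(0,1) entries
w = B*wtilde;                              %colored noise with covariance B*B' = Cw
Cw_estimate = (w*w')/nsamples              %sample covariance, close to Cw for large nsamples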
The expressions for the ZF and MMSE correlators are as follows:
c_{ZF} = C_w^{-1} U \left( U^H C_w^{-1} U \right)^{-1} e, \quad ZF correlator for complex-valued signals
c_{MMSE} = R^{-1} p, \quad MMSE correlator for complex-valued signals   (8.35)
where R = \sigma_b^2 U U^H + C_w, \; p = \sigma_b^2 u_0, \; \sigma_b^2 = E[|b[n]|^2].
MMSE and SINR: While the SINR for any linear correlator can be computed as in (8.16),
we can obtain particularly simple expressions for the MSE and SINR achieved by the MMSE
correlator, as follows.
P2) Complex exponentials at different frequencies are orthogonal:
\langle s_n, s_m \rangle = \int s_n(t)\, s_m^*(t)\, dt = \int_{-\infty}^{\infty} e^{j2\pi(f_n - f_m)t}\, dt = \delta(f_m - f_n) = 0, \quad f_n \neq f_m
This is analogous to the properties of eigenvectors of matrices. Thus, complex exponentials are
eigenfunctions of any LTI system, as already pointed out in Chapter 2.
Conceptual basis for OFDM: For frequency domain transmission with symbol B[k] modu-
lating the complex exponential sn (t) = ej2πfn t , the transmitted signal is given by
u(t) = \sum_n B[n] e^{j2\pi f_n t}
When this goes through a dispersive channel h(t), we obtain (ignoring noise)
(u * h)(t) = \sum_n B[n] H(f_n) e^{j2\pi f_n t}
Note that the symbols {B[n]} do not interfere with each other after passing through the channel,
since different complex exponentials are orthogonal. Furthermore, regardless of how complicated
the time domain channel h(t) is, we have managed to parallelize the problem of equalization by
going to the frequency domain. Thus, we only need to estimate and compensate for the complex
scalar H(fn ) in demodulating the nth symbol. We now discuss how to translate this concept
into practice.
Finite signaling interval: The first step is to constrain the signaling interval, say to length T .
The complex baseband transmitted signal is therefore given by
u(t) = \sum_{n=0}^{N-1} B[n] e^{j2\pi f_n t} I_{[0,T]}(t) = \sum_{n=0}^{N-1} B[n] p_n(t)   (8.37)
where B[n] is the symbol transmitted using the modulating signal p_n(t) = e^{j2\pi f_n t} I_{[0,T]}(t), the nth subcarrier at frequency f_n. Let us now see how the properties P1 and P2 are affected by time limiting. The time limited tone p_n(t) has Fourier transform P_n(f) = T\, sinc((f - f_n)T)\, e^{-j\pi (f - f_n)T}, which decays quickly as |f - f_n| takes on values of the order of k/T. For a channel whose impulse response h(t) is approximately timelimited to T_d (the channel delay spread), the transfer function is approximately constant over frequency intervals of length B_c (the channel coherence bandwidth), roughly inversely proportional to T_d. If the signaling interval is large compared to the channel delay spread (T ≫ T_d), then 1/T is small compared to the channel coherence bandwidth (1/T ≪ B_c), so that the gain seen by P_n(f) is roughly constant, and the eigenfunction property is roughly preserved. That is, when P_n(f) goes through a channel with transfer function H(f), the output is
Q_n(f) = H(f) P_n(f) \approx H(f_n) P_n(f)   (8.38)
Regarding the orthogonality property P2, two complex exponentials that are constrained to an
interval of length T are orthogonal if the frequency separation is an integer multiple of 1/T :
\int_0^T e^{j2\pi f_n t} e^{-j2\pi f_m t}\, dt = \frac{e^{j2\pi(f_n - f_m)T} - 1}{j2\pi(f_n - f_m)} = 0, \quad for (f_n - f_m)T a nonzero integer   (8.39)
Thus, if we wish to send N symbols in parallel using N subcarriers (the term used for each time-
constrained complex exponential), we need a bandwidth of roughly N/T in order to preserve
orthogonality among the timelimited tones. Of course, even if we enforce orthogonality in this
fashion, the timelimited tones are not eigenfunctions of LTI systems, so the output corresponding
to the nth timelimited tone is not just a scalar multiple of itself. However, using (8.38), we can
approximate the channel output for the nth timelimited tone as H(fn )ej2πfn t I[0,T ] (t). Thus, the
output corresponding to the transmitted signal (8.37) can be approximated as follows:
N
X −1
y(t) ≈ B[n]H(fn )ej2πfn t I[0,T ] (t) + n(t) (8.40)
n=0
To summarize, once we limit the signaling duration to be finite, the ISI avoidance property of
OFDM is approximate rather than exact. However, as we now discuss, orthogonality between
subcarriers can be restored exactly in digital implementations of OFDM. Before discussing such
implementations, we provide some background on discrete time signal processing.
We can recognize this simply as the inverse DFT of the symbol sequence {B[n]}. We make this
explicit in the notation as follows:
b[k] = u(kT_s) = \sum_{n=0}^{N-1} B[n] e^{j2\pi nk/N}   (8.41)
If N is a power of 2 (which can be achieved by zeropadding if necessary), the samples {b[k]} can
be efficiently generated from the symbols {B[n]} using an inverse Fast Fourier Transform (IFFT).
The complex baseband waveform u(t) can now be obtained from its samples by digital-to-analog
(D/A) conversion. This implementation of an OFDM transmitter is as shown in Figure 8.9: the
bits are mapped to symbols, the symbols are fed in parallel to the inverse FFT (IFFT) block,
and the complex baseband signal is obtained by D/A conversion of the samples (after insertion
of a cyclic prefix, to be discussed after we motivate it in the context of receiver implementation).
Typically, the D/A converter is an interpolating filter, so that its effect can be subsumed within
the channel impulse response.
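A minimal sketch of this transmitter processing in discrete time is given below; the number of subcarriers, the QPSK symbol mapping, and the cyclic prefix length are illustrative assumptions (the role of the cyclic prefix is discussed shortly). Note that Matlab's ifft includes a 1/N factor, so the samples below differ from (8.41) by a constant scale factor, which does not affect the structure of the processing.
%Sketch: OFDM transmit samples via IFFT and cyclic prefix insertion (parameters assumed)
N = 64; Lcp = 16;                                   %subcarriers and cyclic prefix length
bits = randi([0 1],2*N,1);
B = (1-2*bits(1:2:end)) + 1j*(1-2*bits(2:2:end));   %QPSK symbols, one per subcarrier
b = ifft(B);                                        %time domain samples (up to the 1/N factor)
b_cp = [b(end-Lcp+1:end); b];                       %prepend cyclic prefix: last Lcp samples
%b_cp would then be passed through the D/A converter to the upconverter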
Note that the relation (8.41) can be inverted as follows:
B[n] = \frac{1}{N} \sum_{k=0}^{N-1} b[k] e^{-j2\pi nk/N}   (8.42)
This is exploited in the digital implementation of the OFDM receiver, discussed next.
Remark on Matlab FFT and IFFT conventions: Matlab puts a factor of 1/N in the IFFT rather than in the FFT as done in (8.41) and (8.42). In both cases, however, IFFT followed by FFT gives the identity. Note also that Matlab numbers vector entries starting with one, so the FFT of x[n], n = 1, ..., N, is given by
X[k] = \sum_{n=1}^{N} x[n] e^{-j2\pi (n-1)(k-1)/N}
Figure 8.9: DSP-centric implementation of an OFDM transmitter: bits from the encoder are mapped to symbols by the modulator, converted from serial to parallel (N complex symbols in), passed through the IFFT, converted back from parallel to serial with insertion of the cyclic prefix (N complex samples out), and sent through the digital-to-analog converter to the upconverter.
We have observed that, once we limit the signaling duration to be finite, the ISI avoidance
property of OFDM is approximate rather than exact. However, as we now show, orthogonality
between subcarriers can be restored exactly in discrete time by using a cyclic prefix, which allows
for efficient demodulation using an FFT. The noiseless received OFDM signal is modeled as
v(t) = \sum_{k=0}^{N-1} b[k]\, p(t - kT_s)
where the “effective” channel impulse response p(t) includes the effect of the D/A converter at
the transmitter, the physical channel, and the receive filter. When we sample this signal at rate
1/Ts , we obtain the discrete-time model
v[m] = \sum_{k=0}^{N-1} b[k]\, h[m-k]   (8.43)
where {h[l] = p(lTs )} is the effective discrete time channel of length L, assumed to be smaller
than N. We assume, without loss of generality, that h[l] = 0 for l < 0 and l ≥ L. We can rewrite
(8.43) as
v[m] = \sum_{l=0}^{L-1} h[l]\, b[m-l]   (8.44)
As noted in (8.42), the DFT of {b[k]} is the symbol sequence B[n], and the DFT of the effective discrete time channel is
H[n] = \sum_{l=0}^{L-1} h[l] e^{-j2\pi nl/N}   (8.45)
(the normalization is chosen differently in (8.42) and (8.45) to simplify the forthcoming equations). In order to parallelize
equalization across the N subcarriers, we would like the noiseless signal to equal V [n] = H[n]B[n].
However, this is not quite satisfied in our setting. We now discuss why not, and how to modify the
system so as to indeed enforce such a relationship. Before doing this, we need a brief discussion
of the DFT and its dual operation, the cyclic convolution.
DFT multiplication and cyclic convolution: The time domain samples {b[k]} defined via
the IDFT in (8.41) have range 0 ≤ k ≤ N − 1. If we now plug in integer values of k outside
this range, we simply get a periodic extension bN [k] of these samples with period N, satisfying
bN [k + N] = bN [k] for all k, with bN [k] = b[k], 0 ≤ k ≤ N − 1. Thus, the IDFT can be viewed as
a discrete time analogue of a Fourier series for a periodic time domain sample sequence {bN [k]}.
We know that, for the Fourier transform, “multiplication in the frequency domain corresponds to
convolution in the time domain.” We skipped the analogous result for Fourier series in Chapter
2 because we did not need it then. Now, however, we establish the appropriate result for the
discrete time Fourier series of interest here: for the DFT, if we multiply two sequences in the
frequency domain, then it corresponds to a cyclic, or periodic, convolution in the time domain.
While the result we wish to establish is general, let us stick with the notation we have already
established. Consider the “desired” sequence Ṽ [n] = H[n]B[n], n = 0, ..., N − 1, that we would
like to get when we take the DFT of the output of the channel. What is the corresponding time
domain sequence? To see this, take the IDFT:
\tilde{v}[m] = \sum_{n=0}^{N-1} H[n] B[n] e^{j2\pi mn/N}
Plugging in the expression (8.45) for the channel DFT coefficients, we obtain
\tilde{v}[m] = \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} h[l] e^{-j2\pi nl/N} B[n] e^{j2\pi mn/N} = \sum_{l=0}^{L-1} h[l] \sum_{n=0}^{N-1} B[n] e^{j2\pi (m-l)n/N}   (8.46)
Now, the summation over n corresponds to an IDFT, and therefore gives us b[m − l] as long as
0 ≤ m − l ≤ N − 1. Outside this range, it gives us the periodic extension {bN [m − l]}:
\sum_{n=0}^{N-1} B[n] e^{j2\pi(m-l)n/N} = b_N[m-l]
so that
\tilde{v}[m] = \sum_{l=0}^{L-1} h[l]\, b_N[m-l] = (h \odot b)[m]   (8.47)
where we have introduced the notation h ⊙ b to denote the cyclic convolution of h and b. While
we have derived this result in our particular context, it is worth stating that it holds generally:
the cyclic convolution modulo N between two sequences p and q, each of length at most N (it is
often convenient to think of them as having length N, using zeropadding if necessary) is defined
as the convolution over a period of length N of their periodic extensions with period N:
(p \odot q)[m] = \sum_{l=0}^{N-1} p_N[l]\, q_N[m-l]
The N point DFT of the cyclic convolution of these two sequences is the product of their DFTs.
Figure 8.10 illustrates cyclic convolution modulo N = 4 between a sample sequence {b[k]} of
length 4 and a channel impulse response of length 2, while Figure 8.11 illustrates the correspond-
ing linear convolution.
Figure 8.10: Example of cyclic convolution, giving v[0] = b[0]h[0] + b[3]h[1], v[1] = b[1]h[0] + b[0]h[1], v[2] = b[2]h[0] + b[1]h[1], v[3] = b[3]h[0] + b[2]h[1]. Time progresses clockwise on the circle. The sequence {b[k]} is flipped, and hence goes counter-clockwise. We then “slide” this flipped sequence clockwise in order to compute successive outputs. Clearly, the output is periodic with period N = 4.
Figure 8.11: Linear convolution of the two sequences in Figure 8.10 leads to an aperiodic sequence
of length 2 + 4 − 1 = 5. Note that the outputs at times 1,2 and 3 coincide with the outputs of
the corresponding cyclic convolution.
Figure 8.12: Flip and slide view of the periodic extension of the sample sequence {b[k]}: with a single extra sample (the cyclic prefix) ensuring complete overlap with the channel coefficients {h[k]}, the linear convolution produces v[0] = b[0]h[0] + b[3]h[1], v[1] = b[1]h[0] + b[0]h[1], v[2] = b[2]h[0] + b[1]h[1], v[3] = b[3]h[0] + b[2]h[1], coinciding with the cyclic convolution of Figure 8.10.
Let us summarize where we now stand. In order to parallelize the channel in the DFT domain, we
need a cyclic convolution in the time domain given by (8.47). However, what the physical channel
actually gives us is the linear convolution of the form (8.44). In order to get the cyclic convolution
we want, we simply need to send an appropriately large segment of a periodic extension of the
time domain samples {b[k]} through the channel. Indeed, if we only want to get N outputs
corresponding a single period of the output of the circular convolution, then we do not need a
full-fledged periodic extension. Figure 8.12 shows how to get the first N = 4 outputs of a linear
convolution to be equal to a period of the cyclic convolution in Figure 8.10 by inserting a single
sample. More generally, we need a cyclic prefix of length L − 1 for a channel of length L, as
discussed below.
Since L < N, we can write the circular convolution (8.47) as
\tilde{v}[m] = \sum_{l=0}^{\min(L-1,m)} h[l]\, b[m-l] + \sum_{l=m+1}^{L-1} h[l]\, b[m-l+N]   (8.48)
Comparing the linear convolution (8.44) and the cyclic convolution (8.48), we see that they are identical except when the index m − l takes negative values: in this case, b[m − l] = 0 in the linear convolution, while b[(m − l) mod N] = b_N[m − l] = b[m − l + N] contributes to the circular convolution. Thus, we can emulate a cyclic convolution using the physical linear convolution by
sending a cyclic prefix; that is, by sending
b[k] = bN [k] = b[N + k], k = −(L − 1), −(L − 2), ..., −1
before we send the samples b[0], ..., b[N − 1]. That is, we transmit the samples
b[N − L + 1], ..., b[N − 1], b[0], ..., b[N − 1]
incurring an overhead of (L − 1)/N which can be made small by choosing N to be large.
In the example depicted in Figure 8.12, N = 4 and L = 2, and we insert the cyclic prefix b[3],
sending b[3], b[0], b[1], b[2], b[3] (when this is flipped for the pictorial convolution in the figure, the
extra sample b[3] appears at the end).
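The following sketch checks this numerically for the example above (N = 4, L = 2); the sample values and channel taps are arbitrary illustrative choices.
%Sketch: a cyclic prefix turns the linear convolution into a cyclic convolution (N=4, L=2)
N = 4; b = [1 -1 1 1].'; h = [0.8 0.3].';   %illustrative samples and channel taps
tx = [b(end); b];                            %cyclic prefix of length L-1 = 1
v = conv(h,tx);                              %what the physical (linear) channel produces
v = v(2:N+1);                                %keep the N outputs following the prefix
max(abs(fft(v) - fft(h,N).*fft(b)))          %numerically zero: V[n] = H[n]B[n]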
At the receiver, the complex baseband signal is sampled at rate 1/Ts to obtain noisy versions of
the samples {b[k]}. The FFT of these samples then yields the model
Y [n] = H[n]B[n] + N[n] (8.49)
Figure 8.13: DSP-centric implementation of an OFDM receiver: the signal from the downconverter is sampled, passed through the analog-to-digital converter, converted from serial to parallel (N complex samples in), processed by the FFT, and converted from parallel to serial (N complex decision statistics out) for the demodulator and decoder. Carrier and timing synchronization blocks are not shown.
where the frequency domain noise samples N[n] are modeled as i.i.d. complex Gaussian, with
Re(N[n]) and Im(N[n]) being i.i.d. N(0, σ 2 ). If the receiver knows the channel, then it can
implement ML reception based on the statistic H ∗ [n]Y [n]. Thus, the task of channel equalization
has been reduced to compensating for scalar channel gains for each subcarrier. This makes
OFDM extremely attractive for highly dispersive channels, for which time domain singlecarrier
equalization strategies would be difficult to implement.
Channel estimation: Channel estimation (along with timing and carrier synchronization,
which are not considered here) is accomplished by sending pilot symbols. In Software Lab 8.2,
we send an entire OFDM symbol as a pilot, followed by a succession of other OFDM symbols
with payload.
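A toy version of this idea for a single pilot symbol followed by a single payload symbol is sketched below, working directly with the frequency domain model (8.49); the parameters, the BPSK alphabets, and the omission of noise are illustrative simplifications, and this is not the setup of Software Lab 8.2.
%Sketch: pilot-based per-subcarrier channel estimation and one-tap compensation
N = 64; L = 8;
h = (randn(L,1)+1j*randn(L,1))/sqrt(2*L);    %random channel taps (length assumed known)
H = fft(h,N);                                %per-subcarrier gains
Bpilot = 1-2*randi([0 1],N,1);               %known BPSK pilot symbols
Bdata  = 1-2*randi([0 1],N,1);               %payload symbols
Ypilot = H.*Bpilot; Ydata = H.*Bdata;        %frequency domain model (8.49), noise omitted
Hest = Ypilot./Bpilot;                       %channel estimate from the pilot OFDM symbol
Z = conj(Hest).*Ydata;                       %per-subcarrier statistic H*[n]Y[n]
nerrors = sum(sign(real(Z)) ~= Bdata)        %should be 0 in this noiseless sketch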
8.4 MIMO
The term Multiple Input Multiple Output (MIMO), or space-time communication, refers to com-
munication systems employing multiple antennas at the transmitter and receiver. We now pro-
vide a brief introduction to key concepts in MIMO systems, along with pointers for further
exploration.
While much effort and expertise must go into the design of antennas and their interface to
RF circuits, the following abstract view suffices for our purpose here: at the transmitter, an
antenna transduces electrical signals at radio frequencies into electromagnetic waves at the same
frequency that propagate in space; at the receiver, the antenna transduces electromagnetic waves
in a certain frequency range into electrical signals at the same set of frequencies. Antennas which
are insensitive to the direction of arrival/departure of the waves are termed omnidirectional or
isotropic (while there is no such thing as an ideal isotropic antenna, it is a convenient conceptual
building block). Antennas which are sensitive to the direction of arrival or departure are termed
directional. It is possible to synthesize directional responses using an array of omnidirectional
antenna elements, as we discuss next.
Figure 8.14: A plane wave impinging on a linear array with element spacing d, arriving at angle θ from broadside; the extra path length to the next element is d sin θ.
propagation (equal to 3 × 10^8 m/s in free space). For carrier frequency f_c, the corresponding phase shift is φ = 2πf_c τ = 2πf_c d \sin θ/c. The two expressions are equivalent, since λ = c/f_c.
The narrowband assumption: What is the effect of the differences in delays seen by successive
elements? Suppose that the wave impinging on element 1 is represented as
Re\left( u(t)\, e^{j2\pi f_c t} \right)
where u(t) = u_c(t) + j u_s(t) is the complex envelope, assumed to be of bandwidth W. Suppose
that the bandwidth W ≪ fc : this is the so-called “narrowband assumption,” which typically
holds in most practical settings. For the scenario shown in the figure, the wave arrives τ = ℓ/c
time units earlier at element 2. The wave impinging on element 2 can therefore be represented as
Re\left( u(t+\tau)\, e^{j2\pi f_c (t+\tau)} \right) = Re\left( u(t+\tau)\, e^{j\phi}\, e^{j2\pi f_c t} \right)
where φ = 2πf_c τ. Thus, the complex envelope of the wave at element 2 is v(t) = u(t + τ)e^{jφ}. The
time shift τ has two effects on the complex envelope: a time shift in the baseband waveform u,
along with a phase rotation φ due to the carrier. However, for most settings of interest, the time
shift in the baseband waveform can be ignored. To see why, suppose that the array parameters
are such that φ is of the order of 2π or less, in which case τ is of the order of 1/f_c or less. Under
the narrowband assumption, the time shift τ produces little distortion in u. To see this, note
that
u(t + τ ) ↔ U(f )ej2πf τ
As f varies over a range W , the frequency-dependent phase change produced by the time shift
varies over a range 2πW τ ∼ 2πW/fc ≪ 2π for W ≪ fc . Thus, we can ignore the effect of the time
shift on the complex envelope, and model the complex envelope at element 2 as v(t) ≈ u(t)ejφ .
Similarly, for element 3, the complex envelope is well approximated as u(t)ej2φ .
Array response and spatial frequency: Under the narrowband assumption, if the complex envelope
at element 1 is u(t), then the complex envelopes at the various elements can be collected into a
vector u(t)a, where
a = (1, e^{jφ}, e^{j2φ}, \ldots, e^{j(N-1)φ})^T   (8.50)
is the array response for a particular AoA. Making the dependence on the AoA θ explicit for
the linear array, we have φ(θ) = 2πd sin θ/λ, which yields a corresponding array response a(θ).
The linear increase in phase across antenna elements (i.e., across space) is analogous to the
linear increase of phase across time for a sinusoid. Thus, we call φ = φ(θ) the spatial frequency
corresponding to AoA θ. The collection of array responses {a(θ), θ ∈ [−π, π]} as we vary the AoA
is termed the array manifold.
Reciprocity: While Figure 8.14 depicts an antenna array receiving a wave, exactly the same
reasoning applies to an antenna array emitting a wave. In particular, the principle of reciprocity
tells us that the propagation channel from transmitter to receiver is the same as that from receiver
to transmitter. Thus, the array response of a linear array for angle of arrival θ is the same as
the array response for angle of departure θ.
Figure 8.15: MIMO signal processing architecture. There is one “RF chain” per antenna, down-
converting the signal received at that antenna to I and Q components.
Signal processing architecture: What the preceding complex baseband model means physically
is that, if we downconvert the RF signals at the outputs of the antenna elements (using the
same LO frequency and phase, and filters with identical responses, in each such “RF chain”),
then the complex envelopes corresponding to the different antenna elements will be related as
described above. Once the I and Q components for these complex envelopes are obtained,
they would typically be sampled and quantized using analog-to-digital converters (ADCs), and
then processed digitally. Such a DSP-centric signal processing architecture, depicted in Figure
8.15, allows the implementation of sophisticated MIMO algorithms in today’s cellular and WiFi
systems. While the figure depicts a receiver architecture, an entirely analogous block diagram can
be drawn for a MIMO transmitter, simply by reversing the arrows and replacing downconverters
by upconverters.
While the DSP-centric architecture depicted in Figure 8.15 has been key to enabling the widespread
deployment of low-cost MIMO transceivers, it may need to be revisited as carrier frequencies,
and the available signaling bandwidths, scale up. Both the cost and power consumption of ADCs
with adequate precision can be prohibitively large at very high sampling rates, hence alternative
architectures with MIMO processing done, wholly or in part, prior to ADC may need to be
considered. See the epilogue for further discussion.
8.4.2 Beamsteering
Once we know the array response for a given direction, we can maximize the received power
(for a receive antenna array) or the transmitted power (for a transmit antenna array) in that
direction by employing a spatial matched filter or spatial correlator. If the first antenna element
receives a complex baseband waveform (after downconversion and sampling) s[n] from AoA θ,
then the output of the antenna array is modeled as a vector of complex baseband discrete time
signals with kth component
y_k[n] = e^{j(k-1)φ(θ)} s[n] + w_k[n], \quad k = 1, 2, ..., N   (8.51)
where φ(θ) is the spatial frequency corresponding to θ, and where wk [n] are typically modeled
as complex WGN, independent across space and time: Re(wk [n]) and Im(wk [n]) i.i.d. N(0, σ 2 )
for all k, n. In vector notation, we can write
y[n] = a(θ)s[n] + w[n] (8.52)
where y[n] = (y1 [n], ..., yN [n])T , w[n] = (w1 [n], ..., wN [n])T , and a(θ) is the array response cor-
responding to direction θ. We have not discussed complex WGN in detail in this text, but in
analogy with the results in Chapters 5 and 6 for real WGN, it is possible to show that correlation
against a noiseless signal template is the right thing to do. Thus, regardless of the value of the
time domain sample s[n], the spatial processing that maximizes SNR is to correlate against the
noiseless template a(θ). That is, we wish to compute the decision statistics
Z[n] = hy[n], a(θ)i = aH (θ)y[n] (8.53)
Correlating the spatial signal against the array response in this fashion is termed beamform-
ing. The desired signal contribution to the decision statistic obtained from beamforming is ||a(θ)||^2 s[n] = N s[n]. Thus, the signal amplitude gets scaled by a factor of N, and hence the signal power gets scaled by a factor of N^2. It can be shown that the variance of the noise contribution to the decision statistic gets amplified by a factor of N. Thus, the SNR gets amplified by a factor of N by beamforming at the receiver. This is called the beamforming gain. Receive
beamforming is also termed maximal ratio combining, because it combines the spatial signal in
a manner that maximizes the signal-to-noise ratio.
Receive beamforming gathers energy coming from a given direction. Conversely, transmit beam-
forming can be used to direct energy in a given direction. For example, if a linear transmit
antenna array seeks to direct energy towards an angle of departure θ, then, in order to send a time domain sample s[n], it should transmit the spatial vector s[n]a^H(θ). Since the spatial channel to the receiver is a(θ), the signal received is given by s[n]a^H(θ)a(θ) = N s[n]. Thus, the received amplitude scales as N, and the received power as N^2. Since the noise at the receiver does not get the benefit of this transmit beamforming gain, transmit beamforming with N antennas leads to an SNR gain of N^2 relative to a single antenna system, if we fix the per-antenna emitted power. The signal transmitted from antenna k is s[n]e^{j(k-1)φ(θ)}, which has power |s[n]|^2, and
since we have N antenna elements, we are transmitting at N times the power. The additional
factor of N in received power comes from the fact that, by choosing the beamforming coefficients
appropriately, we are ensuring that the signals from these N antenna elements add up in phase
at the receiver, which leads to an N-fold gain.
Thus, both transmit and receive beamforming perform spatial matched filtering, leading to a
beamforming gain of N. That is, the SNR is enhanced by a factor of N. In addition, if each
element in a transmit antenna array transmits at a power equal to that of a reference single ele-
ment antenna, then we have an additional power combining gain of N for transmit beamforming,
leading to a net SNR gain of N^2.
Beamforming directs energy in a given direction by ensuring that the radio waves emitted or
received from that direction (or their complex envelopes) add constructively, or in phase. The
radio waves in other directions may add constructively or destructively, depending on the array geometry. Thus, it is of interest to characterize the beam pattern corresponding to a particular set of beamforming coefficients. If we are beamforming in direction θ_0, then the gain in an arbitrary direction θ is given by
G(θ; θ_0) = |\langle a(θ), a(θ_0) \rangle| = |a^H(θ_0)\, a(θ)|
The following code fragment computes and plots the beam pattern for a linear array.
Code Fragment 8.4.1 Plotting beam patterns for a linear array
Figure 8.16: Example beam patterns with a linear array, generated using code fragment 8.4.1.
Array spacing: In the preceding code fragment, we have set the element spacing at λ/3. In
Problem 8.10, we explore the effect of varying the element spacing, and in particular, what
happens as the element spacing exceeds λ/2.
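Beam patterns such as those in Figure 8.16 can be generated along the following lines; the element count N = 8, the two steering directions, and the dB normalization used for plotting are illustrative assumptions (only the λ/3 spacing is taken from the text).
%Sketch: beam pattern G(theta;theta0) for a linear array with spacing lambda/3
N = 8; d_over_lambda = 1/3;                                %element count assumed
theta = (-90:0.5:90)*pi/180;                               %angles with respect to broadside
a = @(th) exp(1j*2*pi*d_over_lambda*sin(th)*(0:N-1)).';    %array response (8.50)
for theta0 = [0 30]*pi/180                                 %two illustrative steering directions
    G = zeros(size(theta));
    for m = 1:length(theta)
        G(m) = abs(a(theta0)'*a(theta(m)));                %G(theta;theta0) = |a^H(theta0)a(theta)|
    end
    figure; plot(theta*180/pi, 20*log10(G));               %dB scale is a plotting choice
    xlabel('Angle with respect to broadside'); ylabel('Gain (dB)');
end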
Notational convention: We say that we employ beamforming weights or coefficients c = (c_1, ..., c_N)^T when we apply the coefficient c_i^* to the ith antenna element. For a receive beamformer, if the spatial signal being received is y = (y_1, ..., y_N)^T, then the use of beamforming weights c corresponds to computing the inner product \langle y, c \rangle = c^H y = \sum_{i=1}^{N} c_i^* y_i. With this convention, the beamforming weights for directing a beam in direction θ are given by c = a(θ).
Steering nulls: As we see from Figure 8.16, when we form a beam in a given direction, we
maximize the beam pattern in that direction, creating a main lobe in the beam pattern, while
also generating other local maxima (typically of lower strength) in other directions. The latter
are called sidelobes, and are often small enough compared to the main lobe that we do not worry
about them. Sometimes, however, we want to be extra careful in guaranteeing that power is not
accidentally steered in an undesired direction. For example, a cellular base station employing
a beamforming array to receive a signal from mobile A may wish to null out interference from
mobile B. We can use a ZF approach, analogous to the one discussed in detail in Section 8.2.2.
If mobile A is in direction θA and mobile B in direction θB , then we wish to align c with a(θA ) as
best we can, while staying orthogonal to a(θB ). Thus, we can choose the beamforming weights
to be a scaled version of the projection of a(θA ) orthogonal to the interference subspace spanned
by a(θB ), which is given by
c_A = a(θ_A) - \langle a(θ_A), a(θ_B) \rangle \frac{a(θ_B)}{\langle a(θ_B), a(θ_B) \rangle}   (8.54)
While the ZF approach has the advantage of having a clear geometric interpretation, in practice,
when implementing this at the receiver, we may often employ the MMSE criterion (see Sections
8.2 and 8.2.2), which lends itself to adaptive implementation.
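A small sketch of this zero-forcing choice is given below; the array size, spacing, and the directions of the two mobiles are illustrative assumptions.
%Sketch: steer a beam towards mobile A while placing a null on mobile B, using (8.54)
N = 8; d_over_lambda = 1/3;
a = @(th) exp(1j*2*pi*d_over_lambda*sin(th)*(0:N-1)).';    %array response (8.50)
thetaA = 20*pi/180; thetaB = -40*pi/180;                   %illustrative angles
aA = a(thetaA); aB = a(thetaB);
cA = aA - (aB'*aA)/(aB'*aB)*aB;             %projection of a(thetaA) orthogonal to a(thetaB)
gain_towards_A = abs(cA'*aA)                %large: close to the unconstrained beamformer
leakage_towards_B = abs(cA'*aB)             %numerically zero: the null is enforced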
We can combine beam and null steering in this fashion at the transmitter as well as the receiver.
There are some additional issues when employing this approach at the transmitter. First, the
transmitter must know the array responses corresponding to the different receivers it is steering
beams or nulls towards, which requires either explicit feedback, or implicit feedback derived
from reciprocity. Second, we must scale the weights appropriately depending on constraints on
transmit power: average power scales with ||c||^2, while peak power scales with max_i |c_i|^2.
Space division multiple access (SDMA): Beamforming and nullforming can enable a single receiver
to receive from multiple transmitters, and conversely, a single transmitter to transmit separate
messages to different receivers, using a common set of time-frequency resources. This is termed
space division multiple access (SDMA). For example, in order to send a message signal sA (t)
to mobile A without interfering with mobile B, and message signal sB (t) to mobile B without
interfering with mobile A, the transmitter sends the “space-time” signal
s_A(t)\, c_A^* + s_B(t)\, c_B^*
where c_A is the zero-forcing solution in (8.54), and c_B is a zero-forcing solution with the roles
of A and B interchanged. That is, the signal transmitted from the ith antenna is a linear
combination of the two message signals, yi (t) = sA (t)c∗i,A + sB (t)c∗i,B , where the conjugation of
the beamforming weights is in accordance with the convention discussed earlier. A receiver with
an antenna array can use similar techniques to receive signals from multiple transmitters at the
same time. SDMA is explored further in Problem 8.11.
Figure 8.17: Ray tracing to determine paths between a transmitter-receiver inside a “two-
dimensional room.” All first-order reflections, and two second-order reflections, are shown. The
lightly shaded circles depict “virtual sources” employed to perform ray tracing.
Figure 8.18: A typical propagation environment between an elevated base station and a mobile
in urban clutter. The mobile sees a rich scattering environment locally, due to reflections from
building and street surfaces. However, from the base station’s viewpoint, the paths to the mobile
fall within a narrow angular spread.
Exercise: What is the total number of second-order reflections in the scenario depicted in Figure
8.17?
Even in outdoor settings, such as for cellular networks, mobiles in an urban environment may
see rich scattering because of bounces from buildings around them. An elevated base station,
however, may still see a relatively sparse scattering environment. Such a situation is depicted in
Figure 8.18. Since the base station sees a narrow angular spread, it may be able to employ beam-
forming strategies effectively (e.g., forming a beam along the “mean” angle of arrival/departure).
However, the mobile transceiver must account for the rich scattering environment that it sees.
At this point, the reader is encouraged to quickly review Section 2.9. As we noted there, a
multipath channel has a transfer function which is “frequency-selective” (i.e., it varies with
frequency). Now that we have multiple antennas, each antenna sees a frequency-selective channel,
so that the net array response is frequency-selective. However, we can model the array response as
constant for a small enough frequency slice (smaller than the coherence bandwidth–see discussion
in Section 2.9). OFDM (see Section 8.3) naturally decomposes the channel into such slices, and
each subcarrier in a MIMO-OFDM system may see a different array response. Thus, we can apply
MIMO processing in parallel to each subcarrier after downconversion and OFDM processing, as
shown in Figure 8.19.
Figure 8.19: Typical MIMO-OFDM receiver architecture. After downconverting and sampling
the received signal from each antenna, we apply OFDM processing to separate out the subcarriers.
After the FFT, the samples for a given subcarrier, say k, from the different antennas are collected
together for per-subcarrier MIMO processing. Thus, each subcarrier sees a different narrowband
MIMO channel.
Focusing on a single subcarrier in a MIMO-OFDM system (this model also applies to narrowband
signaling with bandwidth smaller than the channel coherence bandwidth), consider a link with
M transmit antennas and N receive antennas. Over a subcarrier, the channel from transmit
element m to receive element n is a complex-valued scalar, which we denote by Hnm . If the
transmitter sends a complex symbol x_m from antenna m, then the nth receive antenna sees the linear combination
y_n = \sum_{m=1}^{M} H_{nm} x_m + w_n   (8.56)
where wn is the complex-valued noise seen at the nth receive antenna. The preceding can be
written in matrix-vector notation as
y = Hx + w (8.57)
where y = (y1 , ..., yN )T is the received vector, x = (x1 , ..., xM )T is the transmitted vector, and H
is the N × M channel matrix, whose mth column is the receive array response seen by the mth
transmit element.
Noise model: The complex-valued noise wn is typically modeled as follows: Re(wn ), Im(wn ) are
i.i.d. N(0, σ^2), and are independent across receive antennas. The noise vector w is said to be a complex Gaussian random vector which is completely characterized by its mean E[w] = 0 and covariance matrix C_w = E\left[ (w - E[w])(w - E[w])^H \right] = 2σ^2 I. Its distribution is denoted by w ∼ CN(0, 2σ^2 I), and the distribution of any entry is specified as w_n ∼ CN(0, 2σ^2).
Remark on notation: According to our convention (which is consistent with most literature in
the field), for an M × N MIMO system (i.e., with M transmit antennas and N receive antennas),
the channel matrix H is an N × M matrix. The reason for this choice of convention is that we
like working with column vectors: x is the M × 1 column vector of symbols transmitted from the
different transmit antennas, the mth column of H is the receiver’s spatial response to the mth
transmit antenna, and y is the N × 1 column vector of received samples.
Operations such as beamforming and nullforming can now be performed separately for each
subcarrier. However, these operations are no longer associated with directing energy or nulls
towards particular physical directions, since the spatial response in each subchannel is a linear
combination of array responses associated with many directions. A particularly simple model for
the resulting channel gains for a given subcarrier is described next.
Rich scattering model: The path gains H(n, m) for a given subcarrier are a function of the
channel impulse responses between each transmit/receive pair, but are often modeled statistically
in order to provide quick insights into design tradeoffs in a manner that is independent of the
specific propagation geometry. We now discuss a particularly simple model, motivated by “rich
scattering environments” in which there are a large number of paths of roughly equal strength
between the transmitter and receiver. Let h = H(n, m) denote the complex gain between a
typical transmit/receive antenna pair. We can write
h = \sum_{i=1}^{L} A_i e^{jθ_i}
where L is the number of paths, and where Ai ≥ 0, θi ∈ [0, 2π] are the amplitude and phase of
the complex-valued path gain for the given subcarrier. We therefore have
Re(h) = \sum_{i=1}^{L} A_i \cos θ_i, \quad Im(h) = \sum_{i=1}^{L} A_i \sin θ_i
If the differences between the lengths of the different paths are comparable to, or larger than,
a carrier wavelength (which is typically the case even for WiFi links indoors, and certainly for
cellular links outdoors), then we can model the phases θ_i as i.i.d. uniform over [0, 2π]. Now, if the amplitudes for the different paths are roughly comparable, then we can apply the central limit theorem to approximate the joint distribution of Re(h) and Im(h) as i.i.d. N(0, \sum_{i=1}^{L} A_i^2/2). Let us now normalize \sum_{i=1}^{L} A_i^2 = 1 without loss of generality; we can scale the noise variance to adjust the average SNR: \overline{SNR} = E[|h|^2]/(2σ^2) = 1/(2σ^2) for this model. We can therefore model h as a
zero mean complex Gaussian random variable: h ∼ CN(0, 1). Furthermore, for rich scattering
environments, it is assumed that the phases seen by different transmit/receive antenna pairs
are sufficiently different that we can model the gains H(n, m) as i.i.d. CN(0, 1) for different
transmit/receive antenna pairs (m, n).
8.4.4 Diversity
When the transmitter and receiver each have only one antenna (M = N = 1), under the rich
scattering model, the SNR is given by
SNR = \frac{|h|^2}{2σ^2}   (8.58)
Since h is a random variable, so is the SNR. In fact, since Re(h) and Im(h) are i.i.d. N(0, 1/2), the sum of their squares is an exponential random variable (see Problems 5.11 and 5.21). Taking into account the scaling by 2σ^2, we can show that SNR is an exponential random variable with mean equal to the average SNR, \overline{SNR} = 1/(2σ^2). If we now design our coded modulation strategy for a nominal SNR of SNR_0, we say that the system is in outage when the SNR is smaller than this value. The probability of outage is given by
P_{out} = P[SNR < SNR_0] = 1 - e^{-SNR_0/\overline{SNR}}   (8.59)
We would typically choose the nominal SNR, SNR_0, to be smaller than the average SNR, \overline{SNR}, by a link margin. For example, for a link margin of 10 dB, we have SNR_0 = 0.1\,\overline{SNR}, so that P_{out} = 1 - e^{-0.1} ≈ 0.1 (for |x| small, e^x ≈ 1 + x, so that 1 - e^{-x} ≈ x). Thus, even after giving up 10 dB in link margin, we still get a relatively high outage rate of 10%. Of course, there is a nontrivial probability that the SNR with fading is higher than the nominal, hence we can have negative link margins if we are willing to live with large enough outage rates. For example, a link margin of −3 dB corresponds to SNR_0 = 2\,\overline{SNR}, with outage rate P_{out} = 1 - e^{-2} ≈ 0.865 (too high for most practical applications).
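The 10% figure is easy to verify by simulation; the following Monte Carlo sketch (with an arbitrary number of trials) generates Rayleigh fading gains and counts how often the instantaneous SNR falls below the nominal SNR.
%Sketch: outage rate of a SISO Rayleigh fading link with a 10 dB link margin
ntrials = 1e6;
h = (randn(ntrials,1)+1j*randn(ntrials,1))/sqrt(2);   %h ~ CN(0,1)
snr_normalized = abs(h).^2;                            %SNR divided by the average SNR
pout = mean(snr_normalized < 0.1)                      %close to 1-exp(-0.1) = 0.095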
In order to reduce the outage rate without increasing the link margin, we must employ diversity,
which is a generic term used for any strategy that gets multiple, approximately independent,
“looks” at a fading channel. We saw diversity in action for our simulation-based model in
Software Lab 2.2. We now explore it for the rich scattering model, skipping some details in the
derivation in the interest of arriving quickly at the key insights.
Benchmark: We continue to define our link margin relative to an unfaded single input single output (SISO) system with average SNR \overline{SNR} = 1/(2σ^2).
Receive diversity: Consider a receiver equipped with two antennas (N = 2). If they are spaced
far enough apart in a rich scattering environment, we can assume that the channel gains (for a
given subcarrier) seen by the two antennas are i.i.d. CN(0, 1) random variables. The received
samples at the two antennas are modeled as
yn = hn x + wn , n = 1, 2
where x is the transmitted symbol, h_1, h_2 ∼ CN(0, 1) are i.i.d. (independent Rayleigh fading), and w_1, w_2 ∼ CN(0, 2σ^2) are i.i.d. (independent noise samples). The optimal decision statistic is obtained using receive beamforming, and is given by
Z = h_1^* y_1 + h_2^* y_2   (8.60)
It can be shown that the SNR is now given by
Equations (8.61) and (8.63) generalize directly to N receive antennas:
As N gets large, the fluctuations due to fading get smoothed away, and Gdiversity → 1 by the law
of large numbers. In practice, however, even small values of N (e.g., N = 2, 4) give significant
performance gains.
Figure 8.20: Probability of outage versus link margin (dB) for receive diversity in 1 × N MIMO systems (N = 1, 2, 4).
Suppose now that we design our coding and modulation so as to provide reliable performance at
a nominal SNR, say SNR0 , which is smaller than the SISO benchmark SNR by a link margin
of L dB: SNR0 (dB) = SNR(dB) − L(dB). The probability of outage is given by
Figure 8.20 plots the outage probability as a function of link margin for several different values
of N. The plots are obtained using the procedure outlined in Problem 8.12.
Transmit diversity: If the transmitter has multiple antennas, it can beamform towards the
receiver if it has implicit or explicit feedback regarding the channel, as already noted. When
such feedback is not available, we would like to use open loop strategies which provide diversity.
Consider a transmitter with two antennas communicating with a receiver with a single antenna
(M = 2, N = 1). In a MIMO-OFDM system, for a given subcarrier, suppose that the transmit
antenna 1 sends the sample x1 and transmit antenna 2 sends the sample x2 . If the transmitter
knows the channel coefficients h1 and h2 from the two transmit antennas to the receive antenna,
then it could choose x1 = h∗1 x, x2 = h∗2 x, where x is the symbol to be transmitted. What do we
do when h1 and h2 are unknown? In general, if we send x1 , x2 from the two transmit elements,
then the received sample is given by
y = h1 x1 + h2 x2 + w (8.66)
where w is noise. Thus, if x_1 and x_2 are two independent symbols, then they interfere with each other at the receiver. On the other hand, if we set x_1 = x_2 = x/\sqrt{2} (normalizing the transmit power across the two antennas to that of a transmitter with a single antenna), then we receive
y = \frac{h_1 + h_2}{\sqrt{2}}\, x + w
If h_1, h_2 are i.i.d. CN(0, 1), it is easy to show that the effective channel coefficient h_{eff} = (h_1 + h_2)/\sqrt{2} is also CN(0, 1). Thus, we still have Rayleigh fading, and have not made any progress relative to
a single antenna transmitter! An ingenious solution to this problem is the Alamouti space-time
code (named after its inventor), which resolves the interference between the signals sent by the
two transmit antennas over two time samples. Let s[1] and s[2] be two symbols to be transmitted.
For a single antenna transmitter, they would be transmitted in sequence. For the two antenna
transmitter now being considered, let us expand the signal space dimension to two at the receiver
by considering two successive time samples. This allows us to orthogonalize the contributions of
these two symbols at the receiver. Denoting by xi [1] and xi [2], i = 1, 2 the samples transmitted
from antenna i at two successive time intervals, we set
x_1[1] = b[1]/\sqrt{2}, \quad x_2[1] = b[2]/\sqrt{2}
x_1[2] = -b^*[2]/\sqrt{2}, \quad x_2[2] = b^*[1]/\sqrt{2}   (8.67)
where we have again normalized the net transmit power to that of a single antenna system.
Figure 8.21 depicts the operation of the Alamouti space-time code, taking a sequence of symbols
as input, and mapping them in groups of two to a sequence of samples at the output of each
antenna.
Figure 8.21: The transmitter in an Alamouti space-time code takes two symbols at a time, and maps them to two consecutive symbols to be sent from each of the two transmit antennas: the input sequence {b[1], b[2], b[3], b[4], ...} is mapped to {b[1], −b^*[2], b[3], −b^*[4], ...} on antenna 1 and {b[2], b^*[1], b[4], b^*[3], ...} on antenna 2. The input to the space-time encoder is the sequence of symbols to be transmitted, {b[n]}, while the outputs are the sequences {x_i[n]}, i = 1, 2, to be transmitted from antenna i. The 1/\sqrt{2} factor for power normalization is omitted from the figure.
The received samples in the two successive time intervals are given by
y[1] = \frac{h_1 b[1] + h_2 b[2]}{\sqrt{2}} + w[1], \quad y[2] = \frac{-h_1 b^*[2] + h_2 b^*[1]}{\sqrt{2}} + w[2]   (8.68)
We assume that the receiver has estimates of the channel coefficients h1 and h2 (e.g., using known
training signals). We would like to write the two observations as a received vector in which each
symbol modulates a different signal vector. Since the symbols are conjugated when sent over the
second time interval, we conjugate the second received sample when creating the received vector.
This yields the following vector model:
\tilde{y} = \begin{pmatrix} y[1] \\ y^*[2] \end{pmatrix} = b[1] \begin{pmatrix} h_1/\sqrt{2} \\ h_2^*/\sqrt{2} \end{pmatrix} + b[2] \begin{pmatrix} h_2/\sqrt{2} \\ -h_1^*/\sqrt{2} \end{pmatrix} + \begin{pmatrix} w[1] \\ w^*[2] \end{pmatrix} = b[1] u_1 + b[2] u_2 + w   (8.69)
The vectors u_1 = \frac{1}{\sqrt{2}}(h_1, h_2^*)^T and u_2 = \frac{1}{\sqrt{2}}(h_2, -h_1^*)^T are orthogonal (i.e., u_1^H u_2 = 0), regardless of the values of the channel coefficients, hence the symbols b[1] and b[2] do not interfere with each other. The vector w ∼ CN(0, 2σ^2 I). The optimal decision statistic Z[i] for symbol b[i] is given by matched filtering against u_i:
Z[i] = u_i^H \tilde{y}, \quad i = 1, 2   (8.70)
Exercise: Write out these decision statistics explicitly in terms of y[1], y[2], h_1, h_2.
Answer: Z[1] = h_1^* y[1] + h_2 y^*[2], Z[2] = h_2^* y[1] − h_1 y^*[2] (up to scale).
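A minimal sketch of Alamouti encoding and decoding for one pair of symbols over a noiseless 2 × 1 channel is given below; the QPSK alphabet and the omission of noise are illustrative simplifications.
%Sketch: Alamouti space-time code over a 2x1 channel (noiseless, illustrative)
h = (randn(2,1)+1j*randn(2,1))/sqrt(2);                 %h1, h2 ~ CN(0,1)
b = ((1-2*randi([0 1],2,1)) + 1j*(1-2*randi([0 1],2,1)))/sqrt(2);  %symbols b[1], b[2]
x1 = [b(1); -conj(b(2))]/sqrt(2);                       %antenna 1 over two symbol times (8.67)
x2 = [b(2);  conj(b(1))]/sqrt(2);                       %antenna 2 over two symbol times (8.67)
y = h(1)*x1 + h(2)*x2;                                  %received samples y[1], y[2], as in (8.66)
ytilde = [y(1); conj(y(2))];                            %conjugate the second sample, as in (8.69)
u1 = [h(1); conj(h(2))]/sqrt(2); u2 = [h(2); -conj(h(1))]/sqrt(2);
Z = [u1'*ytilde; u2'*ytilde]                            %decision statistics (8.70)
Z./b                                                    %both ratios equal (|h1|^2+|h2|^2)/2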
The SNR seen by each symbol is given by
SNR_{Alamouti} = \frac{||u_i||^2}{2σ^2} = \frac{(|h_1|^2 + |h_2|^2)/2}{2σ^2} = G_{Alamouti}\, \overline{SNR}   (8.71)
where
G_{Alamouti} = \frac{|h_1|^2 + |h_2|^2}{2}   (8.72)
Comparing with (8.64) for receive diversity, we see that the Alamouti scheme in a 2 × 1 system
achieves the same diversity gain as receive diversity in a 1 × 2 system, but does not provide the
coherent gain obtained from averaging across receive antennas in the latter. Of course, as we see
in Problem 8.13 and in Software Lab 8.3, the Alamouti scheme applies to 2 × N MIMO systems
for arbitrary N, so that we can get noise averaging and receive diversity gains for N > 1. The
outage rates computed in Problem 8.13 are plotted in Figure 8.22.
Figure 8.22: Probability of outage versus link margin (dB) for Alamouti space-time coding for 2 × N MIMO (N = 1, 2, 4) with rich scattering.
The simplicity of the Alamouti construction (and its optimality for 2 × 1 MIMO) has led to its
adoption by a number of cellular and WiFi standards (just do an Internet search to see this).
Unfortunately, the orthogonalization provided by the Alamouti space-time code does not scale
to more than two transmit antennas. There are a number of “quasi-orthogonal” constructions
that have been investigated, but so far, these have had less impact on practice. Indeed, when
there are a large number of transmit antennas, the trend is to engineer the system so that
the transmitter has enough information about the channel to perform some form of transmit
beamforming (possibly using multiple beams).
Figure 8.23: For spatial multiplexing, the transmitter may take a sequence of incoming symbols
{b[n]}, and do a serial-to-parallel conversion to map them to subsequences to be transmitted from
the different antennas. In the example shown, the odd symbols are transmitted from antenna 1,
and the even symbols from antenna 2. The aggregate symbol rate is twice the per-stream symbol
rate.
For example, suppose that the transmitter and receiver in a MIMO-OFDM system each have
two antennas (M = N = 2). For a given subcarrier in an OFDM system, consider a particular
time interval. Suppose that the transmitter sends x1 from antenna 1 and x2 from antenna 2
(referring to Figure 8.23, x1 = b[1] and x2 = b[2] in the first time interval). The samples at the
two receive elements are given by
y_1 = H_{11} x_1 + H_{12} x_2 + w_1, \quad y_2 = H_{21} x_1 + H_{22} x_2 + w_2
which can be written in vector form as
y = x_1 u_1 + x_2 u_2 + w   (8.73)
where u_1 is the response seen by transmit element 1 at the two receive antennas, u_2 is the response seen by transmit element 2 at the receive antennas, and w ∼ CN(0, 2σ^2 I) is complex
WGN. While we have considered a 2 × 2 MIMO system for illustration, the model is generally
applicable for 2 × N MIMO systems with N ≥ 2, with u1 and u2 denoting the two columns
of the channel matrix H, corresponding to the received responses for each of the two transmit
antennas, respectively.
In a MIMO-OFDM system, we have eliminated interference across subcarriers using OFDM, but
we have introduced interference in space by sending multiple symbols from different antennas.
The vector spatial interference model (8.73) for each subcarrier is analogous to the vector time
domain interference model for ISI in singlecarrier systems discussed in Chapter 6. Just as we can
compensate for ISI using a time-domain equalizer if the time domain channel has appropriate
characteristics, we can compensate for spatial interference using a spatial equalizer if the spatial
channel has appropriate characteristics. For example, if u1 and u2 are linearly independent,
then we can use linear ZF or MMSE techniques to demodulate the symbols x1 and x2 . Thus,
if there are M parallel data streams being sent from the transmit antennas, then we need at
least M receive antennas in order to obtain a signal space of large enough dimension for the
linear independence condition to be satisfied. Indeed, it can be shown more generally (without
restricting ourselves to ZF or MMSE techniques) that, for rich scattering models, the capacity
of an M × N MIMO channel scales as min(M, N), the minimum of the number of transmit and
receive antennas.
The ZF and MMSE receivers have been discussed in detail in Section 8.2.2. For our purpose, the
relevant expressions are those for complex-valued signals in (8.35). We reproduce the expression
for the ZF correlator here before adapting it to our present purpose:
cZF = Cw^{-1} U (U^H Cw^{-1} U)^{-1} e
Recall that U is a matrix containing the signal vectors as columns, and that e is a unit vector
with nonzero entry corresponding to the desired vector u0 in the ISI vector model (8.13). In
our spatial multiplexing model (8.73), U = H (the signal vectors are simply the columns of
the channel matrix), the noise covariance Cw = 2σ 2 I, and we wish to demodulate the data
corresponding to both of the signal vectors. Letting e1 = (1, 0)T and e2 = (0, 1)T , the ZF
correlators for the two streams can be written as (dropping scale factors corresponding to the
noise variance)
c1 = H (H^H H)^{-1} e1,   c2 = H (H^H H)^{-1} e2
We can represent this compactly as a single ZF matrix CZF = [c1 c2] containing these correlators
as columns. Noting that [e1 e2] = I, we obtain
CZF = H (H^H H)^{-1},   ZF matrix for spatial demultiplexing    (8.74)
The decision statistics for the multiplexed streams are given by
Z = CZF^H y    (8.75)
While we have focused on the 2 × 2 example (8.73) in this derivation, it applies in general to M
spatially multiplexed streams in an M × N MIMO system with N ≥ M. The outage rate with
zero-forcing reception for 2 × 2 and 2 × 4 is plotted in Figure 8.24, using software developed in
Problem 8.14.
The MMSE receiver can be similarly derived to be
CMMSE = (H H^H + 2σ² I)^{-1} H,   MMSE matrix for spatial demultiplexing    (8.76)
where we have normalized the transmitted symbols to unit energy (E[|b[n]|²] = σb² = 1).
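A minimal Matlab sketch of spatial demultiplexing as in (8.74)-(8.76) follows; the channel realization, noise level, and QPSK symbols are illustrative choices, not values from the text.
% ZF and MMSE spatial demultiplexing for 2-fold multiplexing over a 2xN channel
M = 2; N = 4; sigma2 = 0.05;                         % example parameters
H = (randn(N,M)+1i*randn(N,M))/sqrt(2);              % rich scattering channel (N x M)
x = (sign(randn(M,1))+1i*sign(randn(M,1)))/sqrt(2);  % two unit-energy QPSK symbols
y = H*x + sqrt(sigma2)*(randn(N,1)+1i*randn(N,1));   % received vector, noise CN(0,2*sigma2)
C_ZF = H/(H'*H);                                     % ZF matrix (8.74)
C_MMSE = (H*H' + 2*sigma2*eye(N))\H;                 % MMSE matrix (8.76)
Z_ZF = C_ZF'*y;                                      % decision statistics (8.75)
Z_MMSE = C_MMSE'*y;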
It is interesting to compare the spatial multiplexing model (8.73) with the diversity model (8.69)
for the Alamouti space-time code. The Alamouti code does not rely on the receiver having
multiple antennas, and therefore uses time to create enough dimensions for two symbols to be
sent in parallel. Furthermore, the vectors u1 and u2 in the Alamouti model (8.69) are constructed
such that they are orthogonal regardless of the propagation channel. In contrast, the spatial
multiplexing model (8.73) relies on nature to provide vectors u1 and u2 that are “different
enough” to support two parallel symbols. We explore these differences in Software Lab 8.3. The
spectral efficiency of spatial multiplexing is twice that of the Alamouti code, but the diversity
gain that it sees is smaller, as is evident from a comparison of the outage rate versus link margin
curves in Figures 8.22 and 8.24. It is possible to systematically quantify the tradeoff between
diversity and multiplexing, but this is beyond our scope here.
Figure 8.24: Outage rate versus link margin (with respect to the SISO benchmark) for 2 × N
spatial multiplexing (N = 2, 4) with zero-forcing reception.
• Explicit analytical formulas can be given for the ZF and MMSE equalizer, and the associated
SINRs, once the vector ISI model is specified.
• At high SNR, the MMSE equalizer converges to the ZF equalizer, which (for white noise)
can be interpreted geometrically as projecting the received vector orthogonal to the interference
subspace spanned by the interference vectors, thus nulling out the ISI while incurring noise en-
hancement. The ZF equalizer exists only if the desired vector is linearly independent of the
interference vectors.
• The geometric interpretation and analytical formulas for the ZF and MMSE equalizers devel-
oped for white noise can be extended to colored noise, with the derivation using the concept of
noise whitening.
OFDM
• Since complex exponentials are eigenfunctions of any LTI system, multiple complex expo-
nentials, each modulated by a complex-valued symbol, do not interfere with each other when
transmitted through a dispersive channel. Each complex exponential simply gets scaled by the
channel transfer function at that frequency. The task of equalization corresponds to undoing this
complex gain in parallel for each complex exponential. This is the conceptual basis for OFDM,
which enables parallelization of the task of equalization even for very complicated channels by
transmitting along subcarriers.
• OFDM can be implemented efficiently in DSP using an IDFT at the transmitter (frequency
domain symbols to time domain samples) and a DFT at the receiver (time domain samples to
frequency domain observations, which are the symbols scaled by channel gain and corrupted by
noise).
• In order to maintain orthogonality across subcarriers (required for parallelization of the task
of equalization) when we take the DFT at the receiver, the effect of the channel on the trans-
mitted samples must be that of a circular convolution. Since the physical channel corresponds
to linear convolution, OFDM systems emulate circular convolution by inserting a cyclic prefix in
the transmitted time domain samples.
The second part of the chapter provides an initial exposure to how multiple antennas at the
transmitter and receiver (i.e., MIMO, or space-time techniques) can be employed to enhance
the performance of wireless systems. Three key techniques, which in practice are combined in
various ways, are beamforming, diversity and spatial multiplexing.
Beamforming and Nullforming
• The array response for a linear array can be viewed as a mapping from the angle of ar-
rival/departure to a spatial frequency.
• For an N-element array, spatial matched filtering, or beamforming, leads to a factor of N gain in
SNR. For transmit beamforming in which each antenna element is transmitting at a fixed power,
we obtain an additional power combining gain of N.
• By forming a beam at a desired transceiver and nulls at other transceivers, an antenna array
can support SDMA.
MIMO-OFDM abstraction
• Decomposing a time domain channel into subcarriers using OFDM allows a simple model for
MIMO systems, in which the channel between each pair of transmit and receive antennas is
modeled as a single complex gain for each subcarrier.
• When the propagation environment is complex enough, the central limit theorem motivates
modeling the channel gains between transmit-receive antenna pairs as i.i.d. zero mean complex
Gaussian random variables. We term this the rich scattering model.
• Under the rich scattering model, each transmit-receive antenna pair sees Rayleigh fading, but
performance degradation due to fading can be alleviated using diversity.
Diversity
• Diversity strategies average over fades by exploiting roughly independent looks at the channel.
• Receive spatial diversity using spatial matched filtering provides a channel averaging gain (av-
eraging the fading gains across antennas) and a noise averaging gain (due to coherent combining
across antennas).
• Transmit spatial diversity provides channel averaging gains alone. It is trickier than receive di-
versity, since samples transmitted from different transmit antennas can interfere at the receiver.
For two transmit antennas, the Alamouti space-time code is an optimal scheme for avoiding
interference between different transmitted symbols, while providing channel averaging gains.
Spatial multiplexing
• Sending parallel data streams from different antennas increases the symbol rate proportional
to the number of data streams, with space playing a role analogous to bandwidth.
• The parallel data streams interfere at the receiver, but can be demodulated using spatial
equalization techniques analogous to the time domain equalization techniques studied in Section
8.2 (e.g., suboptimal techniques such as ZF and MMSE).
8.6 Endnotes
While we have shown that ISI in singlecarrier systems can be handled using linear equalization,
significant performance improvements can be obtained using nonlinear strategies, including op-
timal maximum likelihood sequence estimation (MLSE), whose complexity is often prohibitive
for long channels and/or large constellations, as well as suboptimal strategies such as decision
feedback equalization (DFE), whose complexity is comparable to that of linear equalization. An
introduction to such strategies, as well as pointers for further reading, can be found in more
advanced communication theory texts such as [7, 8].
OFDM has become ubiquitous in both wireless and wireline communication systems in recent
years, because it provides a standardized mechanism for parallelizing equalization of arbitrarily
complicated channels in a way that leverages the dropping cost and increasing speed of digital
computation. For more detail than we have presented here, we refer to the relevant chapters in
books on wireless communication by Goldsmith [46] and Tse and Viswanath [47]. These should
provide the background required to access the huge research literature on OFDM, which focuses
on issues such as channel estimation, synchronization and reduction of PAPR.
There has been an explosion of research and development activity in MIMO, or space-time
communication, starting from the 1990s: this is the decade in which the large capacity gains
provided by spatial multiplexing were pointed out by Foschini [48] and Telatar [49], and the
Alamouti space-time code was published by Alamouti [50]. MIMO techniques have been in-
corporated into 3G and 4G (WiMax and LTE) cellular standards, and WiFi (IEEE 802.11n)
standards. An excellent reference for exploring MIMO-OFDM further is the textbook by Tse
and Viswanath [47], while a brief introduction is provided in Chapter 8 of Madhow [7]. Other
books devoted to MIMO include Paulraj et al [51], Jafarkhani [52], and the compilation edited
by Bolcskei et al [53].
As discussed in the epilogue, a new frontier in MIMO is opening up with research and development
for wireless communication systems at higher carrier frequencies, starting with the “millimeter
wave” band (i.e., carrier frequencies in the range 30-300 GHz, for which the wavelength is in the
range 1-10 mm). Of particular interest is the 60 GHz band, where there is a huge amount (7
GHz!) of unlicensed spectrum, in contrast to the crowding in existing cellular and WiFi bands.
While fundamental MIMO concepts such as beamforming, diversity, and spatial multiplexing
still apply, the order of magnitude smaller carrier wavelength and the order of magnitude larger
bandwidth require fundamentally rethinking many aspects of link and network design, as we
briefly indicate in the epilogue.
8.7 Problems
ZF and MMSE equalization: modeling and numerical computations
Problem 8.1 (Noise enhancement computations) Consider the ISI vector models in (8.10),
(8.12) and (8.11).
(a) Compute the noise enhancements (dB) for the three equalizer lengths in these models, as-
suming white noise.
(b) Now, assume that the noise w[n] is colored, with Cw specified as follows:
Cw(i, j) = σ² for i = j,  σ²/2 for |i − j| = 1,  and 0 otherwise.    (8.77)
Compute the noise enhancements for the three equalizer lengths considered, and compare with
your results in (a).
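The models (8.10)-(8.12) appear earlier in the chapter; the following minimal Matlab sketch therefore uses a hypothetical signal matrix U as a stand-in, to show one way of computing the ZF noise enhancement for white or colored noise. The matrix U, the unit vector e, and σ² are placeholders to be replaced by the quantities in the problem.
% Noise enhancement (dB) of the ZF correlator relative to the MF bound
U = [1 0.5 0; 2 1 0.5; 1 2 1; 0.5 1 2; 0 0.5 1];   % hypothetical signal matrix (columns = shifted responses)
e = [0 1 0]';                                      % selects the desired vector u0
u0 = U*e;  sigma2 = 1;  L = size(U,1);
Cw = sigma2*eye(L);                                % white noise; for (8.77) use instead
% Cw = toeplitz([sigma2 sigma2/2 zeros(1,L-2)]);
c = (Cw\U)/(U'*(Cw\U))*e;                          % ZF correlator, scaled so that c'*u0 = 1
NE_dB = 10*log10((norm(u0)^2/sigma2)*(c'*Cw*c))    % MF SNR divided by ZF output SNR, in dB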
Problem 8.2 (Noise enhancement as a function of correlator length) Now, consider the
discrete time channel model leading to ISI vector models in (8.10), (8.12) and (8.11).
(a) Assuming white noise, compute and plot the noise enhancement (dB) as a function of equalizer
length, for L ranging from 4 to 16, increasing the observation interval by two by adding one sample
to each side of the current observation interval, and starting from an observation interval of length
L = 4 lined up with the impulse response for the desired symbol. Does the noise enhancement
decrease monotonically? Does it plateau? (b) Repeat for colored noise as in Problem 8.1(b).
Problem 8.3 (MMSE correlator and SINR computations) Consider the ISI vector model
(8.10).
(a) Assume Cw = σ²I, and define SNR = ||u0||²/σ² as the MF bound on achievable SNR. For
SNR of 6 dB, compute the MMSE correlator and the corresponding SINR (dB), using (8.34)
and (8.16). Check that the results match the alternative formula (8.36). Compare with the SNR
achieved by the ZF correlator.
(b) Plot the SINR (dB) of the MMSE and ZF correlators as a function of the MF SNR (dB).
Comment on any trends that you notice.
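A minimal Matlab sketch of the MMSE computation, again with a hypothetical U standing in for the model (8.10); the SINR is computed from first principles here rather than from (8.34) and (8.36).
% MMSE correlator and output SINR for a vector ISI model with white noise
U = [1 0.5 0; 2 1 0.5; 1 2 1; 0.5 1 2; 0 0.5 1];   % hypothetical signal matrix
u0 = U(:,2);                                       % desired vector
sigma2 = norm(u0)^2/10^(6/10);                     % sets the MF SNR ||u0||^2/sigma2 to 6 dB
R = U*U' + sigma2*eye(size(U,1));                  % E[r r^T] for unit-energy symbols
c = R\u0;                                          % MMSE correlator (up to scale)
SINR_dB = 10*log10(abs(c'*u0)^2/(c'*R*c - abs(c'*u0)^2))   % signal power over interference+noise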
Figure 8.25: Symbols b[n] are passed through the transmit filter gTX(t), the channel filter gC(t),
and the receive filter gRX(t); noise n(t) is added at the receive filter input, and the receive filter
output is sampled.
Problem 8.4 (From continuous time to discrete time vector ISI model) In this problem,
we discuss an example of how to derive the vector ISI model (8.13) from a continuous time model,
using the system shown in Figure 8.25. The symbol rate 1/T = 1, and the input to the transmit
filter is Σ_n b[n]δ(t − nT), where b[n] ∈ {−1, 1}. Thus, the continuous time noiseless signal at
the output of the receive filter is Σ_n b[n]q(t − nT), where q(t) = (gTX ∗ gC ∗ gRX)(t) is the
system response to a single symbol. The noise n(t) at the input to the receive filter is WGN
with PSD σ² = N0/2, so that the noise w(t) = (n ∗ gRX)(t) at the output of the receive filter, using
the material in Section 5.9, is zero mean, WSS, Gaussian with autocorrelation/autocovariance
function Rw (τ ) = Cw (τ ) = σ 2 (gRX ∗ gRX,mf )(τ ).
(a) Sketch the end-to-end response q(t). Compute the energy per bit Eb = ||q||².
Remark: Note that Eb/N0 = ||q||²/(2σ²). If we fix the signal scaling, and hence ||q||², then the value of σ²
is fixed once we specify Eb/N0.
(b) Assume that the sampler operates at rate 2/T = 2, taking samples at times t = m/2 for
integer m. Show that the discrete time end-to-end response to a single symbol (i.e. the sampled
version of q(t)) is
h = (..., 0, 1, 2, −1/2, −1, −1/2, 0, ...)
(c) For the given sampling rate, show that, in a vector ISI model of the form (8.13), the noise covariance matrix
satisfies (8.77).
(d) Specify the matrix U corresponding to the ISI model (8.13) for a linear equalizer of length
5, with observation interval lined up with the channel response for the desired symbol.
(e) Taking into account the noise coloring, compute the optimal ZF correlator, and its noise
enhancement relative to the matched filter bound.
(f) Compute the MMSE correlator for Eb/N0 of 10 dB (see (a) and the associated remark). What
is the output SINR, and how does it compare with the SNR of the ZF correlator in (e)?
Problem 8.5 (ZF geometry) For white noise, the output of a ZF correlator satisfying (8.17)
and (8.18) is given by
c^T r[n] = b[n] + N(0, σ²||c||²)
Since the signal scaling is fixed, the optimal ZF correlator is one that minimizes the noise variance
σ²||c||². Thus, the optimal ZF correlator minimizes ||c||² subject to (8.17) and (8.18).
(a) Suppose a correlator c1 satisfies (8.17) and (8.18), and is a linear combination of the signal
vectors {uk }. Now, suppose that we add a component ∆c orthogonal to the space spanned by
the signal vectors. Show that c2 = c1 + ∆c is also a ZF correlator.
(b) How is the output noise variance for c2 related to that for c1 ?
(c) Conclude that, in order to be optimal, a ZF correlator must lie in the signal subspace spanned
by {uk }.
(d) Observe that the condition (8.17) implies that c must be orthogonal to the interference
subspace spanned by {uk, k ≠ 0}. Combining with (c), infer that c must be a scalar multiple of
P⊥I u0.
Problem 8.7 (Alternative computation of the ZF correlator) An alternative computa-
tion for the ZF correlator is by developing an expression for P⊥I u0 in terms of u0 and UI, a
matrix containing the interference vectors {uk, k ≠ 0} as columns. That is, UI is obtained from
the signal matrix U by deleting the column corresponding to the desired vector u0 . Let us de-
fine the projection of u0 onto the interference subspace by PI u0 . By definition, this is a linear
combination of the interference vectors, and can be written as
PI u0 = UI aI (8.80)
The orthogonal projection P⊥I u0 is therefore given by
P⊥I u0 = u0 − PI u0 = u0 − UI aI    (8.81)
(a) Note that P⊥I u0 must be orthogonal to each of the interference vectors {uk, k ≠ 0}, hence
(going directly to the general complex-valued setting)
UI^H P⊥I u0 = 0
(c) Derive the following explicit expression for the orthogonal projection
P⊥I u0 = u0 − UI (UI^H UI)^{-1} UI^H u0    (8.82)
(d) Derive the following explicit expression for the energies of the projection onto the interference
subspace and the orthogonal projection:
||PI u0||² = u0^H UI (UI^H UI)^{-1} UI^H u0
||P⊥I u0||² = ||u0||² − u0^H UI (UI^H UI)^{-1} UI^H u0 = u0^H (I − UI (UI^H UI)^{-1} UI^H) u0    (8.83)
(e) Note that (8.82) and (8.83), together with (8.27), give us an expression for a ZF correlator
cZF scaled such that ⟨cZF, u0⟩ = 1.
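The projections in (8.80)-(8.83) are easy to check numerically; in the Matlab sketch below, u0 and UI are arbitrary example vectors, not taken from the text.
% Projection of u0 onto the interference subspace and its orthogonal complement
u0 = [1; 2; 1; 0.5; 0];
UI = [0.5 0; 1 0.5; 2 1; 1 2; 0.5 1];              % interference vectors as columns
PIu0 = UI*((UI'*UI)\(UI'*u0));                     % projection onto the interference subspace
Pperp = u0 - PIu0;                                 % orthogonal projection, as in (8.82)
czf = Pperp/(Pperp'*u0);                           % ZF correlator scaled so that <czf,u0> = 1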
Problem 8.8 (Analytical expression for MMSE correlator) Let us derive the expression
(8.34) for the MMSE correlator for the vector ISI model (8.13). We consider the general scenario
of complex-valued symbols and signals. Suppose that the symbols {b[n]} in the model are
uncorrelated, satisfying
E[b[n]b^*[m]] = σb² if m = n, and 0 if m ≠ n    (8.84)
We have
R = E[r[n] r^H[n]] = E[(Σ_k b[n + k] uk + w[n])(Σ_l b[n + l] ul + w[n])^H]
p = E[b^*[n] r[n]] = E[b^*[n](Σ_k b[n + k] uk + w[n])]
Now use (8.84), and the independence of the symbols and the noise, to infer that
R = σb² Σ_k uk uk^H + Cw,   p = σb² u0
Problem 8.9 (ZF correlator for colored noise) Consider the model (8.13) where the noise
covariance is a positive definite matrix Cw . We now derive the formula (8.30) for the ZF corre-
lator, by mapping, via a linear transformation, the system to a white noise setting for which we
have already derived the ZF correlator in (8.22). Specifically, suppose that we apply an invertible
matrix A to the received vector r[n], then we obtain a transformed received vector
r̃[n] = Ar[n] = Σ_k b[n + k] ũk + w̃[n]    (8.85)
where
ũk = Auk    (8.86)
and w̃[n] = Aw[n]. Note that a correlator c̃ applied to the transformed received vector produces the same output as the correlator
c = A^T c̃    (8.88)
applied to the original received vector, since c̃^T r̃[n] = c̃^T A r[n] = (A^T c̃)^T r[n].
(b) Suppose that we can find A such that the noise in the transformed system is white:
A Cw A^T = I    (8.89)
Show that
Cw^{-1} = A^T A    (8.90)
(c) Show that the optimal ZF correlator c̃ for the transformed system is given by
c̃ZF = Ũ (Ũ^T Ũ)^{-1} e    (8.91)
(d) Show that the optimal ZF correlator cZF in the original system is given by (8.30).
Hint: Use (8.86), (8.88) and (8.90).
(e) While we have used whitening only as an intermediate step to deriving the formula (8.30) for
the ZF correlator in the original system, we note for completeness that a whitening matrix A
satisfying (8.89) is guaranteed to exist for any positive definite Cw , and spell out two possible
choices for A. For example, we can take A = B−1 , where B is the square root of Cw , which
is a symmetric matrix satisfying Cw = B2 . Another, often more numerically stable, choice is
A = L^{-1}, where L is a lower triangular matrix obtained by the Cholesky decomposition of Cw,
which satisfies Cw = LL^T. Matlab functions implementing these are given below:
%square root of Cw
B=sqrtm(Cw); %symmetric matrix
%Cholesky decomposition of Cw
L=chol(Cw,’lower’); %lower triangular matrix
Throughout the preceding problem, replacing transpose by conjugate transpose gives the cor-
responding results for the complex-valued setting. The Matlab code segment above applies for
both real- and complex-valued noise.
MIMO
Problem 8.10 (Effect of array spacing) Consider a regular linear array with N elements
and inter-element spacing d.
(a) For N = 8, plot the beam pattern for a beam directed at 30◦ from broadside for d = λ/4.
(b) Repeat (a) for d = 2λ.
(c) Comment on any differences that you notice in the beamforming patterns in (a) and (b).
(d) For inter-element spacings of αλ, the maximum of the beam pattern is not unique as α gets
larger. That is, the beam pattern takes its maximum value not just in the desired direction, but
also in a few other directions. These other maxima are called grating lobes. What is the value of
α beyond which you expect to see grating lobes?
Problem 8.11 (SDMA) The base station in a cellular network wishes to simultaneously send
different data streams to two different mobiles. Assume that it has a linear array with 16 elements
uniformly spaced at λ/3. Mobile A is at angle 20◦ from broadside and Mobile B is at angle −30◦
from broadside.
(a) Compute the array responses corresponding to each mobile, and plot the beamforming pat-
terns if the base station were only communicating with one mobile at a time.
(b) Now, suppose that the base station employs SDMA using zero-forcing interference suppres-
sion to send to both mobiles simultaneously. Plot the beam patterns used to send to Mobile A
and Mobile B, respectively.
(c) What is the noise enhancement in (b) relative to (a)?
(d) Repeat (a)-(c) when Mobile B is at angle 10◦ from broadside (i.e., closer to Mobile A in
angular spacing). You should notice a significant increase in noise enhancement.
(e) Try playing around with different values of angular spacing between mobiles to determine
when the base station should attempt to use SDMA (e.g., what is the minimum angular spacing
at which the noise enhancement is, say, less than 3 dB).
Problem 8.12 (Outage rates with receive diversity) Consider a 1 × N MIMO system with
receive diversity. The gain relative to a SISO system is given by (8.64):
G = Σ_{i=1}^N |hi|²
For our rich scattering model, hi ∼ CN(0, 1) are i.i.d., hence |hi|² are i.i.d. exponential random
variables, each with mean one. We state without proof that the sum of N such random variables
is a Gamma random variable with PDF and CDF given by
pG(g) = [g^{N−1}/(N − 1)!] e^{−g} I_{g≥0}    (8.92)
FG(g) = P[G ≤ g] = e^{−g} Σ_{k=N}^∞ g^k/k! = 1 − e^{−g} Σ_{k=0}^{N−1} g^k/k!,   g ≥ 0    (8.93)
(a) Use the preceding results to compute the probability of outage (log scale) for N-fold receive
diversity versus link margin (dB) relative to the SISO benchmark for N = 1, 2, 4. That is,
reproduce the results displayed in Figure 8.20.
(b) Optional It may be an interesting exercise to use simulations to compute the empirical CDF
of G, and to check that you get the same outage rate curves as those in (a).
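A minimal Matlab sketch of part (a), using the CDF (8.93). It assumes that outage at a link margin of m dB corresponds to the gain G falling below −m dB relative to the SISO benchmark, which is the interpretation suggested by the figure axes.
% Outage probability versus link margin for N-fold receive diversity
margin_dB = -5:0.5:20;
g = 10.^(-margin_dB/10);                                   % threshold on the gain G
for N = [1 2 4]
    k = 0:N-1;
    Pout = 1 - exp(-g).*sum((g(:).^k)./factorial(k),2)';   % F_G(g) from (8.93)
    semilogy(margin_dB,Pout); hold on;
end
xlabel('Link Margin (dB)'); ylabel('Outage Probability'); legend('N=1','N=2','N=4');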
Problem 8.13 (Alamouti scheme with multiple receive antennas) Consider a 2 × N
MIMO system where the transmitter employs Alamouti space-time coding as in (8.67). Let
H = (h1 h2) denote the N × 2 channel matrix, with (N × 1) columns h1 and h2.
(a) Show that the optimal receiver is given by (8.70), where u1 = (1/√2)(h1, h2^*)^T, u2 = (1/√2)(h2, −h1^*)^T, and
ỹ = (y[1], y^*[2])^T
(b) Show that the SNR gain relative to our unfaded SISO system with the same transmit power
and constant channel gain of unity is given by
G = (1/2) Σ_{j=1}^N Σ_{k=1}^2 |H(j, k)|²
Comparing with the receive diversity gain (8.64) in a 1 × N system, answer the following
True/False questions (give reasons for your answers).
(c) True or False: A 2 × 2 MIMO system with Alamouti space-time coding is 3 dB better than
a 1 × 2 MIMO system with receive diversity.
(d) True or False: A 2 × 2 MIMO system with Alamouti space-time coding is 3 dB worse than
a 1 × 4 MIMO system with receive diversity.
(e) Use the approach in Problem 8.12 to compute and plot the outage probability (log scale) ver-
sus link margin (dB) relative to the unfaded SISO system for a 2 × N MIMO system, N = 1, 2, 4.
You should get a plot that follows those in Figure 8.22.
Problem 8.14 (Outage rates for spatial multiplexing with ZF reception) Consider two-
fold spatial multiplexing in a 2 × N MIMO system with N × 2 channel matrix H. Define the
2 × 2 matrix R = HH H.
(a) Referring to the spatial multiplexing model (8.73), how do the entries of R relate to the
signal vectors u1 and u2 ?
(b) Show that the energy of the projection of u1 orthogonal to the subspace spanned by u2 is
given by
E1 = R(1, 1) − |R(1, 2)|²/R(2, 2)
where R(i, j) denotes the (i, j)th entry of R, i, j = 1, 2.
(c) If we fix the transmit power to that of the SISO benchmark (splitting it equally between the
two data streams), show that the gain seen by the first data stream is given by
G1 = E1/2 = (1/2)[R(1, 1) − |R(1, 2)|²/R(2, 2)]
and, similarly, the gain seen by the second data stream is given by
G2 = (1/2)[R(2, 2) − |R(1, 2)|²/R(1, 1)]
Note that, under our rich scattering model, G1 and G2 are identically distributed random vari-
ables.
(d) Use computer simulations with the rich scattering model to plot the outage rate versus link
margin for 2 × 2 and 2 × 4 MIMO with two-fold spatial multiplexing and ZF reception. You
should get a plot similar to Figure 8.24. Discuss how the performance compares with that of the
Alamouti scheme.
Problem 8.15 (Outage rates for spatial multiplexing with ZF reception) (a) Argue
that 2 × N Alamouti space-time coding is exactly 3 dB worse than 1 × 2N receive diversity.
(b) For 2 × N spatial multiplexing with ZF reception, approximate its performance as x dB
worse than a 1 × N ′ receive diversity. (Note that spatial multiplexing has twice the bandwidth
efficiency as receive diversity, but it loses 3 dB of power up front due to splitting it between the
two data streams.)
Answer: Approximately 3 dB worse than 1×(N −1) receive diversity. That is, if the gain relative
to the SISO benchmark for a 1 × N receive diversity system is denoted as Grx−div (N), then the
gain for a 2 × N spatially multiplexed system is Gsmux(N) ≈ Grx−div(N − 1)/2. Thus, the CDF,
and hence outage rate, of the spatially multiplexed system is approximated as
P [Gsmux (N) ≤ x] ≈ P [Grx−div (N − 1) ≤ 2x]
(c) Use the results in (b), and the analytical framework in Problem 8.12, to obtain an analytical
approximation for the simulation results in Problem 8.14(d).
Laboratory Assignment
0) Use as your transmit and receive filters the SRRC pulse employed in Software Labs 4.1 and
6.1. Putting the code for realizing these together with the code fragments developed in this
chapter provides the code required for this lab. As in Software Labs 4.1 and 6.1, the transmit,
channel, and receive filters are implemented at rate 4/T . For simplicity, we consider BPSK
signaling throughout this lab, and consider only real-valued signals. Generate nsymbols =
ntraining + npayload (numbers to be specified later) ±1 BPSK symbols as in Lab 6.1, and pass
them through the transmit, channel, and receive filters to get noiseless received samples at rate
4/T .
1) Let us start with a trivial channel filter as before. Set nsymbols = 200. The number of rate
4/T samples at the output of the receive filter is therefore 800, plus tails at either end because
the length of the effective pulse modulating each symbol extends over multiple symbol intervals.
Plot an eye diagram (e.g., using code fragment 8.1.3) using, say, 400 samples in the middle. You
should get an eye diagram that looks like Figure 8.26: the cascade of the transmit and receive
filter is approximately Nyquist, and the eye is open, so that we can find a sampling time such
that we can distinguish between +1 and -1 well, despite the influence of neighboring symbols.
2) Now introduce a non-trivial channel filter. In particular, consider a channel filter specified (at
rate 4/T ) using the following matlab command:
channel filter = [−0.7, −0.3, 0.3, 0.5, 1, 0.9, 0.8, −0.7, −0.8, 0.7, 0.8, 0.6, 0.3]′;
Generate an eye diagram again. You should get something that looks like Figure 8.27. Notice
now that there is no sampling time at which you can clearly make out the difference between
Figure 8.26: Eye diagram for a non-dispersive channel. The eye is open.
Figure 8.27: Eye diagram for a dispersive channel. The eye is closed.
+1 and -1 symbols. The eye is now said to be closed due to ISI, so that we cannot make symbol
decisions just by passing appropriately timed received samples through a thresholding device.
3) We are now going to evaluate probability of error without and with equalization. First, let
us generate the noisy output of the receive filter. We need to generate nsymbols = ntraining +
npayload (numbers to be specified later) ±1 BPSK symbols as in Software Lab 6.1, and pass them
through the transmit filter, the dispersive channel, and the receive filter to get noiseless received
samples at rate 4/T . Since we are signaling along the real axis only, at the input to the receive
filter, add iid N(0, σ²) real-valued noise samples (as in Lab 6.1, choose σ² = N0/2 corresponding
to a specified value of Eb/N0). Pass these (rate 4/T) noise samples through the receive filter, and
add the result to the signal contribution at the receive filter output.
4) Performance without equalization: Let {rk } denote the output of the receive filter, and
let Z[n] = rd+4(n−1) , n = 1, 2, ..., nsymbols denote the best symbol rate decision statistics you
can obtain by subsampling at rate 1/T the receive filter output. As in the solutions to earlier
labs, choose the decision delay d equal to the location of the maximum of the overall response
(which now includes the channel) to a single symbol. For nsymbols = 10100, compute the error
probability of the decision rule b̂[n] = sign(Z[n]) as a function of Eb /N0 , where the latter ranges
from 5 to 20 dB. Compare with the ideal error probability curve for BPSK signaling for the same
range of Eb /N0 . This establishes that a simple one-sample per symbol decision rule does not
work well for non-ideal channels, and motivates the equalization schemes discussed below.
Linear equalization: We now consider linear equalization, where the decision for symbol bn is
based on linear processing of a vector of samples r[n] of length L = 2M + 1, where the entries of
r[n] are samples spaced by T /q, with the center sample being the same as the decision statistic
in part 3: q = 1 corresponds to symbol-spaced sampling, and q > 1 corresponds to fractionally
spaced sampling. We consider two cases: q = 1 and q = 2.
r[n] = (r_{k+4(n−1)+d−(4/q)M}, r_{k+4(n−1)+d−(4/q)(M−1)}, ..., r_{k+4(n−1)+d}, r_{k+4(n−1)+d+(4/q)}, ..., r_{k+4(n−1)+d+(4/q)M})^T
The decision rule we use is
b̂ = sign(c^T r[n])    (8.94)
where c is a correlator whose choice is to be specified. Note that the decision rule in part 3
corresponds to the choice c = (0, .., 0, 1, 0, ..., 0)T , since it uses only the center sample.
The vector of samples r[n] contains contributions from both the desired symbol bn and from ISI
due to bn±1 , bn±2 , etc. We implement the linear minimum mean squared error (MMSE) equalizer
using a least squares adaptive implementation, as discussed in Section 8.2.1.
5) For the least squares implementation, assume that the first ntraining symbols are known
training symbols, b1 , ..., bntraining . Define the L × L matrix
R̂ = (1/ntraining) Σ_{n=1}^{ntraining} r[n] r^T[n]
6) Now, the correlator obtained via (8.8) is used to make decisions, using the decision rule (8.94),
on the unknown symbols n = ntraining + 1, ..., nsymbols.
7) Fix ntraining = 100 and npayload = 10000. For L = 3, 5, 7, 9, and q = 1, 2, implement linear
MMSE equalizers, and plot their error probabilities (for the payload symbols) as a function of
Eb/N0, in the range 5 to 20 dB. Compare with the unequalized error probability and the ideal error probability
found in part 3.
Hint: An efficient way to generate the statistics cT r[n] is to pass an appropriate rate 2/T
subsequence of the receive filter output through a filter whose impulse response is the time
reverse of c, and to then appropriately subsample at rate 1/T the output of the equalizing filter.
This is much faster than correlating c with r[n] for each n.
8) Comment on the performance of symbol-spaced versus fractionally spaced equalization. Com-
ment on the effect of equalizer length on performance. What is the effect of increasing or de-
creasing the training period?
Lab Report: Your lab report should document the results of the preceding steps in order.
Describe the reasoning you used and the difficulties you encountered.
Laboratory Assignment
We would like to leverage the code from Software Lab 8.1 as much as possible, so we set the DAC
filter to be the transmit filter in that lab, and the receive filter to its matched filter as before.
The main difference is that the time domain samples sent through the DAC filter are obtained
by taking the inverse FFT of the frequency domain symbols, and inserting a cyclic prefix. We
fix the constellation as Gray coded QPSK.
Step 1 (Exploring time and frequency domain relationships in OFDM): Let us first
discuss the structure of a single “OFDM symbol,” which carries N complex-valued symbols in
the frequency domain. Here N is the number of subcarriers, chosen to be a power of 2. Set L to
be the length of the cyclic prefix. Set N = 256, L = 20 for these initial explorations, but keep the
parameters programmable for later use.
1a) Generate N Gray coded QPSK symbols B = {B[k], k = 1, .., N}. (You can use the function
qpskmap developed in Software Lab 6.1 for this purpose.) Take the inverse FFT to obtain time
domain samples b = {b[n], n = 1, ..., N}.
1b) Append the last L time domain samples to the beginning, to get a length N + L sequence
of time domain samples b′ = {b′ [n], n = 1, ..., N + L}. That is, b′ [1] = b[N − L + 1], ..., b′ [L] =
b[N], b′ [L + 1] = b[1], ..., b′ [N + L] = b[N].
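A minimal Matlab sketch of 1a) and 1b); Gray coded QPSK is generated inline here rather than by calling qpskmap, so the bit-to-symbol mapping is an illustrative stand-in.
% One OFDM symbol: N QPSK symbols -> IFFT -> cyclic prefix of length L
N = 256; L = 20;
bits = rand(2,N) > 0.5;                                % two bits per subcarrier
B = ((1-2*bits(1,:)) + 1i*(1-2*bits(2,:)))/sqrt(2);    % Gray coded QPSK symbols B[k]
b = ifft(B);                                           % time domain samples b[n]
bprime = [b(N-L+1:N) b];                               % cyclic prefix prepended: length N+L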
1c) Take the first N symbols of b′ , say r1 = {b′ [1], ..., b′ [N]}. Show that the FFT output (say
R1 = {R1 [k]}) is related to the original frequency domain symbols B through a frequency domain
channel H as follows: R1 [k] = H[k]B[k]. Find and plot the amplitude |H[k]| and phase arg(H[k])
versus k.
1d) Repeat 1c) for the time domain samples {b′ [3], ..., b′ [N + 3]} (i.e., skip the first two samples
of b′). How are the frequency domain channels in 1c) and 1d) related? (What we are doing here
is exploring what cyclic shifts in the time domain do in the frequency domain.)
Step 2 (generating multiple OFDM symbols): Now, we generate K frames, each carrying
N Gray coded QPSK symbols. Set N = 256, L = 20, K = 5 for numerical results and plots in
this step.
2a) For each frame, generate time domain samples and add a cyclic prefix, as in Steps 1a) and
1b). Then, append the time domain samples for successive frames together. We now have a
stream of K(N + L) time domain samples, analogous to the time domain symbols sent in Lab 3.
2c) Pass the time domain symbols through the same transmit filter (this is the DAC in Figure
8.9) as in Software Labs 4.1, 6.1 and 8.1, again oversampling by a factor of 4. (That is, if the
time domain samples are at rate 1/Ts, the filter is implemented at rate 4/Ts.) This gives us a
rate 4/Ts transmitted signal.
2d) Compute the peak to average power ratio (PAPR) in dB for this transmitted signal (OFDM
is notorious for having a large PAPR). This is done by taking the ratio of the maximum to
average value of the magnitude squared of the time domain samples.
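The PAPR computation in 2d) is a one-liner; in the sketch below, an OFDM time domain stream (without oversampling) is used as a stand-in for the rate 4/Ts transmitted signal s.
% PAPR (dB) of a signal s
s = ifft((sign(randn(1,1024))+1i*sign(randn(1,1024)))/sqrt(2));   % example stand-in for the transmitted signal
PAPR_dB = 10*log10(max(abs(s).^2)/mean(abs(s).^2))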
2e) Note that the original QPSK symbols in the frequency domain have a PAPR of one, but the
time domain samples are generated by mixing these together. The time domain samples could
be expected, therefore, to have a Gaussian distribution, invoking the central limit theorem. Plot
a histogram of the I and Q components from the time domain samples. Do they look Gaussian?
2f) As in Lab 6.1, assume an ideal channel filter and pass the transmitted signal through a receive
filter matched to the transmit filter. This gives a rate 4/Ts noiseless received signal.
2g) Subsample the received signal at rate 1/Ts , starting with a delay of d samples (play around
and see what choice of d works well–perhaps based on the peak of the cascade of the transmit,
channel and receive filters). The first N samples correspond to the first frame. Take the FFT
of these N samples to get {R1[k]}. Now, estimate the frequency domain channel coefficients
{H[k]} by using the known transmitted symbols B1[k] in the first frame as training. That is,
Ĥ[k] = R1[k]/B1[k]. Plot the magnitude and phase of the channel estimates and comment on how they compare to
what you saw in 1c) and 1d).
2h) Now, use the channel estimate from 2g) to demodulate the succeeding frames. If frame m
uses time domain samples over a window [a, b], then frame m + 1 uses time domain samples over
a window [a + (N + L)Ts , b + (N + L)Ts ]. Denoting the FFT of the time domain samples for
frame m as Rm [k], the decision statistics for the frequency domain symbols for the mth frame
are given by
B̂m [k] = Ĥ ∗ [k]Rm [k]
You can now decode the bits and check that you get a BER of zero (there is no noise so far).
Also, display scatter plots of the decision statistics to see that you are indeed seeing a QPSK
constellation after compensating for the channel.
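A minimal Matlab sketch of the channel estimation and demodulation in 2g)-2h); a synthetic frequency domain channel is used here in place of the measured FFT outputs, so R1, Rm, and the channel H below are illustrative stand-ins.
% Naive training-based channel estimation (frame 1) and demodulation (frame m)
N = 256;
B1 = (sign(randn(1,N))+1i*sign(randn(1,N)))/sqrt(2);   % known training symbols, frame 1
H  = exp(1i*2*pi*(0:N-1)*3/N);                         % example channel (pure delay of 3 samples)
R1 = H.*B1;                                            % FFT output for frame 1 (noiseless here)
Bm = (sign(randn(1,N))+1i*sign(randn(1,N)))/sqrt(2);   % payload symbols, frame m
Rm = H.*Bm;                                            % FFT output for frame m
Hhat = R1./B1;                                         % per-subcarrier channel estimate
Bhatm = conj(Hhat).*Rm;                                % decision statistics for frame m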
Step 3 (Channel compensation): We now introduce a nontrivial channel (still no noise).
Increase the cyclic prefix length if needed (it should be long enough to cover the cascade of the
transmit, channel and receive filters; but remember that the cyclic prefix is at rate 1/Ts, whereas
the filter cascade is at rate 4/Ts).
3a) Repeat Step 2f, but now with a nontrivial channel filter modeled at rate 4/Ts . Use the
channels you have tried out in Lab 3 (still no noise). For example:
channel filter = [−0.7, −0.3, 0.3, 0.5, 1, 0.9, 0.8, −0.7, −0.8, 0.7, 0.8, 0.6, 0.3]′;
3b) Repeat Step 2g. Comment on how the magnitude and phase of the frequency domain channel
differs from what you saw in 2g, 1c and 1d.
3c) Repeat Step 2h. Check that you get a BER of zero, and that your decision statistics give
nice QPSK scatter plots.
3d) Check that everything still works out as you vary the number of subcarriers N (e.g., N =
512, 1024, 2048), the cyclic prefix length L and the number of frames K.
Step 4 (Effect of noise): Now, add noise as in Software Labs 6.1 and 8.1. Specifically, at the
input to the receive filter, add independent and identically distributed (iid) complex Gaussian
noise, such that the real and imaginary parts of each sample are iid N(0, σ²) (we choose σ² = N0/2
corresponding to a specified value of Eb/N0). Let us fix N = 1024 for concreteness, and set the cyclic
prefix to just a little longer than the minimum required for the channel you are considering. Set
the number of frames to K = 10. Try a couple of values of Eb /N0 of 5 dB and 8 dB.
4a) While you can estimate Eb analytically, estimate it by taking the energy of the transmitted
signal in 3a, and dividing it by the number of bits in the payload (i.e., excluding the first frame).
Use this to set the value of N0 for generating the noise samples.
4b) Pass the (rate 4/Ts ) noise samples through the receive filter, and add the result to the output
of part 3a.
4c) Consider first a noiseless channel estimate, in which you carry out Step 3b (estimating the
channel based on frame 1) before you add noise to the output of 3a. Now add the noise and
carry out Step 3c (demodulating the other frames). Estimate the BER and compare with the
analytical value for ideal QPSK. Show the scatter plots of the decision statistics.
4d) Repeat 4c, except that you now estimate the channel based on frame 1 after adding noise.
Discuss how the BER degrades. Compare the channel estimates from parts 4c and 4d on the
same plot.
Note: You may notice a significant BER degradation, but that is because the channel estimation
technique is naive (the channel coefficients for neighboring subcarriers are highly correlated, but
our estimate is not exploiting this property). Exploring better channel estimation techniques is
beyond the scope of this lab, but you are encouraged to browse the literature on OFDM channel
estimation to dig deeper.
Step 5 (Consolidation): Once you are happy with your code, plot the BER (log scale) as a
function of Eb /N0 (dB) for the channel in Lab 3. Plot three curves: ideal QPSK, OFDM with
noiseless channel estimation, OFDM with noisy channel estimation. Comment on the relation
between the curves.
Lab Report: Your lab report should document the results of the preceding steps in order.
Describe the reasoning you used and the difficulties you encountered.
Laboratory Assignment
Background: Consider the rich scattering model for a single subcarrier in a MIMO-OFDM
system with M transmit and N receive antennas. The N × M channel matrix H is modeled as
having i.i.d. CN(0, 1) entries.
Code Fragment 8.7.1 (MIMO matrix with i.i.d. complex Gaussian entries)
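A minimal sketch of such a code fragment, consistent with the rich scattering model (M and N denote the numbers of transmit and receive antennas, and the values below are examples):
% N x M channel matrix with i.i.d. CN(0,1) entries
M = 2; N = 1;                                 % example array sizes
H = (randn(N,M) + 1i*randn(N,M))/sqrt(2);     % unit-variance complex Gaussian entries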
Let T denote the number of time domain samples for our system. Let xi [t] denote the sam-
ple transmitted from transmit antenna i at time t, where 1 ≤ i ≤ M and 1 ≤ t ≤ T . Let
x[t] = (x1 [t], ..., xM [t])T denote the M × 1 vector of samples transmitted at time t, and let
X = (x[1], ..., x[T ]) denote the M × T matrix containing all the transmitted samples. Our con-
vention is to normalize the net transmit power to one, so that |xi[t]|² = 1/M. For a single input
single output (SISO) system, this would lead to an average received SNR of SNR = 1/(2σ²), since the
magnitude squared of the channel gain is normalized to one, and the noise per receive antenna
is modeled as CN(0, 2σ 2 ), and we vary this hypothetical SISO system SNR when evaluating
performance.
The N × T received matrix Y, with yj [t], 1 ≤ j ≤ N denoting the spatial vector of received
samples at time t, is then modeled as
Y = HX + N
where N is an N × T matrix with i.i.d. CN(0, 2σ 2 ) entries. This model is implemented in the
following code fragment.
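A minimal sketch of such a code fragment; the channel matrix is regenerated here so that the sketch is self-contained, and the all-ones transmit matrix is only a placeholder satisfying |xi[t]|² = 1/M.
% Received matrix Y = H*X + noise, with i.i.d. CN(0,2*sigma2) noise entries
M = 2; N = 1; T = 100; sigma2 = 0.05;              % example values
H = (randn(N,M) + 1i*randn(N,M))/sqrt(2);          % channel as in Code Fragment 8.7.1
X = ones(M,T)/sqrt(M);                             % placeholder transmit matrix, |x_i[t]|^2 = 1/M
W = sqrt(sigma2)*(randn(N,T) + 1i*randn(N,T));     % noise matrix
Y = H*X + W;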
In order to use the preceding generic code fragments for a particular MIMO scheme, we must
(a) map the transmitted symbols into the matrix X of transmitted samples, and (b) process the
matrix Y of received samples appropriately.
Alamouti space-time code: We first consider the Alamouti space-time code for a 2 × 1 MIMO
system. The transmitted samples can be generated using the following code fragment.
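A minimal sketch of such transmit-side processing; the particular symbol-to-antenna mapping below is one convention consistent with u1 and u2 in (8.70), and the QPSK symbols are generated inline as an example.
% Alamouti transmit matrix X (2 x T) carrying T QPSK symbols over T time slots
T = 100;                                                 % number of time slots (even)
b = (sign(randn(1,T)) + 1i*sign(randn(1,T)))/sqrt(2);    % unit-energy QPSK symbols
X = zeros(2,T);
X(:,1:2:end) = [b(1:2:end); b(2:2:end)]/sqrt(2);         % slot 2k-1: send (b[2k-1], b[2k])
X(:,2:2:end) = [-conj(b(2:2:end)); conj(b(1:2:end))]/sqrt(2);   % slot 2k: send (-b*[2k], b*[2k-1])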
Step 1: Consider a 2 × 1 MIMO system. Setting M = 2, N = 1 and SNR at 10 dB, put code
fragments 8.7.1, 8.7.3, and 8.7.2 together to model the transmitted and received matrices X and
Y. Setting T = 100, do a scatter plot of the real and imaginary parts of the received samples.
The received samples should be smeared out over the complex plane, since the signals from the
two transmit antennas interfere with each other at the receive antenna.
Step 2: Compute the decision statistics (8.70) based on the received matrix Y. You may use
the following code fragment, but you must explain what it is doing. Do a scatter plot of the
decision statistics. You should recover the noisy QPSK constellation.
Code Fragment 8.7.4 (Receiver processing for Alamouti space-time code for a 2 × 1
MIMO system)
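A minimal sketch of such receiver processing; it assumes that H (1 × 2) and Y (1 × T) have been produced by the fragments above, and the variable names are illustrative.
% Alamouti matched filter decoding for a 2x1 system
Ytilde = [Y(1,1:2:end); conj(Y(1,2:2:end))];   % stack (y[2k-1], y*[2k]) as columns
u1 = [H(1,1); conj(H(1,2))]/sqrt(2);           % u1, u2 as in (8.70)
u2 = [H(1,2); -conj(H(1,1))]/sqrt(2);
Z = zeros(1,2*size(Ytilde,2));
Z(1:2:end) = u1'*Ytilde;                       % decision statistics for odd-indexed symbols
Z(2:2:end) = u2'*Ytilde;                       % decision statistics for even-indexed symbols
scatter(real(Z),imag(Z));                      % should reveal a noisy QPSK constellation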
Step 3: Repeat Steps 1 and 2 for a few different realizations of the channel matrix. The quality
of the scatter plot in Step 2 should depend on the gain G = (|H(1, 1)|² + |H(1, 2)|²)/2.
Step 4: Now, suppose that we use the Alamouti space-time code for a 2 × N MIMO system,
where N may be larger than one. Show that only the receiver processing code fragment 8.7.4
needs to be modified (other than changing the value of N in the other code fragments), with
Ỹ having dimension 2N × T/2 and u1, u2 each having dimension 2N × 1. Implement these
modifications and do a scatter plot of the decision statistics for N = 2 and N = 4, fixing the
equivalent SISO SNR to 10 dB. You should notice a qualitative improvement with increasing N
as you run several channel matrices, although the plots depend on the channel realization.
Hint: See Problem 8.13.
We now consider spatial multiplexing in a 2 × N MIMO system. We can now send 2T symbols
over T time intervals, as in the following code fragment.
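A minimal sketch of such a code fragment, sending 2T QPSK symbols over T time slots with the total transmit power normalized to one.
% Spatial multiplexing transmit matrix X (2 x T) carrying 2T QPSK symbols
T = 100;
b = (sign(randn(1,2*T)) + 1i*sign(randn(1,2*T)))/sqrt(2);   % 2T unit-energy QPSK symbols
X = [b(1:2:end); b(2:2:end)]/sqrt(2);                       % odd symbols -> antenna 1, even -> antenna 2; |x_i[t]|^2 = 1/2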
Step 5: Setting M = 2, N = 4 and SNR at 10 dB, put code fragments 8.7.1, 8.7.5, and 8.7.2
together to model the transmitted and received matrices X and Y. Setting T = 100, again do
a scatter plot of the real and imaginary parts of the received samples for each received antenna.
As before, the received samples should be smeared out over the complex plane, since the signals
from the two transmit antennas interfere with each other at the receive antennas.
Step 6: Now, apply a ZF correlator as in (8.74) to Y to separate the two data streams. Do
scatter plots of the two estimated data streams. You should recover noisy QPSK constellations.
Step 7: Fixing SNR at 10 dB and fixing a 2 × 4 channel matrix, compare the scatter plots
of the decision statistics for the Alamouti scheme with those for two-fold spatial multiplexing.
Which ones appear to be cleaner?
Lab Report: Your lab report should document the results of the preceding steps in order.
Describe the reasoning you used and the difficulties you encountered.
Epilogue
of the order of kilometers to picocells with diameters of the order of 100-200 meters increases
spatial reuse, and hence potentially the network capacity, by two orders of magnitude. Picocel-
lular base stations may be opportunistically deployed on lampposts or rooftops, and see a very
different propagation and interference environment from macrocellular base stations carefully
placed at elevated locations. Interference among adjacent picocells becomes a major bottleneck,
as does the problem of handing off rapidly moving users as they cross cell boundaries (indeed,
cell boundaries are difficult to even define in picocellular networks due to the complexity of
below-rooftop propagation). Thus, it is important to rethink the design philosophy of tightly
controlled deployment and operation in today’s macrocellular networks. The scaling and organic
growth of picocellular networks is expected to require a significantly greater measure of decentral-
ized self-organization, including, for example, auto-configuration for plug-and-play deployment,
decentralized coordination for interference and mobility management, and automatic fault de-
tection and self-healing. Another critical issue with small cells is backhaul (i.e., connecting each
base station to the wired Internet): pulling optical fiber to every lamppost on which a picocel-
lular base station is deployed may not be feasible. Finally, we can go to even smaller cells called
femtocells, with base stations typically deployed indoors, in individual homes or businesses, and
using the last mile broadband technology already deployed in such places for backhaul. For both
picocells and femtocells, it is important to devise efficient techniques for sharing spectrum, and
managing potential interference, with the macrocellular network. In essence, we would like to
be able to opportunistically deploy base stations as we do WiFi access points, but coordinate
just enough to avoid the tragedy of the commons resulting from the purely selfish behavior in
unmanaged WiFi networks. Of course, as we learn more about how to scale such self-organized
cellular networks, we might be able to apply some of the ideas to promote peaceful coexistence
in densely deployed and independently operated WiFi networks using unlicensed spectrum. In
short, it is fair to say that there is a clear opportunity and dire need for significant innovations in
overall design approach as well as specific technological breakthroughs, in order to truly attain
the potential of “small cells.”
Millimeter wave communication: While commercial wireless networks deployed today employ
bands well below 10 GHz, there is significant interest in exploring higher carrier frequencies, where
there are vast amounts of spectrum. Of particular interest are millimeter (mm) wave frequencies
from 30-300 GHz, corresponding to wavelengths from 10 mm down to 1 mm. Historically,
RF front end technology for these bands has been expensive and bulky, hence there was limited
commercial interest in using them. This has changed in recent years, with the growing availability
of low-cost silicon radio frequency integrated circuits (RFICs) in these bands. The particular slice
of spectrum that has received the most attention is the 60 GHz band (from 57-64 GHz). Most of
this band is unlicensed worldwide. The availability of 7 GHz of unlicensed spectrum (vastly more
than the bandwidth in current cellular and WiFi systems) opens up the possibility for another
revolution in wireless communication, with links operating at multiples of Gigabits per second
(Gbps). Potential applications of 60 GHz in particular, and mm wave in general, include order of
magnitude increases in the data rates for indoor wireless networks, multiGbps wireless backhaul
networks, and base station to mobile links in picocells, and even wireless data centers. However,
realizing the vision of multiGbps wireless everywhere is going to take some work. While we can
draw upon the existing toolkit of ideas developed for wireless communication to some extent,
we may have to rethink many of these ideas because of the unique characteristics of mm wave
communication. The latter largely follow from the order of magnitude smaller carrier wavelength
relative to existing wireless systems.
At the most fundamental level, consider propagation loss. As discussed in Section 6.5 (see also
Problems 6.37 and 6.38), the propagation loss for omnidirectional transmission scales with the
square of the carrier frequency, but for the same antenna aperture, antenna directivity scales
up with the square of the carrier frequency. Thus, given that generating RF power at high
carrier frequencies is difficult, we anticipate that mm wave communication systems will employ
antenna directivity at both ends. Since the inter-element spacing scales with carrier wavelength,
it becomes possible to accommodate a large number of antenna elements in a small area (e.g.,
a 1000 element antenna array at 60 GHz is palm-sized!), and to use electronic beamsteering to
realize pencil beams at the transmitter and receiver. Of course, this is easier said than done.
Hardware realization of such large arrays remains a challenge. On the algorithmic side, since
building a separate upconverter or downconverter for every antenna element is infeasible as we
scale up the array, it is essential to devise signal processing algorithms that do not assume the
availability of the separate complex baseband signals for each antenna element. The nature
of diversity and spatial multiplexing also fundamentally changes at tiny wavelengths: due to
the directionality of mm wave links, there are only a few dominant propagation paths, so that
designs for rich scattering models no longer apply. For indoor environments, blockage by humans
and furniture becomes inevitable, since the ability of electromagnetic waves to diffract around
obstacles depends on how large they are relative to the wavelength (i.e., obstacles “look bigger”
at tiny wavelengths). For outdoor environments, performance is limited by the oxygen absorption
loss (about 16 dB/km) in the 60 GHz band, and by rain loss for mm wave communication in general
(e.g., as high as 30 dB/km in heavy rain). While link ranges of hundreds
of meters can be achieved with reasonable margins to account for these effects, longer ranges
than these would be fighting physics, hence multihop networks become interesting. Of course,
once we start forming pencil beams, networking protocols that rely on the broadcast nature
of the wireless medium no longer apply. These are just a few of the issues that are probably
going to take significant research and development to iron out, which bodes well for aspiring
communication engineers.
Figure 8.28 depicts how picocells and mm wave communication might come together to address
the cellular capacity crisis. A large macrocellular base station provides default connectivity
via Long Term Evolution (LTE), a fourth generation cellular technology standardized relatively
recently; despite its name, it may not suffice for the long term because of exponentially increasing
demand. Picocellular base stations are deployed opportunistically on lampposts, and may be
connected via a mm wave backhaul network. Users in a picocell could talk to the base stations
using LTE, or perhaps even mm wave. Users not covered by picocells talk to the macrocell using
LTE.
Cooperative communication: While we have restricted attention in this book to the study of com-
munication between a single transmitter and receiver, this provides a building block for emerging
ideas in cooperative communication. For example, neighboring nodes could form a virtual an-
tenna array, forming distributed MIMO (DMIMO) systems with significantly improved power-
bandwidth tradeoffs. This allows us, for example, to bring the benefits of MIMO to systems
with low carrier frequencies which propagate well over large distances, but are not compatible
with centralized antenna arrays because of the large carrier wavelength. For example, the wave-
length at 50 MHz is 6 meters, hence conventional antenna arrays would be extremely bulky, but
neighboring nodes naturally spaced by tens of meters could form a DMIMO array. DMIMO at
low carrier frequencies is a promising approach for bridging the digital divide in cost-effective
fashion, providing interference suppression and multiplexing capabilities as in MIMO, along with
link ranges of tens of kilometers. Another promising example of cooperative communication is
interference alignment, in which multiple transmitters, each of which is sending to a different
receiver, coordinate so as to ensure that the interference they generate for each other is limited
in time-frequency space. Of course, realizing the benefits of cooperative communication requires
fundamental advances in distributed synchronization and channel estimation, along with new
network protocols that support these innovations. More good news for the next generation of
communication engineers!
Full-duplex communication: Most communication transceivers cannot transmit and receive at
the same time on the same frequency band (or even closely spaced bands). This is because even
a small amount of leakage from the transmit chain can swamp out the received signal, which is
much weaker. Thus, communication networks typically operate in time division duplexed (TDD)
mode (also more loosely termed half duplex mode), in which the transmitter and receiver use the
Figure 8.28: The potential role of small cells and mm wave communication in future cellular
systems (figure courtesy Dinesh Ramasamy).
460
Visit : www.EasyEngineering.net
Visit : www.EasyEngineering.net
same band, but are not active at the same time, or in frequency division duplexed (FDD) mode,
in which the transmitter and receiver may be simultaneously active, but in different, typically
widely separated, bands. There has been promising progress recently, however, on relatively low-
cost approaches to canceling interference from the transmit chain, seeking to make full duplex
operation (i.e., sending and receiving at the same time in the same band) feasible. If these
techniques turn out to be robust and practical, then they could lead to significant performance
enhancements in wireless networks. Of course, networking with full duplex links require revisiting
current protocols, which are based on either TDD or FDD.
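A quick calculation shows why transmit leakage is such a serious obstacle. In the Python sketch below, all numbers (transmit power, received signal level, bandwidth, noise figure) are assumed purely for illustration; the point is only the orders of magnitude involved.

import math

P_tx_dBm = 20.0        # assumed own transmit power
B = 20e6               # assumed bandwidth (Hz)
NF_dB = 6.0            # assumed receiver noise figure
noise_dBm = -174 + 10 * math.log10(B) + NF_dB   # thermal noise floor plus noise figure

# To avoid degrading the link, self-interference should be pushed down to
# around the noise floor; the required cancellation is then:
required_cancellation_dB = P_tx_dBm - noise_dBm
print("noise floor %.1f dBm" % noise_dBm)
print("self-interference must be suppressed by about %.0f dB" % required_cancellation_dB)
# Roughly 115 dB under these assumptions, which is why combinations of antenna
# isolation with analog and digital cancellation are typically considered.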
Challenging channels: While we have discussed issues related to significant improvements in
wireless data rates and ranges relative to large-scale commercial wireless networks today, there
are important applications where simply forming and maintaining a viable link is a challenge.
Examples include underwater acoustic networks (for sensing and exploration in oceans, rivers
and lakes) and body area networks (for continuous health monitoring).
Wireless-enabled multi-agent systems: Wireless is at the heart of many emerging systems that
require communication and coordination between a variety of “agents” (these may be machines
or humans). Examples include asset tracking and inventory management using radio frequency
identification (RFID) tags; sensor networks for automation in manufacturing, environmental
monitoring, healthcare and assisted living; vehicular communication; smart grid; and nascent
concepts such as autonomous robot swarms. Such “multi-agent” systems rely on wireless both to provide tetherless connectivity among agents and, possibly, to provide radar-style measurements; characterizing and optimizing the wireless network in each specific context is therefore essential for sound system design.
Another area where we seek increased speed and sophistication is wired backplane communication for interconnecting hardware modules on a circuit board (e.g., the inputs and outputs of a high-speed router, or the processor and memory modules in a computer), along with “networks on chip” for communication between modules on a single integrated circuit (e.g., a “multi-core” processor chip with multiple processor and memory modules).
How does one overcome the analog-to-digital converter (ADC) bottleneck? We do not have answers yet, but there are some natural ideas to try. One possibility is to try to get by with fewer bits of precision per sample.
Severe quantization introduces a significant nonlinearity, but it is possible that we could still
extract enough information for reliable communication if the dynamic range of the analog signal
being quantized is not too large. Of course, the algorithms that we have seen in this textbook
(e.g., for demodulation, linear equalization, MIMO processing) all rely on the linearity of the
channel not being disturbed by the ADC, an excellent approximation for high-precision ADCs. This assumption now needs to be thrown out: in essence, we must “redo” DSP for communication if we are going to live with low-precision ADCs at the receiver. Another possibility is to parallelize: we could implement a high-speed ADC by running lower-speed ADCs in parallel, or we could
decompose the communication signal in the frequency domain, such that relatively low-speed
ADCs with high precision can be used in parallel for different subbands. These are areas of
active research.
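As a concrete illustration of the first of these ideas, the following Monte Carlo sketch (in Python/NumPy) passes noisy BPSK through a uniform quantizer with a small number of bits before making bit decisions. The constellation, SNR, quantizer resolution and clipping range are all assumptions made for illustration; it is not meant to represent any particular receiver design.

import numpy as np

rng = np.random.default_rng(0)
n_sym = 200000
EbN0_dB = 6.0
sigma = np.sqrt(1.0 / (2 * 10 ** (EbN0_dB / 10)))   # noise std dev for unit-energy BPSK

bits = rng.integers(0, 2, n_sym)
x = 2.0 * bits - 1.0                  # BPSK symbols in {-1, +1}
y = x + sigma * rng.standard_normal(n_sym)

def quantize(y, n_bits, v_max=2.0):
    # Uniform mid-rise quantizer with 2**n_bits levels, clipping to [-v_max, v_max]
    levels = 2 ** n_bits
    step = 2 * v_max / levels
    idx = np.clip(np.floor((y + v_max) / step), 0, levels - 1)
    return -v_max + (idx + 0.5) * step

for label, r in [("unquantized", y), ("3-bit ADC", quantize(y, 3)),
                 ("1-bit ADC", quantize(y, 1))]:
    ber = np.mean((r > 0) != (bits == 1))
    print("%-12s BER = %.2e" % (label, ber))

For this simple constellation and decision rule, even a one-bit (sign) ADC loses essentially nothing, which is in keeping with the remark above that coarse quantization may suffice when the dynamic range is modest; the harder questions arise for the equalization and MIMO algorithms that lean on channel linearity.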
Parting thoughts
The introductory treatment in this textbook is intended to serve as a gateway to an exciting future
in communications research and technology development. We hope that this discussion gives the
reader the motivation for further study in this area, using, for example, more advanced textbooks
and the research literature. We do not provide specific references for the topics mentioned in
this epilogue, because research in many of these areas is evolving too rapidly for a few books
or papers to do it justice. Of course, the discussion does provide plenty of keywords for online
searches, which should bring up interesting material to follow up on.
Bibliography
[21] F. M. Gardner, Phaselock techniques. Wiley, 2005.
[22] A. J. Viterbi, Principles of Coherent Communication. McGraw-Hill, 1966.
[23] R. Best, Phase locked loops: design, simulation, and applications. McGraw Hill, 2007.
[24] B. Razavi, Phase locking in high-performance systems: from devices to architectures. Wiley-
IEEE Press, 2003.
[25] R. D. Yates and D. J. Goodman, Probability and Stochastic Processes: A Friendly Introduc-
tion for Electrical and Computer Engineers. Wiley, 2004.
[26] J. W. Woods and H. Stark, Probability and Random Processes with Applications to Signal
Processing. Prentice Hall, 2001.
[27] A. Leon-Garcia, Probability and Random Processes for Electrical Engineering. Prentice Hall,
1993.
[28] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes.
McGraw-Hill, 2002.
[29] J. B. Johnson, “Thermal agitation of electricity in conductors,” Phys. Rev., vol. 32, pp. 97–
109, 1928.
[30] H. Nyquist, “Thermal agitation of electric charge in conductors,” Phys. Rev., vol. 32,
pp. 110–113, 1928.
[31] D. Abbott, B. Davis, N. Phillips, and K. Eshraghian, “Simple derivation of the thermal noise formula using window-limited Fourier transforms and other conundrums,” IEEE Transactions on Education, vol. 39, pp. 1–13, Feb. 1996.
[32] R. Sarpeshkar, T. Delbruck, and C. Mead, “White noise in MOS transistors and resistors,” IEEE Circuits and Devices Magazine, vol. 9, pp. 23–29, Nov. 1993.
[33] V. A. Kotelnikov, The Theory of Optimum Noise Immunity. McGraw Hill, 1959.
[34] G. D. Forney, “Maximum-likelihood sequence estimation of digital sequences in the presence
of intersymbol interference,” IEEE Trans. Information Theory, vol. 18, pp. 363–378, 1972.
[35] G. Ungerboeck, “Adaptive maximum-likelihood receiver for carrier-modulated data-
transmission systems,” IEEE Trans. Communications, vol. 22, pp. 624–636, 1974.
[36] S. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Pren-
tice Hall, 1993.
[37] H. V. Poor, An Introduction to Signal Detection and Estimation. Springer, 2005.
[38] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. Wiley, 2001.
[39] R. E. Blahut, Modem Theory: an Introduction to Telecommunications. Cambridge Univer-
sity Press, 2009.
[40] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 2006.
[41] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
[42] R. J. McEliece, The Theory of Information and Coding. Cambridge University Press, 2002.
[43] R. E. Blahut, Algebraic Codes for Data Transmission. Cambridge University Press, 2003.
[44] S. Lin and D. J. Costello, Error Control Coding. Prentice Hall, 2004.
[45] T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms. Wiley, 2005.
[46] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[47] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University
Press, 2005.
[48] G. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[49] E. Telatar, “Capacity of multi-antenna Gaussian channels,” AT&T Bell Labs Internal Tech.
Memo # BL0112170-950615-07TM, June 1995.
[50] S. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, pp. 1451–1458, October 1998.
[51] A. Paulraj, R. Nabar, and D. Gore, Introduction to space-time wireless communications.
Cambridge University Press, 2003.
[52] H. Jafarkhani, Space-time coding: theory and practice. Cambridge University Press, 2003.
[53] H. Bolcskei, D. Gesbert, C. B. Papadias, and A. J. van der Veen, eds., Space-time wireless
systems: from array processing to MIMO communications. Cambridge University Press,
2006.
Index
AM, 96
  envelope detection, 100
  modulation index, 97
amplitude modulation, 93
  conventional AM, 96
  DSB-SC, 93
  QAM, 109
  SNR, 278
  SSB, 102
  VSB, 107
analog communication
  block diagram, 16
analog modulation
  legacy systems, 132
  SNR, 278
angle modulation, 111
  FM, 111
  PM, 111
  SNR, 281
antipodal signaling, 317
AWGN channel
  optimal reception, 308

bandwidth, 61, 159
  fractional power containment, 159
bandwidth efficiency
  linear modulation, 169
  orthogonal modulation, 176
baseband channel, 62
baseband signal, 62
Bayes’ rule, 201
belief propagation, 382
  check node update, 385
  software lab, 394
  variable node update, 384
binary symmetric channel, 370
  capacity, 372
biorthogonal modulation, 176
bit interleaved coded modulation, 364

capacity
  bandlimited AWGN channel, 368
  binary symmetric channel, 372
  discrete-time AWGN channel, 368
CDF, 204
  joint, 208
cellular networks
  introduction, 21
  picocells, 457
central limit theorem, 273
channel code
  bounded distance decoding, 378
  linear, 373
  minimum distance, 377
channel decoder
  role of, 19
channel encoder
  role of, 18
coherence bandwidth, 56
complementary CDF, 205
complex baseband representation, 64
  filtering, 72
  frequency domain relationship, 69
  role in transceiver implementation, 76
  wireless channel modeling, 77
complex envelope, 66
complex exponential, 31
complex numbers, 28
conditional error probabilities, 291
constellations, 156
convolution, 39
  discrete time, 43
correlator
  for optimal reception, 309
covariance, 222
  matrix, 229
  properties, 229
cumulative distribution function, see CDF

dBm, 248
decoding
  belief propagation, 382
  bit flipping, 380
  bounded distance, 378
delay spread, 56
delta function, 32
demodulator
  role of, 19
density, 204
  conditional, 210
  joint, 208, 209
digital communication
  advantages, 20
  block diagram, 17
discrete memoryless channel, 366
dispersive channel
  software model, 403
double sideband, see DSB, 93
downconversion, 65
DSB, 93
  need for coherent demodulation, 95

Hilbert transform, 105
hypothesis testing, 290

I and Q channels
  orthogonality of, 65
impulse, 32
impulse response, 39
indicator function, 33
inner product, 33
Internet
  introduction, 21
intersymbol interference, see ISI
ISI, 399
  eye diagram, 404
  vector model, 413

maximum likelihood (ML)
  geometry of decision rule, 311
millimeter wave, 458
MIMO, 399, 425
  Alamouti space-time code, 436
  beamsteering, 427
  distributed, 459
  diversity, 433
  linear array, 425
  OFDM, 432
  receive diversity, 434
  rich scattering model, 433
  SDMA, 430
  software lab, 454
  spatial multiplexing, 437
  transmit diversity, 435
minimum mean squared error, see MMSE
Minimum Shift Keying (MSK)
  preview, 185
MMSE, 409, 417
modulator
  role of, 18
Moore’s law, 22
MSK, see Minimum Shift Keying
multi-rate systems, 45
multipath channel
  example, 41, 56
multiple antenna communication, see MIMO
multiple input multiple output, see MIMO

nearest neighbors approximation, 327
noise
  circularly symmetric, 277
  flicker, 275
  mechanisms, 274
  shot, 275
  thermal, 274
  white
    passband, 276
noise enhancement, 415
noise figure, 248, 335
noncoherent reception, 174
Nyquist
  criterion for ISI avoidance, 165
  sampling theorem, 162

OFDM, 418
  cyclic prefix, 422, 424
  MIMO, 432
  software lab, 452
Offset QPSK, 185
on-off keying, 317
orthogonal frequency division multiplexing, see OFDM
orthogonal modulation, 173
  bandwidth efficiency, 176
  BER, 334
  binary, 318

Parseval’s identity
  Fourier series, 51
  Fourier transform, 55
passband channel, 62
passband filtering, 72
passband signal, 62
  complex envelope, 66
  envelope, 66
  phase, 66
PDF, 204
  joint, 208
performance analysis
  M-ary orthogonal signaling, 330
  16QAM, 328
  ML reception, 313
  QPSK, 322
  scale-invariance, 317
  scaling arguments, 320
  union bound, 325
phase locked loop, see PLL
phase modulation, see PM
PLL, 122
  first order, 128
  FM demodulation, 125
  frequency synthesis, 126
  linearized model, 127
  mathematical model, 126
  nonlinear model, 129
  second order, 129
PMF, 203
  joint, 209
power, 35
power efficiency, 317, 328
power spectral density, 157, 241
  linear modulation, 191
  linearly modulated signal, 158
  one-sided, 244
power-bandwidth tradeoff
  preview, 170
probability
  Bayes’ rule, 201
  conditional, 200
  conditional independence, 202
  event, 198
  independence, 202
  law of total probability, 200
  sample space, 197
  union bound, 203
probability density function, see PDF
probability mass function, see PMF
PSD, see power spectral density

Q function, 224
  asymptotic behavior, 228
  bounds, 268, 272
quadrature amplitude modulation, see QAM

raised cosine pulse, 167, 183
random processes
  baseband, 245
  filtered, 252
  Gaussian, 246
  linear operations on, 251
  passband, 245, 275
  power spectral density, 241
  stationary, 240
  wide sense stationary (WSS), 240
random variables, 203
  affine transformation, 229
  Bernoulli, 203
  binomial, 207
  covariance, 222
  expectation, 218
  exponential, 205
  functions of, 214
  Gaussian, 206, 220, 223
  i.i.d., 213
  independent, 213
  joint Gaussianity, 230
  mean, 218
  moments, 219
  multiple, 207
  Poisson, 207
  standard Gaussian, 223
  uncorrelated, 222
  variance, 219
random vector, 207
  Gaussian, 230
ray tracing, 78
Rayleigh fading
  preview, 352
receiver sensitivity, 335
repetition code, 363, 374

sampling theorem, 162
Shannon
  source-channel separation, 16
signal, 30
  DC value, 35
  energy, 34
  norm, 34
  power, 35
signal space, 299
signal-to-noise ratio, 255
sinc function, 33
single parity check code, 374
single sideband, see SSB
SNR, 255
  analog modulation, 278
  angle modulation, 281
  maximization, 256
source encoder
  role of, 17
space-time communication, 399, see MIMO
square root Nyquist pulse, 172
square root raised cosine, 172, 194
SRRC, see square root raised cosine
SSB, 102
  demodulation, 107
superheterodyne receiver, 119

Tanner graph, 380
time average, 35

union bound, 325
  intelligent union bound, 326
upconversion, 65

vestigial sideband modulation, see VSB, 107

Walsh-Hadamard codes, 176
WGN, see white Gaussian noise
white Gaussian noise, 250
  geometric interpretation, 305
WiFi
  introduction, 22
wireless
  introduction, 21
wireless channel
  modeling in complex baseband, 77
  ray tracing, 78

zero-forcing, 414