Signal Integrity-Final Thesis
Thesis by
Behnam Analui
© 2005
Behnam Analui
All Rights Reserved
Some materials were previously published in IEEE publications and the copyright is owned by IEEE.
To
Acknowledgements
This is perhaps my most favorite part of this thesis. The acknowledgement section is
usually the last section that is written but is often the first section that is read! I enjoy
reading the acknowledgments section because it is a short documentary that takes you
behind the scenes. It is real. You feel the extreme joy and satisfaction of the author for his
achievement, flowing from his words. You witness his passion of sharing his feeling with
everyone. A passion that is making him scream loudly in the silence of the library where I
am reading his thesis: THANK YOU to all that made it happen! And the voice fades away
but the satisfaction of learning, the bliss of friendship, and the pleasure of accomplishment
all remain for him in his personal treasure.
Here are the people to whom I want to express my deepest gratitude for making it
happen and for what they have contributed to my treasure. I am extremely grateful for the
dedication of their big brains and even bigger hearts to my work and my life.
First and foremost, I am truly indebted to Prof. Ali Hajimiri. He granted me the
privilege to work in his research group at Caltech and guided me through all the ups and
downs of my Ph.D. with his special enthusiasm like that of a young coach who is full of
energy and knows the game very well. What he taught me goes well beyond his detailed
technical feedback about the research work in this dissertation. From him, I learned to be a
responsible engineer and a critical scientist as well as learned to write succinctly! I am
also grateful for all his advice and help in the transition process to my next career. In short,
I am proud to be his student and wish that my five years at Caltech are only the beginning
of a life-long relationship with him.
I was lucky to overlap three years with Jim Buckwalter in Ali’s group. Jim’s
excitement and energy for doing research was always a motivation for me. He was my
collaborator in the data-dependent jitter work. I thank him for the technical discussions
and his various contributions to that work. Dr. Alexander Rylyakov was my mentor during
my several visits to the IBM T. J. Watson Research Center. I thank him for his tireless
supervision in the eye-opening monitor work and contributions to all the stages of the
design, layout, and testing with his extreme patience and insightful comments. Finally, I
am very grateful to Prof. Hossein Hashemi for his own thrilling way of criticizing my
research work. We had several discussions in the office, at lunch, in the gym, at Echo
Mountain while hiking, and even at home when cooking! His solid reasoning always made
me think twice and, admittedly, he was often right!
I am grateful to Profs. David Rutledge and Shuki Bruck for their kind and helpful
advice during my research work at Caltech and while transitioning to my next career.
Dave is a true academician and has created a research lab truly devoted to the
advancement of microwave engineering. I benefited a lot from interacting with his group
and using his lab facilities. Prof. Bruck initiated a collaborative project with Caltech’s
High-Speed Integrated Circuit Group (Hajimiri’s group) that I enjoyed being a part of. I
also thank Prof. Bruck for his encouragement and for transferring his passion and positive
attitude to me in all of our conversations.
I acknowledge Profs. David Rutledge, Shuki Bruck, Sandy Weinreb, and Bob
McEliece for dedicating their time to be on my oral candidacy committee and for
providing their technical comments about the progress of my research work. I also
acknowledge Profs. David Rutledge, Shuki Bruck, Sandy Weinreb, and Yu-Chong Tai for
kindly accepting to serve on my Ph.D. defense committee and reading this thesis.
Many friends and colleagues have contributed to this work through their technical
feedback, reading my paper manuscripts, helping with layout and measurements, and
CAD technical support. I thank them all. Particularly, I am indebted to Abbas Komijani,
Arun Natarajan, Prof. Donhee Ham, Dr. Ichiro Aoki, Prof. Hui Wu, Dr. Scott Kee, Sam
Mandegaran, Amir Faraji Dana, Dr. Masoud Sharif, Ali Vakili, Niklas Wadefalk, Ann
Shen, Dr. Saleem Mukhtar, Maryam Owrang, Dr. Lawrence Cheuang, Dr. Jose Tierno, Dr.
Thomas Zwick, Dr. Sergey Rylov, Dr. Mounir Meghelli, Dr. Daniel Friedman, Dr. Sudhir
Gowda, Dr. Michael Beakes, Dr. Jeremy Schaub, Dr. David Sanderson, Naveed
Near-Ansari, Dr. Jean-Olivier Plouchart, Dr. Noah Zamdmer, Dr. Yue Tan, and numerous
others.
My Caltech years were really fun! I am grateful for the luxury of interacting with
many unique individuals who made the happy story of my life at Caltech. I sincerely thank
them all. Particularly, I thank Donhee, my office-mate, who spent a lot of time showing
me around LA and teaching, with his gifted excitement, a lot about classical music. I hope
we get together once in a while to have “soltani.” I thank Sam, my other office-mate,
whose quest for knowing more has strengthened my sense of curiosity. He has taught me
numerous scientific facts that I would otherwise pass by without even noticing. He was
also the first person to scientifically introduce me to the theory of evolution, one of the
most profound theories of all time, in my opinion. I thank Abbas, with whom I started the
five-year journey at Caltech, for his always unique perspective and insightful comments
that made me wonder “why didn’t I think as simply as that?” I also thank him for his
happy, energetic heart that often made him break the unwritten rules of being an adult and
reminded me of the first few pages of my all-time favorite book, “The Little Prince.” I
thank Ehsan, Arash Y., Matthieu, Jeremy, Fati, Nikoo, Maryam, Maziar-Lisa, Arun, Jim
Bucky, Amir S., Farshad, Baharak, Roberto, Xiang, Xiaofeng, Yujin, Taka, Dai, Chris,
Shervin, Alireza, Michella, Carol, Michelle, Heather Jackson, Linda, Veronica, Parandeh,
Jim Endrizzi, Tess, Dale, Prof. McGill, Tara, Erni, Chandler’s pizza staff, Ampex, Nodal,
and Mr. and Mrs. Bentley for being a part of my life at Caltech. Finally, I want to express
my deep gratitude to Lisa Cowan and Prof. Shuki Bruck. Conversations with them were
always exceptionally delightful and charming, and I am very thankful to them for that.
Everyone has his own list of folks whose influences on his life are hard to express in
words. It is also hard, if not impossible, to pay them back for what they contributed to his
life. However, they are usually the ones who don’t expect to be paid back. Here is my list:
Ali H., for being a great advisor and friend and for teaching me fairness and balance.
Mehrdad Sharif-Bakhtiar, for teaching me electronics and living by principles. Hossein,
for teaching me to ask questions, for teaching me to listen to my honest self more often,
for increasing my confidence, for introducing me to the fun side of sport, and for being a
true brother. Amir-Helia as a single entity, for bringing Maryam to my life and for being
the little prince and princess around whom I feel I am the adult kid I always want to be. In
their presence, I can sit for hours, stare at their smiles, and take pleasure. Daei Mehdi, for
his advice that typically covers all the aspects of all the issues with any probability larger
than zero. Maman Maria, the coolest mother-in-law ever, for her regular calls from her
office to see how I am doing and for having a heart that enjoys every moment of her life;
after all she is my wife’s mom! Bita, Behrad, and Behdad, for their love and support while
being thousands of miles away. Maryam, my angel, my other half, my lovely wife, for
giving me courage, for being full of surprises, for making our life a wonderland, and for
tolerating me when I said: “Honey! I am a little busy this weekend” for many weekends. I
believe she shares my extreme thankfulness for all who have made our life a
dream-come-true. Finally, my mom and dad, for supporting me unconditionally. Since
high school, I have concentrated all my efforts on making them proud, and I believe this
has led me to all my successes. I humbly bow to them and dedicate this thesis to them as a
little sign of appreciation for all their sacrifices.
Abstract
This work focuses on the basic signal integrity issues of high-speed wireline links. It
bridges the gap between optimum system design and circuit design for such links by: (1)
understanding the effects of the system parameters on the bit error rate (BER), (2)
introducing circuit architectures for the realization of systems that minimize the BER, and
(3) demonstrating integrated circuit prototypes that verify the solutions.
First, we develop a theory that analytically relates the data link BER to the system
characteristics, e.g., the channel response, the pre-amplifier bandwidth, and the transmitter
clock jitter. We generate the BER contours to find the optimum receiver bandwidth as well
as the optimum sampling point and its associated timing margin. We also develop the
theory of the data-dependent jitter (DDJ), which is a significant component of the timing
jitter in high-speed links. We provide an analytical distribution function for the DDJ of an
arbitrary linear time-invariant system and include the impact of the DDJ on the BER.
Table of Contents
Acknowledgements
Abstract
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Information Technology: Desire for Higher Speed
1.2 Scope of this Thesis
1.3 Why Silicon-Based Integrated Circuits?
1.4 Challenges
1.5 Contributions
1.5.1 Fundamental Issues: Signal Integrity
1.5.2 High-Speed Integrated Circuit Topologies in Silicon
1.5.3 Novel Architectures: High-Speed Signal Processing to Maintain Signal Integrity
1.6 Thesis Organization
Bibliography
List of Figures
Chapter 1: Introduction
Figure 1.1: Categories of wireline communication applications
Figure 6.8: Differential symmetric interwound inductors for one section of the delay line
Figure 6.9: Magnitude of the S-parameters of one MIM-based standalone delay line
Figure 6.10: Collective group delays of 27 standalone MIM-based delay lines
Figure 6.11: Collective group delays of 47 standalone VPP-based delay lines
Figure 6.12: Normalized standard deviations for group delays of standalone delay lines
Figure 6.13: Distributions of normalized delay at 1GHz for both MIM and VPP-based delay lines
Figure 6.14: Die photo of 19-section VPP-based LC delay line
Figure 6.15: Die photo of 24-section MIM-based LC delay line
Figure 6.16: A three-input ECL OR gate
Figure 6.17: Die microphotograph of the 1:2 demultiplexer with three 5-section differential LC delay line.
Figure 6.18: The y2 output in the oscillator mode.
Figure 6.19: Demultiplexer outputs out1 and out2 for 3 input sequences (a) 1100 (b) 10000000 (c) 1000000010001000.
List of Tables
Chapter 1: Introduction
1.1 Information Technology: Desire for Higher Speed
Integrated systems are among the key technologies that have revolutionized the
information era by enabling high-speed computation and communication techniques as
well as high-speed access to stored information. The commodities benefiting from this
revolution, e.g., the internet, personal computers, and cellular phones, have become
commonplace. The evolution of such commodities has caused a dramatic growth in the
amount of information generated and in the number of end users who access that
information. A recent study estimates 0.5 million terabytes of original information was
distributed over the internet in 2002, double the amount in 1999 [1]. The number of
internet users worldwide has also increased by 146% from the year 2000 to the year 2005
[2]. The continuous growth of internet traffic necessitates an upgrade in the backbone
infrastructure and local area networks to support even higher data transfer rates and larger
numbers of end users. The Synchronous Optical NETwork (SONET) [3][4] and Ethernet
[5] are evolving in response to this demand for communication at higher speed.
In addition to end users, there are two more types of network nodes that access
information and will take advantage of higher data transfer rates: processor nodes and
storage nodes. Today's microprocessors run about 100 times faster than 15 years ago [6]
due to device scaling and architecture design advances. As on-chip clock frequencies
increase, I/O bus bandwidth becomes the speed bottleneck in a multi-chip environment.
Developing high-speed chip-to-chip links allows increased processing power and faster
networking with other chipsets. The impact of higher-speed links becomes increasingly
significant in distributed computing networks and the so-called “supercomputers” with
multiple processing nodes, where the system's performance relies on fast, error-free
communication between the processing nodes.
High-speed access to data storage devices for fast transfer of large volumes of
information is another emerging application for high-speed data communication. An
example is storage area networks (SAN) that, in contrast to a single large-capacity device,
consist of a scalable network of storage nodes. The projected amount of data that will be
stored using a SAN-based database is three million terabytes in 2005 [7]. Advancement of
technology in various areas will accelerate the generation of more original information.
For instance, sensor networks will be ubiquitous and will constantly sense and aggregate
data from various environments. This data needs to be stored in sizable databases.
Similarly, large databases will be necessary to store human genome sequences that occupy
about 0.75 gigabytes per human being. The continuous increase in the size of databases is
an additional incentive for developing high-speed mass-storage media networks.
backplane board or a motherboard. An example is the link between several line cards on
one backplane of a network router [8]. In a backplane link, the data rate is typically higher
and the transmission distance is longer compared to a chip-to-chip link. However, the
channel used is still electrical.
If the transmission distance is more than 100m and the data rate is above 10Gb/s,
electrical transmission lines are no longer deployed, mainly due to the significant loss
of the channel. In such cases, multi-mode fiber (MMF) or single-mode fiber (SMF) is
used. While copper attenuation easily reaches 10dB/m at 10Gb/s [9], the typical loss of
modern single-mode fiber remains below 0.5dB/km in the 1200nm-1600nm wavelength
range [10].
Although several applications exist in various categories discussed above, the front
end architecture of the high-speed chip-to-chip, backplane or optical communication links
is similar, as will be discussed in Chapter 2. The focus of this dissertation is on the basic
challenges of high-speed link design resulting from channel impairments and hardware
restrictions; its contributions can therefore be applied to any of the above categories of
wireline communication links.
Besides cost advantage, integration has two other benefits. First, the parasitic impact
of package and board on the interface between functions that are now integrated on the
same die is eliminated. Therefore, the operation speed can be significantly higher as the
speed is now limited only by the on-chip device and interconnect parasitic components.
Second, the power consumption of the chip is lowered because the I/O drivers, required at
the interface between chips in a multichip environment, will be eliminated.
delivering more functionality for a lower price and are the perfect candidate for the
implementation of low-cost high-speed integrated circuits.
1.4 Challenges
The main challenge in designing high-speed links is to understand and combat channel
response restrictions. As the speed of link operation increases, channel effects that
were previously neglected have a significant effect on the link reliability. For instance,
frequency-dependent loss and dispersion caused by the channel degrade signal integrity and
introduce inter-symbol interference (ISI) and data-dependent jitter (DDJ), which increase
the error probability. To optimize the link reliability, the ISI and jitter impact of the channel
response should be quantified. Furthermore, the error probability should be related to the
ISI and jitter, in addition to the channel noise.
If the channel response is known, Nyquist pulse shaping can be used to eliminate the ISI
[15]-[17]. However, pulse shaping is typically not applied to high-speed communication
because the channel response is not necessarily known a priori. The alternative approach
that is feasible for high-speed links is adaptive equalization. An equalizer is a filter that is
designed to reshape the received pulse to minimize the overall effect of ISI and noise at
the sampling point. An adaptive equalizer automatically adjusts its filter parameters to
accommodate an unknown channel response and its variations over time. Adaptive
equalization algorithms based on the fast Fourier transform have been efficiently realized by
digital signal processors (DSP) at low data rates (multiple megabits per second). However,
the prohibitively large power consumption of the DSP and analog-to-digital converters at
multi-gigabit-per-second speeds makes such an approach impractical. Instead, realization
of adaptive equalizers at 10Gb/s and beyond requires analog high-speed adaptive
transversal filters and robust adaptation techniques with their associated circuitry.
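As an illustration of the adaptive transversal filter idea, the following minimal sketch adapts a five-tap FIR equalizer in software using the least-mean-squares (LMS) rule with a known training sequence. It is not the approach developed in this thesis, which targets analog filters at 10Gb/s. The LMS update, the two-tap channel model, the step size `mu`, and all function names are assumptions chosen only for illustration.

```python
import random

def lms_equalize(received, desired, num_taps=5, mu=0.05):
    """Adapt a transversal (FIR) filter with the LMS rule.

    Returns the final tap weights and the error sequence.
    """
    weights = [0.0] * num_taps
    delay_line = [0.0] * num_taps          # most recent sample first
    errors = []
    for sample, target in zip(received, desired):
        delay_line = [sample] + delay_line[:-1]
        output = sum(w * s for w, s in zip(weights, delay_line))
        error = target - output
        # LMS update: move each tap along the negative error gradient.
        weights = [w + mu * error * s for w, s in zip(weights, delay_line)]
        errors.append(error)
    return weights, errors

def mean_square(values):
    return sum(v * v for v in values) / len(values)

random.seed(1)
symbols = [random.choice((-1.0, 1.0)) for _ in range(2000)]
# Hypothetical ISI channel (assumption): r[n] = a[n] + 0.4 * a[n-1]
received = [s + 0.4 * p for s, p in zip(symbols, [0.0] + symbols[:-1])]

weights, errors = lms_equalize(received, symbols)
mse_start = mean_square(errors[:50])      # before the taps have converged
mse_end = mean_square(errors[-200:])      # after adaptation
```

In this noiseless toy case, the taps should approach the truncated inverse of the channel, so the residual error at the end of the run is dominated by the finite filter length rather than by the adaptation itself.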
The signal degradation induced by the channel is intensified by the nonidealities of the
receiver circuit. Integrated circuits eliminate the bandwidth limitations imposed by the
parasitic components of the packages and the wiring between the packages in a discrete
design. However, intrinsic device and metal interconnect parasitics on the chip still restrict
the maximum achievable bandwidth. Single-chip implementation of 10Gb/s systems
requires an understanding of the factors that limit the on-chip operation speed and
encourages development of circuit techniques and topologies for overcoming those
limitations.
1.5 Contributions
This thesis focuses on the analysis, design, and hardware implementation of
high-speed wireline integrated communication systems. The investigation of the
challenges described in Section 1.4 has led to original contributions in this thesis that can
be categorized into the following specific topics:
We will elaborate on these topics individually in the following and conclude the
chapter with the thesis organization.
signal. Understanding how data and clock jitter exacerbate bit error probability is
fundamental to the design of a high-speed link. In our work, we focus on DDJ that is
predominantly caused by system bandwidth limitation. The ISI resulting from the
bandwidth limitation shifts the threshold-crossing times and translates to jitter. We provide
a comprehensive analytical framework to model and predict DDJ caused by any linear
time-invariant (LTI) system [18]–[22]. Associating the LTI system response to the DDJ
provides insight for circuit and system designers for minimizing jitter and complements
conventional measurement-based methods. In addition, we can predict the DDJ
contribution of a system at any data rate from its step response. Experimental data verify
our model predictions for various systems with less than 7.5% error.
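To illustrate how ISI shifts the threshold-crossing time, the sketch below considers the simplest LTI system, a first-order low-pass response with time constant `tau`, and computes the crossing time of a rising edge for two different bit histories. The first-order model, the 0.5 threshold, the example values (tau = 30ps, Tb = 100ps), and the function names are assumptions for illustration. It is a special case and not the general framework of [18]–[22].

```python
import math

def level_before_edge(history, tau, bit_period):
    """Output of a first-order system just before the edge.

    history: unipolar bits (0 or 1), oldest first. The output is assumed
    to have fully settled at history[0] before the remaining bits arrive.
    """
    alpha = math.exp(-bit_period / tau)    # residual after one bit period
    level = float(history[0])
    for bit in history[1:]:
        level = bit + (level - bit) * alpha
    return level

def rising_edge_crossing(history, tau, bit_period, threshold=0.5):
    """Time after a 0->1 edge at which the output crosses the threshold."""
    y0 = level_before_edge(history, tau, bit_period)
    # After the edge: y(t) = 1 - (1 - y0) * exp(-t / tau)
    return tau * math.log((1.0 - y0) / (1.0 - threshold))

tau, tb = 30e-12, 100e-12                  # assumed example values
t_after_zeros = rising_edge_crossing([0, 0], tau, tb)   # long run of zeros
t_after_one = rising_edge_crossing([1, 0], tau, tb)     # "1" two bits earlier
ddj = t_after_zeros - t_after_one          # spread of the crossing times
```

For this model the spread has the closed form `-tau * ln(1 - exp(-Tb / tau))`, which shows directly how a smaller bandwidth (larger `tau`) increases the data-dependent shift of the crossing time.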
The first contribution to this topic is the development of a novel eye-opening monitor
that enables full integration of adaptive equalizers in the receiver high-speed front-end
[28][29]. An eye-opening monitor (EOM) is a block that evaluates the quality of the
received signal eye diagram and periodically reports a quantitative measure, which is
directly correlated to the signal quality. This output is used as a cost function for automatic
adjustment of the filter coefficients in an adaptive equalizer. Our proposed EOM can
effectively capture a two-dimensional image of the eye diagram shape at the output of the
equalizer. Its simple error detection mechanism allows implementation at very high speed.
The prototype implemented in 0.13µm CMOS was successfully tested up to 12.5Gb/s
[29]. It provides up to 68dB output error dynamic range that is sufficient for the
optimization algorithm of the equalizer coefficients.
The other thesis contribution in this area is a novel architecture for instantaneous
clockless demultiplexing [30]–[32]. Instantaneous data acquisition is required in
burst-mode communication systems, where the data stream arrives at the receiver in
asynchronous packets separated by unknown quiet intervals. Conventional narrowband
phase-locked loops require a long preamble with a large acquisition time and are therefore
not suitable. As an alternative to gated oscillators that require a full-rate clock for
operation, we have proposed a clockless finite state machine that recovers and
demultiplexes the received burst of data instantaneously. The architecture consists of a
combinational logic structure with immediate response and a bit-period-delayed feedback
loop. Therefore, every time a burst is received, the operation is initiated exactly in-phase
with the first bit and continues synchronous to the stream. We implemented a 1:2 clockless
demultiplexer based on this concept in a SiGe BiCMOS technology and verified its
operation at 7.5Gb/s.
reliability, i.e., probability of making an error. Each of the next four chapters, Chapters 3
to 6, covers one of the topics discussed in Section 1.5. The approach common to
all the chapters is to, first, discuss the prior art and introduce the novel concept of the
circuit topology or system architecture that is developed as part of the thesis. Then the
design steps are discussed. Finally, each chapter concludes by demonstrating a hardware
prototype and by providing experimental data from the measurements of the prototype
that verify the concept. Chapter 3 describes the contributions to the understanding of
data-dependent jitter from a circuit design perspective. Chapter 4 presents the
methodology for bandwidth enhancement of wideband amplifiers. The eye-opening
monitor technique is the subject of Chapter 5. Lastly, Chapter 6 discusses the development
of the instantaneous demultiplexing architecture. We conclude with a summary in Chapter
7 that covers the achievements of the thesis as well as suggestions for future research to
expand the results of the current work.
Chapter 2: Principles of High-Speed Communications
2.1 Trade-Offs in Link Design
The objective of communication is to transfer information reliably from a
transmitter to a receiver [33]. The measure of reliability is the probability of making an
error in detecting the received information bit. A typical high-speed wireline link is
designed to transfer data at a specified rate, e.g., 10Gb/s, while maintaining a given level
of reliability, e.g., an error probability of 10^-12. The physical medium that acts as the channel
is selected based on the required bandwidth as well as the maximum amount of signal
attenuation that can be tolerated due to channel loss. Once the channel is selected, the first
step in link design is to derive the relationships between the error probability and the link
specifications, e.g., gain, bandwidth, sensitivity, jitter generation, and jitter tolerance.
Then, the parameters of the transmitter and the receiver blocks are determined from these
relationships so as to minimize the error probability. Other parameters such as cost or power
dissipation that affect the practicality of the design are also considered at this stage.
In this chapter, we introduce the underlying concepts of a high-speed wireline link and
describe the first step mentioned above. We discuss the link design challenges caused by
the inter-symbol interference (ISI) and jitter. Then, we analyze the relationship between
the error probability, the ISI and jitter. We study the combined impact of the ISI and jitter
on the link reliability and demonstrate the trade-offs between the ISI, jitter, and system
parameters such as bandwidth. Finally, we provide a unified relationship that enables
minimization of the link error probability in the presence of both the ISI and jitter. The
assumptions and definitions of this chapter will be used throughout the dissertation.
The dominant modulation scheme that is used in high-speed links is two-level
pulse-amplitude modulation (2-PAM). In 2-PAM, the binary information is encoded into two
signal levels. Typically, bipolar signals, i.e., symmetrical levels around a well-defined zero
threshold, are selected. For instance, -1/2 represents binary “0” and +1/2 represents binary
“1”. This way, if the information source generates “0” and “1” with equal probability, the
data sequence will have a zero average. This is advantageous in reducing the wander jitter,
which is the long term deviation of data transition time from a reference time and happens
due to the drift or variation of the threshold.
The main reason for using 2-PAM in high-speed communication systems is its
simplicity. The 2-PAM signal can be generated by simply turning a transistor or a laser
source on and off. The detection mechanism is also simple, and it does not require accurate
power-level control. A main reason to use more complex modulation techniques such as
M-PAM or M-QAM is to achieve a higher data rate using the same channel bandwidth.
However, because the channel in wireline links has abundant bandwidth, the designers
tend to deploy simple 2-PAM modulation instead of a more complex technique and use a
faster device technology to achieve the high data rates. Nevertheless, more recently, as
data rates have hit the bandwidth limitations of copper transmission lines, it has become
reasonable to design circuits for implementation of more complex modulations such as the
4-PAM [34][35]. In such a case, although a more complicated slicer, clock, and data
recovery architecture is needed, the required bandwidth is only half of the bandwidth
required for the 2-PAM because the symbol rate is half of the data rate. In this chapter, we
assume that a 2-PAM modulation is used with amplitude 0 and 1 for information bits “0”
and “1,” respectively.
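The bandwidth argument for multi-level signaling can be checked with a short calculation: an M-level PAM symbol carries log2(M) bits, so the symbol rate is the data rate divided by log2(M). The helper name `symbol_rate` is an assumption introduced for this sketch.

```python
import math

def symbol_rate(data_rate_bps, levels):
    """Symbol rate of M-PAM signaling for a given data rate in bits/s."""
    bits_per_symbol = math.log2(levels)
    return data_rate_bps / bits_per_symbol

rate_2pam = symbol_rate(10e9, 2)   # 10 Gbaud for 10 Gb/s
rate_4pam = symbol_rate(10e9, 4)   # 5 Gbaud for the same data rate
```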
The signal shapes in high-speed links typically take a nonreturn-to-zero (NRZ) format,
which means that the pulses that represent each bit last for a full bit period, Tb. This is in
contrast to the return-to-zero (RZ) signaling, where the pulses last only for half of the bit
period. Figure 2.1 illustrates the NRZ and RZ representations of a “1011001” data
sequence with unipolar pulses. RZ is typically preferred in the long-haul optical
telecommunication networks, where the optical power is expensive, because the RZ
format has relaxed signal-to-noise requirements compared to the NRZ format [36]. In
other words, since the RZ pulses are on for a shorter period of time, for a given average
transmitted power, they have higher peak power compared to the NRZ pulses. This results
in a lower bit-error probability because the optical receivers respond to the peak power.
[Figure 2.1: RZ and NRZ waveforms of the data sequence 1 0 1 1 0 0 1 plotted against time t, with the bit period Tb marked]
On the other hand, the RZ pulses have a larger transition density. They require a larger
channel and receiver bandwidth, twice as much as the NRZ. Therefore, the NRZ pulses
are the dominant format in high-speed wireline links due to their smaller bandwidth
requirements and simpler implementation. In this dissertation, we assume that all the data
sequences follow the NRZ format.
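The comparison between the two formats can be reproduced with a small sketch that builds sampled NRZ and RZ waveforms for the “1011001” sequence. The samples are interpreted as instantaneous optical power, the RZ duty cycle is taken as 50%, and the RZ level is doubled so that both waveforms have the same average power. These choices and the function names are assumptions for illustration.

```python
def waveform(bits, samples_per_bit=8, duty=1.0, level=1.0):
    """Unipolar pulses: each '1' is on for `duty` of the bit period."""
    on_samples = int(round(samples_per_bit * duty))
    samples = []
    for bit in bits:
        samples.extend([level * bit] * on_samples)
        samples.extend([0.0] * (samples_per_bit - on_samples))
    return samples

def mean(samples):
    return sum(samples) / len(samples)

def transitions(samples):
    """Number of level changes between adjacent samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)

bits = [1, 0, 1, 1, 0, 0, 1]
nrz = waveform(bits, duty=1.0, level=1.0)
rz = waveform(bits, duty=0.5, level=2.0)   # same average power as NRZ

same_average = mean(nrz) == mean(rz)
peak_ratio = max(rz) / max(nrz)            # RZ peak relative to NRZ peak
nrz_edges, rz_edges = transitions(nrz), transitions(rz)
```

For equal average power, the RZ peak is twice the NRZ peak, and the RZ waveform has more transitions for the same data, which is consistent with its larger bandwidth requirement.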
One potential problem with the NRZ format is the occurrence of long sequences of “0”
or “1,” also referred to as consecutive identical digits (CID) [37]. When a long sequence
of CID is transmitted, there is not any transition in the data for a long time. Consequently,
the receiver that extracts the timing information from the spacing between received data
transitions loses synchronization. To avoid the loss of synchronization, the data is encoded
before the transmission using run-length-limited code words. Such code words guarantee
a maximum number of CID bits. For instance, the 8-bit/10-bit (8b/10b) encoding scheme
that was proposed by IBM [38] is widely used in several high-speed wireline applications,
such as Fibre Channel for storage networks and 10-Gigabit Ethernet for local area
networks. The 8b/10b coding replaces a byte of information with 10 transmission bits. It
guarantees a maximum of five CID bits. In addition, it keeps the dc balance of the signal by
allowing an equal number of “1s” and “0s” for transmission. A disadvantage of the 8b/10b
coding is the 25% increase in the signaling rate. Basically, to keep 10Gb/s data throughput, the
signaling speed must be increased to 12.5Gb/s because of the 25% overhead added by
the encoding. This is undesirable in some applications such as SONET. Instead, the SONET
standard recommends using data scrambling (no overhead) or very low overhead
encoding. However, the link for SONET is required to tolerate up to 72 CID [36][39].
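The overhead arithmetic and the CID count can be sketched as follows. The code does not implement the 8b/10b code tables. It only computes the signaling rate implied by a block code and measures the longest run of identical digits in a bit sequence. The function names are assumptions for illustration.

```python
def line_rate(throughput_bps, data_bits=8, code_bits=10):
    """Signaling rate needed to carry `throughput_bps` of payload."""
    return throughput_bps * code_bits / data_bits

def longest_run(bits):
    """Length of the longest run of consecutive identical digits (CID)."""
    longest = current = 0
    previous = None
    for bit in bits:
        current = current + 1 if bit == previous else 1
        longest = max(longest, current)
        previous = bit
    return longest

rate_8b10b = line_rate(10e9)                    # 12.5e9 for 10 Gb/s payload
overhead = rate_8b10b / 10e9 - 1.0              # 0.25, i.e., 25%
run_example = longest_run([1, 0, 1, 1, 0, 0, 1])
run_sonet_worst = longest_run([0] * 72 + [1])   # a 72-bit CID pattern
```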
Coding techniques are also used for other purposes in data transmission such as error
correction [36] and spectral shaping [40][41]. In this dissertation, we assume that the data
sequence is a random binary sequence using 2-PAM NRZ signaling.
\[
x(t) = \sum_{k=-\infty}^{\infty} a_k \cdot p_i(t - kT_b), \qquad a_k \in \{0, 1\} \tag{2.1}
\]
\[
p_i(t) = \begin{cases} 1 & 0 \le t \le T_b \\ 0 & \text{otherwise} \end{cases} \tag{2.2}
\]
The coefficient a_k represents the kth transmitted bit, which is “0” or “1” with known
statistics. Because the transmitted bits are each a random variable, x(t) is a stochastic
process. We can show x(t) is a cyclostationary process [17], which means that the mean
and autocorrelation function of the process are time dependent and periodic. The average
power spectral density, i.e., the Fourier transform of the time-averaged autocorrelation
function, demonstrates how the signal power is distributed in the frequency domain. It is
an indication of the required bandwidth for transmission of a 2-PAM NRZ signal. The
power spectral density can be calculated as
\[
S(f) = \frac{1}{4}\,\delta(f) + \frac{1}{4}\,T_b \cdot \left[\operatorname{sinc}(f \cdot T_b)\right]^2 \tag{2.3}
\]
where the sinc(x) function is defined as
\[
\operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x}. \tag{2.4}
\]
The first term on the right-hand side of (2.3) is the dc power that is caused by using
unipolar signaling. The double-sided power spectrum is plotted in Figure 2.2. Due to the
zeros of the sinc function, the spectrum has nulls at nonzero integer multiples
of the data rate, 1/Tb. This indicates that the synchronization mechanism in the
receiver should be a nonlinear process because the received signal itself does not have any
spectral component at the clock frequency. Figure 2.3 shows the same power spectral density on a
Figure 2.3: Power spectrum of 2-PAM NRZ on logarithmic axes
log-log scale. The gain is normalized to one, and the clock frequency is assumed to be
10GHz for a data rate of 10Gb/s. Evidently, the spectrum covers a broad frequency range
from dc all the way to the clock frequency. All wireline communication systems require a
wide bandwidth channel and circuit blocks to allow transmission of a broadband NRZ
signal with minimum distortion. Bandwidth restrictions of the channel and/or receiver
circuits are the primary cause of signal impairment in high-speed communication, which
limits the link reliability as we will discuss later.
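The continuous part of the power spectral density in (2.3) can be evaluated numerically to confirm the nulls at multiples of the data rate. The sketch below assumes a 10Gb/s data rate (Tb = 100ps); the function name is illustrative, and the dc impulse of (2.3) is left out because it cannot be sampled on a frequency grid.

```python
import numpy as np


def nrz_psd(f, tb):
    """Continuous part of (2.3): (Tb/4) * sinc^2(f * Tb).

    np.sinc(x) is sin(pi*x)/(pi*x), the same definition as (2.4).
    """
    f = np.asarray(f, dtype=float)
    return 0.25 * tb * np.sinc(f * tb) ** 2


TB = 100e-12                          # assumed bit period for 10 Gb/s
harmonics = np.arange(1, 4) / TB      # 1/Tb, 2/Tb, 3/Tb
print(nrz_psd(harmonics, TB))         # essentially zero: spectral nulls
print(nrz_psd(0.0, TB))               # low-frequency density, Tb/4
```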
Although the NRZ modulation scheme requires a broadband channel and receiver,
excessive bandwidth in the receiver can be harmful to the receiver sensitivity because
wider bandwidth results in a larger integrated noise power. Therefore, from a sensitivity
standpoint, an optimum bandwidth exists that maximizes the sensitivity by balancing
between performance degradation due to the inter-symbol interference (small bandwidth)
and noise (large bandwidth). The conventional rule of thumb is to choose the receiver
bandwidth equal to 70% of the data rate [36]. We will analytically demonstrate the validity
of this rule and its underlying assumptions in Section 2.3. The receiver typically consists
of several blocks such as the pre-amplifier, the main amplifier, and the equalizer. The
individual bandwidth of each of these blocks should be larger than 70% of the data rate
because when the blocks are cascaded the overall bandwidth is reduced. We will consider
this in Chapter 4 when proposing bandwidth enhancement techniques for wideband
amplifiers.
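The bandwidth shrinkage of a cascade can be illustrated with a simple model. The sketch below assumes N identical, non-interacting first-order stages, for which the overall 3dB bandwidth is the stage bandwidth multiplied by sqrt(2^(1/N) - 1). Real receiver blocks are neither identical nor first-order, so the numbers are indicative only, and the function names are hypothetical.

```python
import math


def cascaded_bandwidth(stage_bw, n_stages):
    """Overall 3 dB bandwidth of n identical first-order stages (assumed model)."""
    return stage_bw * math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)


def required_stage_bandwidth(total_bw, n_stages):
    """Per-stage bandwidth needed so that the cascade reaches total_bw."""
    return total_bw / math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)


# Example: a 7 GHz overall target (70% of 10 Gb/s) split over three stages.
print(required_stage_bandwidth(7e9, 3))
```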
The eye diagram of a data sequence is a form of representation of the signal that
provides insight about the quality of the signal. As we will discuss in Section 2.3.2,
Section 2.3.3, and Section 2.5.1, a received signal has several characteristics, such as
amplitude noise, inter-symbol interference, or jitter, that affect the probability of
extracting correct information from it. The eye diagram of a signal contains information
about such characteristics of the signal. To generate the eye diagram, the signal is divided
into frames where each has a length of an integer multiple of the symbol period (bit
period, Tb, in the case of 2-PAM modulation). Then all the frames are overlapped to create
a single diagram with one frame length that contains several traces of the signal. An
example of a 2-PAM eye diagram with a length of 2Tb is shown in Figure 2.4, which
looks like an eye, hence the name.
Figure 2.4: Creation of the eye diagram with a length of 2Tb from the signal
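The framing step described above can be sketched in a few lines. The example assumes a uniformly sampled waveform with an integer number of samples per bit; the function name and the ideal test waveform are illustrative only.

```python
import numpy as np


def fold_into_eye(signal, samples_per_bit, bits_per_frame=2):
    """Cut the waveform into frames of bits_per_frame * Tb and stack them.

    Each row of the result is one trace of the eye diagram.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = samples_per_bit * bits_per_frame
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


bits = np.array([0, 1, 1, 0, 1, 0, 0, 1])
wave = np.repeat(bits, 4)                 # ideal NRZ, 4 samples per bit
traces = fold_into_eye(wave, samples_per_bit=4)
print(traces.shape)                       # (4, 8): four traces, each 2*Tb long
```

Plotting every row of `traces` on the same axes produces the overlapped diagram.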
The error probability or BER for a simplified link as in Figure 2.5 can be calculated
from the noise distribution at the sampling point. We have
\[ BER = P(0) \cdot P(1|0) + P(1) \cdot P(0|1). \tag{2.5} \]
P(0) and P(1) are the probabilities that the transmitted bit is “0” and “1,” respectively. If
we assume “0” and “1” are equiprobable, P(0)=P(1)=0.5. P(0|1) is the probability of sampling
a “0” if the transmitted bit is “1.” This is equal to the area under the tail of the noise
distribution below the threshold level, as illustrated in Figure 2.6. Furthermore,
Figure 2.5: Bit error generation due to noise in a symbol detection-based receiver
P(1|0)=P(0|1) if the noise distributions at the zero-amplitude level and the one-amplitude
level are equal. Therefore, the BER is simplified to
\[ BER = Q\left(\frac{1}{2\sigma}\right) \tag{2.6} \]
for a Gaussian noise source with standard deviation σ, where Q(.) is the Gaussian tail
probability, i.e., the complementary cumulative distribution function of the standard
normal distribution. Equation (2.6) is not a very accurate approximation of BER in a real
high-speed wireline link. Several other factors including inter-symbol interference (ISI),
data timing jitter, and sampling clock uncertainty will affect the BER. We will introduce
these issues in the following sections and study their effects on the link BER.
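Equation (2.6) can be evaluated with the complementary error function, using Q(x) = 0.5·erfc(x/√2). The sketch below assumes unit signal swing (levels 0 and 1) with the threshold at 0.5, as in the text; the function names are illustrative.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_zero_isi(sigma):
    """BER of (2.6) for unit swing, threshold 0.5, and noise standard deviation sigma."""
    return q_function(1.0 / (2.0 * sigma))


# A noise sigma of 1/14 of the swing places the threshold 7 sigma from each level.
print(ber_zero_isi(1.0 / 14.0))
```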
Figure 2.6: The BER calculation from the area under the tail of the noise distribution
In reality, noise is not the only impairment in communication channels. The tail
of a received pulse shape associated with a symbol can last for longer than a symbol
period, Tb, and therefore it can interfere with its neighboring symbols. This effect is called
inter-symbol interference (ISI). The received signal for a 2-PAM NRZ, as x(t) in (2.1), can
be written as
\[ r(t) = \sum_{k=-\infty}^{\infty} a_k \cdot p_o(t - kT_b) + n(t), \quad a_k \in \{0, 1\} \tag{2.7} \]
where n(t) is additive noise and po(t) is the received pulse shape. The signal r(t) is sampled
at times t=Ts+mTb to regenerate the symbols, where m is an integer and 0<Ts<Tb.
\[ r(T_s + mT_b) = a_m p_o(T_s) + \sum_{k=-\infty,\, k \ne m}^{\infty} a_k \cdot p_o(T_s + (m-k)T_b) + n(T_s + mT_b). \tag{2.8} \]
The second term on the right-hand side is the ISI term that affects the decision of each
symbol. Nyquist proposed conditions on the overall response as well as pulse shapes that
completely null the ISI term [15]. Based on his work, the classical method to eliminate
ISI is to design transmit and receive filters such that the overall received pulse shape is a
Nyquist pulse, i.e., po(Ts+(m-k)Tb)=0 for m ≠ k [16][17][40].
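The decomposition in (2.8) can be reproduced numerically from bit-spaced samples of the received pulse. The sketch below is noiseless and assumes the pulse is given as a short vector `pulse_samples` with `pulse_samples[j] = po(Ts + (j - cursor)Tb)`; these names and the example values are hypothetical.

```python
import numpy as np


def sampled_signal(bits, pulse_samples, cursor):
    """Noiseless r(Ts + m*Tb) of (2.8) for every bit index m."""
    full = np.convolve(bits, pulse_samples)
    return full[cursor : cursor + len(bits)]


def is_nyquist(pulse_samples, cursor, tol=1e-9):
    """True if every bit-spaced sample except the cursor is zero."""
    others = np.delete(np.asarray(pulse_samples, dtype=float), cursor)
    return bool(np.all(np.abs(others) < tol))


bits = np.array([1, 0, 0, 1, 1], dtype=float)
pulse = np.array([0.0, 0.8, 0.2])        # cursor 0.8, one trailing ISI sample of 0.2
r = sampled_signal(bits, pulse, cursor=1)
isi = r - bits * pulse[1]                # second term of (2.8)
print(r, isi)
```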
Figure 2.7: Loss contributions from conductor and dielectric in a FR4-based stripline [42]
Dispersion is another significant source of the ISI. Dispersion occurs when the phase
of the channel response transfer function is not a linear function of the frequency, and thus
the group delay, which is the derivative of the phase, will be frequency dependent.
Consequently, when a signal with broad spectrum travels in the channel, various
frequency components get delayed by different amounts. The overall effect is to broaden
the pulse shape of the signal in time domain and thus cause ISI. Both electrical and fiber
optic channels are dispersive.
Dispersion is a major source of ISI in optical fiber [43]. In multi-mode fibers (MMF),
modal dispersion is dominant. Modal dispersion is caused when various optical modes are
excited on the fiber and travel at different speeds. Modal dispersion becomes more
problematic at longer transmission distances because the optical modes separate further,
and thus the received pulse suffers from severe ISI. An example is shown in Figure 2.8 for 800m of
MMF at 10Gb/s. In single-mode fibers (SMF), modal dispersion is absent and chromatic
dispersion and polarization-mode dispersion are dominant [43]. Chromatic dispersion is
mainly caused by the frequency-dependent refractive index of the fiber material. The
Figure 2.8: The output of an 800m MMF channel is severely distorted due to modal dispersion
fundamental optical mode that is excited on an SMF has nonzero spectral width and thus
will disperse because various spectral contents will experience a different index of
refraction. Polarization-mode dispersion is due to the group velocity difference of
orthogonal polarization modes in an SMF that does not have a perfectly cylindrical shape.
This is shown in Figure 2.9.
2.3.4 Equalization
Equalization refers to any technique used in link design to compensate for the
impairments induced by ISI. For example, the equalizer can be a filter at the receiver that
reshapes the channel's undesirable frequency response such that the final received pulses
are ISI free. An example is a pulse that satisfies the Nyquist criterion [15][17] and is
commonly called a Nyquist pulse shape. The Nyquist criterion was implicitly introduced
in Section 2.3.3 for the received pulse shape po(t). It can be stated as follows.
A received pulse shape po(t) satisfies the Nyquist criterion and is called a Nyquist
pulse shape if
\[ p_o(t = kT_b) = \begin{cases} 1, & k = 0 \\ 0, & k \ne 0 \end{cases}. \tag{2.9} \]
If the equalizer filter is designed such that, when cascaded with the channel, the overall
response satisfies (2.9), the link will be ISI free according to (2.8) with Ts=0.
Equalization can be applied either in the transmitter or the receiver. The transmitter
equalizer is sometimes referred to as the transmitter pre-emphasis. It amplifies the high
frequency content of the signal at the transmitter to compensate for the high frequency
attenuation of the channel after the signal travels through it. The advantage of the
transmitter pre-emphasis is that it does not amplify the receiver noise because the
compensation process takes place in the transmitter. Nevertheless, the transmitter
pre-emphasis can cause large crosstalk between neighboring transmission lines in a
parallel link due to the strength of the signal high frequency content. Receiver equalization
is intended basically to add a filter at the receiver to minimize the overall effect of ISI and
noise at the sampling point. It is typically preferred to the transmitter equalization because
the equalizer can be made adaptive to accommodate the unknown channel response and its
variations over time. In this work we focus on receiver equalization.
The straightforward equalization technique is to design the filter such that the overall
response of the cascade of the channel and the filter satisfies the Nyquist criterion for zero
ISI in (2.9). This technique is known as the zero-forcing (ZF) technique [44]. Essentially,
the ZF algorithm forces the filter transfer function to be equal to the inverse of the
channel's transfer function. Evidently, the ZF algorithm only accounts for the ISI
impairment and neglects the noise. In band-limited channels, the equalizer filter transfer
function with the ZF criterion becomes a highpass filter that amplifies the high frequency
content of the signal to compensate for the channel’s high frequency attenuation.
However, the filter will also amplify the noise and will potentially degrade the received
signal to noise ratio. An alternative approach to the ZF method is to use the mean square
error (MSE) algorithm to design the filter. The MSE algorithm considers the noise and ISI
together and avoids extensive noise amplification at the receiver by allowing occurrence
of partial ISI. The MSE criterion minimizes the BER rather than the ISI. The filter
parameters are designed to minimize the number of decision errors on the received
symbols.
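A finite-length approximation of the ZF idea can be sketched with a least-squares fit: the taps are chosen so that the cascade of the sampled channel pulse and the filter is as close as possible to a single unit sample. This is a simplified illustration, not the design procedure of this work; the channel samples, tap count, and delay are assumed values, and `zero_forcing_taps` is a hypothetical name.

```python
import numpy as np


def zero_forcing_taps(channel, n_taps, delay):
    """Least-squares FIR taps that push channel * taps toward a unit sample at `delay`."""
    channel = np.asarray(channel, dtype=float)
    n_out = len(channel) + n_taps - 1
    conv = np.zeros((n_out, n_taps))          # convolution matrix of the channel
    for j in range(n_taps):
        conv[j : j + len(channel), j] = channel
    target = np.zeros(n_out)
    target[delay] = 1.0
    taps, *_ = np.linalg.lstsq(conv, target, rcond=None)
    return taps


channel = [0.1, 1.0, 0.3]                     # assumed bit-spaced pulse samples
taps = zero_forcing_taps(channel, n_taps=7, delay=4)
equalized = np.convolve(channel, taps)
residual_isi = np.delete(equalized, 4)
noise_gain = float(np.sum(taps ** 2))         # white-noise power gain of the filter
print(np.max(np.abs(residual_isi)), noise_gain)
```

The `noise_gain` figure shows the cost discussed above: removing the ISI this way does nothing to limit how much the filter amplifies the input noise.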
Figure 2.10: Equalizer filter in two topologies (a) FFE (b) DFE
that because it acts on the receiver decisions that are noise free, the DFE structure does not
cause noise enhancement. However, the DFE architecture can potentially cause error
propagation in cases where the BER is very large, when the assumption for correct
receiver decisions is violated. A combination of FFE and DFE can be implemented to
compensate linear and nonlinear ISI and to avoid significant noise enhancement.
\[ T_D = n\sqrt{LC} \tag{2.10} \]
where n is the number of LC sections. The large layout area of the inductor elements is a
disadvantage for this topology when implementing an integrated delay cell, especially
Figure 2.11: FIR filter with tapped-delay line topology and N+1 taps
Figure 2.12: The constant-k filter-based LC delay line: (a) pi-section (b) 3-section
when n is large. However, because the passive delay cell is a linear system, it is desirable
when a linear equalizer is needed to preserve the amplitude information of the signal as
the signal travels in the delay line. An FFE equalizer that compensates linear ISI is an
example of such a case.
shape of the MMF core, and the fiber length. Furthermore, factors such as changes in the
environmental condition or aging may result in a time-varying channel response. Adaptive
equalization, which was first proposed by Lucky [16][51] for communication systems,
remedies these issues by automatically adjusting filter coefficients and constantly tracking
any time variations in the channel response.
An effective adaptive equalization algorithm that has received more attention for
high-speed implementation recently is the least mean square (LMS) algorithm (e.g., [52]).
This is due to its ease of implementation. The LMS algorithm is an MSE-based algorithm,
in which the optimization criterion is defined to minimize the mean square of the
difference between the filter output and the receiver's decision. The equation for updating
the coefficient of tap m can be simplified to [17]
\[ C_m(k) = C_m(k-1) + \Delta \cdot \varepsilon_{k-1} \cdot x_m(k-1) \tag{2.11} \]
where εk-1 is the difference between filter output and receiver decision and ∆ is a scaling
parameter that affects the convergence speed. The hardware structure of the adaptive
equalizer filter with the LMS algorithm can be implemented as illustrated in Figure 2.13.
The architecture only requires the implementation of high-speed summation and multiplication
to add the LMS algorithm to the transversal filter structure of Figure 2.11, which is
feasible in today's advanced integrated technologies. As can be seen from Figure 2.13, the
Figure 2.13: The implementation of the LMS algorithm for adaptive equalization
LMS algorithm is a decision-based optimization algorithm, i.e., the parameter that is minimized
depends on the decision of the receiver. In blind equalization algorithms that do not
use any training sequence, this can be a potential disadvantage for the LMS algorithm. In
high BER conditions many of the receiver decisions may be incorrect, which will result in
a slow convergence of the algorithm. In Chapter 5 we discuss one of the contributions of
this thesis, i.e., an alternative technique for adaptive equalization.
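The update rule (2.11) can be sketched as a sample-by-sample, decision-directed loop. Several choices below are assumptions made only for illustration: the error is taken as decision minus filter output (the sign for which the update in (2.11) reduces the error), the slicer threshold is 0.5 for unipolar data, the filter starts with a unit first tap, and the channel is a noiseless two-sample pulse. All names are hypothetical.

```python
import numpy as np


def lms_equalize(received, n_taps, step, threshold=0.5):
    """Decision-directed LMS on a tapped delay line, following (2.11)."""
    taps = np.zeros(n_taps)
    taps[0] = 1.0                              # assumed starting point: pass-through
    delay_line = np.zeros(n_taps)
    errors = np.empty(len(received))
    for k, sample in enumerate(received):
        delay_line = np.roll(delay_line, 1)    # shift the delay line by one bit
        delay_line[0] = sample
        output = taps @ delay_line
        decision = 1.0 if output > threshold else 0.0
        error = decision - output              # epsilon in (2.11), assumed sign
        taps = taps + step * error * delay_line
        errors[k] = error
    return taps, errors


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 4000).astype(float)
received = np.convolve(bits, [1.0, 0.35])[: len(bits)]   # one trailing ISI sample
taps, errors = lms_equalize(received, n_taps=3, step=0.02)
print(taps)
print(np.mean(errors[:200] ** 2), np.mean(errors[-200:] ** 2))
```

Because the error is computed from the receiver's own decisions, the loop only behaves well while most decisions are correct, which is the limitation noted above for high BER conditions.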
The calculations in Section 2.3.2 assume the ISI is zero at the sampling instant. In
practice, the noise margin at the sampling point is reduced because of the ISI, and thus the
probability of error is increased. Using (2.5) and (2.8) and assuming equiprobable data
bits, the condition for finding probability of error can be rewritten as
\[ P_e = P\left( \sum_{k=-\infty,\, k \ne m}^{\infty} a_k \cdot p_o(T_s + (m-k)T_b) + n(T_s + mT_b) > \frac{1}{2} \right) \tag{2.12} \]
and can be calculated from the joint probability distribution of the noise and ISI. However,
finding this joint probability distribution is very complicated for an arbitrary system [16].
Lucky et al. provide a solution for Pe in the form of a finite sum [16]. They assume that
only a finite number of ISI terms in (2.8) affect the joint probability distribution of the
noise and ISI. The suggested solution requires tedious numerical computation for finding
the Pe for any system and provides little insight into how the Pe depends on the system
parameters. An alternative approach for finding the impact of the ISI on Pe is to find an
accurate bound on the Pe in the presence of the ISI. For example, Saltzberg provided a
tight bound on Pe that depends only on the noise variance and the samples of the received
pulse shape [53]. Therefore, the complexity of the bound grows linearly with the number
of ISI terms. Excellent tutorials on several computationally efficient methods to calculate
Pe for the ISI channels can be found in [54]–[57].
If the link has a first-order system response with an associated time constant, τ, the
received pulse shape can be written as
\[ p_o(t) = \begin{cases} 0, & t \le 0 \\ 1 - e^{-t/\tau}, & 0 \le t \le T_b \\ \left(\frac{1}{\alpha} - 1\right) \cdot e^{-t/\tau}, & T_b \le t \end{cases} \tag{2.13} \]
where we define α ≡ e^{-Tb/τ}, which relates the system time constant and the bit period. The
ISI term for a0 can be calculated by substituting (2.13) into (2.8) for m=0 as
\[ ISI = \sum_{k=-\infty}^{-1} a_k \cdot p_o(T_s - kT_b) = \alpha^{T_s/T_b} \sum_{k=-\infty}^{-1} a_k \cdot (1 - \alpha) \cdot \alpha^{-k-1} \tag{2.14} \]
where the sum goes only to k=-1 because we assume the system is causal. Ts is the sampling
time offset from t=0 and 0 ≤ Ts ≤ Tb. The sampled value of the current symbol, i.e.,
a0, can be calculated from the first term on the right in (2.8) as
\[ p_o(T_s) = 1 - \alpha^{T_s/T_b}. \tag{2.15} \]
The optimum sampling point for the first-order system is at Ts=Tb because (2.15)
reaches its maximum at this sampling point. Equation (2.14) demonstrates that the
interference impact of the prior bits decreases exponentially. When the impact of only one
prior bit, i.e., a-1, is significant, ISI terms will be concentrated around two mean values,
ISI0 and ISI1. The two mean values can be calculated from the expected value of ISI in
(2.14) when it is conditioned on the value of a-1. We have
\[ ISI_0 = E\{ ISI \mid a_{-1} = 0 \} = \frac{1}{2} \alpha^{T_s/T_b + 1} \tag{2.16} \]
\[ ISI_1 = E\{ ISI \mid a_{-1} = 1 \} = \alpha^{T_s/T_b} \left( 1 - \frac{\alpha}{2} \right) \tag{2.17} \]
Figure 2.14: Total amplitude distribution at the sampling point when ISI impact of one bit is taken into
account
where E{.} is the expected value. The two mean ISI terms perturb the amplitude at the
sampling point. Because of the stochastic nature of the data, the ISI at any point can be
modeled by a random variable. Then, the ISI distribution can be represented by two probability
mass functions, i.e., two delta functions at the values ISI0 and ISI1, with probability
weight p and (1-p), respectively, where p is the probability of a-1=0. The overall amplitude
distribution can be found by the convolution of the ISI distribution and the Gaussian
noise distribution as shown in Figure 2.14.
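The closed forms (2.16) and (2.17) can be checked against a direct evaluation of the expectation of (2.14). The sketch below assumes equiprobable bits, truncates the sum to a finite number of prior bits, and uses the first-order relation f-3dB = 1/(2πτ), so that α = exp(-2π·BW/BR). The function names are hypothetical.

```python
import math


def isi_mean_prev_zero(alpha, ts_over_tb):
    """Closed form (2.16)."""
    return 0.5 * alpha ** (ts_over_tb + 1.0)


def isi_mean_prev_one(alpha, ts_over_tb):
    """Closed form (2.17)."""
    return alpha ** ts_over_tb * (1.0 - 0.5 * alpha)


def conditional_isi_mean(alpha, ts_over_tb, a_prev, n_terms=200):
    """Expectation of (2.14) with a_{-1} fixed and older bits equiprobable."""
    total = float(a_prev)                                   # k = -1 term (alpha**0)
    total += sum(0.5 * alpha ** (-k - 1) for k in range(-n_terms, -1))
    return alpha ** ts_over_tb * (1.0 - alpha) * total


alpha = math.exp(-2.0 * math.pi * 0.5)                      # BW/BR = 0.5
print(conditional_isi_mean(alpha, 1.0, 0), isi_mean_prev_zero(alpha, 1.0))
print(conditional_isi_mean(alpha, 1.0, 1), isi_mean_prev_one(alpha, 1.0))
```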
The optimum slicing threshold, VTH, can be calculated from the average of the four
possible mean signal levels in Figure 2.14, which simplifies to VTH=0.5 independent of Ts.
If the receiver input noise is white with double-sided power spectral density N0/2, the
amplitude noise variance at the sampling point is reshaped by the first-order system
transfer function. The total noise power can be calculated from
\[ \sigma^2 = \int_{-\infty}^{\infty} \frac{N_0/2}{1 + \tau^2\omega^2}\, df = \frac{N_0}{4\tau}. \tag{2.18} \]
which can be evaluated for different sampling times by using (2.15)–(2.18). Figure 2.15
compares the BER at various signal-to-noise ratios (SNR) in the zero-ISI case in equation
(2.6) with the BER in the ISI channels from equation (2.19), when the systems have the
same noise bandwidth, i.e., equal σ. The BER curves are plotted for various 3dB bandwidth-to-bit
rate ratios (BW/BR) for the ISI channel. The figure shows that the ISI
degrades the performance of the link at large SNR values when the ISI dominates over
noise. Also, as the bandwidth-to-bit rate ratio decreases, the BER degrades more.
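One plausible way to combine the signal level (2.15), the mean ISI levels (2.16)–(2.17), and Gaussian noise of standard deviation σ is sketched below. It averages the tail probabilities of the four mean levels around the threshold, assuming equiprobable bits (p = 1/2) and α = exp(-2π·BW/BR) for the first-order system. This is an illustrative model under those assumptions and is not claimed to be identical to the BER expression used for Figure 2.15; all names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def signal_levels(bw_over_br, ts_over_tb=1.0):
    """Mean sampled levels for a transmitted 0 and 1 under the one-prior-bit model."""
    alpha = math.exp(-2.0 * math.pi * bw_over_br)
    scale = alpha ** ts_over_tb
    main = 1.0 - scale                       # (2.15)
    isi0 = 0.5 * scale * alpha               # (2.16)
    isi1 = scale * (1.0 - 0.5 * alpha)       # (2.17)
    return [isi0, isi1], [main + isi0, main + isi1]


def ber_first_order(bw_over_br, sigma, ts_over_tb=1.0, vth=0.5):
    """Average error probability over the four equally likely mean levels (assumed model)."""
    zeros, ones = signal_levels(bw_over_br, ts_over_tb)
    errors_on_zero = sum(q_function((vth - level) / sigma) for level in zeros)
    errors_on_one = sum(q_function((level - vth) / sigma) for level in ones)
    return 0.25 * (errors_on_zero + errors_on_one)


for ratio in (0.3, 0.5, 0.75):
    print(ratio, ber_first_order(ratio, sigma=0.05))
```

In this sketch σ is held fixed while the bandwidth changes, which matches the equal-σ comparison described above rather than the bandwidth-dependent noise of (2.18).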
Figure 2.15: The BER vs. SNR for various normalized bandwidths compared to the zero-ISI BER of
equation (2.19), sampled at optimum point, i.e., Ts=Tb
Figure 2.16: ISI and noise trade-off as normalized bandwidth variations justifies existence of a
minimum for BER
Figure 2.16 relates the BER to the system 3dB bandwidth, f-3dB, for various noise
power spectral densities, when the signal is sampled at the optimum point, i.e., Ts=Tb.
Evidently, at very small f-3dB, the ISI is severe and limits the BER. However, as the bandwidth
gets excessively large, the noise power that is injected into the receiver is the dominant
contributor to the link-quality degradation and causes higher BER. Consequently, there is
a trade-off between the system noise and the ISI impact. There exists an optimum
bandwidth that minimizes BER. The optimum bandwidth in the case of the first-order
system is around 40% of the bit rate when Ts=Tb. In typical wireline link architectures the
sampling clock is in the middle of the eye at Tb/2. Although this results in simple
hardware implementations, Tb/2 is not necessarily the optimum sampling point. The BER
for when sampling occurs in the middle of the eye, i.e., Ts=Tb/2, is plotted in Figure 2.17.
The same trade-off exists between the noise and the ISI. However, the optimum
Figure 2.17: The optimum bandwidth for minimum BER when sampling point is in the middle of the
eye at Tb/2
bandwidth is now at around 70% of the bit rate, which agrees with the well-known
optimum bandwidth-to-bit rate ratio for best sensitivity in broadband receivers [36].
We also notice by comparing Figure 2.16 and Figure 2.17 that for equal input noise,
the location of the sampling point affects the minimum achievable BER. In fact, the plot in
Figure 2.16 can be reproduced for all possible sampling points, and the optimum sampling
point can be determined from the plot that results in the smallest minimum-achievable
BER. The optimum receiver bandwidth is also determined from the same plot. We will
elaborate on this topic for the link design when we add the effect of jitter to the BER. We
will analytically derive the two-dimensional BER contours that allow the designer to
simultaneously determine the optimum bandwidth and the optimum sampling point to
minimize the BER.
\[ H(s) = \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{2.20} \]
where ζ is the damping factor and ωn is the natural frequency. The step response of an
under-damped system, i.e., when ζ<1, is
\[ s(t) = 1 - \frac{1}{\sqrt{1 - \zeta^2}}\, e^{-\zeta\omega_n t} \sin\left( \omega_n \sqrt{1 - \zeta^2}\, t + \cos^{-1}\zeta \right) \tag{2.21} \]
and the system 3dB bandwidth is
\[ f_{-3dB} = \frac{\omega_n}{2\pi} \sqrt{1 - 2\zeta^2 + \sqrt{1 + (1 - 2\zeta^2)^2}}. \tag{2.22} \]
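Equations (2.21) and (2.22) translate directly into code. The sketch below also forms the response to a single bit as s(t) − s(t − Tb), which is one common way to obtain a pulse response from a step response and is an assumption here; the function names are hypothetical.

```python
import math


def f_3db(omega_n, zeta):
    """3 dB bandwidth of the second-order system, equation (2.22)."""
    a = 1.0 - 2.0 * zeta ** 2
    return omega_n / (2.0 * math.pi) * math.sqrt(a + math.sqrt(1.0 + a * a))


def step_response(t, omega_n, zeta):
    """Under-damped step response, equation (2.21); zero for t < 0."""
    if not 0.0 <= zeta < 1.0:
        raise ValueError("(2.21) holds only for 0 <= zeta < 1")
    if t < 0.0:
        return 0.0
    root = math.sqrt(1.0 - zeta ** 2)
    decay = math.exp(-zeta * omega_n * t) / root
    return 1.0 - decay * math.sin(omega_n * root * t + math.acos(zeta))


def pulse_response(t, omega_n, zeta, tb):
    """Response to a single bit of width tb (assumed construction)."""
    return step_response(t, omega_n, zeta) - step_response(t - tb, omega_n, zeta)


# With zeta = sqrt(2)/2 the 3 dB bandwidth equals omega_n / (2*pi).
print(f_3db(2.0 * math.pi, math.sqrt(2.0) / 2.0))
```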
Figure 2.18 shows the pulse response of a second-order system for two different values of
ζ. For each value, the pulse response is plotted for four different f-3dB. We can carry out
the same procedure as in Section 2.3.5.1 to find the BER equation. All of the ISI terms that
have significant impact on the BER from (2.12) are included in the calculations. In addition,
for every given pair of ζ and f-3dB, the BER can be calculated at several sampling
points in the unit interval. The trade-off between the noise and the ISI is also present in the
second-order system. This trade-off, and hence the existence of an optimum system bandwidth
to minimize the BER, can be seen by plotting the BER vs. f-3dB. In Figure 2.19, we
have plotted the BER contours that show the BER values vs. f-3dB at various sampling
points, for three different values of ζ. Cross sections of the contours at a constant Ts,
taken parallel to the y-axis, show the noise-ISI trade-off. The optimum f-3dB that
Figure 2.18: Pulse response of a second-order system at various normalized 3dB bandwidths: (a) ζ=0.5
(b) ζ=√2/2
results in the minimum BER occurs around 70% of the bit rate. Furthermore, the contours
can be used to select the sampling point that results in the minimum achievable BER.
Similar calculations can be performed for any linear time-invariant (LTI) system,
when the system pulse response is available. The BER relationship that includes the ISI
impact can be derived as in (2.19), which provides insight about the relationship of the
response of the system and the BER. One contribution of this thesis is to use a similar
Figure 2.19: The contours of log10[BER] for N0=4e-3 V2/Hz: (a) ζ=0.4 (b) ζ=0.5 (c) ζ=√2/2
approach in Chapter 3 to find a general relationship between the system response and the
data-dependent jitter (DDJ) to minimize DDJ and improve link reliability.
implementation is more or less the same. Table 2.1 lists some of the wireline standards
developed for 10Gb/s communication with various applications. Differences such as
transmission distance or power consumption impact the design parameters such as
channel type, number of repeaters, gain budget, and jitter budget. Figure 2.20 illustrates
the general architecture of a wireline communication transceiver, also known as a serial
link, that can be applied to any of the standards in Table 2.1.
On the transmit side, low-speed data arrives at the multiplexer that serializes the
parallel data into a single high-speed serial data sequence synchronous to the transmitter
clock. The driver either directly sources the data to the electrical transmission line or
drives an optical modulator that modulates the data onto optical pulses, which then get
transmitted over the fiber optic channel. In both cases, several channel impairments
degrade the quality of the high-speed signal until it arrives at the receiver. The degraded
signal is amplified with a low noise wideband pre-amplifier. Then, the equalizer filter
partially revives the signal by reducing the ISI and increasing the signal-to-noise ratio.
Next, a sampling clock is extracted from the signal by a clock recovery phase-locked loop
(PLL). The clock is used to sample the received signal to retime and recover the data. The
clock is also used in a demultiplexer to deserialize the single data sequence back to the
original data in parallel lines.
2.4.2 Channel
The channel can be electrical or optical. The simplest electrical channel is unshielded
twisted-pair copper wire, such as Category 5 (CAT5) cable, which consists of four
twisted pairs and is commonly used in 10/100Mb/s Ethernet LANs. Chip-to-chip
and backplane communication at multi-Gb/s require channels with less loss at high
frequencies. They use coaxial cable or controlled impedance PCB microstrip transmission
line or stripline. However, the loss of such channels is not tolerable either, when the
transmission distance is above hundreds of meters. Multi-mode fiber (MMF) is deployed
for longer than 100m transmission. The dominant impairment of the MMF is modal
dispersion that is caused by the difference in the propagation velocity of the various
excited optical modes, as was discussed in Section 2.3.3. Since the MMF and the electrical
channels discussed above mainly induce linear distortion on the signal, they can be
modeled with a linear system. Throughout the dissertation, we assume
that the channel can be modeled with a linear time-invariant (LTI) system. Therefore, all
the analysis results from the thesis contributions can be generally applied to all of the
channels above.
2.4.3 Pre-Amplifier
The main function of the pre-amplifier is to amplify the received weak signal to the
sensitivity level of the next stage in the receiver. The stages following the pre-amplifier
often require fixed-minimum swing at their input, e.g., in emitter-coupled logic (ECL).
While the required output swing of the pre-amplifier is constant, the amplitude of the input
signal can take a wide range of values depending on the transmitted power and the channel
attenuation. Therefore, the pre-amplifier needs to have a wide dynamic range and high
gain. In addition, the pre-amplifier should be low noise to have minimum impact on the
signal-to-noise ratio. Figure 2.21 shows an example schematic of a pre-amplifier for an
optical link with a second main-amplifier stage. The main amplifier is in the form of a
limiting amplifier (LA) or automatic gain-controlled (AGC) amplifier for maintaining a
constant output amplitude. In this example, the pre-amplifier is a trans-impedance
Figure 2.21: The front end of an optical communication receiver with the photo detector and a
shunt-feedback trans-impedance amplifier (TIA)
amplifier (TIA) with shunt-feedback configuration. The low input impedance of the TIA
absorbs most of the current generated by the photodetector. Also, it avoids bandwidth
limitation that can be caused by the large photodetector capacitance. Designing a TIA with
large gain and bandwidth and reasonable sensitivity is challenging, particularly in CMOS
technologies, due to their poor device parasitic components or low unity-current-gain
frequency, fT. Chapter 4 discusses this issue and provides a methodology for overcoming
these challenges.
As can be seen from Figure 2.3, the 2-PAM NRZ data sequence has zero energy at the
data rate frequency and its integer multiples. Therefore, the received signal does not
contain any direct component of the timing information from the transmitter clock. In
addition, the signal has travelled over a channel with an arbitrary length that causes an
unknown delay or phase for the signal at the receiver. In a symbol detection-based
scheme, a synchronous clock is required to sample each signal at an optimum sampling
point to recover the data. Therefore, in such systems, a synchronization technique or clock
recovery is needed.
Clock recovery methods for communication applications can be categorized into two
groups: feedforward and feedback clock recovery [41]. Feedforward methods generally
comprise a nonlinear element in the signal path that generates spectral lines
at the clock frequency, followed by a very high-quality bandpass filter to extract the clock.
The nonlinearity can take many forms, e.g., derivative [58][60] or square law [59]. It is
very costly to integrate a high-quality bandpass filter at 10Gb/s [60]. Therefore,
feedforward techniques are rarely deployed in high-speed wireline communication.
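The role of the nonlinear element can be demonstrated numerically. In the sketch below, an ideal NRZ waveform has no component in the DFT bin at 1/Tb, whereas a rectified difference of adjacent samples (a crude stand-in for a derivative followed by a nonlinearity) produces a strong line there. The sampling rate, sequence length, and function name are assumptions chosen for illustration.

```python
import numpy as np

SAMPLES_PER_BIT = 8
N_BITS = 512

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, N_BITS)
nrz = np.repeat(bits, SAMPLES_PER_BIT).astype(float)     # ideal rectangular NRZ

# Differentiate (circular first difference), then rectify: one spike per data transition.
edges = np.abs(nrz - np.roll(nrz, 1))


def clock_line(signal, n_bits):
    """Magnitude of the DFT bin located at the clock frequency 1/Tb."""
    return float(np.abs(np.fft.fft(signal)[n_bits]))


transitions = float(edges.sum())
print(clock_line(nrz, N_BITS))       # essentially zero: no clock component in NRZ
print(clock_line(edges, N_BITS))     # equals the number of transitions
```

The bandpass filter of a feedforward scheme would then be centered on this generated line.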
threshold-crossing points. We will discuss the problems arising from the timing jitter and
its impact on link reliability next.
In a perfect transmission using 2-PAM NRZ, the data transitions, i.e., “01” or “10”
occurrences, cross the decision threshold at integer multiples of the bit period, Tb. Because
of several causes, e.g., random noise and ISI, the actual threshold-crossing times of data
transitions deviate from their ideal values, as shown in Figure 2.23. The timing jitter of the
data is the deviation of the threshold-crossing times from a reference time at a defined threshold [61]. We will show in the
next section how the data timing jitter impacts the link reliability and increases the BER.
Because of the random nature of the sources of jitter, the timing jitter is modelled by a
random variable and is thus characterized by a distribution. The jitter distribution is then
used to find its impact on the BER. Figure 2.24 shows an accumulated eye diagram of a
data sequence measured with an oscilloscope, superimposed onto the jitter histogram. The
histogram is generated by capturing and accumulating the time of all of the threshold-
crossing events. This histogram approximates the jitter distribution.
Figure 2.23: Jitter is deviation of the threshold-crossing time from a reference time
In calculations of Section 2.3.2 and Section 2.3.5, we had two implicit assumptions.
First, we assumed the sampling clock is ideal, i.e., all clock periods equal Tb and the clock
does not have any timing jitter that randomly moves the sampling point. Second, we
assumed that all the transmitted bits are sampled and bits are never lost or sampled twice.
Existence of data jitter violates these two assumptions and increases BER.
Figure 2.25 shows the clock and data recovery stage of the high-speed receiver front
end right before sampling. Data jitter at the input of clock and data recovery impacts the
BER in two ways. First, the data jitter decreases the horizontal eye diagram opening of the
signal, which means for a given BER, larger data jitter leaves a smaller sampling window
in the eye diagram that achieves that target BER. In other words, for a fixed sampling
point, the BER increases as the data jitter increases because of the error induced by the
jitter.
The BER from jitter can be calculated from the area under the tail of the jitter
distribution, as illustrated in Figure 2.26. This area corresponds to the data transitions that
happen on the wrong side of the sampling clock. Therefore, such an event is called bit
slipping. The errors due to jitter are independent of errors caused by amplitude noise or
ISI. If we assume an ideal sampling clock, zero amplitude noise, and zero ISI, we can
calculate the BER only caused by jitter as
\[
\mathrm{BER}(T_s) = \frac{1}{2}\int_{T_s}^{\infty} f_{TJ}(t)\,dt + \frac{1}{2}\int_{-\infty}^{-T_b+T_s} f_{TJ}(t)\,dt \tag{2.23}
\]
where fTJ(t) is the probability distribution function of the total jitter from all sources and Ts
is the location of the sampling clock in the eye. We assume the sampling point varies
within a unit interval (UI), i.e., 0 ≤ T s ≤ T b . The 1/2 factor represents the probability of a
transition event. If the sampling point is in the middle of the eye at Ts=Tb/2 and the jitter
distribution is symmetric about zero, (2.23) simplifies to
\[
\mathrm{BER} = \int_{T_b/2}^{\infty} f_{TJ}(t)\,dt . \tag{2.24}
\]
Although the probability of a data transition event on each side of the eye diagram in
Figure 2.26 is half, both sides of the eye can independently contribute to BER by adding
an error, as indicated by the area under the tail of both distributions.
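As a minimal sketch of how (2.23) can be evaluated for an arbitrary jitter distribution, the following Python code integrates the two tails numerically. The uniform jitter density, its half-width of 0.3 UI, the integration bounds, and all function names are assumptions chosen for illustration only; they do not come from the measurements in this chapter.

```python
def tail_area(pdf, a, b, steps=100000):
    """Midpoint-rule integral of pdf over [a, b]; returns 0 when b <= a."""
    if b <= a:
        return 0.0
    h = (b - a) / steps
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(steps))


def ber_jitter_numeric(pdf, ts, tb, t_min, t_max):
    """Evaluate (2.23) for a jitter pdf that is negligible outside [t_min, t_max]."""
    # Crossings that occur later than the sampling point Ts.
    right = tail_area(pdf, max(ts, t_min), t_max)
    # Crossings of the next edge that occur earlier than Ts - Tb.
    left = tail_area(pdf, t_min, min(ts - tb, t_max))
    return 0.5 * right + 0.5 * left


def uniform_pdf(t, half_width=0.3):
    """Hypothetical jitter density: uniform on [-half_width, +half_width] (in UI)."""
    return 1.0 / (2.0 * half_width) if abs(t) <= half_width else 0.0


# Sampling at Ts = 0.2 UI with Tb = 1 UI: only the right tail (0.2 to 0.3 UI) contributes.
ber_uniform = ber_jitter_numeric(uniform_pdf, ts=0.2, tb=1.0, t_min=-0.5, t_max=0.5)
```

For this assumed density the right tail has area 0.1/0.6 and the left tail is empty, so the result should be close to 1/12.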
Figure 2.25: Impact of data jitter on BER from data path and clock path
Figure 2.26: Impact of the data jitter on the BER by causing bit slipping
The second impact of jitter on the BER comes from the uncertainty of the sampling
clock. Data jitter acts as a reference noise for the clock recovery PLL, and therefore it
creates clock jitter at the output of the clock recovery. Because of the inherently slow
response of the clock recovery to the input fluctuations, the clock jitter at the time of
sampling is uncorrelated to the data that it samples. Therefore, the clock jitter degrades the
BER further. In the next section, we combine the results of this section with Section 2.3.5
to find the overall impact of impairments such as noise, ISI, and jitter on the link BER. In
the next chapter we break down the total jitter to its components and show an analytical
model for data-dependent jitter that is caused by ISI. We also highlight the impact of
data-dependent jitter on the BER.
When the sampling clock is ideal, i.e., all the clock cycles are exactly one bit period, a
detection error may occur due to the combination of noise and ISI. We calculated the BER
for such a case in (2.19). This BER is a function of the sampling time. The minimum BER
is achieved when sampling at the optimum point, which is not necessarily at Tb/2. If the
sampling point is moved from the optimum sampling point, the ISI contribution changes
from (2.16) and (2.17) and thus the BER increases. However, the general form of the BER
remains the same as in (2.19). We denote the BER that is caused from the amplitude noise
and the ISI by BERISI ( T s ) as it is a function of the sampling point, Ts. We can rewrite
(2.19) as
\[
\mathrm{BER}_{ISI}(T_s) = \frac{1}{2}\left[\mathrm{BER}(ISI_0(T_s)) + \mathrm{BER}(ISI_1(T_s))\right] \tag{2.25}
\]
where we have defined
In the presence of the timing jitter, the relative location of the sampling point and the
threshold crossing of the data is changing randomly. In other words, if we assume that the
timing jitter, ∆t, is a random variable with zero mean, the sampling point for the bit that is
sampled right after a transition is Ts-∆t. Therefore, the BER for such a bit for a given ∆t is
BER ISI ( T s – ∆t ) . The overall BER for such bits is found by integrating the BER over all
possible values of ∆t weighted by the probability distribution function. For instance, for a
“01” transition, the overall BER from the combined effect of the noise, the ISI, and the
timing jitter is
\[
\mathrm{BER}(T_s \mid \text{“01”}) = \frac{1}{2}\int_{-\infty}^{\infty} f_{TJ}(t)\cdot \mathrm{BER}_{ISI}(T_s - t)\,dt + \frac{1}{2}\int_{-\infty}^{\infty} f_{TJ}(t)\cdot \mathrm{BER}_{ISI}(T_b - T_s + t)\,dt . \tag{2.28}
\]
The two terms on the right in (2.28) correspond to when bit “1” and bit “0” in the “01” pair
are in error, respectively. Notice that when the ISI and noise are absent and only the timing
jitter is present we have
\[
\mathrm{BER}_{ISI}(t) = \begin{cases} 0 & t \ge 0 \\ 1 & t < 0 \end{cases} . \tag{2.29}
\]
Therefore, (2.28) simplifies to (2.23). Furthermore, if the timing jitter is absent, the jitter
distribution is a delta function concentrated at t=0, and if it is replaced in (2.28), the
equation is simplified to BERISI ( T s ) .
We have calculated the approximate overall BER caused by the combined effects of
the ISI, noise, and jitter in Appendix A, equation (A.9), which is rewritten here as
\[
\mathrm{BER}(T_s) = \frac{1}{4}\left( Q\!\left(\frac{0.5 - ISI_0(T_s)}{\sigma_n}\right) + \int_{-\infty}^{T_s} f_t(t)\cdot Q\!\left(\frac{0.5 - ISI_1(T_s - t)}{\sigma_n}\right) dt \right)\cdot\left(1 + Q\!\left(\frac{T_s - T_b}{\sigma_j}\right)\right) + Q\!\left(\frac{T_s}{\sigma_j}\right) + Q\!\left(\frac{T_b - T_s}{\sigma_j}\right) . \tag{2.30}
\]
Equation (2.30) provides the overall BER as a function of the sampling time and the sys-
tem response, i.e., ISI0 and ISI1, for the given noise and timing jitter standard deviations.
In reality the sampling clock has some uncertainty or jitter associated with it that can
be modeled by a probability distribution function, pdfclk(Ts), where Ts is now a random
variable. Then, the total BER can be found by using the continuous total probability
theorem [62] as
\[
\mathrm{BER} = \int_{0}^{T_b} \mathrm{pdf}_{clk}(T_s)\cdot \mathrm{BER}(T_s)\,dT_s \tag{2.31}
\]
that simplifies to (2.30) if the clock is ideal with a delta probability distribution function.
We have neglected the contributions of the tails of the clock and have bounded the clock
distribution to one bit period. Although (2.31) does not have a closed form for an arbitrary
clock distribution, we can use it to numerically compute or simulate the BER and compare
the impacts of jitter and ISI. In the rest of this chapter we assume that the sampling clock
is ideal and pdfclk(Ts) is a Dirac delta distribution function.
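To illustrate how (2.31) can be computed numerically, the sketch below averages a BER function over a clock distribution with the midpoint rule. It is only an illustration under stated assumptions: the clock distribution is taken to be a Gaussian truncated to one bit period, and the jitter-only BER of (2.23) with Gaussian data jitter stands in for BER(Ts). The function names and the numerical values are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_ideal_clock(ts, sigma_j=0.05, tb=1.0):
    """Stand-in for BER(Ts): jitter-only BER of (2.23) with Gaussian data jitter."""
    return 0.5 * q_function(ts / sigma_j) + 0.5 * q_function((tb - ts) / sigma_j)


def ber_with_clock_jitter(ts_mean, sigma_clk, tb=1.0, steps=20000):
    """Midpoint-rule evaluation of (2.31).

    The clock pdf is an assumed Gaussian centered at ts_mean, truncated to
    [0, Tb] and renormalized so that it integrates to one over the bit period.
    """
    dt = tb / steps
    weighted = 0.0
    norm = 0.0
    for i in range(steps):
        ts = (i + 0.5) * dt
        w = math.exp(-0.5 * ((ts - ts_mean) / sigma_clk) ** 2)
        weighted += w * ber_ideal_clock(ts, tb=tb)
        norm += w
    return weighted / norm


ber_ideal = ber_ideal_clock(0.5)
ber_narrow_clock = ber_with_clock_jitter(0.5, sigma_clk=0.001)
ber_wide_clock = ber_with_clock_jitter(0.5, sigma_clk=0.02)
```

As the assumed clock distribution narrows toward a delta function, the averaged BER approaches the ideal-clock value, while a wider clock distribution increases it.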
If the BER caused by the timing jitter in equation (2.23) is plotted vs. the sampling
point in a unit interval, i.e., when 0 ≤ T s ≤ T b , a curve is achieved that resembles the
shape of a bathtub and is thus called a bathtub curve. It graphically demonstrates that as
the sampling point approaches the edges of the data eye diagram, the BER significantly
increases. An example bathtub curve is shown in Figure 2.27, when the total jitter
distribution is Gaussian with zero mean and standard deviation, σj=0.05 UI. A unit
interval (UI) is a unit of time that equals the time normalized to a bit period. The bathtub
curve is a useful tool for characterization of high-speed links. It is used to define an eye
diagram opening for a given BER. For instance, in Figure 2.27, the eye diagram opening
at BER=10^-12 is about 0.3 UI. The eye diagram opening corresponds to the available
timing margin for the location of the sampling clock in the eye diagram that can achieve
the target BER. Therefore, the bathtub curve can be used as a measure for the trade-off
between the link data jitter budget, σj, and the clock jitter budget, the eye opening.
Figure 2.27: Bathtub curve for σj=0.05 UI (horizontal axis: Ts [UI]; vertical axis: BER on a logarithmic scale)
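The eye opening quoted above can be checked with a short calculation. The following sketch evaluates (2.23) in closed form for zero-mean Gaussian jitter and finds the eye opening at a target BER by bisection on the left wall of the bathtub curve. The function names are ours, and the symmetry of the curve about Tb/2 is assumed.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bathtub(ts, sigma_j, tb=1.0):
    """Jitter-only BER of (2.23) for zero-mean Gaussian jitter (times in UI)."""
    return 0.5 * q_function(ts / sigma_j) + 0.5 * q_function((tb - ts) / sigma_j)


def eye_opening(target_ber, sigma_j, tb=1.0):
    """Width of the Ts interval in which the BER stays at or below target_ber."""
    lo, hi = 0.0, tb / 2.0
    if bathtub(hi, sigma_j, tb) > target_ber:
        return 0.0  # the target is not reached even at the eye center
    # The BER decreases monotonically on [0, Tb/2], so bisection applies.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bathtub(mid, sigma_j, tb) > target_ber:
            lo = mid
        else:
            hi = mid
    return tb - 2.0 * hi  # the curve is symmetric about Tb/2


opening = eye_opening(1e-12, sigma_j=0.05)
```

For σj=0.05 UI the computed opening is close to the value of about 0.3 UI read from Figure 2.27.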
We can generalize the concept of the bathtub curve to a data link with noise and ISI. If
we use (2.30) to calculate the BER, we can plot a three-dimensional bathtub curve as a
function of the sampling time and the system bandwidth that represents the ISI in the case
of a first-order system. Consequently, we obtain an insight about the trade-offs between
the data link’s jitter and ISI budget and the sampling clock timing margin. Such trade-offs
are important in determining the specifications of the pre-amplifier response and the clock
and data recovery characteristics for achieving minimum BER.
Figure 2.28(a) shows the 3D bathtub curve when the link is modeled with a first-order
system. The BER is calculated for various sampling points and normalized 3dB
bandwidths, when σj=0.05 UI and N0=4e-3 V²/Hz. If N0=0, the cross section of the plot,
when bandwidth approaches infinity, becomes the conventional bathtub curve.
Figure 2.28(b) shows the contours of the BER as a function of the sampling point and the
bandwidth, which is equivalent to the top view of the 3D bathtub curve. The contours
show that the BER is independent of bandwidth as the sampling point approaches the data
edges because the timing jitter dominates the BER. Moreover, the optimum bandwidth
that minimizes BER is about 75% of the bit rate. At this bandwidth, the optimum sampling
point is neither in the center of the eye nor at Ts=Tb as we saw in a first-order system.
Finally, we can see that the timing margin for the sampling clock reduces drastically at
smaller system bandwidths.
Figure 2.28: (a) Three dimensional bathtub curve for a first-order system for various normalized
bandwidths; σj=0.05 UI and N0=4e-3 V²/Hz (axes: Ts [UI], f-3dB/(Bit Rate), log10[BER]) (b) Contours of BER from top view of plot
(a)
Figure 2.29: BER contours (a) σj=0.025 UI and N0=4e-3 V²/Hz (b) σj=0.05 UI and N0=5e-3 V²/Hz (axes: Ts [UI], f-3dB/(Bit Rate))
We can use (2.30) to find the contours of the BER for any given σj and N0. The noise
standard deviation, σn, is a function of the bandwidth and N0. Figure 2.29 shows the BER
contours for two more cases. In Figure 2.29(a), the BER is plotted for σj=0.025 UI,
half the value used in Figure 2.28(b). All the other parameters are the same. The jitter
reduction lowers the minimum achievable BER. It also moves the optimum sampling
point to the left, closer to Ts=Tb, which is the optimum sampling point for first-order
response in the absence of jitter. The optimum bandwidth is shifted to lower values, as the
sampling point is closer to Ts=Tb. This is because the ISI terms from (2.16) and (2.17)
decrease, and amplitude noise dominates BERISI. Therefore, a smaller system bandwidth
that filters more of the noise power can achieve a lower BER. Figure 2.29(b) shows the
BER contours when N0=5e-3 V²/Hz and all the other parameters are the same as in
Figure 2.28(b). The increase in amplitude noise degrades the overall BER by 3-4 orders of
magnitude. The optimum system bandwidth-to-bit rate ratio is smaller compared to
Figure 2.28(b), as we discussed above. The sampling timing margin for achieving a BER of
10^-12 has significantly decreased.
The y-axis in Figure 2.29 is the normalized bandwidth only for a first-order system.
For a general LTI system the y-axis is related to the size of the pulse response at each
subsequent sampling point, which is in turn associated with the received pulse response as
a result of the combination of the channel response and pre-amplifier transfer function.
Therefore, the designer can use the BER contours to determine the optimum front end
time response shape for achieving a target BER. In addition, the optimum sampling point
and its associated timing margin can be obtained from the BER contours with a target
BER and is used to design the parameters for the clock recovery circuit.
In the next chapter, we introduce the data-dependent jitter (DDJ) phenomenon that is
the impact of the ISI on the threshold-crossing time of the data. The DDJ modifies the
jitter distribution by effectively increasing the jitter variance. In this case, (2.30) is not
sufficient for finding the total BER. We complete the equation for computing the BER by
including the impact of DDJ in our calculations.
2.7 Summary
In this chapter we introduced the principles of wireline communication systems. We
discussed the system level challenges for designing a reliable high-speed communication
link. We studied the impacts of the noise, the ISI, and the timing jitter on the link bit-error
probability and provided the relationships between the system parameters, e.g., bandwidth
and sampling point, and the BER. These relationships are the first step in designing the
system for minimizing the BER. Finally, we combined the effects of the ISI, noise, and
jitter and calculated the overall BER when all of these impairments are present. We used
the result to demonstrate some of the existing trade-offs between system parameters. We
showed how the analytical formulation for the BER can be used to find the system level
specifications for the blocks of the receiver, such as the pre-amplifier and the clock
recovery. In the next chapter we analyze data-dependent jitter (DDJ) and provide an
analytical probability distribution function for it that modifies the timing jitter distribution
we discussed in this chapter. We add the DDJ component to the BER and demonstrate its
remarkable impact on the performance of high-speed serial links.
Chapter 3
Data-Dependent Jitter in Wireline Communications
3.1 Introduction
As we discussed in Chapter 2, the reliability of high-speed serial communication links
depends upon timing jitter. The timing jitter of a data transition is the deviation of its
threshold-crossing time, i.e., the time at which the data crosses a decision threshold, from
a reference clock. The transmitter, the channel, and the receiver contribute to the timing
jitter of the data sequence. In addition, at least a part of the timing jitter of the data is
inherited as phase uncertainty of the recovered sampling clock in the clock recovery
system. The bit error rate (BER) of the regenerated data sequence in the receiver is
degraded by the timing jitter of the data and sampling clock. Nonidealities such as
bandwidth limitation and medium dispersion exacerbate jitter effects.
Data timing jitter is separated into two main categories, namely, random jitter (RJ) and
deterministic jitter (DJ) [61]. RJ is the random variation of the threshold-crossing time due to
amplitude fluctuations around the crossing time or phase noise of the transmitter clock
[63]. DJ is further categorized into data-dependent jitter (DDJ), duty cycle distortion jitter,
and bounded jitter that is uncorrelated with the data (e.g., crosstalk jitter or sinusoidal jitter) [61]. DDJ
is the deviation of the threshold-crossing time of the current data bit that is correlated with the previous bits.
It is also known as pattern jitter. DDJ is often caused by bandwidth limitations of the
system or electromagnetic reflections of the signal. Therefore, DDJ has a larger impact on
high-speed transmission systems with restricted bandwidth. In this chapter, we propose
methods for characterizing DDJ theoretically based on system parameters and study its
impact on BER.
The impact of timing jitter on the performance of different communication links has
been studied extensively [59],[64]–[69]. However, these works have focused on the effect
of digital pattern on the output jitter of the extracted clock. They have neglected the
limitations of all other blocks in the communication link. For instance, Byrne et al. have
investigated the accumulation effect of timing jitter in a series of regenerators with special
attention to the effect of pattern jitter [65]. However, the analysis is limited to a simple
second-order tank as the timing extraction block. Saltzberg has estimated the aggregate
effect of RJ and DDJ using Taylor series expansion and has calculated the jitter of the
extracted sampling clock [66]. Similarly, Gardner has compared the effect of pattern jitter
on different clock recovery schemes [67]. He has presented a relation between DDJ and
the sampling clock phase variation with qualitative explanations. Huang has proposed
pulse shapes that result in DDJ-free data streams [68]. But, he has emphasized the
peak-to-peak data-dependent jitter and has calculated it from the two data sequences that
result in the maximum shift of the threshold-crossing time. He has assumed a given form
for the received data stream, namely an ideal non-causal Nyquist pulse. All these works
condition the system that generates DDJ to several assumptions. A model for the DDJ
generated from a general LTI system is still lacking.
In a different context, jitter modelling techniques are developed for separating and
measuring jitter performance of devices in communication links [61],[70]–[72]. Reliable
jitter measurement methods are more important in high-speed devices, where bandwidth
limitations aggravate DDJ. Therefore, predicting DDJ contribution is essential to accurate
measurement systems. For instance, Shimanouchi has related the bandwidth of an
automatic test equipment (ATE) system and the DDJ [70]. However his analysis was
based on the previous data transition only. In addition, he limits the model to first-order
system response.
Although the significance of DDJ has been realized in the aforementioned literature,
theoretical analysis of DDJ and study of its relation to system parameters such as
bandwidth has been neglected. The main contribution of this chapter is to propose a
method for predicting data-dependent jitter for a general LTI system in a context suitable
for circuits and system designers. The dependence of DDJ on system parameters provides
additional insights for minimizing jitter and highlights that increasing the bandwidth does
not necessarily minimize DDJ. In addition, the method reduces the simulation or
measurement time remarkably by relating DDJ characterization linearly to the number of
prior bits considered. The conventional computation grows exponentially with the number
of bits because it requires passing all possible sequences through the system. The
theoretical results are matched with jitter histogram measurements.
In the rest of this chapter, we first define data-dependent jitter formally. Then we
derive an analytical expression for DDJ of first-order LTI systems. The expressions are
associated to conventional approximations of the distribution of data-dependent jitter, and
the results are experimentally verified. Next, we generalize the analysis for any LTI
system with known step response. A perturbation method is introduced that approximates
DDJ by separating the jitter contributions of the previous bits. We compare the measured
deterministic jitter of real communication media with analytical expressions that we
derive for DDJ and demonstrate that the presented analytical results estimate DDJ
accurately and are reliable for predicting jitter. Finally, we update the BER calculation
from the previous chapter by accounting for the correlation between DDJ and ISI.
3.2 Framework
3.2.1 Data Jitter
multiples of symbol period. However, it deviates from the ideal value due to several
factors in the link (e.g., noise, limited channel bandwidth, or limited receiver front-end
bandwidth). Consequently, the knowledge of the effect of the system on data
threshold-crossing times and the sampling clock timing is essential for optimizing BER.
∆t tj = ∆t rj + ∆t dj . (3.1)
Hence, the total jitter probability distribution function (PDF) is the convolution of the PDF
of RJ and DJ,
f tj ( ∆t ) = f rj ( ∆t ) ⊗ f dj ( ∆t ) (3.2)
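The convolution in (3.2) can be illustrated numerically. In the sketch below, RJ is a Gaussian and DJ is a pair of equal impulses (a dual-Dirac model), both sampled on a uniform grid. The grid spacing, the RJ standard deviation of 5 ps, and the impulse offset of ±20 ps are hypothetical values chosen only for the example.

```python
import math


def gaussian_pdf(t, sigma):
    """Zero-mean Gaussian density."""
    return math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def convolve(f, g, dt):
    """Discrete approximation of the convolution of two sampled densities."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        if fi == 0.0:
            continue  # skip empty bins of the sparse DJ density
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * dt
    return out


dt = 0.5           # grid spacing in ps (assumed)
half = 200         # the grid covers -100 ps to +100 ps
times = [(i - half) * dt for i in range(2 * half + 1)]

sigma_rj = 5.0     # RJ standard deviation in ps (assumed)
dj_offset = 20.0   # DJ impulses at -20 ps and +20 ps (assumed)

f_rj = [gaussian_pdf(t, sigma_rj) for t in times]
f_dj = [0.0] * len(times)
f_dj[half - int(dj_offset / dt)] = 0.5 / dt   # impulse of area 0.5
f_dj[half + int(dj_offset / dt)] = 0.5 / dt   # impulse of area 0.5

f_tj = convolve(f_dj, f_rj, dt)
tj_times = [(k - 2 * half) * dt for k in range(len(f_tj))]

area = sum(f_tj) * dt
peak_index = max(range(len(f_tj)), key=lambda k: f_tj[k])
```

The resulting total jitter density is bimodal: two Gaussians of standard deviation sigma_rj centered on the two DJ impulses, with a total area of one.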
Data-dependent jitter (DDJ) is the deviation of each data threshold-crossing time from
a reference time due to the residual signal of the previous data bits delayed due to the
memory of the system. Limited bandwidth of the transmission medium (e.g., PCB traces),
receiver front-end (e.g., TIA), or electromagnetic reflections cause prior symbols to
interfere with the current transition. While the effect of inter-symbol interference (ISI) on
the amplitude of the received symbols has been studied (e.g., [15][17]), its effect on the
timing needs further analysis. The effect of ISI on timing is to change the
threshold-crossing time of a data transition and cause DDJ, as shown by an example in
Figure 3.2. Here, depending on the value of the bit prior to the “01” transition, the
transition can occur earlier or later at the output of the system, as shown on the right.
To analyze DDJ, the data link with ISI is modeled as an LTI system. A sequence of
random 2-PAM NRZ data is passed through the LTI system. The last two bits of the
sequence are either “01” or “10” to model a rising edge transition or falling edge
transition, respectively. The variation of the crossing time of the transition can be related
to the data statistics to calculate DDJ. The process is illustrated in Figure 3.3 for a “01”
transition. For symmetric input rising and falling transitions and a threshold of half-signal
swing, the jitter distributions for rising and falling transitions are identical and calculation
of one is sufficient.
Figure 3.3: Response of a general LTI system to a random bit sequence and generation of DDJ
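The input to the system is
\[
x(t) = u(t) + \sum_{k=-\infty}^{-2} a_k \cdot p_i(t - kT_b) \tag{3.3}
\]
\[
p_i(t) = \begin{cases} 1 & 0 \le t \le T_b \\ 0 & \text{otherwise} \end{cases} \tag{3.4}
\]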
where u(t) is the unit step function and models the rising edge, pi(t) is the unit pulse signal,
as described in (3.4) with duration of bit period, Tb, and the aks are the random bits that are
either “1” or “0” with a given probability. The sum in (3.3) starts from k=-2, i.e., a-1=0, to
guarantee a rising edge at t=0. One can write a similar equation for a falling edge in which
case a-1=1, a0=0, and the rest of the equation is the same as (3.3). Because the system is
linear, we can use superposition theorem to find the output as
\[
y(t) = s(t) + \sum_{k=-\infty}^{-2} a_k \cdot p_o(t - kT_b) , \tag{3.5}
\]
where s(t) and po(t) are, respectively, the system step response and unit pulse response.
The solution to
y ( t c ) = v th = 0.5 (3.6)
for tc determines the time of the threshold-crossing event as a function of data statistics
and system parameters. We compare tc to the time of the threshold-crossing event when all
the aks are zero, and we denote it by t0. We can calculate t0 by solving
s ( t 0 ) = V TH = 0.5 .
Then, DDJ is defined as
∆t ≡ t 0 – t c . (3.7)
We will solve (3.6) for the first-order system as an example in Section 3.3 and analyze the
general LTI system in Section 3.4.
In this section we analyze the DDJ of a first-order system, as described by the transfer
function
\[
H(s) = \frac{1}{1 + \tau s} . \tag{3.8}
\]
Here, τ is the system time constant and the associated 3dB bandwidth is 1/(2πτ). From
(3.6) and (3.7), we can find a closed-form solution for the DDJ random variable of a
first-order system as
\[
\Delta t = -\tau \cdot \ln\left(1 - \left(\frac{1-\alpha}{\alpha}\right)\sum_{k=-\infty}^{-2} a_k \cdot \alpha^{-k}\right) \tag{3.9}
\]
where we define α ≡ e^(–Tb/τ), similar to Chapter 2. In a system with a large bandwidth
compared to the input data rate, α approaches zero. On the other hand, if the bandwidth is
small the data transitions take longer. The upper limit on α for this calculation is set if we
assume the rising transition crosses the threshold within a bit period. This bounds α to
values smaller than 0.5. At α=0.5 the bandwidth is only 11% of the bit rate.
Equation (3.9) relates the impact of each prior bit and the threshold-crossing time
deviation. For any data transition the prior bits are random sequences that overall result in
an ensemble of ∆t values. Because α ≤ 0.5, the more recent bits have a dominant effect on jitter
and a-2 has the largest impact. Also, the residual effect of the bits vanishes exponentially
for a larger system bandwidth to bit rate ratio, i.e., when α approaches zero. Figure 3.4
captures these effects by plotting ∆t in unit intervals (UI) for different values of α. For
each α, all the possible values of ∆t are plotted. We include the impact of four prior bits
and neglect the effect of more distant bits. A larger α corresponds to smaller
Figure 3.4: Ensemble of normalized DDJ values for different ratios of bandwidth to bit rate along with the
appropriate model to use for data-dependent jitter PDF (axes: α, ∆t [UI]; regions labeled 2-impulse, 4-impulse, 8-impulse, and 16-impulse)
For 0.01 ≤ α ≤ 0.065, ∆t is concentrated around two values. In this range of system
bandwidth, the DDJ distribution can be modeled with two impulses that carry the
probability weight expressed in section Figure 3.2. However, for larger α the distribution
should be extended to four or more impulses, as can be seen from Figure 3.4. In a
first-order system, the concentration of data jitter around two values corresponds to
bandwidth range, where only the penultimate bit, a-2, has a remarkable effect on jitter.
Since a-2 is “1” or “0,” the data jitter is divided into two mean probability masses,
modeled by the two impulse functions. This is exactly the same as the conventional model
for DDJ distribution based on the double Dirac delta function. We can also observe the
behavior of DDJ similar to predictions of Figure 3.4. The threshold-crossing time and
related histogram in the output of a first-order system is shown in Figure 3.5 for two
different α values and demonstrates bifurcation of DDJ distribution from two delta
functions to four delta functions as α increases.
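The clustering described above can be reproduced directly from (3.9). The following sketch enumerates all sequences of four prior bits, as in Figure 3.4, and compares the spread inside the a-2=0 cluster with the gap between the two clusters. The function names and the two sample values of α are chosen here for illustration.

```python
import math
from itertools import product


def ddj_first_order(bits, alpha, tb=1.0):
    """Evaluate (3.9); bits = (a_-2, a_-3, ...), most recent first. Result in units of tb."""
    tau = -tb / math.log(alpha)  # from alpha = exp(-Tb / tau)
    # Bit number i in the tuple is a_k with k = -(i + 2), so alpha**(-k) = alpha**(i + 2).
    s = (1.0 - alpha) / alpha * sum(a * alpha ** (i + 2) for i, a in enumerate(bits))
    return -tau * math.log(1.0 - s)


def ddj_ensemble(alpha, n_bits=4):
    """All DDJ values for every combination of n_bits prior bits."""
    return {bits: ddj_first_order(bits, alpha) for bits in product((0, 1), repeat=n_bits)}


def cluster_ratio(alpha, n_bits=4):
    """Spread inside the a_-2 = 0 cluster divided by the gap between the two clusters."""
    values = ddj_ensemble(alpha, n_bits)
    low = [v for bits, v in values.items() if bits[0] == 0]
    high = [v for bits, v in values.items() if bits[0] == 1]
    return (max(low) - min(low)) / (min(high) - max(low))


def peak_to_peak_bound(alpha, tb=1.0):
    """Upper bound on DDJ when all prior bits are 1, equal to -tau * ln(1 - alpha)."""
    tau = -tb / math.log(alpha)
    return -tau * math.log(1.0 - alpha)


ratio_small_alpha = cluster_ratio(0.05)
ratio_large_alpha = cluster_ratio(0.3)
```

A small ratio means the ensemble is well described by two impulses; as α grows the ratio increases and the two clusters split further, which is the behavior the four-impulse model captures.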
Figure 3.5: Threshold-crossing histogram and DDJ distribution: (a) α=0.1 (b) α=0.3
∆t pp = – τ ⋅ ln ( 1 – α ) (3.10)
which is overlaid with a dashed line on the plot in Figure 3.4. Since the latest crossing
time is referenced, the plot shows that ∆tpp sets an upper bound on ∆t.
In modern serial communication links, measured total jitter distributions resemble the
jitter histogram in Figure 3.1(b). In such systems, a useful measure of data-dependent
jitter is the distance between the two impulse functions in Figure 3.1(a) or the separation
between the means of the two Gaussian distributions. According to discussions in
Section 3.3.1, the two impulse distribution results when the impact of only one prior bit,
a-2, on jitter is included. Therefore we define the separation of the impulses as follows and
call it the scale-one data-dependent jitter, DDJ1, because only the impact of one prior bit is
included.
\[
DDJ_1 = E\{\Delta t \mid a_{-2} = 0\} - E\{\Delta t \mid a_{-2} = 1\} \tag{3.11}
\]
where E{.} is the expected value of ∆t conditioned on a-2. For equal probabilities of “1”
and “0” we can show
\[
DDJ_1 = \frac{\tau}{2}\ln\frac{1+\alpha}{1-\alpha+\alpha^{2}} . \tag{3.12}
\]
We verified the expression in (3.12) experimentally by testing an RC filter that serves as
the first-order system. A 2^7 – 1 pseudo-random bit sequence was applied to the filter and
the jitter histogram was measured using Agilent’s 86100 communication analyzer. The
input bit rate was scanned over a wide range of observable DDJ1 values. The separation
between the means of the two Gaussians in the histogram was measured. Figure 3.6
demonstrates the excellent agreement between (3.12) and the measurement results. For α<0.02
random jitter dominated DDJ1.
Figure 3.6: Comparison of the measurement results for DDJ1 and the analytical expression in (3.12) for a
first-order system
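As a numerical cross-check of (3.12), the sketch below compares the closed-form expression with the separation of the two conditional means obtained by enumerating (3.9) over the older bits. Equal bit probabilities are assumed, only ten bits older than a-2 are included, and the magnitude of the separation is compared. The function names are hypothetical.

```python
import math
from itertools import product


def ddj1_closed_form(alpha, tau=1.0):
    """Scale-one DDJ from (3.12)."""
    return 0.5 * tau * math.log((1.0 + alpha) / (1.0 - alpha + alpha ** 2))


def conditional_mean(a2, alpha, tau=1.0, n_older=10):
    """Mean of (3.9) given a_-2 = a2, averaged over n_older equiprobable older bits."""
    total = 0.0
    for older in product((0, 1), repeat=n_older):
        bits = (a2,) + older
        s = (1.0 - alpha) / alpha * sum(a * alpha ** (i + 2) for i, a in enumerate(bits))
        total += -tau * math.log(1.0 - s)
    return total / 2 ** n_older


def ddj1_enumerated(alpha, tau=1.0):
    """Magnitude of the separation between the two conditional means."""
    return abs(conditional_mean(0, alpha, tau) - conditional_mean(1, alpha, tau))


relative_gap = {
    alpha: abs(ddj1_enumerated(alpha) - ddj1_closed_form(alpha)) / ddj1_closed_form(alpha)
    for alpha in (0.05, 0.1, 0.3)
}
```

For the sampled values of α the two results agree to within about one percent, which suggests that (3.12) is a close approximation of the exact conditional-mean separation rather than an exact identity.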
For a general LTI system, equation (3.6) may not be solvable analytically. We propose
a technique that approximates DDJ for a general LTI system based only on its step
Figure 3.7: Deviation of the threshold-crossing time due to the effect of the kth bit
Data-dependent jitter occurs because the tails of prior bits perturb the time that the
data transition crosses the threshold level. In the absence of any prior bit,
threshold-crossing time is t0 as discussed in Section 3.3.1. However, if ak is “1” the kth
prior bit changes s(t) by po(t0-kTb), in (3.5). This amplitude perturbation shifts the
threshold-crossing time from t0 and causes jitter. Assuming p o ( t 0 – kT b ) « s ( t 0 ) , the shift
in threshold-crossing time from the contribution of the kth bit can be calculated from the
slope of s(t) at t0 and the shift in the amplitude of s(t). This process is shown graphically in
Figure 3.7. The threshold-crossing time shift due to the kth bit is denoted by ∆tk. We have
\[
\Delta t_k \cong -\frac{p_o(t_0 - kT_b)}{\left.\dfrac{ds(t)}{dt}\right|_{t=t_0}} \tag{3.13}
\]
\[
\Delta t \equiv \sum_{k=-\infty}^{-2} a_k \Delta t_k = \frac{-1}{\left.\dfrac{ds(t)}{dt}\right|_{t=t_0}} \cdot \sum_{k=-\infty}^{-2} a_k\, p_o(t_0 - kT_b) . \tag{3.14}
\]
This technique is based on classical perturbation theory (e.g., [76]). The assumption
made above on the amount of perturbation bounds the accuracy of the method. In a
practical system the bandwidth is chosen such that unit pulse response fall time is within
Tb. Therefore, po(t0-kTb) is much smaller than vth and (3.14) is a good approximation. If
the link is designed such that the received pulse has the shape of a Nyquist pulse, the
approximation still holds. For such pulses the residual memory of prior bits changes
slowly around the threshold-crossing [40]. Therefore, the perturbation of the step response
is po(t0-kTb). A similar methodology was used to calculate the reference jitter in a clock
recovery system [66][77][78].
We evaluated the results in (3.14) for all possible bit sequences and compared them
against the accurate DDJ in (3.9) for a first-order system. We limit k to – 10 ≤ k ≤ – 2 to
account for the 11 most recent bits only because the effect of the bits exponentially
decreases. Error in DDJ prediction is calculated for each bit sequence at different ratios of
bandwidth (1/2πτ) to bit rate (1/Tb), and for each ratio the worst case relative error is
plotted in Figure 3.8(a). The perturbation method approximation has worst case accuracy
of better than 2.5% in a practical range of bandwidth. Moreover, at the nominally
optimum bandwidth-to-bit rate ratio of 0.7, the error is only 0.01%. For a first-order
system, the error in approximation is identical even if – 3 ≤ k ≤ – 2. Therefore, (3.14)
introduces a basis for a very efficient technique of calculating data-dependent jitter.
Figure 3.8: Worst case accuracy of the perturbation method in predicting DDJ: (a) for a first-order
system. (b) for a second-order system (axes: f-3dB.Tb, relative error [%]; curves in (b) for different ζ)
\[
H(s) = \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{3.15}
\]
where ωn is the natural frequency and ζ is the damping factor. The exact DDJ value for
this system is computed from MATLAB simulations of system output for all possible bit
sequences. Then, the approximated DDJ is calculated using (3.14). The results are
compared and the worst case relative error is plotted in Figure 3.8(b) for different damping
factors over a practical range of bandwidth normalized to bit rate. Again, small relative
errors verify that (3.14) is an accurate expression for predicting the DDJ of a general LTI
system based on its step response.
We can use (3.14) to estimate the peak-to-peak data-dependent jitter for a general LTI
system. We have
\[
\Delta t_{pp} = \frac{1}{\left.\dfrac{ds(t)}{dt}\right|_{t=t_0}} \cdot \sum_{k=-\infty}^{-2} p_o(t_0 - kT_b) . \tag{3.17}
\]
Scale-one DDJ can also be defined for a general LTI system similar to (3.11).
However, the predominant impact on jitter is not necessarily related to a-2, as discussed in
Section 3.3.3. The pulse response of the system and the bit rate determine the effect of
prior bits. The effect of each prior bit can be estimated separately from (3.13) and the bit
with most prominent impact can be distinguished. Then, using the same definition as in
(3.11) and assuming that am has the largest impact on DDJ, we can write
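$$
\begin{aligned}
\mathrm{DDJ}_1 &= E\{\Delta t \mid a_m = 0\} - E\{\Delta t \mid a_m = 1\} \\
&= E\Bigg\{\sum_{\substack{k=-\infty \\ k \neq m}}^{-2} a_k \Delta t_k\Bigg\} - E\Bigg\{\Delta t_m + \sum_{\substack{k=-\infty \\ k \neq m}}^{-2} a_k \Delta t_k\Bigg\} = \Delta t_m .
\end{aligned}
\tag{3.18}
$$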
Therefore, we conclude
$$\mathrm{DDJ}_1 = \frac{p_o(t_0 - mT_b)}{\left.\dfrac{ds(t)}{dt}\right|_{t=t_0}}, \tag{3.19}$$
which is an important expression that determines the separation of the two impulses in the
probability distribution function of DDJ, as in Figure 3.1(a), for a general LTI system. It
can be integrated into any communication link design or circuit design simulation software
to predict the data-dependent jitter contribution of the corresponding component in
the system. In addition, DDJ1 can be easily measured using a general-purpose high-speed
oscilloscope. We will verify equation (3.19) experimentally in Section 3.5. A significant
advantage of the perturbation method is the remarkable reduction of the simulation or
measurement time of DDJ. In fact, the simulation time for peak-to-peak DDJ now grows linearly
with the number of prior bits k, while direct calculation from (3.6) requires passing all 2^k
possible sequences through the system, a count that increases exponentially with k.
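The linear-versus-exponential cost can be illustrated with a minimal sketch for a first-order system. It assumes a step response s(t) = 1 - exp(-t/τ), a threshold at half the swing, and a rising edge preceded by a 0; the function name, the parameter names, and the closed-form crossing time used as a reference are illustrative choices and are not taken from this thesis. Each prior-bit term has the form of (3.19), while the reference enumerates every sequence.

```python
import itertools
import math


def first_order_ddj(tau, tb, n_prior):
    """Compare per-bit perturbation terms with an exhaustive search.

    Assumed setup (illustrative): s(t) = 1 - exp(-t/tau), threshold 0.5,
    rising 0->1 edge at t = 0, prior bits a_k for k = -2 ... -(n_prior + 1).
    """
    alpha = math.exp(-tb / tau)
    t0 = tau * math.log(2.0)               # unperturbed threshold crossing
    slope = math.exp(-t0 / tau) / tau      # ds/dt evaluated at t0

    def pulse(t):
        # Response to a unit pulse of width tb, valid for t >= tb.
        return math.exp(-(t - tb) / tau) - math.exp(-t / tau)

    ks = range(-2, -2 - n_prior, -1)
    # Perturbation terms: one evaluation per prior bit (linear cost).
    dt_k = [pulse(t0 - k * tb) / slope for k in ks]
    # All terms are positive in a first-order system, so their sum is the spread.
    pp_estimate = sum(dt_k)

    # Exhaustive reference: exact crossing for all 2**n_prior sequences.
    shifts = []
    for bits in itertools.product((0, 1), repeat=n_prior):
        residual = sum(a * (1.0 - alpha) * alpha ** (-k - 1)
                       for a, k in zip(bits, ks))
        t_cross = tau * math.log(2.0 * (1.0 - residual))
        shifts.append(t0 - t_cross)
    pp_exact = max(shifts) - min(shifts)
    return dt_k, pp_estimate, pp_exact
```

With `tb = 1.0` and `tau = 1 / (2 * math.pi * 0.7)`, the two peak-to-peak values can be compared directly; the first list needs `n_prior` evaluations, whereas the reference loop visits `2**n_prior` sequences.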
Figure 3.9: (a) Variation of the impacts of the last three prior bits on DDJ in a second-order system
(∆t-2, ∆t-3, and ∆t-4 in [UI] versus f-3dB·Tb). (b) Existence of a minimum in the peak-to-peak
data-dependent jitter (∆tpp in [UI] versus f-3dB·Tb)

In a first-order system, any ak=1 will increase the absolute value of DDJ. Furthermore,
the closer the bit to the data transition, the stronger its impact on data jitter. However, this
is not generally true for all LTI systems. It can be seen from (3.13) that the sign and value
of ∆tk depend on po(t0-kTb), and based on the response of the system the effect of each
prior bit can dramatically vary independent of the other bits. Particularly, the pulse
response in (3.13) is sampled at integer multiples of bit period. Therefore, for a given bit
rate, the system can be designed such that its pulse response reduces dominant DDJ terms
and minimizes overall jitter. Pulse shapes that result in minimum jitter in addition to
minimum ISI in the receiver have been studied [68][79]. As an example, the variations of
the first three DDJ terms from (3.13) are plotted in Figure 3.9(a) for a second-order system
with different bandwidth-to-bit rate ratios. The selected range covers under-damped,
over-damped, and critically damped systems. In the range of 0.46-0.48 for the normalized
bandwidth, ∆t-3 has a larger impact on DDJ than ∆t-2. In addition, there exists a minimum
in the peak-to-peak data-dependent jitter as illustrated in Figure 3.9(b). This jitter
minimization behavior can be observed in higher-order systems as well. An experimental
example is shown in Figure 3.10, where the output eye diagram of a 4” copper microstrip
transmission line on conventional FR4 board is plotted at two different bit rates. The
peak-to-peak jitter is clearly larger at the lower bit rate. As will be shown in Section 3.5,
increasing the bandwidth blindly does not necessarily reduce the DDJ.

Figure 3.10: The output eye diagram of a 4” microstrip line on FR4 PCB at (a) 5 Gb/s and (b) 6.5 Gb/s
demonstrates larger peak-to-peak deterministic jitter at the lower bit rate
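The dependence of the individual prior-bit terms on the system response can be explored with a short sketch for the second-order system of (3.15). It uses the textbook closed-form step response of an underdamped system (0 < ζ < 1 only); the threshold of 0.5, the function names, and the values of ζ and ωn in the usage note are assumptions made for illustration. Each term is the pulse response sampled at t0 - kTb divided by the slope of the step response at t0.

```python
import math


def step_response(t, zeta, wn):
    """Unit step response of (3.15), underdamped case (0 < zeta < 1) only."""
    if t <= 0.0:
        return 0.0
    wd = wn * math.sqrt(1.0 - zeta * zeta)
    decay = math.exp(-zeta * wn * t)
    return 1.0 - decay * (math.cos(wd * t) + zeta * wn / wd * math.sin(wd * t))


def step_slope(t, zeta, wn):
    """Time derivative of the step response (the impulse response)."""
    if t <= 0.0:
        return 0.0
    wd = wn * math.sqrt(1.0 - zeta * zeta)
    return wn * wn / wd * math.exp(-zeta * wn * t) * math.sin(wd * t)


def prior_bit_shifts(zeta, wn, tb, ks=(-2, -3, -4), threshold=0.5):
    """Return t0 and the per-bit terms p(t0 - k*tb) / s'(t0) for each k."""
    # The step response rises monotonically up to its first peak, so the
    # first threshold crossing can be bracketed and found by bisection.
    lo, hi = 0.0, math.pi / (wn * math.sqrt(1.0 - zeta * zeta))
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if step_response(mid, zeta, wn) < threshold:
            lo = mid
        else:
            hi = mid
    t0 = 0.5 * (lo + hi)
    slope = step_slope(t0, zeta, wn)
    shifts = {}
    for k in ks:
        t = t0 - k * tb
        pulse = step_response(t, zeta, wn) - step_response(t - tb, zeta, wn)
        shifts[k] = pulse / slope
    return t0, shifts
```

Calling `prior_bit_shifts(0.5, math.pi, 1.0)` gives terms of opposite sign for k = -2 and k = -3, which is the kind of behavior that lets a ringing response either add to or cancel the jitter of neighboring bits.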
spectrum does not exceed the system bandwidth. This fact demonstrates that while the
system bandwidth is large enough to minimize amplitude distortion, DDJ still persists.
The jitter histogram is measured after at least 500,000 crossing events are captured by the
oscilloscope. At the same time, we compute the pulse response from the measured step
response and the current bit rate and calculate DDJ1 from (3.19). Finally, we compare the
measured and analytically calculated DDJ1.
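The calculation side of this procedure can be sketched as follows, assuming the measured step response is available as two equal-length lists of time and voltage samples for a rising edge. The helper names, the midpoint threshold, and the use of linear interpolation are assumptions of this sketch and not a description of the actual measurement scripts.

```python
import math


def interp(ts, ys, t):
    """Linear interpolation of samples (ts, ys) at time t; ts must be increasing."""
    if t <= ts[0]:
        return ys[0]
    if t >= ts[-1]:
        return ys[-1]
    lo, hi = 0, len(ts) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ts[mid] <= t:
            lo = mid
        else:
            hi = mid
    frac = (t - ts[lo]) / (ts[hi] - ts[lo])
    return ys[lo] + frac * (ys[hi] - ys[lo])


def ddj1_from_step(ts, step, tb, m=-2, threshold=None):
    """Estimate DDJ1 in the form of (3.19) from a sampled rising step response."""
    if threshold is None:
        threshold = 0.5 * (step[0] + step[-1])   # assumed: midpoint of the swing
    # First sample interval in which the step crosses the threshold.
    i = next(j for j in range(1, len(step)) if step[j - 1] < threshold <= step[j])
    slope = (step[i] - step[i - 1]) / (ts[i] - ts[i - 1])
    t0 = ts[i - 1] + (threshold - step[i - 1]) / slope
    # Pulse response p(t) = s(t) - s(t - tb), sampled at t0 - m*tb.
    t = t0 - m * tb
    pulse = interp(ts, step, t) - interp(ts, step, t - tb)
    return pulse / slope
```

Because only `tb` changes between bit rates, the same sampled step response can be reused for every bit rate of interest; `m` selects which prior bit is treated as dominant.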
A. Discrete systems. In one set of experiments, we carry out the procedure for various
off-the-bench systems available in the lab. They include a Mini Circuit ZFL 1000-LN
driver amplifier with 1GHz bandwidth, a 9” long 50Ω copper microstrip on a standard FR4
printed circuit board, a 10.5’ long standard BNC coaxial cable, and an HP 11688A
microwave high-order lowpass filter with a cut-off frequency of fc=2.8 GHz. None of these
systems has a simple first-order response. Therefore, DDJ1 should be estimated from
(3.19). The measurement results are summarized in Table 3.1. The small relative errors in the
last column verify the validity of the analytical results for predicting data-dependent jitter.
For the microstrip line, a-3 rather than a-2 has the dominant effect on DDJ and causes
the scale-one separation of the threshold-crossing times.
Table 3.1

DUT                        Bit Rate   Measured DDJ1 [psec]   Dominant Bit   Corresponding dominant ∆tk [psec]   Error
Mini Circuit ZFL-1000      1.3 Gb/s   7.665                  a-2            7.15                                -6.7%
microstrip on FR4 PCB      10 Gb/s    5.35                   a-3            5.23                                -2.3%
HP 11688A Lowpass Filter   1.2 Gb/s   20.5                   a-2            18.96                               -7.5%
coaxial cable              3 Gb/s     4.6                    a-2            4.72                                2.5%
[Figure 3.11: Step/pulse response (amplitude [V] versus time [nsec]) and prior-bit jitter contributions
(∆tk [psec] versus k) for the tested systems. Surviving panel titles: HP 11688A Lowpass Filter,
BNC Coaxial Cable.]
The step response, pulse response, and jitter contributions of some prior bits are plotted
in Figure 3.11 for the systems we tested. ∆tk is calculated from the pulse response using
(3.13). An important observation is the significance of the time response of the system and
its impact on data-dependent jitter at the output. The HP 11688A is a lowpass filter with a
3dB cut-off frequency of 2.8GHz. Comparing it with the ZFL-1000, an amplifier with a 3dB
bandwidth of 1GHz, one may expect the data-dependent jitter contribution to overall
jitter to be larger for the amplifier because of its smaller bandwidth. However, around the
same bit rate (1.2-1.3 Gb/s), the filter has significantly larger DDJ. This can be attributed to
the pulse response characteristics of the two systems, as illustrated in Figure 3.11(a) and (c).
The pulse response of the filter has larger ringing in its decaying tail, which dramatically
increases the jitter from (3.13) because the samples of the pulse response at the measurement
bit rate (1.2 Gb/s) coincide with the maxima and minima of the oscillating tail. Consequently, the
contributions of prior bits are all significant and oscillate between negative and positive
values, as can be seen from Figure 3.11(c). In contrast, the amplifier has smaller ringing,
and its ringing frequency is not constant and is not related to the measurement
bit rate.
In a communication link, if the channel response is not known or is time varying, zero-
ISI pulse shaping is not possible. In such cases, an adaptive equalizer is utilized in the
receiver to minimize ISI. Similarly, if pulse shaping for the transmitted data sequence is
not feasible due to channel unpredictability, a data-dependent jitter equalizer can be used
in front of the clock recovery circuit [80].
Figure 3.12: TIA test board setup for the 10 Gb/s TIA
duroid PCB. The chip is wire bonded to microstrip transmission lines that then transfer the
signal to SMA connectors on the brass substrate. The test board setup is shown in
Figure 3.12. Although this TIA has enough bandwidth to operate at 10 Gb/s, the
reflections from connectors and wirebond mismatches in addition to the amplifier
response cause the whole system to have a ringing step response as the measurement
shows in Figure 3.13. In spite of having enough bandwidth, the TIA, along with the
measurement setup, exhibits a large amount of DDJ.
We measured the DDJ of the TIA at two bit rates, 1.65 Gb/s and 3.3 Gb/s, using the
same procedure previously discussed. While the bit rates are within the bandwidth range
of the TIA, we observed significant amounts of DDJ. The eye diagram at 1.65 Gb/s is
shown in Figure 3.14(a). The measurement results are summarized in Table 3.2. We
should stress that the prediction of DDJ at several bit rates can be done by measuring the
step response only once.

Figure 3.13: TIA step response and impact of a-2 pulse on t0 in a “101” sequence at 3.3Gb/s
(amplitude [mV] versus time [nsec])
Table 3.2: Comparing measured DDJ1 and predictions of analytical expression for the
10 Gb/s CMOS TIA
In the case of 1.65 Gb/s, DDJ prediction using the perturbation method has only 0.85%
error. Larger scales of data-dependent jitter, which are associated with prior bits that have
less-dominant jitter contributions, are often smaller than the rms value of the random jitter.
Therefore, they are hard to measure or observe and are thus neglected. However, the
perturbation method can still predict the DDJ of larger scales. We measured the scale-one DDJ
(DDJ1) and scale-two DDJ (DDJ2) of the TIA at 3.37 Gb/s, where both were observable, as
Figure 3.14 illustrates. The measurement results are compared with the calculations in Table 3.2.
The perturbation method predicts scale-two DDJ with an accuracy of 2.5%. The measured
values of DDJ1 and DDJ2 are respectively related to ∆t-2 and ∆t-3 as calculated from
(3.13). The negative value of ∆t-2 corresponds to a negative shift in the zero crossing. In
other words, all the sequences in which a-2 is “1” will split from the zero crossings that
occur at t0 and will move to t0 -|∆t-2|. On the other hand, positive ∆t-3 will split each
crossing group into two groups, one remaining in the same position and one moving ∆t-3 to
the right. Therefore, overall, four crossing groups can be observed, as in Figure 3.14(b).
Figure 3.14: TIA eye diagram when DDJ1 and DDJ2 are observable (a) 1.65 Gb/s (annotated: 6.85ps)
(b) 3.37 Gb/s (annotated: DDJ1, DDJ2)
$$\mathrm{BER}_j(T_s) = \frac{1}{2}\int_{T_s}^{\infty} f_{TJ}(t)\,dt + \frac{1}{2}\int_{-\infty}^{-T_b+T_s} f_{TJ}(t)\,dt. \tag{3.20}$$
Let’s assume that the DDJ is modeled with a double Dirac delta function distribution
and that the dominant prior bit causing this distribution is a-2. fTJ(.) is the convolution of
the RJ Gaussian distribution and the DDJ distribution, as in (3.2). Equivalently, we can split
(3.20) into two terms, each of which is the BER caused by random jitter conditioned on the
value of a-2. We have
where p is the probability that a-2=0. Each of the BERj terms on the right can be calculated
from (3.20) by replacing fTJ(.) with a Gaussian distribution, while noting that the value of
a-2 determines the mean value of the Gaussian distribution. This mean is the same as the
mean of the threshold-crossing times and can be found from (3.7):
$$t_{c,0} = E\{t_c \mid a_{-2} = 0\} = t_0 - E\{\Delta t \mid a_{-2} = 0\} \tag{3.22}$$
$$t_{c,1} = E\{t_c \mid a_{-2} = 1\} = t_0 - E\{\Delta t \mid a_{-2} = 1\}. \tag{3.23}$$
We find the values of tc,0 and tc,1 in Appendix B. We can write the BER caused by jitter,
conditioned on a-2, as
$$\mathrm{BER}_j(T_s, a_{-2}=0) = \mathrm{BER}_j(T_s, t_{c,0}) = Q\left(\frac{T_s - t_{c,0}}{\sigma_j}\right) + Q\left(\frac{T_b - (T_s - t_{c,0})}{\sigma_j}\right) \tag{3.24}$$
$$\mathrm{BER}_j(T_s, a_{-2}=1) = \mathrm{BER}_j(T_s, t_{c,1}) = Q\left(\frac{T_s - t_{c,1}}{\sigma_j}\right) + Q\left(\frac{T_b - (T_s - t_{c,1})}{\sigma_j}\right). \tag{3.25}$$
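The conditional expressions (3.24) and (3.25) are straightforward to evaluate numerically. The sketch below assumes that the two conditional terms are combined with weights p and 1 - p, as the definition of p suggests; the function names and all numerical values in the usage note are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_conditional(ts, tc, tb, sigma_j):
    """Jitter-induced BER for crossings centred at tc, in the form of (3.24)/(3.25)."""
    return (q_function((ts - tc) / sigma_j)
            + q_function((tb - (ts - tc)) / sigma_j))


def ber_jitter(ts, tc0, tc1, tb, sigma_j, p=0.5):
    """Weight the two conditional BERs by p = P(a_-2 = 0) (assumed weighting)."""
    return (p * ber_conditional(ts, tc0, tb, sigma_j)
            + (1.0 - p) * ber_conditional(ts, tc1, tb, sigma_j))
```

For example, with `tb = 1.0`, `sigma_j = 0.05`, `tc0 = 0.0`, and a hypothetical `tc1 = 0.1` (all in UI), sweeping `ts` over the bit period places the minimum of `ber_jitter` midway between the latest left-hand crossing and the earliest right-hand crossing instead of at the nominal eye center.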
The impact of DDJ on the overall BER can also be calculated by modifying the
distribution of the timing jitter. We have carried out this analysis in Appendix A, equation
(A.10). We can use (A.10) to generate the BER contours similar to Section 2.6.3.2. We
have plotted the BER contours for a first-order system in Figure 3.15(a) for
σj=0.05UI and N0=4e-3V²/Hz. Because the DDJ is related to the system response and it
decreases with larger bandwidth in a first-order system, the BER contours depend on the
bandwidth at all of the sampling points. This is in contrast to Figure 2.28(b), where DDJ
was neglected and the BER becomes independent of bandwidth when jitter dominates the
BER.
Figure 3.15: BER contours for σj=0.05UI and N0=4e-3V²/Hz for two reference times for the sampling
point (a) t=0 (b) bandwidth-dependent threshold-crossing time. Both panels plot f-3dB/(Bit Rate)
versus Ts [UI], with contour labels from -2 to -12.
The values of tc,0 and tc,1 are functions of the system response. For instance, for a
first-order system, both tc,0 and tc,1 increase. In practice, this results in an offset in the eye
diagram of the data. This is because the threshold-crossing times that determine the start
and stop time of the eye diagram shift. Therefore, the absolute value of the optimum
sampling time that is achieved from the BER contours such as the one in Figure 3.15(a)
must also be offset by the same amount. Equivalently, we can replot the BER contours and
change the reference time for the sampling time from t=0 to the threshold-crossing time
for each bandwidth. This will result in the contours in Figure 3.15(b). Figure 3.15(b)
demonstrates that for all of the bandwidths, the optimum sampling point is at 0.65UI,
which is 0.15UI offset from the middle of the eye. The optimum system bandwidth for
minimum BER is again around 70% of the data rate.
We showed in Section 3.4 how to estimate the DDJ impact of a general LTI system
from its response, which can be used to calculate tc,0 and tc,1 in (3.22) and (3.23). Therefore,
the BER contours can be easily obtained for any LTI system based on its response.
3.7 Summary
Data-dependent jitter is one type of deterministic jitter that results from the residual
effects of prior bits on a data threshold-crossing time. It degrades the BER and the data
link performance as data rates increase while the system bandwidth budget remains
restricted. We proposed a methodology to analytically estimate a general LTI system’s
data-dependent jitter based on its step response. The method reduces the complexity
remarkably: its computation time grows linearly with the number of prior bits,
whereas in conventional methods the complexity grows exponentially with the number
of bits.
We verified the validity of the analytical results with simulations and demonstrated
experimentally that this approximation is reasonably accurate for several systems. In
addition, we showed that certain pulse response shapes can result in a minimum
peak-to-peak data-dependent jitter. Finally, we highlighted that 3dB bandwidth does not
characterize DDJ of the system completely, and the shape of the system step response is
the important and essential element that determines DDJ characteristics. We provided the
relationship between the overall BER of a data link and the link response by considering
the effect of DDJ that complemented our calculations in Chapter 2. By analytically
relating the impact of the data link impairments to the BER we can design the system
response and link specifications to optimize the link reliability.
Chapter 4
Bandwidth Enhancement for Wideband Amplifiers
4.1 Introduction
Wideband amplifiers are one of the most critical building blocks at the front-end of a
high-speed link receiver. As we discussed in Chapter 2, any baseband communication
system needs a wide bandwidth receiver due to the signal’s low-frequency spectral
content. Particularly, all amplifiers in the signal path, such as the trans-impedance
amplifier (TIA) in Figure 2.21, should have enough bandwidth with minimum variations
in the passband and near constant group-delay to avoid distortion in the signal. We studied
the impact of restricted bandwidth, in the form of ISI and jitter. In this chapter, we provide
conditions to maximize the bandwidth of amplifiers in the front-end of high-speed
receivers. We are mainly interested in integrated amplifiers that are implemented by a
silicon-based technology.
Silicon integrated circuits are the only candidates that can achieve the required level of
integration with reasonable speed, cost, and yield and have thus been pursued to a great
degree in recent years. In particular, full integration of silicon-based optical-fiber
communication systems at 10Gb/s and 40Gb/s is of great interest. However, silicon-based
integrated circuits implementing such systems face serious challenges due to the inferior
parasitic characteristics in silicon-based technologies, complicating the procedure for a
wideband design.
The inherent parasitic capacitors of devices are the main cause of bandwidth limitation
in wideband amplifiers. Several bandwidth enhancement methods have been proposed in
the past that can be utilized to overcome this issue in silicon technologies. First-order
shunt peaking has historically been used to introduce a resonant peaking at the output as
the amplitude starts to roll off at high frequencies [81]–[83] (see footnote 1). It improves the bandwidth
by adding an inductor in series with the output load to increase the effective load
impedance as the capacitive reactance drops at high frequencies. Neuhauser et al. studied
the effect of bondwire inductors and used an active peaking network to enhance the
bandwidth [84][85]. Capacitive peaking uses an explicit capacitor to control the pole
locations of a feedback amplifier and thus potentially improves the bandwidth [86].
A more exotic approach, proposed by Ginzton et al., is distributed amplification [87].
Here, the gain stages are separated with transmission lines. Although the gain contributions
of several stages are added together, the artificial transmission line isolates the parasitic
capacitors of the stages. In the absence of loss, we can improve the gain-bandwidth product
without limit by increasing the number of stages. In practice, the improvement is limited by
the loss of the transmission line. Hence, the design of distributed amplifiers requires careful
electromagnetic simulations and very accurate modeling of transistor parasitics. For
instance, a CMOS distributed amplifier was presented in [88] with a unity-gain frequency
of 8.5 GHz.
1. For more references on traditional techniques for wideband amplifier design, see the bibliography
of [89].
Section 4.2 reviews these theoretical limitations. Section 4.3 presents a technique to
improve the bandwidth of wideband amplifiers. A design example using this technique
follows in Section 4.4 to demonstrate the practicality of the method, whose validity is
shown with experimental results in Section 4.5.
Over the last few decades, many techniques have been developed to improve the
bandwidth of amplifiers [36]. An improvement in the bandwidth of the amplifier is often
accompanied by a corresponding drop in its low frequency voltage gain. As such, the
gain-bandwidth product (GBW) can serve as a first-order figure of merit for an amplifier
topology in a given device technology [89][90]. For the purposes of this discussion, the
bandwidth is defined as the lowest frequency at which the voltage gain drops by a factor of √2, or
3dB. Accordingly, this bandwidth is often called the 3dB bandwidth. In Section 4.2.1, we
discuss the GBW limits of single-stage amplifiers for one- and two-port passive load
networks. Section 4.2.2 is dedicated to GBW limits of multi-stage amplifiers.
Figure 4.1(a) shows the simplest model for a linear single-stage amplifier, where R and
C are, respectively, the aggregate parasitic resistance and capacitance of the transistor and
the input of the following stage. The gain-bandwidth product of this amplifier is given by:
$$\mathrm{GBW} = g_m R \cdot \frac{1}{2\pi \cdot RC} = \frac{g_m}{2\pi \cdot C} \tag{4.1}$$

Figure 4.1: Single-stage amplifier: (a) First-order load (b) General passive impedance load
As can be seen, the parasitic capacitance directly limits the bandwidth by reducing the
output impedance of the amplifier as the frequency grows. Consequently, retaining a
uniform output impedance over a wider frequency range will increase the GBW. In
general, it is possible to introduce a more elaborate passive load network Z(jω) to do so.
Figure 4.1(b) shows the generic load network, Z(jω), that should look like a constant
resistor over as wide a frequency range as possible. Wheeler [89] and Hansen [90] have
derived an intuitive upper bound for such a range. Bode [26] has mathematically proven
the existence of a bandwidth limit for a class of load impedances. Fano [27] and Youla
[91] have further generalized the theory for a larger class. This theoretical limit (a.k.a.
Bode-Fano Limit) for the amplifier in Figure 4.1(b) is [96]:
$$\mathrm{GBW}_{max} = \frac{g_m}{\pi \cdot C_\infty} \tag{4.2}$$
where the limiting capacitance $C_\infty$ is defined as:
$$C_\infty = \lim_{\omega \to \infty}\left(\frac{1}{j\omega Z}\right) \tag{4.3}$$
and Z(jω) is an impedance function, as defined in Appendix C. Z(jω) includes the aggregate
output capacitance C, shown in Figure 4.1(a). It is easy to show that for a one-port
load network, $C_\infty$ is greater than or equal to C. Thus, according to (4.2), any one-port
passive network added in parallel to C can improve the GBW by at most a factor of two over
that of the amplifier in Figure 4.1(a). As a result, the maximum achievable bandwidth
enhancement ratio (BWER) for a one-port load is two. Shunt-peaking is an example of this
case. Shunt peaking results in BWER of 1.6 and 1.72 when designed for optimum group
delay or maximally flat responses, respectively [83].
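The maximally flat figure quoted above can be checked with a small numerical sketch. It models the shunt-peaked load as (R + jωL) in parallel with C and normalizes with x = ωRC and m = L/(R²C); this parametrization, the function names, and the value m = √2 - 1 for a maximally flat magnitude (obtained by matching polynomial coefficients in this model) are assumptions of the sketch and are not taken from [83].

```python
import math


def shunt_peaked_gain_sq(x, m):
    """|Z(jw)/R|^2 for (R + jwL) || C, with x = w*R*C and m = L/(R^2*C)."""
    return (1.0 + (m * x) ** 2) / ((1.0 - m * x * x) ** 2 + x * x)


def bwer_shunt_peaking(m):
    """3dB bandwidth relative to the unpeaked 1/(RC), found by bisection."""
    lo, hi = 0.0, 10.0          # the gain is far below -3dB at x = 10 for 0 <= m <= 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if shunt_peaked_gain_sq(mid, m) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With `m = 0` the function returns 1 (no inductor), and with `m = math.sqrt(2) - 1` it returns approximately 1.72. Sweeping `m` between 0 and 1 keeps the ratio below two, in line with the one-port bound.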
Figure 4.2(a) shows a single-stage amplifier, where the intrinsic output resistance and
capacitance of the transistor, i.e., R1 and C1 are separated from those of the load, namely,
R2 and C2. The combination of capacitors C 1 and C 2 limits the bandwidth of the
amplifier, i.e.,
$$\mathrm{GBW} = \frac{g_m}{2\pi \cdot (C_1 + C_2)}. \tag{4.4}$$
In this case, a passive two-port network can be inserted between the transistor’s intrinsic
components (R1 and C1) and load (R2 and C2) to increase the bandwidth, as shown in
Figure 4.2(b). This two-port passive network can be designed to maintain the impedance
constant over a wider frequency range, as it separates and isolates C1 and C2. Therefore,
C1 is the only capacitor that affects the gain-bandwidth product at the input port of the
network. Based on the argument in Section 4.2.1.1, the maximum gain-bandwidth product at
the input port of N(jω) is:
$$\mathrm{GBW}_{max} = \frac{g_m}{\pi \cdot C_1}. \tag{4.5}$$
$$\mathrm{GBW}_{max} = \frac{2 g_m}{\pi \cdot C}. \tag{4.6}$$
This can be done by using a constant-k LC-ladder filter [26][50][89] terminated in its
image impedance. Such a filter has a constant transfer function over the frequencies below
its cut-off frequency. Compared to (4.4) with C1 = C2 = C/2, (4.6) is four times larger than
the gain-bandwidth product of a single-stage amplifier without an additional coupling network.
As a result, for equal low-frequency gain, the maximum achievable BWER for a two-port load is four.

Figure 4.2: (a) Small signal model of an amplifier with loading effect of next stage amplifier (b) The
inserted passive network isolates the amplifier parasitics and the load (c) Additional
inductor forms a 3rd-order passive network at the output

1. If not equal, he proposes adding an ideal transformer at the output to match C2 to C1 with the proper
ratio.
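The factors of two and four quoted for the one-port and two-port cases follow directly from (4.1), (4.2), and (4.6). A minimal sketch, assuming the best case in which the limiting capacitance of (4.2) equals the total capacitance C and, for the two-port case, C1 = C2 = C/2; the function names and the example values of gm and C are hypothetical.

```python
import math


def gbw_first_order(gm, c_total):
    """(4.1)/(4.4): plain RC load with total capacitance c_total."""
    return gm / (2.0 * math.pi * c_total)


def gbw_one_port_limit(gm, c_total):
    """(4.2), taking the limiting capacitance equal to c_total (best case)."""
    return gm / (math.pi * c_total)


def gbw_two_port_limit(gm, c_total):
    """(4.6), which assumes an equal split C1 = C2 = c_total / 2."""
    return 2.0 * gm / (math.pi * c_total)


gm, c_total = 20e-3, 100e-15     # hypothetical: 20 mS and 100 fF
base = gbw_first_order(gm, c_total)
one_port_ratio = gbw_one_port_limit(gm, c_total) / base
two_port_ratio = gbw_two_port_limit(gm, c_total) / base
```

The two ratios evaluate to 2 and 4 regardless of the values chosen for `gm` and `c_total`.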
Figure 4.3: Normalized gain of the amplifier with 3rd-order network load and different inductor values:
(a) R1 = R2 = 1Ω, C1 = C2 = 1F (b) R1 = 0.5Ω, R2 = ∞, C1 = C2 = 1F
Table 4.1: Bandwidth enhancement ratios for the two 3rd-order passive
networks in Figure 4.3

L Value [H]   Case 1: R1=R2=1Ω   Case 2: R1=0.5Ω, R2=∞
0             1                  1
0.2           1.44               3.4
0.4           2.46               2.42
0.5           2.2                2.17
0.6           2                  1.99
0.707         1.83               1.83
1             1.52               1.55
1.3           1.31               1.36
$$\omega_{overall} = \omega_\circ \cdot \sqrt{\sqrt[N]{2} - 1} \tag{4.7}$$
$$\mathrm{GBW} = A_v^{N} \cdot \omega_{overall}. \tag{4.8}$$
For instance, N=2 and Av=10 correspond to a factor of 6.4 improvement in GBW. For
larger Av, GBW will increase dramatically by introducing additional single stages at the
price of increasing overall power consumption.
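The factor of 6.4 can be reproduced with a short calculation. The sketch below reads Av as the gain of each of N identical stages, so that the cascade has gain Av^N and the bandwidth shrinks according to (4.7), and it compares the result with the single-stage product Av·ω°. That reading, and the function names, are assumptions of the sketch.

```python
import math


def bandwidth_shrinkage(n):
    """Overall-to-single-stage bandwidth ratio of n identical stages, as in (4.7)."""
    return math.sqrt(2.0 ** (1.0 / n) - 1.0)


def gbw_improvement(av, n):
    """n-stage GBW of (4.8) divided by the single-stage GBW av * w0."""
    return av ** (n - 1) * bandwidth_shrinkage(n)


def min_gain_for_improvement(n):
    """Per-stage gain above which cascading n >= 2 stages raises the GBW."""
    return (2.0 ** (1.0 / n) - 1.0) ** (-1.0 / (2 * n - 2))
```

`gbw_improvement(10, 2)` evaluates to about 6.44, and `min_gain_for_improvement(2)` to about 1.55, below which cascading two stages would lower the product.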
In practice, each stage has a loading effect on its previous stage, which reduces its
bandwidth, hence reducing the overall bandwidth. The matching networks introduced in
Section 4.2.1.2 can reduce the loading effect by separating the output of an amplifier from
the input of its next stage. One disadvantage of multi-stage amplifiers, in general, and
multi-stage amplifiers with two-port matching networks between each stage, in particular,
is excessive phase shift that each amplifier stage or each network adds to the signal path,
which can result in instability in feedback amplifiers.
1. The overall GBW will actually improve if $A_v > \left(\sqrt[N]{2} - 1\right)^{-\frac{1}{2N-2}}$.
In this approach, one can resort to passive networks with low sensitivity to component
values, such as ladder structures [99]. Figure 4.4 shows a general low-pass ladder structure
inserted between two gain stages in an amplifier. The component values are generated
using standard look-up tables [97] or network synthesis methods [98]. The network order,
N, is an additional design parameter. Using higher-order networks will provide wider
bandwidth and a sharper transition from passband to stopband. However, it may cause some
practical issues, such as unreasonable component values, large numbers of passive
components (large die area), and additional signal loss due to passive components
(primarily inductors). Typically these issues limit the order of the network to five, i.e., only
three additional passive components.
Design Example: Here, we show the procedure for designing a maximally flat
response 3rd-order passive network as an example. Figure 4.5(a) illustrates the two stages
of a given amplifier with an inductor inserted between them. Figure 4.5(b) demonstrates
that the inductor forms a 3rd-order ladder structure with C1 and C3, the transistor parasitic
capacitances. The values for R1, R2, C1, and C3 are known for the amplifier. To achieve a
maximally flat frequency response at the output of the ladder, the component values should
be equal to their corresponding 3rd-order Butterworth filter elements as follows [98]:
$$C_1 = \frac{1}{R_1 (1 - \delta)\,\omega_c} \tag{4.10}$$
$$L_2 = \frac{2}{(1 - \delta + \delta^2) \cdot \omega_c^2 \cdot C_1} \tag{4.11}$$
$$C_3 = \frac{1}{R_2 (1 + \delta)\,\omega_c} \tag{4.12}$$
where
$$\delta = \sqrt[3]{\frac{R_1 - R_2}{R_1 + R_2}} \tag{4.13}$$
and ωc is the 3dB cut-off frequency of the network. From (4.10) the new amplifier bandwidth
at the output of the ladder structure is
$$\omega_{c,new} = \frac{1}{(1 - \delta) \cdot R_1 \cdot C_1}. \tag{4.14}$$

Figure 4.4: Passive ladder structure of order N, inserted between the gain stages

Figure 4.5: (a) An inductor is inserted between two gain stages. (b) The small signal model shows
formation of a 3rd-order ladder network
The inductor value can be calculated from (4.11) and (4.14). C3 for the original amplifier
may not be equal to the value with the new cut-off frequency, calculated from (4.12).
Some explicit capacitance should be added to adjust for this. If we define the bandwidth
enhancement ratio (BWER) as the ratio between the new 3dB bandwidth and the old one
(without adding the inductor) of the single-stage amplifier, we can show:
$$\mathrm{BWER} \equiv \frac{\omega_{c,new}}{\omega_{c,old}} = \frac{1}{1 - \delta} \cdot \frac{R_2}{R_1 + R_2} \cdot \frac{C_1 + C_3}{C_1}. \tag{4.15}$$
Equations (4.10), (4.12), and (4.13) simplify (4.15) to an expression based on the ratio
of R1 and R2. BWER decreases monotonically when R2/R1 increases. For a given
amplifier with R2<R1, adding the inductor always enhances the bandwidth by BWER.
When R2=R1, BWER=1 and there is no bandwidth enhancement with adding the inductor.
However, a maximally flat pass band and sharp cut-off response is still achieved.
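The design steps of this example can be collected in a short sketch. It applies (4.13), (4.14), (4.11), (4.12), and (4.15) in that order for given R1, R2, and C1, and it is restricted to R2 ≤ R1 so that δ is a real, non-negative cube root; the function name, the dictionary keys, and that restriction are choices of the sketch.

```python
import math


def design_third_order_ladder(r1, r2, c1):
    """Maximally flat 3rd-order ladder from (4.10)-(4.15), assuming 0 < r2 <= r1."""
    if not 0.0 < r2 <= r1:
        raise ValueError("this sketch only covers 0 < r2 <= r1")
    delta = ((r1 - r2) / (r1 + r2)) ** (1.0 / 3.0)             # (4.13)
    wc = 1.0 / ((1.0 - delta) * r1 * c1)                        # (4.14)
    l2 = 2.0 / ((1.0 - delta + delta ** 2) * wc ** 2 * c1)      # (4.11)
    c3 = 1.0 / (r2 * (1.0 + delta) * wc)                        # (4.12)
    bwer = (1.0 / (1.0 - delta)) * (r2 / (r1 + r2)) * ((c1 + c3) / c1)  # (4.15)
    return {"delta": delta, "wc": wc, "L2": l2, "C3": c3, "BWER": bwer}
```

For `r1 == r2` the sketch returns `BWER == 1`, and the ratio grows as R2/R1 falls. Substituting (4.10), (4.12), and (4.13) into (4.15) collapses the product to 1 + δ², which the returned values can be used to check numerically.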
The same analysis can be applied to the input stage of a trans-impedance amplifier
(TIA) as shown in Figure 2.21. The photodiode is modelled by a current input in parallel
with a capacitance, CPD, as shown in Figure 4.6. Although R1 is eliminated from the
model, design calculations using (4.10)–(4.14) can use an arbitrary value for R1. An
optimum value for R1 can be computed from (4.14) with fixed C1 (CPD in this case) and
R2, to maximize the 3dB bandwidth. It results in R 1 = 2.05R 2 with δ = 0.7 . After
designing the inductor and adjusting for C3, R1 can be eliminated. Essentially, the
trans-impedance gain will increase as no portion of the input current is absorbed by R1
anymore. The enhancement ratio should also be modified for the input passive structure
as:
$$\mathrm{BWER} = \frac{1}{1 - \delta} \cdot \frac{R_2}{R_1} \cdot \frac{C_{PD} + C_3}{C_{PD}} = 1.63 \cdot \left(1 + \frac{C_3}{C_{PD}}\right). \tag{4.16}$$

Figure 4.6: The inductor at the input forms a 3rd-order ladder network with the photodiode capacitance
The preceding example can be generalized to any response shape when (4.10)-(4.12)
are replaced with their corresponding filter component equations. Equations (4.15) and (4.16)
should also be modified to correspond to the new component values.
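The optimum quoted for the TIA input (R1 = 2.05R2 with δ = 0.7, and the factor 1.63) can be reproduced by a brute-force scan. The sketch below inverts (4.13) to express R1/R2 through δ and maximizes the bandwidth of (4.14) normalized to 1/(R2·CPD); the function names and the scan resolution are arbitrary choices of the sketch.

```python
def bandwidth_gain_factor(delta):
    """w_c,new * R2 * C_PD from (4.14), with R1/R2 = (1 + d^3) / (1 - d^3) from (4.13)."""
    r1_over_r2 = (1.0 + delta ** 3) / (1.0 - delta ** 3)
    return 1.0 / ((1.0 - delta) * r1_over_r2)


def optimum_delta(steps=100000):
    """Scan 0 <= delta < 1 on a uniform grid for the largest bandwidth factor."""
    best = max(range(steps), key=lambda i: bandwidth_gain_factor(i / steps))
    return best / steps


delta_opt = optimum_delta()
r1_over_r2_opt = (1.0 + delta_opt ** 3) / (1.0 - delta_opt ** 3)
```

The scan settles near δ = 0.70, where R1/R2 is about 2.05 and the bandwidth factor is about 1.63.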
$$P_{in} \approx \frac{1}{R_{in} \cdot (C_{PD} + C_{in})} \approx \frac{A}{R_f \cdot (C_{PD} + C_{in})} \tag{4.17}$$
where Rin and Cin are the input resistance and input capacitance, respectively. For the
circuit in Figure 4.7, if the transistors are in short channel region, both Cin and A are
proportional to the input transistor width, Win:
$$A \approx g_m \cdot R_L \approx v_{sat} C_{ox} W_{in} \cdot R_L \tag{4.18}$$
$$C_{in} \propto C_{ox} L_{in} W_{in} \tag{4.19}$$
where Cox is the gate oxide capacitance, vsat is the carrier saturation velocity, and Lin is the
input transistor channel length. When the input width increases there is a bound for the
input pole dominated by Cin. However, additional constraints such as power consumption
or input noise set an optimum width for the input transistor [101]. Adding the additional
inductor to isolate Cin and CPD enhances the bandwidth according to (4.16). In this design
we match the input resistance to our electrical measurement setup, which had a 50Ω input
resistance.

Figure 4.7: Schematic of the input stage of the TIA
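The bound on the input pole can be made concrete with a small sketch. Since both A in (4.18) and Cin in (4.19) scale with Win, the pole of (4.17) saturates once Cin dominates CPD. The per-width coefficients, the function names, and all numerical values below are hypothetical placeholders and are not device data from this design.

```python
def input_pole(w_in, c_pd, r_f, gain_per_width, cap_per_width):
    """P_in ~ A / (R_f * (C_PD + C_in)) from (4.17), with A and C_in linear in W_in."""
    gain = gain_per_width * w_in       # (4.18): A grows linearly with W_in
    c_in = cap_per_width * w_in        # (4.19): C_in grows linearly with W_in
    return gain / (r_f * (c_pd + c_in))


def input_pole_bound(r_f, gain_per_width, cap_per_width):
    """Limit of input_pole as W_in grows without bound."""
    return gain_per_width / (r_f * cap_per_width)


# Hypothetical numbers, chosen only to show the saturation.
widths_um = [10, 50, 100, 500, 1000]
poles = [input_pole(w, 0.5e-12, 500.0, 0.1, 1e-15) for w in widths_um]
bound = input_pole_bound(500.0, 0.1, 1e-15)
```

The list `poles` increases with width but never exceeds `bound`, which is why constraints such as power and noise, and not the pole alone, end up setting the width.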
The complete schematic of the circuit including the added passive components is
shown in Figure 4.8. The second and third stages of the amplifier are designed in a cascode
configuration with intermediate inductors and are isolated using a source follower buffer.
The source follower prevents the large input capacitance of the 3rd-stage amplifier from
loading the 2nd stage. It also provides a low-impedance node at its output, which increases
the pole frequency at that node.
Four passive networks are inserted between the stages of the amplifier to enhance the
bandwidth. The input network separates the photodiode capacitance and the parasitic
capacitance of the input stage. Adding one inductor will transform it to a 3rd-order ladder
structure. The next two networks are also 3rd-order and are placed between the cascode
transistors. The load capacitance, in conjunction with the output capacitance (including
the bonding pad) and the output bondwire inductor, forms the output 3rd-order network.

Figure 4.8: Schematic of the TIA with parasitic capacitances and additional inductors
The capacitors, as shown with a dotted line in Figure 4.8, are the parasitics from the
devices and only four inductors are added to the original circuit. The input and output
inductors are bondwire inductors and the inter-stage ones are on-chip spiral inductors. A
final optimization step in simulation is performed to include the bilateral effects of the
devices. Note that the output network is different from a conventional shunt-peaking
approach. For a photodiode capacitance of 0.5pF, the circuit achieves over 9GHz 3dB
bandwidth in simulation. This is 2.4 times larger than the bandwidth achieved using the
same circuit without the inductors. The individual effect of each passive network and the
effect of a combination of them are summarized in Table 4.2 from simulation results.
L3 causes the largest improvement in bandwidth because the device sizes of the second
cascode amplifier are large to drive 50Ω with a minimum loss of gain. L1 separates the
two large capacitances that set the input pole frequency. In our design, this pole is the
dominant bandwidth-limiting factor of the core TIA without a driver. L4 does not enhance
the bandwidth appreciably because the output pole is not dominant. However, L4 will exist
in the circuit as the bondwire and should be modeled. All four passive networks have a
ladder structure for lower sensitivity to process variations.
Both on-chip inductors were implemented as spiral inductors in the top metal layer.
Accurate electromagnetic modeling of the inductors was done using ASITIC [102] and
SONNET [103] E&M simulators and gave similar results. The parasitic capacitances of
the inductors are not negligible, and their impact is considered in addition to device
parasitics.
$$Z(j\omega) = \frac{Z_0 \cdot S_{21}}{1 - S_{11} + Z_0 \cdot j\omega C_{PD}\,(1 + S_{11})} \qquad (4.20)$$
where Z0 = 50Ω is the reference impedance and CPD is the photodiode capacitance. The
amplitude and group delay response of the implemented TIA, extracted from measure-
ment data, are shown in Figure 4.9(a) and Figure 4.9(b), respectively, when CPD=0.5pF.
Matched output will cause a 6dB drop in the gain, which is adjusted for in the reported
result. Group delay is calculated from the phase response of the amplifier and logarithmic
frequency steps of the network analyzer.
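As a rough illustration of how (4.20) and the group delay could be evaluated from network-analyzer data, the sketch below uses only the Python standard library. The function names are hypothetical and no measured data is included. Group delay is taken here as the negative finite-difference derivative of the unwrapped phase between adjacent frequency points, which is an assumption about the post-processing rather than a description of the exact procedure used for Figure 4.9.

```python
import cmath
import math

Z0 = 50.0  # reference impedance in ohms, as in (4.20)

def transimpedance(s11, s21, freq_hz, c_pd):
    """Evaluate (4.20) at one frequency from complex S11 and S21."""
    jw = 1j * 2 * math.pi * freq_hz
    return Z0 * s21 / (1 - s11 + Z0 * jw * c_pd * (1 + s11))

def unwrap(phases):
    """Remove 2*pi jumps from a list of phases given in radians."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))
        out.append(out[-1] + d)
    return out

def group_delay(freqs_hz, z_values):
    """Group delay -dphi/domega between adjacent frequency points.

    The points may be linearly or logarithmically spaced.
    """
    phi = unwrap([cmath.phase(z) for z in z_values])
    return [-(phi[k + 1] - phi[k])
            / (2 * math.pi * (freqs_hz[k + 1] - freqs_hz[k]))
            for k in range(len(phi) - 1)]
```

With CPD set to 0.5pF, `transimpedance` applies the photodiode loading to each measured frequency point before the magnitude and group delay are computed.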
Figure 4.9: (a) Trans-resistance gain of the TIA with 0.5 pF photodiode capacitance and the input
matching. (b) Group delay response of the TIA
[Panel (a): Rtrans and S11 versus frequency, 0.1 to 10 GHz; panel (b): group delay in ps versus frequency, 1 to 10 GHz]
The 3dB bandwidth is 9.2 GHz, which is in good agreement with the simulations, and
the trans-impedance gain is 54 dBΩ. The input reflection coefficient, S11, remains below
-10 dB up to 7 GHz. Although we did not design for flat group delay, the group delay
ripples are ±25ps. The dip in the frequency response of the trans-impedance at 2.5 GHz
can be correlated to a resonance mode between the on-chip supply by-pass capacitor and
bondwire and supply line inductances. Changing these parameters changes its depth and
Figure 4.10: Eye diagram of the TIA output with 10Gb/s 2³¹-1 PRBS at the input (scale markers: 30 mV, 50 ps)
frequency during the measurement and can be removed by using a different supply
by-passing technique in a revised version of the design. The design has low sensitivity to
inductor values. The simulated values for L1 and L4 are 0.5–0.6nH. L2 and L3 are 1nH
100 × 100 µm² spiral inductors.
Figure 4.10 shows the eye diagram when a 2³¹-1 pseudo random bit sequence is
applied to the input at 10Gb/s. The ringing is partly due to the resonance mode at 2.5 GHz
and partly due to the absence of the photodiode capacitance, which causes peaking in the
overall transfer function. This peaking translates to a ringing response in the time domain,
increases the ISI penalty, and closes the eye vertically. However, the TIA still
achieves an overall sensitivity of -18dBm for a BER better than 10⁻¹², as we
discuss next.
The electrical sensitivity of the amplifier for different bit error rates (BER) is measured
using Anritsu’s MP1763C and MP1764C BERT system. A 2³¹-1 pseudo random bit
sequence is applied to the input at 10Gb/s, and the BER is measured for different
electrical input powers at 500-second intervals. The results are depicted in Figure 4.11.
For a data communication link, the required BER is typically 10⁻¹². The TIA achieves a
sensitivity of -18dBm or 15.8µW for this BER when photodiode capacitance is not
present. At very low power inputs we were limited to the sensitivity of the BERT system.
Figure 4.11: The BER of the TIA for different input powers at 10Gb/s
The TIA output swing was not large enough to meet the minimum requirement of the
BERT input.
Simulated total input noise current of the TIA, integrated over the bandwidth, equals
1.6µA. In an optical receiver, there are two other noise sources that contribute to increase
the minimum-detectable optical power. One is the intensity noise of the transmitted signal
originating mainly from spontaneous emission of the laser source [43]. Resulting current
noise at the receiver input can be quantified as
where R is the responsivity of the detector, Pin is the input optical power, and RIN is the
relative intensity noise of the laser integrated over the bandwidth. The second noise source is
the shot noise of the photodetector that is generated proportionally to the optical power
given by [43]
where q is the electron charge, Idark is the dark current of the detector, and ∆f is the system
bandwidth. From (4.21) and (4.22) we can compare the injected noise currents and determine
the dominant noise source. Assuming peak RIN/∆f= -130dB/Hz for a typical laser
and a detector with R=0.8A/W and Idark=10nA, minimum input optical power of
Pin=20µW will result in it,rms=0.014µA and is,rms=0.22µA (∆f=9.2GHz). Therefore, the
thermal noise of the TIA is the dominant noise source of the receiver, and we expect that
the optical sensitivity and the electrical sensitivity of the TIA are comparable. The amplifier
core occupies 0.8 × 0.8mm² of area, as shown in Figure 4.12.
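The sensitivity and shot-noise numbers quoted above can be checked with a few lines of Python. This is only a sketch: the shot-noise expression used is the standard textbook form sqrt(2q(R·Pin + Idark)·∆f), assumed here because the thesis equation is not reproduced in this text, and the function names are hypothetical.

```python
import math

Q_E = 1.602e-19  # electron charge in coulombs

def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)

def shot_noise_rms(responsivity, p_in, i_dark, bandwidth):
    """RMS shot-noise current, assuming sqrt(2 q (R*Pin + Idark) * df)."""
    return math.sqrt(2 * Q_E * (responsivity * p_in + i_dark) * bandwidth)

sensitivity_w = dbm_to_watts(-18)                     # about 15.8 microwatts
i_shot = shot_noise_rms(0.8, 20e-6, 10e-9, 9.2e9)     # about 0.22 microamps
```

Under these assumptions the shot noise comes out close to the 0.22µA quoted above, well below the 1.6µA simulated input noise of the TIA.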
4.6 Summary
In this chapter, we studied wideband amplifier design for the front-end of high-speed
wireline links. We addressed the gain-bandwidth product (GBW) limits of amplifiers and
introduced a methodology that can be used to enhance the bandwidth of wideband
amplifiers with specified characteristics for their transfer functions. In a simple design
Chapter 5
Eye-Opening Monitor for Adaptive Equalization
5.1 Introduction
In Chapters 2 and 3, we discussed how the channel and receiver front ends of a
high-speed wireline communication system degrade received signal quality by adding ISI
and jitter. In Chapter 4, we provided a bandwidth enhancement method that can be utilized
in the design of the receiver front-end. To minimize the ISI caused by the channel
response we can use an equalizer (Section 2.3.4). For instance, Wu et al. [48][104] and
Reynolds et al. [49] have demonstrated significant bit error rate (BER) reduction by using
transversal filter equalizers in the receiver front-end of multi-mode fiber links to
compensate for modal dispersion.
When the channel response is initially unknown or if it may vary over time, an
adaptive equalizer is used in which the transversal filter coefficients are adjusted
automatically and continuously to track channel response variations. Since the adaptation
is an iterative process, a feedback mechanism is required to measure and report the signal
quality at the equalizer output. In this chapter, we propose an eye-opening monitor
(EOM), which is a circuit block that reports a quantitative measure of the quality of the
signal eye diagram and thus can be used as such feedback.
Figure 5.1 shows the block diagram of a transversal filter adaptive equalizer that uses
an EOM circuit. The EOM evaluates signal quality by making periodic observations of the
filter output and providing information about the filter performance to an optimization
algorithm. The algorithm updates all of the filter coefficients accordingly. This
architecture is desirable if the transversal filter is implemented using broadband passive
Figure 5.1: Adaptive transversal filter equalizer with an eye-opening monitor (EOM)
delay lines, i.e., LC networks [48][49][104] or active delay elements [80]. At multi-Gb/s
data rates, the passive or active delay cells become more sensitive to on-chip parasitic
components. In contrast to conventional LMS adaptive equalization (Section 2.3.4), when
an EOM-based adaptive equalizer is used, the nodes of the delay cells of the filter are not
loaded by additional hardware for adaptation circuitry. Therefore, the filter can be
designed as a separate module and its response remains intact. The other advantage of the
EOM-based architecture is that the cost function for the coefficient optimization is only
based on the filter output and is independent of the receiver decision on the symbols. This
is especially beneficial in links where training sequences are not used for adaptation and
most decisions are erroneous at the startup when BER is high. The EOM can also be
utilized as a standalone measurement system to verify the quality of the eye.
Eye-opening monitor circuits have also been utilized as part of adaptive equalizers
[117]-[125] mainly to mitigate various dispersion issues in optical fibers. In [120] the eye
monitor estimates the vertical eye opening at the sampling point. The receiver includes a
path parallel to the main path that contains a decision circuit with a variable decision
threshold. The threshold is varied to sweep the eye vertically. The decisions of the two
paths are compared and an error is recorded if they differ. When the error is integrated
over time for various thresholds, the eye vertical opening for a given error rate can be
estimated from the separation of the thresholds that resulted in that error rate. Ellermeyer
suggests a circuit for estimating the horizontal eye opening of the input signal [118][119].
A rectangular mask with fixed height is overlapped with the input eye. The width of the
mask is increased as long as eye traces do not occur inside it and is decreased otherwise. In
steady state, the mask width indicates the horizontal eye opening.
We propose an EOM circuit architecture that has a unique feature of mapping both the
vertical (amplitude) and horizontal (temporal) opening of the received eye to a two-
dimensional error diagram [28]. The error diagram is directly correlated to the eye
opening in both dimensions and is essentially the captured image of the signal eye
diagram. The output error rate is recorded with a digital counter as opposed to an
accumulated or integrated format. This is advantageous when the eye monitor is in a
feedback loop with a microcontroller that runs the optimization algorithm because error is
recorded in finer resolution and potentially has larger dynamic range. We have
implemented a prototype of this 2D EOM circuit in 0.13µm standard CMOS technology,
and we have verified its operation up to 12.5Gb/s.
In the following sections, the operation principle of the EOM is discussed first. Then,
the architecture and details of the associated circuit blocks are presented. Finally, the
experimental techniques for verifying the operation of the prototype and the measurement
results are described.
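$$\mathrm{MER} \equiv \frac{\text{error count}}{\text{total transitions}}$$
Figure 5.2: The mask error rate (MER) varies for different mask shapes in a given eye diagram
[Example masks in the figure are labeled with MER values of 0%, 0.1%, and 20%]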
mask normalized by the total transitions during the same time period. Figure 5.2 illustrates
an example where MER is obtained for three different masks in a given eye diagram. Any
given mask is associated with a MER that increases as the quality of the monitored eye
degrades. The horizontal and vertical opening of an eye can be determined from the mask
size for a specific MER. Moreover, different eye diagrams can be quantitatively compared
by comparing their associated mask sizes at a given MER. The eye that can fit a larger
mask for the given MER is more desirable.
Figure 5.3: The effective eye opening formed by combining the mask areas that have the same MER
information about the shape of the eye. The EOM architecture in this design can measure
the effective eye opening for different MER values. The aggregate of effective eye
openings is a 2D error map that covers the eye diagram completely and is a representation
of the shape of the eye as Figure 5.4 illustrates hypothetically.
The MER for a given mask is found from counting the errors, i.e., the number of data
transitions that cross either of the two vertical sides of the mask. The operation is
demonstrated in Figure 5.5. Two reference voltages, VH and VL, define the vertical
opening of the mask, and the two phases of the sampling clock, φearly and φlate, determine
its horizontal opening. Data is continuously compared with VH and VL, and these results
Figure 5.4: The combination of effective eye openings is a 2D error map that is correlated to the shape
of the eye diagram
Figure 5.5: Operation principle of the EOM for one mask
are sampled at both early and late phases. At each phase, if the sampled values differ, a
mask violation has occurred and an error is flagged. The error detection logic is
error = SH ⊕ SL, where SH and SL are sampled comparison results for either of the
phases, i.e., either side of the mask, and the operator is XOR. The timing diagram in
Figure 5.5 illustrates one violation for each side of the mask. If the errors of the left (from
φearly) and the right (from φlate) sides of the mask are counted separately, horizontally
asymmetrical eye diagrams can be captured effectively. We have added this capability to
the architecture by providing two independent error detector blocks for two sides of the
mask.
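The detection rule described above can be sketched behaviorally in Python. This is an idealized model, not the circuit: the function name and the lists of sampled voltages are hypothetical, and the comparators are assumed to be noiseless with zero offset.

```python
def mask_errors(samples_early, samples_late, v_h, v_l):
    """Count mask violations on the left (early) and right (late) sides.

    Each sample is the data voltage seen at the corresponding clock phase.
    """
    def violated(v):
        s_h = v > v_h        # comparison against the upper reference VH
        s_l = v > v_l        # comparison against the lower reference VL
        return s_h != s_l    # XOR: true only when VL < v <= VH
    left = sum(violated(v) for v in samples_early)
    right = sum(violated(v) for v in samples_late)
    return left, right

# One early sample (0.0) and one late sample (0.05) fall inside the mask.
left, right = mask_errors([0.5, 0.0, -0.5], [0.05, 0.4], v_h=0.1, v_l=-0.1)
```

Dividing each count by the number of transitions observed in the same interval gives a per-side MER, so the left and right halves of an asymmetrical eye can be compared separately.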
[Figure: EOM architecture block diagram; caption not recovered. Labels: Data, VH, VL, DFFs clocked by φearly and φlate, logic/retime blocks, DAC, I/Q clock phases, next_φearly, next_φlate, next_ref, full-rate clock, CML and CMOS domains]
In each logic block for early or late phases, the errors due to rising and falling edge
samples are detected, retimed, and merged. The errors are detected for the edges
separately by independent XOR gates. Then the error signals from the falling sampling
edge are re-sampled at the next rising edge to align the two error signals in time. Then,
they are merged by a logic OR function. The merged error signal is divided down by a
factor of 16 using CML logic. This allows the use of low power CMOS logic for the
dividers in the subsequent stages. Finally, the two error signals from the φearly and φlate are
retimed by the early sampling phase and are combined. The error output passes through a
digital divider with four selectable divide ratios. A larger divide ratio is selected in order
to measure cases with high-error counts. The chip output, error_out signal, is a toggling
output. MER for a fixed mask size can be calculated from the frequency of error_out
signal, ferror, as
$$\mathrm{MER} = \frac{N \cdot f_{error}}{BR}, \qquad (5.1)$$
where N is the total divide ratio in the chain and BR is the input bit rate. A separate divider
chain is used to divide the late sampling clock, φlate, by 512. The output is used to monitor
the clock divider and phase rotator functionality and is also applied as a trigger signal during
the chip test and characterization.
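As a small worked example of (5.1), the helper below converts a measured error_out frequency into a mask error rate. The function name and the numbers in the example (a total divide ratio of 1024 and a 1 kHz error_out frequency) are hypothetical values chosen only for illustration.

```python
def mer_from_frequency(f_error_hz, total_divide_ratio, bit_rate):
    """Apply (5.1): MER = N * f_error / BR."""
    return total_divide_ratio * f_error_hz / bit_rate

# Hypothetical reading: 1 kHz at error_out, N = 1024, 10 Gb/s input.
example_mer = mer_from_frequency(1e3, 1024, 10e9)   # 1.024e-4
```

Selecting a larger divide ratio lowers the error_out frequency for the same MER, which is why the larger ratios suit masks with high error counts.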
The sampling clocks are generated from an external full-rate clock that is divided by
two with an on-chip divider to create half-rate I and Q phases. Two single-quadrant phase
rotators interpolate between I and Q and between Ī and Q to create, respectively, φearly
and φlate. Therefore, the output phase of each rotator covers a range of 90° or half of the bit
period as can be seen from the timing chart in Figure 5.7. Each rotator has a 15-bit
thermometer-encoded control line that sets the phase interpolation weights and results in a
phase step of 6°. The control-line value for each rotator is determined by a phase-set shift
register. The trigger signals of the shift registers that increment the control lines for φearly
and φlate are next_φearly and next_φlate, respectively. When both control lines are set to
zero, φearly and φlate have the same phase as Q and overlap in the center of the eye. Every
positive edge on next_φearly moves φearly one step to the left. Similarly, every positive edge
Figure 5.7: Generation of φearly and φlate by phase interpolation
on the next_φlate moves φlate one step to the right. The 16th positive edge on either
next_φearly or next_φlate automatically resets the phase to the center (Q) position.
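The stepping and self-reset behavior of one phase-set register can be modeled in a few lines of Python. This is a behavioral sketch under the figures stated above (15 thermometer-coded bits, 6° per step, reset on the 16th edge); the class and method names are hypothetical and do not correspond to the actual register implementation.

```python
class PhaseSetRegister:
    """Behavioral model of one phase-set shift register."""
    STEPS = 15        # control-line width in bits
    STEP_DEG = 6.0    # phase step per control bit, in degrees

    def __init__(self):
        self.count = 0    # number of control bits currently set

    def trigger(self):
        """Positive trigger edge: advance one step, wrap on the 16th edge."""
        self.count = (self.count + 1) % (self.STEPS + 1)

    def thermometer(self):
        """Thermometer-coded control word as a list of bits."""
        return [1] * self.count + [0] * (self.STEPS - self.count)

    def offset_deg(self):
        """Magnitude of the phase offset from the center (Q) position."""
        return self.count * self.STEP_DEG
```

Fifteen trigger edges move the phase by the full 90° range, and the next edge returns it to the center of the eye.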
By separately stepping the next_φearly, next_φlate, and next_ref trigger signals, the
architecture provides three degrees of freedom for obtaining several rectangular mask
sizes in both horizontal and vertical dimensions. Seven settings for the differential
reference voltage DAC and 15 for each phase rotator provide 210 different masks. The
number of masks can be increased by applying reference voltages externally with a
smaller step size. The MER increases as the mask size expands in either dimension. The
EOM can be utilized in two ways. Mask expansion can be stopped at a threshold MER to
report the eye opening or all masks can be swept to capture the full error map that
represents the effective shape of the eye diagram.
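The first mode of use, stopping the mask expansion at a threshold MER, can be sketched as a simple loop. The callback `measure_mer` is a hypothetical stand-in for stepping the phase rotators and reading the error counter; it is not part of the described hardware interface.

```python
def horizontal_opening_steps(measure_mer, threshold, max_steps=15):
    """Widen the mask one phase step at a time until MER exceeds threshold.

    Returns the largest step count whose MER stays at or below threshold.
    """
    opening = 0
    for step in range(1, max_steps + 1):
        if measure_mer(step) > threshold:
            break
        opening = step
    return opening

# Toy eye: error-free up to 9 steps, then a high error rate.
toy_eye = lambda step: 0.0 if step <= 9 else 1e-3
steps = horizontal_opening_steps(toy_eye, threshold=1e-6)
```

Sweeping all masks instead of stopping early yields the full error map at the cost of a longer measurement.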
[Figure: comparator schematic; caption not recovered. Labels: data inputs, VH, VL, bias VB, CML buffer]
$$v_o = g_m \cdot R \cdot [(v_i - V_H) - (\bar{v}_i - V_L)] \qquad (5.2)$$
$$v_o = g_m \cdot R \cdot [(v_i - \bar{v}_i) - (V_H - V_L)]. \qquad (5.3)$$
The latter is the desired output for a differential comparator with differential reference
voltage. The parameters gm and R in both (5.2) and (5.3) are, respectively, the
transconductance of one MOS transistor and the load resistor. Since the reference voltages
VH and VL are stepped such that all the input swing range is covered by the vertical mask
opening, each source-coupled pair must tolerate a wide range of common-mode input and
thus needs a large CMRR. The second stage is also added to enhance the CMRR of the
comparator and to increase its sensitivity. The tail current devices are designed longer than
the minimum feature size to improve their output impedance and further enhance CMRR.
The comparator is optimized to achieve maximum gain-bandwidth product. This maximizes
the comparators’ sensitivity and thus minimizes the degradation of the input eye
diagram shape due to the EOM non-idealities. We will discuss the impact of EOM
non-idealities on the eye opening in Section 5.6.
The comparator’s offset is another limitation that affects the EOM operation by
shifting the rectangular mask vertically. The input offset for each input source-coupled
pair can be modelled by a shift in either of VH or VL in (5.2). Equivalently, the overall
offset can be modelled as a constant term on the right-hand side of (5.3). In the absence of
offset, the comparator maximum sensitivity is when VH =VL, and both are equal to the
input common mode. With offset, the maximum sensitivity is when VH -VL equals the
amount of offset. This interpretation is used to de-embed the offset impact on MER
measurements as will be shown in Section 5.6. In the implementation of the prototype we
minimized offset by careful layout techniques to increase matching between transistors.
We also avoided using low-Vt (MOS threshold voltage) devices for the input stage
transistors of the comparator due to their poor Vt-matching property. Monte Carlo
simulation of the comparator shows a mean output offset voltage of 6.4mV with
worst-case value of 25mV. A CML buffer follows the second stage of the comparator to
convert the output swing to proper levels for CML DFF blocks in the subsequent stages.
The DFFs use standard master-slave topology with conventional CML latches and
resistive loads. The clock divider is a static divider based on similar CML latches. We
used low-Vt transistors in the latch circuit to enhance the latch switching speed.
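The rearrangement from (5.2) to (5.3) and the offset model described above can be written out explicitly. The symbol V_OS for the overall input-referred offset and its sign convention are assumptions introduced here for illustration; they do not appear in the original text.

```latex
\begin{aligned}
v_o &= g_m R\,\bigl[(v_i - V_H) - (\bar{v}_i - V_L)\bigr] \\
    &= g_m R\,\bigl[(v_i - \bar{v}_i) - (V_H - V_L)\bigr], \\
v_o\big|_{\text{with offset}} &= g_m R\,\bigl[(v_i - \bar{v}_i) - (V_H - V_L) + V_{OS}\bigr].
\end{aligned}
```

With this convention the output is zero for zero differential input when V_H - V_L = V_OS, which is consistent with the statement that the maximum sensitivity occurs when VH - VL equals the amount of offset.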
The phase rotator circuit consists of a phase interpolator and a phase-set register that
adjusts the proper interpolation weight. The phase interpolator is formed by two parallel
differential stages, as Figure 5.9 shows. The differential input of each stage is connected to
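[Figure 5.9: phase interpolator with input phases φ1 and φ2, control lines s0 to s14, and phase-set registers driven by next_φearly and next_φlate; caption not recovered]
Figure 5.10: Simulated phase interpolator transfer function for different bandwidths
[Curves for 4GHz, 6GHz, and 8GHz; horizontal axis: input weight, 0 to 1; vertical axis: 0 to 90]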
one of the two input phases. By properly adjusting the differential control lines, s0-s14, the
tail current is steered between the two stages to set the input phase weights and obtain the
desired interpolated phase. To generate uniform phase steps and thus uniformly sweep the
mask horizontal opening, the transfer characteristic of the phase interpolator, i.e., the
relationship between output phase and input weight, should be linear. This characteristic
can be controlled by the input signal transition slope and the bandwidth of the interpolator.
Figure 5.10 illustrates the transfer function for three different bandwidths, obtained
by generalizing the approach in [127]. The phase interpolator is modelled by a
bandlimited system that performs a weighted sum operation on two input signals.
Although smaller bandwidths linearize the transfer function, they cause increased jitter
because they reduce output signal transition slope and thus create more timing uncertainty
due to amplitude fluctuation at the signal threshold-crossing point.
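To give a feel for the interpolator nonlinearity, the sketch below evaluates the limiting case in which both inputs are pure sinusoids in quadrature. This idealization is an assumption made here for illustration and is not the bandlimited model of [127] used for Figure 5.10; the function names are hypothetical.

```python
import math

def interpolated_phase_deg(weight):
    """Output phase of (1 - w)*sin(x) + w*sin(x + 90 deg), in degrees."""
    return math.degrees(math.atan2(weight, 1.0 - weight))

def max_nonlinearity_deg(steps=15):
    """Largest deviation from the ideal linear characteristic 90*w."""
    return max(abs(interpolated_phase_deg(k / steps) - 90.0 * k / steps)
               for k in range(steps + 1))
```

In this sinusoidal limit the characteristic passes through 0°, 45°, and 90° at weights of 0, 0.5, and 1, and deviates from a straight line by about 4° at most over 15 uniform weight steps.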
The reference-set register for the DAC, the phase-set registers for the phase rotators,
and all the back-end dividers and the error combiner are implemented using CMOS
standard cells in the technology to achieve lower power consumption.
Figure 5.11: The die photograph of the EOM with magnified active core
[Figure 5.12: test setup block diagram with PRBS source, added jitter, delay lines, EOM chip, spectrum analyzer, oscilloscope/counter, and power supply; caption not recovered]
The block diagram of our test setup is shown in Figure 5.12. A PRBS source provides
the data and clock for a wide range of data rates up to 12.5Gb/s. The data source has an
additional port that controls the amount of jitter added to the data artificially. Although the
full rate clock phase is primarily phase-locked to data when applied to the EOM, the
on-chip path difference does not preserve the phase relationship. In our measurements, we
compensated the path delay mismatch by an external delay line. We adjusted the external
delay to minimize the MER for the minimum size mask to guarantee that the mask is
centered with respect to the eye. In the adaptive equalizer loop this calibration can be done
once at start up, as the delay mismatch is a systematic effect. In addition, two external
delay lines were used at the input path to compensate external cable mismatches and
ensure a 180° phase difference between differential inputs.
The trigger signals for next_φearly and next_φlate are applied externally and step the
horizontal opening of the mask. Similarly, vertical opening is controlled by varying VH
and VL externally. A frequency counter records the average frequency of the error_out
signal, from which MER can be calculated using (5.1).
Figure 5.13: Accumulated phase of the clock_out signal that verifies functionality of the divider and
phase rotator with 10GHz input clock
We first tested the functionality of the divider and the phase rotators by observing the
clock_out output signal on the oscilloscope. Figure 5.13 shows the accumulated phases of
the clock_out when a 10GHz clock is applied to the clock input. We trigger the next_φlate
signal by applying a 3MHz square wave pulse. Although the standard-cell CMOS dividers
slow down the clock transition, the accumulated phases correctly cover 50ps, which is
equivalent to half of the bit period of a 10Gb/s signal.
The objective of this test is to verify the functionality of the main blocks in the data
path of the EOM. We apply a 10Gb/s PRBS input signal and add a 41ps peak-to-peak
sinusoidal jitter (SJ) to it to degrade the eye quality, as in Figure 5.14(a). The next_φearly
and next_φlate signals are stepped simultaneously. The vertical opening of the mask is
constant and is set to 120mV with external references. Figure 5.14(b) shows the measured
error_out signal. There is an error-free region (no toggle) for a small mask opening that
corresponds to when φearly and φlate are close to their initial positions in the center of the
Figure 5.14: Qualitative eye-opening measurement. (a) 10Gb/s input eye diagram (b) error_out signal
demonstrates an error-free region (c) magnified error_out signal shows MER increase
for wider mask
eye. But as the trigger signals step the sampling phases toward the edges of the eye and
thus the mask gets wider, the error frequency gradually increases. This can be seen in
Figure 5.14(c), which is the magnified error_out signal around regions with error and
Figure 5.15: Measured eye opening for various input eye diagrams with different peak-to-peak jitter
shows the frequency of error_out signal increases after each positive trigger edge. The
periodic behavior of the error_out signal is due to the self-resetting mechanism of the
phases.
Ultimately, the EOM will be used in an adaptive equalizer as shown in Figure 5.1. In
such a setting, the EOM output should track variations of the eye opening and provide a
correct gradient to assist the optimization algorithm in adjusting the filter coefficients. We
verified the behavior of the EOM in this scenario by measuring the eye opening when
various amounts of peak-to-peak SJ are added to the 10Gb/s, 2³¹-1 PRBS input. The
vertical opening of the mask is constant. Figure 5.15 shows the measurement result with
three sample input eye diagrams that demonstrate the gradual closing of the eye. As
expected, the measured eye opening monotonically decreases as additional jitter closes the
eye. At low input jitter, the transition from small to large measured MER is abrupt and
thus the plot loses accuracy, because the resolution of the horizontal eye-opening step
becomes comparable to the peak-to-peak input jitter. Therefore, when the sampling clocks
approach the data edge, one horizontal opening step can increase the number of transitions
falling inside the mask from zero to all the transitions.
Ideally, when VH and VL are both equal to the input data common mode the mask error
rate should be minimal because the comparators have the highest sensitivity. Therefore,
the vertical mask is swept for VH = Vcm + n∆V and VL = Vcm – n∆V, where Vcm is data
common mode and 1 ≤ n ≤ N. N is seven if on-chip DAC is used but can be larger with
external reference adjustment. However, due to the comparators' offset, the minimum
error count may occur when VH ≤ Vcm or VL ≥ Vcm. To guarantee that all the vertical
range is covered, the vertical sweep is done for –N ≤ n ≤ N. We measured three sample
chips and we observed that the minimum mask error rate occurs at n=1 or 2 corresponding
to 5mV to 10mV differential offset.
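[Figure 5.16: measured two-dimensional error diagram; color scale: Log(MER), with the error-free region marked as below 10⁻⁷; caption not recovered]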
Figure 5.16 illustrates the two-dimensional error diagram that is generated as the result
of the measurement for the input eye in Figure 5.14(a). It demonstrates that the
asymmetrical input eye shape is captured. Furthermore, the diagram has 68dB dynamic
range for MER. The dynamic range is a function of the time period for MER measurement
per one mask. A longer period of error-free measurement corresponds to a smaller MER.
It can be shown that MER is bounded by the input noise. We show in Appendix D that
the MER measured by an ideal EOM can be expressed as
$$\mathrm{MER} \cong Q\left(\frac{1 - (V_H - V_L)}{2\sigma}\right) \qquad (5.4)$$
for a signal amplitude of “1” and a noise standard deviation of σ. Equation (5.4) simplifies
to the conventional BER expression when VH=VL. Amplitude noise is assumed to have a Gaussian
distribution, and Q(.) is its tail (complementary cumulative) distribution function. Due to the
exponential nature of Q(.), the expected MER is about four orders of magnitude larger than the
BER for a BER of about 10⁻¹².
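Equation (5.4) is straightforward to evaluate numerically. The sketch below uses the standard identity Q(x) = 0.5·erfc(x/√2) from the Python standard library; the function names and the example noise level are hypothetical and chosen only so that the VH=VL case lands near a BER of 10⁻¹².

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def ideal_mer(v_h, v_l, sigma):
    """Equation (5.4) for a signal amplitude normalized to 1."""
    return q_function((1.0 - (v_h - v_l)) / (2.0 * sigma))

# Noise level for which the VH = VL case gives a BER near 1e-12.
sigma = 1.0 / (2.0 * 7.034)
ber = ideal_mer(0.0, 0.0, sigma)
mer_open_mask = ideal_mer(0.1, -0.1, sigma)   # mask height VH - VL = 0.2
```

Because the argument of Q shrinks linearly with the mask height, the MER rises steeply as VH - VL grows, which is the behavior the paragraph above describes.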
$$\mathrm{MER} \cong Q\left(\frac{A(t) - (V_H - V_L)}{2\sigma}\right) \cdot \left(1 - Q\left(\frac{A(t) + (V_H - V_L)}{2\sigma}\right)\right) \qquad (5.5)$$
to take the impact of the EOM into account. A(t) is the response of the comparator to the
input sequence at the time of sampling, t. As the sampling clocks, φearly or φlate are stepped
toward the edges of the eye diagram, A(t) approaches the threshold level and MER
increases as a consequence. A two-dimensional MER map can be obtained from (5.5) for
different sampling times and VH-VL based on the input and comparator response. We generated
this error map using the simulation results of the comparator in our design and
compared it with the measured 2D error map in Figure 5.16. A two-dimensional
cross-correlation of the two maps resulted in a 0.9 correlation coefficient that verifies that our
measurement is closely following the expected result from the simulation.
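A two-dimensional map of the kind described above can be built directly from (5.5). In the sketch below, the list of comparator responses A(t) is a hypothetical input (in the thesis it comes from circuit simulation), and the function names are illustrative only.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def mer_with_response(a_t, v_diff, sigma):
    """Equation (5.5); a_t is the comparator response at the sampling time
    and v_diff is the mask height VH - VL."""
    return (q_function((a_t - v_diff) / (2 * sigma))
            * (1 - q_function((a_t + v_diff) / (2 * sigma))))

def mer_map(responses, v_diffs, sigma):
    """Rows follow the mask heights, columns follow the sampling times."""
    return [[mer_with_response(a, d, sigma) for a in responses]
            for d in v_diffs]

# Toy example: response drops from 1.0 to 0.5 toward the eye edge.
toy_map = mer_map([1.0, 0.5], [0.0, 0.2], sigma=0.1)
```

In the toy map the MER grows both when the response A(t) drops toward the eye edge and when the mask height increases, mirroring the two axes of the measured error diagram.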
Figure 5.17: Comparing EOM and BERT operations: (a) 12.5Gb/s input eye (b) MER measurement
with EOM in presence of 10% digital error (c) BER measured with commercial BERT
(d) BER measured with commercial BERT in presence of 10% digital error
[Plot axes: horizontal mask size [UI], sampling threshold [mV]; color scale: Log(MER); eye scale markers: 20ps, 40mV]
of digital errors intentionally added to the input. A 12.5Gb/s 2³¹-1 PRBS is passed through
5 feet of lossy coaxial cable to introduce ISI and is then applied to the input of the EOM or
BERT. The eye diagram is closed and is shown in Figure 5.17(a). Figure 5.17(c) is the
response of the BERT with about 18dB BER dynamic range. However, when 10% digital
error is added to the input, the BERT cannot capture the eye diagram shape although the
channel has not changed. Evidently, from Figure 5.17(d) the BERT has lost its error
dynamic range completely. Figure 5.17(b) is the response of the EOM in the presence of
the digital errors and demonstrates that the EOM successfully captures the shape of the
eye diagram as in Figure 5.17(c). The MER dynamic range is reduced to about 8dB due to
the reasons discussed in Section 5.6.5.
5.8 Summary
We have developed an architecture that can essentially capture a two-dimensional map
of the eye diagram of a high-speed data signal. The error map can be used to extract
various features of the received signal. Specifically, it can be used in an adaptive equalizer
to generate the cost function for coefficient optimization. The cost function will solely
depend on the quality of the received signal and not on the decision of the receiver.
The architecture is based on comparing two samples of the signal at one sampling
point and therefore does not require a priori knowledge of data sequence or pattern
matching, which remarkably simplifies the architecture. A prototype was implemented in
0.13µm standard CMOS technology that was successfully tested up to 12.5Gb/s input data
rate. It draws about 275mA from a 1.2V supply, which is significantly lower than prior
art.
Chapter 6
Instantaneous Demultiplexing for Burst-Mode Links
6.1 Introduction
A wireline communication link uses either continuous transmission or burst-mode
transmission. In burst-mode communication, data is transmitted in asynchronous packets
(a.k.a. bursts), and there are long variable-length intervals between packet transmissions
when the transmitter is off [36]. An example application for burst-mode links is in passive
optical networks, e.g., fiber-to-the-home, where multiple end users (EU) share an optical
channel to the central office (CO). The CO assigns a time slot to each EU and allows each
EU to upload burst-mode data in a designated time slot.
Gated-oscillator clock recovery provides instantaneous lock to the first data transition.
Such circuits have been reported at several hundred Mb/s [128]–[132]. Gated-oscillator
clock recovery relies on two oscillators that are activated with the rising and falling edges
of the input signal and are, hence, resynchronized with every transition. The frequency of
the gated oscillators is equal to the bit rate, and the right value is typically maintained via
another replica oscillator in a PLL.
In this chapter we introduce an alternative method for instantaneous data recovery and
demultiplexing based on a finite state machine (FSM). The FSM receives the data and
decides on the output values based on the current input data and the previous state. The
previous state is provided to the FSM input with a bit-period delay. While decisions are
synchronized with every incoming data transition, no oscillator is required. Although the
jitter transfer function is flat similar to the gated oscillator-based approach, there is a
reduction by a factor of n, the demultiplexing ratio, in output jitter due to the integrated
demultiplexer function.
We first introduce the new general architecture for a 1:n clockless demultiplexer and
discuss the complexity of the FSM for different demultiplexing ratios, n. Then, we
describe the design procedure for a 1:2 demultiplexer and discuss different possibilities for
implementing the delay cells. We perform a comprehensive statistical study on process
variations of passive delay cells and explore their feasibility for this application. Lastly,
we demonstrate the experimental results of the fabricated 1:2 demultiplexer prototype.
Figure 6.1 shows the general block diagram of the proposed clockless data recovery
and the 1:n demultiplexer. The FSM consists of a combinational-states logic block and
bit-period delay cells. It maps the combination of the input and previous state to a current
state. The previous state is the output of the FSM at the last bit period, Tb, and is fed back
to the input of the states logic block with a delay of Tb. The output logic block generates
the demultiplexed outputs based on the current state. Both logic blocks respond to their
inputs instantaneously. Therefore, each data transition at the input immediately affects the
outputs of the demultiplexer, if the logic gate propagation delay is neglected.
The delay cell in Figure 6.1 guarantees that the information of the previous state is
available whenever a data transition occurs, i.e., every bit period. The output is updated
for every data transition, and thus there is no explicit jitter rejection. However, the input
data jitter at any transition impacts only one of the n outputs. Moreover, each output has
1/n the data rate of the input. Therefore, effective output data jitter in unit intervals (UI),
i.e., normalized to nTb, is reduced to 1/n of the input data jitter.
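As a small worked example of this normalization, the sketch below expresses the same absolute jitter in input and output unit intervals. The bit period and jitter values are illustrative assumptions, not measured data, and the helper name `jitter_in_ui` is hypothetical.

```python
def jitter_in_ui(jitter_s, bit_period_s, n=1):
    """Jitter normalized to a unit interval of n * bit_period_s."""
    return jitter_s / (n * bit_period_s)

tb = 100e-12      # assumed input bit period (10 Gb/s), illustrative only
jitter = 10e-12   # assumed absolute jitter on one input transition

print(jitter_in_ui(jitter, tb))        # input jitter in input UI
print(jitter_in_ui(jitter, tb, n=2))   # same jitter at one output of a 1:2 demultiplexer
```

The absolute jitter is unchanged. Only the unit interval it is compared against grows by the demultiplexing ratio.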
Next, we demonstrate the design of a 1:2 demultiplexer based on the clockless data
recovery method.
Figure 6.2 illustrates the states and state transitions of the 1:2 clockless demultiplexer.
Each arrow corresponds to a state transition in the FSM. The binary value on the arrow
represents the current value of the input. For the 1:2 demultiplexer, the FSM has a total of
eight states, as shown in Figure 6.2(b). The FSM stays in each state for a period of Tb. Then it
transitions to the next state based on the input bit. The prime superscript in the state name
is equivalent to the select line of a conventional demultiplexer. States with prime
superscript correspond to the ones for which the input bit affects out2. The first subscript
in the state name is the current input bit, and the second subscript is the previous input bit
stored to hold the unaffected output. For example, in state s1,0 the input bit, “1,” is
transferred to out1 and the stored previous bit, “0,” is transferred to out2. If a data
transition occurs after Tb and the input bit is “0,” the FSM transitions to s’0,1, for which
the input bit, “0,” is now transferred to out2 and the stored previous bit, “1,” is
transferred to out1. Table 6.1 summarizes the two output values for all eight states.
Current State   out1   out2      Current State   out1   out2
s0,0            0      0         s0,1            0      1
s’0,0           0      0         s’0,1           1      0
s1,0            1      0         s1,1            1      1
s’1,0           0      1         s’1,1           1      1
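The state behavior described above can be sketched as an idealized behavioral model that advances one state per bit period and ignores gate and delay-cell mismatch. The tuple encoding `(primed, current_bit, stored_bit)` and the function names are assumptions made for this illustration. They are not the circuit implementation.

```python
def state_outputs(primed, current_bit, stored_bit):
    """Return (out1, out2) for state s or s' with subscripts (current_bit, stored_bit)."""
    # Unprimed: the input bit drives out1 and the stored bit holds out2.
    # Primed:   the input bit drives out2 and the stored bit holds out1.
    return (stored_bit, current_bit) if primed else (current_bit, stored_bit)

def next_state(state, x):
    """One bit period later: toggle the prime and shift the old input bit into storage."""
    primed, current_bit, _ = state
    return (not primed, x, current_bit)

def demultiplex(bits, start=(True, 0, 0)):
    """Idealized 1:2 demultiplexing of a bit string; the first bit goes to out1."""
    state = start
    out1, out2 = [], []
    for ch in bits:
        state = next_state(state, int(ch))
        o1, o2 = state_outputs(*state)
        # Record only the output that the current input bit has just updated.
        if state[0]:
            out2.append(str(o2))
        else:
            out1.append(str(o1))
    return "".join(out1), "".join(out2)

print(demultiplex("1000000010001000"))  # ('10001010', '00000000')
```

For example, `next_state((False, 1, 0), 0)` returns `(True, 0, 1)`, which is the s1,0 to s’0,1 transition used as the example in the text.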
A 3-bit FSM represents the state diagram in Figure 6.2. Each state is assigned a 3-bit
code word. To avoid races, i.e., erroneous transitions to other states, the codes are assigned
such that only one bit changes in every state transition. Therefore, delay mismatches in the
implementation cannot cause errors, and the FSM is race free. The code words are
presented in Table 6.2.
Table 6.2: Race-free code assignment for the states of the FSM
We associate the binary variables y0, y1, and y2 with the three bits that code the FSM
states. Therefore, when y0, y1, and y2 are updated every Tb, we say the FSM has transitioned
to the next state. The updated binary value for each of y0, y1, and y2 is determined from the
state diagram and the code word table based on the current values of y0, y1, and y2 and the
input bit. The next value of each of the three binary variables is described as
$$ y_i^{*} = f_i(y_0, y_1, y_2, x), \quad i = 0, 1, 2 \tag{6.1} $$
where “*” signifies the updated value of yi. Function fi is a logic function, and the
arguments are the current values of the binary variables. Variable x corresponds to the current
input bit. The logic functions are designed based on standard methods such as the sum of
products (SOP) using Karnaugh maps [133].
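The race-free requirement and the truth table behind (6.1) can be checked mechanically. The sketch below uses a hypothetical code assignment in which y0 carries out2, y1 carries out1, and y2 is a parity bit. It is one assignment that changes a single bit per transition and is not necessarily the code of Table 6.2.

```python
from itertools import product

def encode(primed, current_bit, stored_bit):
    """Hypothetical 3-bit code (y0, y1, y2) = (out2, out1, parity timer)."""
    out1, out2 = (stored_bit, current_bit) if primed else (current_bit, stored_bit)
    return (out2, out1, out1 ^ out2 ^ int(primed))

STATES = list(product((False, True), (0, 1), (0, 1)))
CODE = {s: encode(*s) for s in STATES}
DECODE = {c: s for s, c in CODE.items()}
assert len(DECODE) == 8  # all eight code words are distinct

def next_code(code, x):
    """Next code word (y0*, y1*, y2*) for current code word and input bit x."""
    primed, current_bit, _ = DECODE[code]
    return CODE[(not primed, x, current_bit)]

def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))

# Race-free check: each of the 16 transitions changes exactly one bit.
assert all(hamming(c, next_code(c, x)) == 1 for c in DECODE for x in (0, 1))

# Truth table of f0, f1, f2: (y0, y1, y2), x -> (y0*, y1*, y2*)
for code, x in product(sorted(DECODE), (0, 1)):
    print(code, x, "->", next_code(code, x))
```

With this assignment, once the input has been constant for two or more bit periods only y2 toggles, so y2 behaves as a timer. Sum-of-products expressions for each fi can then be read off the printed truth table.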
Similarly, out1 and out2 can be represented as functions of y0, y1, y2, and x. Based on
the concept of a conventional 1:2 demultiplexer we predict the logic function for the two
outputs as
where S is a binary function representing which output should change and is equivalent to
the select line of an ordinary demultiplexer. The “.” and “+” are logical AND and OR
functions, respectively. For instance, in (6.2), the next value for out1 is the current value of
out1 if S=1 and is the input if S=0. The change is synchronous with x.
From Figure 6.2 and Table 6.2 we can show that
S = y0 ⋅ ( y1 ⊕ y2 ) + y0 ⋅ ( y1 ⊕ y2 ) (6.4)
where ⊕ is the “exclusive or” function. Additionally, we can show out1=y1 and out2=y0.
Therefore, the output logic block in Figure 6.1 is omitted and the outputs are tapped
directly from the FSM output variables. Hence, the simplified output functions are
out 1 = y 0 ⋅ ( y 1 ⊕ y 2 ) + x ⋅ ( y 0 + y 1 ⊕ y 2 ) (6.5)
out 2 = y 1 ⋅ ( y 0 ⊕ y 2 ) + x ⋅ ( y 1 + y 0 ⊕ y 2 ) . (6.6)
Equations (6.5) and (6.6) are computed directly from digital maps of Table 6.2 code
words. However, they have the same form as (6.2) and (6.3) and can also be obtained by
substituting (6.4) into (6.2) or (6.3).
For n>2, one approach to design the demultiplexer is to follow the same procedure as
in Section 6.2.2. The number of states increases exponentially with n. The state transition
table should also be updated. Furthermore, the number of binary variables that encode the
state increases. For instance, for n=4, the FSM has 64 states, which need 6 bits for
encoding. Consequently, the SOP terms require implementation of 6-input gates that
all work at the data rate. An alternative approach is to use a cascaded chain of
1:2 demultiplexers to implement the 1:n demultiplexer. For example, if n=4 each output of
the first 1:2 demultiplexer is used as the input to a second 1:2 demultiplexer. Therefore, a
total of three 1:2 demultiplexers is used. The latter approach has a total of nine delay cells
in contrast to six delay cells in the former approach. Therefore, if the delay cells are
implemented using passive elements, the latter approach will have area disadvantage.
However, the combinational logic design is much simpler because of the smaller number of
variables. In addition, the required speed of operation for the 1:2 demultiplexers decreases
monotonically as they are placed closer to the outputs in the chain because the data rate at
the output of each stage is divided by two. Therefore, alternative digital gate circuit
topologies such as complementary-MOS can be used for 1:2 demultiplexers at the end of
the chain that can reduce the power consumption significantly.
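The trade-off between the two approaches can be tabulated with a short sketch. The state count n·2^n is an assumption chosen because it reproduces the two cases quoted in the text (8 states for n=2 and 64 states for n=4). It is not a formula stated in this chapter, and the function names are hypothetical.

```python
def direct_fsm(n):
    """Direct 1:n FSM, assuming the state count scales as n * 2**n."""
    states = n * 2 ** n
    bits = (states - 1).bit_length()   # bits needed to encode the states
    return states, bits                # one bit-period delay cell per state bit

def cascaded_tree(n, cells_per_stage=3):
    """Binary tree of 1:2 demultiplexers (n assumed to be a power of two)."""
    stages = n - 1
    return stages, stages * cells_per_stage

for n in (2, 4, 8):
    states, bits = direct_fsm(n)
    stages, cells = cascaded_tree(n)
    print(f"n={n}: direct {states} states, {bits} delay cells; "
          f"cascade {stages} x 1:2, {cells} delay cells")
```

For n=4 this gives six delay cells for the direct FSM and nine for the cascade, matching the counts above.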
Figure 6.3: Demultiplexer outputs for 1011000010 input sequence: (a) ideal case, (b) delay cell
has smaller delay than bit period
If the delay cell has a delay Tb′ different from Tb, the outputs might experience
glitches. However, any input transition will immediately correct those glitches and avoid
any unwanted output transition. Figure 6.3(b) illustrates an example where Tb′ < Tb. As
can be seen, out1 in the second bit period is holding its previous value, “1.” After Tb′, new
values for y0, y1, y2 are ready at the FSM input while the correct input, x, corresponding to
out1 has not arrived yet. out1 starts to follow the incorrect x. However, after
∆t = Tb - Tb′, the next data transition arrives and out1, out2, y0, y1, and y2 are
immediately corrected because they relate to x with a combinational-logic relation such as
(6.5) or (6.6). Although glitches are observed at the outputs as a consequence, delay
mismatch is corrected every cycle and does not accumulate when data transitions occur.
Figure 6.4: Outputs of the 1:2 demultiplexer when Tb′ < Tb simulated with HSPICE
corrects such errors. The other weaker bumps in the outputs are related to hazards that
occur when two or more terms in the SOP of output function are changing simultaneously,
while the overall output function remains at the same logic level.
The delay mismatch bounds the maximum number of consecutive identical bits (CIB)
at the input. If the delay mismatch is ∆t, as in Figure 6.3(b), the number of CIBs should be
fewer than n = Tb/∆t. Otherwise, the FSM will swap the outputs, and the (n+1)th bit
will be resolved at the incorrect output.
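As a worked example of this bound, the sketch below returns the longest run of identical bits that stays strictly below Tb/∆t. The numbers are illustrative assumptions, and times are passed as integers (for example in picoseconds) so that the strict inequality is not blurred by floating-point rounding.

```python
import math

def max_cib(bit_period, loop_delay):
    """Largest run length k with k < Tb / dt, where dt = |Tb - Tb'|.

    Both arguments are integers in the same time unit (e.g. picoseconds).
    """
    dt = abs(bit_period - loop_delay)
    if dt == 0:
        return math.inf                  # no mismatch: no bound from this mechanism
    return (bit_period - 1) // dt        # largest integer k with k * dt < Tb

print(max_cib(100, 95))   # assumed 5% mismatch on a 100 ps bit period
print(max_cib(100, 99))   # assumed 1% mismatch
```

A tighter delay match therefore directly extends the tolerable run length, which motivates the delay control loop.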
∆t can be made very small by using a delay control circuit that forces ∆t to zero. A ring
oscillator is formed by closing a positive feedback loop around a replica of the delay cells.
The period of oscillation equals twice the delay of the delay cells, 2Tb′. A PLL locks the
frequency of the ring oscillator to an accurate reference clock by tuning the replica delay
cell. The same control voltage is used to adjust the delays in the FSM. In practice, the
logic circuits in the FSM have propagation delays that contribute to the total delay around
the feedback loop of the FSM. Furthermore, process variations could significantly impact
the propagation delay of the gates. Therefore, a delay control loop that adjusts the delay
cells alone would be inconsequential. The replica ring oscillator must include all the
blocks that contribute to the delay.
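Because the replica oscillates with a period of twice the loop delay, the reference frequency that forces ∆t to zero is half the bit rate. The sketch below works this out. The function names are hypothetical and the 7.5 Gb/s figure is used only as an example operating point.

```python
def replica_reference_hz(bit_rate_bps):
    """Ring-oscillator frequency when the loop delay equals one bit period (period = 2 * Tb)."""
    return bit_rate_bps / 2.0

def loop_delay_error_s(f_osc_hz, bit_rate_bps):
    """Delay mismatch dt = Tb - Tb' inferred from the replica oscillation frequency."""
    return 1.0 / bit_rate_bps - 1.0 / (2.0 * f_osc_hz)

print(replica_reference_hz(7.5e9))        # reference frequency in Hz
print(loop_delay_error_s(3.8e9, 7.5e9))   # residual mismatch if the replica runs slightly fast
```

A positive result means the loop delay is shorter than the bit period, which is the Tb′ < Tb case of Figure 6.3(b).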
For the 1:2 demultiplexer, it can be shown that in the absence of input transitions, f2
from (6.1) is simplified to
$$ y_2^{*} = \overline{y_2}. \tag{6.7} $$
Equation (6.7) shows the y2 output inverts for every period delay around the FSM loop,
Tb′. In other words, the y2 output oscillates with the period of 2Tb′ when the input is
constant. In fact, y2 acts as the internal timer of the FSM in the absence of input transitions.
The y2 output in Figure 6.4 demonstrates this. When input is zero, y2 oscillates for four
cycles. As soon as the data transition arrives in the fifth cycle, the y2 phase is aligned and
the oscillation stops. If there were no additional input transitions y2 would oscillate again
with corrected phase. Now, the replica of the 1:2 demultiplexer with y2 as the output forms
the ring oscillator in the delay control loop. This ring oscillator includes all the digital
blocks that contribute to the delay. Figure 6.5 illustrates the architecture. The same
architecture could be used to design a variable bit rate demultiplexer by adjusting the delay
around the loop based on the input bit rate.
The delay block can be implemented using active or passive delay elements. If the
delay control loop is not used, passive delay cells based on LC ladder structures can be
used, as shown in Figure 6.6 (also in Section 2.3.4). The delay is determined by the value
of the passive components from $T_D = n\sqrt{LC}$, where L and C are the inductance and
capacitance, respectively, and n is the number of sections in the ladder. Integrated LC
delay lines have practically no sensitivity to the supply voltage while maintaining a low
sensitivity to process variations and temperature. This is because the L and C component
values are primarily determined by high-accuracy fabrication processes, and they will not
vary after fabrication is complete. In contrast, the delay of active delay cells is a strong
function of temperature.
In addition, it has been shown [134] that using building blocks that depend only on the
lateral dimensions, such as vertical parallel plate (VPP) capacitors, one can achieve an even
tighter tolerance and better matching across the chip, wafer, and process lots for the
capacitance value. If VPP capacitors and spiral inductors are used to implement the delay
cell, the delay value will only depend on lateral dimensions of components. Lateral
dimensions are defined by lithography and etching processes that have inherently higher
accuracy than process steps such as deposition and planarization that control the vertical
dimensions. We will present a statistical analysis of passive delay lines that were used to
implement the 1:2 demultiplexer prototype.
$$ T_D = n\sqrt{LC} \tag{6.8} $$
where n is the number of LC sections. Using spiral inductors and high-density VPP or
MIM capacitors one can obtain large delay values. Using the image impedance
techniques, we can calculate the impedance of the line to be
$$ Z(\omega) = \sqrt{\frac{L}{C}} \cdot \sqrt{1 - \frac{LC\omega^2}{4}} = Z_0 \cdot \sqrt{1 - \frac{LC\omega^2}{4}} \tag{6.9} $$
where $Z_0 = \sqrt{L/C}$ is the characteristic impedance of the line [50]. As can be seen, the
impedance becomes imaginary for frequencies above a critical frequency given by
$$ \omega_c = \frac{2}{\sqrt{LC}}. \tag{6.10} $$
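To illustrate how (6.8) to (6.10) fit together, the sketch below sizes one section for a target impedance and per-section delay and then evaluates the line delay, image impedance, and cutoff frequency. The 50 Ω, 10 ps, and five-section values are illustrative assumptions and are not the parameters of the fabricated lines.

```python
import cmath
import math

def lc_section(z0_ohm, section_delay_s):
    """Solve Z0 = sqrt(L/C) and T = sqrt(L*C) for one section; returns (L, C)."""
    return z0_ohm * section_delay_s, section_delay_s / z0_ohm

def line_delay_s(n, l_h, c_f):
    """Total delay of n sections, as in (6.8)."""
    return n * math.sqrt(l_h * c_f)

def image_impedance(l_h, c_f, f_hz):
    """Image impedance as in (6.9); complex so that it can turn imaginary above cutoff."""
    w = 2.0 * math.pi * f_hz
    return math.sqrt(l_h / c_f) * cmath.sqrt(1.0 - l_h * c_f * w * w / 4.0)

def cutoff_hz(l_h, c_f):
    """Critical frequency of (6.10), converted from rad/s to Hz."""
    return 2.0 / math.sqrt(l_h * c_f) / (2.0 * math.pi)

L, C = lc_section(50.0, 10e-12)   # assumed 50 ohm line with 10 ps per section
print(L, C)
print(line_delay_s(5, L, C))
print(cutoff_hz(L, C))
print(image_impedance(L, C, 2.0 * cutoff_hz(L, C)))   # purely imaginary above cutoff
```

Shorter per-section delays push the cutoff frequency up, so a given total delay trades the number of sections against usable bandwidth.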
The LC delay line can be designed in a differential form as shown in Figure 6.7. In
such circuits, the differential inductors can be interwound in order to benefit from the
mutual inductance of the two inductors. Therefore, larger value inductances will be
achievable with the same (or even smaller) area/size. It can be shown that if two equal
differential inductors with value L are interwound with a coupling coefficient of k (with
proper sign), the effective inductance value for each will be (1+k)L. We have taken
advantage of this fact in our implementation of the delay lines. In the next two sections we
measure several integrated passive delay lines and analyze the experimental results to
study the process variation of passive delay lines.
Two sets of LC delay lines are implemented in the form of differential constant-k
filters in a 5-metal SiGe BiCMOS process in two different process runs. We will refer to
these two process runs by PR1 and PR2. The differential inductors are implemented using
coupled inductors and have 1.25 interwound turns in the top metal. Figure 6.8 shows the
symmetric layout of the inductors. Inductors are simulated using a 2.5D electromagnetic
simulator.
Figure 6.8: Differential symmetric interwound inductors for one section of the delay line
The first set of delay lines use MIM capacitors and consist of 24 LC sections in PR1.
In the second set, the VPP capacitors are used instead of the MIMs. It has 19 LC sections
and was fabricated in PR2. Based on our earlier discussion, we expect this VPP-based
delay line to show smaller delay variations compared to its MIM-based counterpart. In
VPP capacitors, the spacing between adjacent parallel plates of the capacitors is chosen to
be larger than the minimum allowable spacing between adjacent metals to reduce the
effect of lateral surface roughness on the capacitor value. The increased fringe capacitance
is modelled accurately with electromagnetic simulations. Table 6.3 summarizes the delay
line parameters.
Standalone delay structures using MIMs and VPPs with direct on-wafer probing were
tested. The results are summarized in the following sub-sections.
Twenty-seven MIM-based delay lines in PR1 and 47 VPP-based delay lines in PR2
were characterized using an Agilent Technologies E8364A network analyzer. To ensure
constant environmental conditions (including temperature and measurement setup
variations) during the measurement of all 74 sites, a set of preliminary experiments was
performed. Six random sites were selected as witness cases and were measured three times
each at different times during the measurement. Then, the results for each site were
compared. The observed variations were always less than 0.05%, which bounds the
measurement error and indicates its repeatability. This very high repeatability of
results indicates minimal changes in the conditions of the experiments.
6.4.3.2 S-parameters
The magnitudes of the S11 and S21 parameters of MIM-based and VPP-based delay lines were
measured. A sample result for a MIM-based delay line, plotted in Figure 6.9, shows S11 <
-12dB (up to 30 GHz). Similar measurements for VPP-based delay lines show S11 < -16dB
(1.5 GHz ≤ f ≤ 20 GHz). They indicate that the delay line characteristic impedances are very
close to 50 Ω over that wide range of frequencies. The low frequency loss of the MIM-based
delay line is 1.2 dB and its 3dB bandwidth is 7.5 GHz.
Figure 6.9: Magnitude of the S-parameters of one MIM-based standalone delay line
The group delay is an indication of the delay value of the delay line at different
frequencies. The group delays of the whole ensemble for both MIM-based and VPP-based
lines are plotted in Figure 6.10 and Figure 6.11, respectively. The dominant source of
variations over different wafer sites for samples in MIM-based lines is the tolerance of
MIM capacitors. The reported MIM tolerance in this process technology is ±0.15 fF/µm². It
translates to a total tolerance of ∆C=18.8 fF for the MIMs that we used. The time delay
variations per section can be approximated from (6.8)
$$ \Delta T_D = \frac{\partial T_D}{\partial C} \cdot \Delta C \tag{6.11} $$
$$ \frac{\Delta T_D}{T_D} = \frac{1}{2} \cdot \frac{\Delta C}{C} = 0.04. \tag{6.12} $$
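The first-order relation in (6.12) follows from differentiating (6.8) with respect to C, which gives ∂TD/∂C = TD/(2C). The sketch below checks the arithmetic. The per-section capacitance it prints is only the value implied by ∆C = 18.8 fF and the 0.04 result, since the capacitance itself is not quoted in this section, and the function names are hypothetical.

```python
import math

def relative_delay_variation(delta_c, c):
    """First-order dT/T for T = n * sqrt(L * C): dT/T = 0.5 * dC/C."""
    return 0.5 * delta_c / c

def implied_capacitance(delta_c, rel_delay_variation):
    """Capacitance for which the first-order relation yields the given dT/T."""
    return 0.5 * delta_c / rel_delay_variation

delta_c = 18.8e-15                        # total MIM tolerance quoted in the text
c = implied_capacitance(delta_c, 0.04)    # implied value, not quoted in the text
print(c)
print(relative_delay_variation(delta_c, c))
print(math.sqrt(1.0 + delta_c / c) - 1.0)  # exact change, slightly below the linearized 0.04
```

The linearized estimate slightly overstates the exact square-root dependence, so it is a conservative bound for positive ∆C.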
The normalized standard deviations of group delay (normalized to the mean group
delay at corresponding frequency) for MIM-based and VPP-based lines are plotted in
Figure 6.12. The variations for MIM-based lines are within the tolerance of the MIM
capacitors in (6.12). The delay lines with VPP capacitors are almost twice as accurate
across most of the frequency range. This corresponds to a factor of 3.3 improved tolerance
for the VPPs in agreement with [134]. Table 6.4 compares the average low-frequency
group delays and their average normalized standard deviations in both cases. Again,
it can be seen that the VPP-based delay lines are almost twice as accurate. Figure 6.13
shows the distribution of normalized delay at 1 GHz for both MIM- and VPP-based delay
lines. Passive LC delay lines have low sensitivity to process variations and no sensitivity to
supply variations.
Figure 6.12: Normalized standard deviations for group delays of standalone delay lines
Table 6.4: Statistical comparison for MIM and VPP-based lines
Parameter                    Mean (η)    Standard Deviation (σ)    σ/η
MIM low freq. group delay    56.7 ps     0.572 ps                  1.01%
VPP low freq. group delay    52.14 ps    0.306 ps                  0.59%
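The normalized deviations in Table 6.4 and the "almost twice as accurate" comparison can be reproduced from the tabulated means and standard deviations. This is a plain arithmetic check on the numbers in the table, and the helper name is hypothetical.

```python
def normalized_sigma_percent(mean_ps, sigma_ps):
    """sigma / eta expressed in percent."""
    return 100.0 * sigma_ps / mean_ps

rows = {"MIM": (56.7, 0.572), "VPP": (52.14, 0.306)}   # (mean, sigma) in ps, from Table 6.4
spread = {name: normalized_sigma_percent(*vals) for name, vals in rows.items()}

for name, value in spread.items():
    print(name, round(value, 2))
print("MIM/VPP ratio:", spread["MIM"] / spread["VPP"])
```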
The die photos of the VPP-based line and MIM-based line are shown in Figure 6.14
and Figure 6.15, respectively. The passive delay lines are dominating the area. The spiral
inductors are formed in a loop in the oscillator to avoid long interconnect lines. The
capacitors are located in between the inductors. Inductor size is 150µm × 150µm .
Figure 6.17: Die microphotograph of the 1:2 demultiplexer with three 5-section differential LC delay
lines
which forces the out to high logic level. The AND gates are implemented using OR gates
by inverting the inputs and output. A 5-section differential LC delay line is implemented
for each of the delay cells in the feedback loop of binary functions y0, y1, and y2. The die
photograph is shown in Figure 6.17. Chip dimensions are 2.5mm × 1.7mm. The core logic
occupies only 11% of the total die area.
Figure 6.18 is the y2 output of the demultiplexer when the input is a constant “1.” As
mentioned, y2 oscillates with a period equal to twice the total delay around the FSM
feedback loop, 266ps. Therefore, the delay (bit period) is 133ps, and the demultiplexer
works at an input bit rate of 7.5 Gb/s. About 55% of the total delay is generated by the
passive delay lines and the rest is from the ECL gates and interconnect parasitics.
Figure 6.19: Demultiplexer outputs out1 and out2 for three input sequences: (a) 1100,
(b) 10000000, (c) 1000000010001000
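The timing figures quoted above follow from the measured y2 period. The sketch below repeats that arithmetic. The function name is hypothetical, and the 55% share is the approximate figure from the text rather than an independently measured split.

```python
def timer_to_bit_rate(timer_period_s):
    """Loop delay is half the y2 oscillation period; the bit rate is its reciprocal."""
    loop_delay_s = timer_period_s / 2.0
    return loop_delay_s, 1.0 / loop_delay_s

loop_delay, bit_rate = timer_to_bit_rate(266e-12)
passive_part = 0.55 * loop_delay          # approximate share of the passive delay lines
active_part = loop_delay - passive_part   # ECL gates and interconnect parasitics

print(f"loop delay = {loop_delay * 1e12:.0f} ps, bit rate = {bit_rate / 1e9:.2f} Gb/s")
print(f"passive = {passive_part * 1e12:.0f} ps, gates and parasitics = {active_part * 1e12:.0f} ps")
```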
The two outputs of the demultiplexer, out1 and out2, are measured for three different
input sequences, as shown in Figure 6.19. The “sync” signal from the input signal source
is used to trigger the sampling oscilloscope for viewing the outputs and confirms that the
outputs are synchronized with the input. When the input is a repeating “1100” sequence,
the demultiplexed outputs should both be “10” sequence repeating at half the bit rate, i.e.,
twice the bit period, of the input. In addition, out2 is one input bit period delayed with
respect to out1. Figure 6.19(a) shows these outputs.
Figure 6.19(b) shows the outputs when the input sequence is “10000000,” which has
seven consecutive zeros. The two outputs are respectively “1000” at half bit rate and
all-zero. The droop for long sequences of one is due to ac-coupled outputs, which will not
be present in a dc-coupled version. A longer sequence, i.e., 16 bit, is tested in
Figure 6.19(c). The input sequence, “1000000010001000,” results in the outputs
“10001010” at half the bit rate and all-zero. The demultiplexer is locking to the input
phase and correctly demultiplexes to two outputs without using a synchronous clock. The
chip uses a 3.3V power supply and draws 316mA of current, of which 110mA is
flowing in the output buffers and bias circuits.
6.6 Summary
We introduced a new architecture that instantaneously recovers and demultiplexes data
without explicit clock recovery. The architecture is based on a finite state machine (FSM)
that assigns input to a proper output and maintains the value of other outputs. State
transitions are synchronized with the arrival of input-data transitions. Binary logic
functions map the current state along with the input bit to the next state. Analog delay cells
with bit-period delay feed back the value of the binary functions to the input and
synchronize FSM with the input data.
One approach to implement the delay cells in the architecture is to use passive delay
cells that have low sensitivity to process and temperature variations. We performed an
experimental statistical analysis on passive delay lines to demonstrate this fact. We then
showed the measurement results of a prototype 1:2 demultiplexer based on integrated
passive LC delay lines that operates at 7.5Gb/s without using a clock signal.
Chapter 7
Conclusion
7.1 Thesis Highlights
In this work we explored the analysis and design of wireline communication systems
and focused on the basic challenges of high-speed wireline links. We bridged the gap
between system design and circuit design by: (1) understanding the relationship between
the wireline link reliability and the system parameters, (2) introducing practical circuit
architectures that enable realization of such systems with the required parameters, and (3)
demonstrating implementation of hardware prototypes using silicon-based integrated
circuit technologies that verify our solutions.
suffer from large parasitic components and thus small maximum frequency of operation.
This methodology was based on two-port broadband matching of multistage amplifiers.
We absorbed the device parasitics into the passive matching networks in order to allow
each amplifier stage to achieve its theoretical maximum gain-bandwidth product set by the
Bode-Fano limit [26][27]. We demonstrated a CMOS 0.18µm amplifier that operates at
10Gb/s and achieves 2.4 times the bandwidth improvement over a design that does not
apply our technique.
is initiated exactly in-phase with the first bit and continues synchronously to the stream.
We implemented a 1:2 clockless demultiplexer based on this concept in SiGe BiCMOS
technology and verified its operation at 7.5Gb/s.
This work relates the time response of an arbitrary LTI system to the ISI that results
from that system. Therefore, the analytical results for calculating the BER, in general, and
the data-dependent jitter, in particular, are based on the time response of the system. From
a circuit design perspective, it is also interesting and useful to derive such analytical
results for the frequency response, i.e., both the amplitude response and the phase
response, of the system. The relationships between the circuit parameters and the frequency
response are conventionally studied more rigorously and are better documented, e.g., the
theory of poles and zeros or the tables of filter design. For instance, the design of an
Two important topics related to the proposed eye-opening monitor (EOM) circuit that
require further investigations are the optimization algorithm and the loop dynamics of an
adaptive equalizer that uses the EOM. The algorithm can be chosen freely because it is
separate from the hardware of the filter. However, convergence speed is a practical
constraint that may eliminate probabilistic algorithms such as the genetic algorithm or the
simulated annealing algorithm. On the other hand, the cost function, i.e., the EOM output,
does not have a known direct relationship with the optimizing parameters, i.e., the filter
coefficients. Therefore, application of gradient descent-based algorithms is not
straightforward. As a result, understanding the trade-offs for the choice of the algorithm is a topic
of interest for future research in this area.
Possible future directions for enhancing the design of the proposed instantaneous
demultiplexer include targeting accurate delay implementation and low-power design.
Absolute value of the feedback loop delay is the only parameter that determines the
operating bit rate of the demultiplexer. The loop delay includes the delay of the feedback
delay cells, the propagation delay of the combinational logic, and the delay of the
interconnect parasitic components. The optimum way to adjust the loop delay and
accurately control the delay value is to form a delay reference loop by a replica of the
demultiplexer that is configured as a ring oscillator. The oscillating frequency of this ring
oscillator is determined by the same parameters that set the delay value of the loop.
Consequently, if the frequency is adjusted accurately in the delay reference loop, the
feedback delay will be determined with the same accuracy. Finally, the demultiplexer
design was not optimized for minimum power consumption. The power consumption also
affects the heat generated by the demultiplexer chip that sets the local temperature around
the active devices and impacts their propagation delay. Therefore, controlling the power
consumption can improve the operation of the block.
All in all, the thesis provides insight and develops useful tools and techniques for
designing high-speed wireline communication systems using integrated circuit
technologies.
Appendix A
Overall BER Calculation
In this appendix we calculate the overall BER when both timing jitter and ISI are
present. To calculate the probability of error for the current bit that is being sampled, we
have to take into account the value of the next bit and the previous bit. This is in order to
add the impact of the timing jitter of the transitions before or after the current sampling on
the BER. Therefore, we should consider 3-bit sequences, where the middle bit is the one
being sampled. Out of the eight possibilities, we only need to calculate the BER for four
sequences of “000,” “001,” “100,” and “101.” Each of the other four cases, where the
middle bit is “1,” equals one of the sequences with “0” as the middle bit because of the
symmetry and is thus found automatically. Therefore, we have
$$ BER(T_s) = \frac{1}{4}\left[ BER(T_s \mid \text{"000"}) + BER(T_s \mid \text{"001"}) + BER(T_s \mid \text{"100"}) + BER(T_s \mid \text{"101"}) \right]. \tag{A.1} $$
The error for the first term on the right is caused only by the ISI and noise because there is
no transition. Hence, we can write
$$ BER(T_s \mid \text{"000"}) = BER(ISI_0(T_s)). \tag{A.2} $$
In the second sequence, “001,” one transition occurs to the next bit. Because of the
timing jitter the transition can occur before or after the sampling point, Ts. If the transition
occurs after the sampling point, the BER is not affected by it because we assume the
system is causal. On the other hand, if the transition takes place before the sampling point,
the receiver samples ISI0(Ts)+s(Ts-tR). The random variable tR denotes the location of the
transition on the right of the current bit. It has a mean value equal to Tb. Therefore, the
overall BER for the “001” sequence can be found as a conditional probability conditioned
on tR as
$$ BER(T_s \mid \text{"001"}) = \begin{cases} BER(ISI_0(T_s)) & t_R \ge T_s \\ BER(ISI_0(T_s) + s(T_s - t_R)) & t_R < T_s \end{cases}. \tag{A.3} $$
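Therefore, we have
$$ BER(T_s \mid \text{"001"}) = BER(ISI_0(T_s)) \int_{T_s}^{\infty} f_t(t_R)\,dt_R + \int_{-\infty}^{T_s} f_t(t_R) \cdot BER(ISI_0(T_s) + s(T_s - t_R))\,dt_R. \tag{A.4} $$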
We can calculate the BER for the “100” sequence similarly. However, the location of
the transition only affects the BER if it occurs before the sampling point, because it
changes the amount of the ISI1 at the sampling point. We have
$$ BER(T_s \mid \text{"100"}) = \begin{cases} BER(ISI_1(T_s - t_L)) & t_L < T_s \\ BER(ISI_1(T_s) + s(T_s)) & t_L \ge T_s \end{cases} \tag{A.5} $$
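Therefore, we have
$$ BER(T_s \mid \text{"100"}) = BER(ISI_1(T_s) + s(T_s)) \int_{T_s}^{\infty} f_t(t_L)\,dt_L + \int_{-\infty}^{T_s} f_t(t_L) \cdot BER(ISI_1(T_s - t_L))\,dt_L. \tag{A.6} $$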
For the “101” sequence we have both the left and right transitions. However, the right
transition does not impact the BER if tR>Ts because the system is causal. In that case, the
BER is equivalent to the BER for the “100” sequence. On the other hand, if tR<Ts, we also
implicitly know that tL<Ts. We can write
$$ BER(T_s \mid \text{"101"}) = \begin{cases} BER(T_s \mid \text{"100"}) & t_R > T_s \\ BER(T_s \mid \text{"100"},\ t_L < t_R < T_s) & t_R < T_s \end{cases}. \tag{A.7} $$
Therefore, we have
$$ BER(T_s \mid \text{"101"}) = BER(T_s \mid \text{"100"}) \int_{T_s}^{\infty} f_t(t_R)\,dt_R + \int_{-\infty}^{T_s} f_t(t_R) \cdot \left( \int_{-\infty}^{T_s} f_t(t_L) \cdot BER(ISI_1(T_s - t_L) + s(T_s - t_R))\,dt_L \right) dt_R \tag{A.8} $$
We assume the timing jitter distribution is Gaussian with means of zero and Tb, for tL
and tR, respectively, and standard deviation of σj. In addition, we assume the noise
distribution is Gaussian with zero mean and standard deviation σn. Therefore, all the BER
terms in the above equations will be in the form of a Q(.) function, where Q(.) is the
complementary cumulative distribution function (tail probability) of the standard Gaussian
distribution. We can approximate some of the BER terms in (A.4), (A.6), and (A.8) by one.
This applies to all the terms where the argument of the BER is large due to the effect of the
step response, e.g., in BER(ISI0(Ts)+s(Ts-tR)), so that the argument of the corresponding
Q(.) is strongly negative. Then, we estimate the overall BER by replacing
(A.2), (A.4), (A.6), and (A.8) in (A.1). We get
$$ BER(T_s) = \frac{1}{4}\left[ \left( Q\left(\frac{0.5 - ISI_0(T_s)}{\sigma_n}\right) + \int_{-\infty}^{T_s} f_t(t_L) \cdot Q\left(\frac{0.5 - ISI_1(T_s - t_L)}{\sigma_n}\right) dt_L \right) \cdot \left( 1 + Q\left(\frac{T_s - T_b}{\sigma_j}\right) \right) + Q\left(\frac{T_s}{\sigma_j}\right) + Q\left(\frac{T_b - T_s}{\sigma_j}\right) \right]. \tag{A.9} $$
We have also neglected all the second-order terms that include products of two Q(.)
functions.
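As a rough numerical illustration, (A.9) can be evaluated with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the ISI terms are supplied by the caller (ISI0(Ts) as a number, ISI1 as a callable), the integral is truncated at eight standard deviations below the mean of tL, and the first-order decay used for ISI1 in the usage lines is only an illustrative choice.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gaussian_pdf(t, mean, sigma):
    """Probability density of N(mean, sigma^2) evaluated at t."""
    z = (t - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def ber_a9(ts, tb, sigma_n, sigma_j, isi0, isi1, steps=4000):
    """Evaluate (A.9).

    isi0 is the value ISI_0(Ts); isi1 is a callable returning ISI_1(t).
    Both are assumptions supplied by the caller.
    """
    # Integral term: t_L ~ N(0, sigma_j^2), integrated from -8*sigma_j to Ts
    # with the trapezoidal rule.
    lo = -8.0 * sigma_j
    integral = 0.0
    if ts > lo:
        h = (ts - lo) / steps
        for i in range(steps + 1):
            t_l = lo + i * h
            weight = 0.5 if i in (0, steps) else 1.0
            integral += (weight * gaussian_pdf(t_l, 0.0, sigma_j)
                         * q_func((0.5 - isi1(ts - t_l)) / sigma_n))
        integral *= h

    q0 = q_func((0.5 - isi0) / sigma_n)
    right_factor = 1.0 + q_func((ts - tb) / sigma_j)
    return 0.25 * ((q0 + integral) * right_factor
                   + q_func(ts / sigma_j)
                   + q_func((tb - ts) / sigma_j))


# Usage with hypothetical numbers: unit bit period, mid-bit sampling, and an
# assumed exponentially decaying ISI_1 with time constant tau.
tb, tau = 1.0, 0.2
example_ber = ber_a9(ts=0.5 * tb, tb=tb, sigma_n=0.1, sigma_j=0.05 * tb,
                     isi0=0.0, isi1=lambda t: math.exp(-t / tau))
```

With no ISI and negligible jitter the expression collapses to Q(0.5/σn), which is a convenient sanity check on the implementation.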
In reality, the total jitter distribution should also include the effect of the DDJ, as we
discussed in Chapter 3. Here, we investigate how the DDJ affects each of the terms in
(A.1). The DDJ does not have any impact on the BER(Ts| “000”) because “000” has no
transitions. In the “001” sequence, there is a “01” transition with a “0” as the penultimate
bit. Therefore, ft(tR) should be modified to a Gaussian with the mean of Tb+tc,0. The tc,0 is
defined in (3.22) and is calculated in Appendix B.
We do not have the knowledge of the penultimate bit for the transition in “100,” in
contrast to the “001” case. Therefore, to calculate BER(Ts| “100”), the previously Gaussian
distribution for ft(tL) should be modified by convolving it with a double Dirac delta
function DDJ distribution. Finally, both the ft(tR) and ft(tL) should be modified to calculate
BER(Ts| “101”). The ft(tR) distribution becomes a Gaussian with the mean of Tb+tc,1,
because the penultimate bit to the “01” transition is now “1.” The tc,1 is defined in (3.23)
and is calculated in Appendix B. The ft(tL) distribution convolves with a double Dirac
delta function DDJ distribution. Therefore, the resulting overall BER is
\[
\begin{aligned}
\mathrm{BER}(T_s) = \frac{1}{4} \Biggl\{ & Q\Bigl(\frac{0.5 - \mathrm{ISI}_0(T_s)}{\sigma_n}\Bigr) \cdot \Bigl( 1 + Q\Bigl(\frac{T_s - T_b - t_{c,0}}{\sigma_j}\Bigr) \Bigr) + Q\Bigl(\frac{T_b + t_{c,0} - T_s}{\sigma_j}\Bigr) \\
& + \int_{-\infty}^{T_s} f_t(t_L) \cdot Q\Bigl(\frac{0.5 - \mathrm{ISI}_1(T_s - t_L)}{\sigma_n}\Bigr) dt_L \cdot \Bigl( 1 + Q\Bigl(\frac{T_s - T_b - t_{c,1}}{\sigma_j}\Bigr) \Bigr) \\
& + \frac{1}{2} Q\Bigl(\frac{T_s - t_{c,0}}{\sigma_j}\Bigr) + \frac{1}{2} Q\Bigl(\frac{T_s - t_{c,1}}{\sigma_j}\Bigr) \Biggr\}.
\end{aligned}
\tag{A.10}
\]
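The DDJ-modified distribution of tL can be illustrated numerically. The sketch below assumes, reading off the last two terms of (A.10), that the double Dirac delta distribution places equal weights at tc,0 and tc,1, so that the convolved density is an equal-weight mixture of two Gaussians. The function names and the numerical values are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gaussian_pdf(t, mean, sigma):
    """Probability density of N(mean, sigma^2) evaluated at t."""
    z = (t - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def ddj_left_pdf(t, sigma_j, tc0, tc1):
    """Zero-mean Gaussian jitter convolved with impulses at tc0 and tc1
    (assumed equal weights of 1/2)."""
    return 0.5 * gaussian_pdf(t, tc0, sigma_j) + 0.5 * gaussian_pdf(t, tc1, sigma_j)


def ddj_left_tail(ts, sigma_j, tc0, tc1):
    """P(t_L > Ts) under the mixture: the last two terms inside (A.10)."""
    return 0.5 * q_func((ts - tc0) / sigma_j) + 0.5 * q_func((ts - tc1) / sigma_j)


def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule on [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Hypothetical values, in units of the bit period.
sigma_j, tc0, tc1, ts = 0.05, 0.02, 0.08, 0.2
numeric_tail = trapezoid(lambda t: ddj_left_pdf(t, sigma_j, tc0, tc1),
                         ts, tc1 + 10.0 * sigma_j)
closed_tail = ddj_left_tail(ts, sigma_j, tc0, tc1)
```

Integrating the mixture density above Ts reproduces the two half-weighted Q(.) terms, which is why they appear in place of the single Q(Ts/σj) term of (A.9).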
Appendix B
Threshold-Crossing Time
In this appendix we calculate tc,0 and tc,1, defined in (3.22) and (3.23), for a first-order
system. We can show
\[
t_0 = \tau \cdot \ln 2.
\tag{B.1}
\]
We also have
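\[
\Delta t_{a_{-2}=0} = -\tau \cdot \ln\Biggl( 1 - (1-\alpha) \sum_{k'=-\infty}^{-2} a_{k'} \cdot \alpha^{-k'} \Biggr)
\tag{B.2}
\]
\[
\begin{aligned}
\Delta t_{a_{-2}=1} &= -\tau \cdot \ln\Biggl( 1 - \frac{1-\alpha}{\alpha} \Biggl( \alpha^2 + \alpha \sum_{k'=-\infty}^{-2} a_{k'} \cdot \alpha^{-k'} \Biggr) \Biggr) \\
&= -\tau \cdot \ln\Biggl( 1 - \alpha + \alpha^2 - (1-\alpha) \sum_{k'=-\infty}^{-2} a_{k'} \cdot \alpha^{-k'} \Biggr) \\
&= -\tau \cdot \ln(1 - \alpha + \alpha^2) + (-\tau) \cdot \ln\Biggl( 1 - \Bigl( \frac{1-\alpha}{1-\alpha+\alpha^2} \Bigr) \sum_{k'=-\infty}^{-2} a_{k'} \cdot \alpha^{-k'} \Biggr).
\end{aligned}
\tag{B.3}
\]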
\[
\Phi \equiv \sum_{k=-\infty}^{-2} a_k \cdot \alpha^{-k} = a_{-2} \cdot \alpha^2 + \sum_{k=-\infty}^{-3} a_k \cdot \alpha^{-k}.
\tag{B.4}
\]
After reorganizing the terms in the sum and renumbering the indices we have
\[
\Phi = a_{-2} \cdot \alpha^2 + \alpha \sum_{k'=-\infty}^{-2} a_{k'} \cdot \alpha^{-k'}
\tag{B.5}
\]
where k' = k + 1. As the ak's are independent identically distributed (iid) random
variables, the sum in the second term on the right is, by definition in (B.4), a random variable
with identical statistical properties to Φ. Specifically, all the statistical moments are equal
for the two random variables. If we denote this new random variable by Φ', we have
\[
\Phi = a_{-2} \cdot \alpha^2 + \alpha \Phi'.
\tag{B.6}
\]
Also, note that Φ' and a-2 are independent random variables. Now we can write
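\[
E\{\Phi\} = E\{ a_{-2} \cdot \alpha^2 + \alpha \Phi' \} = \alpha^2 E\{a_{-2}\} + \alpha E\{\Phi'\}.
\tag{B.7}
\]
We assume equiprobable bits,
\[
p(a_k = 1) = p(a_k = 0) = 0.5.
\tag{B.8}
\]
Then, we have
\[
E\{a_{-2}\} = 1/2 \times 0 + 1/2 \times 1 = 1/2.
\tag{B.9}
\]
Because E{Φ'} = E{Φ}, solving (B.7) for E{Φ} gives
\[
E\{\Phi\} = \frac{1}{2} \cdot \frac{\alpha^2}{1-\alpha}.
\tag{B.10}
\]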
The second-order moment can be calculated from (B.6) as follows
\[
E\{\Phi^2\} = \alpha^4 \cdot m_2 + 2\alpha^3 m_1 E\{\Phi\} + \alpha^2 E\{\Phi^2\}
\tag{B.11}
\]
\[
E\{\Phi^2\} = \frac{1}{1-\alpha^2} \bigl( \alpha^4 \cdot m_2 + 2\alpha^3 m_1 E\{\Phi\} \bigr) = \frac{0.5\,\alpha^4}{(1-\alpha^2)(1-\alpha)}
\tag{B.12}
\]
in which mi is the ith order moment of a-2 and E{Φ} is known from (B.10). It is easy to
show mi=1/2 for all i. Similarly, the kth order moment can be written as follows
\[
E\{\Phi^k\} = \frac{0.5}{1-\alpha^k} \sum_{i=0}^{k-1} \binom{k}{i} \alpha^{2k-i} E\{\Phi^i\}.
\tag{B.13}
\]
This gives a recursive expression based on lower-order moments. Now, we can calculate
\[
E\{\Delta t_{a_{-2}=0}\} = E\{ -\tau \cdot \ln( 1 - (1-\alpha)\Phi ) \}
\tag{B.14}
\]
\[
E\{\Delta t_{a_{-2}=1}\} = -\tau \cdot \ln(1 - \alpha + \alpha^2) - \tau \cdot E\Bigl\{ \ln\Bigl( 1 - \Bigl( \frac{1-\alpha}{1-\alpha+\alpha^2} \Bigr) \Phi \Bigr) \Bigr\}.
\tag{B.15}
\]
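The moment recursion (B.13) is easy to check numerically. The sketch below assumes equiprobable bits as in (B.8), so that every moment of a-2 equals 1/2, and takes the zeroth moment of Φ to be one. The function names and the truncation length of the brute-force enumeration are illustrative choices, not part of the derivation.

```python
from itertools import product
from math import comb


def phi_moments(alpha, kmax):
    """Return [E{Phi^0}, ..., E{Phi^kmax}] from the recursion (B.13)."""
    moments = [1.0]  # E{Phi^0} = 1
    for k in range(1, kmax + 1):
        acc = sum(comb(k, i) * alpha ** (2 * k - i) * moments[i]
                  for i in range(k))
        moments.append(0.5 * acc / (1.0 - alpha ** k))
    return moments


def phi_moments_bruteforce(alpha, kmax, nbits=14):
    """Average Phi^k over all patterns of the bits a_{-2} ... a_{-(nbits+1)}.

    The infinite sum in (B.4) is truncated to nbits terms, so the result is
    only approximate (the neglected tail is of order alpha**(nbits + 2)).
    """
    weights = [alpha ** m for m in range(2, nbits + 2)]
    totals = [0.0] * (kmax + 1)
    for bits in product((0, 1), repeat=nbits):
        phi = sum(w for w, b in zip(weights, bits) if b)
        for k in range(kmax + 1):
            totals[k] += phi ** k
    return [t / 2 ** nbits for t in totals]


alpha = 0.5  # hypothetical value
rec = phi_moments(alpha, 3)
brute = phi_moments_bruteforce(alpha, 3)
first_closed = 0.5 * alpha ** 2 / (1.0 - alpha)                            # (B.10)
second_closed = 0.5 * alpha ** 4 / ((1.0 - alpha ** 2) * (1.0 - alpha))    # (B.12)
```

The recursion reproduces (B.10) and (B.12) for k = 1 and k = 2, and the enumeration agrees with it up to the truncation error.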
From (B.4) we know Φ ≤ α²/(1 – α), where the maximum occurs when all ak's are "1." In addition,
Tb/τ is positive and so α ≤ 1. Therefore,
\[
(1-\alpha)\Phi \le \frac{\alpha^2}{1-\alpha} \cdot (1-\alpha) = \alpha^2 \le 1.
\tag{B.16}
\]
Similarly,
\[
\Bigl( \frac{1-\alpha}{1-\alpha+\alpha^2} \Bigr) \Phi \le \frac{\alpha^2}{1-\alpha} \cdot \Bigl( \frac{1-\alpha}{1-\alpha+\alpha^2} \Bigr) = \frac{\alpha^2}{1-\alpha+\alpha^2} \le 1.
\tag{B.17}
\]
Hence, we can use the Taylor series expansion of the natural logarithm to estimate (B.14)
and (B.15). We have
\[
\ln(1-x) \cong -\sum_{k=1}^{\infty} \frac{x^k}{k}.
\tag{B.18}
\]
Therefore,
\[
E\{\Delta t_{a_{-2}=0}\} = \tau \cdot \sum_{k=1}^{\infty} \frac{1}{k} (1-\alpha)^k E\{\Phi^k\}
\tag{B.19}
\]
\[
E\{\Delta t_{a_{-2}=1}\} = -\tau \cdot \ln(1 - \alpha + \alpha^2) + \tau \cdot \sum_{k=1}^{\infty} \frac{1}{k} \Bigl( \frac{1-\alpha}{1-\alpha+\alpha^2} \Bigr)^k E\{\Phi^k\}.
\tag{B.20}
\]
We can approximate (B.19) and (B.20) by neglecting all the moments of Φ for k>2
because we can show that the kth moment is proportional to the kth power of α and thus
shrinks exponentially. Then we have
\[
E\{\Delta t_{a_{-2}=0}\} \cong \tau \cdot \Bigl[ (1-\alpha) E\{\Phi\} + \frac{1}{2} (1-\alpha)^2 E\{\Phi^2\} \Bigr] = \frac{\tau}{2} \cdot \Bigl( \alpha^2 + \frac{0.5\,\alpha^4}{1+\alpha} \Bigr)
\tag{B.21}
\]
\[
\begin{aligned}
E\{\Delta t_{a_{-2}=1}\} &\cong \tau \cdot \Bigl[ \Bigl( \frac{1-\alpha}{1-\alpha+\alpha^2} \Bigr) E\{\Phi\} + \frac{1}{2} \Bigl( \frac{1-\alpha}{1-\alpha+\alpha^2} \Bigr)^2 E\{\Phi^2\} \Bigr] - \tau \cdot \ln(1 - \alpha + \alpha^2) \\
&= \frac{\tau}{2} \cdot \frac{\alpha^2 + 0.5\,\alpha^4 + \alpha^5}{(1+\alpha^3) \cdot (1-\alpha+\alpha^2)} - \tau \cdot \ln(1 - \alpha + \alpha^2).
\end{aligned}
\tag{B.22}
\]
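The closed forms (B.21) and (B.22) can be compared against the series (B.19) and (B.20) truncated at an arbitrary order. The following is a minimal sketch, assuming equiprobable bits and using the moment recursion (B.13); the function names and the values of α and τ are hypothetical.

```python
import math
from math import comb


def phi_moments(alpha, kmax):
    """Return [E{Phi^0}, ..., E{Phi^kmax}] from the recursion (B.13)."""
    moments = [1.0]
    for k in range(1, kmax + 1):
        acc = sum(comb(k, i) * alpha ** (2 * k - i) * moments[i]
                  for i in range(k))
        moments.append(0.5 * acc / (1.0 - alpha ** k))
    return moments


def mean_dt_series(alpha, tau, a_minus2, kmax):
    """Series (B.19) for a_{-2} = 0 or (B.20) for a_{-2} = 1, up to order kmax."""
    m = phi_moments(alpha, kmax)
    if a_minus2 == 0:
        x, offset = 1.0 - alpha, 0.0
    else:
        d = 1.0 - alpha + alpha ** 2
        x, offset = (1.0 - alpha) / d, -tau * math.log(d)
    return offset + tau * sum(x ** k * m[k] / k for k in range(1, kmax + 1))


def mean_dt0_closed(alpha, tau):
    """Second-order closed form (B.21)."""
    return 0.5 * tau * (alpha ** 2 + 0.5 * alpha ** 4 / (1.0 + alpha))


def mean_dt1_closed(alpha, tau):
    """Second-order closed form (B.22)."""
    d = 1.0 - alpha + alpha ** 2
    num = alpha ** 2 + 0.5 * alpha ** 4 + alpha ** 5
    return 0.5 * tau * num / ((1.0 + alpha ** 3) * d) - tau * math.log(d)


alpha, tau = 0.2, 1.0  # hypothetical values
err0 = abs(mean_dt_series(alpha, tau, 0, 30) - mean_dt0_closed(alpha, tau))
err1 = abs(mean_dt_series(alpha, tau, 1, 30) - mean_dt1_closed(alpha, tau))
```

For this value of α the terms beyond k = 2 change the result by roughly one part in 10^5 of τ, which is in line with the argument for truncating the series.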
Finally, tc,0 and tc,1 can be found by substituting (B.1), (B.21), and (B.22) into (3.22) and
(3.23).
Appendix C
Impedance Function
An impedance function is a rational function (ratio of two polynomials with real
coefficients) of frequency with no right half-plane poles. Additionally, the degree of the
numerator polynomial may exceed that of the denominator by at most one. The
conditions for an impedance function can be found in [26][96]. The upper bound in (4.2)
is not valid if the load does not satisfy the conditions of an impedance function. In other
words, if the overall transfer function of an amplifier is of the form:
\[
A_v(j\omega) = g_m \cdot Z(j\omega)
\tag{C.1}
\]
and Z ( jω ) is not an impedance function, then the Bode-Fano limit need not be satisfied.
Distributing passive structures between gain stages can result in overall transfer functions
that are not impedance functions per se [23]. Therefore, the GBW product can potentially
be higher than the limit in (4.2). One design approach for such a structure is stagger tuning
of the frequency responses. An early amplitude roll-off due to a low-frequency pole in one
stage can be compensated for with a peaking in the next stage. Similarly, the overall phase
response of passive structures can be properly controlled.
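The stagger-tuning idea can be illustrated with a toy two-stage cascade. This is only a sketch under assumed values: a first-order stage with a normalized pole at ω1 = 1 followed by a second-order low-pass stage with hypothetical natural frequency ω0 = 1.5 and quality factor 1.5. Neither the topology nor the numbers are taken from the amplifier designs discussed in this thesis.

```python
def stage1(w, w1=1.0):
    """First-order stage with an early pole at w1 (normalized, unity DC gain)."""
    return 1.0 / (1.0 + 1j * w / w1)


def stage2(w, w0=1.5, q=1.5):
    """Second-order low-pass stage; q > 0.707 produces peaking near w0."""
    return 1.0 / (1.0 - (w / w0) ** 2 + 1j * w / (q * w0))


def cascade(w):
    """Overall response of the two stages in cascade."""
    return stage1(w) * stage2(w)


def bandwidth_3db(h, w_max=10.0, n=100000):
    """First grid frequency at which |h|^2 drops below half its DC value."""
    ref = abs(h(0.0)) ** 2
    for i in range(1, n + 1):
        w = w_max * i / n
        if abs(h(w)) ** 2 < 0.5 * ref:
            return w
    return None  # no -3 dB crossing found below w_max


bw_single = bandwidth_3db(stage1)
bw_cascade = bandwidth_3db(cascade)
```

In this example the peaking of the second stage offsets the roll-off of the first one, so the cascade holds its gain to a higher frequency than the first stage alone.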
Appendix D
Mask Error Rate
We assume that the amplitude noise cumulative distribution function is Q(.). The probability
of a mask error occurring is
\[
\mathrm{MER} = \Pr\{ S_H \ne S_L \},
\tag{D.1}
\]
where Pr{.} denotes probability and SH and SL are defined in Section 5.3. SH and SL take
binary values and thus there are two combinations that contribute to (D.1). The conditional
probability of each of the combinations can be calculated, given the input bit to the
EOM. Therefore we have
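\[
\begin{aligned}
\mathrm{MER} = \frac{1}{2} \cdot \Bigl( & \Pr\{ S_H = 0, S_L = 1 \mid \mathrm{in} = 1 + z \} + \Pr\{ S_H = 0, S_L = 1 \mid \mathrm{in} = z \} \\
& + \Pr\{ S_H = 1, S_L = 0 \mid \mathrm{in} = z \} + \Pr\{ S_H = 1, S_L = 0 \mid \mathrm{in} = 1 + z \} \Bigr),
\end{aligned}
\tag{D.2}
\]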
where z is the input noise at the sampling point. When the EOM is ideal, only input noise
impacts SH and SL values. The last two terms in (D.2) will be identically zero because they
both imply V H < V L . The first two terms both equal the probability that
\[
\frac{1 - (V_H - V_L)}{2} < z < \frac{1 + (V_H - V_L)}{2}.
\tag{D.3}
\]
Hence, we can write
\[
\mathrm{MER} = Q\Bigl( \frac{1 - (V_H - V_L)}{2\sigma} \Bigr) - Q\Bigl( \frac{1 + (V_H - V_L)}{2\sigma} \Bigr) \cong Q\Bigl( \frac{1 - (V_H - V_L)}{2\sigma} \Bigr).
\tag{D.4}
\]
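A short numerical sketch of (D.4) shows why the second Q(.) term can be dropped. It assumes Q(.) is the standard Gaussian tail probability; the function names and the threshold and noise values are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mer_ideal(vh, vl, sigma):
    """Return the exact and the approximate form of (D.4) for an ideal EOM."""
    d = vh - vl  # mask height V_H - V_L
    first = q_func((1.0 - d) / (2.0 * sigma))
    second = q_func((1.0 + d) / (2.0 * sigma))
    return first - second, first


# Hypothetical thresholds and noise level (signal swing normalized to one).
exact, approx = mer_ideal(vh=0.8, vl=0.2, sigma=0.05)
```

For these values the neglected term is Q(16), which is many orders of magnitude below the retained Q(4) term.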
When the impact of the EOM is considered, the last two terms in (D.2) are no longer identically
zero because SH and SL are the outputs of two comparators with different noise
contributions. However, we can still neglect them in the MER calculation for reasonable noise
levels in the comparators. The first term in (D.2) can be written as
\[
\Pr\{ S_H = 0, S_L = 1 \mid \mathrm{in} = 1 + z \} = \Pr\{ S_H = 0 \mid 1 + z \} \cdot \Pr\{ S_L = 1 \mid 1 + z \}.
\tag{D.5}
\]
Because of the bandwidth limitations of the comparators, the probabilities on the right side
of (D.5) are functions of the sampling time and are smaller when sampling time is closer
to the data edge. If the response of the comparators to the input is denoted by y(t) we have
\[
y(t) = \sum_i A_i(t).
\tag{D.6}
\]
We define Ai(t) as the response of the comparators to the input in a unit interval,
(i – 1)·Tb < t < i·Tb, where Tb is the bit period. Several Ai(t) exist due to various
combinations of symbols that cause ISI. The overlap of the Ai(t)'s for all i, when transformed to
0 < t < Tb, is the eye diagram. If we limit ISI to the last n symbols, only 2^n distinct Ai(t)
can occur for a binary modulation. Then from (D.5) we can write
\[
\mathrm{MER} \cong \frac{1}{2^n} \sum_{i=1}^{2^n} Q\Bigl( \frac{A_i(t) - (V_H - V_L)}{2\sigma} \Bigr) \cdot \Bigl( 1 - Q\Bigl( \frac{A_i(t) + (V_H - V_L)}{2\sigma} \Bigr) \Bigr).
\tag{D.7}
\]
\[
\mathrm{MER} \cong Q\Bigl( \frac{A(t) - (V_H - V_L)}{2\sigma} \Bigr) \cdot \Bigl( 1 - Q\Bigl( \frac{A(t) + (V_H - V_L)}{2\sigma} \Bigr) \Bigr),
\tag{D.8}
\]
where the sum is implicit in the notation. Equation (D.7) can be used to generate a 2D map
of the MER.
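One way to generate such a map is sketched below. The traces Ai(t) are hypothetical first-order settling curves chosen only for illustration (n = 2, so four traces), and the grid of sampling times and mask heights, the noise level, and the function names are assumptions as well.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mer_d7(traces, t, mask_height, sigma):
    """Evaluate (D.7) at sampling time t for a mask height V_H - V_L."""
    total = 0.0
    for trace in traces:
        amp = trace(t)
        total += (q_func((amp - mask_height) / (2.0 * sigma))
                  * (1.0 - q_func((amp + mask_height) / (2.0 * sigma))))
    return total / len(traces)


def mer_map(traces, times, heights, sigma):
    """2D map: one row per mask height, one column per sampling time."""
    return [[mer_d7(traces, t, d, sigma) for t in times] for d in heights]


def make_trace(c, tau):
    """Hypothetical trace: first-order settling toward a unit amplitude."""
    return lambda t: 1.0 - c * math.exp(-t / tau)


# Four assumed traces (n = 2), time in units of the bit period Tb.
traces = [make_trace(c, 0.2) for c in (0.2, 0.4, 0.6, 0.8)]
times = [0.1 * k for k in range(1, 10)]
heights = [0.1 * k for k in range(0, 9)]
grid = mer_map(traces, times, heights, sigma=0.05)
```

Each row of `grid` corresponds to one mask height and each column to one sampling time, so contours of constant MER can be read directly from the nested list.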
Bibliography
[1] http://www.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm.
[2] http://www.internetworldstats.com.
[5] http://www.ieee802.org/3/.
[6] http://www.intel.com/products/processor/index.htm.
[7] http://www.cisco.com/en/US/netsol/ns340/ns394/ns259/ns261/networking_solutions_white_paper09186a00800c464f.shtml.
[8] http://www.rambus.com/downloads/Networking_Backgrounder.pdf.
[10] http://www.corning.com/docs/opticalfiber/CO9562.pdf.
[12] G. E. Moore, “No Exponential Is Forever: But “Forever” Can Be Delayed,” IEEE
International Solid-State Circuits Conference Digest of Technical Papers,
(ISSCC'03), pp. 20-23, Feb. 2003.
[13] http://www.itrs.net/Common/2004Update/2004Update.htm.
[14] http://www.itrs.net/Common/2004Update/2004_04_Wireless.pdf.
[25] B. Analui and A. Hajimiri, “Method and Apparatus for a Multi-Pole Bandwidth
Enhancement Technique for Wideband Amplification,” U.S. Patent #6,778,017.
[26] H. Bode, Network Analysis and Feedback Amplifier Design, D. Van Nostrand
company, Princeton, 1945.
[32] B. Analui and A. Hajimiri, “System and Method for Clockless Data Recovery,”
U.S. and PCT Patents Pending.
[46] J. Buckwalter and A. Hajimiri, “An active analog delay and the delay reference
loop,” IEEE Radio Frequency Integrated Circuits (RFIC) Symposium Digest of
Papers, pp. 17-20, June 2004.
[50] D.M. Pozar, Microwave Engineering, second edition, John Wiley & Sons, New
York, 1998.
[52] S. Gondi and B. Razavi, “A 10Gb/s CMOS Adaptive Equalizer for Backplane
Applications,” IEEE International Solid-State Circuits Conference Digest of
Technical Papers, (ISSCC'05), pp. 328-329, Feb. 2005.
[54] H. C. van den Elzen, “On the Theory and the Calculation of Worst-Case Eye
Openings in Data-Transmission Systems,” Philips Research Reports, vol. 30, no.
6, pp. 385-435, Dec. 1975.
[58] B. Razavi, Editor, Monolithic Phase-Locked Loops and Clock Recovery Circuits:
Theory and Design, IEEE Press, New York, 1996.
[59] Y. Takasaki, Digital Transmission Design and Jitter Analysis, Artech House,
Boston, 1991.
[60] J. Savoj, A 10Gb/s CMOS Clock and Data Recovery Circuit, Ph. D. Dissertation,
University of California, Los Angeles, CA, 2001.
[63] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and Phase Noise in Ring
Oscillators,” IEEE Journal of Solid-State Circuits, vol. 34, no. 6, pp. 790-804, June
1999.
[70] M. Shimanouchi, “New Paradigm for Signal Paths in ATE Pin Electronics Are
Needed for Serialcom Device Testing,” Proceedings of the IEEE International Test
Conference, ITC’02, pp. 903-912, Oct. 2002.
[71] M. Shimanouchi, “An Approach to Consistent Jitter Modeling for Various Aspects
and Measurement Methods,” Proceedings of the IEEE International Test
Conference, ITC’01, pp. 848-857, Oct.-Nov. 2001.
[72] M. P. Li, J. Wilstrup, R. Jessen, and Dennis Petrich, “A New Method for Jitter
Decomposition through Its Distribution Tail Fitting,” Proceedings of the IEEE
International Test Conference, ITC’99, pp. 788-794, Sept. 1999.
[74] J. Wilstrup, “A Method of Serial Data Jitter Analysis Using One-Shot Time
Interval Measurements,” Proceedings of the IEEE International Test Conference,
ITC’98, pp. 819-823, Oct. 1998.
[78] G. L. Cariolaro and F. Todero, “A general spectral analysis of time jitter produced
in a regenerative repeater,” IEEE Transactions on Communications, vol. COM-25,
no. 4, pp. 417-426, April 1977.
[85] M. Neuhauser, H-M. Rein, H. Wernz, and A. Felder, “13 GB/s Si Bipolar
Preamplifier for Optical Front Ends,” Electronics Letters, vol. 29, No. 5, pp.
492-493, March 1993.
[97] A. I. Zverev, Handbook of Filter Synthesis, John Wiley & Sons, 1967.
[98] W. Chen, Theory and Design of Broadband Matching Networks, Pergamon Press,
Oxford, 1976.
[99] H. J. Orchard, “Inductorless Filters,” Electronics Letters, vol. 2, pp. 224-225, Sept.
1966.
[109] R. A. George, “Method and Means for Detecting Error Rate of Transmitted Data,”
US Patent #3,721,959, March 20, 1973.
[111] J. M. Keelty and K. Feher, “On-Line Pseudo Error Monitors for Digital
Transmission Systems,” IEEE Transactions on Communications, vol. COM-26,
no. 8, pp. 1275-1282, Aug. 1978.
[112] S. Shin, B.-G. Ahn, M. Chung, S. Cho, D. Kim, and Y. Park, “Optics Layer
Protection of Gigabit-Ethernet System by Monitoring Optical Signal Quality,”
Electronics Letters, vol. 38, no. 9, pp. 1118-1119, Sept. 2002.
[120] F. Buchali, S. Lanne, J.-P. Thiery, W. Baumert, and H. Bulow, “Fast Eye Monitor
for 10Gbits/s and its Application for Optical PMD Compensation,” Optical Fiber
Communication Conference and Exhibit, (OFC’01), vol. 2, pp. TuP5/1-3, 2001.
[122] F. Buchali, W. Baumert, H. Bulow, and J. Poirrier, “A 40 Gb/s Eye Monitor and its
Application to Adaptive PMD compensation,” Optical Fiber Communication
Conference and Exhibit, (OFC’02), pp. 202-203, March 2002.
[127] M. Zargari, A BiCMOS Active Substrate Probe Card Technology for Digital
Testing, Ph.D. Dissertation, Stanford University, Stanford, CA, March 1997.
[128] M. Banu and A. E. Dunlop, “Clock recovery circuits with instantaneous locking,”
Electronics Letters, vol. 28, no. 23, pp. 2127-2130, 5 Nov. 1992.
[129] M. Banu and A. E. Dunlop, “660Mb/s CMOS Clock Recovery Circuit with
Instantaneous Locking for NRZ Data and Burst-Mode Transmission,” IEEE
International Solid-State Circuits Conference Digest of Technical Papers,
(ISSCC’93), vol. 40, pp. 102-103, Feb. 1993.
[131] M. Nakamura, N. Ishihara, and Y. Akazawa, “A 156 Mbps CMOS clock recovery
circuit for burst-mode transmission,” IEEE Symposium on VLSI Circuits Digest of
Technical Papers, pp. 122-123, June 1996.
[133] J. F. Wakerly, Digital Design: Principles and Practices, second edition, Prentice
Hall, 1994.