
Signal Integrity Issues in High-Speed Wireline Links:

Analysis and Integrated System Solutions

Thesis by
Behnam Analui

In Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy

California Institute of Technology
Pasadena, California
2005
(Defended July 25, 2005)

© 2005

Behnam Analui
All Rights Reserved
Some materials were previously published in IEEE publications and the copyright is owned by IEEE.

To

Yadollah Analui and Ashraf Kamali

for their unconditional love.



Acknowledgements

This is perhaps my favorite part of this thesis. The acknowledgements section is usually
the last section to be written but is often the first to be read! I enjoy reading
acknowledgements because they are a short documentary that takes you behind the scenes.
It is real. You feel the author's extreme joy and satisfaction with his achievement flowing
from his words. You witness his passion for sharing his feelings with everyone, a passion
that makes him scream loudly in the silence of the library where I am reading his thesis:
THANK YOU to all who made it happen! The voice fades away, but the satisfaction of
learning, the bliss of friendship, and the pleasure of accomplishment all remain in his
personal treasure.

Here are the people to whom I want to express my deepest gratitude for making it
happen and for what they have contributed to my treasure. I am extremely grateful for the
dedication of their big brains and even bigger hearts to my work and my life.

For technical contributions to this work:

First and foremost, I am truly indebted to Prof. Ali Hajimiri. He granted me the
privilege to work in his research group at Caltech and guided me through all the ups and
downs of my Ph.D. with a special enthusiasm like that of a young coach who is full of
energy and knows the game very well. What he taught me goes well beyond his detailed
technical feedback on the research in this dissertation. From him, I learned to be a
responsible engineer and a critical scientist, as well as to write succinctly! I am
also grateful for all his advice and help in the transition to my next career. In short,
I am proud to be his student, and I hope that my five years at Caltech are only the
beginning of a lifelong relationship with him.

I was lucky to overlap three years with Jim Buckwalter in Ali's group. Jim's
excitement and energy for research were always a motivation for me. He was my
collaborator in the data-dependent jitter work, and I thank him for the technical
discussions and his various contributions to that work. Dr. Alexander Rylyakov was my
mentor during my several visits to the IBM T. J. Watson Research Center. I thank him for
his tireless supervision of the eye-opening monitor work and for his contributions to all
stages of the design, layout, and testing, with his extreme patience and insightful
comments. Finally, I am very grateful to Prof. Hossein Hashemi for his own thrilling way of
critiquing my research work. We had several discussions in the office, at lunch, in the
gym, at Echo Mountain while hiking, and even at home while cooking! His solid reasoning
always made me think twice and, admittedly, he was often right!

I am grateful to Profs. David Rutledge and Shuki Bruck for their kind and helpful
advice during my research work at Caltech and while transitioning to my next career.
Dave is a true academician and has created a research lab truly devoted to the
advancement of microwave engineering. I benefited a lot from interacting with his group
and using his lab facilities. Prof. Bruck initiated a collaborative project with Caltech’s
High-Speed Integrated Circuit Group (Hajimiri’s group) that I enjoyed being a part of. I
also thank Prof. Bruck for his encouragement and for transferring his passion and positive
attitude to me in all of our conversations.

I acknowledge Profs. David Rutledge, Shuki Bruck, Sandy Weinreb, and Bob
McEliece for dedicating their time to be on my oral candidacy committee and for
providing their technical comments about the progress of my research work. I also
acknowledge Profs. David Rutledge, Shuki Bruck, Sandy Weinreb, and Yu-Chong Tai for
kindly agreeing to serve on my Ph.D. defense committee and to read this thesis.

Many friends and colleagues have contributed to this work through their technical
feedback, reading my paper manuscripts, helping with layout and measurements, and
CAD technical support. I thank them all. Particularly, I am indebted to Abbas Komijani,
Arun Natarajan, Prof. Donhee Ham, Dr. Ichiro Aoki, Prof. Hui Wu, Dr. Scott Kee, Sam
Mandegaran, Amir Faraji Dana, Dr. Masoud Sharif, Ali Vakili, Niklas Wadefalk, Ann
Shen, Dr. Saleem Mukhtar, Maryam Owrang, Dr. Lawrence Cheuang, Dr. Jose Tierno, Dr.
Thomas Zwick, Dr. Sergey Rylov, Dr. Mounir Meghelli, Dr. Daniel Friedman, Dr. Sudhir
Gowda, Dr. Michael Beakes, Dr. Jeremy Schaub, Dr. David Sanderson, Naveed
Near-Ansari, Dr. Jean-Olivier Plouchart, Dr. Noah Zamdmer, Dr. Yue Tan, and numerous
others.

For financial support of this work:

I appreciate the financial assistance of the sponsors of my research, particularly the
Lee Center for Advanced Networking at Caltech and the National Science Foundation
(NSF). I acknowledge Drs. Mehmet Soyuer, Modest Oprysko, and Dan Friedman for
facilitating my visit to the IBM T. J. Watson Research Center, Yorktown Heights, NY, as
an intern. My visit to IBM resulted in the eye-opening monitor work, which was one of the
most exciting parts of my research. In addition, I gained a lot of experience and made
many friends at IBM.

I thank IBM Microelectronics and Jazz Semiconductor for fabricating my hardware
prototypes. In particular, I am grateful to Dr. Marco Racanelli, Dr. Arjun Karroy, and Dr.
Scott Stetson from Jazz Semiconductor for their consistent support. I acknowledge Analog
Devices, and especially Dr. Larry DeVito, for the outstanding student designer award in my
third year. Finally, I thank Agilent for providing some of the test equipment for my
measurements.

For making my Caltech years memorable:

My Caltech years were really fun! I am grateful for the luxury of interacting with
many unique individuals who made the happy story of my life at Caltech. I sincerely thank
them all. Particularly, I thank Donhee, my office-mate, who spent a lot of time showing
me around LA and teaching, with his gifted excitement, a lot about classical music. I hope
we get together once in a while to have “soltani.” I thank Sam, my other office-mate,
whose quest to know more has strengthened my sense of curiosity. He has taught me
numerous scientific facts that I would otherwise have passed by without even noticing. He was
also the first person to scientifically introduce me to the theory of evolution, one of the
most profound theories of all time, in my opinion. I thank Abbas, with whom I started the
five-year journey at Caltech, for his always unique perspective and insightful comments
that made me wonder “why didn’t I think as simply as that?” I also thank him for his
happy, energetic heart that often made him break the unwritten rules of being an adult and
reminded me of the first few pages of my all-time favorite book, “The Little Prince.” I
thank Ehsan, Arash Y., Matthieu, Jeremy, Fati, Nikoo, Maryam, Maziar-Lisa, Arun, Jim
Bucky, Amir S., Farshad, Baharak, Roberto, Xiang, Xiaofeng, Yujin, Taka, Dai, Chris,
Shervin, Alireza, Michella, Carol, Michelle, Heather Jackson, Linda, Veronica, Parandeh,
Jim Endrizzi, Tess, Dale, Prof. McGill, Tara, Erni, Chandler’s pizza staff, Ampex, Nodal,
and Mr. and Mrs. Bentley for being a part of my life at Caltech. Finally, I want to express
my deep gratitude to Lisa Cowan and Prof. Shuki Bruck. Conversations with them were
always exceptionally delightful and charming, and I am very thankful to them for that.

For their influential role in my life:

Everyone has his own list of folks whose influences on his life are hard to express in
words. It is also hard, if not impossible, to pay them back for what they contributed to his
life. However, they are usually the ones who don’t expect to be paid back. Here is my list:

Ali H., for being a great advisor and friend and for teaching me fairness and balance.
Mehrdad Sharif-Bakhtiar, for teaching me electronics and living by principles. Hossein,
for teaching me to ask questions, for teaching me to listen to my honest self more often,
for increasing my confidence, for introducing me to the fun side of sport, and for being a
true brother. Amir-Helia as a single entity, for bringing Maryam to my life and for being
the little prince and princess around whom I feel I am the adult kid I always want to be. In
their presence, I can sit for hours, stare at their smiles, and take pleasure. Daei Mehdi, for
his advice that typically covers all the aspects of all the issues with any probability larger
than zero. Maman Maria, the coolest mother-in-law ever, for her regular calls from her
office to see how I am doing and for having a heart that enjoys every moment of her life;
after all, she is my wife's mom! Bita, Behrad, and Behdad, for their love and support while
being thousands of miles away. Maryam, my angel, my other half, my lovely wife, for
giving me courage, for being full of surprises, for making our life a wonderland, and for
tolerating me when I said: “Honey! I am a little busy this weekend” for many weekends. I
believe she shares my extreme thankfulness for all who have made our life a
dream-come-true. Finally, my mom and dad, for supporting me unconditionally. Since
high school, I have concentrated all my efforts on making them proud, and I believe this
has led me to all my successes. I humbly bow to them and dedicate this thesis to them as a
little sign of appreciation for all their sacrifices.

Abstract

This work focuses on the basic signal integrity issues of high-speed wireline links. It
bridges the gap between optimum system design and circuit design for such links by: (1)
understanding the effects of the system parameters on the bit error rate (BER), (2)
introducing circuit architectures for the realization of systems that minimize the BER, and
(3) demonstrating integrated circuit prototypes that verify the solutions.

First, we develop a theory that analytically relates the data link BER to the system
characteristics, e.g., the channel response, the pre-amplifier bandwidth, and the transmitter
clock jitter. We generate the BER contours to find the optimum receiver bandwidth as well
as the optimum sampling point and its associated timing margin. We also develop the
theory of the data-dependent jitter (DDJ), which is a significant component of the timing
jitter in high-speed links. We provide an analytical distribution function for the DDJ of an
arbitrary linear time-invariant system and include the impact of the DDJ on the BER.
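The bandwidth trade-off behind an optimum receiver bandwidth can be illustrated with a small numerical sketch. This is not the analysis developed in this thesis; it is a simplified model under stated assumptions: NRZ levels of ±A, a first-order low-pass receiver with normalized bandwidth α = f3dB·Tb, time normalized so that Tb = 1, sampling at the end of the bit (Ts = Tb), white Gaussian noise of one-sided density N0 filtered by the receiver (noise-equivalent bandwidth πf3dB/2), and ISI from only the immediately preceding bit. The function names and parameter values are hypothetical.

```python
import math

def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_first_order(alpha: float, amplitude: float = 1.0, n0: float = 4e-3) -> float:
    """BER of +/-amplitude NRZ through a first-order low-pass filter (hypothetical model).

    Time is normalized so that the bit period Tb = 1, and alpha = f3dB * Tb.
    The sample is taken at the end of the bit; only the preceding bit's ISI is kept,
    and the bit before it is assumed to have fully settled.
    """
    tau = 1.0 / (2.0 * math.pi * alpha)  # RC time constant in units of Tb
    # Filtered noise power: one-sided PSD n0 times the noise bandwidth (pi/2) * f3dB.
    sigma = math.sqrt(n0 * (math.pi / 2.0) * alpha)
    # Previous bit equal to the current bit: the output stays at +/-amplitude.
    settled = amplitude
    # Previous bit opposite: a step from -A to +A reaches A * (1 - 2 exp(-Tb / tau)).
    after_transition = amplitude * (1.0 - 2.0 * math.exp(-1.0 / tau))
    # For random data, both cases are equally likely.
    return 0.5 * (q_function(settled / sigma) + q_function(after_transition / sigma))

def optimum_bandwidth(alphas):
    """Return the normalized bandwidth in `alphas` that minimizes the modeled BER."""
    return min(alphas, key=ber_first_order)

if __name__ == "__main__":
    grid = [0.1 + 0.01 * k for k in range(291)]  # alpha from 0.10 to 3.00
    best = optimum_bandwidth(grid)
    print(f"optimum alpha = {best:.2f}, log10(BER) = {math.log10(ber_first_order(best)):.1f}")
```

A narrow bandwidth lets the previous bit close the eye, while a wide bandwidth admits more noise, so the modeled BER has an interior minimum; with these assumed values it falls near α ≈ 0.4. The full treatment also accounts for transmitter clock jitter and the choice of sampling point.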

Second, we propose a bandwidth enhancement method for wideband amplifiers. This
is useful for the realization of high-speed links in technologies that suffer from large
parasitic components. The method leverages two-port broadband matching to enable
amplifier stages to achieve their maximum gain-bandwidth product. We demonstrate a
10 Gb/s amplifier in 0.18 µm CMOS with this technique that achieves 2.4 times the
bandwidth of a design that does not apply the technique.

Third, we develop an eye-opening monitor (EOM) that enables full integration of
adaptive equalizers. The EOM evaluates the quality of the signal eye diagram and reports a
quantitative measure that is correlated to the signal integrity. We demonstrate a
prototype in 0.13 µm standard CMOS that operates up to 12.5 Gb/s and has a 68 dB error
dynamic range.

Finally, we introduce an instantaneous clockless demultiplexer for burst-mode
communication applications. We propose a clockless finite state machine that recovers
and demultiplexes the received burst of data instantaneously. The architecture consists of a
combinational logic structure and a bit-period-delayed feedback loop. We demonstrate a
1:2 clockless demultiplexer based on this concept in SiGe BiCMOS technology that
operates at 7.5 Gb/s.

Table of Contents

Acknowledgements

Abstract

List of Figures

List of Tables

Chapter 1: Introduction
1.1 Information Technology: Desire for Higher Speed
1.2 Scope of this Thesis
1.3 Why Silicon-Based Integrated Circuits?
1.4 Challenges
1.5 Contributions
1.5.1 Fundamental Issues: Signal Integrity
1.5.2 High-Speed Integrated Circuit Topologies in Silicon
1.5.3 Novel Architectures: High-Speed Signal Processing to Maintain Signal Integrity
1.6 Thesis Organization

Chapter 2: Principles of High-Speed Communications
2.1 Trade-Offs in Link Design
2.2 Modulation Schemes
2.2.1 Modulation
2.2.2 Symbol Coding
2.2.3 Power Spectral Density
2.3 Link Reliability
2.3.1 Eye Diagram
2.3.2 Bit Error Rate (BER)
2.3.3 Inter-Symbol Interference (ISI)
2.3.4 Equalization
2.3.5 ISI Impact on BER
2.3.5.1 First-Order LTI System
2.3.5.2 Second-Order LTI System
2.4 Wireline Communication Transceiver
2.4.1 General Architecture
2.4.2 Channel
2.4.3 Pre-Amplifier
2.4.4 Adaptive Equalizer
2.4.5 Clock Recovery
2.5 Timing Jitter
2.5.1 Timing Jitter Definition
2.5.2 Jitter Impact on the BER
2.6 Overall Impact of Jitter and ISI on the BER
2.6.1 Ideal Sampling Clock
2.6.2 Non-Ideal Sampling Clock
2.6.3 ISI and Jitter Trade-off
2.6.3.1 The Bathtub Curve
2.6.3.2 The BER Contours: 3D Bathtub Curve
2.7 Summary

Chapter 3: Data-Dependent Jitter in Wireline Communications
3.1 Introduction
3.2 Framework
3.2.1 Data Jitter
3.2.2 Data-Dependent Jitter
3.3 An Analytical Expression for DDJ: First-Order System
3.3.1 Analytical Expression for Threshold-Crossing Time
3.3.2 Peak-to-Peak Jitter
3.3.3 Scale-One DDJ
3.4 An Analytical Expression for DDJ: General LTI System
3.4.1 Perturbation Method
3.4.2 Peak-to-Peak Jitter and Scale-One DDJ
3.4.3 Data-Dependent Jitter Minimization
3.5 Experimental Verification
3.6 DDJ Impact on the BER
3.7 Summary

Chapter 4: Bandwidth Enhancement for Wideband Amplifiers
4.1 Introduction
4.2 Wideband Amplifier Limits
4.2.1 Single-Stage Amplifiers
4.2.1.1 One-port (two-terminal) load network
4.2.1.2 Two-port (four-terminal) matching network
4.2.2 Multi-Stage Amplifiers
4.3 Design Methodology
4.4 Example Design
4.5 Experimental Results
4.6 Summary

Chapter 5: Eye-Opening Monitor for Adaptive Equalization
5.1 Introduction
5.2 Prior Art
5.3 EOM Principle of Operation
5.4 EOM Architecture
5.5 Circuit Implementation
5.6 Experimental Results
5.6.1 Test Setup
5.6.2 Clock Path
5.6.3 Qualitative Eye-Opening Measurement
5.6.4 Eye-Opening Measurement Variations
5.6.5 Complete System Test
5.7 EOM vs. BERT
5.8 Summary

Chapter 6: Instantaneous Demultiplexing for Burst-Mode Links
6.1 Introduction
6.2 Instantaneous 1:n Demultiplexer
6.2.1 General Architecture
6.2.2 Design of a 1:2 Demultiplexer
6.2.3 Cascade Architecture
6.3 Delay Mismatch
6.4 Delay Implementation
6.4.1 Passive Delay
6.4.2 LC Delay Line Implementation
6.4.3 Experimental Results and Analysis
6.4.3.1 Measurement Accuracy and Repeatability
6.4.3.2 S-parameters
6.4.3.3 Standalone Delay Lines: Group Delay
6.5 Prototype Measurement Results
6.6 Summary

Chapter 7: Conclusion
7.1 Thesis Highlights
7.2 Directions for Future Work

Appendix A: Overall BER Calculation

Appendix B: Threshold-Crossing Time

Appendix C: Impedance Function

Appendix D: Mask Error Rate

Bibliography

List of Figures

Chapter 1: Introduction
Figure 1.1: Categories of wireline communication applications

Chapter 2: Principles of High-Speed Communications
Figure 2.1: RZ and NRZ formats representing a “1011001” sequence
Figure 2.2: Power spectrum of 2-PAM NRZ on linear axes
Figure 2.3: Power spectrum of 2-PAM NRZ on logarithmic axes
Figure 2.4: Creation of an eye diagram of length 2Tb from the signal
Figure 2.5: Bit error generation due to noise in a symbol-detection-based receiver
Figure 2.6: The BER calculation from the area under the tail of the noise distribution
Figure 2.7: Loss contributions from the conductor and dielectric in an FR4-based stripline [42]
Figure 2.8: The output of an 800 m MMF channel is severely distorted due to modal dispersion
Figure 2.9: Polarization mode dispersion in an SMF with a noncylindrical core
Figure 2.10: Equalizer filter in two topologies: (a) FFE (b) DFE
Figure 2.11: FIR filter with tapped-delay-line topology and N+1 taps
Figure 2.12: The constant-k filter-based LC delay line: (a) pi-section (b) 3-section
Figure 2.13: The implementation of the LMS algorithm for adaptive equalization
Figure 2.14: Total amplitude distribution at the sampling point when the ISI impact of one bit is taken into account
Figure 2.15: The BER vs. SNR for various normalized bandwidths compared to the zero-ISI BER of equation (2.19), sampled at the optimum point, i.e., Ts = Tb
Figure 2.16: The ISI and noise trade-off as the normalized bandwidth varies justifies the existence of a minimum BER
Figure 2.17: The optimum bandwidth for minimum BER when the sampling point is in the middle of the eye at Tb/2
Figure 2.18: Pulse response of a second-order system at various normalized 3 dB bandwidths: (a) ζ = 0.5 (b) ζ = √2/2
Figure 2.19: The contours of log10[BER] for N0 = 4e-3 V²/Hz: (a) ζ = 0.4 (b) ζ = 0.5 (c) ζ = √2/2
Figure 2.20: General architecture of a serial link
Figure 2.21: The front end of an optical communication receiver with the photodetector and a shunt-feedback trans-impedance amplifier (TIA)
Figure 2.22: PLL-based clock recovery architecture
Figure 2.23: Jitter is the deviation of the threshold-crossing time from a reference time
Figure 2.24: Accumulated eye diagram with data jitter histogram
Figure 2.25: Impact of data jitter on the BER from the data path and the clock path
Figure 2.26: Impact of data jitter on the BER by causing bit slipping
Figure 2.27: Bathtub curve for σj = 0.05 UI
Figure 2.28: (a) Three-dimensional bathtub curve for a first-order system for various normalized bandwidths; σj = 0.05 UI and N0 = 4e-3 V²/Hz (b) Contours of BER from the top view of plot (a)
Figure 2.29: BER contours: (a) σj = 0.025 UI and N0 = 4e-3 V²/Hz (b) σj = 0.05 UI and N0 = 5e-3 V²/Hz

Chapter 3: Data-Dependent Jitter in Wireline Communications
Figure 3.1: (a) Distribution of total jitter from the convolution of the RJ and DJ PDFs (b) Eye diagram and jitter histogram measurement for a data sequence passed through a microstrip transmission line on an FR4 PCB
Figure 3.2: Data-dependent jitter is caused by the ISI impact of prior bits
Figure 3.3: Response of a general LTI system to a random bit sequence and generation of DDJ
Figure 3.4: Ensemble of normalized DDJ values for different ratios of bandwidth to bit rate, along with the appropriate model to use for the data-dependent jitter PDF
Figure 3.5: Threshold-crossing histogram and DDJ distribution: (a) α = 0.1 (b) α = 0.3
Figure 3.6: Comparison of the measurement results for DDJ1 and the analytical expression in (3.12) for a first-order system
Figure 3.7: Deviation of the threshold-crossing time due to the effect of the kth bit
Figure 3.8: Worst-case accuracy of the perturbation method in predicting DDJ: (a) for a first-order system (b) for a second-order system
Figure 3.9: (a) Variation of the impacts of the last three prior bits on DDJ in a second-order system (b) Existence of a minimum in the peak-to-peak data-dependent jitter
Figure 3.10: The output eye diagram of a 4″ microstrip line on FR4 PCB at (a) 5 Gb/s and (b) 6.5 Gb/s demonstrates larger peak-to-peak deterministic jitter at the lower bit rate
Figure 3.11: Step response, pulse response, and the individual jitter contributions of prior bits as calculated from (3.13) for the systems under test: (a) Mini-Circuits ZFL-1000 amplifier (b) Copper microstrip line on FR4 PCB (c) HP 11688A lowpass filter (d) BNC coaxial cable
Figure 3.12: TIA test-board setup for the 10 Gb/s TIA
Figure 3.13: TIA step response and the impact of the a₋₂ pulse on t₀ in a “101” sequence at 3.3 Gb/s
Figure 3.14: TIA eye diagram when DDJ1 and DDJ2 are observable: (a) 1.65 Gb/s (b) 3.37 Gb/s
Figure 3.15: BER contours for σj = 0.05 UI and N0 = 4e-3 V²/Hz for two reference times for the sampling point: (a) t = 0 (b) bandwidth-dependent threshold-crossing time

Chapter 4: Bandwidth Enhancement for Wideband Amplifiers
Figure 4.1: Single-stage amplifier: (a) First-order load (b) General passive impedance load
Figure 4.2: (a) Small-signal model of an amplifier with the loading effect of the next-stage amplifier (b) The inserted passive network isolates the amplifier parasitics and the load (c) An additional inductor forms a 3rd-order passive network at the output
Figure 4.3: Normalized gain of the amplifier with a 3rd-order network load and different inductor values
Figure 4.4: Passive ladder structure of order N, inserted between the gain stages
Figure 4.5: (a) An inductor is inserted between two gain stages (b) The small-signal model shows the formation of a 3rd-order ladder network
Figure 4.6: The inductor at the input forms a 3rd-order ladder network with the photodiode capacitance
Figure 4.7: Schematic of the input stage of the TIA
Figure 4.8: Schematic of the TIA with parasitic capacitances and additional inductors
Figure 4.9: (a) Trans-resistance gain of the TIA with 0.5 pF photodiode capacitance and the input matching (b) Group delay response of the TIA
Figure 4.10: Eye diagram of the TIA output with a 10 Gb/s 2³¹−1 PRBS at the input
Figure 4.11: The BER of the TIA for different input powers at 10 Gb/s
Figure 4.12: The die photograph of the 9.2 GHz TIA

Chapter 5: Eye-Opening Monitor for Adaptive Equalization


Figure 5.1: Adaptive transversal filter equalizer with an eye-opening monitor (EOM)
Figure 5.2: The mask error rate (MER) varies for different mask shapes in a given eye diagram
Figure 5.3: The effective eye opening formed by combining the mask areas that have the same MER
Figure 5.4: The combination of effective eye openings is a 2D error map that is correlated to the shape of the eye diagram
Figure 5.5: Operation principle of the EOM for one mask
Figure 5.6: The EOM architecture
Figure 5.7: Generation of $\phi_{early}$ and $\phi_{late}$ by phase interpolation
Figure 5.8: The differential comparator circuit
Figure 5.9: The phase interpolator and phase-set register
Figure 5.10: Simulated phase interpolator transfer function for different bandwidths
Figure 5.11: The die photograph of the EOM with magnified active core
Figure 5.12: Measurement setup
Figure 5.13: Accumulated phase of the clock_out signal that verifies functionality of the divider and phase rotator with 10 GHz input clock
Figure 5.14: Qualitative eye-opening measurement (a) 10 Gb/s input eye diagram (b) error_out signal demonstrates an error-free region (c) magnified error_out signal shows MER increase for wider mask
Figure 5.15: Measured eye opening for various input eye diagrams with different peak-to-peak jitter
Figure 5.16: Measured 2D error map with 68 dB dynamic range
Figure 5.17: Comparing EOM and BERT operations: (a) 12.5 Gb/s input eye (b) MER measurement with EOM in presence of 10% digital error (c) BER measured with commercial BERT (d) BER measured with commercial BERT in presence of 10% digital error

Chapter 6: Instantaneous Demultiplexing for Burst-Mode Links


Figure 6.1: Instantaneous 1:n demultiplexer
Figure 6.2: State diagram for an FSM-based 1:2 demultiplexer
Figure 6.3: Demultiplexer outputs for 1011000010 input sequence (a) Ideal case (b) Delay cell has smaller delay than bit period
Figure 6.4: Outputs of the 1:2 demultiplexer simulated with HSPICE
Figure 6.5: Demultiplexer with delay control loop
Figure 6.6: 3-section constant-k filter-based passive LC delay line
Figure 6.7: 3-section differential constant-k filter-based delay line
Figure 6.8: Differential symmetric interwound inductors for one section of the delay line
Figure 6.9: Magnitude of the S-parameters of one MIM-based standalone delay line
Figure 6.10: Collective group delays of 27 standalone MIM-based delay lines
Figure 6.11: Collective group delays of 47 standalone VPP-based delay lines
Figure 6.12: Normalized standard deviations for group delays of standalone delay lines
Figure 6.13: Distributions of normalized delay at 1 GHz for both MIM and VPP-based delay lines
Figure 6.14: Die photo of 19-section VPP-based LC delay line
Figure 6.15: Die photo of 24-section MIM-based LC delay line
Figure 6.16: A three-input ECL OR gate
Figure 6.17: Die microphotograph of the 1:2 demultiplexer with three 5-section differential LC delay lines
Figure 6.18: The y2 output in the oscillator mode
Figure 6.19: Demultiplexer outputs out1 and out2 for three input sequences (a) 1100 (b) 10000000 (c) 1000000010001000

List of Tables

Chapter 2: Principles of High-Speed Communications

Table 2.1: Various high-speed wireline communication standards

Chapter 3: Data-Dependent Jitter in Wireline Communications

Table 3.1: Comparing measured DDJ1 and predictions of the analytical expression in (3.19)
Table 3.2: Comparing measured DDJ1 and predictions of the analytical expression for the 10 Gb/s CMOS TIA

Chapter 4: Bandwidth Enhancement for Wideband Amplifiers

Table 4.1: Bandwidth enhancement ratios for the two 3rd-order passive networks in Figure 4.3
Table 4.2: Comparison of the individual effects of the inductors on BWER

Chapter 6: Instantaneous Demultiplexing for Burst-Mode Links

Table 6.1: 1:2 demultiplexer output in each state
Table 6.2: Race-free code assignment for the states of the FSM
Table 6.3: Summary of delay line parameters
Table 6.4: Statistical comparison for MIM and VPP-based lines
Chapter 1: Introduction

1.1 Information Technology: Desire for Higher Speed
Integrated systems are among the key technologies that have revolutionized the information era by enabling high-speed computation and communication techniques as well as high-speed access to stored information. The commodities benefiting from this revolution, e.g., the Internet, the personal computer, and the cellular phone, have become commonplace. The evolution of such commodities has caused dramatic growth in the amount of information generated and in the number of end users who access that information. A recent study estimates that 0.5 million terabytes of original information were distributed over the Internet in 2002, double the amount in 1999 [1]. The number of Internet users worldwide also increased by 146% from 2000 to 2005 [2]. The continuous growth of Internet traffic necessitates upgrades in the backbone infrastructure and local area networks to support even higher data transfer rates and larger numbers of end users. The Synchronous Optical NETwork (SONET) [3][4] and Ethernet [5] are evolving in response to this demand for communication at higher speed.

In addition to end users, there are two more types of network nodes that access information and will take advantage of higher data transfer rates: processor nodes and storage nodes. Today's microprocessors run about 100 times faster than those of 15 years ago [6], owing to advances in device scaling and architecture design. As on-chip clock frequencies increase, I/O bus bandwidth becomes the speed bottleneck in a multichip environment. Developing high-speed chip-to-chip links allows increased processing power and faster networking with other chipsets. The impact of higher-speed links becomes increasingly significant in distributed computing networks and the so-called supercomputers with multiple processing nodes, where the system's performance relies on fast, error-free communication between the processing nodes.

High-speed access to data storage devices for fast transfer of large volumes of
information is another emerging application for high-speed data communication. An
example is storage area networks (SAN) that, in contrast to a single large-capacity device,
consist of a scalable network of storage nodes. The projected amount of data that will be
stored using a SAN-based database is three million terabytes in 2005 [7]. Advancement of
technology in various areas will accelerate the generation of more original information.
For instance, sensor networks will be ubiquitous and will constantly sense and aggregate
data from various environments. This data needs to be stored in sizable databases.
Similarly, large databases will be necessary to store human genome sequences that occupy
about 0.75 gigabytes per human being. The continuous increase in the size of databases is
an additional incentive for developing high-speed mass-storage media networks.

In brief, high-speed, reliable communication in its various forms must inevitably evolve to maintain efficient connectivity and information accessibility in a growing population of networks consisting of processors, storage nodes, and end users.

1.2 Scope of this Thesis


This thesis explores basic challenges in high-speed wireline communication, i.e., at 10Gb/s and beyond, and provides silicon-based integrated circuits and systems as solutions. High-speed communication systems, also called high-speed links, typically use electrical or fiber-optic channels for data transmission, because the large bandwidth available in such channels allows higher transmission rates. Wireline links can be categorized based on the transmission distance and the transmission rate, as shown in Figure 1.1. Chip-to-chip links establish communication between chips on a single printed circuit board (PCB), such as the memory unit and the CPU. Backplane links refer to the communication between nodes on different expansion cards that are inserted into a single backplane board or motherboard. An example is the link between several line cards on one backplane of a network router [8]. In a backplane link, the data rate is typically higher and the transmission distance is longer than in a chip-to-chip link. However, the channel is still electrical.

Figure 1.1: Categories of wireline communication applications

If the transmission distance exceeds 100m and the data rate is above 10Gb/s, electrical transmission lines are no longer deployed, mainly because of the significant channel loss. In such cases, multimode fiber (MMF) or single-mode fiber (SMF) is used. While copper attenuation easily reaches 10dB/m at 10Gb/s [9], the typical loss of modern single-mode fiber remains below 0.5dB/km in the 1200nm-1600nm wavelength range [10].
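A quick calculation with the attenuation figures quoted above shows why fiber takes over at these distances. This is a back-of-envelope sketch that assumes a uniform loss per unit length at exactly the quoted values, ignoring that copper loss rises with frequency and fiber loss depends on wavelength; the 100m length is simply the boundary mentioned in the text.

```python
# Back-of-envelope comparison using the attenuation figures quoted in the text.
COPPER_DB_PER_M = 10.0  # copper loss at 10Gb/s (the text says it "easily reaches" this)
SMF_DB_PER_KM = 0.5     # upper bound for modern single-mode fiber

def total_loss_db(length_m, loss_db_per_m):
    """Total attenuation of a uniform line, assuming loss scales linearly with length."""
    return length_m * loss_db_per_m

def received_power_fraction(loss_db):
    """Fraction of the launched power that survives a given loss in dB."""
    return 10 ** (-loss_db / 10)

length_m = 100.0
copper_db = total_loss_db(length_m, COPPER_DB_PER_M)
fiber_db = total_loss_db(length_m, SMF_DB_PER_KM / 1000.0)
print(f"100 m of copper: {copper_db:.0f} dB; 100 m of SMF: {fiber_db:.2f} dB")
print(f"SMF delivers {received_power_fraction(fiber_db):.1%} of the launched power")
```

Over the same 100m, the copper line would attenuate the signal by roughly a thousand decibels, while the fiber loses a few hundredths of a decibel.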

Although applications exist in each of the categories discussed above, the front-end architecture of high-speed chip-to-chip, backplane, and optical communication links is similar, as will be discussed in Chapter 2. This dissertation focuses on the basic challenges of high-speed link design that result from channel impairments and hardware restrictions; its contributions can therefore be applied to any of the above categories of wireline communication links.

1.3 Why Silicon-Based Integrated Circuits?


Silicon-based integrated circuits (ICs) play a central role in the evolution of high-speed networks. The major advantage of silicon-based technologies is their ever-increasing capacity for integration, which enables the realization of complex ICs at very low cost. For example, the number of transistors in a microprocessor has doubled every 18 months [11][12], while the price of each transistor has dropped by a factor of 100 over 15 years [12].

Besides the cost advantage, integration has two other benefits. First, integrating functions on the same die eliminates the parasitic impact of the package and board at the interfaces between them. The operating speed can therefore be significantly higher, because it is limited only by on-chip device and interconnect parasitics. Second, the power consumption is lower, because the I/O drivers required at the interfaces between chips in a multichip environment are eliminated.

The scaling trend of silicon-based technologies, specifically CMOS technologies, enables the fabrication of devices with higher operating frequencies. The latest International Technology Roadmap for Semiconductors (ITRS) projects the possibilities and challenges of integrated system design to the year 2018 [13]. For instance, by 2009, the unity-current-gain frequency, fT, and the unity-power-gain frequency, fmax, of a MOS device with a 32nm gate length are expected to reach 280GHz and 310GHz, respectively [14]. Scaling, together with advanced silicon technologies such as silicon-germanium (SiGe) heterojunction bipolar transistors (HBTs) with a germanium-doped base and silicon-on-insulator (SOI) technology with small device-to-substrate parasitics, enables silicon technologies to meet the performance requirements of high-speed applications. Consequently, along with their remarkable integration advantage, silicon-based technologies appear capable of delivering more functionality for a lower price and are excellent candidates for the implementation of low-cost high-speed integrated circuits.

1.4 Challenges
The main challenge in designing high-speed links is to understand and combat the limitations of the channel response. As the link operating speed increases, channel effects that were previously negligible significantly affect link reliability. For instance, frequency-dependent loss and dispersion in the channel degrade signal integrity and introduce inter-symbol interference (ISI) and data-dependent jitter (DDJ), which increase the error probability. To optimize link reliability, the ISI and jitter caused by the channel response must be quantified. Furthermore, the error probability must be related to the ISI and jitter, in addition to the channel noise.

If the channel response is known, Nyquist pulse shaping can be used to eliminate the ISI [15]-[17]. However, pulse shaping is typically not applied in high-speed communication because the channel response is not necessarily known a priori. The alternative approach that is feasible for high-speed links is adaptive equalization. An equalizer is a filter designed to reshape the received pulse so as to minimize the overall effect of ISI and noise at the sampling point. An adaptive equalizer automatically adjusts its filter parameters to accommodate an unknown channel response and its variations over time. Adaptive equalization algorithms based on the fast Fourier transform have been efficiently realized with digital signal processors (DSPs) at low data rates (several megabits per second). However, the prohibitively large power consumption of the DSP and the analog-to-digital converters at multi-gigabit-per-second speeds makes such an approach impractical. Instead, adaptive equalizers at 10Gb/s and beyond must be realized with high-speed analog adaptive transversal filters and robust adaptation techniques with their associated circuitry.
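The transversal-filter idea can be sketched numerically. The example below is a minimal sketch, assuming a hypothetical noiseless channel whose pulse response, sampled once per bit, is [1.0, 0.6, 0.5], and a known training sequence of ±1 symbols. It adapts a 9-tap transversal filter with the textbook least-mean-squares (LMS) algorithm, which is swapped in here for illustration only; it is not the adaptation scheme developed in this thesis, which uses an eye-opening monitor as its cost function. The eye opening is scored as worst-case peak distortion: the main cursor minus the sum of the magnitudes of all ISI terms.

```python
import random

def convolve(x, y):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def peak_eye_opening(response, cursor=0):
    """Worst-case eye opening for +/-1 symbols: main cursor minus the sum of |ISI| terms."""
    isi = sum(abs(v) for i, v in enumerate(response) if i != cursor)
    return response[cursor] - isi

def train_lms(channel, num_taps=9, mu=0.01, num_symbols=20000, seed=1):
    """Adapt transversal-filter taps with LMS on a known +/-1 training sequence (decision delay 0)."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols)]
    received = convolve(channel, symbols)[:num_symbols]  # noiseless ISI channel
    taps = [1.0] + [0.0] * (num_taps - 1)  # start as a pass-through filter
    for n in range(num_taps - 1, num_symbols):
        window = [received[n - k] for k in range(num_taps)]  # tap-delay-line contents
        output = sum(w * x for w, x in zip(taps, window))
        error = symbols[n] - output
        taps = [w + mu * error * x for w, x in zip(taps, window)]  # LMS update
    return taps

channel = [1.0, 0.6, 0.5]  # hypothetical sampled pulse response
taps = train_lms(channel)
print("eye before:", round(peak_eye_opening(channel), 3))
print("eye after: ", round(peak_eye_opening(convolve(channel, taps)), 3))
```

With this channel the unequalized worst-case eye is closed (1 - 0.6 - 0.5 < 0); after adaptation the residual ISI is a small fraction of the main cursor and the eye reopens. A hardware transversal filter realizes the same multiply-accumulate structure with analog delay cells and variable-gain taps rather than a software loop.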

The signal degradation induced by the channel is intensified by the nonidealities of the receiver circuit. Integrated circuits eliminate the bandwidth limitations imposed by the parasitic components of the packages and of the wiring between packages in a discrete design. However, intrinsic device and metal-interconnect parasitics on the chip still restrict the maximum achievable bandwidth. Single-chip implementation of 10Gb/s systems therefore requires an understanding of the factors that limit on-chip operating speed and motivates the development of circuit techniques and topologies that overcome those limitations.

1.5 Contributions
This thesis focuses on the analysis, design, and hardware implementation of
high-speed wireline integrated communication systems. The investigation of the
challenges described in Section 1.4 has led to original contributions in this thesis that can
be categorized into the following specific topics:

• Fundamental understanding of the factors that affect signal integrity in high-speed data links as the link data rate increases.

• Design of high-frequency silicon-based circuit topologies for the receiver.

• Development of novel architectures and signal processing techniques for maintaining signal integrity in high-speed links.

We elaborate on each of these topics below and conclude the chapter with the thesis organization.

1.5.1 Fundamental Issues: Signal Integrity

This work provides a fundamental understanding of data-dependent jitter (DDJ) from a design perspective. Jitter is the deviation of the threshold-crossing times of data or clock transitions from a reference time. In a conventional communication system, the clock that is used to sample the signal and recover the data is derived from the received signal itself. Therefore, the clock inherits the phase uncertainty of the data transitions in the received signal. Understanding how data and clock jitter increase the bit error probability is fundamental to the design of a high-speed link. In our work, we focus on DDJ, which is predominantly caused by limited system bandwidth: the ISI resulting from the bandwidth limitation shifts the threshold-crossing times and translates into jitter. We provide a comprehensive analytical framework to model and predict the DDJ caused by any linear time-invariant (LTI) system [18]–[22]. Relating the LTI system response to the DDJ gives circuit and system designers insight into minimizing jitter and complements conventional measurement-based methods. In addition, we can predict the DDJ contribution of a system at any data rate from its step response. Experimental data verify our model predictions for various systems with less than 7.5% error.
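The link between limited bandwidth and DDJ can be checked with a brute-force simulation. The sketch below is not the analytical framework of Chapter 3; it assumes a hypothetical first-order (single-pole) system with time constant τ, 0/1 NRZ data, and a 0.5 threshold, and measures the spread of threshold-crossing times directly. For such a system, a rising edge that starts from level y0 crosses the threshold at t = τ ln[2(1 - y0)], and y0 ranges over [0, e^(-Tb/τ)); falling edges behave symmetrically. This gives a peak-to-peak DDJ of -τ ln(1 - e^(-Tb/τ)) whenever e^(-Tb/τ) < 1/2. Times are in units of the bit period (UI), and the data is a 2^7 - 1 PRBS so that every nonzero 7-bit pattern occurs.

```python
import math

def prbs7(n, seed=0x7F):
    """Maximal-length 2^7 - 1 sequence from the recurrence s[k] = s[k-6] XOR s[k-7]."""
    state, out = seed, []
    for _ in range(n):
        out.append((state >> 6) & 1)                 # oldest register bit is the output
        feedback = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | feedback) & 0x7F
    return out

def crossing_offsets(bits, tau, vth=0.5):
    """Exact threshold-crossing times (UI after each bit boundary) of a first-order
    low-pass response, time constant tau (in UI), to 0/1 NRZ data starting from y = 0."""
    y, prev, offsets = 0.0, 0, []
    decay = math.exp(-1.0 / tau)
    for b in bits:
        if b != prev:
            # Inside the bit, y(t) = b + (y - b) * exp(-t / tau); solve y(t) = vth for t.
            t = tau * math.log((y - b) / (vth - b))
            if 0.0 <= t < 1.0:
                offsets.append(t)
        y = b + (y - b) * decay  # level at the end of the bit
        prev = b
    return offsets

def ddj_pp_simulated(tau, nbits=3 * 127):
    """Peak-to-peak spread of crossing times over three PRBS7 periods."""
    offsets = crossing_offsets(prbs7(nbits), tau)
    return max(offsets) - min(offsets)

def ddj_pp_first_order(tau):
    """Closed form derived above; valid when exp(-1/tau) < 1/2."""
    return -tau * math.log(1.0 - math.exp(-1.0 / tau))

print(ddj_pp_simulated(0.5), ddj_pp_first_order(0.5))
```

For τ = 0.5 UI both numbers come out near 0.073 UI, and the DDJ grows quickly as τ increases, i.e., as the bandwidth shrinks relative to the data rate.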

1.5.2 High-Speed Integrated Circuit Topologies in Silicon

We propose a method for bandwidth enhancement of wideband amplifiers [23]. Using this methodology, we demonstrate the first 10Gb/s 0.18µm CMOS transimpedance amplifier [24][25]. The methodology is based on two-port broadband matching of multistage amplifiers. Passive components are introduced between stages and form wideband networks with controlled transfer impedance functions. Device parasitics are absorbed into the passive networks. As a result, the networks isolate cascaded stages and avoid loading. Therefore, in theory, each amplifier stage can achieve its maximum gain-bandwidth product set by the Bode-Fano limit [26][27]. The prototype we implemented shows a 2.4-times bandwidth improvement over a design that does not apply the technique.
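The effect of absorbing parasitics into a passive network can be illustrated with the simplest case: one inductor inserted between two parasitic capacitances, forming a third-order ladder with the load resistor. The sketch below assumes normalized, hypothetical element values (R = 1, C1 = C2 = 1, L = 1) rather than any design from this thesis, models the driving stage as an ideal current source, and finds the -3 dB bandwidth of the transfer impedance with a simple frequency sweep. Without the inductor the two capacitances simply add, and the bandwidth is 1/(R(C1 + C2)).

```python
import math

def transimpedance(w, r, c1, c2, l):
    """V2/I for a current source driving C1, then a series inductor L into C2 in parallel with R."""
    s = 1j * w
    z2 = r / (1 + s * r * c2)                # C2 || R at the output node
    return z2 / (1 + s * c1 * (s * l + z2))  # fraction of I flowing through L, times Z2

def bandwidth_3db(r, c1, c2, l, w_max=10.0, step=1e-4):
    """First angular frequency at which |V2/I| drops below R/sqrt(2) (brute-force sweep)."""
    target = r / math.sqrt(2)
    w = step
    while w < w_max:
        if abs(transimpedance(w, r, c1, c2, l)) < target:
            return w
        w += step
    raise ValueError("no -3 dB point below w_max")

# Normalized, hypothetical values: R = 1 and equal parasitic capacitances C1 = C2 = 1.
bw_rc = bandwidth_3db(1.0, 1.0, 1.0, 0.0)      # L = 0: capacitances add, bandwidth 1/(R(C1+C2))
bw_ladder = bandwidth_3db(1.0, 1.0, 1.0, 1.0)  # L = 1: third-order ladder
print(f"RC load: {bw_rc:.3f}, ladder: {bw_ladder:.3f}, ratio: {bw_ladder / bw_rc:.2f}")
```

With these values the bandwidth rises by roughly a factor of three, at the cost of about 2 dB of gain peaking. Choosing element values that balance bandwidth, flatness, and group delay is the subject of Chapter 4.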

1.5.3 Novel Architectures: High-Speed Signal Processing to Maintain Signal Integrity

The first contribution to this topic is a novel eye-opening monitor that enables full integration of adaptive equalizers in the high-speed receiver front end [28][29]. An eye-opening monitor (EOM) is a block that evaluates the quality of the received signal's eye diagram and periodically reports a quantitative measure that is directly correlated to the signal quality. This output is used as a cost function for automatic adjustment of the filter coefficients in an adaptive equalizer. Our proposed EOM effectively captures a two-dimensional image of the eye diagram at the output of the equalizer. Its simple error-detection mechanism allows implementation at very high speed. The prototype, implemented in 0.13µm CMOS, was successfully tested up to 12.5Gb/s [29]. It provides up to 68dB of output error dynamic range, which is sufficient for the optimization algorithm of the equalizer coefficients.
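The idea of scoring an eye diagram by how often samples land inside a mask can be sketched in software. The model below is a simplified, hypothetical stand-in for the hardware of Chapter 5: it uses a single sampling phase per mask instead of an early/late phase pair, an ideal first-order channel with τ = 0.5 UI, and random 0/1 NRZ data. For each sampling phase and mask half-height, it reports the mask error rate (MER), the fraction of samples that fall within the mask around the 0.5 threshold; sweeping both parameters yields a coarse two-dimensional error map of the eye.

```python
import math
import random

def sample_deviations(bits, tau, phase, vth=0.5):
    """|y - vth| of a first-order low-pass response to 0/1 NRZ data, sampled `phase` UI into each bit."""
    y, out = 0.0, []
    decay_bit = math.exp(-1.0 / tau)
    decay_phase = math.exp(-phase / tau)
    for b in bits:
        out.append(abs(b + (y - b) * decay_phase - vth))
        y = b + (y - b) * decay_bit  # level at the end of the bit
    return out

def mask_error_rate(bits, tau, phase, half_height):
    """Fraction of samples at `phase` that fall inside the mask vth +/- half_height."""
    devs = sample_deviations(bits, tau, phase)
    return sum(d < half_height for d in devs) / len(devs)

def error_map(bits, tau, phases, half_heights):
    """2D map: one row of MER values per mask half-height, one column per sampling phase."""
    return [[mask_error_rate(bits, tau, p, h) for p in phases] for h in half_heights]

rng = random.Random(7)
bits = [rng.randint(0, 1) for _ in range(2000)]
phases = [0.3, 0.6, 1.0]   # sampling instants in UI after each bit boundary
heights = [0.1, 0.2, 0.3]  # mask half-heights around the threshold
for h, row in zip(heights, error_map(bits, 0.5, phases, heights)):
    print(h, ["%.3f" % m for m in row])
```

In this model the MER is zero near the end of the bit, where the eye is open, and grows near the threshold crossings or as the mask gets taller; an adaptive equalizer would adjust its taps to shrink that error region.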

The other thesis contribution in this area is a novel architecture for instantaneous clockless demultiplexing [30]–[32]. Instantaneous data acquisition is required in burst-mode communication systems, where data arrives at the receiver in asynchronous packets separated by quiet intervals of unknown length. Conventional narrowband phase-locked loops require a long preamble because of their long acquisition time and are therefore not suitable. As an alternative to gated oscillators, which require a full-rate clock, we propose a clockless finite state machine that recovers and demultiplexes the received burst of data instantaneously. The architecture consists of a combinational logic structure with immediate response and a feedback loop delayed by one bit period. Therefore, every time a burst is received, the operation starts exactly in phase with the first bit and continues synchronously with the stream. We implemented a 1:2 clockless demultiplexer based on this concept in a SiGe BiCMOS technology and verified its operation at 7.5Gb/s.

1.6 Thesis Organization


The dissertation consists of seven chapters. Chapter 2 covers the basic principles of wireline communication systems. Its objective is to introduce wireline transceiver architecture and to familiarize the reader with the channel impairments in high-speed applications that affect link reliability, i.e., the probability of making an error. Each of the next four chapters, Chapters 3 to 6, covers one of the topics discussed in Section 1.5. These chapters follow a common approach: first, the prior art is discussed and the novel circuit topology or system architecture developed as part of the thesis is introduced. Then, the design steps are discussed. Finally, each chapter concludes by demonstrating a hardware prototype and presenting measurement data from the prototype that verify the concept. Chapter 3 describes the contributions to the understanding of data-dependent jitter from a circuit design perspective. Chapter 4 presents the methodology for bandwidth enhancement of wideband amplifiers. The eye-opening monitor technique is the subject of Chapter 5. Lastly, Chapter 6 discusses the development of the instantaneous demultiplexing architecture. We conclude with a summary in Chapter 7 that covers the achievements of the thesis as well as suggestions for future research to extend the results of this work.
Chapter 2: Principles of High-Speed Communications
2.1 Trade-Offs in Link Design
The objective of communication is to transfer information reliably from a transmitter to a receiver [33]. The measure of reliability is the probability of making an error in detecting a received information bit. A typical high-speed wireline link is designed to transfer data at a specified rate, e.g., 10Gb/s, while maintaining a given level of reliability, e.g., an error probability of $10^{-12}$. The physical medium that acts as the channel is selected based on the required bandwidth as well as the maximum signal attenuation due to channel loss that can be tolerated. Once the channel is selected, the first step in link design is to derive the relationships between the error probability and the link specifications, e.g., gain, bandwidth, sensitivity, jitter generation, and jitter tolerance. Then, the parameters of the transmitter and receiver blocks are determined from these relationships so as to minimize the error probability. Other parameters that affect the practicality of the design, such as cost and power dissipation, are also considered at this stage.
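To put the $10^{-12}$ target in perspective, assume, as a simplification that the text does not make at this point, that the only impairment is additive Gaussian noise with the same rms value on both levels and that the decision threshold sits midway between them. The error probability is then ½ erfc(Q/√2), where Q is half the level spacing divided by the rms noise. The sketch below inverts this relation by bisection and converts the target into a mean time between errors at 10Gb/s.

```python
import math

def ber_from_q(q):
    """Error probability for Gaussian noise with equal rms on both levels and a midway threshold."""
    return 0.5 * math.erfc(q / math.sqrt(2))

def q_for_ber(target_ber, lo=0.0, hi=20.0, iterations=100):
    """Invert ber_from_q by bisection; the BER decreases monotonically as Q grows."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mean_seconds_per_error(ber, bit_rate):
    """Average time between errors at a given BER and bit rate."""
    return 1.0 / (ber * bit_rate)

q = q_for_ber(1e-12)
print(f"Q for BER 1e-12: {q:.3f}")
print(f"mean time between errors at 10Gb/s: {mean_seconds_per_error(1e-12, 10e9):.0f} s")
```

Under these assumptions the signal must exceed the rms noise by a factor of about seven on each side of the threshold, and even at that rate an error occurs only once every hundred seconds on average.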

In this chapter, we introduce the underlying concepts of a high-speed wireline link and
describe the first step mentioned above. We discuss the link design challenges caused by
the inter-symbol interference (ISI) and jitter. Then, we analyze the relationship between
the error probability, the ISI and jitter. We study the combined impact of the ISI and jitter
on the link reliability and demonstrate the trade-offs between the ISI, jitter, and system
parameters such as bandwidth. Finally, we provide a unified relationship that enables
minimization of the link error probability in the presence of both the ISI and jitter. The
assumptions and definitions of this chapter will be used throughout the dissertation.
2.2 Modulation Schemes


2.2.1 Modulation

The dominant modulation scheme used in high-speed links is two-level pulse-amplitude modulation (2-PAM). In 2-PAM, the binary information is encoded into two signal levels. Typically, bipolar signals, i.e., levels symmetrical around a well-defined zero threshold, are selected. For instance, -1/2 represents binary “0” and +1/2 represents binary “1”. This way, if the information source generates “0” and “1” with equal probability, the data sequence has a zero average. This is advantageous in reducing wander jitter, which is the long-term deviation of data transition times from a reference time caused by drift or variation of the threshold.

2-PAM is also the dominant modulation method in optical communications. However, the physical quantity that carries the information in optical communication is optical power, or light intensity, which cannot take a negative value. The optical signal levels are switched between a high value and zero, e.g., by modulating an optical source on and off. Therefore, unipolar 2-PAM in optical communication is typically referred to as on-off keying (OOK).

The main reason for using 2-PAM in high-speed communication systems is its simplicity. A 2-PAM signal can be generated by simply turning a transistor or a laser source on and off. The detection mechanism is also simple and does not require accurate power-level control. A main reason to use more complex modulation techniques, such as M-PAM or M-QAM, is to achieve a higher data rate using the same channel bandwidth. However, because the channel in wireline links has abundant bandwidth, designers tend to deploy simple 2-PAM modulation instead of a more complex technique and use a faster device technology to achieve high data rates. More recently, however, as data rates have hit the bandwidth limitations of copper transmission lines, it has become reasonable to design circuits that implement more complex modulations such as 4-PAM [34][35]. In such a case, although a more complicated slicer and clock and data recovery architecture is needed, the required bandwidth is only half of that required for 2-PAM, because each 4-PAM symbol carries two bits and the symbol rate is therefore half of the data rate. In this chapter, we assume that 2-PAM modulation is used with amplitudes 0 and 1 for information bits “0” and “1,” respectively.
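The symbol-rate argument for 4-PAM can be made concrete with a small mapping sketch. The ±1/2 levels for 2-PAM follow the text; the 4-PAM level set {-3, -1, +1, +3} and its Gray mapping are common choices assumed here for illustration, not details taken from the cited designs.

```python
# 2-PAM levels follow the text (+/-1/2); the 4-PAM level set and Gray mapping are assumptions.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def pam2(bits):
    """Bipolar 2-PAM: "0" -> -1/2, "1" -> +1/2, one symbol per bit."""
    return [0.5 if b else -0.5 for b in bits]

def pam4(bits):
    """Gray-coded 4-PAM: two bits per symbol, adjacent levels differ in only one bit."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_4PAM[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def symbol_rate(bit_rate, bits_per_symbol):
    """Symbols per second needed to carry bit_rate."""
    return bit_rate / bits_per_symbol

bits = [1, 0, 1, 1, 0, 0, 1, 0]
print(pam2(bits))
print(pam4(bits))
print(symbol_rate(10e9, 1), symbol_rate(10e9, 2))
```

The 4-PAM stream carries the same eight bits in four symbols, so at 10Gb/s it needs only a 5G symbol/s channel, which is the origin of the halved bandwidth requirement.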

2.2.2 Symbol Coding

The signal shapes in high-speed links typically take a nonreturn-to-zero (NRZ) format, which means that the pulse representing each bit lasts for a full bit period, Tb. This is in contrast to return-to-zero (RZ) signaling, where the pulses last only for half of the bit period. Figure 2.1 illustrates the NRZ and RZ representations of a “1011001” data sequence with unipolar pulses. RZ is typically preferred in long-haul optical telecommunication networks, where optical power is expensive, because the RZ format has relaxed signal-to-noise requirements compared to the NRZ format [36]. In other words, since RZ pulses last for a shorter time, for a given average transmitted power they have a higher peak power than NRZ pulses. This results in a lower bit-error probability because optical receivers respond to the peak power.

Figure 2.1: RZ and NRZ formats representing a “1011001” sequence (time axis t in units of the bit period Tb)


On the other hand, RZ pulses have a larger transition density and require twice the channel and receiver bandwidth of NRZ. Therefore, NRZ pulses are the dominant format in high-speed wireline links due to their smaller bandwidth requirements and simpler implementation. In this dissertation, we assume that all data sequences follow the NRZ format.
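The transition-density argument can be checked on the “1011001” sequence of Figure 2.1. In the sketch below each bit is represented by a hypothetical number of samples per bit (spb, assumed even), and each RZ pulse occupies the first half of its bit. The RZ waveform has more transitions, and its shortest pulse is half as long as the shortest NRZ pulse, which is why its first spectral null moves from 1/Tb to 2/Tb.

```python
def nrz(bits, spb):
    """NRZ waveform: each bit held for the full bit period (spb samples)."""
    return [b for b in bits for _ in range(spb)]

def rz(bits, spb):
    """RZ waveform: a "1" is high only for the first half of the bit period (spb assumed even)."""
    half = spb // 2
    return [b if i < half else 0 for b in bits for i in range(spb)]

def transitions(wave):
    """Number of level changes between consecutive samples."""
    return sum(1 for x, y in zip(wave, wave[1:]) if x != y)

def shortest_run(wave):
    """Length, in samples, of the shortest constant segment (pulse or gap)."""
    runs, count = [], 1
    for x, y in zip(wave, wave[1:]):
        if x == y:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return min(runs)

pattern = [1, 0, 1, 1, 0, 0, 1]  # the sequence of Figure 2.1
for name, wave in (("NRZ", nrz(pattern, 4)), ("RZ", rz(pattern, 4))):
    print(name, transitions(wave), "transitions, shortest run", shortest_run(wave), "samples")
```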

One potential problem with the NRZ format is the occurrence of long sequences of “0” or “1,” also referred to as consecutive identical digits (CID) [37]. When a long run of CID is transmitted, the data contains no transitions for a long time. Consequently, the receiver, which extracts timing information from the spacing between received data transitions, loses synchronization. To avoid loss of synchronization, the data is encoded before transmission using run-length-limited code words, which guarantee a maximum number of CID bits. For instance, the 8-bit/10-bit (8b/10b) encoding scheme proposed by IBM [38] is widely used in several high-speed wireline applications, such as Fibre Channel for storage networks and 10 Gigabit Ethernet for local area networks. The 8b/10b code replaces each byte of information with 10 transmission bits. It guarantees a maximum of five CID bits. In addition, it keeps the signal dc-balanced by keeping the numbers of transmitted “1s” and “0s” equal over time. A disadvantage of 8b/10b coding is the 25% increase in the line rate: to maintain a 10Gb/s data throughput, the signaling speed must be increased to 12.5Gb/s because of the 25% overhead added by the encoding. This is undesirable in some applications such as SONET. Instead, the SONET standard recommends data scrambling (no overhead) or very low-overhead encoding. However, a SONET link is required to tolerate up to 72 CID [36][39].
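Run length and coding overhead are easy to quantify. The sketch below measures the longest run of identical bits, applies an additive scrambler built on the generator polynomial 1 + x^6 + x^7 (the polynomial of the SONET frame-synchronous scrambler; frame alignment and the unscrambled overhead bytes are omitted, so this is not a bit-exact SONET implementation), and computes the 8b/10b line rate. A scrambler only makes long runs improbable, not impossible: data that happens to match the scrambling sequence still produces long CID, which is why SONET receivers must tolerate long runs.

```python
def max_run_length(bits):
    """Longest run of consecutive identical digits (CID)."""
    best = run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        best = max(best, run)
        prev = b
    return best

def scramble(bits, seed=0x7F):
    """XOR data with a 1 + x^6 + x^7 PRBS restarted from `seed`; applying it twice restores the data."""
    state, out = seed, []
    for b in bits:
        prbs_bit = (state >> 6) & 1
        feedback = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | feedback) & 0x7F
        out.append(b ^ prbs_bit)
    return out

def line_rate_8b10b(payload_rate):
    """Signaling rate needed to carry payload_rate with 8b/10b coding (10 line bits per 8 data bits)."""
    return payload_rate * 10 / 8

zeros = [0] * 1000
print("raw CID:", max_run_length(zeros))
print("scrambled CID:", max_run_length(scramble(zeros)))
print("8b/10b line rate for 10Gb/s:", line_rate_8b10b(10e9) / 1e9, "Gb/s")
```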

Coding techniques are also used for other purposes in data transmission such as error
correction [36] and spectral shaping [40][41]. In this dissertation, we assume that the data
sequence is a random binary sequence using 2-PAM NRZ signaling.
2.2.3 Power Spectral Density

A 2-PAM NRZ signal can be formulated as

$$x(t) = \sum_{k=-\infty}^{\infty} a_k \, p_i(t - kT_b), \qquad a_k \in \{0, 1\} \tag{2.1}$$

where $p_i(t)$ is the unit pulse function, defined as

$$p_i(t) = \begin{cases} 1, & 0 \le t \le T_b \\ 0, & \text{otherwise} \end{cases} \tag{2.2}$$

The coefficient $a_k$ represents the kth transmitted bit, which is “0” or “1” with known statistics. Because each transmitted bit is a random variable, x(t) is a stochastic process. It can be shown that x(t) is a cyclostationary process [17], which means that its mean and autocorrelation function are time-dependent and periodic, with period Tb. The average power spectral density, i.e., the Fourier transform of the time-averaged autocorrelation function, shows how the signal power is distributed in the frequency domain and thus indicates the bandwidth required to transmit a 2-PAM NRZ signal. For independent, equiprobable bits, the power spectral density is

$$S(f) = \frac{1}{4}\,\delta(f) + \frac{1}{4}\,T_b\,\left[\mathrm{sinc}(f T_b)\right]^2 \tag{2.3}$$
where the sinc(x) function is defined as

$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}. \tag{2.4}$$
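Equation (2.3) can be evaluated directly to see where the signal power sits. The sketch below works in normalized units (Tb = 1, an assumption for convenience), integrates the continuous part of (2.3) with Simpson's rule, and reports the fraction of the ac power, whose total is 1/4, that lies within |f| < 1/Tb. It also confirms the spectral nulls at nonzero integer multiples of 1/Tb.

```python
import math

def sinc(x):
    """Normalized sinc of (2.4): sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def psd_continuous(f, tb=1.0):
    """Continuous (ac) part of (2.3): (1/4) * Tb * sinc^2(f Tb)."""
    return 0.25 * tb * sinc(f * tb) ** 2

def simpson(fn, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = fn(a) + fn(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fn(a + i * h)
    return total * h / 3

def ac_power_fraction(band_edge, tb=1.0):
    """Fraction of the ac power (total 1/4) that lies within |f| < band_edge / Tb."""
    inside = simpson(lambda f: psd_continuous(f, tb), -band_edge / tb, band_edge / tb)
    return inside / 0.25

print("ac power within |f| < 1/Tb:", round(ac_power_fraction(1.0), 4))
print("PSD at f = 1/Tb, 2/Tb, 3/Tb:", [psd_continuous(k) for k in (1, 2, 3)])
```

About 90% of the ac power lies in the main lobe below the clock frequency, yet a significant fraction spreads into the side lobes, consistent with the broadband nature of NRZ data.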
The first term on the right-hand side of (2.3) is the dc power that results from unipolar signaling. The double-sided power spectrum is plotted in Figure 2.2. Because of the zeros of the sinc function, the spectrum has nulls at integer multiples of the data rate, 1/Tb. Since the received signal itself carries no power at the clock frequency, the clock-recovery mechanism in the receiver must involve a nonlinear operation. Figure 2.3 shows the same power spectral density on a log-log scale. The gain is normalized to one, and the clock frequency is assumed to be 10GHz for a data rate of 10Gb/s. Evidently, the spectrum covers a broad frequency range, from dc all the way to the clock frequency. All wireline communication systems therefore require wide-bandwidth channels and circuit blocks to transmit the broadband NRZ signal with minimum distortion. Bandwidth restrictions of the channel and/or receiver circuits are the primary cause of signal impairment in high-speed communication and limit the link reliability, as we will discuss later.

Figure 2.2: Power spectrum of 2-PAM NRZ on linear axes

Figure 2.3: Power spectrum of 2-PAM NRZ on logarithmic axes
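To put numbers on how broad this spectrum is, the sketch below evaluates the continuous part of (2.3) and integrates it to find the fraction of the ac (non-dc) power lying below a given frequency. It assumes equiprobable bits, as in (2.3), and relies on NumPy's normalized `np.sinc`, which matches definition (2.4); the function names are illustrative only.

```python
import numpy as np

def nrz_psd_continuous(f, tb):
    """Continuous part of (2.3): (1/4) * Tb * sinc^2(f * Tb).

    np.sinc(x) = sin(pi x) / (pi x), the same normalization as (2.4).
    The (1/4) * delta(f) dc term is not included here.
    """
    return 0.25 * tb * np.sinc(f * tb) ** 2

def ac_power_fraction(bandwidth, tb, n_points=200_001):
    """Fraction of the ac power contained in |f| <= bandwidth.

    For equiprobable bits the total ac power is 1/4 (the variance of a_k),
    so the integral of the continuous PSD is divided by 0.25.
    """
    f = np.linspace(-bandwidth, bandwidth, n_points)
    s = nrz_psd_continuous(f, tb)
    integral = np.sum((s[1:] + s[:-1]) * np.diff(f)) / 2.0  # trapezoidal rule
    return integral / 0.25

tb = 100e-12  # bit period at 10Gb/s
for bw_fraction in (0.5, 0.7, 1.0):
    frac = ac_power_fraction(bw_fraction / tb, tb)
    print(f"|f| <= {bw_fraction:.1f}/Tb holds {100 * frac:.1f}% of the ac power")
```

About 90% of the ac power lies inside the main lobe (|f| < 1/Tb), and the remainder is spread over the sidelobes, which is why the channel and circuits must pass frequencies from near dc up to about the clock frequency.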

Although the NRZ modulation scheme requires a broadband channel and receiver, excessive receiver bandwidth degrades the receiver sensitivity, because a wider bandwidth integrates more noise power. Therefore, from a sensitivity standpoint, an optimum bandwidth exists that maximizes the sensitivity by balancing the performance degradation due to inter-symbol interference (at small bandwidths) against that due to noise (at large bandwidths). The conventional rule of thumb is to choose a receiver bandwidth equal to 70% of the data rate [36]. We will analytically demonstrate the validity of this rule and its underlying assumptions in Section 2.3. The receiver typically consists of several blocks, such as the pre-amplifier, the main amplifier, and the equalizer. The bandwidth of each individual block should be larger than 70% of the data rate, because cascading the blocks reduces the overall bandwidth. We will consider this in Chapter 4 when proposing bandwidth enhancement techniques for wideband amplifiers.
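To show how quickly bandwidth shrinks in a cascade, the sketch below assumes n identical, non-interacting single-pole stages, a simplification the text does not make. For such a cascade the overall 3dB bandwidth is f0·sqrt(2^(1/n) − 1), where f0 is the bandwidth of one stage, and the code inverts this relation to find the per-stage bandwidth needed to keep the overall bandwidth at 70% of the data rate.

```python
import math

def cascade_bandwidth_factor(n_stages):
    """Overall f-3dB of n identical single-pole stages, relative to one stage.

    |H(f)|^2 = (1 + (f/f0)^2)^(-n) equals 1/2 when (f/f0)^2 = 2^(1/n) - 1.
    """
    return math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)

def required_stage_bandwidth(target_fraction, bit_rate, n_stages):
    """Per-stage f-3dB such that the whole cascade reaches target_fraction * bit_rate."""
    return target_fraction * bit_rate / cascade_bandwidth_factor(n_stages)

bit_rate = 10e9
for n in (1, 2, 3, 4):
    f_stage = required_stage_bandwidth(0.7, bit_rate, n)
    print(f"{n} stage(s): each needs about {f_stage / 1e9:.2f} GHz")
```

Real amplifier stages are rarely identical single-pole sections, so these figures only indicate the trend.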

2.3 Link Reliability


2.3.1 Eye Diagram

The eye diagram of a data sequence is a representation of the signal that provides insight into its quality. As we will discuss in Section 2.3.2, Section 2.3.3, and Section 2.5.1, a received signal has several characteristics, such as amplitude noise, inter-symbol interference, and jitter, that affect the probability of extracting correct information from it. The eye diagram captures these characteristics. To generate it, the signal is divided into frames, each with a length equal to an integer multiple of the symbol period (the bit period, Tb, for 2-PAM modulation). All the frames are then overlaid to create a single diagram, one frame long, that contains many traces of the signal. An example of a 2-PAM eye diagram with a length of 2Tb is shown in Figure 2.4; it looks like an eye, hence the name.
Figure 2.4: Creation of the eye diagram with a length of 2Tb from the signal
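The framing procedure described above takes only a few lines. This minimal sketch assumes an oversampled unipolar NRZ waveform passed through a hypothetical discrete single-pole low-pass filter whose 3dB bandwidth is 0.7 times the bit rate. `eye_frames` cuts the waveform into 2Tb frames whose rows, overlaid on a common time axis, form the eye. `vertical_eye_opening` measures the opening at the last sample of each bit (Ts = Tb), and all names are illustrative.

```python
import numpy as np

def nrz_waveform(bits, samples_per_bit):
    """Unit-amplitude NRZ: hold each bit for samples_per_bit samples."""
    return np.repeat(np.asarray(bits, dtype=float), samples_per_bit)

def first_order_filter(x, dt, tau):
    """Single-pole low-pass: y[n] = a*y[n-1] + (1-a)*x[n], with a = exp(-dt/tau)."""
    a = np.exp(-dt / tau)
    y = np.empty_like(x)
    state = 0.0
    for n, sample in enumerate(x):
        state = a * state + (1.0 - a) * sample
        y[n] = state
    return y

def eye_frames(signal, samples_per_bit, bits_per_frame=2):
    """Split the waveform into frames of bits_per_frame*Tb; overlaying the rows gives the eye."""
    frame_len = bits_per_frame * samples_per_bit
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

def vertical_eye_opening(signal, bits, samples_per_bit):
    """Lowest '1' sample minus highest '0' sample, taken at the last sample of each bit."""
    samples = signal[samples_per_bit - 1 :: samples_per_bit]
    bits = np.asarray(bits)
    return samples[bits == 1].min() - samples[bits == 0].max()

rng = np.random.default_rng(0)
spb, tb = 32, 1.0                   # samples per bit; time normalized to Tb
bits = rng.integers(0, 2, 4000)
tau = tb / (2 * np.pi * 0.7)        # f-3dB = 1/(2*pi*tau) = 0.7 x bit rate
rx = first_order_filter(nrz_waveform(bits, spb), tb / spb, tau)
frames = eye_frames(rx, spb)        # one row per trace of the eye diagram
print(frames.shape, vertical_eye_opening(rx, bits, spb))
```

Plotting every row of `frames` against the same time axis, for example with a plotting library of choice, reproduces the overlay of Figure 2.4.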

2.3.2 Bit Error Rate (BER)

A common performance measure for the reliability of a communication link is the probability of making an error in transmitting information. This probability is estimated by the bit error rate (BER), defined as the ratio of the number of bit errors to the total number of bits transmitted. One of the primary impairments that cause errors in communication is noise. Noise is modeled as an additive component at the receiver input, as shown in Figure 2.5 for a simplified communication link. The noise is assumed to be Gaussian with a white (flat) power spectral density. Amplitude fluctuations at the sampling point due to noise can cause errors in symbol detection, as illustrated in Figure 2.5 for a 2-PAM signal.

The error probability, or BER, for a simplified link such as the one in Figure 2.5 can be calculated from the noise distribution at the sampling point:

$$ \mathrm{BER} = P(0)\,P(1 \mid 0) + P(1)\,P(0 \mid 1). \tag{2.5} $$

P(0) and P(1) are the probabilities that the transmitted bit is “0” and “1,” respectively. If we assume “0” and “1” are equiprobable, P(0)=P(1)=0.5. P(0|1) is the probability of sampling a “0” when the transmitted bit is “1.” This equals the area under the tail of the noise distribution below the threshold level, as illustrated in Figure 2.6; P(1|0) is defined analogously. Furthermore, P(1|0)=P(0|1) if the noise distributions at the zero-amplitude and one-amplitude levels are identical. For signal levels of 0 and 1 with the threshold halfway between them, the BER therefore simplifies to

$$ \mathrm{BER} = Q\!\left(\frac{1}{2\sigma}\right) \tag{2.6} $$

for a Gaussian noise source with standard deviation σ, where Q(·) is the tail probability (complementary cumulative distribution function) of the standard Gaussian distribution. Equation (2.6) is only a rough approximation of the BER in a real high-speed wireline link, because several other factors, including inter-symbol interference (ISI), data timing jitter, and sampling clock uncertainty, also affect the BER. We will introduce these issues in the following sections and study their effects on the link BER.

Figure 2.5: Bit error generation due to noise in a symbol detection-based receiver

Figure 2.6: The BER calculation from the area under the tail of the noise distribution
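Equation (2.6) is straightforward to evaluate with the complementary error function, since Q(x) = erfc(x/√2)/2. The sketch below assumes signal levels 0 and 1, a threshold of 0.5, and equiprobable bits, as in the text. It computes (2.6) and checks it against a small Monte Carlo simulation at a deliberately large noise level, because error rates near 10^-12 cannot be reached by direct simulation in reasonable time. The bisection at the end finds the noise level that just meets a 10^-12 BER target.

```python
import math
import numpy as np

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_no_isi(sigma):
    """Equation (2.6): levels 0 and 1, threshold 0.5, Gaussian noise with std sigma."""
    return q_function(1.0 / (2.0 * sigma))

def ber_monte_carlo(sigma, n_bits=200_000, seed=0):
    """Simulate the additive-noise link of Figure 2.5 and count decision errors."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    samples = bits + rng.normal(0.0, sigma, n_bits)
    decisions = (samples > 0.5).astype(int)
    return np.mean(decisions != bits)

print(ber_no_isi(0.25), ber_monte_carlo(0.25))   # both close to Q(2), about 0.0228

# Largest sigma that still meets BER = 1e-12 (bisection; BER grows with sigma).
lo, hi = 1e-3, 1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if ber_no_isi(mid) < 1e-12 else (lo, mid)
print(f"sigma for BER=1e-12: {lo:.4f}")           # about 0.071, since Q^-1(1e-12) is about 7.03
```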

2.3.3 Inter-Symbol Interference (ISI)

In reality, noise is not the only impairment of communication channels. The tail of the received pulse associated with a symbol can last longer than one symbol period, Tb, and can therefore interfere with neighboring symbols. This effect is called inter-symbol interference (ISI). For a transmitted 2-PAM NRZ signal x(t) as in (2.1), the received signal can be written as

$$ r(t) = \sum_{k=-\infty}^{\infty} a_k \, p_o(t - kT_b) + n(t), \qquad a_k \in \{0, 1\} \tag{2.7} $$

where n(t) is additive noise and po(t) is the received pulse shape. The signal r(t) is sampled at times t=Ts+mTb to regenerate the symbols, where m is an integer and 0 ≤ Ts ≤ Tb:

$$ r(T_s + mT_b) = a_m\, p_o(T_s) + \sum_{k=-\infty,\, k \neq m}^{\infty} a_k \, p_o\big(T_s + (m-k)T_b\big) + n(T_s + mT_b). \tag{2.8} $$

The second term on the right-hand side is the ISI term, which affects the decision on each symbol. Nyquist derived conditions on the overall response and the pulse shape that completely null the ISI term [15]. Based on his work, the classical method to eliminate ISI is to design the transmit and receive filters such that the overall received pulse shape is a Nyquist pulse, i.e., po(Ts+(m-k)Tb)=0 for m ≠ k [16][17][40].
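The ISI term in (2.8) depends only on samples of the received pulse at multiples of Tb away from the sampling point, often called cursors. As one illustration, the sketch below performs a worst-case (peak-distortion) calculation for unipolar data: each ISI cursor is paired with the bit value that hurts the decision most. This gives the smallest vertical eye opening over all bit patterns in the absence of noise. The cursor values in the example are hypothetical.

```python
def worst_case_levels(main_cursor, isi_cursors):
    """Worst-case noiseless samples for a_m = 1 and a_m = 0, with a_k in {0, 1}.

    A '1' is pulled down by neighbors with negative cursors; a '0' is pushed up
    by neighbors with positive cursors.
    """
    lowest_one = main_cursor + sum(min(0.0, c) for c in isi_cursors)
    highest_zero = sum(max(0.0, c) for c in isi_cursors)
    return lowest_one, highest_zero

def worst_case_eye_opening(main_cursor, isi_cursors):
    """Equals p_o(Ts) minus the sum of |ISI cursors|; negative means a closed eye."""
    lowest_one, highest_zero = worst_case_levels(main_cursor, isi_cursors)
    return lowest_one - highest_zero

# Hypothetical pulse samples: p_o(Ts) = 0.8, postcursors 0.15 and -0.05.
print(worst_case_eye_opening(0.8, [0.15, -0.05]))
```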

The channel bandwidth limitation is the primary cause of ISI. Consequently, ISI becomes a growing problem in high-speed communication, as ever-faster data rates must be transmitted over the same channels. In electrical transmission-line channels, frequency-dependent loss is the main source of bandwidth limitation and thus the main cause of ISI. This loss is caused mainly by the skin effect in the conductor and by dielectric absorption. Figure 2.7 shows an example transfer function of a stripline transmission line with FR4 dielectric [42]. In addition to the channel, the bandwidth limitation of the receiver blocks and electromagnetic reflections due to impedance mismatch at connections or cables can exacerbate the overall ISI.

Figure 2.7: Loss contributions from conductor and dielectric in an FR4-based stripline [42]
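A common first-order model of this frequency-dependent loss, used here only for illustration, writes the loss in dB as a skin-effect term proportional to the square root of frequency plus a dielectric term proportional to frequency. The coefficients `K_SKIN` and `K_DIEL` below are hypothetical placeholders, not values read from Figure 2.7; with real data they would be fitted to a measured curve.

```python
import math

K_SKIN = 1.0e-5   # hypothetical: dB per sqrt(Hz) for the chosen line length
K_DIEL = 2.0e-10  # hypothetical: dB per Hz for the chosen line length

def loss_db(f_hz, k_skin=K_SKIN, k_diel=K_DIEL):
    """Total insertion loss: conductor (skin effect) ~ sqrt(f) plus dielectric ~ f."""
    return k_skin * math.sqrt(f_hz) + k_diel * f_hz

def crossover_frequency(k_skin=K_SKIN, k_diel=K_DIEL):
    """Frequency above which the dielectric term exceeds the conductor term."""
    return (k_skin / k_diel) ** 2

for f in (1e9, 5e9, 10e9):
    print(f"{f / 1e9:>4.0f} GHz: {loss_db(f):5.2f} dB")
print(f"dielectric loss dominates above {crossover_frequency() / 1e9:.1f} GHz")
```

Because the dielectric term grows linearly while the conductor term grows only as the square root, the dielectric contribution eventually dominates at high frequencies for any pair of coefficients.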

Dispersion is another significant source of ISI. Dispersion occurs when the phase of the channel transfer function is not a linear function of frequency, so that the group delay, i.e., the negative derivative of the phase with respect to angular frequency, is frequency dependent. Consequently, when a signal with a broad spectrum travels through the channel, its frequency components are delayed by different amounts. The overall effect is to broaden the signal pulse in the time domain and thus cause ISI. Both electrical and fiber-optic channels are dispersive.
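Group delay can be computed numerically from a sampled phase response. As an assumed example, not a channel taken from the text, the sketch below uses a single-pole response H(f) = 1/(1 + j2πfτ). Its phase, −atan(2πfτ), is not linear in frequency. The code compares the numerical group delay −dφ/dω with the closed form τ/(1 + (2πfτ)²).

```python
import numpy as np

def group_delay(freqs_hz, response):
    """Group delay -d(phase)/d(omega) from samples of a complex frequency response."""
    phase = np.unwrap(np.angle(response))
    return -np.gradient(phase, 2.0 * np.pi * freqs_hz)

tau = 10e-12
f = np.linspace(0.0, 30e9, 3001)
h = 1.0 / (1.0 + 1j * 2.0 * np.pi * f * tau)
tg_numeric = group_delay(f, h)
tg_exact = tau / (1.0 + (2.0 * np.pi * f * tau) ** 2)
print(tg_numeric[0] * 1e12, tg_numeric[-1] * 1e12)  # in ps: low frequencies are delayed more
```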

Dispersion is a major source of ISI in optical fiber [43]. In multi-mode fibers (MMF), modal dispersion is dominant. Modal dispersion arises when several optical modes are excited in the fiber and travel at different speeds. It becomes more problematic at longer transmission distances, because the modes become more separated in time and the received pulse therefore causes severe ISI. An example is shown in Figure 2.8 for 800m of MMF at 10Gb/s. In single-mode fibers (SMF), modal dispersion is absent, and chromatic dispersion and polarization-mode dispersion are dominant [43]. Chromatic dispersion is caused mainly by the frequency-dependent refractive index of the fiber material. The fundamental optical mode excited in an SMF has a nonzero spectral width, and it therefore disperses because its spectral components experience different refractive indices. Polarization-mode dispersion is due to the difference in group velocity between orthogonal polarization modes in an SMF whose core is not perfectly cylindrical. This is shown in Figure 2.9.

Figure 2.8: The output of an 800m MMF channel is severely distorted due to modal dispersion
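The pulse spreading due to chromatic dispersion is often estimated with the first-order relation Δt ≈ |D|·L·Δλ, where D is the fiber dispersion parameter, L the fiber length, and Δλ the spectral width of the source. The values below are assumptions for illustration: D = 17 ps/(nm·km), a typical figure for standard single-mode fiber near 1550 nm, and a source spectral width of 0.1 nm.

```python
def chromatic_spread_ps(d_ps_per_nm_km, length_km, linewidth_nm):
    """First-order pulse spread |D| * L * delta_lambda, in ps."""
    return abs(d_ps_per_nm_km) * length_km * linewidth_nm

bit_period_ps = 100.0   # 10Gb/s
for length_km in (10, 40, 80):
    spread = chromatic_spread_ps(17.0, length_km, 0.1)
    print(f"{length_km:>3} km: spread {spread:6.1f} ps = {spread / bit_period_ps:.2f} Tb")
```

Once the spread becomes a sizable fraction of Tb, the resulting ISI has to be handled by dispersion compensation or equalization.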

Figure 2.9: Polarization mode dispersion in an SMF with a noncylindrical core

2.3.4 Equalization

Equalization refers to any technique used in link design to compensate for the impairments induced by ISI. For example, the equalizer can be a filter at the receiver that reshapes the channel's undesirable frequency response such that the final received pulses are ISI-free. An ISI-free pulse is one that satisfies the Nyquist criterion [15][17] and is commonly called a Nyquist pulse shape. The Nyquist criterion was implicitly introduced in Section 2.3.3 for the received pulse shape po(t). It can be stated as follows.

A received pulse shape po(t) satisfies the Nyquist criterion, and is called a Nyquist pulse shape, if

$$ p_o(t = kT_b) = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0. \end{cases} \tag{2.9} $$

If the equalizer filter is designed such that the overall response of the channel cascaded with the filter satisfies (2.9), the link will be ISI-free according to (2.8) for Ts=0.
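Condition (2.9) can be checked numerically from samples of a candidate pulse. The sketch below compares a raised-cosine pulse, a standard Nyquist pulse not otherwise discussed in this section, with a single-pole pulse of the form used later in (2.13). The roll-off β = 0.35 and the time constant 0.3Tb are arbitrary choices. Time is normalized to Tb, and the single-pole pulse is shifted so that its largest sample falls at k = 0.

```python
import numpy as np

def raised_cosine(t, beta):
    """Raised-cosine pulse, t in units of Tb (valid where 2*beta*t != +-1)."""
    return np.sinc(t) * np.cos(np.pi * beta * t) / (1.0 - (2.0 * beta * t) ** 2)

def first_order_pulse(t, tau):
    """Single-pole pulse response of the form (2.13), with Tb = 1."""
    alpha = np.exp(-1.0 / tau)
    t = np.asarray(t, dtype=float)
    rising = 1.0 - np.exp(-np.clip(t, 0.0, 1.0) / tau)
    decaying = (1.0 / alpha - 1.0) * np.exp(-t / tau)
    return np.where(t <= 0.0, 0.0, np.where(t <= 1.0, rising, decaying))

def satisfies_nyquist(samples, main_index, tol=1e-9):
    """Check (2.9): after normalizing the main sample to 1, all others are (nearly) zero."""
    samples = np.asarray(samples, dtype=float)
    samples = samples / samples[main_index]
    others = np.delete(samples, main_index)
    return bool(np.all(np.abs(others) < tol))

k = np.arange(-8, 9)                                                      # instants kTb
print(satisfies_nyquist(raised_cosine(k, 0.35), main_index=8))           # True
print(satisfies_nyquist(first_order_pulse(k + 1.0, 0.3), main_index=8))  # False: trailing ISI
```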

Equalization can be applied in either the transmitter or the receiver. The transmitter equalizer is sometimes referred to as transmitter pre-emphasis. It amplifies the high-frequency content of the signal at the transmitter to compensate for the high-frequency attenuation the signal experiences as it travels through the channel. The advantage of transmitter pre-emphasis is that it does not amplify the receiver noise, because the compensation takes place in the transmitter. Nevertheless, transmitter pre-emphasis can cause large crosstalk between neighboring transmission lines in a parallel link because of the strong high-frequency content of the transmitted signal. Receiver equalization adds a filter at the receiver to minimize the combined effect of ISI and noise at the sampling point. It is typically preferred over transmitter equalization because the equalizer can be made adaptive to accommodate an unknown channel response and its variations over time. In this work we focus on receiver equalization.
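Transmitter pre-emphasis is often realized as a short FIR filter operating at the symbol rate. As an assumed example, the two-tap filter y[n] = c0·x[n] + c1·x[n−1] with a negative postcursor tap c1 has the frequency response c0 + c1·e^(−j2πfTb). Its gain therefore rises from |c0 + c1| at dc to |c0 − c1| at f = 1/(2Tb). The tap values below are illustrative only.

```python
import cmath
import math

def two_tap_gain(f_times_tb, c0, c1):
    """|c0 + c1 * exp(-j 2 pi f Tb)| for a symbol-spaced two-tap FIR."""
    return abs(c0 + c1 * cmath.exp(-2j * math.pi * f_times_tb))

def pre_emphasize(bits, c0, c1):
    """Apply the two-tap filter to a unipolar bit sequence (previous bit assumed 0)."""
    out, prev = [], 0
    for b in bits:
        out.append(c0 * b + c1 * prev)
        prev = b
    return out

c0, c1 = 0.75, -0.25
boost_db = 20 * math.log10(two_tap_gain(0.5, c0, c1) / two_tap_gain(0.0, c0, c1))
print(f"high-frequency boost: {boost_db:.2f} dB")    # about 6.02 dB
print(pre_emphasize([0, 1, 1, 1, 0, 1, 0], c0, c1))
```

In the output sequence, bits that follow a transition are driven harder than bits within a run. This is the time-domain view of the high-frequency boost, and it is also why pre-emphasized lines radiate more crosstalk.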

The most straightforward equalization technique is to design the filter such that the overall response of the channel cascaded with the filter satisfies the Nyquist criterion for zero ISI in (2.9). This technique is known as the zero-forcing (ZF) technique [44]. Essentially, the ZF algorithm forces the filter transfer function to equal the inverse of the channel's transfer function. Evidently, the ZF algorithm accounts only for the ISI impairment and neglects the noise. In band-limited channels, the ZF equalizer becomes a highpass filter that amplifies the high-frequency content of the signal to compensate for the channel's high-frequency attenuation. However, the filter also amplifies the noise and can therefore degrade the received signal-to-noise ratio. An alternative to the ZF method is to design the filter with the mean square error (MSE) criterion. The MSE criterion considers noise and ISI together and avoids excessive noise amplification at the receiver by allowing some residual ISI. Rather than forcing the ISI to zero, the filter parameters are chosen to minimize the mean-square difference between the filter output and the transmitted symbols. This quantity includes both residual ISI and noise and is therefore more closely related to the number of decision errors on the received symbols.
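The difference between the two criteria shows up in a small least-squares sketch. It assumes a hypothetical symbol-spaced channel with pulse samples h = [1, 0.5], zero-mean unit-variance symbols, and white noise of variance σ² at the equalizer input, all simplifications for illustration. The ZF taps minimize the residual ISI alone. The MMSE taps minimize the residual ISI power plus the noise power passed by the filter, which is σ² times the sum of the squared taps.

```python
import numpy as np

def convolution_matrix(h, n_taps):
    """Matrix H such that H @ c equals np.convolve(h, c) for a length-n_taps filter c."""
    h = np.asarray(h, dtype=float)
    H = np.zeros((len(h) + n_taps - 1, n_taps))
    for j in range(n_taps):
        H[j : j + len(h), j] = h
    return H

def equalizer_taps(h, n_taps, delay, noise_var=0.0):
    """noise_var = 0 gives the least-squares zero-forcing taps; noise_var > 0 gives MMSE taps."""
    H = convolution_matrix(h, n_taps)
    target = np.zeros(H.shape[0])
    target[delay] = 1.0                        # desired overall response: a single cursor
    A = H.T @ H + noise_var * np.eye(n_taps)
    return np.linalg.solve(A, H.T @ target)

h = [1.0, 0.5]
c_zf = equalizer_taps(h, n_taps=7, delay=0)
c_mmse = equalizer_taps(h, n_taps=7, delay=0, noise_var=0.05)
for name, c in (("ZF", c_zf), ("MMSE", c_mmse)):
    overall = np.convolve(h, c)
    residual_isi = np.sum(np.abs(np.delete(overall, 0)))
    print(f"{name}: residual ISI {residual_isi:.3f}, noise gain {np.sum(c ** 2):.3f}")
```

The MMSE solution trades some residual ISI for a lower noise gain, which is the compromise described above.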

The equalization techniques above are referred to as feedforward equalization (FFE), because the filter is inserted in the feedforward signal path, as shown in Figure 2.10(a). An alternative approach is decision feedback equalization (DFE), in which the filter is inserted in the feedback path, as Figure 2.10(b) illustrates [16][17][40][44]. The filter input is the sequence of decisions on the previously received bits; therefore, DFE is a nonlinear equalization technique. One way to design the DFE filter is to match its response to the trailing (postcursor) part of the channel pulse response. Assuming the receiver decisions are correct, the filter then reproduces the ISI that the previous bits generate through the channel, and this ISI is subtracted from the current symbol. The advantage of the DFE architecture is that, because it acts on the receiver decisions, which are noise-free, it does not cause noise enhancement. However, the DFE architecture can suffer from error propagation when the BER is very high, because the assumption of correct receiver decisions is then violated. FFE and DFE can be combined to compensate for both linear and nonlinear ISI while avoiding significant noise enhancement.

Figure 2.10: Equalizer filter in two topologies: (a) FFE (b) DFE
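The DFE principle can be demonstrated with a one-tap simulation. The sketch assumes a hypothetical channel with main cursor h0 = 1 and a single postcursor h1 = 0.4, unipolar bits, Gaussian noise with σ = 0.1, and perfect knowledge of h1; in practice the feedback tap would be adapted. Without DFE, the slicer uses the mean of the four noiseless levels as its threshold. With DFE, the postcursor reproduced from the previous decision is subtracted first.

```python
import numpy as np

def count_errors(use_dfe, n_bits=50_000, h0=1.0, h1=0.4, sigma=0.1, seed=1):
    """Count decision errors for unipolar bits through a one-postcursor channel.

    Without DFE the threshold is (h0 + h1) / 2, the mean of the levels
    0, h1, h0 and h0 + h1. With DFE, h1 * (previous decision) is subtracted
    and the threshold is h0 / 2.
    """
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    previous_bits = np.concatenate(([0], bits[:-1]))
    received = h0 * bits + h1 * previous_bits + rng.normal(0.0, sigma, n_bits)
    errors, previous_decision = 0, 0
    for r, b in zip(received, bits):
        if use_dfe:
            decision = int(r - h1 * previous_decision > h0 / 2)
        else:
            decision = int(r > (h0 + h1) / 2)
        errors += int(decision != b)
        previous_decision = decision
    return errors

print("errors without DFE:", count_errors(use_dfe=False))
print("errors with 1-tap DFE:", count_errors(use_dfe=True))
```

Because the feedback uses decisions rather than noisy samples, the DFE restores the full noise margin without amplifying the noise. Any decision error would, however, corrupt the next subtraction, which is the error-propagation mechanism mentioned above.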

In low-speed communication links, e.g., voice channels, magnetic recording channels, and digital subscriber loop (xDSL) data channels, equalization is performed digitally as part of the baseband digital signal processing (DSP). The filtering is performed in the frequency domain, leveraging efficient Fast Fourier Transform (FFT) algorithms. Unfortunately, none of these luxuries is available at high data rates such as 10Gb/s. The precision and clock rate required at such speeds for analog-to-digital conversion and for the DSP processors drive the power consumption of such implementations to impractical levels. At these frequencies, an analog implementation of the equalizer filter is more favorable.

The core of the equalizer in high-speed implementations is a finite impulse response (FIR) filter, whose components are shown in more detail in Figure 2.11. This form of FIR filter is known as the direct-form FIR filter, the transversal filter, or the tapped-delay line [45]. In an analog implementation, the delay cells can be realized with active [46][47] or passive elements [31][48], or a combination of the two [49]. Passive delay cells are based on broadband inductor (L)-capacitor (C) networks or on transmission-line structures. For instance, Figure 2.12(b) shows a 3-section constant-k filter topology based on the pi-section LC ladder network of Figure 2.12(a) [50]. It consists of L and C elements that together form a lumped model of a transmission line. Within its passband, the transfer function of this structure resembles that of an ideal delay line with delay

$$ T_D = n\sqrt{LC} \tag{2.10} $$

where n is the number of LC sections. The large layout area of the inductors is a disadvantage of this topology for an integrated delay cell, especially when n is large. However, because the passive delay cell is a linear system, it is desirable when a linear equalizer is needed, since it preserves the amplitude information of the signal as the signal travels along the delay line. An FFE equalizer that compensates linear ISI is an example of such a case.

Figure 2.11: FIR filter with tapped-delay line topology and N+1 taps

Figure 2.12: The constant-k filter-based LC delay line: (a) pi-section (b) 3-section
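Two parts of the discussion above lend themselves to a quick calculation. First, (2.10) gives the delay of an n-section ladder; the inductance and capacitance values below are hypothetical and serve only to show the arithmetic. Second, the tapped-delay line of Figure 2.11 computes y(t) = Σ ci·x(t − i·TD). The sketch models it in discrete time, assuming TD equals an integer number of samples of the input waveform.

```python
import math
import numpy as np

def lc_ladder_delay(n_sections, inductance, capacitance):
    """Equation (2.10): T_D = n * sqrt(L * C), valid within the ladder's passband."""
    return n_sections * math.sqrt(inductance * capacitance)

def transversal_filter(x, coefficients, delay_samples):
    """Tapped-delay line: y[n] = sum_i c_i * x[n - i*delay_samples], zero before the start."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for i, c in enumerate(coefficients):
        shift = i * delay_samples
        if shift >= len(x):
            break                        # later taps see only the zero initial state
        y[shift:] += c * x[: len(x) - shift]
    return y

# Hypothetical elements: L = 1 nH, C = 0.4 pF give sqrt(LC) = 20 ps per section.
print(f"{lc_ladder_delay(5, 1e-9, 0.4e-12) * 1e12:.1f} ps")   # 5 sections: one 10Gb/s bit
print(transversal_filter([1, 0, 0, 0, 0, 0], [1.0, -0.5, 0.25], delay_samples=2))
```

The impulse test in the last line simply reads the tap weights back out at multiples of the tap delay, which is a quick way to check the structure.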

Equalization is effective when the transversal filter coefficients are adjusted appropriately to compensate for the channel ISI. However, in most practical cases the channel response is unknown at link startup and/or varies with time. For instance, in a chip-to-chip link, variations in the interconnect geometry during fabrication, imperfect impedance matching at the connections, or via and stub loading make it practically impossible to predict the channel response. Another example is MMF-based optical communication, where the channel response is usually unknown at startup because of variations in the laser excitation condition, the geometry of the MMF core, and the fiber length. Furthermore, factors such as changes in environmental conditions or aging may make the channel response time-varying. Adaptive equalization, first proposed for communication systems by Lucky [16][51], remedies these issues by automatically adjusting the filter coefficients and continuously tracking any variations in the channel response.

Adaptive equalization can be implemented with or without a training sequence. In the former case, a set of known symbols is transmitted over the channel to the receiver. The equalizer has a priori knowledge of these symbols and determines the filter coefficients that minimize the ISI for them. A related training-based approach is to estimate the channel response from the received training sequence and then compute the filter coefficients from the estimate, e.g., with the ZF criterion. Training sequence-based equalization is not used in high-speed communication, mainly because it requires complex signal processing whose power-hungry implementation is simply not yet practical at 10Gb/s or beyond.
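The training-based channel estimation mentioned above can be sketched as a linear least-squares fit. With known training bits a_k, the received samples are modeled as r_k = Σ_j h_j·a_{k−j} + noise, and the pulse samples h_j are the unknowns. The channel, noise level, and training length below are hypothetical. As noted in the text, a practical 10Gb/s receiver would not run this kind of computation; the sketch only shows the principle.

```python
import numpy as np

def estimate_channel(training_bits, received, n_cursors):
    """Least-squares estimate of n_cursors symbol-spaced pulse samples h_0..h_{n-1}."""
    a = np.asarray(training_bits, dtype=float)
    # Row k holds [a_k, a_{k-1}, ..., a_{k-n+1}]; bits before the sequence are taken as 0.
    rows = [[a[k - j] if k - j >= 0 else 0.0 for j in range(n_cursors)] for k in range(len(a))]
    h_est, *_ = np.linalg.lstsq(np.array(rows), np.asarray(received, dtype=float), rcond=None)
    return h_est

rng = np.random.default_rng(3)
h_true = np.array([0.7, 0.25, -0.05])           # hypothetical pulse samples
training = rng.integers(0, 2, 200)              # known training sequence
received = np.convolve(training, h_true)[: len(training)] + rng.normal(0, 0.01, len(training))
print(np.round(estimate_channel(training, received, n_cursors=3), 3))
```

The estimated cursors could then be fed to a ZF or MMSE tap computation such as the least-squares sketch given earlier in this section.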

An effective adaptive equalization algorithm that has recently received more attention for high-speed implementation is the least mean square (LMS) algorithm (e.g., [52]), owing to its ease of implementation. The LMS algorithm is an MSE-based algorithm whose optimization criterion is to minimize the mean square of the difference between the filter output and the receiver's decision. The update equation for the coefficient of tap m can be simplified to [17]

$$ C_m(k) = C_m(k-1) + \Delta \cdot \varepsilon_{k-1} \cdot x_m(k-1) \tag{2.11} $$

where εk-1 is the error between the receiver decision and the filter output, xm(k-1) is the signal at tap m, and ∆ is a scaling (step-size) parameter that affects the convergence speed. The adaptive equalizer filter with the LMS algorithm can be implemented in hardware as illustrated in Figure 2.13. Adding the LMS algorithm to the transversal filter structure of Figure 2.11 requires only high-speed summation and multiplication, which is feasible in today's advanced integrated technologies. As can be seen from Figure 2.13, the LMS algorithm is a decision-based optimization algorithm, i.e., the quantity being minimized depends on the receiver decisions. In blind equalization, where no training sequence is used, this can be a disadvantage of the LMS algorithm: under high-BER conditions many receiver decisions may be incorrect, which slows the convergence of the algorithm. In Chapter 5 we present one of the contributions of this thesis, an alternative technique for adaptive equalization.

Figure 2.13: The implementation of the LMS algorithm for adaptive equalization
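A decision-directed LMS loop can be simulated in a few lines. The sketch assumes bipolar ±1 symbols with a sign slicer, chosen for simplicity instead of the unipolar levels used elsewhere in this chapter, plus a hypothetical channel with pulse samples [1, 0.4], light Gaussian noise, and a 5-tap symbol-spaced filter that starts as a pass-through. The update is the same as (2.11), written without the one-sample index shift, with the error taken as decision minus output.

```python
import numpy as np

def lms_equalizer(received, n_taps=5, step=0.01):
    """Decision-directed LMS on a symbol-spaced transversal filter.

    Update: c <- c + step * error * x, with error = decision - output.
    """
    taps = np.zeros(n_taps)
    taps[0] = 1.0                          # start as a pass-through filter
    delay_line = np.zeros(n_taps)          # x_m(k): the sample held at each tap
    errors = np.empty(len(received))
    for k, sample in enumerate(received):
        delay_line = np.roll(delay_line, 1)
        delay_line[0] = sample
        output = taps @ delay_line
        decision = 1.0 if output >= 0.0 else -1.0    # slicer for +/-1 symbols
        error = decision - output
        taps = taps + step * error * delay_line
        errors[k] = error
    return taps, errors

rng = np.random.default_rng(7)
symbols = rng.choice([-1.0, 1.0], size=20_000)
channel = [1.0, 0.4]                                  # hypothetical pulse samples
received = np.convolve(symbols, channel)[: len(symbols)] + rng.normal(0, 0.02, len(symbols))
taps, errors = lms_equalizer(received)
print(np.round(taps, 3))                              # near the truncated inverse 1, -0.4, 0.16, ...
print(np.mean(errors[:100] ** 2), np.mean(errors[-2000:] ** 2))
```

Here the initial eye is open, so the decisions are correct from the start and the loop converges. If the channel initially closed the eye, many decisions would be wrong, which is exactly the high-BER weakness described above.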

2.3.5 ISI Impact on BER

The calculations in Section 2.3.2 assume that the ISI is zero at the sampling instant. In practice, ISI reduces the noise margin at the sampling point and thus increases the probability of error. Using (2.5) and (2.8) and assuming equiprobable data bits, the condition for an error can be rewritten as

$$ P_e = P\left( \sum_{k=-\infty,\, k \neq m}^{\infty} a_k \, p_o\big(T_s + (m-k)T_b\big) + n(T_s + mT_b) > \frac{1}{2} \right) \tag{2.12} $$

which can be calculated from the joint probability distribution of the noise and ISI. However, finding this joint distribution is very complicated for an arbitrary system [16]. Lucky et al. provide a solution for Pe in the form of a finite sum [16], assuming that only a finite number of ISI terms in (2.8) affect the joint distribution of the noise and ISI. Their solution requires tedious numerical computation of Pe for each system and does not provide insight into how Pe depends on the system parameters. An alternative approach to finding the impact of ISI on Pe is to find an accurate bound on Pe in the presence of ISI. For example, Saltzberg provided a tight bound on Pe that depends only on the noise variance and the samples of the received pulse shape [53]; the complexity of the bound therefore grows linearly with the number of ISI terms. Excellent tutorials on several computationally efficient methods to calculate Pe for ISI channels can be found in [54]–[57].
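When only a few ISI terms are significant, the probability in (2.12) can also be evaluated directly. The sketch enumerates every combination of the neighboring bits and averages the resulting Gaussian tail probabilities. It assumes unipolar equiprobable bits, a threshold of 0.5, and white Gaussian noise of standard deviation σ. Unlike the bounds cited above, its cost grows as 2^M for M ISI terms, so it only suits short pulse responses.

```python
import itertools
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_with_isi(main_cursor, isi_cursors, sigma, threshold=0.5):
    """Average error probability over all 2^M patterns of the M interfering bits."""
    patterns = list(itertools.product((0, 1), repeat=len(isi_cursors)))
    total = 0.0
    for pattern in patterns:
        isi = sum(a * c for a, c in zip(pattern, isi_cursors))
        p_err_zero = q_function((threshold - isi) / sigma)               # '0' sent, sample above threshold
        p_err_one = q_function((main_cursor + isi - threshold) / sigma)  # '1' sent, sample below threshold
        total += 0.5 * (p_err_zero + p_err_one)
    return total / len(patterns)

print(ber_with_isi(1.0, [], 0.1))      # no ISI: reduces to (2.6), Q(5)
print(ber_with_isi(1.0, [0.2], 0.1))   # one postcursor of 0.2 raises the BER
```

With a single ISI term, the four Q-function terms produced by this enumeration have the same structure as the first-order result derived in the next subsection.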

In this section we estimate the error probability by deriving a simple relationship between the BER and the system bandwidth, which also helps in understanding the trade-off between noise and ISI for various system bandwidths. We base our calculations on a first-order linear time-invariant (LTI) system and use the results to estimate Pe in terms of practical system parameters such as bandwidth.

2.3.5.1 First-Order LTI System

If the link has a first-order system response with time constant τ, the received pulse shape can be written as

$$ p_o(t) = \begin{cases} 0, & t \le 0 \\ 1 - e^{-t/\tau}, & 0 \le t \le T_b \\ \left(\dfrac{1}{\alpha} - 1\right) e^{-t/\tau}, & T_b \le t \end{cases} \tag{2.13} $$

where we define α ≡ e^(−Tb/τ), which relates the system time constant to the bit period. The ISI term for a0 can be calculated by substituting (2.13) into (2.8) for m=0:

$$ \mathrm{ISI} = \sum_{k=-\infty}^{-1} a_k \, p_o(T_s - kT_b) = \alpha^{T_s/T_b} \sum_{k=-\infty}^{-1} a_k \,(1-\alpha)\,\alpha^{-k-1} \tag{2.14} $$

where the sum runs only up to k=−1 because the system is assumed to be causal. Ts is the sampling time offset from t=0, with 0 ≤ Ts ≤ Tb. The sampled value of the current symbol, a0, follows from the first term on the right-hand side of (2.8) as

$$ p_o(T_s) = 1 - \alpha^{T_s/T_b}. \tag{2.15} $$

The optimum sampling point for the first-order system is Ts=Tb, because (2.15) reaches its maximum there while the ISI in (2.14), which is proportional to α^(Ts/Tb), reaches its minimum. Equation (2.14) shows that the interference from prior bits decreases exponentially. When only the immediately preceding bit, a-1, has a significant impact, the ISI values are concentrated around two mean values, ISI0 and ISI1. These can be calculated as the expected value of the ISI in (2.14) conditioned on the value of a-1:

$$ \mathrm{ISI}_0 = E\{\mathrm{ISI} \mid a_{-1} = 0\} = \frac{1}{2}\,\alpha^{T_s/T_b + 1} \tag{2.16} $$

$$ \mathrm{ISI}_1 = E\{\mathrm{ISI} \mid a_{-1} = 1\} = \alpha^{T_s/T_b}\left(1 - \frac{\alpha}{2}\right) \tag{2.17} $$
where E{·} denotes the expected value. The two mean ISI terms perturb the amplitude at the sampling point. Because of the stochastic nature of the data, the ISI at any sampling point can be modeled as a random variable. Its distribution can then be represented by a probability mass function consisting of two delta functions at ISI0 and ISI1, with probability weights p and (1-p), respectively, where p is the probability that a-1=0. The overall amplitude distribution is found by convolving the ISI distribution with the Gaussian noise distribution, as shown in Figure 2.14.

Figure 2.14: Total amplitude distribution at the sampling point when the ISI impact of one bit is taken into account

The optimum slicing threshold, VTH, can be calculated as the average of the four possible mean signal levels in Figure 2.14 (ISI0, ISI1, po(Ts)+ISI0, and po(Ts)+ISI1), which simplifies to VTH=0.5, independent of Ts. If the receiver input noise is white with double-sided power spectral density N0/2, the noise at the sampling point is shaped by the first-order system transfer function, and the total noise power is

$$ \sigma^2 = \int_{-\infty}^{\infty} \frac{N_0/2}{1 + \tau^2 \omega^2}\, df = \frac{N_0}{4\tau}, \qquad \omega = 2\pi f. \tag{2.18} $$
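A quick numerical check of (2.18) also shows how the noise relates to the 3dB bandwidth. Since f-3dB = 1/(2πτ) for a first-order system, σ² = N0/(4τ) = (N0/2)·π·f-3dB, so the noise grows linearly with bandwidth. The sketch integrates the shaped noise density over a wide but finite band; the values of N0 and τ are hypothetical, with time normalized to Tb.

```python
import numpy as np

def noise_variance_numeric(n0, tau, f_max_factor=1e4, n_points=400_001):
    """Integrate (N0/2) / (1 + (2 pi f tau)^2) over |f| <= f_max (trapezoidal rule)."""
    f_max = f_max_factor / tau
    f = np.linspace(-f_max, f_max, n_points)
    integrand = (n0 / 2.0) / (1.0 + (2.0 * np.pi * f * tau) ** 2)
    return np.sum((integrand[1:] + integrand[:-1]) * np.diff(f)) / 2.0

n0, tau = 6e-3, 0.25            # hypothetical values, time normalized to Tb
f3db = 1.0 / (2.0 * np.pi * tau)
print(noise_variance_numeric(n0, tau), n0 / (4.0 * tau), n0 * np.pi * f3db / 2.0)
```

The small shortfall of the numerical value comes from truncating the integral at a finite f_max.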

Similar to Section 2.3.2, we can now calculate the total BER as

$$ \mathrm{BER} = \frac{1}{4}\left[ Q\!\left(\frac{0.5 - \mathrm{ISI}_0}{\sigma}\right) + Q\!\left(\frac{0.5 - \mathrm{ISI}_1}{\sigma}\right) + Q\!\left(\frac{p_o(T_s) + \mathrm{ISI}_0 - 0.5}{\sigma}\right) + Q\!\left(\frac{p_o(T_s) + \mathrm{ISI}_1 - 0.5}{\sigma}\right) \right] \tag{2.19} $$

which can be evaluated for different sampling times using (2.15)–(2.18). Figure 2.15 compares the BER at various signal-to-noise ratios (SNR) in the zero-ISI case of (2.6) with the BER of the ISI channel from (2.19), when both systems have the same noise bandwidth, i.e., equal σ. The BER curves for the ISI channel are plotted for various ratios of the 3dB bandwidth to the bit rate (BW/BR). The figure shows that ISI degrades the link performance at large SNR values, where ISI dominates over noise. Also, as the bandwidth-to-bit rate ratio decreases, the BER degrades further.

Figure 2.15: The BER of equation (2.19) vs. SNR for normalized bandwidths BW/BR=0.5 and 0.75, compared to the zero-ISI BER, sampled at the optimum point, i.e., Ts=Tb

Figure 2.16: ISI and noise trade-off as the normalized bandwidth (f-3dB/Bit Rate) varies, justifying the existence of a minimum BER (N0 = 5e-3, 6e-3, 7e-3, and 8e-3 V²/Hz)

Figure 2.16 relates the BER to the system 3dB bandwidth, f-3dB, for various noise power spectral densities, when the signal is sampled at the optimum point, i.e., Ts=Tb. Evidently, at very small f-3dB the ISI is severe and limits the BER. However, as the bandwidth becomes excessively large, the noise power injected into the receiver becomes the dominant contributor to link-quality degradation and causes a higher BER. Consequently, there is a trade-off between the system noise and the ISI impact, and an optimum bandwidth exists that minimizes the BER. For the first-order system, the optimum bandwidth is around 40% of the bit rate when Ts=Tb. In typical wireline link architectures, the sampling clock is placed in the middle of the eye, at Tb/2. Although this results in simple hardware implementations, Tb/2 is not necessarily the optimum sampling point. The BER when sampling occurs in the middle of the eye, i.e., Ts=Tb/2, is plotted in Figure 2.17. The same trade-off exists between the noise and the ISI. However, the optimum bandwidth is now around 70% of the bit rate, which agrees with the well-known optimum bandwidth-to-bit rate ratio for best sensitivity in broadband receivers [36].

Figure 2.17: The optimum bandwidth for minimum BER when the sampling point is in the middle of the eye at Tb/2 (N0 = 3e-3, 4e-3, 5e-3, and 6e-3 V²/Hz)
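The procedure behind Figures 2.16 and 2.17 can be sketched by evaluating (2.15)–(2.19) on a grid of bandwidths and picking the minimum. The sketch assumes time normalized to Tb and frequency to the bit rate, so that N0 is expressed in the same normalized units; this is an assumption about the figures' normalization, and the printed values are not guaranteed to match the plotted curves exactly.

```python
import math
import numpy as np

def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def first_order_ber(f3db, ts, n0, tb=1.0):
    """Equations (2.15)-(2.19) for a single-pole link (f3db normalized to the bit rate)."""
    tau = 1.0 / (2.0 * math.pi * f3db)
    alpha = math.exp(-tb / tau)
    s = ts / tb
    p_ts = 1.0 - alpha ** s                        # (2.15)
    isi0 = 0.5 * alpha ** (s + 1.0)                # (2.16)
    isi1 = alpha ** s * (1.0 - alpha / 2.0)        # (2.17)
    sigma = math.sqrt(n0 / (4.0 * tau))            # (2.18)
    return 0.25 * (q_function((0.5 - isi0) / sigma) + q_function((0.5 - isi1) / sigma)
                   + q_function((p_ts + isi0 - 0.5) / sigma)
                   + q_function((p_ts + isi1 - 0.5) / sigma))   # (2.19)

def optimum_bandwidth(ts, n0, grid=np.arange(0.10, 1.51, 0.01)):
    """Grid search for the f-3dB (relative to the bit rate) that minimizes (2.19)."""
    bers = [first_order_ber(f, ts, n0) for f in grid]
    best = int(np.argmin(bers))
    return grid[best], bers[best]

for ts in (1.0, 0.5):
    f_opt, ber_min = optimum_bandwidth(ts, n0=6e-3)
    print(f"Ts = {ts:.1f} Tb: optimum f-3dB = {f_opt:.2f} x bit rate, "
          f"log10(BER) = {math.log10(ber_min):.1f}")
```

Moving the sampling point from Tb to Tb/2 shifts the optimum toward a wider bandwidth, the same trend seen between Figure 2.16 and Figure 2.17.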

Comparing Figure 2.16 and Figure 2.17 also shows that, for equal input noise, the location of the sampling point affects the minimum achievable BER. In fact, the plot in Figure 2.16 can be reproduced for every possible sampling point, and the optimum sampling point is the one whose plot yields the smallest minimum-achievable BER. The same plot also determines the optimum receiver bandwidth. We will elaborate on this topic for link design when we add the effect of jitter to the BER, analytically deriving two-dimensional BER contours that allow the designer to determine the optimum bandwidth and the optimum sampling point simultaneously to minimize the BER.

2.3.5.2 Second-Order LTI System

One characteristic of a first-order system is its monotonic step response. In real systems the step response can have an oscillatory tail, e.g., because of multiple electromagnetic reflections. The relationship between the response of such a system and the BER can be modeled by studying the link BER of an under-damped second-order LTI system. The transfer function of an all-pole second-order LTI system can be written as

$$ H(s) = \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{2.20} $$

where ζ is the damping factor and ωn is the natural frequency. The step response of an under-damped system, i.e., when ζ<1, is

$$ s(t) = 1 - \frac{1}{\sqrt{1-\zeta^2}}\, e^{-\zeta\omega_n t} \sin\!\left(\omega_n\sqrt{1-\zeta^2}\, t + \cos^{-1}\zeta\right) \tag{2.21} $$

and the system 3dB bandwidth is

$$ f_{-3dB} = \frac{\omega_n}{2\pi}\sqrt{1 - 2\zeta^2 + \sqrt{1 + (1 - 2\zeta^2)^2}}. \tag{2.22} $$
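Equations (2.21) and (2.22) translate directly into code. The sketch below inverts (2.22) to find the natural frequency for a desired f-3dB and builds the unit-pulse response as s(t) − s(t − Tb). It then checks that |H| at the computed f-3dB equals 1/√2. Time is normalized to Tb, and the helper names are illustrative.

```python
import math
import numpy as np

def f3db_second_order(wn, zeta):
    """Equation (2.22)."""
    x = 1.0 - 2.0 * zeta ** 2
    return wn / (2.0 * math.pi) * math.sqrt(x + math.sqrt(1.0 + x ** 2))

def wn_for_f3db(f3db, zeta):
    """Natural frequency that places the 3dB point at f3db (inverts (2.22))."""
    return f3db / f3db_second_order(1.0, zeta)

def step_response(t, wn, zeta):
    """Equation (2.21) for an under-damped system (zeta < 1); zero for t < 0."""
    t = np.asarray(t, dtype=float)
    wd = wn * math.sqrt(1.0 - zeta ** 2)
    s = 1.0 - np.exp(-zeta * wn * t) * np.sin(wd * t + math.acos(zeta)) / math.sqrt(1.0 - zeta ** 2)
    return np.where(t < 0.0, 0.0, s)

def pulse_response(t, wn, zeta, tb=1.0):
    """Response to a unit pulse of width Tb: s(t) - s(t - Tb)."""
    return step_response(t, wn, zeta) - step_response(np.asarray(t) - tb, wn, zeta)

zeta = 0.5
wn = wn_for_f3db(0.7, zeta)                     # f-3dB = 0.7 x bit rate, Tb = 1
w = 2 * math.pi * 0.7
h = wn ** 2 / ((1j * w) ** 2 + 2 * zeta * wn * (1j * w) + wn ** 2)   # (2.20) at s = j*w
print(abs(h))                                   # about 0.7071, confirming (2.22)
print(np.round(pulse_response(np.arange(1, 8), wn, zeta), 3))        # samples at t = kTb
```

The undershoot visible in the pulse samples for ζ = 0.5 is the oscillatory tail that distinguishes this case from the first-order system.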

Figure 2.18 shows the pulse response of a second-order system for two values of ζ. For each value, the pulse response is plotted for four different values of f-3dB. We can follow the same procedure as in Section 2.3.5.1 to find the BER equation, including in the calculation all the ISI terms in (2.12) that have a significant impact on the BER. In addition, for every pair of ζ and f-3dB, the BER can be calculated at several sampling points within the unit interval. The trade-off between noise and ISI is also present in the second-order system. This trade-off, and hence the existence of an optimum system bandwidth that minimizes the BER, can be seen by plotting the BER vs. f-3dB. In Figure 2.19, we plot BER contours that show the BER vs. f-3dB at various sampling points for three values of ζ. Cross sections of the contours at constant Ts, i.e., lines parallel to the y-axis, show the noise-ISI trade-off. The optimum f-3dB that results in the minimum BER occurs around 70% of the bit rate. Furthermore, the contours can be used to select the sampling point that results in the minimum achievable BER.

Figure 2.18: Pulse response of a second-order system at various normalized 3dB bandwidths (f-3dB/bit rate = 0.3, 0.5, 0.75, and 1): (a) ζ=0.5 (b) ζ=√2/2
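A BER map like Figure 2.19 can be sketched by combining the second-order pulse response with an exhaustive enumeration of the significant prior bits. The sketch makes several assumptions. Time is normalized to Tb, so f-3dB is relative to the bit rate. The threshold is 0.5, which equals the mean of all signal levels because the pulse samples sum to one. The noise variance uses the closed form (N0/2)·∫|H|²df = N0·ωn/(8ζ) for this all-pole filter. Only eight prior bits are kept. The printed location of the minimum is therefore illustrative and is not expected to reproduce the figure exactly.

```python
import itertools
import math
import numpy as np

def q_function(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def step_response(t, wn, zeta):
    """Equation (2.21), zero for t < 0 (under-damped, zeta < 1)."""
    if t < 0.0:
        return 0.0
    wd = wn * math.sqrt(1.0 - zeta ** 2)
    return 1.0 - math.exp(-zeta * wn * t) * math.sin(wd * t + math.acos(zeta)) / math.sqrt(1.0 - zeta ** 2)

def wn_for_f3db(f3db, zeta):
    """Invert (2.22) for the natural frequency."""
    x = 1.0 - 2.0 * zeta ** 2
    return 2.0 * math.pi * f3db / math.sqrt(x + math.sqrt(1.0 + x ** 2))

def ber(f3db, ts, zeta, n0, n_isi=8):
    """BER of unipolar NRZ through the second-order system (Tb = 1, threshold 0.5)."""
    wn = wn_for_f3db(f3db, zeta)

    def pulse(t):
        return step_response(t, wn, zeta) - step_response(t - 1.0, wn, zeta)

    main = pulse(ts)
    cursors = [pulse(ts + k) for k in range(1, n_isi + 1)]   # prior bits only (causal)
    sigma = math.sqrt(n0 * wn / (8.0 * zeta))
    total = 0.0
    for pattern in itertools.product((0, 1), repeat=n_isi):
        isi = sum(a * c for a, c in zip(pattern, cursors))
        total += q_function((0.5 - isi) / sigma) + q_function((main + isi - 0.5) / sigma)
    return total / 2 ** (n_isi + 1)

zeta, n0 = math.sqrt(2.0) / 2.0, 4e-3
bandwidths = np.arange(0.3, 1.51, 0.05)
sample_points = np.arange(0.1, 1.0, 0.1)
log_ber = np.array([[math.log10(ber(f, ts, zeta, n0)) for ts in sample_points] for f in bandwidths])
i, j = np.unravel_index(np.argmin(log_ber), log_ber.shape)
print(f"minimum log10(BER) {log_ber[i, j]:.1f} at f-3dB = {bandwidths[i]:.2f}, Ts = {sample_points[j]:.1f} Tb")
```

Passing `log_ber` to a contour-plotting routine gives a map of the same kind as Figure 2.19, with sampling point on one axis and normalized bandwidth on the other.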

Similar calculations can be performed for any linear time-invariant (LTI) system whose pulse response is available. A BER relationship that includes the ISI impact can be derived as in (2.19), providing insight into the relationship between the system response and the BER. One contribution of this thesis, presented in Chapter 3, is to use a similar approach to find a general relationship between the system response and the data-dependent jitter (DDJ), in order to minimize DDJ and improve link reliability.

Figure 2.19: The contours of log10[BER] for N0=4e-3 V²/Hz, plotted as f-3dB/Bit Rate vs. Ts/Tb [UI]: (a) ζ=0.4 (b) ζ=0.5 (c) ζ=√2/2

2.4 Wireline Communication Transceiver


2.4.1 General Architecture

Although wireline communication has several applications, as discussed in Section 1.2, the general transceiver architecture used to implement them is more or less the same. Table 2.1 lists some of the wireline standards developed for 10Gb/s communication in various applications. Differences such as transmission distance or power consumption affect design parameters such as the channel type, number of repeaters, gain budget, and jitter budget. Figure 2.20 illustrates the general architecture of a wireline communication transceiver, also known as a serial link, which can be applied to any of the standards in Table 2.1.

Table 2.1: Various high-speed wireline communication standards

Standard                      Bit Rate    BER      Application            Channel
10 Gigabit Ethernet family    10Gb/s      10^-12   LAN, backplane         Copper or MMF/SMF*
10 Gigabit Fibre Channel      10Gb/s      10^-12   SAN                    MMF/SMF
SONET OC-192, OC-768          10/40Gb/s   10^-12   Long-haul telecomm.    SMF

*MMF: multi-mode fiber; SMF: single-mode fiber

On the transmit side, low-speed data arrives at the multiplexer, which serializes the parallel data into a single high-speed serial data sequence synchronous to the transmitter clock. The driver either directly sources the data to the electrical transmission line or drives an optical modulator that modulates the data onto optical pulses, which are then transmitted over the fiber-optic channel. In both cases, several channel impairments degrade the quality of the high-speed signal until it arrives at the receiver. The degraded signal is amplified by a low-noise wideband pre-amplifier. Then, the equalizer filter partially restores the signal by reducing the ISI and increasing the signal-to-noise ratio. Next, a sampling clock is extracted from the signal by a clock recovery phase-locked loop (PLL). The clock is used to sample the received signal to retime and recover the data. The clock is also used in a demultiplexer to deserialize the single data sequence back into the original parallel data lines.

Figure 2.20: General architecture of a serial link (transmitter: n-bit Mux, Driver, and Tx PLL; electrical or optical channel; receiver: Pre-Amp, Equalizer, Clock & Data Recovery, and n-bit DeMux)

If the channel is an optical fiber, the driver is connected to an optical modulator that modulates a laser source with the input data. On the receive side of an optical link, the light is incident on a photodetector that generates an output electrical current proportional to the received optical power. Therefore, the pre-amplifier is a trans-impedance amplifier with a small input impedance. The focus of this thesis is on the receiver side of the architecture.

2.4.2 Channel

The channel can be electrical or optical. The simplest electrical channel is unshielded twisted-pair copper wire, such as Category 5 (CAT5) cable, which consists of four twisted pairs and is commonly used in 10/100Mb/s Ethernet LANs. Chip-to-chip and backplane communication at multi-Gb/s rates requires channels with less loss at high frequencies, such as coaxial cable or controlled-impedance PCB microstrip or stripline transmission lines. However, the loss of such channels also becomes intolerable over longer distances, and multi-mode fiber (MMF) is deployed for transmission beyond about 100m. The dominant impairment of MMF is modal dispersion, which is caused by the difference in propagation velocity among the various excited optical modes, as discussed in Section 2.3.3. Since MMF and the electrical channels discussed above mainly induce linear distortion on the signal, they can be modeled as linear systems. Throughout this dissertation, we assume that the channel can be modeled as a linear time-invariant (LTI) system. Therefore, the analytical results contributed by this thesis apply generally to all of the channels above.

2.4.3 Pre-Amplifier

The main function of the pre-amplifier is to amplify the weak received signal to the sensitivity level of the next stage in the receiver. The stages following the pre-amplifier often require a fixed minimum swing at their input, e.g., emitter-coupled logic (ECL) stages. While the required output swing of the pre-amplifier is constant, the amplitude of the input signal can take a wide range of values depending on the transmitted power and the channel attenuation. Therefore, the pre-amplifier needs a wide dynamic range and high gain. In addition, the pre-amplifier should be low-noise to have minimal impact on the signal-to-noise ratio. Figure 2.21 shows an example schematic of a pre-amplifier for an optical link followed by a main-amplifier stage. The main amplifier is either a limiting amplifier (LA) or an automatic gain-controlled (AGC) amplifier that maintains a constant output amplitude. In this example, the pre-amplifier is a trans-impedance amplifier (TIA) with a shunt-feedback configuration. The low input impedance of the TIA absorbs most of the current generated by the photodetector. It also avoids the bandwidth limitation that the large photodetector capacitance would otherwise cause. Designing a TIA with large gain and bandwidth and reasonable sensitivity is challenging, particularly in CMOS technologies, because of their large device parasitics and low unity-current-gain frequency, fT. Chapter 4 discusses this issue and provides a methodology for overcoming these challenges.

Figure 2.21: The front end of an optical communication receiver with the photodetector and a shunt-feedback trans-impedance amplifier (TIA) with feedback resistor Rf, followed by the main amplifier
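To make the bandwidth argument concrete, the sketch below compares a photodetector terminated in a plain resistor with a shunt-feedback TIA under a textbook single-pole model: an ideal inverting core amplifier of gain A with infinite bandwidth, a feedback resistor Rf, and the photodetector capacitance as the only capacitance at the input node. The component values are illustrative assumptions, not design values from this thesis.

```python
import math

def resistive_front_end(r_load, c_in):
    """Photodiode terminated in a plain resistor: gain R, one pole at the input."""
    gain = r_load                                # transimpedance [ohm]
    f3db = 1.0 / (2.0 * math.pi * r_load * c_in)
    return gain, f3db

def shunt_feedback_tia(rf, a, c_in):
    """Ideal inverting core amplifier (gain -a, infinite bandwidth) with
    feedback resistor rf. The input resistance drops to rf/(a+1), moving the
    input pole up by (a+1) while the transimpedance stays close to rf."""
    gain = rf * a / (a + 1.0)                    # low-frequency |Z_T| [ohm]
    z_in = rf / (a + 1.0)                        # resistance seen by the photodiode
    f3db = 1.0 / (2.0 * math.pi * z_in * c_in)
    return gain, z_in, f3db

# Illustrative (assumed) values: 500 fF photodiode, Rf = 1 kOhm, core gain 10
C_PD, RF, A = 500e-15, 1e3, 10.0
g_r, bw_r = resistive_front_end(RF, C_PD)
g_t, zin_t, bw_t = shunt_feedback_tia(RF, A, C_PD)
print(f"resistor load: {g_r:.0f} ohm, {bw_r / 1e9:.2f} GHz")
print(f"shunt-feedback TIA: {g_t:.0f} ohm, Zin = {zin_t:.1f} ohm, {bw_t / 1e9:.2f} GHz")
```

For nearly the same transimpedance, the input pole moves up by a factor of (A+1). This relies on the core amplifier being much faster than the input pole; with a finite-bandwidth core the response becomes second order, which is where the fT limitation mentioned above enters.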

2.4.4 Adaptive Equalizer

We elaborated on the need for adaptive equalization in high-speed links in Section 2.3.4. The channels we consider in this dissertation, such as electrical transmission lines or MMF, impose only linear distortion and can be modeled as LTI systems. Therefore, in most implementations, a feedforward adaptive equalizer suffices to minimize the ISI imposed by the channel. In Chapter 5, we discuss a contribution of this thesis that proposes a new architecture for adaptive equalization based on an eye-opening monitor system.
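As an illustration of how a feedforward equalizer reduces ISI, the sketch below samples the pulse response of a first-order channel once per bit and passes it through a two-tap symbol-spaced FIR filter. A first-order pulse response rises as 1-e^(-t/τ) during the bit and then decays exponentially, so its postcursors form a geometric sequence with ratio r = e^(-Tb/τ); the taps [1, -r] are chosen by hand to cancel that tail. This is a hypothetical fixed-tap example (the helper names and the 0.25 bandwidth ratio are assumptions), not the adaptive architecture of Chapter 5, and it ignores noise enhancement.

```python
import math

def first_order_pulse_samples(tau, ts, n, tb=1.0):
    """Samples p[k] = p(ts + k*tb) of a first-order system's response to a
    unit-amplitude pulse of width tb (0 < ts <= tb)."""
    p = [1.0 - math.exp(-ts / tau)]              # main cursor, still rising
    for k in range(1, n):
        t = ts + k * tb                           # pulse has ended: pure decay
        p.append((1.0 - math.exp(-tb / tau)) * math.exp(-(t - tb) / tau))
    return p

def fir(x, taps):
    """Symbol-spaced FIR (feedforward) filter: y[k] = sum_j c[j] * x[k-j]."""
    return [sum(c * x[k - j] for j, c in enumerate(taps) if 0 <= k - j < len(x))
            for k in range(len(x) + len(taps) - 1)]

def peak_distortion(q, main=0):
    """Sum of |ISI| terms relative to the main cursor."""
    return sum(abs(v) for i, v in enumerate(q) if i != main) / abs(q[main])

tau = 1.0 / (2.0 * math.pi * 0.25)   # f-3dB = 0.25 x bit rate (assumed), Tb = 1
r = math.exp(-1.0 / tau)             # ratio between successive postcursors
for ts in (0.75, 1.0):
    p = first_order_pulse_samples(tau, ts, 40)
    q = fir(p, [1.0, -r])
    print(f"Ts={ts}: ISI before {peak_distortion(p):.3f}, after {peak_distortion(q):.3f}")
```

At Ts=Tb the geometric tail starts at the main cursor and is cancelled completely; at other sampling points one residual postcursor remains. An adaptive equalizer would instead adjust such taps from an error or eye-quality measure.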

2.4.5 Clock Recovery

As can be seen from Figure 2.3, a random 2-PAM NRZ data sequence has spectral nulls at the bit-rate frequency and its integer multiples. Therefore, the received signal does not contain any direct component of the timing information from the transmitter clock. In addition, the signal has traveled over a channel of arbitrary length, which causes an unknown delay, or phase, of the signal at the receiver. In a symbol-detection-based scheme, a synchronous clock is required to sample each symbol at an optimum sampling point to recover the data. Therefore, such systems need a synchronization technique, i.e., clock recovery.
Figure 2.22: PLL-based clock recovery architecture (phase detector, loop filter, and VCO; input: data; output: clock)

Clock recovery methods for communication applications can be categorized into two groups: feedforward and feedback clock recovery [41]. Feedforward methods generally consist of a nonlinear element applied to the signal to generate spectral lines at the clock frequency, followed by a very high-quality bandpass filter to extract the clock. The nonlinearity can take many forms, e.g., derivative [58][60] or square law [59]. It is very costly to integrate a high-quality bandpass filter at 10Gb/s [60]. Therefore, feedforward techniques are rarely deployed in high-speed wireline communication.
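Both observations above can be checked numerically. The sketch below assumes NumPy and uses a discrete first-order low-pass as a stand-in for the band-limited link; the oversampling factor, bandwidth, record length, and seed are arbitrary choices. It shows that a random NRZ sequence has no power at the bit-rate bin, while a squared-derivative nonlinearity (one of the forms cited above) produces a strong discrete line there that a narrow bandpass filter could extract.

```python
import numpy as np

SPB = 16        # samples per bit (assumed oversampling factor)
NBITS = 4096    # number of random bits in the record
rng = np.random.default_rng(1)

bits = rng.integers(0, 2, NBITS)
x = np.repeat(2.0 * bits - 1.0, SPB)        # random 2-PAM NRZ waveform, levels +/-1

# Band-limit with a discrete first-order low-pass (stand-in for channel + front end)
alpha = 1.0 - np.exp(-2.0 * np.pi * 0.5 / SPB)   # pole near 0.5 x bit rate
y = np.empty_like(x)
state = 0.0
for n, v in enumerate(x):
    state += alpha * (v - state)
    y[n] = state

# Nonlinearity: squared sample-to-sample derivative of the band-limited signal
d = np.diff(y, prepend=y[0]) ** 2
d -= d.mean()

def line_to_floor_ratio(sig):
    """DFT power in the bit-rate bin divided by the median power of nearby bins."""
    spec = np.abs(np.fft.rfft(sig)) ** 2
    k = NBITS                                    # bin of f = 1/Tb (record = NBITS bits)
    nearby = np.r_[spec[k - 64:k - 1], spec[k + 2:k + 65]]
    return spec[k] / np.median(nearby)

print(f"NRZ data:           {line_to_floor_ratio(x):.1e}")
print(f"squared derivative: {line_to_floor_ratio(d):.1e}")
```

The NRZ ratio is at the level of floating-point roundoff, whereas the squared derivative concentrates power in the bit-rate bin far above the surrounding continuous spectrum.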

Feedback clock recovery is based on a phase-locked loop (PLL) structure. A simplified architecture is shown in Figure 2.22. A voltage-controlled oscillator (VCO) generates the required sampling clock. The frequency and phase of the VCO are controlled by the output of the loop filter to track and minimize the error between the data transition phase and the clock. While the PLL-based clock recovery acquires the input data phase and locks to it to keep the sampling phase at the optimum point, it also partially filters high-frequency timing variations of the data transitions when retiming the data. This feature is desirable in data repeaters and regenerators, e.g., in a SONET architecture, because it avoids the accumulation of timing jitter, i.e., excessive timing deviation from the ideal threshold-crossing points. We discuss the problems arising from timing jitter and its impact on link reliability next.

2.5 Timing Jitter


2.5.1 Timing Jitter Definition

In a perfect transmission using 2-PAM NRZ, the data transitions, i.e., “01” or “10” occurrences, cross the decision threshold at integer multiples of the bit period, Tb. Because of several causes, e.g., random noise and ISI, the actual threshold-crossing times of data transitions deviate from their ideal values, as shown in Figure 2.23. The timing jitter of the data is the deviation of its threshold-crossing times from a reference time at a defined threshold [61]. We will show in the next section how the data timing jitter impacts link reliability and increases the BER. Because the sources of jitter are random, timing jitter is modeled by a random variable and is thus characterized by a distribution. The jitter distribution is then used to find its impact on the BER. Figure 2.24 shows an accumulated eye diagram of a data sequence measured with an oscilloscope, superimposed onto the jitter histogram. The histogram is generated by capturing and accumulating the times of all threshold-crossing events, and it approximates the jitter distribution.

Figure 2.23: Jitter is the deviation of the threshold-crossing time from a reference time (waveform crossing the threshold VTH near multiples of the bit period Tb; jitter in seconds)

Figure 2.24: Accumulated eye diagram with data jitter histogram

2.5.2 Jitter Impact on the BER

In the calculations of Section 2.3.2 and Section 2.3.5, we made two implicit assumptions. First, we assumed the sampling clock is ideal, i.e., all clock periods equal Tb and the clock does not have any timing jitter that randomly moves the sampling point. Second, we assumed that all the transmitted bits are sampled and bits are never lost or sampled twice. The presence of data jitter violates these two assumptions and increases the BER.

Figure 2.25 shows the clock and data recovery stage of the high-speed receiver front end right before sampling. Data jitter at the input of the clock and data recovery impacts the BER in two ways. First, the data jitter decreases the horizontal opening of the eye diagram: for a given BER, larger data jitter leaves a smaller sampling window in the eye diagram that achieves that target BER. In other words, for a fixed sampling point, the BER increases as the data jitter increases because of the errors induced by the jitter.

The BER due to jitter can be calculated from the area under the tail of the jitter distribution, as illustrated in Figure 2.26. This area corresponds to the data transitions that happen on the wrong side of the sampling clock; such an event is called bit slipping. The errors due to jitter are independent of errors caused by amplitude noise or ISI. If we assume an ideal sampling clock, zero amplitude noise, and zero ISI, we can calculate the BER caused only by jitter as

$$BER(T_s) = \frac{1}{2}\int_{T_s}^{\infty} f_{TJ}(t)\,dt + \frac{1}{2}\int_{-\infty}^{-T_b+T_s} f_{TJ}(t)\,dt \tag{2.23}$$

where fTJ(t) is the probability distribution function of the total jitter from all sources and Ts is the location of the sampling clock in the eye. We assume the sampling point varies within a unit interval (UI), i.e., 0 ≤ Ts ≤ Tb. The factor 1/2 represents the probability of a transition event. If the sampling point is in the middle of the eye at Ts=Tb/2 and the jitter distribution is symmetric about zero, (2.23) simplifies to

$$BER = \int_{T_b/2}^{\infty} f_{TJ}(t)\,dt . \tag{2.24}$$

Although the probability of a data transition event on each side of the eye diagram in Figure 2.26 is one half, both sides of the eye can independently contribute to the BER by adding an error, as indicated by the area under the tail of both distributions.
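For a concrete case, the sketch below evaluates (2.23) in closed form when the total jitter is assumed to be zero-mean Gaussian with standard deviation σj (the Gaussian choice is an assumption for illustration; (2.23) itself holds for any fTJ). Both integrals then reduce to Gaussian tail functions Q(·), and at Ts=Tb/2 the two halves are equal, reproducing (2.24). Times are in UI (Tb=1), and the function names are hypothetical.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_jitter_only(ts, sigma_j, tb=1.0):
    """Eq. (2.23) for zero-mean Gaussian jitter f_TJ with std sigma_j.

    First term: the transition just before the sample lands after ts.
    Second term: the next transition (ideal time tb) lands before ts.
    """
    late = q_func(ts / sigma_j)              # integral of f_TJ from ts to +inf
    early = q_func((tb - ts) / sigma_j)      # integral of f_TJ from -inf to ts - tb
    return 0.5 * late + 0.5 * early

sigma_j = 0.05                               # UI, as in Figure 2.27
center = ber_jitter_only(0.5, sigma_j)
print(center, q_func(0.5 / sigma_j))         # (2.24): both equal Q(Tb / (2 sigma_j))
print(ber_jitter_only(0.3, sigma_j))         # off-center: dominated by the late tail
```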

Figure 2.25: Impact of data jitter on BER from the data path (data recovery) and the clock path (phase detector, loop filter, and VCO)

Figure 2.26: Impact of the data jitter on the BER by causing bit slipping (example bit sequence 0 0 1 1 1 1 0 1 1 1 0 0, jitter distribution fTJ(t) at each transition, and sampling clock at Tb/2)

The second impact of jitter on the BER comes from the uncertainty of the sampling clock. Data jitter acts as reference noise for the clock recovery PLL, and therefore it creates clock jitter at the output of the clock recovery. Because of the inherently slow response of the clock recovery to input fluctuations, the clock jitter at the time of sampling is uncorrelated with the data that it samples. Therefore, the clock jitter degrades the BER further. In the next section, we combine the results of this section with Section 2.3.5 to find the overall impact of impairments such as noise, ISI, and jitter on the link BER. In the next chapter, we break down the total jitter into its components and present an analytical model for data-dependent jitter, which is caused by ISI. We also highlight the impact of data-dependent jitter on the BER.

2.6 Overall Impact of Jitter and ISI on the BER


In Section 2.3.5 and Section 2.5.2 we studied the impact of noise, ISI, and jitter on the
BER independently. In a real implementation of a high-speed link all of these impairments
exist and degrade the BER simultaneously. We combine the impact of all those effects in
this section.

2.6.1 Ideal Sampling Clock

When the sampling clock is ideal, i.e., all the clock cycles are exactly one bit period, a detection error may occur due to the combination of noise and ISI. We calculated the BER for such a case in (2.19). This BER is a function of the sampling time. The minimum BER is achieved when sampling at the optimum point, which is not necessarily at Tb/2. If the sampling point is moved away from the optimum sampling point, the ISI terms given by (2.16) and (2.17) change and thus the BER increases. However, the general form of the BER remains the same as in (2.19). We denote the BER caused by the amplitude noise and the ISI by BERISI(Ts), as it is a function of the sampling point, Ts. We can rewrite (2.19) as

$$BER_{ISI}(T_s) = \frac{1}{2}\left[BER\big(ISI_0(T_s)\big) + BER\big(ISI_1(T_s)\big)\right] \tag{2.25}$$

where we have defined

$$BER\big(ISI_0(T_s)\big) \equiv Q\left(\frac{0.5 - ISI_0}{\sigma}\right) = Q\left(\frac{p_o(T_s) + ISI_1 - 0.5}{\sigma}\right) \tag{2.26}$$

$$BER\big(ISI_1(T_s)\big) \equiv Q\left(\frac{0.5 - ISI_1}{\sigma}\right) = Q\left(\frac{p_o(T_s) + ISI_0 - 0.5}{\sigma}\right) . \tag{2.27}$$

In the presence of timing jitter, the relative location of the sampling point and the threshold crossing of the data changes randomly. In other words, if we assume that the timing jitter, ∆t, is a random variable with zero mean, the sampling point for the bit that is sampled right after a transition is Ts−∆t. Therefore, the BER for such a bit for a given ∆t is BERISI(Ts−∆t). The overall BER for such bits is found by integrating this BER over all possible values of ∆t, weighted by the jitter probability distribution function. For instance, for a “01” transition, the overall BER from the combined effect of noise, ISI, and timing jitter is

$$BER(T_s \mid \text{“01”}) = \frac{1}{2}\int_{-\infty}^{\infty} f_{TJ}(t)\,BER_{ISI}(T_s - t)\,dt + \frac{1}{2}\int_{-\infty}^{\infty} f_{TJ}(t)\,BER_{ISI}(T_b - T_s + t)\,dt . \tag{2.28}$$

The two terms on the right in (2.28) correspond to the cases in which bit “1” and bit “0” in the “01” pair are in error, respectively. Notice that when ISI and noise are absent and only timing jitter is present, we have

$$BER_{ISI}(t) = \begin{cases} 0, & t \ge 0 \\ 1, & t < 0 \end{cases} \tag{2.29}$$

Therefore, (2.28) simplifies to (2.23). Furthermore, if timing jitter is absent, the jitter distribution is a delta function concentrated at t=0; substituting it into (2.28) reduces the equation to BERISI(Ts).
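As a numerical sanity check of the first reduction above, the sketch below integrates (2.28) with the step-shaped BERISI of (2.29) and a Gaussian jitter PDF, then compares the result with the closed-form jitter-only BER of (2.23). The Gaussian PDF, the integration range, and the grid size are assumptions for illustration, and the helper names are hypothetical; the same routine accepts any other BERISI(·) callable.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gaussian_pdf(t, sigma):
    return math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def ber_isi_step(t):
    """Eq. (2.29): no noise or ISI, so an error occurs only for t < 0."""
    return 1.0 if t < 0 else 0.0

def ber_01(ts, pdf, ber_isi, tb=1.0, lo=-1.0, hi=1.0, n=40001):
    """Eq. (2.28) evaluated with the trapezoidal rule over t in [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        t = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0       # trapezoid end weights
        f = pdf(t)
        total += w * (0.5 * f * ber_isi(ts - t) + 0.5 * f * ber_isi(tb - ts + t))
    return total * h

sigma_j, ts = 0.1, 0.3                            # UI (assumed values)
numeric = ber_01(ts, lambda t: gaussian_pdf(t, sigma_j), ber_isi_step)
closed = 0.5 * q_func(ts / sigma_j) + 0.5 * q_func((1.0 - ts) / sigma_j)
print(numeric, closed)                            # agree to within the grid error
```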

We have calculated the approximate overall BER caused by the combined effects of the ISI, noise, and jitter in Appendix A, equation (A.9), which is rewritten here as

$$BER(T_s) = \frac{1}{4}\left( Q\left(\frac{0.5 - ISI_0(T_s)}{\sigma_n}\right) + \int_{-\infty}^{T_s} f_t(t)\, Q\left(\frac{0.5 - ISI_1(T_s - t)}{\sigma_n}\right) dt \right) \cdot \left(1 + Q\left(\frac{T_s - T_b}{\sigma_j}\right)\right) + Q\left(\frac{T_s}{\sigma_j}\right) + Q\left(\frac{T_b - T_s}{\sigma_j}\right) \tag{2.30}$$

Equation (2.30) provides the overall BER as a function of the sampling time and the system response, i.e., ISI0 and ISI1, for the given noise and timing jitter standard deviations.

2.6.2 Non-Ideal Sampling Clock

In reality, the sampling clock has some uncertainty, or jitter, associated with it that can be modeled by a probability distribution function, pdfclk(Ts), where Ts is now a random variable. Then, the total BER can be found by using the continuous total probability theorem [62] as

$$BER = \int_{0}^{T_b} \mathrm{pdf}_{clk}(T_s)\, BER(T_s)\, dT_s \tag{2.31}$$

which simplifies to (2.30) if the clock is ideal with a delta probability distribution function. We have neglected the contributions of the tails of the clock distribution and have bounded it to one bit period. Although (2.31) does not have a closed form for an arbitrary clock distribution, we can use it to numerically compute or simulate the BER and compare the impacts of jitter and ISI. In the rest of this chapter, we assume that the sampling clock is ideal and pdfclk(Ts) is a Dirac delta distribution function.
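A minimal numerical evaluation of (2.31) is sketched below. To keep it self-contained, the jitter-only BER of (2.23) with Gaussian data jitter stands in for BER(Ts) of (2.30), and the clock PDF is assumed to be a Gaussian centered in the eye, truncated to one bit period, and renormalized, matching the bounding described above. All names and parameter values are illustrative assumptions.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_at(ts, sigma_j=0.05, tb=1.0):
    """Stand-in for BER(Ts): jitter-only eq. (2.23) with Gaussian data jitter."""
    return 0.5 * q_func(ts / sigma_j) + 0.5 * q_func((tb - ts) / sigma_j)

def ber_with_clock_jitter(sigma_clk, t_mean=0.5, tb=1.0, n=20001):
    """Eq. (2.31) by the trapezoidal rule over Ts in [0, tb].

    The clock PDF is a Gaussian centred at t_mean, truncated to one bit
    period; dividing by its integral renormalizes it.
    """
    h = tb / (n - 1)
    num = den = 0.0
    for i in range(n):
        ts = i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        pdf = math.exp(-0.5 * ((ts - t_mean) / sigma_clk) ** 2)
        num += w * pdf * ber_at(ts)
        den += w * pdf
    return num / den

for s in (0.005, 0.02, 0.04):
    print(f"sigma_clk = {s} UI: BER = {ber_with_clock_jitter(s):.2e}")
```

For this particular Gaussian, ISI-free case, the result matches Q(Tb/(2·sqrt(σj²+σclk²))): the data and clock jitter variances add, so clock jitter erodes the eye just like additional data jitter.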

2.6.3 ISI and Jitter Trade-off

2.6.3.1 The Bathtub Curve

If the BER caused by the timing jitter in equation (2.23) is plotted versus the sampling point within a unit interval, i.e., for 0 ≤ Ts ≤ Tb, the resulting curve resembles the shape of a bathtub and is thus called a bathtub curve. It graphically demonstrates that as the sampling point approaches the edges of the data eye diagram, the BER increases significantly. An example bathtub curve is shown in Figure 2.27, when the total jitter distribution is Gaussian with zero mean and standard deviation σj=0.05 UI. A unit interval (UI) is one bit period; times expressed in UI are normalized to Tb. The bathtub curve is a useful tool for characterizing high-speed links. It is used to define the eye diagram opening for a given BER. For instance, in Figure 2.27, the eye diagram opening at BER=10^-12 is about 0.3 UI. The eye diagram opening corresponds to the available timing margin for the location of the sampling clock in the eye diagram that can achieve the target BER. Therefore, the bathtub curve can be used as a measure of the trade-off between the data jitter budget of the link, σj, and the clock jitter budget, i.e., the eye opening.

Figure 2.27: Bathtub curve for σj=0.05 UI (log10[BER] from 10^-12 to 10^0 versus Ts from 0 to 1 UI)
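The eye opening read from a bathtub curve can also be computed directly. The sketch below assumes zero-mean Gaussian jitter, as in Figure 2.27, evaluates (2.23) on a fine grid of sampling points, and reports the width of the region where the BER stays at or below the target; the grid size and helper names are illustrative choices.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bathtub(ts, sigma_j, tb=1.0):
    """Jitter-only BER of eq. (2.23) for zero-mean Gaussian jitter (assumed)."""
    return 0.5 * q_func(ts / sigma_j) + 0.5 * q_func((tb - ts) / sigma_j)

def eye_opening(sigma_j, target_ber, tb=1.0, n=100001):
    """Width (same units as tb) of the Ts range where the bathtub is <= target_ber."""
    ok = [i * tb / (n - 1) for i in range(n)
          if bathtub(i * tb / (n - 1), sigma_j, tb) <= target_ber]
    return max(ok) - min(ok) if ok else 0.0

for sigma_j in (0.025, 0.05):
    print(f"sigma_j = {sigma_j} UI: opening at 1e-12 = {eye_opening(sigma_j, 1e-12):.3f} UI")
```

For σj=0.05 UI the computed opening is close to the 0.3 UI read from Figure 2.27, and halving σj roughly doubles it, illustrating the trade-off between the data jitter budget and the timing margin left for the sampling clock.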

2.6.3.2 The BER Contours: 3D Bathtub Curve

We can generalize the concept of the bathtub curve to a data link with noise and ISI. If we use (2.30) to calculate the BER, we can plot a three-dimensional bathtub curve as a function of the sampling time and the system bandwidth, which determines the ISI in the case of a first-order system. Consequently, we gain insight into the trade-offs between the jitter and ISI budget of the data link and the timing margin of the sampling clock. Such trade-offs are important in determining the specifications of the pre-amplifier response and the clock and data recovery characteristics for achieving the minimum BER.

Figure 2.28(a) shows the 3D bathtub curve when the link is modeled with a first-order system. The BER is calculated for various sampling points and normalized 3dB bandwidths, with σj=0.05UI and N0=4e-3 V²/Hz. If N0=0, the cross section of the plot, as the bandwidth approaches infinity, becomes the conventional bathtub curve. Figure 2.28(b) shows the contours of the BER as a function of the sampling point and the bandwidth, which is equivalent to the top view of the 3D bathtub curve. The contours show that the BER becomes independent of bandwidth as the sampling point approaches the data edges, because the timing jitter dominates the BER there. Moreover, the optimum bandwidth that minimizes the BER is about 75% of the bit rate. At this bandwidth, the optimum sampling point is neither in the center of the eye nor at Ts=Tb, the optimum we found for a first-order system in the absence of jitter. Finally, the timing margin for the sampling clock shrinks drastically at smaller system bandwidths.

Figure 2.28: (a) Three-dimensional bathtub curve, log10[BER] versus Ts [UI] and f-3dB/(Bit Rate), for a first-order system at various normalized bandwidths; σj=0.05UI and N0=4e-3 V²/Hz (b) Contours of BER from the top view of plot (a)

Figure 2.29: BER contours versus Ts [UI] and f-3dB/(Bit Rate): (a) σj=0.025UI and N0=4e-3 V²/Hz (b) σj=0.05UI and N0=5e-3 V²/Hz

We can use (2.30) to find the contours of the BER for any given σj and N0. The noise standard deviation, σn, is a function of the bandwidth and N0. Figure 2.29 shows the BER contours for two more cases. In Figure 2.29(a), the BER is plotted for σj=0.025 UI, half the value used in Figure 2.28(b); all the other parameters are the same. The jitter reduction lowers the minimum achievable BER. It also moves the optimum sampling point closer to Ts=Tb, which is the optimum sampling point for a first-order response in the absence of jitter. The optimum bandwidth shifts to lower values as the sampling point moves closer to Ts=Tb. This is because the ISI terms from (2.16) and (2.17) decrease, and amplitude noise dominates BERISI. Therefore, a smaller system bandwidth that filters more of the noise power can achieve a lower BER. Figure 2.29(b) shows the BER contours when N0=5e-3 V²/Hz and all the other parameters are the same as in Figure 2.28(b). The increase in amplitude noise degrades the overall BER by 3-4 orders of magnitude. The optimum ratio of system bandwidth to bit rate is smaller than in Figure 2.28(b), as discussed above, and the sampling timing margin for achieving a BER of 10^-12 has decreased significantly.

The y-axis in Figure 2.29 is the normalized bandwidth only for a first-order system. For a general LTI system, the y-axis is related to the size of the pulse response at each subsequent sampling point, which is in turn determined by the received pulse response resulting from the combination of the channel response and the pre-amplifier transfer function. Therefore, the designer can use the BER contours to determine the optimum front-end time-response shape for achieving a target BER. In addition, the optimum sampling point and its associated timing margin can be obtained from the BER contours for a target BER and used to design the parameters of the clock recovery circuit.

In the next chapter, we introduce the data-dependent jitter (DDJ) phenomenon that is
the impact of the ISI on the threshold-crossing time of the data. The DDJ modifies the
jitter distribution by effectively increasing the jitter variance. In this case, (2.30) is not
sufficient for finding the total BER. We complete the equation for computing the BER by
including the impact of DDJ in our calculations.

2.7 Summary
In this chapter we introduced the principles of wireline communication systems. We
discussed the system level challenges for designing a reliable high-speed communication
link. We studied the impacts of the noise, the ISI, and the timing jitter on the link bit-error
probability and provided the relationships between the system parameters, e.g., bandwidth
and sampling point, and the BER. These relationships are the first step in designing the
system for minimizing the BER. Finally, we combined the effects of the ISI, noise, and
jitter and calculated the overall BER when all of these impairments are present. We used

the result to demonstrate some of the existing trade-offs between system parameters. We
showed how the analytical formulation for the BER can be used to find the system level
specifications for the blocks of the receiver, such as the pre-amplifier and the clock
recovery. In the next chapter we analyze data-dependent jitter (DDJ) and provide an
analytical probability distribution function for it that modifies the timing jitter distribution
we discussed in this chapter. We add the DDJ component to the BER and demonstrate its
remarkable impact on the performance of high-speed serial links.
Chapter 3
Data-Dependent Jitter in Wireline Communications
3.1 Introduction
As we discussed in Chapter 2, the reliability of high-speed serial communication links depends upon timing jitter. The timing jitter of a data transition is the deviation of its threshold-crossing time, i.e., the time at which the data crosses a decision threshold, from a reference clock. The transmitter, the channel, and the receiver all contribute to the timing jitter of the data sequence. In addition, at least part of the timing jitter of the data is inherited as phase uncertainty of the recovered sampling clock in the clock recovery system. The bit error rate (BER) of the regenerated data sequence in the receiver is degraded by the timing jitter of both the data and the sampling clock. Nonidealities such as bandwidth limitation and medium dispersion exacerbate jitter effects.

Data timing jitter is separated into two main categories, namely, random jitter (RJ) and deterministic jitter (DJ) [61]. RJ consists of random variations of the threshold-crossing time due to amplitude fluctuations around the crossing time or phase noise of the transmitter clock [63]. DJ is further categorized into data-dependent jitter (DDJ), duty cycle distortion jitter, and bounded jitter uncorrelated with the data (e.g., crosstalk jitter or sinusoidal jitter) [61]. DDJ consists of the threshold-crossing time deviations of the current data bit that are correlated with the previous bits. It is also known as pattern jitter. DDJ is often caused by bandwidth limitations of the system or electromagnetic reflections of the signal. Therefore, DDJ has a larger impact on high-speed transmission systems with restricted bandwidth. In this chapter, we propose methods for characterizing DDJ theoretically based on system parameters and study its impact on the BER.

The impact of timing jitter on the performance of different communication links has been studied extensively [59],[64]–[69]. However, these works have focused on the effect of the digital pattern on the output jitter of the extracted clock and have neglected the limitations of all other blocks in the communication link. For instance, Byrne et al. investigated the accumulation of timing jitter in a series of regenerators, with special attention to the effect of pattern jitter [65]. However, the analysis is limited to a simple second-order tank as the timing extraction block. Saltzberg estimated the aggregate effect of RJ and DDJ using a Taylor series expansion and calculated the jitter of the extracted sampling clock [66]. Similarly, Gardner compared the effect of pattern jitter on different clock recovery schemes [67]. He presented a relation between DDJ and the sampling clock phase variation with qualitative explanations. Huang proposed pulse shapes that result in DDJ-free data streams [68]. However, he emphasized the peak-to-peak data-dependent jitter and calculated it from the two data sequences that result in the maximum shift of the threshold-crossing time. He also assumed a given form for the received data stream, namely, an ideal non-causal Nyquist pulse. All these works impose several assumptions on the system that generates DDJ. A model for the DDJ generated by a general LTI system is still lacking.

In a different context, jitter modeling techniques have been developed for separating and measuring the jitter performance of devices in communication links [61],[70]–[72]. Reliable jitter measurement methods are more important in high-speed devices, where bandwidth limitations aggravate DDJ. Therefore, predicting the DDJ contribution is essential for accurate measurement systems. For instance, Shimanouchi related the bandwidth of an automatic test equipment (ATE) system to the DDJ [70]. However, his analysis was based on the previous data transition only. In addition, he limited the model to a first-order system response.

Although the significance of DDJ has been recognized in the aforementioned literature, theoretical analysis of DDJ and study of its relation to system parameters such as bandwidth have been neglected. The main contribution of this chapter is a method for predicting data-dependent jitter for a general LTI system in a context suitable for circuit and system designers. The dependence of DDJ on system parameters provides additional insights for minimizing jitter and highlights that increasing the bandwidth does not necessarily minimize DDJ. In addition, the method reduces the simulation or measurement time remarkably, because its DDJ characterization effort grows only linearly with the number of prior bits considered. The conventional computation grows exponentially with the number of bits because it requires passing all possible sequences through the system. The theoretical results are matched with jitter histogram measurements.

In the rest of this chapter, we first define data-dependent jitter formally. Then we derive an analytical expression for the DDJ of first-order LTI systems. The expressions are related to conventional approximations of the distribution of data-dependent jitter, and the results are experimentally verified. Next, we generalize the analysis to any LTI system with a known step response. A perturbation method is introduced that approximates DDJ by separating the jitter contributions of the previous bits. We compare the measured deterministic jitter of real communication media with the analytical expressions that we derive for DDJ and demonstrate that these expressions estimate DDJ accurately and are reliable for predicting jitter. Finally, we update the BER calculation from the previous chapter by accounting for the correlation between DDJ and ISI.

3.2 Framework
3.2.1 Data Jitter

A typical serial communication receiver regenerates data by sampling the received signal. Sampling occurs synchronously with a clock extracted from the received signal. Ideally, the sampling clock should occur between adjacent data transitions to optimize the BER. For a given symbol rate, each threshold-crossing time ideally occurs at an integer multiple of the symbol period. However, it deviates from the ideal value due to several factors in the link (e.g., noise, limited channel bandwidth, or limited receiver front-end bandwidth). Consequently, knowledge of the effect of the system on the data threshold-crossing times and the sampling clock timing is essential for optimizing the BER.

As we discussed in Section 2.5.1, data jitter is the deviation of the data threshold-crossing times from a reference time. The total jitter is modeled as the sum of two independent random variables, random jitter (RJ), ∆trj, and deterministic jitter (DJ), ∆tdj [61]:

$$\Delta t_{tj} = \Delta t_{rj} + \Delta t_{dj} . \tag{3.1}$$

Hence, the total jitter probability distribution function (PDF) is the convolution of the PDFs of RJ and DJ,

$$f_{tj}(\Delta t) = f_{rj}(\Delta t) \otimes f_{dj}(\Delta t) \tag{3.2}$$

where f(∆t) is the PDF of each jitter term.


Random jitter is modeled by a Gaussian random variable. Deterministic jitter has systematic origins such as bandwidth limitation, crosstalk, or power supply noise. In general, it can be modeled as a stochastic process because the transmitted data and the data in neighboring channels are random. Efforts to model the PDF of deterministic jitter are typically based on measurement techniques and numerical computation algorithms [72]–[75]. The distribution function of DJ has previously been modeled as two impulses [61][73], and DJ is characterized by the distance between the two impulses [75]. Figure 3.1(a) illustrates how the total jitter distribution results from the combination of RJ and DJ. Figure 3.1(b) shows a typical measured eye diagram of a received data sequence around the threshold-crossing time; the measured jitter histogram approximates the data jitter distribution in Figure 3.1(a). In this chapter, we analytically study data-dependent jitter, one of the major components of deterministic jitter, and propose methods for characterizing it theoretically based on system parameters. Analytical studies of other sources of deterministic jitter can be found in [21].

Figure 3.1: (a) Distribution of total jitter from the convolution of RJ and DJ PDF, f_dj(∆t) ⊗ f_rj(∆t) = f_tj(∆t) (b) Eye diagram and jitter histogram measurement for a data sequence passed through a microstrip transmission line on FR4 PCB
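As a numerical illustration of (3.2), the sketch below convolves a Gaussian RJ PDF with a two-impulse DJ PDF. It assumes equal impulse weights, times in UI, and the illustrative values σ = 0.05 UI and DJ = 0.2 UI; the function names are hypothetical. The result should coincide with an equal-weight mixture of two Gaussians centered at ±DJ/2, resembling the measured histogram of Figure 3.1(b).

```python
import numpy as np

def gaussian_pdf(t, sigma, mean=0.0):
    """Gaussian PDF, used here for random jitter (RJ)."""
    return np.exp(-0.5 * ((t - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def total_jitter_pdf(sigma_rj, dj, span=1.0, n_half=1000):
    """Convolve a Gaussian RJ PDF with an equal-weight dual-Dirac DJ PDF, as in (3.2).

    Times are in UI. The grid is symmetric with an odd number of points so that
    np.convolve(..., mode="same") keeps t = 0 at the center sample.
    """
    t = np.linspace(-span, span, 2 * n_half + 1)
    dt = t[1] - t[0]
    f_rj = gaussian_pdf(t, sigma_rj)
    f_dj = np.zeros_like(t)
    for center in (-dj / 2.0, dj / 2.0):
        # Each impulse carries probability 1/2 and sits on the nearest grid point.
        f_dj[np.argmin(np.abs(t - center))] += 0.5 / dt
    f_tj = np.convolve(f_rj, f_dj, mode="same") * dt
    return t, f_tj

t, f_tj = total_jitter_pdf(sigma_rj=0.05, dj=0.2)
```

Here the impulses fall exactly on grid points, so the discrete convolution simply shifts and halves the Gaussian; with other DJ values the snapping to the nearest grid point introduces an error of at most half a grid step.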

3.2.2 Data-Dependent Jitter

Data-dependent jitter (DDJ) is the deviation of each data threshold-crossing time from a reference time caused by the residual signal of previous data bits, which persists because of the memory of the system. Limited bandwidth of the transmission medium (e.g., PCB traces) or of the receiver front-end (e.g., TIA), as well as electromagnetic reflections, cause prior symbols to interfere with the current transition. While the effect of inter-symbol interference (ISI) on the amplitude of the received symbols has been studied (e.g., [15][17]), its effect on timing needs further analysis. ISI changes the threshold-crossing time of a data transition and thus causes DDJ, as shown by the example in Figure 3.2. Here, depending on the value of the bit prior to the “01” transition, the transition occurs earlier or later at the output of the system, as shown on the right.

To analyze DDJ, the data link with ISI is modeled as an LTI system. A sequence of random 2-PAM NRZ data is passed through the LTI system. The last two bits of the sequence are either “01” or “10” to model a rising or a falling edge transition, respectively. The variation of the crossing time of the transition can then be related to the data statistics to calculate DDJ. The process is illustrated in Figure 3.3 for a “01” transition. For symmetric input rising and falling transitions and a threshold at half the signal swing, the jitter distributions for rising and falling transitions are identical, so calculating one of them is sufficient.

Figure 3.2: Data-dependent jitter is caused by ISI impact of prior bits (two input patterns ending in “01” pass through a link with ISI; the output crossings occur at t1 and t2)

Figure 3.3: Response of a general LTI system to a random bit sequence and generation of DDJ (input x(t) with prior bits a-4, a-3, a-2, a-1; output y(t) compared with VTH around t=0; resulting ∆t histogram)

A random data sequence arriving at the input can be represented by

$$x(t) = u(t) + \sum_{k=-\infty}^{-2} a_k \, p_i(t - kT_b), \qquad a_k \in \{0, 1\} \qquad (3.3)$$

$$p_i(t) = \begin{cases} 1 & 0 \le t \le T_b \\ 0 & \text{otherwise} \end{cases} \qquad (3.4)$$

where u(t) is the unit step function that models the rising edge, pi(t) is the unit pulse signal of (3.4) with a duration of one bit period, Tb, and the aks are random bits that are either “1” or “0” with a given probability. The sum in (3.3) starts from k=-2, i.e., a-1=0, to guarantee a rising edge at t=0. A similar equation can be written for a falling edge, in which case a-1=1, a0=0, and the rest of the equation is the same as (3.3). Because the system is linear, we can use the superposition principle to find the output as

$$y(t) = s(t) + \sum_{k=-\infty}^{-2} a_k \, p_o(t - kT_b), \qquad (3.5)$$

where s(t) and po(t) are, respectively, the system step response and unit pulse response.
The solution to

$$y(t_c) = v_{th} = 0.5 \qquad (3.6)$$

for tc determines the time of the threshold-crossing event as a function of data statistics and system parameters. We compare tc to the threshold-crossing time when all the aks are zero, denoted by t0, which we calculate by solving s(t0) = vth = 0.5. DDJ is then defined as

$$\Delta t \equiv t_0 - t_c. \qquad (3.7)$$

We will solve (3.6) for a first-order system as an example in Section 3.3 and analyze the general LTI system in Section 3.4.
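Equations (3.5)-(3.7) can be evaluated by brute force: enumerate every pattern of a finite number of prior bits, build y(t), and solve (3.6) numerically. The Python sketch below does this for the first-order step response s(t) = 1 - exp(-t/τ) with a bisection root finder; the truncation to n_prior bits, the search window [0, Tb], and all names are our assumptions, not part of the original method. Its cost grows as 2^n_prior, which motivates the closed form and the perturbation method that follow.

```python
import itertools
import math

def first_order_step(t, tau):
    """Unit step response s(t) of H(s) = 1 / (1 + tau*s)."""
    return 1.0 - math.exp(-t / tau) if t >= 0 else 0.0

def crossing_time(y, vth, lo, hi, iters=100):
    """Bisection for y(t_c) = vth, assuming y(lo) < vth <= y(hi) and one crossing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if y(mid) < vth:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ddj_by_enumeration(step, tb, n_prior, vth=0.5):
    """Delta t = t0 - tc of (3.7) for every pattern of the n_prior bits a_-2, a_-3, ...

    bits[j] holds a_k with k = -(j + 2); the output follows (3.5) with
    p_o(t) = s(t) - s(t - Tb). The search window [0, Tb] assumes the
    transition crosses vth within one bit period.
    """
    pulse = lambda t: step(t) - step(t - tb)
    t0 = crossing_time(step, vth, 0.0, tb)
    results = {}
    for bits in itertools.product((0, 1), repeat=n_prior):
        y = lambda t: step(t) + sum(a * pulse(t + (j + 2) * tb) for j, a in enumerate(bits))
        results[bits] = t0 - crossing_time(y, vth, 0.0, tb)
    return results

tau, tb = 1.0, 1.5
res = ddj_by_enumeration(lambda t: first_order_step(t, tau), tb, n_prior=4)
```

For τ = 1 and Tb = 1.5 (α ≈ 0.22), the 16 values returned for four prior bits can be checked against the closed-form expression (3.9) derived below.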

3.3 An Analytical Expression for DDJ: First-Order System
3.3.1 Analytical Expression for Threshold-Crossing Time

In this section we analyze the DDJ of a first-order system described by the transfer function

$$H(s) = \frac{1}{1 + \tau s}. \qquad (3.8)$$

Here, τ is the system time constant and the associated 3dB bandwidth is 1/(2πτ). From (3.6) and (3.7), we can find a closed-form solution for the DDJ random variable of a first-order system as

$$\Delta t = -\tau \cdot \ln\left(1 - \frac{1-\alpha}{\alpha} \sum_{k=-\infty}^{-2} a_k \, \alpha^{-k}\right) \qquad (3.9)$$

where we define α ≡ exp(-Tb/τ), similar to Chapter 2. In a system with a large bandwidth compared to the input data rate, α approaches zero. If, on the other hand, the bandwidth is small, the data transitions take longer. The upper limit on α for this calculation follows from assuming that the rising transition crosses the threshold within one bit period: since s(t) = 1 - exp(-t/τ), the crossing with all aks zero occurs at t0 = τ ln 2, and requiring t0 ≤ Tb bounds α to values no larger than 0.5. At α=0.5 the bandwidth is only about 11% of the bit rate.
Equation (3.9) relates the threshold-crossing time deviation to the impact of each prior bit. For any data transition, the prior bits form a random sequence, which results in an ensemble of ∆t values. Since α ≤ 0.5, the more recent bits have a dominant effect on jitter, and a-2 has the largest impact. Also, the residual effect of the bits vanishes exponentially for a larger ratio of system bandwidth to bit rate, i.e., when α approaches zero. Figure 3.4 captures these effects by plotting ∆t in unit intervals (UI) for different values of α. For each α, all possible values of ∆t are plotted; we include the impact of four prior bits and neglect the effect of more distant bits. A larger α corresponds to a smaller bandwidth-to-bit rate ratio, which causes the ∆t values to diverge and increases data-dependent jitter. If we rescale the x-axis and plot DDJ for small values of α, we observe similar DDJ characteristics on a different scale of ∆t. In fact, it can be seen from (3.9) that ∆t takes a unique value for each data sequence. Therefore, the same divergence of the ∆t values would be observed on a smaller scale of α: ∆t exhibits self-similar behavior across scales of α.

Figure 3.4: Ensemble of normalized DDJ values for different ratios of bandwidth to bit rate along with the appropriate model to use for data-dependent jitter PDF (∆t [UI] versus α; regions labeled 2-impulse, 4-impulse, 8-impulse, and 16-impulse DDJ PDF; ∆tpp overlaid)
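The sketch below evaluates (3.9) for all patterns of four prior bits and converts ∆t to UI using τ/Tb = -1/ln α, which is one way an ensemble like Figure 3.4 can be generated. It also evaluates the bound (3.10) and the normalized bandwidth f3dB·Tb = -ln α/(2π). Function names are illustrative.

```python
import itertools
import math

def ddj_closed_form(bits, alpha):
    """Delta t of (3.9) in units of tau; bits[j] is a_k with k = -(j + 2)."""
    s = (1 - alpha) / alpha * sum(a * alpha ** (j + 2) for j, a in enumerate(bits))
    return -math.log(1 - s)

def ddj_ensemble_ui(alpha, n_prior=4):
    """Sorted ensemble of Delta t in UI (= Tb) for n_prior prior bits, as in Figure 3.4."""
    tau_over_tb = -1.0 / math.log(alpha)          # from alpha = exp(-Tb/tau)
    return sorted(ddj_closed_form(bits, alpha) * tau_over_tb
                  for bits in itertools.product((0, 1), repeat=n_prior))

def ddj_pp_ui(alpha):
    """Peak-to-peak bound (3.10), -tau*ln(1 - alpha), expressed in UI."""
    return math.log(1 - alpha) / math.log(alpha)

for alpha in (0.05, 0.3):
    vals = ddj_ensemble_ui(alpha)
    print(f"alpha={alpha}: f3dB*Tb={-math.log(alpha) / (2 * math.pi):.3f}, "
          f"max dt={vals[-1]:.4f} UI, bound={ddj_pp_ui(alpha):.4f} UI")
```

At α = 0.05 the 16 values fall into two tight groups split by a-2, while at α = 0.3 the groups spread out, consistent with the two-impulse versus multi-impulse regions of Figure 3.4.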

For 0.01 ≤ α ≤ 0.065, ∆t is concentrated around two values. In this range of system bandwidth, the DDJ distribution can be modeled with two impulses that carry the probabilities of a-2 being “0” and “1”. For larger α, however, the distribution should be extended to four or more impulses, as can be seen from Figure 3.4. In a first-order system, the concentration of data jitter around two values corresponds to the bandwidth range in which only the penultimate bit, a-2, has a significant effect on jitter. Since a-2 is “1” or “0,” the data jitter is divided into two probability masses, modeled by the two impulse functions. This is exactly the conventional model for the DDJ distribution based on the double Dirac delta function. The DDJ behavior predicted by Figure 3.4 can also be observed directly: Figure 3.5 shows the threshold-crossing times and the related histogram at the output of a first-order system for two values of α and demonstrates the bifurcation of the DDJ distribution from two delta functions to four delta functions as α increases.

Figure 3.5: Threshold-crossing histogram and DDJ distribution: (a) α=0.1 (b) α=0.3

This behavior of the data-dependent jitter distribution generalizes to higher-order systems, as will be seen in Section 3.4, where a dominant prior bit (not necessarily a-2) is identified that shapes the data-dependent jitter distribution as two impulse functions.

3.3.2 Peak-to-Peak Jitter

Data-dependent jitter is bounded and can be characterized by its peak-to-peak value, ∆tpp. From (3.9), the maximum and minimum of ∆t are obtained for the “all one” and “all zero” sequences of aks, respectively. The “all zero” sequence corresponds to the latest threshold-crossing time, which is also selected as the reference time, t0. For the “all one” sequence, the geometric series in (3.9) sums to ((1-α)/α)·α²/(1-α) = α. Therefore, the peak-to-peak data-dependent jitter is

$$\Delta t_{pp} = -\tau \cdot \ln(1 - \alpha), \qquad (3.10)$$

which is overlaid with a dashed line on the plot in Figure 3.4. Since the latest crossing time is the reference, the plot shows that ∆tpp sets an upper bound on ∆t.

3.3.3 Scale-One DDJ

In modern serial communication links, measured total jitter distributions resemble the jitter histogram in Figure 3.1(b). In such systems, a useful measure of data-dependent jitter is the distance between the two impulse functions in Figure 3.1(a), or equivalently the separation between the means of the two Gaussian distributions. As discussed in Section 3.3.1, the two-impulse distribution results when only the impact of one prior bit, a-2, on jitter is included. We therefore define the separation of the impulses as follows and call it the scale-one data-dependent jitter, DDJ1, because only the impact of one prior bit is included:

$$DDJ_1 = E\{\Delta t \mid a_{-2} = 0\} - E\{\Delta t \mid a_{-2} = 1\} \qquad (3.11)$$

where E{·|·} is the expected value of ∆t conditioned on a-2. For equal probabilities of “1” and “0” we can show

$$DDJ_1 = \frac{\tau}{2} \ln \frac{1 + \alpha}{1 - \alpha + \alpha^2}. \qquad (3.12)$$

Figure 3.6: Comparison of the measurement results for DDJ1 and the analytical expression in (3.12) for a first-order system (data-dependent jitter [UI] versus α)
We verified the expression in (3.12) experimentally by testing an RC filter that serves as the first-order system. A 2^7-1 pseudo-random bit sequence was applied to the filter, and the jitter histogram was measured using Agilent’s 86100 communication analyzer. The input bit rate was swept over a wide range to obtain observable DDJ1 values, and the separation between the means of the two Gaussians in the histogram was measured. Figure 3.6 demonstrates excellent agreement between (3.12) and the measurement results. For α<0.02, random jitter dominated DDJ1.
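(3.12) can also be cross-checked numerically by averaging (3.9) over equally likely bits a-3, a-4, ... for each value of a-2. The sketch below truncates the history to 12 prior bits (an assumption) and returns the magnitude of the difference of the conditional means, so the comparison does not depend on the sign convention in (3.11). The accompanying check asserts agreement with (3.12) to within 5% at a few values of α.

```python
import itertools
import math

def ddj1_closed_form(alpha, tau=1.0):
    """Scale-one DDJ of a first-order system from (3.12)."""
    return 0.5 * tau * math.log((1 + alpha) / (1 - alpha + alpha ** 2))

def ddj1_by_averaging(alpha, tau=1.0, n_prior=12):
    """|E{dt | a_-2 = 1} - E{dt | a_-2 = 0}| with dt from (3.9), equiprobable bits."""
    def dt(bits):  # bits[j] is a_k with k = -(j + 2)
        s = (1 - alpha) / alpha * sum(a * alpha ** (j + 2) for j, a in enumerate(bits))
        return -tau * math.log(1 - s)
    rest = list(itertools.product((0, 1), repeat=n_prior - 1))
    mean0 = sum(dt((0,) + r) for r in rest) / len(rest)
    mean1 = sum(dt((1,) + r) for r in rest) / len(rest)
    return abs(mean1 - mean0)

for alpha in (0.05, 0.2, 0.4):
    print(alpha, ddj1_closed_form(alpha), ddj1_by_averaging(alpha))
```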

3.4 An Analytical Expression for DDJ: General LTI System
3.4.1 Perturbation Method

For a general LTI system, (3.6) may not be solvable analytically. We propose a technique that approximates DDJ for a general LTI system based only on its step response. The method is easy to apply in simulation or measurement to characterize DDJ and optimize jitter performance.

Figure 3.7: Deviation of the threshold-crossing time due to the effect of the kth bit (the transition with ak=1 is offset by po(t0-kTb) from the transition of s(t), shifting its crossing of vth by ∆tk from t0)

Data-dependent jitter occurs because the tails of prior bits perturb the time at which the data transition crosses the threshold level. In the absence of any prior bit, the threshold-crossing time is t0, as defined in Section 3.2.2. However, if ak is “1”, the kth prior bit changes the output at t0 by po(t0-kTb), according to (3.5). This amplitude perturbation shifts the threshold-crossing time from t0 and causes jitter. Assuming |po(t0-kTb)| ≪ s(t0), the shift in threshold-crossing time due to the kth bit can be calculated from the slope of s(t) at t0 and the shift in the amplitude of s(t). This process is shown graphically in Figure 3.7. Denoting the threshold-crossing time shift due to the kth bit by ∆tk, we have

$$\Delta t_k \cong -\frac{p_o(t_0 - kT_b)}{\left.\dfrac{ds(t)}{dt}\right|_{t=t_0}} \qquad (3.13)$$

and the overall perturbation effect, DDJ, is

$$\Delta t \equiv \sum_{k=-\infty}^{-2} a_k \, \Delta t_k = \frac{-1}{\left.\dfrac{ds(t)}{dt}\right|_{t=t_0}} \cdot \sum_{k=-\infty}^{-2} a_k \, p_o(t_0 - kT_b). \qquad (3.14)$$

This technique is based on classical perturbation theory (e.g., [76]). Its accuracy is bounded by the small-perturbation assumption made above. In a practical system, the bandwidth is chosen such that the fall time of the unit pulse response is within Tb. Therefore, po(t0-kTb) is much smaller than vth, and (3.14) is a good approximation. The approximation still holds if the link is designed such that the received pulse has the shape of a Nyquist pulse: for such pulses, the residual memory of prior bits changes slowly around the threshold crossing [40], so the perturbation of the step response near t0 is approximately po(t0-kTb). A similar methodology was used to calculate the reference jitter in a clock recovery system [66][77][78].
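The perturbation estimate needs only the step response. The sketch below finds t0 by bisection, approximates ds/dt at t0 with a central difference, forms po(t) = s(t) - s(t - Tb), and evaluates (3.13) and (3.14). Note that, as written, (3.13) gives the crossing shift tc - t0 caused by ak = 1, i.e., the opposite sign of ∆t in (3.7). The finite-difference step, the search window [0, Tb], and the names are illustrative assumptions.

```python
import math

def threshold_crossing(f, vth, lo, hi, iters=100):
    """Bisection for f(t) = vth, assuming f(lo) < vth <= f(hi) and a single crossing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < vth else (lo, mid)
    return 0.5 * (lo + hi)

def perturbation_shifts(step, tb, ks, vth=0.5, h=1e-6):
    """Per-bit shifts Delta t_k of (3.13) from a step response s(t) alone.

    Returns t0 (s(t0) = vth) and {k: -p_o(t0 - k*Tb) / s'(t0)}, where the slope is
    a central finite difference with step h. As written, (3.13) is the shift
    t_c - t_0 produced by a_k = 1 (the opposite sign of (3.7)).
    """
    t0 = threshold_crossing(step, vth, 0.0, tb)
    slope = (step(t0 + h) - step(t0 - h)) / (2.0 * h)
    p_o = lambda t: step(t) - step(t - tb)
    return t0, {k: -p_o(t0 - k * tb) / slope for k in ks}

def perturbation_total(bits_by_k, dt_k):
    """Overall DDJ estimate of (3.14): sum over k of a_k * Delta t_k."""
    return sum(a * dt_k[k] for k, a in bits_by_k.items())
```

Applied to the first-order system with f3dB·Tb = 0.5 and -10 ≤ k ≤ -2, comparing against the exact crossing shift from (3.9) should give a worst-case relative error of roughly 2%, within the bound quoted for Figure 3.8(a).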

We evaluated (3.14) for all possible bit sequences and compared the results against the exact DDJ in (3.9) for a first-order system. We limit k to -10 ≤ k ≤ -2, i.e., we account only for the 11 most recent bits, because the effect of the bits decreases exponentially. The error in the DDJ prediction is calculated for each bit sequence at different ratios of bandwidth (1/2πτ) to bit rate (1/Tb), and for each ratio the worst-case relative error is plotted in Figure 3.8(a). The worst-case error of the perturbation method is below 2.5% over a practical range of bandwidth. Moreover, at the nominally optimum bandwidth-to-bit rate ratio of 0.7, the error is only 0.01%. For a first-order system, the approximation error is identical even if only -3 ≤ k ≤ -2 is used. Therefore, (3.14) provides the basis for a very efficient technique for calculating data-dependent jitter.

Figure 3.8: Worst case accuracy of the perturbation method in predicting DDJ: (a) for a first-order system (b) for a second-order system with different ζ (0.4, 0.5, 0.71, 1) (relative error [%] versus f-3dB·Tb)

A further verification of the perturbation technique is performed for an all-pole second-order system with transfer function

$$H(s) = \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \qquad (3.15)$$

where ωn is the natural frequency and ζ is the damping factor. The exact DDJ for this system is computed from MATLAB simulations of the system output for all possible bit sequences. The approximate DDJ is then calculated using (3.14). The results are compared, and the worst-case relative error is plotted in Figure 3.8(b) for different damping factors over a practical range of bandwidth normalized to bit rate. Again, the small relative errors verify that (3.14) accurately predicts the DDJ of a general LTI system based on its step response.
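A similar check can be sketched for the all-pole second-order system of (3.15). The code below uses the standard closed-form step response for 0 < ζ ≤ 1, finds each exact crossing by bisection on [0, Tb], and reports the worst-case relative error of (3.14) over all nonzero patterns of a-6 ... a-2. This is not the MATLAB setup used for Figure 3.8(b); the truncation, the search window, and the example point (ζ = 1/√2, f3dB·Tb = 0.7) are assumptions.

```python
import math

def second_order_step(t, zeta, wn=1.0):
    """Unit step response of (3.15) for 0 < zeta <= 1 (zeta >= 1 treated as critical)."""
    if t <= 0.0:
        return 0.0
    if zeta >= 1.0:
        return 1.0 - math.exp(-wn * t) * (1.0 + wn * t)
    root = math.sqrt(1.0 - zeta ** 2)
    wd = wn * root
    return 1.0 - math.exp(-zeta * wn * t) * (math.cos(wd * t) + zeta / root * math.sin(wd * t))

def bisect_root(f, lo, hi, iters=100):
    """Root of f on [lo, hi], assuming f(lo) < 0 <= f(hi) and a single sign change."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def worst_relative_error(zeta, tb, wn=1.0, ks=tuple(range(-6, -1)), vth=0.5, h=1e-6):
    """Worst-case |(3.14) - exact| / |exact| over all nonzero patterns of a_k, k in ks."""
    step = lambda t: second_order_step(t, zeta, wn)
    p_o = lambda t: step(t) - step(t - tb)
    t0 = bisect_root(lambda t: step(t) - vth, 0.0, tb)
    slope = (step(t0 + h) - step(t0 - h)) / (2.0 * h)
    dt_k = {k: -p_o(t0 - k * tb) / slope for k in ks}              # (3.13)
    worst = 0.0
    for n in range(1, 2 ** len(ks)):
        a = {k: (n >> i) & 1 for i, k in enumerate(ks)}
        y = lambda t: step(t) + sum(a[k] * p_o(t - k * tb) for k in ks) - vth
        exact = bisect_root(y, 0.0, tb) - t0                        # exact t_c - t_0
        approx = sum(a[k] * dt_k[k] for k in ks)                     # (3.14)
        worst = max(worst, abs(approx - exact) / abs(exact))
    return worst

zeta = 1.0 / math.sqrt(2.0)          # Butterworth: f_3dB = wn / (2*pi)
tb = 0.7 * 2.0 * math.pi             # f_3dB * Tb = 0.7 with wn = 1
print(f"worst-case relative error: {100 * worst_relative_error(zeta, tb):.2f}%")
```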

3.4.2 Peak-to-Peak Jitter and Scale-One DDJ

We can use (3.14) to estimate the peak-to-peak data-dependent jitter for a general LTI system. We have

$$\Delta t_{pp} = \max\{\Delta t\} - \min\{\Delta t\}. \qquad (3.16)$$

The maximum of ∆t is achieved for the data sequence in which ak = 1 if po(t0-kTb) ≤ 0 and ak = 0 otherwise. Similarly, the minimum of ∆t is achieved for the data sequence in which ak = 1 if po(t0-kTb) ≥ 0 and ak = 0 otherwise. Therefore, (3.16) simplifies to

$$\Delta t_{pp} = \frac{1}{\left.\dfrac{ds(t)}{dt}\right|_{t=t_0}} \cdot \sum_{k=-\infty}^{-2} \left| p_o(t_0 - kT_b) \right|. \qquad (3.17)$$

Scale-one DDJ can also be defined for a general LTI system, similar to (3.11). Unlike the first-order system of Section 3.3.3, however, the predominant impact on jitter is not necessarily due to a-2. The pulse response of the system and the bit rate determine the effect of the prior bits. The effect of each prior bit can be estimated separately from (3.13), and the bit with the most prominent impact can be identified. Then, using the same definition as in (3.11) and assuming that am has the largest impact on DDJ, we can write

$$DDJ_1 = E\{\Delta t \mid a_m = 0\} - E\{\Delta t \mid a_m = 1\} = E\left\{\sum_{\substack{k=-\infty \\ k \neq m}}^{-2} a_k \, \Delta t_k\right\} - E\left\{\Delta t_m + \sum_{\substack{k=-\infty \\ k \neq m}}^{-2} a_k \, \Delta t_k\right\} = -\Delta t_m. \qquad (3.18)$$

Therefore, we conclude

$$DDJ_1 = \frac{p_o(t_0 - mT_b)}{\left.\dfrac{ds(t)}{dt}\right|_{t=t_0}}, \qquad (3.19)$$

which is an important expression that determines the separation of the two impulses in the DDJ probability distribution function of Figure 3.1(a) for a general LTI system. It can be integrated into any communication link or circuit design simulation software to predict the data-dependent jitter contribution of the corresponding component in the system. In addition, DDJ1 can be easily measured using a general-purpose high-speed oscilloscope. We will verify (3.19) experimentally in Section 3.5. A significant advantage of the perturbation method is the remarkable reduction in the simulation or measurement time of DDJ. In fact, the simulation time for peak-to-peak DDJ now grows linearly with the number of prior bits considered, k, whereas direct calculation from (3.6) requires passing all 2^k possible sequences through the system, a number that grows exponentially with k.
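Once the per-bit shifts ∆tk of (3.13) are known, both (3.17) and (3.19) require only one pass over them. The sketch below computes ∆tpp and DDJ1 = -∆tm (the dominant bit being the one with the largest |∆tk|) from a dictionary of shifts, and also evaluates (3.16) by enumerating all 2^n sequences of (3.14) to show that the two agree while the costs differ (n versus 2^n terms). The shift values in the usage line are made up for illustration.

```python
import itertools

def ddj_summary(dt_k):
    """Return (dt_pp, m, DDJ1) from per-bit shifts {k: Delta t_k} of (3.13).

    dt_pp follows (3.17) (sum of |Delta t_k|, one term per bit); the dominant bit m
    has the largest |Delta t_k|, and DDJ1 = -Delta t_m follows from (3.13) and (3.19).
    """
    dt_pp = sum(abs(v) for v in dt_k.values())
    m = max(dt_k, key=lambda k: abs(dt_k[k]))
    return dt_pp, m, -dt_k[m]

def dt_pp_brute_force(dt_k):
    """(3.16) by enumerating all 2^n bit patterns of the linear model (3.14)."""
    values = list(dt_k.values())
    totals = [sum(a * v for a, v in zip(bits, values))
              for bits in itertools.product((0, 1), repeat=len(values))]
    return max(totals) - min(totals)

shifts = {-2: 0.05, -3: -0.02, -4: 0.01, -5: -0.004}   # hypothetical, in UI
print(ddj_summary(shifts), dt_pp_brute_force(shifts))
```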

3.4.3 Data-Dependent Jitter Minimization

In a first-order system, any ak=1 increases the absolute value of DDJ. Furthermore, the closer the bit is to the data transition, the stronger its impact on data jitter. However, this is not true in general for all LTI systems. It can be seen from (3.13) that the sign and value of ∆tk depend on po(t0-kTb), and depending on the system response, the effect of each prior bit can vary dramatically, independently of the other bits. In particular, (3.13) samples the pulse response at integer multiples of the bit period. Therefore, for a given bit rate, the system can be designed such that its pulse response reduces the dominant DDJ terms and minimizes the overall jitter. Pulse shapes that result in minimum jitter in addition to minimum ISI in the receiver have been studied [68][79]. As an example, the variations of the first three DDJ terms from (3.13) are plotted in Figure 3.9(a) for a second-order system with different bandwidth-to-bit rate ratios. The selected range covers under-damped, over-damped, and critically damped systems. In the range of 0.46-0.48 for the normalized bandwidth, ∆t-3 has a larger impact on DDJ than ∆t-2. In addition, there exists a minimum in the peak-to-peak data-dependent jitter, as illustrated in Figure 3.9(b). This jitter minimization behavior can be observed in higher-order systems as well. An experimental example is shown in Figure 3.10, where the output eye diagram of a 4” copper microstrip transmission line on a conventional FR4 board is plotted at two different bit rates. The peak-to-peak jitter is clearly larger at the lower bit rate. As will be shown in Section 3.5, blindly increasing the bandwidth does not necessarily reduce DDJ.

Figure 3.9: (a) Variation of the impacts of the last three prior bits on DDJ in a second-order system. (b) Existence of a minimum in the peak-to-peak data-dependent jitter (ζ from 0.3 to 1, ωn = π/Tb; ∆t-2, ∆t-3, ∆t-4 and ∆tpp [UI] versus f-3dB·Tb)

Figure 3.10: The output eye diagram of a 4” microstrip line on FR4 PCB at (a) 5 Gb/s and (b) 6.5 Gb/s demonstrates larger peak-to-peak deterministic jitter at lower bit rate
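Because (3.13) samples the pulse response at multiples of Tb, the individual terms and ∆tpp of (3.17) can be scanned against bandwidth in the manner of Figure 3.9. The sketch below sweeps ζ from 0.3 to 1 at ωn = π/Tb, computes f3dB·Tb by solving |H(jω)|² = 1/2 for (3.15), and prints ∆t-2, ∆t-3, and ∆tpp over ten prior bits. The truncation and names are assumptions, and the printed values are not meant to reproduce the figure exactly.

```python
import math

def second_order_step(t, zeta, wn):
    """Unit step response of (3.15) for 0 < zeta <= 1 (zeta >= 1 treated as critical)."""
    if t <= 0.0:
        return 0.0
    if zeta >= 1.0:
        return 1.0 - math.exp(-wn * t) * (1.0 + wn * t)
    root = math.sqrt(1.0 - zeta ** 2)
    return 1.0 - math.exp(-zeta * wn * t) * (math.cos(wn * root * t)
                                             + zeta / root * math.sin(wn * root * t))

def f3db_times_tb(zeta, wn, tb):
    """Normalized 3dB bandwidth of (3.15): solve |H(j*w)|^2 = 1/2 for w."""
    u = 1.0 - 2.0 * zeta ** 2
    wc = wn * math.sqrt(u + math.sqrt(u * u + 1.0))
    return wc * tb / (2.0 * math.pi)

def ddj_terms(zeta, wn, tb, ks=(-2, -3, -4), n_pp=10, vth=0.5, h=1e-6):
    """Delta t_k of (3.13) for the bits in ks and Delta t_pp of (3.17) over n_pp prior bits."""
    step = lambda t: second_order_step(t, zeta, wn)
    lo, hi = 0.0, tb                                   # bisection for s(t0) = vth
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if step(mid) < vth else (lo, mid)
    t0 = 0.5 * (lo + hi)
    slope = (step(t0 + h) - step(t0 - h)) / (2.0 * h)
    dt = lambda k: -(step(t0 - k * tb) - step(t0 - (k + 1) * tb)) / slope
    dt_pp = sum(abs(dt(k)) for k in range(-n_pp - 1, -1))
    return {k: dt(k) for k in ks}, dt_pp

tb = 1.0                      # work in UI
wn = math.pi / tb             # as in Figure 3.9
for zeta in [round(0.3 + 0.1 * i, 2) for i in range(8)]:
    terms, dt_pp = ddj_terms(zeta, wn, tb)
    print(f"zeta={zeta:.1f} f3dB*Tb={f3db_times_tb(zeta, wn, tb):.3f} "
          f"dt_-2={terms[-2]:+.4f} dt_-3={terms[-3]:+.4f} dt_pp={dt_pp:.4f} UI")
```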

3.5 Experimental Verification


Equation (3.19) provides a simple means of finding the DDJ contribution of any LTI system at any bit rate based only on its step response, since the pulse response can be expressed in terms of the step response as po(t)=s(t)-s(t-Tb). We verify the results experimentally by comparing the predictions of (3.19) with the measured DDJ1 of several high-frequency systems, including an integrated CMOS trans-impedance amplifier. When the jitter histogram at the output of the device under test (DUT) is measured, we associate DDJ1 with the separation of the means of two Gaussian distributions, as in Figure 3.1(b). We use Anritsu’s MP1763C pulse pattern generator to provide the step input and a pseudo-random bit sequence (PRBS) input of length 2^7-1. We also use Agilent’s 86100 communication analyzer to measure the step response and the jitter histogram at the output. For each system, we first measure and record the step response. Then, we apply a PRBS at the input and vary the bit rate. We measure DDJ1 at a bit rate where the system shows a significant amount of data-dependent jitter. The bit rate is always chosen such that the data spectrum does not exceed the system bandwidth. This demonstrates that DDJ persists even when the system bandwidth is large enough to minimize amplitude distortion. The jitter histogram is measured after at least 500,000 crossing events are captured by the oscilloscope. At the same time, we compute the pulse response from the measured step response and the current bit rate, and calculate DDJ1 from (3.19). Finally, we compare the measured and analytically calculated DDJ1.
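In the experiments, po(t) is obtained from the recorded step response rather than from a model. The sketch below assumes the step response is available as sampled arrays (for example, exported from the oscilloscope; no particular export format is implied), linearly interpolates them, and returns t0, the shifts of (3.13), the dominant bit, and DDJ1 from (3.19). The synthetic first-order record in the usage lines stands in for measured data and shows that a single step record can be re-evaluated at several bit rates.

```python
import numpy as np

def ddj_from_step_samples(t, s, tb, ks=tuple(range(-10, -1)), vth=0.5):
    """Per-bit shifts (3.13) and DDJ1 (3.19) from a sampled rising step response.

    t must be increasing with s[0] < vth. Between samples the response is linearly
    interpolated; before t[0] it is taken as 0 and after t[-1] as its last value, so
    the record should extend past t0 - min(ks) * tb.
    """
    i = int(np.argmax(s >= vth))                           # first sample at/above vth
    t0 = t[i - 1] + (vth - s[i - 1]) * (t[i] - t[i - 1]) / (s[i] - s[i - 1])
    slope = np.interp(t0, t, np.gradient(s, t))            # ds/dt at t0
    step = lambda x: np.interp(x, t, s, left=0.0, right=s[-1])
    dt_k = {k: -(step(t0 - k * tb) - step(t0 - (k + 1) * tb)) / slope for k in ks}
    m = max(dt_k, key=lambda k: abs(dt_k[k]))             # dominant prior bit a_m
    return t0, dt_k, m, -dt_k[m]                           # DDJ1 = -Delta t_m

# One recorded step response can be re-evaluated at several bit rates.
t = np.linspace(0.0, 40.0, 400001)      # hypothetical record, tau = 1 time unit
s = 1.0 - np.exp(-t)
for tb in (1.5, 2.0, 3.0):
    t0, dt_k, m, ddj1 = ddj_from_step_samples(t, s, tb)
    print(f"Tb={tb}: dominant bit a_{m}, DDJ1={ddj1:.5f}")
```

For this synthetic first-order record, (3.19) reduces to α(1-α)τ with α = exp(-Tb/τ), which gives a simple sanity check before applying the function to measured data.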

A. Discrete systems. In one set of experiments, we carry out the procedure for various off-the-shelf systems available in the lab. They include a Mini Circuit ZFL 1000-LN driver amplifier with 1GHz bandwidth, a 9” long 50Ω copper microstrip on standard FR4 printed circuit board, a 10.5’ long standard BNC coaxial cable, and an HP 11688A microwave high-order lowpass filter with a cut-off frequency of fc=2.8 GHz. None of these systems has a simple first-order response, so DDJ1 must be estimated from (3.19) rather than (3.12). The measurement results are summarized in Table 3.1. The small relative errors in the last column verify the validity of the analytical results for predicting data-dependent jitter. For the microstrip line, a-3 rather than a-2 has the most dominant effect on DDJ and causes the scale-one separation of the threshold-crossing times.

Table 3.1: Comparing measured DDJ1 and predictions of analytical expression in (3.19)

| DUT | Measured Bit Rate | Measured DDJ1 [psec] | Dominant Bit | Corresponding dominant ∆tk [psec] | Error |
|---|---|---|---|---|---|
| Mini Circuit ZFL-1000 | 1.3 Gb/s | 7.665 | a-2 | 7.15 | -6.7% |
| microstrip on FR4 PCB | 10 Gb/s | 5.35 | a-3 | 5.23 | -2.3% |
| HP 11688A Lowpass Filter | 1.2 Gb/s | 20.5 | a-2 | 18.96 | -7.5% |
| coaxial cable | 3 Gb/s | 4.6 | a-2 | 4.72 | 2.5% |

Figure 3.11: Step response, pulse response, and the individual jitter contributions of prior bits as calculated from (3.13) for the systems under test: (a) Mini Circuit ZFL-1000 amplifier (b) Copper microstrip line on FR4 PCB (c) HP 11688A lowpass filter (d) BNC coaxial cable (each panel: step/pulse response amplitude [V] versus time [nsec], and prior-bit jitter contributions ∆tk [psec] versus k)

The step response, the pulse response, and the jitter contributions of some prior bits are plotted in Figure 3.11 for the systems we tested, with ∆tk calculated from the pulse response using (3.13). An important observation is the significance of the time response of the system and its impact on data-dependent jitter at the output. The HP 11688A is a lowpass filter with a 3dB cut-off frequency of 2.8GHz. Compared with the ZFL-1000, an amplifier with a 3dB bandwidth of 1GHz, one might expect the data-dependent jitter contribution to overall jitter to be larger for the amplifier because of its smaller bandwidth. However, at around the same bit rate (1.2-1.3 Gb/s), the filter has significantly larger DDJ. This can be attributed to the pulse response characteristics of the two systems, as illustrated in Figure 3.11(a) and (c). The pulse response of the filter has larger ringing in its damped tail, which dramatically increases the jitter from (3.13) because the samples of the pulse response at the measurement bit rate (1.2 Gb/s) coincide with the maxima and minima of the oscillating tail. Consequently, the contributions of prior bits are all significant and oscillate between negative and positive values, as can be seen from Figure 3.11(c). The amplifier, in contrast, has smaller ringing, and its ringing oscillation frequency is not constant and is unrelated to the measurement bit rate.

In summary, we must emphasize that bandwidth alone is not a complete measure of the DDJ contribution of an LTI system. Although systems with small bandwidth tend to increase DDJ, the step response or pulse response of the system is required to analyze the exact characteristics of the output data-dependent jitter. In particular, the system can be designed such that the samples of its pulse response at integer multiples of the bit period are negligible, which minimizes DDJ. Along the same lines, and similar to Nyquist’s zero-ISI pulse shaping [15], Huang et al. [68] and Gibby et al. [79] have proposed channel pulse shapes that minimize the jitter contributions from prior bits and hence optimize the data-dependent jitter performance of the link.

In a communication link, if the channel response is unknown or time varying, zero-ISI pulse shaping is not possible. In such cases, an adaptive equalizer is used in the receiver to minimize ISI. Similarly, if pulse shaping of the transmitted data sequence is not feasible due to channel unpredictability, a data-dependent jitter equalizer can be used in front of the clock recovery circuit [80].

B. Integrated Trans-Impedance Amplifier. To further verify the DDJ prediction theory, we tested an integrated trans-impedance amplifier (TIA). The TIA was implemented in a 0.18µm BiCMOS technology using only CMOS transistors and demonstrated a 9.2 GHz 3dB bandwidth [23][24]. We mounted the amplifier on a brass substrate and built the additional circuitry around it on the same substrate using a low-loss duroid PCB. The chip is wire bonded to microstrip transmission lines that carry the signal to SMA connectors on the brass substrate. The test board setup is shown in Figure 3.12. Although this TIA has enough bandwidth to operate at 10 Gb/s, the reflections from connectors and wirebond mismatches, in addition to the amplifier response, give the whole system a ringing step response, as the measurement in Figure 3.13 shows. In spite of having enough bandwidth, the TIA, along with the measurement setup, exhibits a large amount of DDJ.

Figure 3.12: TIA test board setup for the 10 Gb/s TIA (chip with Vin, Vout, and Bias connections)

We measured the DDJ of the TIA at two bit rates, 1.65 Gb/s and 3.3 Gb/s, using the same procedure as before. Although these bit rates are within the bandwidth of the TIA, we observed significant amounts of DDJ. The eye diagram at 1.65 Gb/s is shown in Figure 3.14(a), and the measurement results are summarized in Table 3.2. We should stress that DDJ can be predicted at several bit rates from a single measurement of the step response.

Figure 3.13: TIA step response and impact of a-2 pulse on t0 in a “101” sequence at 3.3Gb/s (amplitude [mV] versus time [nsec])

Table 3.2: Comparing measured DDJ1 and predictions of analytical expression for the 10 Gb/s CMOS TIA

| Bit Rate | Measured DDJ1 [psec] | Dominant Bit | Predicted ∆tk [psec] | Error |
|---|---|---|---|---|
| 1.65 Gb/s | 6.85 | a-2 | 6.8 | 0.85% |
| 3.3 Gb/s | 13.6 | a-2 | 12.7 | 6.6% |
| 3.37 Gb/s | 14.1 | a-2 | 12.4 | 12% |
| 3.37 Gb/s | DDJ2 = 5.85 | a-3 | 5.7 | 2.5% |

In the case of 1.65 Gb/s, the DDJ predicted by the perturbation method has only 0.85% error. Higher scales of data-dependent jitter, which are associated with prior bits that have less dominant jitter contributions, are often smaller than the rms random jitter. They are therefore hard to measure or observe and are usually neglected. However, the perturbation method can still predict DDJ at these higher scales. We measured the scale-one (DDJ1) and scale-two (DDJ2) DDJ of the TIA at 3.37 Gb/s, where both were observable, as Figure 3.14 illustrates. The measurement results are compared with the calculations in Table 3.2; the perturbation method predicts scale-two DDJ with an accuracy of 2.5%. The measured values of DDJ1 and DDJ2 are related to ∆t-2 and ∆t-3, respectively, as calculated from (3.13). The negative value of ∆t-2 corresponds to a negative shift in the zero crossing. In other words, all sequences in which a-2 is “1” split from the zero crossings that occur at t0 and move to t0-|∆t-2|. On the other hand, the positive ∆t-3 splits each crossing group into two groups, one remaining in the same position and one moving by ∆t-3 to the right. Overall, therefore, four crossing groups can be observed, as in Figure 3.14(b).
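The splitting into four crossing groups follows directly from (3.14) restricted to the two dominant bits. The sketch below enumerates the four patterns of (a-3, a-2) using the predicted shifts at 3.37 Gb/s from Table 3.2 (12.4 ps and 5.7 ps), with ∆t-2 taken as negative and ∆t-3 as positive, as described above. Units are ps, and the helper name is hypothetical.

```python
import itertools

def crossing_groups(dt_k):
    """Crossing-time offsets from t0 for every pattern of the listed bits, per (3.14)."""
    ks = sorted(dt_k)
    return {bits: sum(a * dt_k[k] for k, a in zip(ks, bits))
            for bits in itertools.product((0, 1), repeat=len(ks))}

# Predicted shifts at 3.37 Gb/s from Table 3.2, with the signs described in the text (ps).
groups = crossing_groups({-2: -12.4, -3: 5.7})
for bits, offset in sorted(groups.items(), key=lambda kv: kv[1]):
    print(f"(a_-3, a_-2) = {bits}: t0 {offset:+.1f} ps")
```

Adjacent groups sharing the same a-3 are separated by |∆t-2| (the DDJ1 scale), and groups sharing the same a-2 are separated by ∆t-3 (the DDJ2 scale).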

Figure 3.14: TIA eye diagram when DDJ1 and DDJ2 are observable (a) 1.65 Gb/s, with DDJ1 = 6.85 ps marked (b) 3.37 Gb/s, with DDJ1 and DDJ2 marked

3.6 DDJ Impact on the BER


In the previous chapter, we studied the combined effect of ISI and jitter on the BER. However, we saw in this chapter that the jitter distribution must be modified in the presence of inter-symbol interference, because ISI causes DDJ, which changes the total jitter distribution. The effect of jitter on the BER is formulated in (2.23), which is rewritten here as

$$BER_j(T_s) = \frac{1}{2} \int_{T_s}^{\infty} f_{TJ}(t)\,dt + \frac{1}{2} \int_{-\infty}^{-T_b + T_s} f_{TJ}(t)\,dt. \qquad (3.20)$$

Let us assume that DDJ is modeled with a double Dirac delta function distribution and that the dominant prior bit causing this distribution is a-2. Then fTJ(·) is the convolution of the RJ Gaussian distribution and the DDJ distribution, as in (3.2). Equivalently, we can split (3.20) into two terms, each of which is the BER caused by random jitter conditioned on the value of a-2. We have

$$BER_j(T_s) = p \cdot BER_j(T_s, a_{-2} = 0) + (1 - p) \cdot BER_j(T_s, a_{-2} = 1) \qquad (3.21)$$

where p is the probability that a-2=0. Each of the BERj terms on the right can be calculated from (3.20) by replacing fTJ(·) with a Gaussian distribution, noting that the value of a-2 determines the mean of the Gaussian distribution. This mean is the same as the mean of the threshold-crossing times and can be found from (3.7):

$$t_{c,0} = E\{t_c \mid a_{-2} = 0\} = t_0 - E\{\Delta t \mid a_{-2} = 0\} \qquad (3.22)$$

$$t_{c,1} = E\{t_c \mid a_{-2} = 1\} = t_0 - E\{\Delta t \mid a_{-2} = 1\}. \qquad (3.23)$$

The values of tc,0 and tc,1 are derived in Appendix B. The BER caused by jitter, conditioned on a-2, can be written as

$$BER_j(T_s, a_{-2} = 0) = BER_j(T_s, t_{c,0}) = \frac{1}{2}\left[ Q\left(\frac{T_s - t_{c,0}}{\sigma_j}\right) + Q\left(\frac{T_b - (T_s - t_{c,0})}{\sigma_j}\right) \right] \qquad (3.24)$$

$$BER_j(T_s, a_{-2} = 1) = BER_j(T_s, t_{c,1}) = \frac{1}{2}\left[ Q\left(\frac{T_s - t_{c,1}}{\sigma_j}\right) + Q\left(\frac{T_b - (T_s - t_{c,1})}{\sigma_j}\right) \right]. \qquad (3.25)$$
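(3.21) together with the conditional Gaussian terms can be evaluated directly once tc,0 and tc,1 are known. The sketch below implements Q(x) with math.erfc, keeps the factor 1/2 from (3.20), works in UI (Tb = 1), and grid-searches the sampling time Ts. The values tc,0 = 0.02 UI, tc,1 = 0.12 UI, and σj = 0.05 UI are hypothetical. With p = 0.5, the optimum lands at the midpoint of the two crossing means plus half a UI, so the best sampling point is offset from the middle of the ideal eye, in the spirit of the offset discussed for Figure 3.15(b).

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_jitter_given_tc(ts, tc, sigma, tb=1.0):
    """RJ-only BER for crossings ~ N(tc, sigma^2), from (3.20) with a Gaussian f_TJ."""
    return 0.5 * (q_func((ts - tc) / sigma) + q_func((tb - (ts - tc)) / sigma))

def ber_jitter(ts, tc0, tc1, sigma, p=0.5, tb=1.0):
    """(3.21): weight the two conditional BERs by p = P(a_-2 = 0)."""
    return (p * ber_jitter_given_tc(ts, tc0, sigma, tb)
            + (1 - p) * ber_jitter_given_tc(ts, tc1, sigma, tb))

# Hypothetical crossing means (UI) and RJ sigma; grid-search the sampling time Ts.
tc0, tc1, sigma = 0.02, 0.12, 0.05
grid = [i / 1000 for i in range(1001)]
best = min(grid, key=lambda ts: ber_jitter(ts, tc0, tc1, sigma))
print(f"optimum Ts ~ {best:.3f} UI, BER_j = {ber_jitter(best, tc0, tc1, sigma):.2e}")
```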

The impact of DDJ on the overall BER can also be calculated by modifying the distribution of the timing jitter; we have carried out this analysis in Appendix A, equation (A.10). We can use (A.10) to generate BER contours similar to those in Section 2.6.3.2. Figure 3.15(a) shows the BER contours for a first-order system with σj=0.05UI and N0=4e-3V²/Hz. Because DDJ depends on the system response and decreases with larger bandwidth in a first-order system, the BER contours depend on the bandwidth at all sampling points. This is in contrast to Figure 2.28(b), where DDJ was neglected and the BER becomes independent of bandwidth when jitter dominates the BER.

Figure 3.15: BER contours for σj=0.05UI and N0=4e-3V²/Hz for two reference times for the sampling point (a) t=0 (b) bandwidth-dependent threshold-crossing time (f-3dB/(Bit Rate) versus Ts [UI]; contour labels from -2 to -12)

The values of tc,0 and tc,1 are functions of the system response. For instance, in a
first-order system, both tc,0 and tc,1 increase as the bandwidth decreases. In practice, this
appears as an offset in the eye diagram of the data, because the threshold-crossing times
that determine the start and stop of the eye shift. Therefore, the absolute optimum
sampling time obtained from BER contours such as those in Figure 3.15(a) must be offset
by the same amount. Equivalently, we can replot the BER contours with the sampling time
referenced to the threshold-crossing time of each bandwidth instead of t=0, which results in
the contours in Figure 3.15(b). Figure 3.15(b) shows that for all bandwidths the optimum
sampling point is at 0.65UI, i.e., 0.15UI from the middle of the eye. The optimum system
bandwidth for minimum BER is again around 70% of the data rate.

In Section 3.4, we showed how to estimate the DDJ of a general LTI system from its
response; the same estimate yields tc,0 and tc,1 in (3.22) and (3.23). Therefore, the BER
contours can be readily obtained for any LTI system from its response.
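To illustrate why the crossing times move with bandwidth in a first-order system, the sketch below computes the time at which a first-order response crosses the 0.5 threshold on a 0-to-1 transition. It is a simplified, hypothetical bit history, not the Appendix B derivation: a_{-1}=0, all bits before a_{-2} are 0 with the output fully settled, and the next bit is also 1 so the crossing completes. The time constant is τ = 1/(2π·f-3dB) expressed in UI.

```python
import math

def tau_ui(f3db_over_bitrate: float) -> float:
    """First-order time constant in UI: tau = 1 / (2*pi*f_3dB), with 1 UI = 1 / bit rate."""
    return 1.0 / (2.0 * math.pi * f3db_over_bitrate)

def start_level(tau: float, a_m2: int) -> float:
    """Output level at the start of a 0 -> 1 transition with a_{-1} = 0.

    Hypothetical simplification: every bit before a_{-2} is 0 and the output had
    fully settled to 0, so only a_{-2} leaves a residual.
    """
    x = math.exp(-1.0 / tau)                     # decay factor over one bit period
    after_m2 = (1.0 - x) if a_m2 == 1 else 0.0   # level reached during bit a_{-2}
    return after_m2 * x                          # decay during bit a_{-1} = 0

def crossing_time(tau: float, v_start: float, threshold: float = 0.5) -> float:
    """Solve 1 - (1 - v_start) * exp(-t / tau) = threshold for t (in UI)."""
    return tau * math.log((1.0 - v_start) / (1.0 - threshold))

for ratio in [1.4, 1.0, 0.7, 0.4]:   # f-3dB / bit rate, spanning the Figure 3.15 axis
    tau = tau_ui(ratio)
    t0 = crossing_time(tau, start_level(tau, 0))
    t1 = crossing_time(tau, start_level(tau, 1))
    print(f"f3dB/bit rate = {ratio:.1f}: a_-2=0 -> {t0:.3f} UI, a_-2=1 -> {t1:.3f} UI")
```

In this toy model both crossing times grow as the bandwidth shrinks, and the a_{-2}=1 history crosses earlier because of its residual level; the difference between the two is the data-dependent part.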
3.7 Summary
Data-dependent jitter is a type of deterministic jitter that results from the residual
effects of prior bits on a data threshold-crossing time. It degrades the BER and data link
performance as data rates increase while the system bandwidth budget remains restricted.
We proposed a methodology to analytically estimate the data-dependent jitter of a general
LTI system from its step response. The method reduces the complexity remarkably,
because its computation time grows linearly with the number of prior bits, whereas in
conventional methods the complexity grows exponentially with the number of bits.

We verified the analytical results with simulations and demonstrated experimentally
that this approximation is reasonably accurate for several systems. In addition, we showed
that certain pulse response shapes can result in minimum peak-to-peak data-dependent
jitter. Finally, we highlighted that the 3dB bandwidth does not completely characterize the
DDJ of a system; the shape of the system step response is the essential element that
determines the DDJ characteristics. Complementing our calculations in Chapter 2, we
related the overall BER of a data link to the link response by including the effect of DDJ.
By analytically relating the impact of data link impairments to the BER, we can design the
system response and link specifications to optimize link reliability.
Chapter 4
Bandwidth Enhancement for Wideband Amplifiers
4.1 Introduction
Wideband amplifiers are among the most critical building blocks at the front-end of a
high-speed link receiver. As we discussed in Chapter 2, any baseband communication
system needs a wideband receiver because the signal spectrum extends from near DC up to
high frequencies. In particular, all amplifiers in the signal path, such as the trans-impedance
amplifier (TIA) in Figure 2.21, should have enough bandwidth, with minimal variation in
the passband and near-constant group delay, to avoid distorting the signal. We have studied
the impact of restricted bandwidth in the form of ISI and jitter. In this chapter, we provide
conditions to maximize the bandwidth of amplifiers in the front-end of high-speed
receivers. We are mainly interested in integrated amplifiers implemented in a silicon-based
technology.

Silicon integrated circuits are the only candidates that can achieve the required level of
integration with reasonable speed, cost, and yield, and they have therefore been pursued
extensively in recent years. In particular, fully integrated silicon-based optical-fiber
communication systems at 10Gb/s and 40Gb/s are of great interest. However, silicon
integrated circuits implementing such systems face serious challenges due to the inferior
parasitic characteristics of silicon-based technologies, which complicate wideband design.

The inherent parasitic capacitances of devices are the main cause of bandwidth limitation
in wideband amplifiers. Several bandwidth enhancement methods have been proposed in
the past to overcome this issue in silicon technologies. First-order shunt peaking has
historically been used to introduce resonant peaking at the output as the amplitude starts to
roll off at high frequencies [81]–[83]¹. It improves the bandwidth by adding an inductor in
series with the output load resistor, which increases the effective load impedance as the
capacitive reactance drops at high frequencies. Neuhauser et al. studied the effect of
bondwire inductors and used an active peaking network to enhance the bandwidth
[84][85]. Capacitive peaking uses an explicit capacitor to control the pole locations of a
feedback amplifier and thus can improve the bandwidth [86].

A more exotic approach, proposed by Ginzton et al., is distributed amplification [87].
Here, the gain stages are separated by transmission lines. The gain contributions of the
stages add together, while the artificial transmission line isolates their parasitic
capacitances. In the absence of loss, the gain-bandwidth product can be improved without
limit by increasing the number of stages. In practice, the improvement is limited by the loss
of the transmission line. Hence, the design of distributed amplifiers requires careful
electromagnetic simulation and very accurate modeling of transistor parasitics. For
instance, a CMOS distributed amplifier with a unity-gain frequency of 8.5 GHz was
presented in [88].

The work presented in this chapter applies a multi-pole bandwidth enhancement
technique to wideband amplifier design. It is based on turning the entire amplifier into a
low-pass filter with a well-defined passband characteristic and cut-off frequency. The
unavoidable parasitic capacitances of the devices are absorbed into the low-pass filter and,
hence, affect the bandwidth of the amplifier in a controlled fashion. Theoretical limits on
the gain-bandwidth product of lumped amplifiers have been known for over half a century
[26], [27], [89]–[91]. Broadband filter synthesis techniques for bandwidth enhancement
have been used in the design of wideband amplifiers [93][94] and interconnects [95].
Applying proper matching networks between amplifier stages to approach those limits is
the key step in improving the bandwidth of wideband amplifiers with this method.

1. For more references on traditional techniques for wideband amplifier design, see the
bibliography of [89].
Section 4.2 reviews these theoretical limitations. Section 4.3 presents a technique to
improve the bandwidth of wideband amplifiers. A design example using this technique
follows in Section 4.4 to demonstrate the practicality of the method, whose validity is
shown with experimental results in Section 4.5.

4.2 Wideband-Amplifier Limits


A wideband amplifier should retain near-constant gain and linear phase over its
passband. The bandwidth requirements of such amplifiers continue to increase with the
drive toward higher-speed systems. While device scaling continues to provide faster
transistors with higher cut-off frequencies, it is still desirable to use circuit techniques that
improve amplifier bandwidth in a given process technology.

Over the last few decades, many techniques have been developed to improve the
bandwidth of amplifiers [36]. An improvement in the bandwidth of an amplifier is often
accompanied by a corresponding drop in its low-frequency voltage gain. As such, the
gain-bandwidth product (GBW) can serve as a first-order figure of merit for an amplifier
topology in a given device technology [89][90]. For the purposes of this discussion, the
bandwidth is defined as the lowest frequency at which the voltage gain drops by a factor of
√2, or 3dB, from its low-frequency value; accordingly, it is often called the 3dB
bandwidth. In Section 4.2.1, we discuss the GBW limits of single-stage amplifiers with
one-port and two-port passive load networks. Section 4.2.2 is dedicated to the GBW limits
of multi-stage amplifiers.

4.2.1 Single-Stage Amplifiers

4.2.1.1 One-port (two-terminal) load network

Figure 4.1(a) shows the simplest model for a linear single-stage amplifier, where R and
C are, respectively, the aggregate parasitic resistance and capacitance of the transistor and
the input of the following stage. The gain-bandwidth product of this amplifier is given by:
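[Figure 4.1 schematics: (a) transconductor gmvin driving R and C in parallel; (b) transconductor gmvin driving a general passive load Z(jω)]
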
Figure 4.1: Single-stage amplifier: (a) First-order load (b) General passive impedance load

$$\mathrm{GBW} = g_m R \cdot \frac{1}{2\pi RC} = \frac{g_m}{2\pi C} \tag{4.1}$$
As can be seen, the parasitic capacitance directly limits the bandwidth by reducing the
output impedance of the amplifier as the frequency grows. Consequently, retaining a
uniform output impedance over a wider frequency range will increase the GBW. In
general, it is possible to introduce a more elaborate passive load network Z(jω) to do so.
Figure 4.1(b) shows the generic load network, Z(jω), that should look like a constant
resistor over as wide a frequency range as possible. Wheeler [89] and Hansen [90] have
derived an intuitive upper bound for such a range. Bode [26] has mathematically proven
the existence of a bandwidth limit for a class of load impedances. Fano [27] and Youla
[91] have further generalized the theory to a larger class. This theoretical limit (also
known as the Bode-Fano limit) for the amplifier in Figure 4.1(b) is [96]:

$$\mathrm{GBW}_{\max} = \frac{g_m}{\pi \bar{C}} \tag{4.2}$$

where C̄ is defined as:

$$\bar{C} = \lim_{\omega \to \infty} \frac{1}{j\omega Z(j\omega)} \tag{4.3}$$

and Z(jω) is an impedance function, as defined in Appendix C. Z(jω) includes the
aggregate output capacitance C shown in Figure 4.1(a). It is easy to show that for a
one-port load network, C̄ is greater than or equal to C. Thus, according to (4.2), any
one-port passive network added in parallel to C can improve the GBW by at most a factor
of two over that of the amplifier in Figure 4.1(a). As a result, the maximum achievable
bandwidth enhancement ratio (BWER) for a one-port load is two. Shunt peaking is an
example of this case; it achieves a BWER of 1.6 or 1.72 when designed for optimum group
delay or maximally flat response, respectively [83].
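The shunt-peaking figures can be checked with a short calculation. The sketch assumes the classic shunt-peaked load, R in series with L = m·R²·C, all in parallel with C, so with R = C = 1 the load is Z(s) = (1 + ms)/(1 + s + ms²) and the unpeaked bandwidth is 1 rad/s. Setting |Z(jω)|² = 1/2 gives a quadratic in x = ω²: m²x² + (1 − 2m − 2m²)x − 1 = 0. The m values used here (√2 − 1 for maximally flat, about 0.32 for optimum group delay) are standard textbook values assumed for illustration, not taken from this chapter.

```python
import math

def shunt_peaking_bwer(m: float) -> float:
    """BWER of a shunt-peaked RC load with L = m * R^2 * C (normalized R = C = 1).

    Solves m^2 x^2 + (1 - 2m - 2m^2) x - 1 = 0 for x = w^2, which is |Z(jw)|^2 = 1/2.
    """
    if m == 0.0:
        return 1.0  # plain RC load: the quadratic degenerates to x = 1
    a = m * m
    b = 1.0 - 2.0 * m - 2.0 * m * m
    c = -1.0
    # a > 0 and c < 0, so the quadratic has exactly one positive root
    x = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return math.sqrt(x)

for label, m in [("no peaking", 0.0),
                 ("optimum group delay (assumed m = 0.32)", 0.32),
                 ("maximally flat (m = sqrt(2) - 1)", math.sqrt(2.0) - 1.0)]:
    print(f"{label}: BWER = {shunt_peaking_bwer(m):.2f}")
```

The assumed m = 0.32 gives about 1.57, consistent with the rounded 1.6 quoted above, and no value of m in this one-port load reaches the factor-of-two bound.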

4.2.1.2 Two-port (four-terminal) matching network

Figure 4.2(a) shows a single-stage amplifier, where the intrinsic output resistance and
capacitance of the transistor, i.e., R1 and C1, are separated from those of the load, namely,
R2 and C2. The combination of capacitors C1 and C2 limits the bandwidth of the
amplifier, i.e.,

$$\mathrm{GBW} = \frac{g_m}{2\pi \cdot (C_1 + C_2)}. \tag{4.4}$$

In this case, a passive two-port network can be inserted between the transistor’s intrinsic
components (R1 and C1) and load (R2 and C2) to increase the bandwidth, as shown in
Figure 4.2(b). Because this two-port passive network separates and isolates C1 and C2, it
can be designed to keep the impedance constant over a wider frequency range. Therefore,
C1 is the only capacitance that affects the gain-bandwidth product at the input port of the
network. Based on the argument in Section 4.2.1.1, the maximum gain-bandwidth product
at the input port of N(jω) is:

$$\mathrm{GBW}_{\max} = \frac{g_m}{\pi \cdot C_1}. \tag{4.5}$$

Bode has shown that for C1 = C2 = C/2, it is possible to design N(jω) such that the
gain-bandwidth product at the output port is the same as that at the input [26].¹ Thus, for a
single-stage amplifier with a two-port passive load network:

$$\mathrm{GBW}_{\max} = \frac{2 g_m}{\pi \cdot C}. \tag{4.6}$$
This can be done by using a constant-k LC-ladder filter [26][50][89] terminated in its
image impedance, which has a constant transfer function at frequencies below its cut-off
frequency. Compared to (4.4) with C1 = C2 = C/2, (4.6) is four times larger than the
gain-bandwidth product of a single-stage amplifier without an additional coupling network.
As a result, for equal low-frequency gain, the maximum achievable BWER for a two-port
load is four.

[Figure 4.2 schematics: (a) gmvin driving R1 and C1, loaded by the next-stage C2 and R2; (b) a passive matching network N(jω) inserted between R1, C1 and C2, R2; (c) a single series inductor between C1 and C2]
Figure 4.2: (a) Small-signal model of an amplifier with the loading effect of the next-stage
amplifier (b) The inserted passive network isolates the amplifier parasitics from the load
(c) An additional inductor forms a 3rd-order passive network at the output

1. If they are not equal, he proposes adding an ideal transformer at the output to match C2
to C1 with the proper ratio.
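The factor-of-two and factor-of-four statements can be made concrete by evaluating (4.1), (4.4), (4.6), and the one-port bound (4.2) in its best case, where the limiting capacitance equals the aggregate C. The transconductance and capacitance below (gm = 50mS, C = 200fF) are hypothetical values chosen only for illustration.

```python
import math

def gbw_first_order(gm, c):        # (4.1): plain RC load
    return gm / (2 * math.pi * c)

def gbw_one_port_max(gm, c_bar):   # (4.2): Bode-Fano bound for a one-port load
    return gm / (math.pi * c_bar)

def gbw_split_rc(gm, c1, c2):      # (4.4): C1 and C2 directly in parallel
    return gm / (2 * math.pi * (c1 + c2))

def gbw_two_port_max(gm, c):       # (4.6): ideal two-port with C1 = C2 = C/2
    return 2 * gm / (math.pi * c)

gm = 50e-3    # hypothetical transconductance [S]
c = 200e-15   # hypothetical total capacitance [F]

base = gbw_first_order(gm, c)
print(f"(4.1) first-order RC:    {base / 1e9:6.1f} GHz")
print(f"(4.2) one-port bound:    {gbw_one_port_max(gm, c) / base:.1f} x (4.1)")
print(f"(4.4) C1 + C2 directly:  {gbw_split_rc(gm, c / 2, c / 2) / base:.1f} x (4.1)")
print(f"(4.6) two-port bound:    {gbw_two_port_max(gm, c) / base:.1f} x (4.1)")
```

The ratios are independent of the assumed gm and C, which is why the BWER limits of two and four apply to any device.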

In general, it is computationally difficult to calculate the component values of the
optimum two-port network directly. Even for a third-order system with only one additional
inductor between the device and the load, as in Figure 4.2(c), the expression for the
inductor value that maximizes the bandwidth is quite complicated. Instead, graphical or
numerical methods can be used. Figure 4.3 shows the normalized gain of a single-stage
amplifier with a passive network load similar to Figure 4.2(c), where a single inductor
isolates C1 and C2. The component values are normalized to achieve 0dB gain at low
frequency and a 3dB bandwidth of 1 rad/s when L=0. Figure 4.3(a) corresponds to the case
in which the output impedance of the amplifier equals the load (R1 = R2 = 1Ω and
C1 = C2 = 1F). Figure 4.3(b) shows the case R1 = 0.5Ω and R2 = ∞, which may occur
when the output of the amplifier is connected to a next stage with a capacitive input. The
bandwidth enhancement ratio (BWER), defined as the ratio of the 3dB bandwidth of the
amplifier to its 3dB bandwidth when L=0, is summarized for both cases in Table 4.1.
Notably, even with a simple third-order passive network, the BWER is a significant
fraction of its theoretical limit. These values assume no constraint on gain peaking.
[Figure 4.3 plots, each titled "Third-order Frequency Response for Different Inductors": normalized gain [dB] versus normalized frequency [rad/s] with a -3dB line; (a) curves for L = 0, 0.4, 0.6, 0.707, 1, and 1.3; (b) curves for L = 0, 0.2, 0.5, 0.707, and 1]

Figure 4.3: Normalized gain of the amplifier with a 3rd-order network load and different inductor values:
(a) R1 = R2 = 1Ω, C1 = C2 = 1F (b) R1 = 0.5Ω, R2 = ∞, C1 = C2 = 1F

An alternative method to design the passive network is to look up the component
values in standard tables for low-pass filter design [97] or compute them from the
corresponding equations [23][24][98]. Essentially, the additional passive networks are
low-pass structures that control the frequency response of the amplifier. After choosing
the desired frequency response for the amplifier, such as maximally flat gain or maximally
flat group delay, the component values can be taken directly from standard tables. This is
discussed thoroughly in Section 4.3.

Table 4.1: Bandwidth enhancement ratios for the two 3rd-order passive networks in Figure 4.3

L value [H]   Case 1: R1 = R2 = 1Ω   Case 2: R1 = 0.5Ω, R2 = ∞
0             1                      1
0.2           1.44                   3.4
0.4           2.46                   2.42
0.5           2.2                    2.17
0.6           2                      1.99
0.707         1.83                   1.83
1             1.52                   1.55
1.3           1.31                   1.36
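The numerical method mentioned above can be sketched as a frequency sweep. The sketch assumes the Figure 4.2(c) network is a current source feeding R1 in parallel with C1, a series inductor L, and then C2 in parallel with R2; with node admittances Y1 and Y2 the trans-impedance is Z = 1/(Y1 + Y2 + sL·Y1·Y2). The helper names are hypothetical, and the "lowest frequency at which the gain drops by 3dB" definition from Section 4.2 is applied literally, so gain peaking is allowed.

```python
import math

def transimpedance(w, r1, r2, c1, c2, l):
    """Z(jw) = V2 / I for gm*vin -> (R1 || C1) -> series L -> (C2 || R2)."""
    s = 1j * w
    y1 = 1.0 / r1 + s * c1
    y2 = (0.0 if math.isinf(r2) else 1.0 / r2) + s * c2
    return 1.0 / (y1 + y2 + s * l * y1 * y2)

def lowest_3db_frequency(r1, r2, c1, c2, l, w_max=10.0, step=1e-3):
    """Lowest w at which |Z(jw)| first falls to |Z(0)| / sqrt(2); None if not found."""
    target = abs(transimpedance(0.0, r1, r2, c1, c2, l)) / math.sqrt(2.0)
    for k in range(1, int(w_max / step) + 1):
        w = k * step
        if abs(transimpedance(w, r1, r2, c1, c2, l)) <= target:
            lo, hi = w - step, w          # bracket the first crossing
            for _ in range(40):           # refine by bisection
                mid = 0.5 * (lo + hi)
                if abs(transimpedance(mid, r1, r2, c1, c2, l)) <= target:
                    hi = mid
                else:
                    lo = mid
            return hi
    return None

def bwer(r1, r2, c1, c2, l):
    """3dB bandwidth with the inductor divided by the 3dB bandwidth with L = 0."""
    return lowest_3db_frequency(r1, r2, c1, c2, l) / lowest_3db_frequency(r1, r2, c1, c2, 0.0)

for l in [0.0, 0.2, 0.4, 0.5, 0.6, 0.707, 1.0, 1.3]:
    case1 = bwer(1.0, 1.0, 1.0, 1.0, l)
    case2 = bwer(0.5, math.inf, 1.0, 1.0, l)
    print(f"L = {l:5.3f} H: case 1 BWER = {case1:.2f}, case 2 BWER = {case2:.2f}")
```

For Case 1 the normalized response factors as 1/((1+s)(1 + (L/2)s + (L/2)s²)), so the pole at 1 rad/s never moves; the enhancement comes entirely from peaking of the second-order factor, which is why these BWER values depend on allowing gain peaking.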

4.2.2 Multi-Stage Amplifiers

Often it is hard to achieve a desirable gain-bandwidth product with a single-stage
amplifier. In that case, several stages can be cascaded, and the total gain is the product of
the individual stage gains. However, the overall bandwidth is less than the bandwidth of
each stage, because the gain drops in the passbands of the individual amplifiers
accumulate. For instance, the overall 3dB bandwidth and the GBW of an amplifier made by
cascading N identical single-pole amplifiers, each with gain Av and bandwidth ω0, with no
mutual loading, are:

$$\omega_{\mathrm{overall}} = \omega_0 \cdot \sqrt{2^{1/N} - 1} \tag{4.7}$$

$$\mathrm{GBW} = A_v^{N} \cdot \omega_{\mathrm{overall}}. \tag{4.8}$$
Compared to the single-stage gain-bandwidth product, Av·ω0, there is a gain-bandwidth
improvement of¹:

$$\frac{\mathrm{GBW}_{\text{multi-stage}}}{\mathrm{GBW}_{\text{one-stage}}} = A_v^{N-1} \cdot \sqrt{2^{1/N} - 1}. \tag{4.9}$$

For instance, N=2 and Av=10 correspond to a factor of 6.4 improvement in GBW. For
larger Av, the GBW increases dramatically as single stages are added, at the price of
higher overall power consumption.
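Equations (4.7) and (4.9), together with the condition for an actual GBW improvement, can be evaluated directly. The function names below are hypothetical; the formulas are those of (4.7), (4.9), and the footnote to (4.9), and they assume identical, non-loading single-pole stages.

```python
import math

def overall_bandwidth(w0: float, n: int) -> float:
    """(4.7): 3dB bandwidth of n identical, non-loading single-pole stages."""
    return w0 * math.sqrt(2.0 ** (1.0 / n) - 1.0)

def gbw_improvement(av: float, n: int) -> float:
    """(4.9): GBW of the n-stage cascade relative to one stage (av * w0)."""
    return av ** (n - 1) * math.sqrt(2.0 ** (1.0 / n) - 1.0)

def min_gain_for_improvement(n: int) -> float:
    """Per-stage gain above which cascading n >= 2 stages improves the GBW."""
    return (2.0 ** (1.0 / n) - 1.0) ** (-1.0 / (2 * n - 2))

print(f"N=2, Av=10: GBW improvement = {gbw_improvement(10.0, 2):.2f}")
for n in range(2, 6):
    print(f"N={n}: bandwidth shrink factor = {overall_bandwidth(1.0, n):.3f}, "
          f"GBW improves only if Av > {min_gain_for_improvement(n):.3f}")
```

The required per-stage gain stays modest, so in practice the loading and phase-shift issues discussed next limit the cascade more than (4.9) does.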

In practice, each stage loads its previous stage, which reduces that stage's bandwidth
and hence the overall bandwidth. The matching networks introduced in Section 4.2.1.2
can reduce the loading effect by separating the output of an amplifier from the input of its
next stage. One disadvantage of multi-stage amplifiers in general, and of multi-stage
amplifiers with two-port matching networks between stages in particular, is the excessive
phase shift that each amplifier stage or network adds to the signal path, which can cause
instability in feedback amplifiers.

4.3 Design Methodology


Based on the discussion in the previous section, for a given wideband amplifier one can
add passive matching networks at the input and output, as well as between the gain stages,
to enhance the bandwidth. This method brings each stage of the amplifier closer to the
theoretical limits discussed in Section 4.2. The networks absorb the parasitic capacitances
of the gain stages (transistors) and/or of the source and load into their structure. Each
network can be designed as a low-pass filter with a standard response [23][89]. To achieve
a particular response shape for each network (e.g., maximally flat group delay), the
components of the passive network take the values of the corresponding elements in the
filter.

1. The overall GBW actually improves only if $A_v > (2^{1/N} - 1)^{-1/(2N-2)}$.

In this approach, one can use passive networks with low sensitivity to component values,
such as ladder structures [99]. Figure 4.4 shows a general low-pass ladder structure
inserted between two gain stages of an amplifier. The component values are obtained from
standard look-up tables [97] or network synthesis methods [98]. The network order, N, is
an additional design parameter. Higher-order networks provide wider bandwidth and a
sharper transition from passband to stopband. However, they may cause practical issues,
such as unreasonable component values, a large number of passive components (large die
area), and additional signal loss due to passive components (primarily inductors).
Typically, these issues limit the network order to five, i.e., only three additional passive
components.

Design Example: Here, we show the procedure for designing a 3rd-order passive
network with a maximally flat response. Figure 4.5(a) illustrates two stages of a given
amplifier with an inductor inserted between them. Figure 4.5(b) shows that the inductor
forms a 3rd-order ladder structure with C1 and C3, the transistor parasitic capacitances.
The values of R1, R2, C1, and C3 are known for the amplifier. To achieve a maximally flat
frequency response at the output of the ladder, the component values should equal the
corresponding 3rd-order Butterworth filter elements [98]:

$$C_1 = \frac{1}{R_1 (1 - \delta)\, \omega_c} \tag{4.10}$$

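[Figure 4.4 schematic: gmvin with R1 driving a passive ladder of shunt capacitors C1, C3, C5, …, CN and series inductors L2, L4, L6, …, terminated in R2; the inductors form the added network]
Figure 4.4: Passive ladder structure of order N, inserted between the gain stages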
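[Figure 4.5 schematics: (a) inductor L2 between two gain stages, with parasitic capacitances C1 and C3 and resistances R1 and R2; (b) small-signal model: gmvin driving R1 and C1, series L2, then C3 and R2]
Figure 4.5: (a) An inductor is inserted between two gain stages. (b) The small-signal model shows
formation of a 3rd-order ladder network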

$$L_2 = \frac{2}{(1 - \delta + \delta^2) \cdot \omega_c^2 \cdot C_1} \tag{4.11}$$

$$C_3 = \frac{1}{R_2 (1 + \delta)\, \omega_c} \tag{4.12}$$

where δ indicates the impedance transformation between R1 and R2 and is defined as

$$\delta = \sqrt[3]{\frac{R_1 - R_2}{R_1 + R_2}} \tag{4.13}$$

and ωc is the 3dB cut-off frequency of the network. From (4.10), the new amplifier
bandwidth at the output of the ladder structure is

$$\omega_{c,\mathrm{new}} = \frac{1}{(1 - \delta) \cdot R_1 \cdot C_1}. \tag{4.14}$$

The inductor value can then be calculated from (4.11) and (4.14). The value of C3 in the
original amplifier may differ from the value required by (4.12) at the new cut-off
frequency; some explicit capacitance should be added to adjust for this. Defining the
bandwidth enhancement ratio (BWER) as the ratio of the new 3dB bandwidth to that of the
original single-stage amplifier (without the inductor), we can show:
$$\mathrm{BWER} \equiv \frac{\omega_{c,\mathrm{new}}}{\omega_{c,\mathrm{old}}} = \frac{1}{1 - \delta} \cdot \frac{R_2}{R_1 + R_2} \cdot \frac{C_1 + C_3}{C_1}. \tag{4.15}$$

Substituting (4.10), (4.12), and (4.13) reduces (4.15) to an expression that depends only
on the ratio of R1 and R2. For R2 ≤ R1, BWER decreases monotonically as R2/R1
increases. For a given amplifier with R2<R1, adding the inductor always enhances the
bandwidth (BWER>1). When R2=R1, BWER=1, and adding the inductor provides no
bandwidth enhancement; however, it still yields a maximally flat passband and a sharper
cut-off.
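The design equations (4.10)-(4.13) and the BWER of (4.15) can be sketched as follows. The function names and the example values (R1 = 200Ω, R2 = 100Ω, ωc = 2π·10GHz) are hypothetical; a real cube root is used so that δ is also defined when R2 > R1.

```python
import math

def real_cbrt(x: float) -> float:
    """Real cube root that also works for negative x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def butterworth3_ladder(r1: float, r2: float, wc: float):
    """delta, C1, L2, C3 of a 3rd-order maximally flat ladder per (4.10)-(4.13)."""
    delta = real_cbrt((r1 - r2) / (r1 + r2))                  # (4.13)
    c1 = 1.0 / (r1 * (1.0 - delta) * wc)                      # (4.10)
    l2 = 2.0 / ((1.0 - delta + delta ** 2) * wc ** 2 * c1)    # (4.11)
    c3 = 1.0 / (r2 * (1.0 + delta) * wc)                      # (4.12)
    return delta, c1, l2, c3

def bwer(r1: float, r2: float) -> float:
    """(4.15) with C1 and C3 from (4.10) and (4.12); the result does not depend on wc."""
    delta, c1, _, c3 = butterworth3_ladder(r1, r2, wc=1.0)
    return (1.0 / (1.0 - delta)) * (r2 / (r1 + r2)) * ((c1 + c3) / c1)

# Hypothetical example values
delta, c1, l2, c3 = butterworth3_ladder(200.0, 100.0, 2 * math.pi * 10e9)
print(f"delta = {delta:.3f}, C1 = {c1 * 1e15:.1f} fF, L2 = {l2 * 1e9:.3f} nH, C3 = {c3 * 1e15:.1f} fF")
for ratio in [0.1, 0.25, 0.5, 0.75, 1.0]:   # R2 / R1
    print(f"R2/R1 = {ratio:4.2f}: BWER = {bwer(1.0, ratio):.3f}")
```

Swapping R1 and R2 in (4.15) flips the sign of δ and leaves the BWER unchanged, which is why the sweep only covers R2/R1 ≤ 1.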

The same analysis can be applied to the input stage of the trans-impedance amplifier
(TIA) shown in Figure 2.21. The photodiode is modeled as a current source in parallel
with a capacitance, CPD, as shown in Figure 4.6. Although there is no R1 in this model, the
design calculations in (4.10)–(4.14) can use an arbitrary value for R1. The value of R1 that
maximizes the 3dB bandwidth can be computed from (4.14) with C1 (CPD in this case) and
R2 fixed; it results in R1 = 2.05R2 and δ = 0.7. After the inductor is designed and C3 is
adjusted, R1 can be removed. The trans-impedance gain then increases, because no portion
of the input current is absorbed by R1. The enhancement ratio for the input passive
structure should also be modified accordingly:

$$\mathrm{BWER} = \frac{1}{1 - \delta} \cdot \frac{R_2}{R_1} \cdot \frac{C_{PD} + C_3}{C_{PD}} = 1.63 \cdot \left(1 + \frac{C_3}{C_{PD}}\right). \tag{4.16}$$

[Figure 4.6 schematic: input current IIn in parallel with CPD, series inductor L2, then C3 in parallel with R2]

Figure 4.6: The inductor at the input forms a 3rd-order ladder network with the photodiode capacitance
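The optimum R1 = 2.05R2 and the 1.63 prefactor of (4.16) can be reproduced with a simple search. The sketch normalizes (4.14) by R2·CPD with R1 = r·R2 and C1 = CPD, then scans r; the brute-force scan is an assumed search strategy, not necessarily how the values were originally obtained.

```python
import math

def real_cbrt(x: float) -> float:
    """Real cube root that also works for negative x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def normalized_bandwidth(r: float) -> float:
    """omega_c,new * R2 * C_PD from (4.14), with R1 = r * R2 and C1 = C_PD."""
    delta = real_cbrt((r - 1.0) / (r + 1.0))
    return 1.0 / ((1.0 - delta) * r)

ratios = [1.0 + k * 1e-4 for k in range(40001)]     # R1/R2 from 1.0 to 5.0
r_opt = max(ratios, key=normalized_bandwidth)
delta_opt = real_cbrt((r_opt - 1.0) / (r_opt + 1.0))
prefactor = 1.0 / ((1.0 - delta_opt) * r_opt)       # (1/(1 - delta)) * (R2/R1) in (4.16)
print(f"R1/R2 = {r_opt:.2f}, delta = {delta_opt:.2f}, BWER prefactor = {prefactor:.2f}")
```

The maximum is shallow around the optimum, so the exact choice of R1 within a few percent of 2.05R2 hardly changes the result.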
The preceding example can be generalized to any response shape by replacing
(4.10)–(4.12) with the corresponding filter element equations; equations (4.15) and (4.16)
must then be modified to correspond to the new component values.

4.4 Example Design


To demonstrate the effectiveness of the developed methodology, we designed a CMOS
trans-impedance amplifier (TIA). It is a single-ended design consisting of three gain
stages. The first stage is a shunt-shunt feedback trans-impedance stage, shown in
Figure 4.7. The input resistance of the amplifier is approximately Rf/(A + 1), where A is
the inverting voltage gain. Thus, the stage can provide a low input impedance and reduce
the dominant effect of the input pole due to the large photodiode junction capacitance,
CPD. The input pole frequency can be written as:

$$P_{in} \approx \frac{1}{R_{in} \cdot (C_{PD} + C_{in})} \approx \frac{A}{R_f \cdot (C_{PD} + C_{in})} \tag{4.17}$$

where Rin and Cin are the input resistance and input capacitance, respectively. For the
circuit in Figure 4.7, if the transistors operate in the short-channel (velocity-saturated)
regime, both Cin and A are proportional to the input transistor width, Win:

$$A \approx g_m \cdot R_L \approx v_{sat} C_{ox} W_{in} \cdot R_L \tag{4.18}$$

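[Figure 4.7 schematic: shunt-shunt feedback input stage with load RL, bias Vbias, feedback resistor Rf, and input Vin with source resistance RS; Rin and Cin mark the input resistance and capacitance]
Figure 4.7: Schematic of the input stage of the TIA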
$$C_{in} \propto C_{ox} L_{in} W_{in} \tag{4.19}$$

where Cox is the gate-oxide capacitance per unit area, vsat is the carrier saturation
velocity, and Lin is the input transistor channel length. Because both A and Cin grow with
Win, increasing the input width raises the input pole only up to a bound, which is reached
once Cin dominates CPD. In addition, other constraints, such as power consumption or
input noise, set an optimum width for the input transistor [101]. Adding an inductor to
isolate Cin and CPD enhances the bandwidth according to (4.16). In this design, we match
the input resistance of the TIA to our electrical measurement setup, which has a 50Ω
impedance.
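The saturation of the input pole with transistor width follows from (4.17)-(4.19): the gain and Cin both scale with Win, so their ratio, and hence the pole, tends to a constant. The sketch below uses hypothetical, order-of-magnitude values for vsat, Cox, Lin, RL, Rf, and the proportionality constant in (4.19); only CPD = 0.5pF comes from this chapter, and everything other than (4.17)-(4.19) is ignored.

```python
import math

# Hypothetical, order-of-magnitude values (not from this design), except C_PD
V_SAT = 1e5        # carrier saturation velocity [m/s]
C_OX = 8.6e-3      # gate-oxide capacitance per unit area [F/m^2]
L_IN = 0.18e-6     # input-transistor channel length [m]
K_CIN = 2.0 / 3.0  # assumed proportionality constant in (4.19)
R_L = 500.0        # load resistor [ohm]
R_F = 1e3          # feedback resistor [ohm]
C_PD = 0.5e-12     # photodiode capacitance [F]

def input_pole(w_in: float) -> float:
    """Input pole [rad/s] from (4.17)-(4.19) for input-transistor width w_in [m]."""
    a = V_SAT * C_OX * w_in * R_L          # (4.18): A ~ gm * RL in velocity saturation
    c_in = K_CIN * C_OX * L_IN * w_in      # (4.19): Cin proportional to Win
    r_in = R_F / (a + 1.0)                 # shunt-shunt feedback input resistance
    return 1.0 / (r_in * (C_PD + c_in))    # (4.17)

# As Win grows, A / Cin tends to a constant, so the pole approaches this bound
bound = V_SAT * R_L / (R_F * K_CIN * L_IN)
for w_um in [5, 10, 20, 50, 100, 200, 500, 1000]:
    p = input_pole(w_um * 1e-6)
    print(f"Win = {w_um:5d} um: f_in = {p / (2 * math.pi) / 1e9:6.2f} GHz ({p / bound:.2f} of bound)")
```

Note that Cox cancels in the bound, which depends only on vsat·RL/(Rf·Lin) under these assumptions.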

The complete schematic of the circuit, including the added passive components, is
shown in Figure 4.8. The second and third stages of the amplifier are cascode amplifiers
with intermediate inductors and are isolated from each other by a source-follower buffer.
The source follower prevents the large input capacitance of the third stage from loading
the second stage, and it provides a low-impedance node at its output, raising the frequency
of the pole at that node.

Four passive networks are inserted between the stages of the amplifier to enhance the
bandwidth. The input network separates the photodiode capacitance from the parasitic
capacitance of the input stage; adding one inductor transforms it into a 3rd-order ladder
structure. The next two networks are also 3rd-order and are placed between the cascode
transistors. The load capacitance, together with the output capacitance (including the
bonding pad) and the output bondwire inductor, forms the output 3rd-order network.

[Figure 4.8 schematic: input stage with RF, RL, and gain AV driven by IIn and CPD; cascode stages biased at VB with load RD; a buffer driving VOut; added inductors L1, L2, L3, and L4; parasitic capacitances drawn with dotted lines]
Figure 4.8: Schematic of the TIA with parasitic capacitances and additional inductors

The capacitors drawn with dotted lines in Figure 4.8 are device parasitics; only four
inductors are added to the original circuit. The input and output inductors are bondwire
inductors, and the inter-stage inductors are on-chip spiral inductors. A final optimization
step in simulation accounts for the bilateral effects of the devices. Note that the output
network differs from a conventional shunt-peaking approach. For a photodiode
capacitance of 0.5pF, the circuit achieves over 9GHz 3dB bandwidth in simulation, 2.4
times the bandwidth of the same circuit without the inductors. Table 4.2 summarizes the
simulated effect of each passive network individually and of combinations of them.

Table 4.2: Comparison of the individual effects of the inductors on BWER

Additional inductors   BWER
None                   1
L1 only                1.48
L2 only                1.42
L3 only                1.62
L4 only                1.17
L1, L3                 2
L1, L2, L3             2.3
All                    2.4

L3 causes the largest improvement in bandwidth because the devices of the second
cascode amplifier are large, so that it can drive 50Ω with minimal loss of gain. L1
separates the two large capacitances that set the input pole, which in our design is the
dominant bandwidth-limiting factor of the core TIA without a driver. L4 does not enhance
the bandwidth significantly because the output pole is not dominant; however, L4 exists in
the circuit as the bondwire and must be modeled. All four passive networks have a ladder
structure for lower sensitivity to process variations.

Both on-chip inductors were implemented as spiral inductors in the top metal layer.
They were modeled electromagnetically with the ASITIC [102] and SONNET [103]
simulators, which gave similar results. The parasitic capacitances of the inductors are not
negligible, and their impact is considered in addition to the device parasitics.

4.5 Experimental Results


The trans-impedance amplifier was implemented in a 0.18µm BiCMOS process
technology using only CMOS transistors. It draws 55mA from a 2.5V power supply. The
scattering parameters were measured with a 20 GHz HP8720B network analyzer.
Assuming matched output, it can be shown that the complex TIA trans-impedance, Z(jω),
can be extracted from this measurement using:

$$Z(j\omega) = \frac{Z_0 \cdot S_{21}}{1 - S_{11} + Z_0 \cdot j\omega C_{PD} (1 + S_{11})} \tag{4.20}$$

where Z0 = 50Ω is the reference impedance and CPD is the photodiode capacitance.
Figures 4.9(a) and 4.9(b) show, respectively, the amplitude and group-delay responses of
the implemented TIA, extracted from the measurement data for CPD=0.5pF. The matched
output causes a 6dB drop in gain, which is compensated for in the reported result. The
group delay is calculated from the measured phase response using the logarithmic
frequency steps of the network analyzer.
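The two post-processing steps described above, applying (4.20) to measured S-parameters and computing group delay from phase on a logarithmic frequency grid, can be sketched as follows. The helper names are hypothetical, loading the analyzer data is not shown, and the synthetic pure-delay example is only a self-check of the finite-difference approach.

```python
import cmath
import math

Z0 = 50.0  # reference impedance [ohm]

def transimpedance_from_s(s11: complex, s21: complex, freq_hz: float, c_pd: float) -> complex:
    """(4.20): trans-impedance with a photodiode capacitance c_pd [F] at the input,
    assuming a matched output port."""
    jw = 2j * math.pi * freq_hz
    return Z0 * s21 / (1 - s11 + Z0 * jw * c_pd * (1 + s11))

def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples (radians)."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))
        out.append(out[-1] + d)
    return out

def group_delay(freqs_hz, responses):
    """-d(phase)/d(omega) by finite differences between adjacent (e.g. log-spaced)
    frequency points; returns one delay value per interval."""
    phase = unwrap([cmath.phase(h) for h in responses])
    return [-(phase[k + 1] - phase[k]) / (2 * math.pi * (freqs_hz[k + 1] - freqs_hz[k]))
            for k in range(len(freqs_hz) - 1)]

# Self-check: a pure 100 ps delay sampled on a log-spaced grid from 1 to 10 GHz
freqs = [1e9 * 10 ** (k / 200) for k in range(201)]
h = [cmath.exp(-2j * math.pi * f * 100e-12) for f in freqs]
tau = group_delay(freqs, h)
print(f"group delay range: {min(tau) * 1e12:.3f} to {max(tau) * 1e12:.3f} ps")
```

With the output matched, a TIA with input impedance Zin and trans-impedance Zt gives S21 = Zt(1 − S11)/Z0, and (4.20) then returns Zt/(1 + jωCPD·Zin), i.e., the input current divides between CPD and the TIA input, as expected.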
[Figure 4.9 plots: (a) trans-resistance Rtrans [dBΩ] and S11 [dB] versus frequency [GHz] on a logarithmic axis from 0.1 to 10 GHz, vertical axis -30 to 60; (b) group delay [ps], 0 to 400, versus frequency [GHz] from 1 to 10 GHz]
Figure 4.9: (a) Trans-resistance gain of the TIA with 0.5 pF photodiode capacitance, and the input
matching (S11) (b) Group delay response of the TIA

The 3dB bandwidth is 9.2 GHz, in good agreement with the simulations, and the
trans-impedance gain is 54 dBΩ. The input reflection coefficient, S11, remains below
-10 dB up to 7 GHz. Although we did not design for flat group delay, the group-delay
ripple is within ±25ps. The dip in the trans-impedance frequency response at 2.5 GHz
correlates with a resonance mode between the on-chip supply bypass capacitor and the
bondwire and supply-line inductances. Changing these parameters during the measurement
changes the depth and frequency of the dip, which could be removed by using a different
supply-bypassing technique in a revised version of the design. The design has low
sensitivity to inductor values. The simulated values of L1 and L4 are 0.5–0.6nH. L2 and L3
are 1nH spiral inductors, each occupying 100×100 µm².

[Figure 4.10 eye diagram, with scale markers of 30 mV and 50 ps]
Figure 4.10: Eye diagram of the TIA output with a 10Gb/s 2³¹−1 PRBS at the input

Figure 4.10 shows the eye diagram when a 2³¹−1 pseudo-random bit sequence is applied
to the input at 10Gb/s. The ringing is partly due to the resonance mode at 2.5 GHz and
partly due to the absence of the photodiode capacitance in this measurement, which causes
peaking in the overall transfer function. This peaking translates into a ringing response in
the time domain, increasing the ISI penalty and closing the eye vertically. However, the
TIA still achieves an overall sensitivity of -18dBm for a BER better than 10⁻¹², as we
discuss next.

The electrical sensitivity of the amplifier for different bit error rates (BER) is measured
using Anritsu's MP1763C and MP1764C BERT system. A 2³¹−1 pseudo-random bit
sequence is applied to the input at 10Gb/s, and the BER is measured for different
electrical input powers over 500-second intervals. The results are depicted in Figure 4.11.
For a data communication link, the required BER is typically 10⁻¹². The TIA achieves a
sensitivity of -18dBm, or 15.8µW, for this BER when the photodiode capacitance is not
present. At very low input powers, we were limited by the sensitivity of the BERT system:
the TIA output swing was not large enough to meet the minimum input requirement of the
BERT.

Figure 4.11: The BER of the TIA for different input powers at 10Gb/s

The simulated total input noise current of the TIA, integrated over the bandwidth, is
1.6µA. In an optical receiver, two other noise sources increase the minimum detectable
optical power. One is the intensity noise of the transmitted signal, originating mainly from
spontaneous emission in the laser source [43]. The resulting noise current at the receiver
input can be quantified as

i t, rms = R ⋅ P in ⋅ RIN (4.21)

where R is the responsivity of the detector, Pin is the input optical power, and RIN is the
relative intensity noise of the laser integrated over the bandwidth. Second noise source is
the shot noise of the photodetector that is generated proportionally to the optical power
given by [43]

i s, rms = 2q ( R ⋅ P in + I dark ) ⋅ ∆f (4.22)

where q is the electron charge, I_dark is the dark current of the detector, and ∆f is the system bandwidth. From (4.21) and (4.22) we can compare the injected noise currents and determine the dominant noise source. Assuming a peak RIN/∆f = −130 dB/Hz for a typical laser and a detector with R = 0.8 A/W and I_dark = 10 nA, a minimum input optical power of P_in = 20 µW results in i_t,rms = 0.014 µA and i_s,rms = 0.22 µA (∆f = 9.2 GHz). Both are much smaller than the 1.6 µA input noise current of the TIA. Therefore, the thermal noise of the TIA is the dominant noise source of the receiver, and we expect the optical sensitivity and the electrical sensitivity of the TIA to be comparable. The amplifier core occupies 0.8 × 0.8 mm² of area, as shown in Figure 4.12.

Figure 4.12: The die photograph of the 9.2 GHz TIA
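The noise comparison above can be reproduced numerically. The sketch below evaluates (4.21) and (4.22) as written, with the parameter values quoted in the text, and combines the three contributions in root-sum-square fashion. The root-sum-square step assumes the sources are uncorrelated, and the function names are illustrative only.

```python
import math

Q_E = 1.602176634e-19  # electron charge [C]


def rin_noise(resp_a_per_w, p_in_w, rin_db_per_hz, bw_hz):
    """RIN noise current per (4.21): i = R * P_in * RIN, where RIN is the
    spectral density (dB/Hz, assumed flat) integrated over the bandwidth."""
    rin_integrated = 10 ** (rin_db_per_hz / 10) * bw_hz
    return resp_a_per_w * p_in_w * rin_integrated


def shot_noise(resp_a_per_w, p_in_w, i_dark_a, bw_hz):
    """Shot noise current per (4.22): sqrt(2q (R*P_in + I_dark) * df)."""
    return math.sqrt(2 * Q_E * (resp_a_per_w * p_in_w + i_dark_a) * bw_hz)


R, P_IN, I_DARK, BW = 0.8, 20e-6, 10e-9, 9.2e9
i_t = rin_noise(R, P_IN, -130.0, BW)
i_s = shot_noise(R, P_IN, I_DARK, BW)
i_tia = 1.6e-6  # simulated input noise current of the TIA (from the text)

# Root-sum-square total, assuming the three sources are uncorrelated.
i_total = math.sqrt(i_tia**2 + i_s**2 + i_t**2)

print(f"RIN noise   : {i_t * 1e6:.4f} uA")
print(f"shot noise  : {i_s * 1e6:.3f} uA")
print(f"total noise : {i_total * 1e6:.3f} uA")
```

The script gives about 0.0147 µA and 0.217 µA for the two optical noise terms, consistent with the rounded values quoted above, and a total of about 1.61 µA. The optical terms therefore barely change the TIA-dominated noise floor.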

4.6 Summary
In this chapter, we studied wideband amplifier design for the front-end of high-speed
wireline links. We addressed the gain-bandwidth product (GBW) limits of amplifiers and
introduced a methodology that can be used to enhance the bandwidth of wideband
amplifiers with specified characteristics for their transfer functions. In a simple design procedure, the parasitic capacitances of the transistors are absorbed into passive networks inserted between the gain stages. The component values can be calculated from standard low-pass filter structures. A prototype CMOS TIA implemented using the developed technique achieves over 9 GHz of bandwidth and 54 dBΩ of transimpedance gain in the presence of a 0.5 pF photodiode capacitance.

Chapter 5
Eye-Opening Monitor for Adaptive Equalization
5.1 Introduction
In Chapters 2 and 3, we discussed how the channel and receiver front ends of a
high-speed wireline communication system degrade received signal quality by adding ISI
and jitter. In Chapter 4, we provided a bandwidth enhancement method that can be utilized
in the design of the receiver front-end. To minimize the ISI caused by the channel
response we can use an equalizer (Section 2.3.4). For instance, Wu et al. [48][104] and
Reynolds et al. [49] have demonstrated significant bit error rate (BER) reduction by using
transversal filter equalizers in the receiver front-end of multi-mode fiber links to
compensate for modal dispersion.

When the channel response is initially unknown or varies over time, an adaptive equalizer is used in which the transversal filter coefficients are adjusted automatically and continuously to track channel response variations. Since the adaptation is an iterative process, a feedback mechanism is required to measure and report the signal quality at the equalizer output. In this chapter, we propose an eye-opening monitor (EOM), a circuit block that reports a quantitative measure of the quality of the signal eye diagram and can therefore provide this feedback.

Figure 5.1 shows the block diagram of a transversal-filter adaptive equalizer that uses an EOM circuit. The EOM evaluates signal quality by periodically observing the filter output and reporting the filter performance to an optimization algorithm, which then updates all of the filter coefficients. This architecture is desirable when the transversal filter is implemented using broadband passive delay lines, i.e., LC networks [48][49][104], or active delay elements [80]. At multi-Gb/s data rates, the passive or active delay cells become more sensitive to on-chip parasitic components. In contrast to conventional LMS adaptive equalization (Section 2.3.4), an EOM-based adaptive equalizer does not load the nodes of the filter delay cells with additional adaptation circuitry. Therefore, the filter can be designed as a separate module and its response remains intact. The other advantage of the EOM-based architecture is that the cost function for the coefficient optimization depends only on the filter output and is independent of the receiver decisions on the symbols. This is especially beneficial in links where training sequences are not used for adaptation and most decisions are erroneous at startup, when the BER is high. The EOM can also be used as a standalone measurement system to verify the quality of the eye.

Figure 5.1: Adaptive transversal filter equalizer with an eye-opening monitor (EOM)
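The optimization algorithm in Figure 5.1 is not specified here. As a hedged illustration of how a decision-free, EOM-style cost can drive the coefficients, the sketch below applies a simple greedy coordinate search to a 3-tap transversal filter. Everything in it is an assumption for illustration: the hypothetical two-tap ISI channel `h = (1, 0.5)`, the main tap `c0` held at 1 to remove the trivial gain scaling, the step size, and the eye metric. The metric uses only the filter output: it is the largest zero-centred vertical mask whose MER stays at or below a target, i.e., a quantile of |y|.

```python
import random


def channel(bits, h=(1.0, 0.5)):
    """Hypothetical dispersive channel: x[n] = sum_k h[k] * d[n-k]."""
    return [sum(h[k] * bits[n - k] for k in range(len(h)) if n >= k)
            for n in range(len(bits))]


def fir(x, taps):
    """Transversal filter output y[n] = sum_k c[k] * x[n-k]."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n >= k)
            for n in range(len(x))]


def vertical_opening(y, target_mer=0.0, skip=4):
    """Largest half-height h of a zero-centred mask with MER <= target_mer.

    A sample is an error when |y| < h, so h is (roughly) the target_mer
    quantile of |y|. Only the filter output is used: no symbol decisions.
    The first `skip` samples are dropped to avoid the start-up transient.
    """
    mags = sorted(abs(v) for v in y[skip:])
    idx = min(int(target_mer * len(mags)), len(mags) - 1)
    return mags[idx]


def adapt(x, taps, step=0.125, max_iters=20):
    """Greedy coordinate search on taps[1:], keeping the main tap taps[0]."""
    taps = list(taps)
    best = vertical_opening(fir(x, taps))
    for _ in range(max_iters):
        improved = False
        for k in range(1, len(taps)):
            for delta in (step, -step):
                trial = list(taps)
                trial[k] += delta
                score = vertical_opening(fir(x, trial))
                if score > best:
                    taps, best, improved = trial, score, True
        if not improved:
            break
    return taps, best


rng = random.Random(1)
bits = [rng.choice((-1, 1)) for _ in range(2000)]
rx = channel(bits)
start = [1.0, 0.0, 0.0]
print("opening before:", vertical_opening(fir(rx, start)))
taps, opening = adapt(rx, start)
print("taps after:", taps, "opening after:", opening)
```

For this toy channel, the search starts from an inner eye opening of 0.5 and should settle at the taps [1, −0.5, 0.25], the truncated zero-forcing inverse of 1 + 0.5z⁻¹, with an opening of 0.875. In hardware, the metric would come from swept EOM masks rather than from stored samples.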

5.2 Prior Art


The received signal quality in a communication link and the shape of its eye diagram are strongly correlated [16][17]. Therefore, eye diagram monitoring has been proposed as a technique for extracting information about the received signal [105]–[108], and it is used in various applications, including adaptive equalizers. George [109] and Hogge [110] both introduced eye-monitor hardware that is used as a pseudo-error detector [105][111] for rapid estimation of low BER. The estimation is based on evaluating the eye diagram against a fixed rectangular eye-opening mask. Shin et al. [112] use an eye monitor to perform a pass/fail test on fiber-optic channels. They also overlap a fixed rectangular mask with the eye diagram and count the number of eye traces inside it. If this number exceeds a given threshold, the channel is declared failed and a stand-by fiber channel is selected. Various signal performance monitors have been proposed [113]–[116] to adaptively adjust the decision threshold level of the receiver. For instance, in [114] and [115] the approach is to fit a rectangular mask with a fixed width to the eye and to adaptively adjust its height to keep the number of eye traces occurring inside the mask constant. The traces above and below the threshold (representing “1” and “0” bits) are counted separately to capture unbalanced eye shapes. The threshold is set to the center of the rectangular mask.

Eye-opening monitor circuits have also been used as part of adaptive equalizers [117]–[125], mainly to mitigate various forms of dispersion in optical fibers. In [120], the eye monitor estimates the vertical eye opening at the sampling point. The receiver includes a path parallel to the main path that contains a decision circuit with a variable decision threshold. The threshold is varied to sweep the eye vertically. The decisions of the two paths are compared, and an error is recorded if they differ. When the errors are integrated over time for various thresholds, the vertical eye opening for a given error rate can be estimated from the separation of the thresholds that result in that error rate. Ellermeyer suggests a circuit for estimating the horizontal eye opening of the input signal [118][119]. A rectangular mask with a fixed height is overlapped with the input eye. The width of the mask is increased as long as eye traces do not occur inside it and is decreased otherwise. In steady state, the mask width indicates the horizontal eye opening.

We propose an EOM circuit architecture with the unique feature of mapping both the vertical (amplitude) and horizontal (temporal) opening of the received eye to a two-dimensional error diagram [28]. The error diagram is directly correlated with the eye opening in both dimensions and is essentially a captured image of the signal eye diagram. The output error rate is recorded with a digital counter rather than in an accumulated or integrated format. This is advantageous when the eye monitor is in a feedback loop with a microcontroller that runs the optimization algorithm, because the error is recorded with finer resolution and potentially a larger dynamic range. We have implemented a prototype of this 2D EOM circuit in a 0.13 µm standard CMOS technology and verified its operation up to 12.5 Gb/s.

In the following sections, the operation principle of the EOM is discussed first. Then,
the architecture and details of the associated circuit blocks are presented. Finally, the
experimental techniques for verifying the operation of the prototype and the measurement
results are described.

5.3 EOM Principle of Operation


The EOM characterizes the opening of an eye diagram using eye masks. The eye is overlapped by several rectangular masks with various sizes and aspect ratios. Any eye trace, i.e., data transition, that passes inside a mask is counted as an error. For a given mask, if the EOM runs for a sufficiently long time, some data transitions will eventually fall inside the mask and create an error, even for an apparently good eye diagram. This is because random jitter and amplitude noise often have unbounded distributions. A mask error rate (MER) can be defined as the number of data transitions that fall inside a given mask, normalized by the total number of transitions during the same time period:

$$\mathrm{MER} \equiv \frac{\text{error count}}{\text{total transitions}}$$

Figure 5.2 illustrates an example where the MER is obtained for three different masks in a given eye diagram (0%, 0.1%, and 20%). Any given mask is associated with an MER that increases as the quality of the monitored eye degrades. The horizontal and vertical opening of an eye can be determined from the mask size for a specific MER. Moreover, different eye diagrams can be quantitatively compared by comparing their associated mask sizes at a given MER. The eye that can fit a larger mask at the given MER is more desirable.

Figure 5.2: The mask error rate (MER) varies for different mask shapes in a given eye diagram

A significant feature of a 2D eye-opening monitor is that it can capture eye diagram shapes with irregular and non-rectangular openings, which are common in high-speed links. In such cases a rectangular mask might not be the optimum choice for comparing eye openings, because the non-zero rise and fall times of the data transitions occupy a large portion of the bit period and form a rounded, diamond-shaped eye opening. A 2D EOM can generate rectangular masks of different sizes in both the horizontal and vertical dimensions. For a given eye diagram, a group of masks with different aspect ratios can have the same MER. Figure 5.3 shows an example of a typical eye diagram shape overlapped with three masks, all of which result in MER = 0. The combined area of all the masks inside the eye that have the same MER is defined as the effective eye opening at that MER. This effective eye opening is not necessarily rectangular and contains more realistic information about the shape of the eye. The EOM architecture in this design can measure the effective eye opening for different MER values. The aggregate of the effective eye openings is a 2D error map that covers the eye diagram completely and represents the shape of the eye, as Figure 5.4 illustrates hypothetically.

Figure 5.3: The effective eye opening formed by combining the mask areas that have the same MER

The MER for a given mask is found by counting the errors, i.e., the number of data transitions that cross either of the two vertical sides of the mask. The operation is illustrated in Figure 5.5. Two reference voltages, VH and VL, define the vertical opening of the mask, and the two phases of the sampling clock, φearly and φlate, determine its horizontal opening. The data is continuously compared with VH and VL, and these comparison results are sampled at both the early and late phases. At each phase, if the sampled values differ, a mask violation has occurred and an error is flagged. The error detection logic is error = SH ⊕ SL, where SH and SL are the sampled comparison results for either phase, i.e., either side of the mask, and ⊕ denotes XOR. The timing diagram in Figure 5.5 illustrates one violation for each side of the mask. If the errors of the left (from φearly) and right (from φlate) sides of the mask are counted separately, horizontally asymmetrical eye diagrams can be captured effectively. We have added this capability to the architecture by providing two independent error detector blocks for the two sides of the mask.

Figure 5.4: The combination of effective eye openings is a 2D error map that is correlated to the shape of the eye diagram (effective eye openings for MER from 10^-3 to 10^-7)

Figure 5.5: Operation principle of the EOM for one mask (SH and SL sampled at φearly for the left side and at φlate for the right side)
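The detection rule error = SH ⊕ SL can be illustrated with a behavioral simulation. The sketch below builds a synthetic NRZ eye with levels ±1, raised-cosine transitions of hypothetical width `rise_ui` centred on the bit boundaries, and additive Gaussian noise. It samples the eye once per bit at φearly and φlate and counts left-side and right-side violations separately. The waveform model, the parameter values, and the per-bit normalization (in the spirit of the bit-rate normalization used later in (5.1)) are assumptions for illustration, not a model of the prototype.

```python
import math
import random


def _edge(x):
    """Raised-cosine edge shape: 0 at x=0, 0.5 at x=0.5, 1 at x=1."""
    return 0.5 * (1.0 - math.cos(math.pi * x))


def nrz_value(bits, k, u, rise_ui=0.4):
    """Noise-free value of bit k (levels +/-1) at position u in [0, 1) UI.

    Hypothetical waveform model: each transition is a raised-cosine edge of
    width rise_ui centred on the bit boundary.
    """
    half = rise_ui / 2
    if u < half:  # tail of the edge coming from bit k-1
        return bits[k - 1] + (bits[k] - bits[k - 1]) * _edge((u + half) / rise_ui)
    if u > 1 - half:  # start of the edge going to bit k+1
        return bits[k] + (bits[k + 1] - bits[k]) * _edge((u - 1 + half) / rise_ui)
    return bits[k]


def mask_error_rate(bits, half_width_ui, v_h, v_l, sigma, seed=0):
    """Per-bit (left, right) MER for a mask centred on the eye at 0.5 UI.

    phi_early samples at 0.5 - half_width_ui, phi_late at 0.5 + half_width_ui.
    Each comparator output is (data > reference); error = S_H XOR S_L.
    """
    rng = random.Random(seed)
    phases = (0.5 - half_width_ui, 0.5 + half_width_ui)
    errors = [0, 0]  # [left side (phi_early), right side (phi_late)]
    n_bits = len(bits) - 2  # first and last bits lack a neighbour
    for k in range(1, len(bits) - 1):
        for side, u in enumerate(phases):
            v = nrz_value(bits, k, u) + rng.gauss(0.0, sigma)
            s_h, s_l = v > v_h, v > v_l
            errors[side] += s_h ^ s_l  # True only between V_L and V_H
    return errors[0] / n_bits, errors[1] / n_bits


rng = random.Random(42)
bits = [rng.choice((-1, 1)) for _ in range(5000)]
small = mask_error_rate(bits, 0.10, 0.2, -0.2, sigma=0.05)
wide = mask_error_rate(bits, 0.49, 0.2, -0.2, sigma=0.05)
print("small mask MER (left, right):", small)
print("wide mask MER (left, right): ", wide)
```

A narrow mask stays inside the flat part of the eye and records no errors. A mask whose sides reach the crossings records an error on nearly every data transition, i.e., roughly half of the random bits on each side.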

5.4 EOM Architecture


Figure 5.6 shows the proposed architecture of the EOM circuit. The differential input data is compared with differential reference levels in two comparators. The reference of the lower comparator is generated by swapping VH and VL [118][126]. The reference levels can be adjusted either through an on-chip DAC or externally. The DAC sets VH = Vcm + n∆V and VL = Vcm − n∆V, where Vcm is the input common mode and 1 ≤ n ≤ 7. Every positive edge on next_ref triggers a reference-set shift register that increases n by one; the eighth edge resets n to 1. The step size, ∆V, is adjustable externally. The comparators' outputs are sampled at both the early and late phases by the D flip-flops (DFFs) that follow the comparators. The sampling clocks are half-rate, and thus each DFF block consists of two master-slave DFFs that sample at both the rising and falling edges of the clocks. This avoids skipping any data transitions. The samples from the early and late phases are processed separately in two independent logic blocks. As discussed in Section 5.3, this allows the EOM to differentiate data transitions that cross the left side of the eye mask from those that cross the right side and enables it to capture asymmetrical eye diagrams. In our prototype implementation, we combined the early and late errors into a single error_out signal due to test-equipment limitations. However, asymmetric eye shapes can still be captured by triggering the early or late sampling phases one at a time.

Figure 5.6: The EOM architecture

In each logic block for the early or late phase, the errors due to the rising- and falling-edge samples are detected, retimed, and merged. The errors are detected separately for each edge by independent XOR gates. The error signal from the falling sampling edge is then resampled at the next rising edge to align the two error signals in time, and the two are merged by a logic OR function. The merged error signal is divided down by a factor of 16 using CML logic, which allows low-power CMOS logic to be used for the dividers in the subsequent stages. Finally, the two error signals from φearly and φlate are retimed by the early sampling phase and combined. The error output passes through a digital divider with four selectable divide ratios; a larger divide ratio is selected to measure cases with high error counts. The chip output, error_out, is a toggling signal. The MER for a fixed mask size can be calculated from the frequency of the error_out signal, ferror, as

$$\mathrm{MER} = \frac{N \cdot f_{error}}{BR} \tag{5.1}$$
where N is the total divide ratio in the chain and BR is the input bit rate. A separate divider chain divides the late sampling clock, φlate, by 512. Its output is used to monitor the functionality of the clock divider and phase rotators and is also used as a trigger signal during chip test and characterization.
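Equation (5.1) and the choice of divide ratio can be captured in a few lines. The sketch assumes that the total divide ratio is N = 16·M, i.e., the fixed CML divide-by-16 followed by the selectable back-end ratio M ∈ {1, 4, 16, 64}. If the toggling error_out output contributes an additional factor, that factor would have to be folded into N. The counter limit `f_max_hz` is a hypothetical instrument parameter.

```python
BACKEND_RATIOS = (1, 4, 16, 64)  # selectable back-end ratios M (Figure 5.6)
CML_RATIO = 16                   # fixed CML divide-by-16 (assumed part of N)


def mer_from_frequency(f_error_hz, bit_rate_bps, n_total):
    """Equation (5.1): MER = N * f_error / BR."""
    return n_total * f_error_hz / bit_rate_bps


def pick_backend_ratio(expected_mer, bit_rate_bps, f_max_hz):
    """Smallest M whose divided error frequency does not exceed f_max_hz.

    Returns None if even the largest ratio is insufficient.
    """
    for m in BACKEND_RATIOS:
        f_error = expected_mer * bit_rate_bps / (CML_RATIO * m)
        if f_error <= f_max_hz:
            return m
    return None


# Example: an expected MER of 0.1 at 10 Gb/s, with a hypothetical 50 MHz counter.
m = pick_backend_ratio(0.1, 10e9, 50e6)
f_meas = 0.1 * 10e9 / (CML_RATIO * m)  # frequency the counter would report
print(m, f_meas, mer_from_frequency(f_meas, 10e9, CML_RATIO * m))
```

A high error rate forces a larger M so the divided signal stays within the counter range. A low error rate can use M = 1, which keeps the count resolution as fine as possible.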

The sampling clocks are generated from an external full-rate clock that is divided by two with an on-chip divider to create half-rate I and Q phases. Two single-quadrant phase rotators interpolate between I and Q and between Ī and Q to create φearly and φlate, respectively. Therefore, the output phase of each rotator covers a range of 90°, or half of the bit period, as can be seen from the timing chart in Figure 5.7. Each rotator has a 15-bit thermometer-encoded control line that sets the phase interpolation weights and results in a phase step of 6°. The control-line value for each rotator is determined by a phase-set shift register. The trigger signals of the shift registers that increment the control lines for φearly and φlate are next_φearly and next_φlate, respectively. When both control lines are set to zero, φearly and φlate have the same phase as Q and overlap in the center of the eye. Every positive edge on next_φearly moves φearly one step to the left. Similarly, every positive edge on next_φlate moves φlate one step to the right. The 16th positive edge on either next_φearly or next_φlate automatically resets the phase to the center (Q) position.

Figure 5.7: Generation of φearly and φlate by phase interpolation

By separately stepping the next_φearly, next_φlate, and next_ref trigger signals, the
architecture provides three degrees of freedom for obtaining several rectangular mask
sizes in both horizontal and vertical dimensions. Seven settings for the differential
reference voltage DAC and 15 for each phase rotator provide 210 different masks. The
number of masks can be increased by applying reference voltages externally with a
smaller step size. The MER increases as the mask expands in either dimension. The EOM can be used in two ways: either the mask expansion is stopped at a threshold MER to report the eye opening, or all masks are swept to capture the full error map that represents the effective shape of the eye diagram.
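The mask bookkeeping in this section can be modelled with a small counter and a geometry helper. The sketch is a behavioral model, not the circuit. The phase-set register holds a count from 0 to 15 that advances on each trigger edge and returns to the center on the 16th edge, and each count moves the sampling phase by 6° of the half-rate clock. The helper `mask_geometry` is hypothetical and assumes VH − VL = 2n∆V from the DAC equations; the ∆V and bit-rate values in the usage lines are also hypothetical.

```python
PHASE_STEP_DEG = 6.0  # phase step per thermometer count (half-rate clock)
PHASE_POSITIONS = 15  # 15-bit thermometer code: counts 1..15 off-center
DAC_SETTINGS = 7      # 1 <= n <= 7


class PhaseSetRegister:
    """Behavioral model of one phase-set register (not the circuit).

    Each positive trigger edge moves the rotator one step away from the
    center; the 16th edge returns it to the center (Q) position.
    """

    def __init__(self):
        self.count = 0  # 0 = center of the eye, in phase with Q

    def trigger(self):
        self.count = (self.count + 1) % (PHASE_POSITIONS + 1)

    def thermometer(self):
        """15-bit thermometer code with `count` ones, e.g. 3 -> 0b111."""
        return (1 << self.count) - 1


def mask_geometry(n, early_count, late_count, delta_v, bit_rate_bps):
    """(VH - VL, left extent, right extent) of the mask around the eye center.

    Assumes VH - VL = 2 * n * delta_v and a half-rate sampling clock, so one
    phase step is 6/360 of two bit periods.
    """
    step_s = PHASE_STEP_DEG / 360.0 * (2.0 / bit_rate_bps)
    return 2 * n * delta_v, early_count * step_s, late_count * step_s


# One way to arrive at the 210 masks quoted above: 7 reference settings
# times 15 positions for each of the two rotators, stepped one at a time.
n_masks = DAC_SETTINGS * (PHASE_POSITIONS + PHASE_POSITIONS)

reg = PhaseSetRegister()
for _ in range(15):
    reg.trigger()
print(reg.count, bin(reg.thermometer()))  # fully stepped away from center
reg.trigger()                             # 16th edge
print(reg.count)                          # back at the center
print(mask_geometry(3, 15, 15, 0.01, 10e9))
```

At 10 Gb/s, one step corresponds to about 3.3 ps, and the full 15 steps span 50 ps, i.e., half of the 100 ps bit period. This agrees with the 90° range stated for each rotator.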

Figure 5.8: The differential comparator circuit

5.5 Circuit Implementation


The comparator circuits use a two-stage differential topology followed by a
current-mode logic (CML) buffer, as in Figure 5.8. The first stage consists of two parallel
source-coupled pairs [126]. The overall output of the stage is

$$v_o = g_m R \left[ (v_i - V_H) - (\bar{v}_i - V_L) \right] \tag{5.2}$$

which can be rewritten as

$$v_o = g_m R \left[ (v_i - \bar{v}_i) - (V_H - V_L) \right] \tag{5.3}$$

The latter is the desired output for a differential comparator with a differential reference voltage. The parameters gm and R in both (5.2) and (5.3) are, respectively, the transconductance of one MOS transistor and the load resistor. Since the reference voltages VH and VL are stepped such that the vertical mask opening covers the entire input swing range, each source-coupled pair must tolerate a wide range of common-mode input and thus needs a large CMRR. The second stage is added to further enhance the CMRR of the comparator and to increase its sensitivity. The tail current devices are designed with longer-than-minimum channel length to improve their output impedance and further enhance the CMRR. The comparator is optimized for maximum gain-bandwidth product. This maximizes the comparators' sensitivity and thus minimizes the degradation of the input eye diagram shape due to EOM non-idealities. We will discuss the impact of EOM non-idealities on the eye opening in Section 5.6.

The comparator’s offset is another limitation; it affects the EOM operation by shifting the rectangular mask vertically. The input offset of each source-coupled pair can be modelled by a shift in either VH or VL in (5.2). Equivalently, the overall offset can be modelled as a constant term on the right-hand side of (5.3). In the absence of offset, the comparator has maximum sensitivity when VH = VL and both are equal to the input common mode. With offset, the maximum sensitivity occurs when VH − VL equals the offset. This interpretation is used to de-embed the impact of the offset on the MER measurements, as will be shown in Section 5.6. In the prototype, we minimized the offset with careful layout techniques that improve matching between transistors. We also avoided low-Vt (MOS threshold voltage) devices for the input-stage transistors of the comparator because of their poor Vt matching. Monte Carlo simulation of the comparator shows a mean output offset voltage of 6.4 mV and a worst-case value of 25 mV. A CML buffer follows the second stage of the comparator to convert the output swing to proper levels for the CML DFF blocks in the subsequent stages. The DFFs use a standard master-slave topology with conventional CML latches and resistive loads. The clock divider is a static divider based on similar CML latches. We used low-Vt transistors in the latch circuits to increase the latch switching speed.

The phase rotator circuit consists of a phase interpolator and a phase-set register that sets the proper interpolation weights. The phase interpolator is formed by two parallel differential stages, as Figure 5.9 shows. The differential input of each stage is connected to one of the two input phases. By properly adjusting the differential control lines, s0–s14, the tail current is steered between the two stages to set the input phase weights and obtain the desired interpolated phase. To generate uniform phase steps, and thus uniformly sweep the horizontal opening of the mask, the transfer characteristic of the phase interpolator, i.e., the relationship between output phase and input weight, should be linear. This characteristic can be controlled by the input signal transition slope and the bandwidth of the interpolator. Figure 5.10 illustrates the transfer function for three different bandwidths, obtained by generalizing the approach in [127], in which the phase interpolator is modelled as a bandlimited system that performs a weighted sum of two input signals. Although smaller bandwidths linearize the transfer function, they increase jitter, because they reduce the output signal transition slope and thus convert amplitude fluctuations at the threshold-crossing point into larger timing uncertainty.

Figure 5.9: The phase interpolator and phase-set register

Figure 5.10: Simulated phase interpolator transfer function for different bandwidths (output phase [degree] versus input weight, for 4 GHz, 6 GHz, and 8 GHz bandwidth)
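The linearity trade-off can be explored with a deliberately simple model that is not the one used for Figure 5.10. In this model, two ideal input steps are separated by 90° of a 5 GHz half-rate clock (50 ps), each is filtered by a first-order low-pass with τ = 1/(2π·f3dB), and the two are summed with weights (1 − w) and w. The output phase is read from the 50% crossing after removing the fixed RC delay. The model and its parameters are assumptions. The sketch only reproduces the qualitative trend that a lower bandwidth gives a more linear phase-versus-weight curve.

```python
import math


def interp_phase_deg(w, f3db_hz, spacing_s=50e-12, span_deg=90.0):
    """Output phase (0..span_deg) of a weighted sum of two RC-filtered steps.

    Input 1 steps at t=0 and input 2 at t=spacing_s; w is the weight of
    input 2. The fixed RC delay (tau * ln 2) is subtracted.
    """
    tau = 1.0 / (2 * math.pi * f3db_hz)

    def step(t):  # first-order low-pass response to a unit step at t=0
        return 0.0 if t <= 0 else 1.0 - math.exp(-t / tau)

    def out(t):
        return (1 - w) * step(t) + w * step(t - spacing_s)

    lo, hi = 0.0, spacing_s + 40 * tau  # out(lo) = 0, out(hi) ~ 1
    for _ in range(100):  # bisection; out(t) is non-decreasing
        mid = 0.5 * (lo + hi)
        if out(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    t_cross = 0.5 * (lo + hi) - tau * math.log(2)
    return span_deg * t_cross / spacing_s


def max_nonlinearity_deg(f3db_hz, points=21):
    """Largest deviation from the ideal straight line (phase = 90 * w)."""
    ws = [i / (points - 1) for i in range(points)]
    return max(abs(interp_phase_deg(w, f3db_hz) - 90.0 * w) for w in ws)


for f in (4e9, 6e9, 8e9):
    print(f"{f / 1e9:.0f} GHz: max deviation {max_nonlinearity_deg(f):.1f} deg")
```

In this model, the worst-case deviation grows from roughly 14° at 4 GHz to more than 20° at 8 GHz. The slower edges that linearize the curve are the same ones that make the crossing more sensitive to amplitude noise.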

The reference-set register for the DAC, the phase-set registers for the phase rotators,
and all the back-end dividers and the error combiner are implemented using CMOS
standard cells in the technology to achieve lower power consumption.

Figure 5.11: The die photograph of the EOM with magnified active core (active core 400 µm × 660 µm)

5.6 Experimental Results


The EOM circuit was implemented in a 0.13 µm standard CMOS technology. The die photograph, along with the layout of the active core, is shown in Figure 5.11. The chip is designed for a customized pad frame that enabled on-wafer testing. As a result, the die size is set by the pad frame and is 1.7 × 1.7 mm². However, the active area of the EOM circuit, highlighted on the die photograph, is only 400 × 660 µm². On-wafer measurements with input data rates of up to 12.5 Gb/s from a 2^31−1 PRBS source and a 1.2 V supply showed successful error-diagram measurement. The tested input amplitude ranged from 50 mVp to 400 mVp. The chip consumes about 275 mA from a 1.2 V supply. It is functional at 10 Gb/s with a supply voltage as low as 1 V. It operates reliably even under severe input conditions, when a closed eye with a BER of 10^-2 is applied to the input. In the following, we elaborate on the test setup and experimental results.

Figure 5.12: Measurement setup

5.6.1 Test Setup

The block diagram of our test setup is shown in Figure 5.12. A PRBS source provides the data and clock for a wide range of data rates up to 12.5 Gb/s. The data source has an additional port that controls the amount of jitter artificially added to the data. Although the full-rate clock is originally phase-locked to the data when applied to the EOM, the on-chip path difference does not preserve this phase relationship. In our measurements, we compensated for the path delay mismatch with an external delay line. We adjusted the external delay to minimize the MER for the minimum-size mask, which guarantees that the mask is centered with respect to the eye. In the adaptive equalizer loop, this calibration can be done once at startup, as the delay mismatch is a systematic effect. In addition, two external delay lines were used in the input path to compensate for external cable mismatches and ensure a 180° phase difference between the differential inputs.

The trigger signals next_φearly and next_φlate are applied externally and step the horizontal opening of the mask. Similarly, the vertical opening is controlled by varying VH and VL externally. A frequency counter records the average frequency of the error_out signal, from which the MER can be calculated using (5.1).

Figure 5.13: Accumulated phase of the clock_out signal that verifies functionality of the divider and phase rotator with 10 GHz input clock (accumulated phase span ≈ 50 ps)

5.6.2 Clock Path

We first tested the functionality of the divider and the phase rotators by observing the clock_out signal on the oscilloscope. Figure 5.13 shows the accumulated phases of clock_out when a 10 GHz clock is applied to the clock input. We trigger the next_φlate signal with a 3 MHz square wave. Although the standard-cell CMOS dividers slow down the clock transitions, the accumulated phases correctly cover 50 ps, which is equivalent to half of the bit period of a 10 Gb/s signal.

5.6.3 Qualitative Eye-Opening Measurement

The objective of this test is to verify the functionality of the main blocks in the data path of the EOM. We apply a 10 Gb/s PRBS input signal and add 41 ps peak-to-peak sinusoidal jitter (SJ) to degrade the eye quality, as in Figure 5.14(a). The next_φearly and next_φlate signals are stepped simultaneously. The vertical opening of the mask is held constant at 120 mV using external references. Figure 5.14(b) shows the measured error_out signal. There is an error-free region (no toggling) for small mask openings, which corresponds to φearly and φlate being close to their initial positions in the center of the eye. As the trigger signals step the sampling phases toward the edges of the eye and the mask thus gets wider, the error frequency gradually increases. This can be seen in Figure 5.14(c), which shows the magnified error_out signal around the regions with errors: the frequency of the error_out signal increases after each positive trigger edge. The periodic behavior of the error_out signal is due to the self-resetting mechanism of the phases.

Figure 5.14: Qualitative eye-opening measurement. (a) 10 Gb/s input eye diagram (b) error_out signal demonstrating an error-free region (c) magnified error_out signal showing the MER increase for a wider mask

Figure 5.15: Measured eye opening for various input eye diagrams with different peak-to-peak jitter (measured eye opening [ps] versus peak-to-peak jitter [ps] for MER = 0.1%, 0.5%, and 1%)

5.6.4 Eye-Opening Measurement Variations

Ultimately, the EOM will be used in an adaptive equalizer, as shown in Figure 5.1. In such a setting, the EOM output should track variations of the eye opening and provide a correct gradient to assist the optimization algorithm in adjusting the filter coefficients. We verified the behavior of the EOM in this scenario by measuring the eye opening when various amounts of peak-to-peak SJ are added to the 10 Gb/s, 2^31−1 PRBS input, with the vertical opening of the mask held constant. Figure 5.15 shows the measurement result together with three sample input eye diagrams that demonstrate the gradual closing of the eye. As expected, the measured eye opening decreases monotonically as additional jitter closes the eye. At low input jitter, the transition from small to large measured MER is abrupt and the plot loses accuracy, because the horizontal eye-opening step becomes comparable to the peak-to-peak input jitter. Therefore, when the sampling clocks approach the data edge, a single horizontal step can increase the number of transitions falling inside the mask from zero to all the transitions.

5.6.5 Complete System Test

This experiment demonstrates the capability of the EOM to generate the two-dimensional error diagram that corresponds to the input eye shape. A pair of 5-foot coaxial cables was used to add ISI to the input data. A computer program controls a pulse generator and two external power sources through a GPIB port and steps through several horizontal and vertical mask openings. For each mask, a frequency counter records the error_out frequency, which reflects the number of transition errors. In each mask sweep, only one half of the eye is covered, by triggering only one of the next_φearly or next_φlate signals, while the other phase is held in the center of the eye, in phase with Q. Once one half of the eye has been swept completely, the other trigger signal is stepped to cover the other half. In this way, any horizontal asymmetry in the eye is captured.

Ideally, when VH and VL are both equal to the input data common mode, the mask error rate should be minimal because the comparators have the highest sensitivity. Therefore, the vertical mask is swept for VH = Vcm + n∆V and VL = Vcm − n∆V, where Vcm is the data common mode and 1 ≤ n ≤ N. N is seven if the on-chip DAC is used but can be larger with external reference adjustment. However, due to the comparators' offset, the minimum error count may occur when VH ≤ Vcm or VL ≥ Vcm. To guarantee that the full vertical range is covered, the vertical sweep is done for −N ≤ n ≤ N. We measured three sample chips and observed that the minimum mask error rate occurs at n = 1 or 2, corresponding to a differential offset of 5 mV to 10 mV.
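The sweep order described above can be written as a small generator. `sweep_plan` is a hypothetical helper for a measurement script, not the program used in this work. For each reference index n from −N to N, it first steps next_φearly through the 15 phase positions while φlate stays at the center, and then does the same for next_φlate. The ∆V value in the usage line is hypothetical.

```python
from typing import Iterator, Tuple

PHASE_POSITIONS = 15  # steps per rotator, as in Section 5.4


def sweep_plan(n_max: int, delta_v: float,
               v_cm: float = 0.0) -> Iterator[Tuple[float, float, int, int]]:
    """Yield (VH, VL, early_count, late_count) for each mask in the sweep.

    For every reference index n in [-n_max, n_max], the left half of the eye
    is swept with next_phi_early while phi_late stays at the center (count 0),
    and then the right half is swept with next_phi_late.
    """
    for n in range(-n_max, n_max + 1):
        v_h, v_l = v_cm + n * delta_v, v_cm - n * delta_v
        for count in range(1, PHASE_POSITIONS + 1):
            yield v_h, v_l, count, 0
        for count in range(1, PHASE_POSITIONS + 1):
            yield v_h, v_l, 0, count


plan = list(sweep_plan(n_max=7, delta_v=0.005))
print(len(plan), plan[0], plan[-1])
```

Negative n produces VH < VL, which is how the sweep reaches the offset-shifted minimum. With N = 7, the plan contains 15 reference settings × 30 phase settings = 450 masks.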

Figure 5.16: Measured 2D error map with 68dB dynamic range (vertical mask size [mV] versus horizontal mask size [UI], color indicating log(MER))

Figure 5.16 illustrates the two-dimensional error diagram generated by the measurement for the input eye in Figure 5.14(a). It demonstrates that the asymmetrical input eye shape is captured. Furthermore, the diagram has a 68 dB dynamic range for MER. The dynamic range is a function of the MER measurement time per mask: a longer error-free measurement period corresponds to a smaller resolvable MER.

It can be shown that MER is bounded by the input noise. We show in Appendix D that
the MER measured by an ideal EOM can be expressed as

$$\mathrm{MER} \cong Q\left(\frac{1 - (V_H - V_L)}{2\sigma}\right) \tag{5.4}$$

for a signal amplitude of “1” and a noise standard deviation of σ. Equation (5.4) simplifies to the conventional BER expression when VH = VL. The amplitude noise is assumed to have a Gaussian distribution, and Q(·) is its tail probability, i.e., the complementary cumulative distribution function. Due to the exponential nature of Q(·), the expected MER is about four orders of magnitude larger than the BER for a BER of about 10^-12.
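The size of this MER-to-BER gap can be checked directly from (5.4). The sketch writes Q(x) = ½·erfc(x/√2), solves for the σ that gives BER = 10^-12 at VH = VL, and then evaluates the MER for a few mask heights. The mask heights are hypothetical inputs, normalized to the unit signal amplitude. For example, VH − VL ≈ 0.2 of the amplitude already gives close to four orders of magnitude.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def mer_ideal(dv, sigma):
    """Equation (5.4): MER ~= Q((1 - (VH - VL)) / (2 sigma)), unit amplitude."""
    return q_func((1 - dv) / (2 * sigma))


def sigma_for_ber(ber, lo=1e-3, hi=1.0):
    """Solve Q(1 / (2 sigma)) = ber by bisection (the left side grows with sigma)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mer_ideal(0.0, mid) < ber:
            lo = mid  # too little noise: increase sigma
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma = sigma_for_ber(1e-12)
for dv in (0.0, 0.1, 0.2, 0.3):
    print(f"VH - VL = {dv:.1f}: MER = {mer_ideal(dv, sigma):.2e}")
```

The ratio depends strongly on the mask height, so the gap between MER and BER should always be quoted together with the mask size.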

The non-idealities of the EOM, specifically the bandwidth limitations of the comparators, further degrade the measured MER. It is shown in Appendix D that (5.4) can be modified to

$$\mathrm{MER} \cong Q\left(\frac{A(t) - (V_H - V_L)}{2\sigma}\right) \cdot \left[1 - Q\left(\frac{A(t) + (V_H - V_L)}{2\sigma}\right)\right] \tag{5.5}$$

to take the impact of the EOM into account. A(t) is the response of the comparator to the input sequence at the sampling time, t. As the sampling clocks, φearly or φlate, are stepped toward the edges of the eye diagram, A(t) approaches the threshold level and the MER increases as a consequence. A two-dimensional MER map can be obtained from (5.5) for different sampling times and values of VH − VL, based on the input and the comparator response. We generated this error map using the simulation results of the comparator in our design and compared it with the measured 2D error map in Figure 5.16. A two-dimensional cross-correlation of the two maps resulted in a correlation coefficient of 0.9, which verifies that our measurement closely follows the result expected from simulation.
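The model map and the map comparison can be sketched as follows. The comparator response A(t) is not available in closed form here, so the sketch assumes a hypothetical A(t) = sin²(πt) over one UI: 1 at the eye center and 0 at the crossings. It also clamps the MER at a hypothetical measurement floor of 10^-15 before taking the logarithm. The comparison uses the zero-lag normalized (Pearson) correlation coefficient, which is one common reading of the correlation coefficient of a 2D cross-correlation. The sketch does not attempt to reproduce the 0.9 figure reported above.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def mer_55(a, dv, sigma):
    """Equation (5.5) for comparator response a = A(t) and dv = VH - VL."""
    return q_func((a - dv) / (2 * sigma)) * (1 - q_func((a + dv) / (2 * sigma)))


def model_map(sigma, times, dvs, floor=1e-15):
    """log10(MER) over sampling times [UI] (columns) and mask heights (rows).

    Assumes the hypothetical response A(t) = sin^2(pi * t).
    """
    return [[math.log10(max(mer_55(math.sin(math.pi * t) ** 2, dv, sigma), floor))
             for t in times] for dv in dvs]


def corr_coeff(m1, m2):
    """Zero-lag normalized (Pearson) correlation of two equal-size maps."""
    x = [v for row in m1 for v in row]
    y = [v for row in m2 for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


times = [i / 20 for i in range(21)]  # 0..1 UI
dvs = [j * 0.05 for j in range(15)]  # 0..0.7 of the signal amplitude
reference = model_map(0.07, times, dvs)
variant = model_map(0.10, times, dvs)  # e.g. a noisier condition
print(f"correlation coefficient: {corr_coeff(reference, variant):.3f}")
```

In practice, `variant` would be replaced by the measured log(MER) map sampled on the same grid of mask sizes.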

5.7 EOM vs. BERT


The EOM has two main features that distinguish it from a conventional bit error rate test (BERT) system. First, the EOM detects errors based on two samples taken at the same sampling phase, so it does not require pattern matching. Thus, unlike a BERT, which requires a PRBS sequence for proper operation, the EOM can operate with truly random sequences and needs no a priori knowledge of the transmitted sequence. This considerably simplifies the error-detection computation and hardware. Second, a BERT treats a channel's deterministic impairments, amplitude noise, random jitter, and digital errors induced by imperfect digital circuit blocks (e.g., multiplexers and demultiplexers) equally. Therefore, the effects of individual impairments on the number of detected errors cannot be separated. The EOM, on the other hand, is more sensitive to deterministic impairments of the channel, e.g., deterministic jitter and ISI, because the random effects are averaged when the error count is divided down in the counter. Furthermore, the EOM ignores digital errors because it does not compare the received data with a predetermined sequence. Consequently, the error count is mainly correlated with the impact of the channel response.
Figure 5.17 compares the response of the EOM and a commercial BERT in the presence of digital errors intentionally added to the input.

[Figure 5.17 panels: (a) input eye diagram with 20ps and 40mV scale markers; (b) “Measured by the EOM,” 12.5Gb/s 2^31−1, Log(MER) [10^−x] versus horizontal mask size [UI] and vertical mask size [mV]; (c), (d) sampling threshold [mV] versus sampling point [UI]]

Figure 5.17: Comparing EOM and BERT operations: (a) 12.5Gb/s input eye (b) MER measurement with EOM in the presence of 10% digital error (c) BER measured with commercial BERT (d) BER measured with commercial BERT in the presence of 10% digital error

A 12.5Gb/s 2^31−1 PRBS is passed through 5 feet of lossy coaxial cable to introduce ISI and is then applied to the input of the EOM or the BERT. The resulting eye diagram is closed, as shown in Figure 5.17(a). Figure 5.17(c) shows the response of the BERT, with a BER dynamic range of about 18dB. However, when 10% digital errors are added to the input, the BERT can no longer capture the shape of the eye diagram, even though the channel has not changed: as Figure 5.17(d) shows, the BERT loses its error dynamic range completely. Figure 5.17(b) shows the response of the EOM in the presence of the digital errors and demonstrates that the EOM successfully captures the shape of the eye diagram, as in Figure 5.17(c). The MER dynamic range is reduced to about 8dB for the reasons discussed in Section 5.6.5.
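The qualitative difference in Figure 5.17 can be reproduced with a toy model. It is a minimal sketch under loudly stated assumptions. The channel is hypothetical: NRZ levels of ±0.5, one post-cursor ISI tap, and Gaussian noise. The digital errors are modeled as bits flipped before the channel. The EOM is modeled as two comparators with thresholds at ±(VH-VL)/2 sampling the same instant, counting an error when they disagree. The BERT compares its decisions with the known reference pattern.

```python
import random


def run_link(n_bits, flip_prob, mask_half, sigma=0.05, main=0.7, post=0.3, seed=1):
    """Return (BERT error rate, EOM mask-error rate) for a toy ISI + noise channel.

    Hypothetical model: NRZ levels +/-0.5, one post-cursor ISI tap, Gaussian noise.
    Digital errors flip transmitted bits with probability flip_prob.
    """
    rng = random.Random(seed)
    reference = [rng.randint(0, 1) for _ in range(n_bits)]      # pattern the BERT expects
    sent = [b ^ (rng.random() < flip_prob) for b in reference]  # bits actually launched
    bert_errors = eom_errors = 0
    prev_level = 0.0
    for k in range(n_bits):
        level = 0.5 if sent[k] else -0.5
        sample = main * level + post * prev_level + rng.gauss(0.0, sigma)
        prev_level = level
        # BERT: decide the bit and compare it with the known reference pattern.
        bert_errors += (1 if sample > 0 else 0) != reference[k]
        # EOM: comparators at +mask_half and -mask_half sample the same instant;
        # they disagree only when the sample lands inside the mask.
        eom_errors += (sample > mask_half) != (sample > -mask_half)
    return bert_errors / n_bits, eom_errors / n_bits


clean = run_link(100_000, 0.0, mask_half=0.1)
dirty = run_link(100_000, 0.1, mask_half=0.1)
print(f"BERT error rate: {clean[0]:.2e} -> {dirty[0]:.2e}")
print(f"EOM mask-error rate: {clean[1]:.2e} -> {dirty[1]:.2e}")
```

In this model, the injected flips raise the BERT error rate to about 10% regardless of the channel. The EOM rate barely moves, because flipping bits of a random sequence still produces a random sequence through the same channel.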

5.8 Summary
We have developed an architecture that captures a two-dimensional map of the eye diagram of a high-speed data signal. The error map can be used to extract various features of the received signal. Specifically, it can be used in an adaptive equalizer to generate the cost function for coefficient optimization. This cost function depends solely on the quality of the received signal and not on the decisions of the receiver.

The architecture is based on comparing two samples of the signal at one sampling point. It therefore requires neither a priori knowledge of the data sequence nor pattern matching, which considerably simplifies the design. A prototype was implemented in a standard 0.13µm CMOS technology and was successfully tested at input data rates up to 12.5Gb/s. It draws about 275mA from a 1.2V supply, which is significantly lower than prior art.

Chapter 6
Instantaneous Demultiplexing for Burst-Mode Links
6.1 Introduction
A wireline communication link uses either continuous or burst-mode transmission. In burst-mode communication, data is transmitted in asynchronous packets (a.k.a. bursts), separated by long, variable-length intervals during which the transmitter is off [36]. An example application of burst-mode links is passive optical networks, e.g., fiber-to-the-home, where multiple end users (EUs) share an optical channel to the central office (CO). The CO assigns a time slot to each EU and allows each EU to upload burst-mode data in its designated time slot.

Burst-mode communication relies on very fast acquisition circuitry to achieve low network latency. If the data stream is bursty, the receiver must be able to synchronize with the data instantaneously to maintain reliable communication. For burst-mode communication, conventional clock recovery (CR) methods based on narrowband phase-locked loops (PLLs), such as those designed for SONET applications, are not applicable. PLL-based CR circuits in SONET have stringent jitter transfer specifications to avoid jitter accumulation. In addition, they are required to tolerate long sequences of identical bits (Section 2.2.2). These constraints impose a narrowband PLL with a long acquisition time. For instance, [37] reports a minimum acquisition time of 50µs for a 155MHz CR circuit, during which approximately the first 8000 bits of data are lost. Although a wideband PLL reduces the number of lost bits, it still requires preamble bits to synchronize before the data payload arrives.
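The lost-bit figure can be checked with a quick calculation. We assume the 155MHz CR circuit of [37] runs at the 155.52Mb/s SONET OC-3 rate; the text gives only “155MHz.” The number of bits lost during acquisition is then roughly the acquisition time multiplied by the bit rate:

```latex
N_{\text{lost}} \approx t_{\text{acq}}\,R_b
  = 50\,\mu\text{s} \times 155.52\,\text{Mb/s}
  \approx 7.8 \times 10^{3}\ \text{bits}
```

This rounds to the approximately 8000 bits quoted above.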

Gated-oscillator clock recovery provides instantaneous lock to the first data transition. Such circuits have been reported at several hundred Mb/s [128]–[132]. Gated-oscillator clock recovery relies on two oscillators that are activated by the rising and falling edges of the input signal and are, hence, resynchronized with every transition. The frequency of the gated oscillators equals the bit rate, and its value is typically maintained by a replica oscillator in a PLL.

In this chapter we introduce an alternative method for instantaneous data recovery and demultiplexing based on a finite state machine (FSM). The FSM receives the data and decides on the output values based on the current input data and the previous state. The previous state is provided to the FSM input with a bit-period delay. Decisions are synchronized with every incoming data transition, yet no oscillator is required. The jitter transfer function is flat, as in the gated-oscillator approach. However, the integrated demultiplexer function reduces the output jitter by a factor of n, the demultiplexing ratio.

We first introduce the general architecture of a clockless 1:n demultiplexer and discuss the complexity of the FSM for different demultiplexing ratios, n. Then, we describe the design procedure for a 1:2 demultiplexer and discuss different options for implementing the delay cells. We perform a comprehensive statistical study of process variations in passive delay cells and explore their feasibility for this application. Lastly, we present the experimental results for the fabricated 1:2 demultiplexer prototype.

6.2 Instantaneous 1:n Demultiplexer


6.2.1 General Architecture

Figure 6.1 shows the general block diagram of the proposed clockless data recovery and 1:n demultiplexer. The FSM consists of a combinational state-logic block and bit-period delay cells. It maps the combination of the input and the previous state to the current state. The previous state is the output of the FSM in the last bit period, Tb, and is fed back to the input of the state-logic block with a delay of Tb. The output logic block generates the demultiplexed outputs based on the current state. Both logic blocks respond to their inputs instantaneously. Therefore, if the logic gate propagation delay is neglected, each data transition at the input immediately affects the outputs of the demultiplexer.

Figure 6.1: Instantaneous 1:n demultiplexer

In a conventional 1:n demultiplexer, each output changes based on its corresponding bit at the input and then holds its value for nTb. The combination of the FSM and the output logic operates similarly. Each input bit is directed to the proper output, and the values of the other outputs are held constant in the memory of the FSM state. Therefore, the information stored in one state of the FSM consists of the following:

- the value of the current bit (1 bit);
- the values of the unaffected outputs (n−1 bits);
- the binary address of the affected output (log2 n bits).

As a result, a 1:n demultiplexer requires an (n + log2 n)-bit FSM, as Figure 6.1 shows. Consequently, the number of states in the FSM is n·2^n.

The delay cell in Figure 6.1 guarantees that the information of the previous state is available whenever a data transition occurs, i.e., every bit period. The output is updated on every data transition, and thus there is no explicit jitter rejection. However, the input data jitter at any transition affects only one of the n outputs. Moreover, each output has 1/n the data rate of the input. Therefore, the effective output data jitter in unit intervals (UI), i.e., normalized to nTb, is reduced to 1/n of the input data jitter.
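The state-count and jitter arguments above reduce to a few lines of arithmetic. This is a minimal sketch with hypothetical helper names. It assumes n is a power of two so that the output address is exactly log2 n bits.

```python
import math


def fsm_size(n: int) -> tuple[int, int]:
    """(state bits, number of states) of a direct 1:n FSM demultiplexer."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    address_bits = int(math.log2(n))
    state_bits = 1 + (n - 1) + address_bits  # current bit + held outputs + output address
    return state_bits, 2 ** state_bits       # 2**(n + log2 n) = n * 2**n


def output_jitter_ui(input_jitter_ui: float, n: int) -> float:
    """Input jitter in UI of Tb, re-expressed in output UI of n*Tb."""
    return input_jitter_ui / n


if __name__ == "__main__":
    for n in (2, 4, 8):
        bits, states = fsm_size(n)
        print(f"1:{n}: {bits}-bit FSM, {states} states, "
              f"0.2 UI input jitter -> {output_jitter_ui(0.2, n):.3f} UI at the outputs")
```

For n = 2 this gives the 3-bit, eight-state FSM designed below, and for n = 4 the 64 states and 6 bits discussed in Section 6.2.3.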

Figure 6.2: State diagram for an FSM-based 1:2 demultiplexer

Next, we demonstrate the design of a 1:2 demultiplexer based on the clockless data
recovery method.

6.2.2 Design of a 1:2 Demultiplexer

The major advantage of the clockless demultiplexer is that it responds to data transitions instantaneously. After a long quiet period with zero input and zero outputs, the first data transition initiates the state transitions of the FSM. Thereafter, the FSM state is updated every period, Tb, synchronously with the data transitions. If the delay is not exactly Tb, any incoming data transition resynchronizes the FSM. The FSM output is valid as soon as the data transition arrives at the input, provided that the logic propagation delay is negligible.

Figure 6.2 illustrates the states and state transitions of the 1:2 clockless demultiplexer. Each arrow corresponds to a state transition of the FSM, and the binary value on the arrow represents the current value of the input. For the 1:2 demultiplexer, the FSM has a total of eight states. The FSM stays in each state for a period of Tb and then transitions to the next state based on the input bit. The prime superscript in the state name is equivalent to the select line of a conventional demultiplexer: states with a prime superscript are those in which the input bit affects out2. The first subscript in the state name is the current input bit, and the second subscript is the previous input bit, stored to hold the unaffected output. For example, state s1,0 corresponds to the case in which the input bit, “1,” is transferred to out1 and the stored previous bit, “0,” is transferred to out2. If, after Tb, a data transition occurs and the input bit is “0,” the FSM transitions to s’0,1. In that state, the input bit, “0,” is transferred to out2 and the stored previous bit, “1,” is transferred to out1. Table 6.1 summarizes the two output values for all eight states.

Table 6.1: 1:2 Demultiplexer output in each state

Current state   out1   out2   |   Current state   out1   out2
s0,0            0      0      |   s0,1            0      1
s’0,0           0      0      |   s’0,1           1      0
s1,0            1      0      |   s1,1            1      1
s’1,0           0      1      |   s’1,1           1      1

A 3-bit FSM represents the state diagram in Figure 6.2. Each state is assigned a 3-bit
code word. To avoid races, i.e., erroneous transitions to other states, the codes are assigned
such that only one bit changes in every state transition. Therefore, delay mismatches in the
implementation cannot cause errors, and the FSM is race free. The code words are
presented in Table 6.2.

Table 6.2: Race-free code assignment for the states of the FSM

State    Code (y0 y1 y2)   |   State    Code (y0 y1 y2)
s0,0     001               |   s0,1     100
s’0,0    000               |   s’0,1    011
s1,0     010               |   s1,1     111
s’1,0    101               |   s’1,1    110
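The state diagram, Table 6.1, and Table 6.2 can be cross-checked with a small behavioral model. This is a sketch under our own encoding assumption: a state is the tuple (primed, current bit, stored bit), and every transition toggles the select, stores the bit just routed, and takes the new input, as in the s1,0 to s’0,1 example above. The script checks three things: that every transition flips exactly one code bit (race-free), that out1 = y1 and out2 = y0, and that y2 toggles under constant input. Starting in s’0,0 after a quiet period is also an assumption.

```python
from itertools import product

# Table 6.2 codes (y0, y1, y2), keyed by (primed, current_bit, stored_bit).
CODE = {
    (False, 0, 0): (0, 0, 1), (False, 0, 1): (1, 0, 0),
    (False, 1, 0): (0, 1, 0), (False, 1, 1): (1, 1, 1),
    (True, 0, 0): (0, 0, 0), (True, 0, 1): (0, 1, 1),
    (True, 1, 0): (1, 0, 1), (True, 1, 1): (1, 1, 0),
}


def next_state(state, x):
    """Toggle the select, store the bit just routed, and take the new input bit."""
    primed, current, _ = state
    return (not primed, x, current)


def outputs(state):
    """(out1, out2) as in Table 6.1: primed states route the current bit to out2."""
    primed, current, stored = state
    return (stored, current) if primed else (current, stored)


def demux(bits, start=(True, 0, 0)):
    """Advance the FSM once per bit period and collect (out1, out2)."""
    state, result = start, []
    for x in bits:
        state = next_state(state, x)
        result.append(outputs(state))
    return result


def hamming(a, b):
    return sum(i != j for i, j in zip(a, b))


race_free = all(hamming(CODE[s], CODE[next_state(s, x)]) == 1
                for s, x in product(CODE, (0, 1)))
taps_ok = all(outputs(s) == (CODE[s][1], CODE[s][0]) for s in CODE)

if __name__ == "__main__":
    print("race free:", race_free, "| out1 = y1, out2 = y0:", taps_ok)
    for out1, out2 in demux([1, 1, 0, 0, 1, 1, 0, 0]):
        print(out1, out2)
```

For a repeating “1100” input, the model gives out1 = 1100... and out2 = 0110... Each output is a “10” pattern at half the bit rate, with out2 one bit period behind out1.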

We associate the binary variables y0, y1, and y2 with the three bits that encode the FSM states. When y0, y1, and y2 are updated every Tb, we say that the FSM has transitioned to the next state. The updated value of each of y0, y1, and y2 is determined from the state diagram and the code word table, based on the current values of y0, y1, and y2 and on the input bit. The next value of each of the three binary variables is described as

$$ y_i^{*} = f_i(y_0, y_1, y_2, x) \qquad (i = 0, 1, 2) \tag{6.1} $$

where “*” signifies the updated value of yi. Function fi is a logic function whose arguments are the current values of the binary variables, and x corresponds to the current input bit. The logic functions are designed using standard methods such as the sum of products (SOP) with Karnaugh maps [133].

Similarly, out1 and out2 can be represented as functions of y0, y1, y2, and x. Based on the concept of a conventional 1:2 demultiplexer, we predict the logic functions for the two outputs as

$$ \mathrm{out}_1^{*} = \mathrm{out}_1\cdot S + x\cdot\overline{S} \tag{6.2} $$

$$ \mathrm{out}_2^{*} = \mathrm{out}_2\cdot\overline{S} + x\cdot S \tag{6.3} $$

where S is a binary function indicating which output should change; it is equivalent to the select line of an ordinary demultiplexer. The “⋅” and “+” denote logical AND and OR, respectively. For instance, in (6.2), the next value of out1 is the current value of out1 if S=1 and the input if S=0. The change is synchronous with x.

From Figure 6.2 and Table 6.2, we can show that

$$ S = y_0\cdot\overline{(y_1\oplus y_2)} + \overline{y_0}\cdot(y_1\oplus y_2) \tag{6.4} $$

where ⊕ is the exclusive-OR function, i.e., S = y0 ⊕ y1 ⊕ y2. Additionally, we can show that out1=y1 and out2=y0. Therefore, the output logic block in Figure 6.1 is omitted, and the outputs are tapped directly from the FSM state variables. Hence, the simplified output functions are

$$ \mathrm{out}_1^{*} = y_1\cdot\overline{(y_0\oplus y_2)} + x\cdot\left(y_1 + \overline{(y_0\oplus y_2)}\right) \tag{6.5} $$

$$ \mathrm{out}_2^{*} = y_0\cdot(y_1\oplus y_2) + x\cdot\left(y_0 + (y_1\oplus y_2)\right) \tag{6.6} $$

Equations (6.5) and (6.6) are computed directly from the Karnaugh maps of the Table 6.2 code words. However, they have the same form as (6.2) and (6.3) and can also be obtained by substituting (6.4) into (6.2) and (6.3).
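Because the FSM has only three state bits and one input, the relations among (6.2)–(6.6) can be checked exhaustively over all 16 input combinations. The sketch below does this. It assumes the readings stated in the text, namely out1 = y1, out2 = y0, and S selecting out2 when S = 1. It then confirms that the SOP forms match the select-line forms with S = y0 ⊕ y1 ⊕ y2.

```python
from itertools import product


def NOT(a: int) -> int:
    return 1 - a


def check_equations():
    """Return every (y0, y1, y2, x) where the SOP and select-line forms disagree."""
    mismatches = []
    for y0, y1, y2, x in product((0, 1), repeat=4):
        s = (y0 & NOT(y1 ^ y2)) | (NOT(y0) & (y1 ^ y2))             # Eq. (6.4)
        out1_mux = (y1 & s) | (x & NOT(s))                            # Eq. (6.2), out1 = y1
        out2_mux = (y0 & NOT(s)) | (x & s)                            # Eq. (6.3), out2 = y0
        out1_sop = (y1 & NOT(y0 ^ y2)) | (x & (y1 | NOT(y0 ^ y2)))    # Eq. (6.5)
        out2_sop = (y0 & (y1 ^ y2)) | (x & (y0 | (y1 ^ y2)))          # Eq. (6.6)
        if s != y0 ^ y1 ^ y2 or out1_sop != out1_mux or out2_sop != out2_mux:
            mismatches.append((y0, y1, y2, x))
    return mismatches


if __name__ == "__main__":
    print("mismatches:", check_equations())
```

An empty list means that the minimized SOP expressions and the select-line expressions agree on every input combination.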

6.2.3 Cascade Architecture

For n>2, one approach to designing the demultiplexer is to follow the same procedure as in Section 6.2.2. However, the number of states increases exponentially with n, the state transition table must be updated accordingly, and the number of binary variables that encode the state increases. For instance, for n=4, the FSM has 64 states and needs 6 bits to encode them. Consequently, the SOP terms require 6-input gates, all operating at the full data rate. An alternative approach is to implement the 1:n demultiplexer as a cascaded chain of 1:2 demultiplexers. For example, if n=4, each output of the first 1:2 demultiplexer is used as the input to a second 1:2 demultiplexer, so a total of three 1:2 demultiplexers is used. The cascade has a total of nine delay cells, in contrast to six delay cells in the direct approach. Therefore, if the delay cells are implemented using passive elements, the cascade has an area disadvantage. However, its combinational logic design is much simpler because there are fewer variables. In addition, the required speed of operation of the 1:2 demultiplexers decreases monotonically toward the outputs of the chain, because the data rate at the output of each stage is divided by two. Therefore, alternative digital gate topologies such as complementary MOS (CMOS) can be used for the 1:2 demultiplexers at the end of the chain, which can reduce the power consumption significantly.
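The delay-cell comparison between the two approaches can be tabulated as follows. This is a sketch with hypothetical helper names. It assumes a binary tree of 1:2 stages, each with the three delay cells of Section 6.2.2, and one delay cell per state bit in the direct design. It counts cells only. Each cascade stage's cells must also match that stage's own bit period (2Tb, 4Tb, ... further down the chain), which presumably makes passive cells in later stages larger still.

```python
import math


def direct_delay_cells(n: int) -> int:
    """One bit-period delay cell per FSM state bit: n + log2(n)."""
    return n + int(math.log2(n))


def cascade_delay_cells(n: int) -> int:
    """A binary tree of (n - 1) 1:2 stages, each with three delay cells."""
    return 3 * (n - 1)


def stage_rates(n: int, input_rate: float) -> list[float]:
    """Input data rate of each cascade level, from the chain input toward the outputs."""
    return [input_rate / 2 ** k for k in range(int(math.log2(n)))]


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"1:{n}: direct {direct_delay_cells(n)} cells, "
              f"cascade {cascade_delay_cells(n)} cells, "
              f"stage rates {[r / 1e9 for r in stage_rates(n, 7.5e9)]} Gb/s")
```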

Figure 6.3: Demultiplexer outputs for the 1011000010 input sequence: (a) ideal case; (b) delay cell with a smaller delay than the bit period

6.3 Delay Mismatch


The delay cell in Figure 6.1 is the synchronizing block of the demultiplexer. It controls how long the FSM stays in one state. After the first data transition initiates the demultiplexer, the FSM binary outputs (y0, y1, and y2 in a 1:2 demultiplexer) are fed back to the input of the FSM every Tb. They then generate the next state and the next outputs according to (6.1), (6.5), and (6.6). Ideally, the delay must equal the input bit period Tb. Figure 6.3(a) demonstrates how the demultiplexer operates for a sample input data sequence: while one output follows the input, the other output holds its previous value.

If the delay cell has a delay Tb′ different from Tb, the outputs might experience glitches. However, any input transition immediately corrects those glitches and prevents any unwanted output transition. Figure 6.3(b) illustrates an example where Tb′ < Tb. As can be seen, out1 in the second bit period holds its previous value, “1.” After Tb′, new values of y0, y1, and y2 are ready at the FSM input, while the correct input x corresponding to out1 has not yet arrived, so out1 starts to follow the incorrect x. However, after Δt = Tb − Tb′, the next data transition arrives. At that point, out1, out2, y0, y1, and y2 are immediately corrected, because they are related to x through combinational logic such as (6.5) or (6.6). Although glitches appear at the outputs as a consequence, the delay mismatch is corrected every cycle and does not accumulate as long as data transitions occur.

A behavioral simulation of the demultiplexer in HSPICE confirms the above argument. Tb′ is chosen to be 125ps, while the input data rate is 7.5Gb/s, which corresponds to Tb=133ps. In Figure 6.4, the marked bumps on out1 and out2 correspond to the glitches discussed in Figure 6.3(b). As predicted, the next data transition immediately corrects such errors. The other, weaker bumps in the outputs are caused by hazards that occur when two or more terms in the SOP of the output function change simultaneously while the overall output function remains at the same logic level.

[Figure 6.4 waveforms: out1, out2, and y2 versus t [ns], from 10.5ns to 13.5ns]

Figure 6.4: Outputs of the 1:2 demultiplexer when Tb′ < Tb, simulated with HSPICE

The delay mismatch bounds the maximum number of consecutive identical bits (CIBs) at the input. If the delay mismatch is Δt, as in Figure 6.3(b), the number of CIBs should be fewer than n = Tb/Δt. Otherwise, the FSM swaps the outputs, and the (n+1)th bit is resolved at the incorrect output.
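We can apply this bound to the HSPICE example above (Tb′ = 125ps at 7.5Gb/s) as a rough illustration:

```latex
T_b = \frac{1}{7.5\,\text{Gb/s}} \approx 133.3\,\text{ps}, \qquad
\Delta t = T_b - T_b' \approx 133.3\,\text{ps} - 125\,\text{ps} \approx 8.3\,\text{ps}, \qquad
\frac{T_b}{\Delta t} \approx \frac{133.3}{8.3} \approx 16
```

A loop delay 8ps short therefore limits the input to runs of fewer than about 16 consecutive identical bits.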

Δt can be made very small by a delay control circuit that forces Δt to zero. A ring oscillator is formed by closing a positive feedback loop around a replica of the delay cells. The period of oscillation equals twice the delay of the delay cells, 2Tb′. A PLL locks the frequency of the ring oscillator to an accurate reference clock by tuning the replica delay cell, and the same control voltage is used to adjust the delays in the FSM. In practice, the logic circuits in the FSM have propagation delays that contribute to the total delay around the FSM feedback loop. Furthermore, process variations could significantly affect the propagation delay of the gates. Therefore, a delay control loop that adjusts the delay cells alone would be insufficient. The replica ring oscillator must include all the blocks that contribute to the delay.
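The tuning action of the delay control loop can be illustrated with a simple frequency-locking iteration. This is not the PLL of the design. It is a sketch assuming a hypothetical linear replica model, d(v) = d0 + kd·v, where d0 lumps the gate, interconnect, and delay-cell delay at zero control voltage. It also replaces the phase detector with a direct integrator of the period error. The replica ring period 2d is driven toward a reference period of 2Tb, and the same control voltage v would then be applied to the FSM delay cells.

```python
def tune_replica_delay(t_ref, d0, kd, gain=0.2, steps=200):
    """Drive the replica ring period 2*d toward t_ref by integrating the period error.

    Hypothetical linear model: d(v) = d0 + kd * v, with d0 [s] the total loop delay
    at v = 0 (gates + interconnect + delay cells) and kd [s/V] the tuning gain.
    Returns the final control voltage and the resulting loop delay.
    """
    v = 0.0
    for _ in range(steps):
        d = d0 + kd * v
        period = 2.0 * d                 # replica ring oscillates at twice the loop delay
        error = t_ref - period           # positive: ring too fast, so add delay
        v += gain * error / (2.0 * kd)   # integral update; error shrinks by (1 - gain)
    return v, d0 + kd * v


if __name__ == "__main__":
    tb = 1.0 / 7.5e9                     # target bit period for 7.5 Gb/s
    v, d = tune_replica_delay(2.0 * tb, d0=125e-12, kd=10e-12)
    print(f"control = {v:.3f} V, loop delay = {d * 1e12:.2f} ps")
```

Because d0 includes the logic delay, the loop corrects the whole FSM loop delay rather than only the delay cells. This is the reason the text gives for placing the entire FSM in the replica.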

For the 1:2 demultiplexer, it can be shown that in the absence of input transitions, f2 from (6.1) simplifies to

$$ y_2^{*} = \overline{y_2}. \tag{6.7} $$

Equation (6.7) shows that the y2 output inverts after every delay period around the FSM loop, Tb′. In other words, y2 oscillates with a period of 2Tb′ when the input is constant. In fact, y2 acts as the internal timer of the FSM in the absence of input transitions. The y2 output in Figure 6.4 demonstrates this. While the input is zero, y2 oscillates for four cycles. As soon as the data transition arrives in the fifth cycle, the phase of y2 is aligned and the oscillation stops. If there were no further input transitions, y2 would oscillate again with the corrected phase. Therefore, a replica of the 1:2 demultiplexer with y2 as the output forms the ring oscillator in the delay control loop. This ring oscillator includes all the digital blocks that contribute to the delay. Figure 6.5 illustrates the architecture. The same architecture could be used to design a variable-bit-rate demultiplexer by adjusting the delay around the loop according to the input bit rate.

[Figure 6.5 blocks: 1:2 demultiplexer FSM; replica FSM as ring oscillator]

Figure 6.5: Demultiplexer with delay control loop

6.4 Delay Implementation


6.4.1 Passive Delay

The delay block can be implemented using active or passive delay elements. If the delay control loop is not used, passive delay cells based on LC ladder structures can be used, as shown in Figure 6.6 (see also Section 2.3.4). The delay is determined by the values of the passive components, TD = n√(LC), where L and C are the inductance and capacitance, respectively, and n is the number of sections in the ladder. Integrated LC delay lines have practically no sensitivity to the supply voltage and maintain a low sensitivity to process variations and temperature. This is because the L and C component values are determined primarily by high-accuracy fabrication processes and do not vary after fabrication is complete. In contrast, the delay of active delay cells is a strong function of temperature.

[Figure 6.6 schematic: three series inductors L; shunt capacitors C/2, C, C, C/2]

Figure 6.6: 3-section constant-k filter-based passive LC delay line

In addition, it has been shown [134] that building blocks depending only on lateral dimensions, such as vertical parallel plate (VPP) capacitors, give an even tighter capacitance tolerance. They also give better matching of the capacitance value across the chip, the wafer, and process lots. If VPP capacitors and spiral inductors are used to implement the delay cell, the delay value depends only on the lateral dimensions of the components. Lateral dimensions are defined by lithography and etching processes. These have inherently higher accuracy than process steps such as deposition and planarization, which control the vertical dimensions. We present a statistical analysis of the passive delay lines that were used to implement the 1:2 demultiplexer prototype.

Constant-k LC ladder structures consist of identical interconnected inductors and capacitors in a ladder form, as shown in Figure 6.6. The ladder is a lumped approximation of a transmission line and can hence be used as a delay line. It can be shown that the delay of the structure is approximately

$$ T_D = n\sqrt{LC} \tag{6.8} $$

where n is the number of LC sections. Using spiral inductors and high-density VPP or MIM capacitors, one can obtain large delay values. Using image impedance techniques, we can calculate the impedance of the line to be

$$ Z(\omega) = \sqrt{\frac{L}{C}}\cdot\sqrt{1-\frac{LC\omega^{2}}{4}} = Z_0\cdot\sqrt{1-\frac{LC\omega^{2}}{4}} \tag{6.9} $$

where Z0 = √(L/C) is the characteristic impedance of the line [50]. As can be seen, the impedance becomes imaginary for frequencies above a critical frequency given by

$$ \omega_c = \frac{2}{\sqrt{LC}}. \tag{6.10} $$

[Figure 6.7 schematic: differential ladder with series inductors L in both rails and shunt capacitors C/4, C/2, C/2, C/4; one section marked]

Figure 6.7: 3-section differential constant-k filter-based delay line
The LC delay line can be designed in differential form, as shown in Figure 6.7. In such circuits, the differential inductors can be interwound to benefit from the mutual inductance between the two inductors. Therefore, larger inductance values are achievable with the same (or even smaller) area. It can be shown that if two equal differential inductors of value L are interwound with a coupling coefficient k (with the proper sign), the effective inductance of each is (1+k)L. We have taken advantage of this fact in our implementation of the delay lines. In the next two sections, we measure several integrated passive delay lines and analyze the experimental results to study the process variation of passive delay lines.

6.4.2 LC Delay Line Implementation

Two sets of LC delay lines are implemented as differential constant-k filters in a 5-metal SiGe BiCMOS process, in two different process runs that we refer to as PR1 and PR2. The differential inductors are implemented as coupled inductors with 1.25 interwound turns in the top metal. Figure 6.8 shows the symmetric layout of the inductors. The inductors are simulated using a 2.5D electromagnetic simulator.

Figure 6.8: Differential symmetric interwound inductors for one section of the delay line

The first set of delay lines uses MIM capacitors and consists of 24 LC sections in PR1. In the second set, fabricated in PR2, VPP capacitors are used instead of MIMs, and the line has 19 LC sections. Based on our earlier discussion, we expect the VPP-based delay line to show smaller delay variations than its MIM-based counterpart. In the VPP capacitors, the spacing between adjacent parallel plates is chosen to be larger than the minimum allowable spacing between adjacent metals. This reduces the effect of lateral surface roughness on the capacitor value. The resulting increase in fringe capacitance is modelled accurately with electromagnetic simulations. Table 6.3 summarizes the delay line parameters.

Table 6.3: Summary of the delay line parameters

Delay line parameter                            Value
Effective inductance per section                0.58 nH
Total capacitance per section                   230 fF
Characteristic impedance                        50 Ω (100 Ω differential)
Total simulated passive delay per section       11.5 ps
Ideal critical frequency (ωc/2π from (6.10))    28 GHz
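The entries of Table 6.3 follow from (6.8)–(6.10), and the MIM tolerance estimate of (6.12) below uses the same capacitance. A short script makes this check explicit. It takes the per-section values from Table 6.3 and uses a complex square root so that the image impedance in (6.9) visibly turns imaginary above the critical frequency. The function names are ours.

```python
import cmath
import math

L = 0.58e-9   # effective inductance per section [H], Table 6.3
C = 230e-15   # total capacitance per section [F], Table 6.3


def section_delay(l: float, c: float) -> float:
    """Eq. (6.8) for one section: sqrt(LC)."""
    return math.sqrt(l * c)


def z0(l: float, c: float) -> float:
    """Characteristic impedance sqrt(L/C)."""
    return math.sqrt(l / c)


def critical_frequency_hz(l: float, c: float) -> float:
    """Eq. (6.10) in Hz: omega_c / (2*pi) with omega_c = 2 / sqrt(LC)."""
    return 2.0 / math.sqrt(l * c) / (2.0 * math.pi)


def image_impedance(l: float, c: float, f_hz: float) -> complex:
    """Eq. (6.9); the complex sqrt makes Z purely imaginary above f_c."""
    w = 2.0 * math.pi * f_hz
    return z0(l, c) * cmath.sqrt(1.0 - l * c * w ** 2 / 4.0)


def relative_delay_error(delta_c: float, c: float) -> float:
    """Eq. (6.12): dT_D / T_D = (1/2) * dC / C."""
    return 0.5 * delta_c / c


if __name__ == "__main__":
    print(f"delay/section = {section_delay(L, C) * 1e12:.2f} ps")
    print(f"Z0 = {z0(L, C):.1f} ohm, f_c = {critical_frequency_hz(L, C) / 1e9:.1f} GHz")
    print(f"Z(10 GHz) = {image_impedance(L, C, 10e9):.1f} ohm")
    print(f"Z(40 GHz) = {image_impedance(L, C, 40e9):.1f} ohm")
    print(f"MIM tolerance: dT/T = {relative_delay_error(18.8e-15, C):.3f}")
```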

6.4.3 Experimental Results and Analysis

Standalone MIM-based and VPP-based delay structures were tested using direct on-wafer probing. The results are summarized in the following subsections.

6.4.3.1 Measurement Accuracy and Repeatability

Twenty-seven MIM-based delay lines in PR1 and 47 VPP-based delay lines in PR2 were characterized using an Agilent Technologies E8364A network analyzer. A set of preliminary experiments verified that environmental conditions, including temperature and measurement setup variations, remained constant while all 74 sites were measured. Six random sites were selected as witness cases and were each measured three times at different points during the measurement, and the results for each site were compared. The observed variations were always less than 0.05%, which bounds the measurement error and indicates its repeatability. This very high repeatability indicates minimal changes in the experimental conditions.

6.4.3.2 S-parameters

The magnitudes of the S11 and S21 parameters of the MIM-based and VPP-based delay lines were measured. A sample result for a MIM-based delay line, plotted in Figure 6.9, shows S11 < −12dB up to 30GHz. Similar measurements for the VPP-based delay lines show S11 < −16dB for 1.5GHz ≤ f ≤ 20GHz. These results indicate that the characteristic impedances of the delay lines are very close to 50 Ω over a wide range of frequencies. The low-frequency loss of the MIM-based delay line is 1.2dB, and its 3dB bandwidth is 7.5GHz.

[Figure 6.9 plot: S21 and S11 amplitude [dB] (0 to −50) versus f [GHz] (0.1 to 100, logarithmic)]

Figure 6.9: Magnitude of the S-parameters of one MIM-based standalone delay line

6.4.3.3 Standalone Delay Lines: Group Delay

The group delay indicates the delay of the delay line at different frequencies. The group delays of the whole ensemble of MIM-based and VPP-based lines are plotted in Figure 6.10 and Figure 6.11, respectively. The dominant source of variation across wafer sites for the MIM-based lines is the tolerance of the MIM capacitors. The reported MIM tolerance in this process technology is ±0.15fF/µm², which translates to a total tolerance of ΔC=18.8fF for the MIMs that we used. The time delay variation per section can be approximated from (6.8) as

$$ \Delta T_D = \frac{\partial T_D}{\partial C}\cdot\Delta C \tag{6.11} $$

$$ \frac{\Delta T_D}{T_D} = \frac{1}{2}\cdot\frac{\Delta C}{C} = 0.04. \tag{6.12} $$
[Figure 6.10 plot: group delay [s] versus f [GHz] (0 to 20) and line index n]

Figure 6.10: Collective group delays of 27 standalone MIM-based delay lines

[Figure 6.11 plot: group delay [s] versus f [GHz] (0 to 20) and line index n]

Figure 6.11: Collective group delays of 47 standalone VPP-based delay lines
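Group delay curves such as those in Figures 6.10 and 6.11 are, by definition, the negative derivative of the transmission phase with respect to angular frequency, τg = −dφ/dω. The following is a minimal sketch of computing this from complex S21 samples with NumPy. It is not the analyzer's internal algorithm. The check uses a synthetic, lossy, linear-phase line with an assumed 57ps delay rather than measured data.

```python
import numpy as np


def group_delay(freq_hz, s21):
    """tau_g(f) = -d(phase)/d(omega) from complex S21 samples.

    freq_hz: increasing 1-D array of frequencies; s21: complex S21 at those points.
    The phase is unwrapped first so that 2*pi jumps do not corrupt the derivative.
    """
    phase = np.unwrap(np.angle(s21))
    omega = 2.0 * np.pi * np.asarray(freq_hz)
    return -np.gradient(phase, omega)


f = np.linspace(0.1e9, 20e9, 400)
s21 = 0.9 * np.exp(-1j * 2 * np.pi * f * 57e-12)  # synthetic lossy, linear-phase line
tau = group_delay(f, s21)
print(f"group delay: {tau.min() * 1e12:.2f} to {tau.max() * 1e12:.2f} ps")
```

With measured data, the frequency step must be small enough that the phase changes by much less than π between samples. Otherwise unwrapping fails. Differentiating noisy phase also amplifies noise, which is why analyzers typically apply an aperture (smoothing) to group delay.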

The normalized standard deviations of the group delay (normalized to the mean group delay at the corresponding frequency) for the MIM-based and VPP-based lines are plotted in Figure 6.12. The variations for the MIM-based lines are within the tolerance of the MIM capacitors given by (6.12). The delay lines with VPP capacitors are almost twice as accurate across most of the frequency range. This corresponds to a factor of 3.3 improvement in tolerance for the VPPs, in agreement with [134]. Table 6.4 compares the average low-frequency group delays and their average normalized standard deviations in both cases. Again, the VPP-based delay lines are almost twice as accurate. Figure 6.13 shows the distribution of the normalized delay at 1GHz for both MIM- and VPP-based delay lines. Passive LC delay lines have low sensitivity to process variations and no sensitivity to supply variations.

[Figure 6.12 plot: normalized standard deviation (1% to 4%) versus f [Hz] for MIM-based and VPP-based lines]

Figure 6.12: Normalized standard deviations for group delays of standalone delay lines
Table 6.4: Statistical comparison for MIM- and VPP-based lines

Parameter                       Mean (η)    Standard deviation (σ)    σ/η
MIM low-freq. group delay       56.7 ps     0.572 ps                  1.01%
VPP low-freq. group delay       52.14 ps    0.306 ps                  0.59%
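The σ/η statistic of Table 6.4 and Figure 6.12 can be computed from an ensemble of measured group-delay curves as sketched below. The thesis does not say whether the sample or population standard deviation was used, so this sketch assumes the sample version. The table numbers are reused to confirm the “almost twice as accurate” ratio.

```python
import statistics


def normalized_std(delays):
    """Sample standard deviation divided by the mean (sigma / eta)."""
    return statistics.stdev(delays) / statistics.fmean(delays)


def per_frequency_normalized_std(ensemble):
    """ensemble[i][k]: group delay of line i at frequency index k -> sigma/eta per k."""
    return [normalized_std(column) for column in zip(*ensemble)]


# Table 6.4 summary values: (mean, standard deviation) in ps.
table_6_4 = {"MIM": (56.7, 0.572), "VPP": (52.14, 0.306)}
ratios = {name: sigma / mean for name, (mean, sigma) in table_6_4.items()}

if __name__ == "__main__":
    for name, r in ratios.items():
        print(f"{name}: sigma/eta = {100 * r:.2f}%")
    print(f"MIM/VPP spread ratio = {ratios['MIM'] / ratios['VPP']:.2f}")
```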

The die photos of the VPP-based line and the MIM-based line are shown in Figure 6.14 and Figure 6.15, respectively. The passive delay lines dominate the area. The spiral inductors are arranged in a loop in the oscillator to avoid long interconnect lines, and the capacitors are located between the inductors. Each inductor measures 150µm × 150µm.

[Figure 6.13 histogram: number of occurrences versus normalized delay (98% to 102%) for MIM-based and VPP-based delay lines]

Figure 6.13: Distributions of normalized delay at 1GHz for both MIM and VPP-based delay lines

Figure 6.14: Die photo of 19-section VPP-based LC delay line

Figure 6.15: Die photo of 24-section MIM-based LC delay line

6.5 Prototype Measurement Results


We fabricated an integrated 1:2 demultiplexer based on the instantaneous clockless architecture in the same SiGe BiCMOS process technology that we used to analyze the process variations of the delay lines. SOP logic functions form the FSM, as described in Section 6.2.2. The logic functions are realized with emitter-coupled logic (ECL) gates. Figure 6.16 shows a 3-input OR gate. When any one of the three inputs is at a high logic level, all of the tail current, It, flows in the left branch of the emitter-coupled stage, which forces the output to a high logic level. The AND gates are implemented with OR gates by inverting the inputs and the output (De Morgan's law). A 5-section differential LC delay line is implemented for each of the delay cells in the feedback loops of the binary functions y0, y1, and y2. The die photograph is shown in Figure 6.17. The chip dimensions are 2.5mm × 1.7mm, and the core logic occupies only 11% of the total die area.

Figure 6.16: A three-input ECL OR gate

[Figure 6.17 labels: core logic; delay lines; buffers; test structures]

Figure 6.17: Die microphotograph of the 1:2 demultiplexer with three 5-section differential LC delay lines

Figure 6.18 shows the y2 output of the demultiplexer when the input is a constant “1.” As mentioned, y2 oscillates with a period equal to twice the total delay around the FSM feedback loop, here 266ps. Therefore, the delay (bit period) is 133ps, and the demultiplexer works at an input bit rate of 7.5Gb/s. About 55% of the total delay is generated by the passive delay lines, and the rest comes from the ECL gates and interconnect parasitics.

[Figure 6.18 oscilloscope trace: 70mV and 100ps scale markers; oscillation period 266ps]

Figure 6.18: The y2 output in the oscillator mode

[Figure 6.19 oscilloscope traces of out1 and out2: (a) 272ps markers, 50mV and 200ps scale; (b) 274ps and 822ps markers, 50mV and 500ps scale; (c) 274ps marker, 50mV and 500ps scale]

Figure 6.19: Demultiplexer outputs out1 and out2 for three input sequences: (a) 1100 (b) 10000000 (c) 1000000010001000
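The bit-rate arithmetic behind the two measured oscillation periods follows. It takes the loop delay, i.e., the bit period, as half the oscillation period, as stated above. The constant-“1” case is from Figure 6.18, and the “1100” case is from Figure 6.19(a).

```latex
T_b = \frac{266\,\text{ps}}{2} = 133\,\text{ps}
  \;\Rightarrow\; R_b = \frac{1}{133\,\text{ps}} \approx 7.5\,\text{Gb/s},
\qquad
T_b = \frac{272\,\text{ps}}{2} = 136\,\text{ps}
  \;\Rightarrow\; R_b \approx 7.35\,\text{Gb/s}
```

The roughly 2% difference between the two cases is the data-dependent loop delay discussed below.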

The two outputs of the demultiplexer, out1 and out2, are measured for three different input sequences, as shown in Figure 6.19. Triggering the sampling oscilloscope with the “sync” signal from the input signal source confirms that the outputs are synchronized with the input. When the input is a repeating “1100” sequence, both demultiplexed outputs should be a repeating “10” sequence at half the input bit rate, i.e., each output bit lasts two input bit periods. In addition, out2 is delayed by one input bit period with respect to out1. Figure 6.19(a) shows these outputs.
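The expected waveforms follow from a bit-level model of an ideal 1:2 demultiplexer that routes even-indexed input bits to out1 and odd-indexed bits to out2. The Python sketch below models only this logical function, not the FSM, the delay lines, or their timing, and the function name is hypothetical.

```python
def demux_1to2(bits: str) -> tuple[str, str]:
    """Ideal 1:2 demultiplexing of one period of a repeating input pattern.

    out1 takes bits 0, 2, 4, ...; out2 takes bits 1, 3, 5, ... and is
    therefore one input bit period later than out1. Each output bit lasts
    two input bit periods, i.e., the outputs run at half the input bit rate.
    """
    return bits[0::2], bits[1::2]


for pattern in ("1100", "10000000", "1000000010001000"):
    out1, out2 = demux_1to2(pattern)
    print(f"{pattern:>16}: out1={out1} out2={out2}")
```

For the three patterns of Figure 6.19, the model predicts “10”/“10” for “1100,” “1000”/all-zero for “10000000,” and “10001010”/all-zero for the 16-bit pattern, consistent with the measured outputs reported in this section.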

A data transition can experience different delays when propagating through an LC-ladder delay cell, depending on the preceding data bits [18]. Delay is defined as the difference between the threshold-crossing times of a data transition, e.g., a “10” falling transition, at the input and at the output. Because of the memory of the delay cell, the threshold-crossing time is affected by the bits that arrive at the cell before the transition [18][20]. For example, “010” and “110” sequences have different threshold-crossing times at the output of the delay cell; in the latter sequence, the residue of the earlier “1” affects the threshold-crossing time of the falling transition. This data-dependent delay affects the overall delay in the FSM feedback loop. The FSM contains three delay cells, for y0, y1, and y2, and the input to each depends on the FSM input sequence. Thus each of the individual delay values may vary with the input sequence, and the three delay cells do not necessarily see the same sequence. We define the FSM loop delay as the average of the three delays; this average therefore also depends on the FSM input sequence. For instance, when the FSM input is a constant “1,” the delay cells for y0, y1, and y2 see a constant “1,” a constant “1,” and a “10” sequence at their inputs, respectively, whereas for a “1100” input they see “1100,” “1100,” and “10” sequences. The inputs to the y0 and y1 delay cells have changed, and consequently the average loop delay changes. The two cases are shown in Figure 6.18 and Figure 6.19(a), where the average loop delay changes from 133ps (half of 266ps) in the former to 136ps (half of 272ps) in the latter. One way to avoid data-dependent delay values is to use nonlinear (digital) delay cells instead of linear LC delay cells.
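The pattern dependence can be illustrated with a first-order (single-time-constant) model of a linear delay cell, the kind of system treated in Appendix B, rather than the actual LC ladder. With α = exp(−Tb/τ), the output relaxes toward each bit value with time constant τ, and after a falling edge that starts from level y0 it crosses the 0.5 threshold at τ·ln(2·y0). In the Python sketch below, τ = 50 ps is an assumed, illustrative value; Tb = 133 ps matches the measured bit period.

```python
import math


def falling_crossing_ps(pattern: str, tau_ps: float, tb_ps: float) -> float:
    """Threshold-crossing time of the final 1->0 transition of `pattern`.

    First-order model: during each bit the output relaxes exponentially
    toward the bit value with time constant tau. The state starts at 0
    (a long run of zeros precedes the pattern). Time is measured from the
    start of the last bit, where the falling transition begins.
    """
    if not pattern.endswith("10"):
        raise ValueError("pattern must end with a falling transition")
    alpha = math.exp(-tb_ps / tau_ps)
    y = 0.0
    for bit in pattern[:-1]:            # settle through every bit before the edge
        target = int(bit)
        y = target + (y - target) * alpha
    if y <= 0.5:
        raise ValueError("output never exceeded the threshold before the edge")
    # During the last ("0") bit: y(t) = y0 * exp(-t / tau); solve y(t) = 0.5.
    return tau_ps * math.log(2 * y)


tau, tb = 50.0, 133.0
t_010 = falling_crossing_ps("010", tau, tb)
t_110 = falling_crossing_ps("110", tau, tb)
print(f"010: {t_010:.2f} ps, 110: {t_110:.2f} ps, difference {t_110 - t_010:.2f} ps")
```

In this model the “110” crossing comes later, because the falling edge starts from a level closer to 1; both crossings stay below τ·ln 2, the value for a fully settled output, consistent with (B.1).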

Figure 6.19(b) shows the outputs when the input sequence is “10000000,” which has seven consecutive zeros. The two outputs are “1000” at half the bit rate and all-zero, respectively. The droop during long runs of identical bits is due to the ac-coupled outputs and would not be present in a dc-coupled version. A longer, 16-bit sequence is tested in Figure 6.19(c). The input sequence “1000000010001000” results in the outputs “10001010” at half the bit rate and all-zero. The demultiplexer locks to the input phase and correctly demultiplexes the data into two outputs without using a synchronous clock. The chip uses a 3.3V power supply and draws 316mA of current, of which 110mA flows in the output buffers and bias circuits.
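For reference, the quoted supply voltage and currents correspond to the power figures computed below, assuming all currents are drawn from the single 3.3V supply.

```python
supply_v = 3.3
total_ma = 316.0
buffers_bias_ma = 110.0

total_w = supply_v * total_ma / 1e3
# Remaining circuitry: everything other than the output buffers and bias circuits.
core_w = supply_v * (total_ma - buffers_bias_ma) / 1e3
print(f"total {total_w:.2f} W, excluding buffers/bias {core_w:.2f} W")
```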

6.6 Summary
We introduced a new architecture that instantaneously recovers and demultiplexes data without explicit clock recovery. The architecture is based on a finite state machine (FSM) that assigns each input bit to the proper output while holding the values of the other outputs. State transitions are synchronized with the arrival of input-data transitions. Binary logic functions map the current state, along with the input bit, to the next state. Analog delay cells with a one-bit-period delay feed the values of the binary functions back to the input and synchronize the FSM with the input data.

One approach to implementing the delay cells in this architecture is to use passive delay cells, which have low sensitivity to process and temperature variations. We performed an experimental statistical analysis of passive delay lines to demonstrate this property. We then showed measurement results of a prototype 1:2 demultiplexer based on integrated passive LC delay lines that operates at 7.5Gb/s without using a clock signal.
Chapter 7: Conclusion
7.1 Thesis Highlights
In this work we explored the analysis and design of wireline communication systems
and focused on the basic challenges of high-speed wireline links. We bridged the gap
between system design and circuit design by: (1) understanding the relationship between
the wireline link reliability and the system parameters, (2) introducing practical circuit
architectures that enable realization of such systems with the required parameters, and (3)
demonstrating implementation of hardware prototypes using silicon-based integrated
circuit technologies that verify our solutions.

First, we presented the principles of today’s high-speed communication links. We developed a theory that analytically relates the data link reliability, i.e., the BER, to system-level characteristics, e.g., the channel response, the pre-amplifier bandwidth, the amplitude noise, and the transmitter clock jitter. In particular, we used BER contours to find the optimum system bandwidth, which is a crucial specification for the pre-amplifier in the receiver architecture. In addition, we determined the optimum sampling point and its associated timing margin, an important design specification for the clock recovery circuit. We also developed the theory of data-dependent jitter (DDJ), which is a significant component of the timing jitter in high-speed links because of system bandwidth limitations. We provided an analytical distribution function for the DDJ of an arbitrary linear time-invariant system and included the impact of the DDJ in the data link’s overall BER.

Second, we proposed a method for bandwidth enhancement of wideband amplifiers. This is useful for pre-amplifier design in high-speed links built in technologies that suffer from large parasitic components and thus a low maximum frequency of operation. The method is based on two-port broadband matching of multistage amplifiers. We absorbed the device parasitics into passive matching networks to allow each amplifier stage to achieve its theoretical maximum gain-bandwidth product, set by the Bode-Fano limit [26][27]. We demonstrated a 0.18µm CMOS amplifier that operates at 10Gb/s and achieves a 2.4-fold bandwidth improvement over a design that does not apply our technique.

Third, we developed a novel eye-opening monitor architecture that enables full integration of adaptive equalizers for compensating channel-induced impairments such as inter-symbol interference. The eye-opening monitor (EOM) circuit evaluates the quality of the received signal eye diagram and periodically reports a quantitative measure that is correlated with the signal quality. This output can be used as a cost function for automatic adjustment of the filter coefficients in an adaptive equalizer. Our proposed EOM can effectively capture a two-dimensional image of the eye diagram and thus can be used for general signal integrity evaluation. The simple error detection mechanism of the EOM can be implemented at very high speed. We demonstrated a prototype implemented in 0.13µm CMOS that was successfully tested up to 12.5Gb/s and provides up to 68dB of output error dynamic range.

Finally, we introduced a novel architecture for instantaneous clockless demultiplexing. Instantaneous data acquisition is required in burst-mode communication systems, where the data stream arrives at the receiver in asynchronous packets separated by unknown quiet intervals. Conventional narrowband phase-locked loops require a long preamble because of their long acquisition time and are therefore not suitable. As an alternative to gated oscillators that require a full-rate clock for operation, we proposed a clockless finite state machine that recovers and demultiplexes the received burst of data instantaneously. The architecture consists of a combinational logic structure with immediate response and a feedback loop with a one-bit-period delay. Therefore, every time a burst is received, operation is initiated exactly in phase with the first bit and continues synchronously with the stream. We implemented a 1:2 clockless demultiplexer based on this concept in SiGe BiCMOS technology and verified its operation at 7.5Gb/s.

7.2 Directions for Future Work


We have investigated the underlying principles of high-speed wireline communications and have developed a general relationship between the data link reliability and the system parameters. One direction for future research is to include other impairments of high-speed links and to complete the model for calculating the BER. The most significant effect neglected in this thesis is the nonlinearity of the channel or the transceiver. For instance, in long-haul optical communication, the optical amplifiers deployed to boost the optical power may drive the fiber into its nonlinear regime, which affects signal integrity. Furthermore, some receiver blocks, particularly the main amplifier that follows the pre-amplifier, are typically high-gain nonlinear blocks that limit the signal amplitude. Therefore, they should be modeled as slew-rate-limited systems rather than LTI systems. The impact of such systems on the ISI and timing jitter, and ultimately on the BER, is an important and interesting subject for future research that would extend the results of this thesis.

This work relates the time response of an arbitrary LTI system to the ISI that results from that system. Therefore, the analytical results for calculating the BER in general, and the data-dependent jitter in particular, are based on the time response of the system. From a circuit design perspective, it would also be interesting and useful to derive such analytical results in terms of the frequency response of the system, i.e., both its amplitude response and its phase response. The relationship between circuit parameters and the frequency response is conventionally studied more rigorously and is better documented, e.g., in the theory of poles and zeros or in filter design tables. For instance, the design of an amplifier is more straightforward if the specifications are given in terms of the maximum ripple in the amplitude and group delay responses rather than as maximum values for the samples of the pulse response.

Two important topics related to the proposed eye-opening monitor (EOM) circuit that require further investigation are the optimization algorithm and the loop dynamics of an adaptive equalizer that uses the EOM. The algorithm can be chosen freely because it is separate from the filter hardware. However, convergence speed is a practical constraint that may rule out probabilistic algorithms such as genetic algorithms or simulated annealing. On the other hand, the cost function, i.e., the EOM output, does not have a known direct relationship with the optimization parameters, i.e., the filter coefficients. Therefore, applying gradient-descent-based algorithms is not straightforward. As a result, understanding the trade-offs in the choice of algorithm is a topic of interest for future research in this area.

Understanding the loop dynamics of the adaptive equalizer is important for guaranteeing convergence and controlling its speed. The EOM is a nonlinear system, and its impact on the loop dynamics should be studied carefully. In particular, the required mask error rate dynamic range, which determines the integration time for each mask, and the number of masks, which determines the resolution of the mask error rate, are among the parameters whose relationship to the loop dynamics should be investigated.

Possible future directions for enhancing the design of the proposed instantaneous demultiplexer include accurate delay implementation and low-power design. The absolute value of the feedback loop delay is the only parameter that determines the operating bit rate of the demultiplexer. The loop delay includes the delay of the feedback delay cells, the propagation delay of the combinational logic, and the delay of the interconnect parasitics. An effective way to adjust the loop delay and control its value accurately is to form a delay reference loop using a replica of the demultiplexer configured as a ring oscillator. The oscillation frequency of this ring oscillator is determined by the same parameters that set the loop delay. Consequently, if the frequency is adjusted accurately in the delay reference loop, the feedback delay is determined with the same accuracy. Finally, the demultiplexer design was not optimized for minimum power consumption. The power consumption also determines the heat generated by the demultiplexer chip, which sets the local temperature around the active devices and thus affects their propagation delay. Therefore, controlling the power consumption can improve the operation of the block.
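As a numerical illustration of the replica idea, assume that the replica oscillates like the y2 output in oscillator mode (Figure 6.18), with a period equal to twice the loop delay. Locking the replica to a reference frequency then fixes the loop delay and hence the bit rate. The helper names in this Python sketch are hypothetical.

```python
def replica_frequency_ghz(bit_rate_gbps: float) -> float:
    """Frequency the replica ring oscillator should be locked to.

    Assumes (as in oscillator mode) period = 2 x loop delay and
    loop delay = one bit period, so f_osc = bit_rate / 2.
    """
    loop_delay_ps = 1e3 / bit_rate_gbps
    return 1e3 / (2 * loop_delay_ps)


def loop_delay_from_frequency_ps(f_osc_ghz: float) -> float:
    """Loop delay implied by a measured replica oscillation frequency."""
    return 1e3 / (2 * f_osc_ghz)


print(f"target replica frequency for 7.5Gb/s: {replica_frequency_ghz(7.5):.3f} GHz")
print(f"loop delay for a 266ps period: {loop_delay_from_frequency_ps(1e3 / 266):.1f} ps")
```

Because the loop delay is inversely proportional to the replica frequency, a small relative error in the locked frequency appears, to first order, as an equal relative error in the loop delay.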

All in all, the thesis provides insight and develops useful tools and techniques for
designing high-speed wireline communication systems using integrated circuit
technologies.
Appendix A: Overall BER Calculation
In this appendix we calculate the overall BER when both timing jitter and ISI are present. To calculate the probability of error for the bit currently being sampled, we have to take into account the values of the next bit and the previous bit, so that the timing jitter of the transitions before and after the current sampling point is included in the BER. Therefore, we consider 3-bit sequences in which the middle bit is the one being sampled. Of the eight possibilities, we only need to calculate the BER for the four sequences “000,” “001,” “100,” and “101.” By symmetry, each of the other four cases, where the middle bit is “1,” equals one of the sequences with “0” as the middle bit. Therefore, we have

$$\mathrm{BER}(T_s) = \frac{1}{4}\left[\mathrm{BER}(T_s \mid \text{“000”}) + \mathrm{BER}(T_s \mid \text{“001”}) + \mathrm{BER}(T_s \mid \text{“100”}) + \mathrm{BER}(T_s \mid \text{“101”})\right] \tag{A.1}$$

The error for the first term on the right is caused only by the ISI and noise because there is
no transition. Hence, we can write

$$\mathrm{BER}(T_s \mid \text{“000”}) = \mathrm{BER}\big(\mathrm{ISI}_0(T_s)\big) \tag{A.2}$$

In the second sequence, “001,” a transition occurs into the next bit. Because of the timing jitter, the transition can occur before or after the sampling point, Ts. If the transition occurs after the sampling point, it does not affect the BER, because we assume the system is causal. On the other hand, if the transition takes place before the sampling point, the receiver samples ISI0(Ts)+s(Ts-tR). The random variable tR denotes the location of the transition to the right of the current bit and has a mean value equal to Tb. Therefore, the BER for the “001” sequence can be found as a conditional probability, conditioned on tR, as
$$\mathrm{BER}(T_s \mid \text{“001”}) = \begin{cases} \mathrm{BER}\big(\mathrm{ISI}_0(T_s)\big), & t_R \ge T_s \\ \mathrm{BER}\big(\mathrm{ISI}_0(T_s) + s(T_s - t_R)\big), & t_R < T_s \end{cases} \tag{A.3}$$

If we assume that tR has the probability density function ft(tR), we have

$$\mathrm{BER}(T_s \mid \text{“001”}) = \mathrm{BER}\big(\mathrm{ISI}_0(T_s)\big)\int_{T_s}^{\infty} f_t(t_R)\,dt_R + \int_{-\infty}^{T_s} f_t(t_R)\,\mathrm{BER}\big(\mathrm{ISI}_0(T_s) + s(T_s - t_R)\big)\,dt_R \tag{A.4}$$

We can calculate the BER for the “100” sequence similarly. However, the location of
the transition only affects the BER if it occurs before the sampling point, because it
changes the amount of the ISI1 at the sampling point. We have


$$\mathrm{BER}(T_s \mid \text{“100”}) = \begin{cases} \mathrm{BER}\big(\mathrm{ISI}_1(T_s - t_L)\big), & t_L < T_s \\ \mathrm{BER}\big(\mathrm{ISI}_1(T_s) + s(T_s)\big), & t_L \ge T_s \end{cases} \tag{A.5}$$

$$\mathrm{BER}(T_s \mid \text{“100”}) = \mathrm{BER}\big(\mathrm{ISI}_1(T_s) + s(T_s)\big)\int_{T_s}^{\infty} f_t(t_L)\,dt_L + \int_{-\infty}^{T_s} f_t(t_L)\,\mathrm{BER}\big(\mathrm{ISI}_1(T_s - t_L)\big)\,dt_L \tag{A.6}$$

For the “101” sequence we have both the left and right transitions. However, the right
transition does not impact the BER if tR>Ts because the system is causal. In that case, the
BER is equivalent to the BER for the “100” sequence. On the other hand, if tR<Ts, we also
implicitly know that tL<Ts. We can write


$$\mathrm{BER}(T_s \mid \text{“101”}) = \begin{cases} \mathrm{BER}(T_s \mid \text{“100”}), & t_R > T_s \\ \mathrm{BER}(T_s \mid \text{“100”},\ t_L < t_R < T_s), & t_R < T_s \end{cases} \tag{A.7}$$

Therefore, we have

$$\mathrm{BER}(T_s \mid \text{“101”}) = \mathrm{BER}(T_s \mid \text{“100”})\int_{T_s}^{\infty} f_t(t_R)\,dt_R + \int_{-\infty}^{T_s} f_t(t_R)\left(\int_{-\infty}^{T_s} f_t(t_L)\,\mathrm{BER}\big(\mathrm{ISI}_1(T_s - t_L) + s(T_s - t_R)\big)\,dt_L\right)dt_R \tag{A.8}$$

We assume the timing jitter distribution is Gaussian, with means of zero and Tb for tL and tR, respectively, and a standard deviation of σj. In addition, we assume the noise distribution is Gaussian with zero mean and standard deviation σn. Therefore, all the BER terms in the above equations take the form of a Q(·) function, where Q(·) is the tail probability (complementary cumulative distribution function) of the standard Gaussian distribution. We can approximate some of the BER terms in (A.4), (A.6), and (A.8) by one. This applies to all the terms in which the argument of the BER, i.e., the sampled value, is large because of the step response, so that the corresponding Q(·) approaches one, e.g., BER(ISI0(Ts)+s(Ts-tR)). Then we estimate the overall BER by substituting (A.2), (A.4), (A.6), and (A.8) into (A.1). We get

$$
\begin{aligned}
\mathrm{BER}(T_s) = \frac{1}{4}\Bigg[&\left(Q\!\left(\frac{0.5 - \mathrm{ISI}_0(T_s)}{\sigma_n}\right) + \int_{-\infty}^{T_s} f_t(t_L)\,Q\!\left(\frac{0.5 - \mathrm{ISI}_1(T_s - t_L)}{\sigma_n}\right)dt_L\right)\left(1 + Q\!\left(\frac{T_s - T_b}{\sigma_j}\right)\right)\\
&+ Q\!\left(\frac{T_s}{\sigma_j}\right) + Q\!\left(\frac{T_b - T_s}{\sigma_j}\right)\Bigg]
\end{aligned}
\tag{A.9}
$$

We have also neglected all the second-order terms that include products of two Q(.)
functions.
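To see how (A.9) behaves, it can be evaluated numerically under illustrative assumptions that are not part of the derivation: a constant ISI0, and ISI1(t) = exp(−t/τ) for the residue of a preceding “1” a time t after its falling edge, as for a first-order system. Q(x) is computed as ½·erfc(x/√2), and the integral over tL is a plain trapezoidal sum. All parameter values are hypothetical, with times in units of the bit period (Tb = 1).

```python
import math


def q(x: float) -> float:
    """Gaussian tail probability Q(x) = P{N(0, 1) > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gauss_pdf(t: float, mean: float, sigma: float) -> float:
    """Gaussian probability density, used for f_t."""
    return math.exp(-0.5 * ((t - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def ber_a9(ts: float, tb: float = 1.0, tau: float = 0.2, isi0: float = 0.02,
           sigma_n: float = 0.08, sigma_j: float = 0.05, steps: int = 4000) -> float:
    """Evaluate (A.9); all times are in units of the bit period.

    Hypothetical model choices: ISI_0(Ts) is the constant `isi0`, and the
    residue of a preceding "1" a time t after its falling edge is
    ISI_1(t) = exp(-t / tau), as for a first-order system.
    """
    # Trapezoidal rule for the integral of f_t(t_L) * Q(...) over (-inf, Ts];
    # the Gaussian f_t(t_L) (mean 0) is negligible below -8 sigma_j.
    lo = -8.0 * sigma_j
    integral = 0.0
    if ts > lo:
        h = (ts - lo) / steps
        for i in range(steps + 1):
            t_l = lo + i * h
            weight = 0.5 if i in (0, steps) else 1.0
            isi1 = math.exp(-(ts - t_l) / tau)
            integral += weight * gauss_pdf(t_l, 0.0, sigma_j) * q((0.5 - isi1) / sigma_n)
        integral *= h
    late_right_edge = q((ts - tb) / sigma_j)  # P{t_R >= Ts}
    return 0.25 * ((q((0.5 - isi0) / sigma_n) + integral) * (1.0 + late_right_edge)
                   + q(ts / sigma_j) + q((tb - ts) / sigma_j))


for ts in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"Ts = {ts:.1f} Tb: BER ~ {ber_a9(ts):.2e}")
```

Sweeping Ts traces a bathtub-shaped curve: in this model the BER is lowest near mid-bit and rises toward both edges, where the unsettled residue ISI1 and the jitter terms dominate.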

In reality, the total jitter distribution should also include the effect of the DDJ, as discussed in Chapter 3. Here, we investigate how the DDJ affects each of the terms in (A.1). The DDJ has no impact on BER(Ts| “000”) because “000” has no transitions. In the “001” sequence, there is a “01” transition with a “0” as the penultimate bit. Therefore, ft(tR) should be modified to a Gaussian with a mean of Tb+tc,0, where tc,0 is defined in (3.22) and calculated in Appendix B.

In contrast to the “001” case, the penultimate bit for the transition in “100” is unknown. Therefore, to calculate BER(Ts| “100”), the Gaussian distribution previously used for ft(tL) should be convolved with the DDJ distribution, a double Dirac delta function. Finally, both ft(tR) and ft(tL) should be modified to calculate BER(Ts| “101”). The ft(tR) distribution becomes a Gaussian with a mean of Tb+tc,1, because the penultimate bit for the “01” transition is now “1”; tc,1 is defined in (3.23) and calculated in Appendix B. The ft(tL) distribution is convolved with the double Dirac delta function DDJ distribution. Therefore, the resulting overall BER is
$$
\begin{aligned}
\mathrm{BER}(T_s) = \frac{1}{4}\Bigg\{&Q\!\left(\frac{0.5 - \mathrm{ISI}_0(T_s)}{\sigma_n}\right)\left(1 + Q\!\left(\frac{T_s - T_b - t_{c,0}}{\sigma_j}\right)\right) + Q\!\left(\frac{T_b + t_{c,0} - T_s}{\sigma_j}\right)\\
&+ \int_{-\infty}^{T_s} f_t(t_L)\,Q\!\left(\frac{0.5 - \mathrm{ISI}_1(T_s - t_L)}{\sigma_n}\right)dt_L\left(1 + Q\!\left(\frac{T_s - T_b - t_{c,1}}{\sigma_j}\right)\right)\\
&+ \frac{1}{2}Q\!\left(\frac{T_s - t_{c,0}}{\sigma_j}\right) + \frac{1}{2}Q\!\left(\frac{T_s - t_{c,1}}{\sigma_j}\right)\Bigg\}
\end{aligned}
\tag{A.10}
$$
Appendix B: Threshold-Crossing Time
In this appendix we calculate tc,0 and tc,1, defined in (3.22) and (3.23), for a first-order
system. We can show

$$t_0 = \tau \ln 2 \tag{B.1}$$

We also have

$$\Delta t_{a_{-2}=0} = -\tau \ln\left(1 - (1-\alpha)\sum_{k'=-\infty}^{-2} a_{k'}\,\alpha^{-k'}\right) \tag{B.2}$$

$$
\begin{aligned}
\Delta t_{a_{-2}=1} &= -\tau \ln\left(1 - \frac{1-\alpha}{\alpha}\left(\alpha^2 + \alpha\sum_{k'=-\infty}^{-2} a_{k'}\,\alpha^{-k'}\right)\right)\\
&= -\tau \ln\left(1 - \alpha + \alpha^2 - (1-\alpha)\sum_{k'=-\infty}^{-2} a_{k'}\,\alpha^{-k'}\right)\\
&= -\tau \ln\left(1 - \alpha + \alpha^2\right) + (-\tau)\ln\left(1 - \frac{1-\alpha}{1-\alpha+\alpha^2}\sum_{k'=-\infty}^{-2} a_{k'}\,\alpha^{-k'}\right)
\end{aligned}
\tag{B.3}
$$

We define a new discrete random variable Φ as follows

$$\Phi \equiv \sum_{k=-\infty}^{-2} a_k\,\alpha^{-k} = a_{-2}\,\alpha^2 + \sum_{k=-\infty}^{-3} a_k\,\alpha^{-k} \tag{B.4}$$

After reorganizing the terms in the sum and renumbering the indices we have
$$\Phi = a_{-2}\,\alpha^2 + \alpha\sum_{k'=-\infty}^{-2} a_{k'}\,\alpha^{-k'} \tag{B.5}$$

where k' = k + 1. As the ak’s are independent and identically distributed (iid) random variables, the sum in the second term on the right is, by definition (B.4), a random variable with the same statistical properties as Φ. Specifically, all the statistical moments of the two random variables are equal. If we denote this new random variable by Φ', we have

$$\Phi = a_{-2}\,\alpha^2 + \alpha\,\Phi' \tag{B.6}$$

Also, note that Φ' and a-2 are independent random variables. Now we can write

$$E\{\Phi\} = E\{a_{-2}\,\alpha^2 + \alpha\,\Phi'\} = \alpha^2 E\{a_{-2}\} + \alpha E\{\Phi'\} \tag{B.7}$$

We know that E{Φ} = E{Φ'}. We also assume

$$p(a_k = 1) = p(a_k = 0) = 0.5 \tag{B.8}$$

Then, we have

$$E\{a_{-2}\} = \tfrac{1}{2}\times 0 + \tfrac{1}{2}\times 1 = \tfrac{1}{2} \tag{B.9}$$

Substituting into (B.7), we get

$$E\{\Phi\} = \frac{1}{2}\cdot\frac{\alpha^2}{1-\alpha} \tag{B.10}$$
The second-order moment can be calculated from (B.6) as follows

$$E\{\Phi^2\} = \alpha^4 m_2 + 2\alpha^3 m_1 E\{\Phi\} + \alpha^2 E\{\Phi^2\} \tag{B.11}$$

$$E\{\Phi^2\} = \frac{1}{1-\alpha^2}\left(\alpha^4 m_2 + 2\alpha^3 m_1 E\{\Phi\}\right) = \frac{0.5\,\alpha^4}{(1-\alpha)(1-\alpha^2)} \tag{B.12}$$

in which mi is the ith-order moment of a-2 and E{Φ} is known from (B.10). It is easy to show that mi = 1/2 for all i. Similarly, the kth-order moment can be written as
$$E\{\Phi^k\} = \frac{0.5}{1-\alpha^k}\sum_{i=0}^{k-1}\binom{k}{i}\alpha^{2k-i}\,E\{\Phi^i\} \tag{B.13}$$

This gives a recursive expression based on lower-order moments. Now, we can calculate

$$E\{\Delta t_{a_{-2}=0}\} = E\{-\tau \ln(1 - (1-\alpha)\Phi)\} \tag{B.14}$$

$$E\{\Delta t_{a_{-2}=1}\} = -\tau \ln(1 - \alpha + \alpha^2) - \tau\,E\left\{\ln\left(1 - \frac{1-\alpha}{1-\alpha+\alpha^2}\,\Phi\right)\right\} \tag{B.15}$$
From (B.4) we know that Φ ≤ α²/(1 − α), where the maximum occurs when all the ak’s are “1.” In addition, Tb/τ is positive, and so α ≤ 1. Therefore,

$$(1-\alpha)\,\Phi \le \frac{\alpha^2}{1-\alpha}\cdot(1-\alpha) = \alpha^2 \le 1 \tag{B.16}$$
Similarly,

$$\frac{1-\alpha}{1-\alpha+\alpha^2}\,\Phi \le \frac{\alpha^2}{1-\alpha}\cdot\frac{1-\alpha}{1-\alpha+\alpha^2} = \frac{\alpha^2}{1-\alpha+\alpha^2} \le 1 \tag{B.17}$$
Hence, we can use the Taylor series expansion of the natural logarithm to estimate (B.14)
and (B.15). We have


$$\ln(1-x) = -\sum_{k=1}^{\infty}\frac{x^k}{k}, \qquad |x| < 1 \tag{B.18}$$

Therefore,


$$E\{\Delta t_{a_{-2}=0}\} = \tau\sum_{k=1}^{\infty}\frac{1}{k}(1-\alpha)^k E\{\Phi^k\} \tag{B.19}$$
$$E\{\Delta t_{a_{-2}=1}\} = -\tau\ln(1-\alpha+\alpha^2) + \tau\sum_{k=1}^{\infty}\frac{1}{k}\left(\frac{1-\alpha}{1-\alpha+\alpha^2}\right)^k E\{\Phi^k\} \tag{B.20}$$

We can approximate (B.19) and (B.20) by neglecting all the moments of Φ for k>2
because we can show that the kth moment is proportional to the kth power of α and thus
shrinks exponentially. Then we have

$$E\{\Delta t_{a_{-2}=0}\} \cong \tau\left[(1-\alpha)E\{\Phi\} + \frac{1}{2}(1-\alpha)^2E\{\Phi^2\}\right] = \frac{\tau}{2}\left(\alpha^2 + \frac{0.5\,\alpha^4}{1+\alpha}\right) \tag{B.21}$$

$$
\begin{aligned}
E\{\Delta t_{a_{-2}=1}\} &\cong \tau\left[\frac{1-\alpha}{1-\alpha+\alpha^2}E\{\Phi\} + \frac{1}{2}\left(\frac{1-\alpha}{1-\alpha+\alpha^2}\right)^2E\{\Phi^2\}\right] - \tau\ln(1-\alpha+\alpha^2)\\
&= \frac{\tau}{2}\cdot\frac{\alpha^2 + 0.5\,\alpha^4 + \alpha^5}{(1+\alpha^3)(1-\alpha+\alpha^2)} - \tau\ln(1-\alpha+\alpha^2)
\end{aligned}
\tag{B.22}
$$

Finally, tc,0 and tc,1 can be found by substituting (B.1), (B.21), and (B.22) into (3.22) and (3.23).
Appendix C: Impedance Function
An impedance function is a rational function of frequency (a ratio of two polynomials with real coefficients) with no right-half-plane poles. Additionally, the degree of the numerator polynomial should exceed that of the denominator by at most one. The conditions for an impedance function can be found in [26][96]. The upper bound in (4.2) is not valid if the load does not satisfy the conditions of an impedance function. In other words, if the overall transfer function of an amplifier is of the form

$$A_v(j\omega) = g_m \cdot Z(j\omega) \tag{C.1}$$

and Z(jω) is not an impedance function, then the Bode-Fano limit need not be satisfied. Distributing passive structures between gain stages can result in overall transfer functions that are not impedance functions per se [23]. Therefore, the GBW product can potentially exceed the limit in (4.2). One design approach for such a structure is stagger tuning of the frequency responses: an early amplitude roll-off due to a low-frequency pole in one stage can be compensated for by peaking in the next stage. Similarly, the overall phase response of the passive structures can be properly controlled.
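A minimal numerical illustration of the stagger-tuning idea, with hypothetical numbers: the first stage has a low-frequency pole at 1 GHz, and the second stage places a zero at the same frequency (peaking) with its own pole at 8 GHz. A simple frequency sweep in the Python sketch below finds the −3 dB bandwidth, which for the cascade is set by the second stage's pole rather than the first stage's. Ideal pole-zero cancellation is assumed.

```python
import math


def stage1(f_ghz: float) -> complex:
    """Stage with an early roll-off: one pole at 1 GHz (hypothetical)."""
    return 1 / (1 + 1j * f_ghz / 1.0)


def stage2(f_ghz: float) -> complex:
    """Peaking stage: zero at 1 GHz, pole at 8 GHz (hypothetical)."""
    return (1 + 1j * f_ghz / 1.0) / (1 + 1j * f_ghz / 8.0)


def cascade(f_ghz: float) -> complex:
    return stage1(f_ghz) * stage2(f_ghz)


def bandwidth_3db_ghz(h, f_max: float = 100.0, step: float = 0.001) -> float:
    """First swept frequency where |h| drops below its DC magnitude / sqrt(2)."""
    dc = abs(h(0.0))
    for i in range(int(f_max / step) + 1):
        f = i * step
        if abs(h(f)) < dc / math.sqrt(2):
            return f
    return float("nan")


print(f"stage 1 alone: {bandwidth_3db_ghz(stage1):.2f} GHz")
print(f"stage 1 + peaking stage 2: {bandwidth_3db_ghz(cascade):.2f} GHz")
```

In a real design the cancellation is only approximate, so both the amplitude and the phase response of the cascade need to be checked.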
Appendix D: Mask Error Rate
We assume Gaussian amplitude noise, whose tail probability is given by the Q(·) function. The probability of a mask error is

$$\mathrm{MER} = \Pr\{S_H \neq S_L\} \tag{D.1}$$

where Pr{·} denotes probability, and SH and SL are defined in Section 5.3. SH and SL take binary values, so two combinations contribute to (D.1). The conditional probability of each combination can be calculated given the input bit to the EOM. Therefore, we have

$$
\begin{aligned}
\mathrm{MER} = \frac{1}{2}\Big(&\Pr\{S_H = 0, S_L = 1 \mid \mathrm{in} = 1 + z\} + \Pr\{S_H = 0, S_L = 1 \mid \mathrm{in} = z\}\\
&+ \Pr\{S_H = 1, S_L = 0 \mid \mathrm{in} = z\} + \Pr\{S_H = 1, S_L = 0 \mid \mathrm{in} = 1 + z\}\Big)
\end{aligned}
\tag{D.2}
$$

where z is the input noise at the sampling point. When the EOM is ideal, only the input noise affects the SH and SL values. The last two terms in (D.2) are then identically zero because they both imply VH < VL. The first two terms both equal the probability that

$$\frac{1 - (V_H - V_L)}{2} < z < \frac{1 + (V_H - V_L)}{2} \tag{D.3}$$
Hence, we can write

$$\mathrm{MER} = Q\!\left(\frac{1 - (V_H - V_L)}{2\sigma}\right) - Q\!\left(\frac{1 + (V_H - V_L)}{2\sigma}\right) \cong Q\!\left(\frac{1 - (V_H - V_L)}{2\sigma}\right) \tag{D.4}$$
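For the ideal-EOM case, (D.4) can be tabulated directly. The Python sketch below assumes a noise level σ = 0.1 relative to the unit signal swing used in (D.3), and compares the two-term expression with its one-term approximation as the mask height VH − VL grows.

```python
import math


def q(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mer_ideal(mask_height: float, sigma: float) -> tuple[float, float]:
    """(D.4) for an ideal EOM: (two-term expression, one-term approximation).

    mask_height is V_H - V_L, with the signal swing normalized to 1.
    """
    exact = q((1 - mask_height) / (2 * sigma)) - q((1 + mask_height) / (2 * sigma))
    approx = q((1 - mask_height) / (2 * sigma))
    return exact, approx


sigma = 0.1  # assumed rms input noise, relative to the unit signal swing
for height in (0.0, 0.2, 0.4, 0.6, 0.8):
    exact, approx = mer_ideal(height, sigma)
    print(f"V_H - V_L = {height:.1f}: MER = {exact:.3e} (one-term approx. {approx:.3e})")
```

The one-term approximation is accurate once the mask has a nonzero height; as VH − VL approaches zero, the two Q(·) terms cancel and the exact MER vanishes, while the approximation does not.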
When the impact of the EOM itself is considered, the last two terms in (D.2) are no longer identically zero, because SH and SL are the outputs of two comparators with different noise contributions. However, we can still neglect them in the MER calculation for reasonable comparator noise levels. The first term in (D.2) can be written as

$$\Pr\{S_H = 0, S_L = 1 \mid \mathrm{in} = 1 + z\} = \Pr\{S_H = 0 \mid 1 + z\}\cdot\Pr\{S_L = 1 \mid 1 + z\} \tag{D.5}$$

Because of the bandwidth limitations of the comparators, the probabilities on the right side of (D.5) are functions of the sampling time and are smaller when the sampling time is closer to the data edge. If the response of the comparators to the input is denoted by y(t), we have

$$y(t) = \sum_i A_i(t) \tag{D.6}$$

We define Ai(t) as the response of the comparators to the input in a unit interval, (i − 1)·Tb < t < i·Tb, where Tb is the bit period. Several distinct Ai(t) exist because of the various symbol combinations that cause ISI. The overlap of all the Ai(t)’s, when shifted to 0 < t < Tb, is the eye diagram. If we limit the ISI to the last n symbols, only 2^n distinct Ai(t) can occur for binary modulation. Then, from (D.5), we can write

$$\mathrm{MER} \cong \frac{1}{2^n}\sum_{i=1}^{2^n} Q\!\left(\frac{A_i(t) - (V_H - V_L)}{2\sigma}\right)\cdot\left(1 - Q\!\left(\frac{A_i(t) + (V_H - V_L)}{2\sigma}\right)\right) \tag{D.7}$$

For simplicity we rewrite (D.7) as

$$\mathrm{MER} \cong Q\!\left(\frac{A(t) - (V_H - V_L)}{2\sigma}\right)\cdot\left(1 - Q\!\left(\frac{A(t) + (V_H - V_L)}{2\sigma}\right)\right) \tag{D.8}$$
where the sum is implicit in the notation. Equation (D.7) can be used to generate a 2D map
of the MER.
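Equation (D.7) can be evaluated on a grid of sampling phases t and mask heights VH − VL to produce the 2D MER map. The Python sketch below uses a deliberately simple, hypothetical eye: ISI limited to n = 1 previous symbol, giving 2^n = 2 responses, one fully settled and one with first-order settling after a transition. It reproduces only the qualitative shape of such a map, with the MER falling as the sampling phase moves away from the transition and as the mask shrinks.

```python
import math


def q(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


TAU = 0.25  # hypothetical settling time constant, in bit periods


def a_no_transition(t: float) -> float:
    """Response when the previous symbol equals the current one (fully open)."""
    return 1.0


def a_transition(t: float) -> float:
    """First-order settling after a transition at t = 0 (hypothetical)."""
    return 1.0 - math.exp(-t / TAU)


RESPONSES = (a_no_transition, a_transition)  # n = 1 -> 2**n = 2 responses


def mer_map_point(responses, t: float, mask_height: float, sigma: float) -> float:
    """(D.7): average of the per-response terms over the 2**n responses A_i(t)."""
    total = 0.0
    for a in responses:
        v = a(t)
        total += q((v - mask_height) / (2 * sigma)) * (1 - q((v + mask_height) / (2 * sigma)))
    return total / len(responses)


sigma = 0.1
times = [i / 10 for i in range(1, 10)]  # sampling phase within the bit, 0 < t < Tb
for height in (0.1, 0.3, 0.5, 0.7):
    row = " ".join(f"{mer_map_point(RESPONSES, t, height, sigma):8.1e}" for t in times)
    print(f"V_H - V_L = {height:.1f}: {row}")
```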
Bibliography

[1] http://www.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm.

[2] http://www.internetworldstats.com.

[3] Telcordia (formerly Bellcore) publication, GR-253-CORE, Synchronous Optical Network (SONET) Transport Systems: Common Generic Criteria, Sept. 2000.

[4] American National Standards Institute (ANSI) publication, T1.105-2001, Synchronous Optical Network (SONET)-Basic Description Including Multiplex Structure, Rates, and Formats, 2001.

[5] http://www.ieee802.org/3/.

[6] http://www.intel.com/products/processor/index.htm.

[7] http://www.cisco.com/en/US/netsol/ns340/ns394/ns259/ns261/networking_solutions_white_paper09186a00800c464f.shtml.

[8] http://www.rambus.com/downloads/Networking_Backgrounder.pdf.

[9] V. Stojanovic and M. Horowitz, “Modeling and Analysis of High-Speed Links,” Proceedings of the IEEE Custom Integrated Circuits Conference (CICC'03), pp. 589-594, Sept. 2003.

[10] http://www.corning.com/docs/opticalfiber/CO9562.pdf.

[11] G. E. Moore, “Cramming More Components onto Integrated Circuits,” Electronics Magazine, vol. 38, no. 8, April 1965.

[12] G. E. Moore, “No Exponential Is Forever: But ‘Forever’ Can Be Delayed,” IEEE International Solid-State Circuits Conference Digest of Technical Papers, (ISSCC'03), pp. 20-23, Feb. 2003.

[13] http://www.itrs.net/Common/2004Update/2004Update.htm.

[14] http://www.itrs.net/Common/2004Update/2004_04_Wireless.pdf.
[15] H. Nyquist, “Certain Topics in Telegraph Transmission Theory,” AIEE Transactions, vol. 47, pp. 617-644, April 1928.

[16] R. W. Lucky, J. Salz, and E. J. Weldon, Jr., Principles of Data Communication, first edition, McGraw-Hill, New York, 1968.

[17] J. G. Proakis, Digital Communications, fourth edition, McGraw-Hill, Boston, 2001.

[18] B. Analui, J. Buckwalter, and A. Hajimiri, “Estimating Data-Dependent Jitter of a General LTI System from Step Response,” IEEE MTT-S International Microwave Symposium Digest, (IMS'05), Long Beach, CA, June 2005.

[19] B. Analui, J. Buckwalter, and A. Hajimiri, “Data-Dependent Jitter in Serial Communications,” IEEE Transactions on Microwave Theory & Techniques, in press.

[20] J. Buckwalter, B. Analui, and A. Hajimiri, “Predicting Data-Dependent Jitter,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 51, no. 9, pp. 453-457, Sept. 2004.

[21] J. Buckwalter, B. Analui, and A. Hajimiri, “Data-Dependent Jitter and Crosstalk-Induced Bounded Uncorrelated Jitter in Copper Interconnects,” IEEE MTT-S International Microwave Symposium Digest, (IMS'04), vol. 3, pp. 1627-1630, June 2004.

[22] J. Buckwalter, B. Analui, and A. Hajimiri, “Deterministic Jitter Equalizer,” U.S. and PCT Patents Pending.

[23] B. Analui and A. Hajimiri, “Multi-Pole Bandwidth Enhancement Technique for Trans-Impedance Amplifiers,” Proceedings of the European Solid-State Circuits Conference (ESSCIRC'02)-Italy, pp. 303-306, Sept. 2002.

[24] B. Analui and A. Hajimiri, “Bandwidth Enhancement for Trans-Impedance Amplifiers,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1263-1270, Aug. 2004.

[25] B. Analui and A. Hajimiri, “Method and Apparatus for a Multi-Pole Bandwidth Enhancement Technique for Wideband Amplification,” U.S. Patent #6,778,017.
165

[26] H. Bode, Network Analysis and Feedback Amplifier Design, D. Van Nostrand
company, Princeton, 1945.

[27] R. M. Fano, “Theoretical Limitations on the Broadband Matching of Arbitrary


Impedances,” Journal of Franklin Institute, vol. 249, pp. 57-83, Jan. 1950; pp.
139-154, Feb. 1950.

[28] B. Analui, A. Rylyakov, S. Rylov, M. Meghelli, and A. Hajimiri, “A 10Gb/s Eye


Opening Monitor in 0.13µm CMOS,” IEEE International Solid-State Circuits
Conference Digest of Technical Papers, (ISSCC'05), pp. 332-333, Feb. 2005.

[29] B. Analui, A. Rylyakov, S. Rylov, M. Meghelli, and A. Hajimiri, “A 10Gb/s


Two-Dimensional Eye-Opening Monitor in 0.13µm Standard CMOS,” IEEE
Journal of Solid-State Circuits, in press.

[30] B. Analui and A. Hajimiri, “Instantaneous Clockless Data Recovery and


Demultiplexing,” IEEE Transactions on Circuits and Systems II: Express Briefs, in
press.

[31] B. Analui and A. Hajimiri, “Statistical Analysis of Integrated Passive Delay


Lines,” Proceedings of the IEEE Custom Integrated Circuits Conference
(CICC'03), pp. 107-110, Sept. 2003.

[32] B. Analui and A. Hajimiri, “System and Method for Clockless Data Recovery,”
U.S. and PCT Patents Pending.

[33] C. E. Shannon, “A Mathematical Theory of Communication,” Bell System
Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October, 1948.

[34] R. Farjad-Rad, C.-K. K. Yang, M. Horowitz, and T. Lee, “A 0.4µm CMOS
10-Gb/s 4PAM pre-emphasis serial link transmitter,” IEEE Journal of Solid-State
Circuits, vol. 34, no. 5, pp. 580-585, May 1999.

[35] J. L. Zerbe, et al., “Equalization and Clock Recovery for a 2.5-10Gb/s
2PAM/4PAM Backplane Transceiver Cell,” IEEE Journal of Solid-State Circuits,
vol. 38, no. 12, pp. 2121-2130, Dec. 2003.

[36] E. Säckinger, Broadband Circuits for Optical Fiber Communication,
Wiley-Interscience, 2005.

[37] L. M. DeVito, “A Versatile Clock Recovery Architecture and Monolithic
Implementation,” in Monolithic Phase-Locked Loops and Clock Recovery
Circuits: Theory and Design, B. Razavi, Editor, New York, IEEE Press, 1996, pp.
405-420.

[38] A. Widmer and P. Franaszek, “A DC-Balanced, Partitioned-Block, 8B/10B
Transmission Code,” IBM Journal of Research & Development, vol. 27, no. 5, pp.
440-451, Sept. 1983.

[39] B. Razavi, Design of Integrated Circuits for Optical Communications,
McGraw-Hill, 2003.

[40] A. B. Carlson, Communication Systems, third edition, McGraw-Hill, 1986.

[41] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers:
Synchronization, Channel Estimation, and Signal Processing, Wiley-Interscience,
1998.

[42] J.-M. Patenaude, “High-Speed Backplanes Pose New Challenges to Comms
Designers,” http://www.commsdesign.com, Jan. 07, 2004.

[43] G. P. Agrawal, Fiber-Optic Communication Systems, second edition,
Wiley-Interscience, 1997.

[44] E. A. Lee and D. G. Messerschmitt, Digital Communication, second edition,
Kluwer Academic Publishers, 1994.

[45] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing,
Prentice-Hall International, Inc., 1989.

[46] J. Buckwalter and A. Hajimiri, “An active analog delay and the delay reference
loop,” IEEE Radio Frequency Integrated Circuits (RFIC) Symposium Digest of
Papers, pp. 17-20, June 2004.

[47] C. Pelard, et al., “Realization of multigigabit channel equalization and crosstalk
cancellation integrated circuits,” IEEE Journal of Solid-State Circuits, vol. 39, no.
10, pp. 1659-1670, Oct. 2004.

[48] H. Wu, J. A. Tierno, P. Pepeljugoski, J. Schaub, S. Gowda, J. A. Kash, and A.
Hajimiri, “Integrated Transversal Equalizers in High-Speed Fiber Optic Systems,”
IEEE Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2131-2137, Dec. 2003.

[49] S. Reynolds, P. Pepeljugoski, J. Schaub, J. Tierno, and D. Beisser, “A 7-tap
Transverse Analog FIR Filter in 0.13µm CMOS for Equalization of 10Gb/s
Fiber-Optic Data Systems,” IEEE International Solid-State Circuits Conference
Digest of Technical Papers, (ISSCC'05), pp. 330-331, Feb. 2005.

[50] D.M. Pozar, Microwave Engineering, second edition, John Wiley & Sons, New
York, 1998.

[51] R. W. Lucky, “Automatic Equalization for Digital Communication,” Bell System
Technical Journal, vol. 44, pp. 547-588, April 1965.

[52] S. Gondi and B. Razavi, “A 10Gb/s CMOS Adaptive Equalizer for Backplane
Applications,” IEEE International Solid-State Circuits Conference Digest of
Technical Papers, (ISSCC'05), pp. 328-329, Feb. 2005.

[53] B. R. Saltzberg, “Intersymbol Interference Error Bounds with Application to Ideal
Bandlimited Signaling,” IEEE Transactions on Information Theory, vol. IT-14, no.
4, pp. 563-568, July 1968.

[54] H. C. van den Elzen, “On the Theory and the Calculation of Worst-Case Eye
Openings in Data-Transmission Systems,” Philips Research Reports, vol. 30, no.
6, pp. 385-435, Dec. 1975.

[55] C. W. Helstrom, “Calculating Error Probabilities for Intersymbol and Cochannel
Interference,” IEEE Transactions on Communications, vol. COM-34, no. 5, pp.
430-435, May 1986.

[56] N. C. Beaulieu, “The Evaluation of Error Probabilities for Intersymbol and
Cochannel Interference,” IEEE Transactions on Communications, vol. 39, no. 12,
pp. 1740-1749, Dec. 1991.

[57] N. C. Beaulieu and A. A. Abu-Dayya, “The Evaluation of Error Probabilities for
Low-Frequency Attenuation Channels,” IEEE Transactions on Communications,
vol. 42, no. 9, pp. 2676-2683, Sept. 1994.

[58] B. Razavi, Editor, Monolithic Phase-Locked Loops and Clock Recovery Circuits:
Theory and Design, IEEE Press, New York, 1996.

[59] Y. Takasaki, Digital Transmission Design and Jitter Analysis, Artech House,
Boston, 1991.

[60] J. Savoj, A 10Gb/s CMOS Clock and Data Recovery Circuit, Ph.D. Dissertation,
University of California, Los Angeles, CA, 2001.

[61] International Committee for Information Technology Standardization (INCITS),
Fibre Channel–Methodologies for Jitter and Signal Quality Specification–MJSQ,
Technical Report REV 10.0, March 10, 2003.

[62] A. Leon-Garcia, Probability and Random Processes for Electrical Engineering,
second edition, Addison Wesley, 1994.

[63] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and Phase Noise in Ring
Oscillators,” IEEE Journal of Solid-State Circuits, vol. 34, no. 6, pp. 790-804, June
1999.

[64] E. D. Sunde, “Self-Timing Regenerative Repeaters,” The Bell System Technical
Journal, vol. 36, no. 7, pp. 891-937, July 1957.

[65] C. J. Byrne, B. J. Karafin, and D. B. Robinson, Jr., “Systematic Jitter in a Chain of
Digital Regenerators,” The Bell System Technical Journal, vol. 42, no. 11, pp.
2679-2714, Nov. 1963.

[66] B. R. Saltzberg, “Timing Recovery for Synchronous Binary Data Transmission,”
The Bell System Technical Journal, vol. 46, no. 3, pp. 593-622, March 1967.

[67] F. M. Gardner, “Self-Noise in Synchronizers,” IEEE Transactions on
Communications, vol. COM-28, no. 8, pp. 1159-1163, Aug. 1980.

[68] J. C. Y. Huang, K. Feher, and M. Gendron, “Techniques to Generate ISI and
Jitter-Free Bandlimited Nyquist Signals and a Method to Analyze Jitter Effects,”
IEEE Transactions on Communications, vol. COM-27, no. 11, pp. 1700-1711,
Nov. 1979.

[69] J. W. M. Bergmans, “Adaptive Characterization of Write-Precompensation
Circuits,” IEEE Transactions on Magnetics, vol. 39, no. 4, pp. 2109-2114, July
2003.

[70] M. Shimanouchi, “New Paradigm for Signal Paths in ATE Pin Electronics Are
Needed for Serialcom Device Testing,” Proceedings of the IEEE International Test
Conference, ITC’02, pp. 903-912, Oct. 2002.

[71] M. Shimanouchi, “An Approach to Consistent Jitter Modeling for Various Aspects
and Measurement Methods,” Proceedings of the IEEE International Test
Conference, ITC’01, pp. 848-857, Oct.-Nov. 2001.

[72] M. P. Li, J. Wilstrup, R. Jessen, and D. Petrich, “A New Method for Jitter
Decomposition through Its Distribution Tail Fitting,” Proceedings of the IEEE
International Test Conference, ITC’99, pp. 788-794, Sept. 1999.

[73] Y. Cai, S. A. Werner, G. J. Zhang, M. J. Olsen, R. D. Brink, “Jitter Testing for
Multi-Gigabit Backplane SerDes-Techniques to Decompose and Combine Various
Types of Jitter,” Proceedings of the IEEE International Test Conference, ITC’02,
pp. 700-709, Oct. 2002.

[74] J. Wilstrup, “A Method of Serial Data Jitter Analysis Using One-Shot Time
Interval Measurements,” Proceedings of the IEEE International Test Conference,
ITC’98, pp. 819-823, Oct. 1998.

[75] Wavecrest Technologies, Jitter Fundamentals, SMPB-00019 Rev. 1.
http://www.wavecrestcorp.com/technical/pdf/jittfun_hires_sngls.pdf.

[76] R. Bellman, Perturbation Techniques in Mathematics, Physics, and Engineering,
Holt Rinehart and Winston, Inc., New York, 1964.

[77] W. R. Bennett, “Statistics of regenerative digital transmission,” The Bell System
Technical Journal, vol. 37, pp. 1501-1542, Nov. 1958.

[78] G. L. Cariolaro and F. Todero, “A general spectral analysis of time jitter produced
in a regenerative repeater,” IEEE Transactions on Communications, vol. COM-25,
no. 4, pp. 417-426, April 1977.

[79] R. A. Gibby and J. W. Smith, “Some extensions of Nyquist's telegraph
transmission theory,” The Bell System Technical Journal, vol. 44, pp. 1487-1510,
Sept. 1965.

[80] J. Buckwalter and A. Hajimiri, “A 10Gb/s Data-Dependent Jitter Equalizer,”
Proceedings of the IEEE Custom Integrated Circuits Conference, (CICC'04), pp.
39-42, Oct. 2004.

[81] H. E. Ives, et al., “Electrooptical Transmission,” US Patent 2,058,883, Oct. 27,
1936.

[82] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, Cambridge
University Press, 1998.

[83] S. S. Mohan, M. D. M. Hershenson, S. P. Boyd, and T. H. Lee, “Bandwidth
Extension in CMOS with Optimized On-Chip Inductors,” IEEE Journal of
Solid-State Circuits, vol. 35, pp. 346-355, March 2000.

[84] M. Neuhauser, H-M. Rein, and H. Wernz, “Low-Noise High-Gain Si-Bipolar
Preamplifiers for 10 Gb/s Optical-Fiber Links: Design and Realization,” IEEE
Journal of Solid-State Circuits, vol. 31, pp. 24-29, Jan. 1996.

[85] M. Neuhauser, H-M. Rein, H. Wernz, and A. Felder, “13 Gb/s Si Bipolar
Preamplifier for Optical Front Ends,” Electronics Letters, vol. 29, No. 5, pp.
492-493, March 1993.

[86] F. Chien and Y. Chan, “Bandwidth enhancement of trans-impedance amplifier by a
capacitive-peaking design,” IEEE Journal of Solid-State Circuits, vol. 34, pp.
1167-1170, Aug. 1999.

[87] E. Ginzton, et al., “Distributed Amplification,” Proceedings of the IRE, pp.
956-969, Aug. 1948.

[88] H. T. Ahn and D. J. Allstot, “A 0.5-8.5GHz Fully Differential CMOS Distributed
Amplifier,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 985-993, Aug. 2002.

[89] H. A. Wheeler, “Wide-Band Amplifiers for Television,” Proceedings of the IRE,
pp. 429-438, July 1939.

[90] W. Hansen, “On maximum gain-bandwidth product in amplifiers,” Journal of
Applied Physics, vol. 16, pp. 528-534, 1945.

[91] D. C. Youla, “A new theory of broadband matching,” IEEE Transactions on
Circuit Theory, vol. CT-11, pp. 30-50, Mar. 1964.

[92] W. Ku and W. Petersen, “Optimum gain-bandwidth limitations of transistor
amplifiers as reactively constrained active two-port networks,” IEEE Transactions
on Circuits and Systems, vol. CAS-22, pp. 523-533, June 1975.

[93] H. H. Kim, S. Chandrasekhar, C. A. Burrus, Jr., and J. Bauman, “A Si BiCMOS
Trans-impedance Amplifier for 10Gb/s SONET Receiver,” IEEE Journal of
Solid-State Circuits, vol. 36, pp. 769-776, May 2001.

[94] J. P. Rooney, R. Parry, I. Hunter, R. D. Pollard, “A filter synthesis technique
applied to the design of multistage broadband microwave amplifiers,” IEEE
MTT-S International Microwave Symposium Digest, vol. 3, pp. 1915-1918, 2002.

[95] T. P. Budka, “Wide-bandwidth millimeter-wave bond-wire interconnects,” IEEE
Transactions on Microwave Theory and Techniques, vol. 49, part 1, pp. 715-718,
April 2001.

[96] T. Wong, Fundamentals of Distributed Amplification, first edition, Artech House,
Boston, 1993.

[97] A. I. Zverev, Handbook of Filter Synthesis, John Wiley & Sons, 1967.

[98] W. Chen, Theory and Design of Broadband Matching Networks, Pergamon Press,
Oxford, 1976.

[99] H. J. Orchard, “Inductorless Filters,” Electronics Letters, vol. 2, pp. 224-225, Sept.
1966.

[100] M. E. Van Valkenburg, Analog Filter Design, HRW Inc., 1982.

[101] A. Abidi, “Gigahertz Transresistance Amplifiers in Fine Line NMOS,” IEEE
Journal of Solid-State Circuits, vol. 19, no. 6, pp. 986-994, Dec. 1984.

[102] ASITIC (Simulation of Spiral Inductors and Transformers),
http://formosa.eecs.berkeley.edu/~niknejad/asitic.html.

[103] SONNET Software, High frequency electromagnetic software [Online]. Available:
http://www.sonnetusa.com/.

[104] H. Wu, J. A. Tierno, P. Pepeljugoski, J. Schaub, S. Gowda, J. A. Kash, and A.
Hajimiri, “Differential 4-tap and 7-tap Transverse Filters in SiGe for 10Gb/s
Multimode Fiber Optic Link Equalization,” IEEE International Solid-State
Circuits Conference Digest of Technical Papers, (ISSCC'03), pp. 180-181, Feb.
2003.

[105] E. A. Newcombe and S. Pasupathy, “Error Rate Monitoring for Digital
Communications,” Proceedings of the IEEE, vol. 70, no. 8, pp. 805-828, Aug.
1982.

[106] J. B. Scholz, “Error Performance Monitoring for Digital Communications
Systems,” Australian Telecommunication Research, vol. 25, no. 2, pp. 1-25, 1991.

[107] T. J. Nohara, A. Premji, and W. R. Seed, “A new Signal Quality Degradation
Monitor for Digital Transmission Channels,” IEEE Transactions on
Communications, vol. 43, no. 2-4, pp. 1333-1336, Feb./March/April 1995.

[108] D. Kilper, R. Bach, D. Blumenthal, D. Einstein, T. Landolsi, L. Ostar, M. Preiss,
and A. Willner, “Optical Performance Monitoring,” IEEE Journal of Lightwave
Technology, vol. 22, no. 1, pp. 294-304, Jan. 2004.

[109] R. A. George, “Method and Means for Detecting Error Rate of Transmitted Data,”
US Patent #3,721,959, March 20, 1973.

[110] C. R. Hogge, “Performance Monitoring of a Digital Radio by Pseudo-Error
Detection,” IEEE National Telecommunications Conference, pp. 43.3/1-3, Dec.
1977.

[111] J. M. Keelty and K. Feher, “On-Line Pseudo Error Monitors for Digital
Transmission Systems,” IEEE Transactions on Communications, vol. COM-26,
no. 8, pp. 1275-1282, Aug. 1978.

[112] S. Shin, B.-G. Ahn, M. Chung, S. Cho, D. Kim, and Y. Park, “Optics Layer
Protection of Gigabit-Ethernet System by Monitoring Optical Signal Quality,”
Electronics Letters, vol. 38, no. 9, pp. 1118-1119, Sept. 2002.

[113] S. G. Harman, “Digital Signal Performance Monitor,” US Patent #4,097,697, June
27, 1978.

[114] Y. Tremblay and D. J. Nicholson, “Binary Data Regenerator with Adaptive
Threshold Level,” US Patent #4,823,360, April 18, 1989.

[115] M. Kawai, H. Watanabe, T. Ohtsuka, and K. Yamaguchi, “Smart Optical Receiver
with Automatic Decision Threshold Setting and Retiming Phase Alignment,”
IEEE Journal of Lightwave Technology, vol. 7, no. 11, pp. 1634-1640, Nov. 1989.

[116] P. J. Anslow, R. A. Habel, and A. G. Solheim, “Eye Quality Monitor for a 2R
Regenerator,” US Patent #6,433,899 B1, Aug. 13, 2002.

[117] K. Y. Maxham, C. R. Hogge, Jr., S. J. Clendening, C.-T. Chen, J. M. Dugan, S. K.
Sheem, and D. O. Offutt, “Rockwell 135-Mbit/s Lightwave System,” IEEE
Journal of Lightwave Technology, vol. LT-2, no. 4, pp. 394-402, Aug. 1984.

[118] T. Ellermeyer, U. Langmann, B. Wedding, and W. Pohlmann, “A 10Gb/s Eye
Opening Monitor IC for Decision-Guided Optimization of the Frequency
Response of an Optical Receiver,” IEEE International Solid-State Circuits
Conference Digest of Technical Papers, (ISSCC'00), pp. 50-51, Feb. 2000.

[119] T. Ellermeyer, U. Langmann, B. Wedding, and W. Pohlmann, “A 10Gb/s
Eye-Opening Monitor IC for Decision-Guided Adaptation of the Frequency
Response of an Optical Receiver,” IEEE Journal of Solid-State Circuits, vol. 35,
no. 12, pp. 1958-1963, Dec. 2000.

[120] F. Buchali, S. Lanne, J.-P. Thiery, W. Baumert, and H. Bulow, “Fast Eye Monitor
for 10Gbits/s and its Application for Optical PMD Compensation,” Optical Fiber
Communication Conference and Exhibit, (OFC’01), vol. 2, pp. TuP5/1-3, 2001.

[121] F. Buchali, W. Baumert, H. Bulow, U. Feiste, R. Ludwig, and H. G. Weber, “Eye
monitoring in a 160 Gbit/s RZ field transmission system,” 27th European
Conference on Optical Communication, (ECOC'01), vol. 3, pp. 288-289,
Sept.-Oct. 2001.

[122] F. Buchali, W. Baumert, H. Bulow, and J. Poirrier, “A 40 Gb/s Eye Monitor and its
Application to Adaptive PMD compensation,” Optical Fiber Communication
Conference and Exhibit, (OFC’02), pp. 202-203, March 2002.

[123] F. Buchali, W. Baumert, and H. Bulow, “Adaptive 1 and 2 stage
PMD-Compensators for 40 Gbit/s Transmission Using Eye Monitor Feedback,”
Optical Fiber Communications Conference, (OFC’03), vol. 1, pp. 262-264, March
2003.

[124] G. Gehler, R. Wessel, F. Buchali, G. Thielecke, A. Heid, and H. Bulow, “Dynamic
Adaptation of a PLC Residual Chromatic Dispersion Compensator at 40Gb/s,”
Optical Fiber Communications Conference, (OFC’03), vol. 2, pp. 750-751, March
2003.

[125] K. Azadet, E. F. Haratsch, H. Kim, F. Saibi, J. H. Saunders, M. Shaffer, L. Song,
and M.-L. Yu, “Equalization and FEC Techniques for Optical Transceivers,” IEEE
Journal of Solid-State Circuits, vol. 37, no. 3, pp. 317-327, March 2002.

[126] T. Miki, H. Kouno, T. Kumamoto, Y. Kinoshita, T. Igarashi, and K. Okada, “A
10-b 50-MS/s 500-mW A/D converter using a differential-voltage subconverter,”
IEEE Journal of Solid-State Circuits, vol. 29, no. 4, pp. 516-521, April 1994.

[127] M. Zargari, A BiCMOS Active Substrate Probe Card Technology for Digital
Testing, Ph.D. Dissertation, Stanford University, Stanford, CA, March 1997.

[128] M. Banu and A. E. Dunlop, “Clock recovery circuits with instantaneous locking,”
Electronics Letters, vol. 28, no. 23, pp. 2127-2130, 5 Nov. 1992.

[129] M. Banu and A. E. Dunlop, “660Mb/s CMOS Clock Recovery Circuit with
Instantaneous Locking for NRZ Data and Burst-Mode Transmission,” IEEE
International Solid-State Circuits Conference Digest of Technical Papers,
(ISSCC’93), vol. 36, pp. 102-103, Feb. 1993.

[130] A. E. Dunlop, W. C. Fischer, M. Banu, and T. Gabara, “150/30 Mb/s CMOS
Non-Oversampled Clock and Data Recovery Circuits with Instantaneous Locking
and Jitter Rejection,” IEEE International Solid-State Circuits Conference
Digest of Technical Papers, (ISSCC’95), vol. 38, pp. 44-45, Feb. 1995.

[131] M. Nakamura, N. Ishihara, and Y. Akazawa, “A 156 Mbps CMOS clock recovery
circuit for burst-mode transmission,” IEEE Symposium on VLSI Circuits Digest of
Technical Papers, pp. 122-123, June 1996.

[132] S. Kobayashi and M. Hashimoto, “A Multibitrate Burst-Mode CDR Circuit with
Bit-Rate Discrimination Function from 52 to 1244 Mb/s,” IEEE Photonics
Technology Letters, vol. 13, no. 11, pp. 1221-1223, Nov. 2001.

[133] J. F. Wakerly, Digital Design: Principles and Practices, second edition, Prentice
Hall, 1994.

[134] R. Aparicio and A. Hajimiri, “Capacity Limits and Matching Properties of
Integrated Capacitors,” IEEE Journal of Solid-State Circuits, vol. 37, no. 3, pp.
384-393, March 2002.
