Dissertation
An Ultra-Low-Power ADPLL for BLE
Applications
Analysis, Design and Implementation
Graduation Thesis
Lianbo Wu
Academia Supervisor:
Industry Supervisor:
Dr. Xin He
Committee:
Prof. Dr. R. B. Staszewski
Prof. Dr. Michiel Pertijs
Prof. Dr. Nick van der Meijs
Dr. Xin He
Leonardo da Vinci
The beginning of all things: the great Way is utterly simple, yet it evolves into the utmost complexity.
Laozi
Abstract
In recent years, wireless personal area network (WPAN) applications have created a demand for low-cost, low-power PLLs that still deliver good performance. All-digital phase-locked loops (ADPLLs) are preferred over their analog counterparts in nanoscale CMOS technology because of their flexibility, configurability, small area and easy portability. However, fractional spurs and insufficiently low power dissipation remain the main problems of conventional TDC-based structures. In this work, a sub-0.5 mW, 2.2 GHz to 3 GHz fractional-N ADPLL is presented for Bluetooth Low Energy (BLE) applications. A coarse-fine DTC-based phase predictor with dynamic element matching (DEM) and a clock-gated phase error freezer are proposed to reduce the power while maintaining good phase noise and fractional spur performance. The prototype ADPLL was taped out on Sep. 11th, 2014 in GlobalFoundries 40 nm Low Power (40 nm-LP) technology. Based on post-layout simulations and modelling, it is expected to consume less than 450 𝜇W with an integrated rms jitter of 1.5 ps for the close-integer channel and 800 fs for the remaining channels, leading to a potential state-of-the-art FoM below -240 dB. The design of the full ADPLL is presented in this thesis in terms of system-level analysis, digital logic, mixed-signal and RF design.
Acknowledgement
I am deeply grateful to all the people who have helped me, in one way or another, during my MSc project. Without their support it would have been impossible for me to reach this stage. First and foremost, I would like to express my sincere thanks to my MSc supervisor, Dr. Robert Bogdan Staszewski. It has been a real honor to work under his supervision in the field of ADPLL design. I thank him for his guidance in my MSc project work and in other matters. I have learned a lot from his incomparable expertise in this field, his strict design requirements and his passion for the work. Once again, thank you, my supervisor, for your time, attention and patience with me throughout the year. Special thanks go to my colleagues at NXP. I want to thank Dr. Xin He, my industry supervisor, for his help, care, discussions and support during my internship. He has given me so much. I also want to thank Theo Thurlings for the time he spent with me on the digital back-end flow. My thanks also go to Tarik Saric for the discussions on DCO design; to Nenad Pavlovic for the discussions on system-level analysis; and to Vladislav Dyachenko, Robert Rutten, Salvatore Drago and the other members of the design team for the discussions on circuit design, layout and chip finishing.
I would like to express my gratitude to the other members of my MSc defense committee, Dr. Michiel Pertijs and Dr. Nick van der Meijs, for their invaluable time, not only in attending my defence but also in helping me to improve this thesis. I would also like to thank my friends here: the PhD students in the Electronics Research Laboratory (ELCA), especially Gerasimos Vlachogiannakis, Ying Wu and Zhirui Zong, for the technical discussions and their help on other issues; Qilong Liu for the wonderful cooperation on the courses; and all my classmates for the help and the fun during the two-year study in the microelectronics track at TU Delft. Last but not least, I would like to thank my parents for all that they have done for me. Only with their support could I find the courage to study and succeed at the other end of Eurasia. Special thanks also go to my love, Tianzi Wang, for her support and company since our bachelor studies at the University of Science and Technology of China. It was truly the right decision for us to choose Delft and stay together two years ago, after our bachelor studies. In a word, I owe my accomplishments to my family.
Lianbo Wu
Delft, 2014
Glossary
List of Acronyms
ADPLL all-digital phase-locked loop
FoM figure-of-merit
MoM Metal-oxide-Metal
TX transmitter
Contents
1 Introduction
1.1 Introduction
1.2 Frequency Synthesizer for Telecommunication Systems and Modulation
1.3 Common Metrics for Frequency Synthesizer
1.4 PLL based Frequency synthesizer
1.4.1 Charge-pump PLL
1.5 Motivation
1.6 Research Contribution
1.7 Outline of the Thesis
References
6 Conclusion
6.1 Conclusion for This Work
6.2 Future Work
A Appendix
A.1 Capacitor Bank simulation
A.2 SPI register table
A.3 Derivation for Mismatch for DTC unit cell design
1 Introduction
Yes, we have to divide up our time like that, between our politics and our
equations. But to me our equations are far more important, for politics are
only a matter of present concern. A mathematical equation stands forever.
Albert Einstein
1.1. Introduction
What a watch is to a human being, a frequency synthesizer is to an IC. The frequency synthesizer is so important and indispensable that it can be found everywhere in modern IC products. From telecommunication systems to digital circuit applications, from clock and data recovery to modulation and waveform generation, frequency synthesizers appear in almost every integrated circuit, either wireless or wireline. Meanwhile, wireless communication has grown remarkably ever since 1901, when Marconi succeeded in his radio experiment, and nowadays people enjoy anytime, anyplace communication thanks to emerging IC products. In this emerging IC market, many untapped opportunities exist in the realm of short-range, low-cost wireless networks. Applications in wireless personal area networks (WPAN) and wireless sensor networks (WSN) are unexplored territories where a new era of wireless could be triggered. Power budget analysis of such sensor nodes reveals that the wireless transceiver dominates the overall power. This, together with the need for high-volume, low-cost, highly integrated solutions, warrants ultra-low-power RF transceivers in nanometer-scale digital CMOS technologies. This thesis is an effort in that direction and explores the implementation of an ultra-low-power all-digital phase-locked loop (ADPLL) for frequency synthesis in WPAN transceivers, with the main focus on Bluetooth Low Energy (BLE) applications.
[Figure: transmitter block diagram with a frequency synthesizer (FM path, TX data) driving a PA (AM path).]
where A represents the amplitude of the synthesized signal and 𝜔 is its frequency.
𝜓 captures the phase fluctuation, which can be expressed as
The first term represents the periodic modulation and appears as a spurious tone
at the output. The second term in the equation is the random fluctuation, which
shows up as a noise skirt around the carrier frequency.
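For orientation, one common way of writing such a signal model is sketched below. This is an assumed textbook form, not necessarily the exact expressions used in this thesis; the symbols $\Delta\phi$, $\omega_m$ and $\phi_n(t)$ are introduced only for this illustration.

```latex
V(t) = A \cos\bigl(\omega t + \psi(t)\bigr), \qquad
\psi(t) = \Delta\phi \, \sin(\omega_m t) + \phi_n(t)
```

Under this assumption, the deterministic term $\Delta\phi\,\sin(\omega_m t)$ produces spurious tones at offsets of $\pm\omega_m$ from the carrier, while the random term $\phi_n(t)$ produces the noise skirt.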
[Figure: basic PLL block diagram with input signal, phase detector, loop filter and oscillator.]
ratio. The output then tracks the input signal. The history of the PLL dates back to the early 1930s, when British researchers developed the zero-IF receiver as an alternative to the famous superheterodyne receiver proposed by Armstrong in 1918. The problem was the rapid frequency drift of the local oscillator output. The French engineer Henri de Bellescize published a paper [1] on the idea of keeping the oscillator phase at a desired value by adding a feedback correction. This is recognized as the first PLL ever published. In the 1960s, boosted by strong demand from consumer products such as analog TV and by the explosion of the IC industry, PLL theory and design matured, and numerous analog as well as digital PLL structures have been explored and published since then. Nevertheless, the all-digital PLL did not come to life until early this century. In 2004, a novel all-digital PLL system was published by TI [2]; it triggered a new trend in PLL design and serves as the starting point of this thesis.
[Figure: charge-pump PLL block diagram with PFD (inputs FREF and FDIV, outputs Up and Down), charge pump, loop filter, VCO (output FVCO) and frequency divider.]
is reduced in advanced technology nodes because of the lower supply voltage, which is bad news for the CP-PLL, since it basically operates in the voltage domain. Besides, passives do not scale well with technology, and the loop filter used in a CP-PLL occupies a large area, increasing the overall system cost. In addition, the reduced gate-oxide thickness in nanoscale CMOS technologies results in significant current leakage through the integration capacitor of the loop filter. This leakage current increases the total PLL jitter and thereby degrades the performance.
1.5. Motivation
As analog frequency synthesizer design runs into problems in advanced CMOS processes, methods to integrate the RF front-end with the baseband processor are highly desired. Compared with conventional analog design, a digitally intensive approach provides further benefits such as better testability, more reconfigurability, smaller area and a higher degree of integration. Besides, digital calibration, such as DCO gain calibration or fractional spur reduction, also becomes easier.
The emergence of the ADPLL meets this expectation at the right moment and also paves the way to digitally assisted RF, e.g. an all-digital transmitter with two-point modulation. Compared with its analog counterpart, it is not only more economical but also friendlier to technology scaling. Its performance has already been proven in standards ranging from Bluetooth to GSM [3], showing that it can replace the CP-PLL. What is more, settling is generally faster owing to the digital algorithms on which it is based. Numerous publications and research efforts have aimed to optimize the performance in terms of power, phase noise and spur level. However, it is still rare to see a DPLL design realized with sub-mW power consumption. As shown in Figure 1.4, the only design to break the sub-mW barrier among the state of the art of the past decade [2] [4] [5] [6] [7] [8] [9] is the one proposed by IMEC Holst Centre [5]. Nevertheless, in terms of phase noise and power there is still considerable room to explore. This is the starting point as well as the motivation of this thesis work, which aims to reduce the worst fractional spur while optimizing phase noise and power. The target specifications are analysed in detail in Chapter 2.
[Figure 1.4: Integrated jitter (ps^2) versus power (mW), both on logarithmic axes, for state-of-the-art designs: ISSCC'12 CP [4], ISSCC'14 [5], ISSCC'13 [6], ISSCC'04 [2], ISSCC'11 [8], ISSCC'11 bang-bang [7], JSSCC'04 [9].]
References
[1] H. de Bellescize, La reception synchrone, L’Onde Electrique (1932).
[5] V. Chillara, Y.-H. Liu, B. Wang, A. Ba, M. Vidojkovic, K. Philips, H. de Groot, and R. Staszewski, 9.8 An 860 𝜇W 2.1-to-2.7 GHz all-digital PLL-based frequency modulator with a DTC-assisted snapshot TDC for WPAN (Bluetooth Smart and ZigBee) applications, in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International (2014) pp. 172–173.
[6] J.-W. Lai, C.-H. Wang, K. Kao, A. Lin, Y.-H. Cho, L. Cho, M.-H. Hung, X.-Y. Shih, C.-M. Lin, S.-H. Yan, Y.-H. Chung, P. Liang, G.-K. Dehng, H.-S. Li, G. Chien, and R. Staszewski, A 0.27mm2 13.5dBm 2.4GHz all-digital polar transmitter using 34DPA in 40nm CMOS, in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International (2013) pp. 342–343.
[10] X. Gao, E. A. Klumperink, M. Bohsali, and B. Nauta, A 2.2 GHz 7.6 mW sub-sampling PLL with -126 dBc/Hz in-band phase noise and 0.15 ps rms jitter in 0.18 𝜇m CMOS, (2009).
2 ADPLL System: Background, Analysis and Proposed Architecture
The all-digital PLL (ADPLL) structure has demonstrated its potential for ultra-low-power applications in nanoscale CMOS technology in recent publications [1], while its conventional analog counterpart finds it increasingly difficult to scale with CMOS technology. As one of the most exciting inventions in the RF field, the counter-based ADPLL not only addresses issues faced by the charge-pump PLL, such as reference spurs, but also offers more reconfigurability and flexibility to meet the stringent requirements set by modern wireless standards.
[Figure: ADPLL architectures. Divider-based: FREF, TDC, loop filter, DAC with ΣΔ, VCO and ÷N divider controlled by FCW. Counter-based: FCW, Σ, loop filter, DCO (CKV), TDC, sampler and retiming (CKR).]
difference by the TDC. Thus the phase detector quantizes both the integer and the fractional phase error in the same block. In addition, the range that the TDC must cover is prohibitively large, since the whole reference period has to be covered, which is a real disadvantage of this structure. Otherwise, a frequency detection function has to be added. The positive aspect of this structure is that the TDC operates only at the reference rate.
2. Counter-based architecture. This scheme operates purely in the phase domain. A counter and a reference phase accumulator are introduced to quantize the integer part of the phase error, while the TDC carries the burden of quantizing the fractional phase error. Thus the required operating range of the TDC is only one output (DCO) period, and the counter can be turned off in the locked state, since the integer phase error is supposed to be zero under that circumstance. Nevertheless, without special clock gating techniques, the TDC has to operate at the DCO output frequency, which is usually rather high.
Table 2.1 summarizes the comparison in terms of power, complexity and flexibility. (This comparison is based on typical designs. Special techniques such as clock gating may change the results.)
Based on the discussion above, the counter-based architecture is chosen for this project because of its lower complexity and greater flexibility.
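[Figure: loop filter structures. Type-I with proportional gain α; type-II with proportional gain α and an integral path ρ·fR/s; optional IIR stages.]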
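The block LF(s) shown in Figure 2.3 can be of three different types: type-I, with only a proportional gain 𝛼; type-II, with both proportional (𝛼) and integral (𝜌) paths; or of higher order, with the IIR filter turned on. These LF parameters are programmable and can be dynamically configured during normal PLL operation. For the following frequency response analysis, type-II operation is assumed because of its generality (type-I can be considered as type-II with 𝜌 = 0).
For convenience, we start by opening the loop at $\phi_E$ to calculate the open-loop transfer function:

$$H_{ol}(s) = \frac{\phi_V}{N \cdot \phi_E} = \left(\alpha + \frac{\rho f_R}{s}\right) \cdot \frac{f_R}{s} \cdot \frac{K_{DCO}}{\hat{K}_{DCO}} \cdot \frac{1}{M} \tag{2.2}$$

With $K_{DCO}/\hat{K}_{DCO} = 1$, the closed-loop transfer function is

$$H_{cl}(s) = \frac{\phi_V}{\phi_R} = \frac{N \cdot M \cdot H_{ol}(s)}{1 + H_{ol}(s)} = N \cdot \frac{\alpha f_R s + \rho f_R^2}{s^2 + \frac{\alpha f_R}{M} s + \frac{\rho}{M} f_R^2} \tag{2.4}$$

Apart from a gain factor of $M$ in the numerator, this has the classical two-pole form

$$H(s) = N \, \frac{2\xi\omega_n s + \omega_n^2}{s^2 + 2\xi\omega_n s + \omega_n^2} \tag{2.5}$$

where

$$\omega_n = \sqrt{\frac{\rho}{M}} \cdot f_R \tag{2.6}$$

and

$$\xi = \frac{1}{2} \cdot \frac{\alpha}{\sqrt{M\rho}} \tag{2.7}$$

For a type-I loop, the closed-loop transfer function simplifies to

$$H_{cl}(s) = N \cdot \frac{\alpha f_R}{s + \frac{\alpha f_R}{M}} \tag{2.8}$$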
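As a quick numerical illustration of Eqs. 2.6 to 2.8, the sketch below evaluates the natural frequency, the damping factor and the type-I bandwidth for a given set of loop-filter coefficients. The values of `ALPHA` and `RHO` are hypothetical and chosen only for illustration; the 32 MHz reference is the value quoted later in this chapter, and the function names are not taken from the design.

```python
import math


def type2_loop_params(alpha, rho, f_ref, m_div=1):
    """Return (omega_n in rad/s, xi) following Eqs. 2.6 and 2.7."""
    omega_n = math.sqrt(rho / m_div) * f_ref
    xi = 0.5 * alpha / math.sqrt(m_div * rho)
    return omega_n, xi


def type1_bandwidth_hz(alpha, f_ref, m_div=1):
    """3-dB bandwidth of the type-I loop: the pole of Eq. 2.8 divided by 2*pi."""
    return alpha * f_ref / (2 * math.pi * m_div)


# Hypothetical coefficients (powers of two, as is common in digital loop filters).
ALPHA = 2 ** -5
RHO = 2 ** -12
F_REF = 32e6  # reference frequency in Hz

omega_n, xi = type2_loop_params(ALPHA, RHO, F_REF)
bw_type1 = type1_bandwidth_hz(ALPHA, F_REF)
print(f"omega_n = {omega_n:.0f} rad/s, xi = {xi:.2f}, "
      f"type-I bandwidth = {bw_type1 / 1e3:.1f} kHz")
```

With these assumed coefficients the loop is critically damped, which shows how the ratio of 𝛼 to the square root of 𝜌 sets the damping independently of the reference frequency.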
Four independently controlled IIR stages are also implemented as an extra option to shape the phase noise and to attenuate the reference noise and the TDC quantization noise with a slope of -80 dB/decade. This strong filtering also helps to attenuate the close-in fractional spurs. Each IIR stage has an attenuation factor $\lambda_i$, and the open-loop transfer function becomes

$$H_{ol,IIR}(s) = \left(\alpha + \frac{\rho f_R}{s}\right) \cdot \frac{f_R}{s} \cdot \frac{1}{M} \cdot \prod_{i} \frac{1 + s/f_R}{1 + s/(\lambda_i f_R)} \tag{2.10}$$

$$H_{cl,DCO}(s) = \frac{\phi_V}{\phi_{n,DCO}} = \frac{1}{1 + H_{ol}(s)} = \frac{s^2}{s^2 + \frac{\alpha f_R}{M} s + \frac{\rho}{M} f_R^2} \tag{2.11}$$
indicating that the DCO noise has a high-pass characteristic and that it dominates the PLL phase noise outside the loop bandwidth. For phase noise at low frequency offsets (i.e., the in-band part), the type-II loop suppresses it with a slope of 40 dB/decade.
The second internal noise source, $\phi_{n,TDC}$, arises from the fractional phase error counter (usually implemented as a TDC) when calculating $\epsilon$. Taking the TDC as an example, its phase noise is

$$\mathcal{L} = \frac{(2\pi)^2}{12} \left(\frac{\Delta t_{res}}{M \cdot T_V}\right)^2 \frac{1}{f_R} \tag{2.12}$$

In Eq. 2.12, $\Delta t_{res}$ is the time resolution of the TDC, and $M \cdot T_V$ is the after-divider clock period at the input of the TDC. The factor M is due to the divider. The closed-loop transfer function of the TDC noise can be expressed as

$$H_{cl,TDC}(s) = \frac{\phi_V}{\phi_{n,TDC}} = \frac{M \cdot H_{ol}(s)}{1 + H_{ol}(s)} = \frac{\alpha f_R s + \rho f_R^2}{s^2 + \frac{\alpha f_R}{M} s + \frac{\rho}{M} f_R^2} \tag{2.13}$$

which is a low-pass response with a gain factor of M within the loop bandwidth. Therefore the phase noise at the ADPLL RF output due to TDC quantization noise is simply

$$\mathcal{L} = \frac{(2\pi)^2}{12} \left(\frac{\Delta t_{res}}{T_V}\right)^2 \frac{1}{f_R} \tag{2.14}$$
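To give a feel for the numbers, the following sketch evaluates the in-band phase noise due to TDC quantization at the RF output (the form without the divider factor M, Eq. 2.14). The 15 ps resolution, 1.2 GHz DCO frequency and 32 MHz reference are example values quoted elsewhere in this chapter; combining them in one calculation is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def tdc_inband_noise_dbc_hz(t_res, f_v, f_ref):
    """In-band phase noise due to TDC quantization (Eq. 2.14), in dBc/Hz.

    t_res: TDC time resolution in seconds
    f_v:   DCO (variable clock) frequency in Hz, so that T_V = 1 / f_v
    f_ref: reference frequency in Hz
    """
    t_v = 1.0 / f_v
    noise = (2 * math.pi) ** 2 / 12 * (t_res / t_v) ** 2 / f_ref
    return 10 * math.log10(noise)


noise_15ps = tdc_inband_noise_dbc_hz(15e-12, 1.2e9, 32e6)
noise_7p5ps = tdc_inband_noise_dbc_hz(7.5e-12, 1.2e9, 32e6)
print(f"15 ps TDC:  {noise_15ps:.2f} dBc/Hz")
print(f"7.5 ps TDC: {noise_7p5ps:.2f} dBc/Hz")
```

Halving the TDC resolution lowers the in-band floor by about 6 dB, as expected from the quadratic dependence on the resolution.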
Besides the three sources mentioned above, the finite frequency resolution of the DCO also contributes to the phase noise at the output; this contribution should be kept much lower than the natural phase noise of the DCO in order to be negligible. Similar to Eq. 2.12, the phase noise due to DCO frequency quantization can be derived [4] as

$$\mathcal{L}\{\Delta f\} = \frac{1}{12} \left(\frac{\Delta f_{res}}{\Delta f}\right)^2 \frac{1}{f_R} \left(\mathrm{sinc}\,\frac{\Delta f}{f_R}\right)^2 \tag{2.15}$$

Since the DCO input tuning word is held constant between two consecutive updates, the white noise assumption is not fully justified. Hence, there is a sinc function in Eq. 2.15 to account for the zero-order hold operation on the input tuning word of the DCO.
The quantization noise of the DCO rolls off at 20 dB/decade, similar to the up-converted thermal noise of the oscillator. As long as this quantization noise is kept sufficiently low compared with the inherent phase noise of the oscillator, the overall phase noise is not significantly affected. Usually the raw resolution of the DCO is too coarse for the required specification. For that reason, ΣΔ dithering of the smallest switched capacitance is applied. As a consequence, the quantization noise is shaped in frequency, resulting in

$$\mathcal{L}\{\Delta f\} = \frac{1}{12} \left(\frac{\Delta f_{res,\Sigma\Delta}}{\Delta f}\right)^2 \frac{1}{f_{\Sigma\Delta}} \left(2\sin\frac{\pi \Delta f}{f_{\Sigma\Delta}}\right)^{2n} \tag{2.16}$$

$$\Delta f_{res,\Sigma\Delta} = \frac{\Delta f_{res}}{2^{W_F}} \tag{2.17}$$
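The undithered case of Eq. 2.15 can be evaluated with a few lines of code. This is a minimal sketch: the 1 kHz raw DCO step is a hypothetical value, the 32 MHz update rate is the reference frequency quoted later in this chapter, and the normalized definition sinc(x) = sin(πx)/(πx) is an assumption about the convention used.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x); this normalization is an assumption."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def dco_quant_noise_dbc_hz(f_res, f_offset, f_ref):
    """Phase noise from undithered DCO frequency quantization (Eq. 2.15), in dBc/Hz."""
    noise = (f_res / f_offset) ** 2 / 12 / f_ref * sinc(f_offset / f_ref) ** 2
    return 10 * math.log10(noise)


F_RES = 1e3   # hypothetical raw DCO frequency step in Hz
F_REF = 32e6  # tuning-word update rate in Hz

noise_1mhz = dco_quant_noise_dbc_hz(F_RES, 1e6, F_REF)
noise_10mhz = dco_quant_noise_dbc_hz(F_RES, 10e6, F_REF)
print(f"1 MHz offset:  {noise_1mhz:.2f} dBc/Hz")
print(f"10 MHz offset: {noise_10mhz:.2f} dBc/Hz")
```

Between the two offsets the noise falls by slightly more than 20 dB, the extra attenuation coming from the zero-order-hold sinc term.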
[Figure: Noise sources. Output phase noise [dB] versus frequency [Hz] (10^3 to 10^8) for the uncorrected DCO, reference, TDC, variable (DCO) and composite contributions. DCO PN: -107 dBc/Hz @ 1 MHz offset; Fref PN: -150 dBc/Hz; TDC resolution 2 ps.]
resolution and mismatch in the phase detector under fractional-N mode modulate the output of the ADPLL. In the locked state, the frequency fluctuation of the output is as small as tens of kHz, so this error-modulating process is exactly a narrowband FM behaviour. Thanks to the low-pass filtering characteristic of the loop, the out-of-band fractional spurs generally do not cause serious problems. What is of interest to designers is usually the positions and levels of the worst fractional spurs that appear in-band. Position estimation theory is mature and has been discussed in previous publications [11] [12] [13], while spur level estimation has not been fully analysed. In the remainder of this subsection, the spur level estimation is derived in three parts according to the origin of the spurs: limited resolution, nonlinearity and gain estimation error.
a. Open-loop FM analysis
[Figure 2.6: Quantization residue value versus time (reference cycles). The residue error repeats every 4 cycles with a peak amplitude as large as the TDC resolution.]
Frequency modulation (FM) is the process of producing a wave whose instantaneous frequency varies as a function of the instantaneous amplitude of a modulating wave, at a rate given by the frequency of the modulating source. Figure 2.6 shows a periodic residue resulting from the coarse resolution of the TDC. This periodic residue can be regarded as such a modulating source acting on the DCO. The DCO output can therefore be expressed as a frequency-modulated signal as follows:
where $V_m(\tau)$ is the variation at the input of the DCO, $\theta_0$ is the initial phase and $K_{DCO}$ is the DCO sensitivity in Hz.
As can be seen from the ramping shape of the quantization residue over time, the "modulation" due to the quantization residue can be modelled as a sawtooth narrowband frequency modulation. For the purposes of this work, the modulating signal is one of the Fourier series components of the sawtooth waveform, which is given by

$$V_m(t) = A_m \cos(2\pi f_m t) \tag{2.19}$$

where $A_m$ is the peak amplitude of the modulating signal from the digital phase error and $f_m$ is its frequency. Thus the phase variation due to the modulating signal at the DCO output is

$$2\pi K_{DCO} \int V_m(\tau)\,d\tau = \frac{2\pi K_{DCO} A_m}{2\pi f_m} \sin(2\pi f_m t) \tag{2.20}$$

With the modulation index $m = K_{DCO} A_m / f_m = \Delta f / f_m$, the DCO output expands to

$$V(t) = V_0 \cos(2\pi f_c t)\cos\bigl(m \sin(2\pi f_m t)\bigr) - V_0 \sin(2\pi f_c t)\sin\bigl(m \sin(2\pi f_m t)\bigr) \tag{2.23}$$

For narrowband modulation, m should be far less than 1. Thus we have

$$\cos\bigl(m \sin(2\pi f_m t)\bigr) \approx 1 \tag{2.24}$$

$$\sin\bigl(m \sin(2\pi f_m t)\bigr) \approx m \sin(2\pi f_m t) \tag{2.25}$$

Then

$$V(t) \approx V_0 \cos(2\pi f_c t) - m V_0 \sin(2\pi f_c t)\sin(2\pi f_m t) \tag{2.26}$$

This is equivalent to

$$V(t) = V_0 \cos(2\pi f_c t) + 0.5\,m V_0 \bigl[\cos(2\pi (f_c + f_m) t) - \cos(2\pi (f_c - f_m) t)\bigr] \tag{2.27}$$

In this way the single sideband (SSB) spur-to-signal ratio can be written as

$$\frac{spur}{signal} = 20\log(0.5\,m) = 20\log\left(\frac{\Delta f}{2 f_m}\right) \tag{2.28}$$
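The narrowband approximation behind the spur-to-signal estimate of Eq. 2.28 can be checked numerically. The sketch below synthesizes a frequency-modulated tone with a small modulation index, measures the carrier and upper-sideband amplitudes by direct correlation, and compares the result with the estimate. The test-case numbers (bin positions, m = 0.05) and function names are hypothetical, chosen so that both tones fall exactly on DFT bins.

```python
import cmath
import math


def fm_spur_dbc(delta_f, f_m):
    """Narrowband estimate of the SSB spur level, Eq. 2.28."""
    return 20 * math.log10(0.5 * delta_f / f_m)


def tone_amplitude(samples, k):
    """Amplitude of the component at DFT bin k, by direct correlation."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(samples))
    return 2 * abs(acc) / n


# Hypothetical test case: carrier in bin 512, modulation in bin 16, m = 0.05.
N, K_C, K_M, M_INDEX = 4096, 512, 16, 0.05
signal = [
    math.cos(2 * math.pi * K_C * i / N
             + M_INDEX * math.sin(2 * math.pi * K_M * i / N))
    for i in range(N)
]

simulated = 20 * math.log10(
    tone_amplitude(signal, K_C + K_M) / tone_amplitude(signal, K_C))
estimated = fm_spur_dbc(M_INDEX * K_M, K_M)  # delta_f = m * f_m
print(f"simulated {simulated:.3f} dBc, estimated {estimated:.3f} dBc")
```

For such a small modulation index the two values agree to within a few thousandths of a dB; the agreement degrades as m approaches 1, where the higher-order sidebands are no longer negligible.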
Here fm would be
𝑓 =2 𝑓
𝐹𝐶𝑊 𝑇 (2.29)
𝑓 = 𝑓
𝑡
where 𝑓 is the reference frequency,
b. Closed-loop PM analysis
As the modulating residue is a sawtooth waveform, we first recall its Fourier series:

$$x(t) = \frac{A}{2} - \frac{A}{\pi} \sum_{n=1}^{\infty} \frac{\sin(2\pi n f t)}{n} \tag{2.30}$$

A is the amplitude of the waveform in the time domain, which is $t_{res}$ here. Now the in-band spur level is investigated from the closed-loop perspective. For convenience, we start from phase modulation (with a sinusoidal modulating signal it makes no difference whether one speaks of phase or frequency deviation, because the two are related by the rate of modulation as in Eq. 2.21). Thus, rewriting Eq. 2.18:

$$V(t) = V_0 \sin\bigl[\omega_c t + K A_m \sin(2\pi f_m t)\bigr] \tag{2.31}$$

$$\sigma = K A_m \ \mathrm{rad} \tag{2.32}$$

$$\sigma = m \tag{2.33}$$

Under the assumption of the locked state, the phase deviation is much smaller than 1, and Eq. 2.34 becomes

$$\frac{spur}{signal} = 20\log(0.5\,\sigma) = 20\log\left(2\pi \frac{\Delta t}{2 T_V}\right) = 20\log\left(\frac{t_{res}}{T_V}\right) \tag{2.38}$$

where $\Delta t = t_{res}/\pi$ is the peak time deviation of the fundamental component of Eq. 2.30.
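c. Comparison
First, a time deviation $t$ at the phase detector and the resulting frequency deviation $f$ at the DCO output are related by

$$\frac{t}{T_V}\,\alpha = \frac{f}{f_R} \tag{2.39}$$

For the fundamental component of the sawtooth residue, whose peak time deviation is $t_{res}/\pi$, the corresponding peak frequency deviation is

$$\Delta f = \alpha f_R \cdot \frac{1}{\pi} \frac{t_{res}}{T_V} \tag{2.40}$$

Thus Eq. 2.28 is rewritten as

$$\frac{spur}{signal} = 20\log\left(\frac{t_{res}\,\alpha\,f_R}{2\pi T_V f_m}\right) \tag{2.41}$$

To show the equivalence between Eq. 2.41 and Eq. 2.38, assume that one fractional spur is located at the PLL cutoff frequency $f_{BW}$,

$$f_m = f_{BW} = \frac{\alpha f_R}{2\pi} \tag{2.42}$$

where the open-loop and closed-loop responses are tangential and both nominally at unity (actually they are both at -3 dB gain):

$$\frac{spur}{signal} = 20\log\left(\frac{t_{res}\,\alpha\,f_R}{2\pi T_V f_m}\right) = 20\log\left(\frac{t_{res}}{T_V}\right) \tag{2.43}$$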
Table 2.3: Validation of the close-integer spur estimation with FCW = 38.0001.
TDC resolution    Estimated spur level (dB)    Simulated spur level (dB)
15 ps             -34.8911                     -35.12
64 ps             -22.2893                     -22.31
This is exactly the same as Eq. 2.38. The theoretically estimated and Verilog-simulated levels of the worst in-band fractional spur are compared in Table 2.3. No TDC nonlinearity is added to the Verilog model and the channel is chosen to be a close-integer one (FCW = 38.0001). The bandwidth is set to 200 kHz, while the standard deviation of the reference clock jitter is 3 ps, which is much lower than the TDC resolution. The DCO frequency $1/T_V$ is 1.2 GHz in this example. However, this match holds only when the thermal noise level from the loop at the input of the TDC is much lower than the TDC quantization residue level, which is expected. Once they become comparable, the random noise dithers the quantization error and the spur energy is spread into the white noise floor. This can be used to reduce the fractional spurs, as shown in [14], by adding a DTC to dither the reference clock before sending it into the TDC.
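The estimated column of Table 2.3 can be reproduced from the closed-form result 20 log(t_res / T_V) of Eq. 2.38. The sketch below assumes the 1.2 GHz DCO frequency quoted above; the function name is hypothetical.

```python
import math


def quantization_spur_db(t_res, f_v):
    """Worst in-band fractional spur estimate of Eq. 2.38: 20*log10(t_res / T_V)."""
    return 20 * math.log10(t_res * f_v)


F_V = 1.2e9  # DCO frequency in Hz, so that T_V = 1 / F_V

# (TDC resolution in s, estimated value listed in Table 2.3 in dB)
for t_res, table_estimate in ((15e-12, -34.8911), (64e-12, -22.2893)):
    estimate = quantization_spur_db(t_res, F_V)
    print(f"{t_res * 1e12:.0f} ps: computed {estimate:.3f} dB, "
          f"table {table_estimate:.4f} dB")
```

The computed values agree with the tabulated estimates to within about 0.004 dB; the small remainder may come from rounding of the quoted DCO frequency.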
Fourier series can always be expressed in the form of Eq. 2.19. The sinusoidally modulated DCO output can then be expressed as a Bessel function [15] series with the modulation index m as defined previously:

$$\begin{aligned} V(t) = V_0 \{ & J_0(m)\sin(\omega_c t) + J_1(m)[\sin(\omega_c t + 2\pi f_m t) - \sin(\omega_c t - 2\pi f_m t)] \\ & + J_2(m)[\sin(\omega_c t + 2\pi 2 f_m t) + \sin(\omega_c t - 2\pi 2 f_m t)] \\ & + J_3(m)[\sin(\omega_c t + 2\pi 3 f_m t) - \sin(\omega_c t - 2\pi 3 f_m t)] + \cdots \} \end{aligned} \tag{2.44}$$

Only when the modulation index m is much smaller than 1 can the fundamental tone be approximated in the form of Eq. 2.38. Under this assumption $J_1(m) \approx m/2$, while the higher-order terms are negligible. Regarding m, the peak deviation value is also related to the shape of the INL. As demonstrated in Figure 2.7, assuming that the maximum INL is $t_{INL}$, we can go through the same procedure as before to obtain

$$\frac{spur}{signal} = 20\log\left(\pi \frac{t_{INL}}{T_V}\right) \tag{2.45}$$
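The small-index approximation of the first-order Bessel coefficient, and the resulting INL spur estimate of Eq. 2.45, can be checked with a short power-series implementation. This is a sketch under stated assumptions: the 5 ps peak INL is a hypothetical value, the 1.2 GHz DCO frequency is the example value used above, and the modulation index is taken as the peak phase deviation 2π·t_INL/T_V.

```python
import math


def bessel_j(n, m, terms=20):
    """Bessel function of the first kind J_n(m), from its power series."""
    return sum(
        (-1) ** k / (math.factorial(k) * math.factorial(n + k))
        * (m / 2) ** (2 * k + n)
        for k in range(terms)
    )


def inl_spur_db(t_inl, f_v):
    """Spur estimate of Eq. 2.45: 20*log10(pi * t_INL / T_V)."""
    return 20 * math.log10(math.pi * t_inl * f_v)


T_INL = 5e-12  # hypothetical peak INL in seconds
F_V = 1.2e9    # DCO frequency in Hz

m = 2 * math.pi * T_INL * F_V  # assumed peak phase deviation in rad
exact = 20 * math.log10(bessel_j(1, m) / bessel_j(0, m))
approx = inl_spur_db(T_INL, F_V)
print(f"m = {m:.4f}, exact {exact:.3f} dB, approximation {approx:.3f} dB")
```

For this modulation index the closed-form estimate and the Bessel-series ratio differ by only a few thousandths of a dB.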
This nonlinearity is also modelled and simulated with a TDC resolution of 0.5 ps, which is smaller than the thermal noise in the loop, so that the quantization residue is fully randomized by the thermal noise. The results match with a deviation of less than 2 dB.
[Figure: residue value versus time (reference cycles).]
Now the spur position would be related to the number of the TDC bits while
the level could be expressed as
𝑠𝑝𝑢𝑟 𝑡
= 20 log ( ) (2.46)
𝑠𝑖𝑔𝑛𝑎𝑙 𝑇
𝐺𝑎𝑖𝑛 − 𝐺𝑎𝑖𝑛
𝑡 =2 ∗ •𝑡 (2.47)
𝐺𝑎𝑖𝑛
Table 2.5: Bluetooth radio specifications for the basic rate and for the BLE extension
Spurious emissions
1 MHz 0 dB 15 dB
Here the specification for the in-band noise floor is set to -85 dBc/Hz and the spot phase noise to -105 dBc/Hz at 1 MHz offset, leaving considerably more margin, in line with the NXP plan.
where $t_R$ is the rising edge of the reference clock, $t_V$ is the rising edge of the feedback variable clock, $T_V$ is the period of the feedback variable clock and $\varepsilon$ is the quantization target of the TDC. In the locked state,

$$R_{R,f} + \varepsilon = 1 \tag{2.49}$$

where $R_{R,f}$ is the fractional part of the reference phase. This can now also be expressed as

$$R_{R,f} + 1 - \frac{t_R - t_V}{T_V} = 1 \quad\Longrightarrow\quad t_R + (1 - R_{R,f})\,T_V = t_V + T_V \tag{2.50}$$

Thus, under the locked-state assumption, what the TDC tracks is $(1 - R_{R,f})$ plus additional noise and residue in the loop, i.e. a predictable quantity $(1 - R_{R,f})$ plus a small disturbance. According to Eq. 2.50, a DTC can be introduced to handle the phase prediction, so that the reference clock is always delayed to align with the next rising edge of the feedback variable clock. The TDC then only has to quantize the residue between these two phases. Figure 2.10 shows the phase prediction diagram. This phase prediction scheme has already been implemented in [17] and [1].
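The prediction rule of Eq. 2.50 can be sketched as a simple accumulator: the fractional reference phase advances by the fractional part of FCW every reference cycle, and the ideal DTC delay is one minus that fraction, in units of the variable clock period. The function below is an idealized illustration (infinite DTC resolution, no noise) with hypothetical names; the fractional FCW of 0.25 is only an example value.

```python
def predicted_dtc_delays(fcw_frac, t_v, n_cycles):
    """Ideal DTC delay per reference cycle, following Eq. 2.50.

    fcw_frac: fractional part of the frequency command word
    t_v:      period of the variable (DCO) clock
    n_cycles: number of reference cycles to simulate
    """
    delays = []
    phr_frac = 0.0  # fractional part of the reference phase
    for _ in range(n_cycles):
        phr_frac = (phr_frac + fcw_frac) % 1.0
        # Delay the reference edge onto the next rising edge of the variable clock.
        delays.append((1.0 - phr_frac) * t_v)
    return delays


# Delays expressed in units of T_V (t_v = 1.0) for a fractional FCW of 0.25.
delays = predicted_dtc_delays(0.25, 1.0, 8)
print(delays)
```

The delay pattern repeats every four reference cycles for this FCW, which is the periodicity that later shows up in the fractional spur positions.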
[Figure 2.10: Phase prediction diagram showing CKV edges 0 to 7, the reference phase PHR and its fractional part PHR,f for FCW,f = 0.25, in the time and digital domains, together with the TDC.]
I. DTC resolution ≫ TDC resolution. In this case, the limited DTC resolution contributes nothing to the output spurs or phase noise, because the residue error from phase tracking (the fractional phase steps cannot be tracked exactly with the DTC resolution) is detected by the finer TDC. Besides, the large residue left by the DTC fully scrambles the TDC output, so the assumption of a quantization noise floor (determined by the TDC) still holds [4]. Only in the close-integer case does the TDC face the deadzone issue [9]. In this case, the phase prediction scheme behaves much like the TDC-only structure. The only potential impact of the DTC stage is its nonlinearity and gain estimation error, which can disturb the TDC output and hence further modulate the output spectrum. Other than that, the DTC only assists the narrow-range TDC.
II. DTC resolution ≈ TDC resolution. This is the case when the DTC resolution is slightly larger or smaller than the TDC resolution. Here, what the DTC cannot track cannot be detected by the TDC either. The TDC therefore theoretically always faces the deadzone problem, since the DTC always aligns the two inputs to the extent that their difference is smaller than the TDC resolution. Nevertheless, in practice the strong nonlinearity of the DTC and the thermal noise can enlarge the difference and relax this problem. To solve it, one can either add dithering to the reference or use a noisier crystal to keep the TDC out of the deadzone.
[Figure: DTC working principle. Value (0 to 1) of the variable phase, the DTC output and the fractional phase.]
III. DTC resolution ≪ TDC resolution. In this situation, the DTC aligns the TDC inputs into the close-integer case, and it takes an extremely long time for the error accumulated from DTC phase tracking to be detected by the TDC. In this case the spurs and the in-band noise floor are mainly dominated by the TDC.
In short, in terms of resolution, as long as the TDC can recognise the residue left by the DTC, the equivalent resolution of the phase error detection is mainly determined by the TDC. In terms of nonlinearity-induced spurs, both the DTC and the TDC contribute, and the DTC generally dominates, since the full DTC transfer characteristic is traversed while the fractional phase ramps periodically.
Besides, a wrongly estimated DTC gain also contributes significantly to the fractional spurs. Since the DTC gain determines the input error of the TDC, DTC gain estimation is important not only for spurs but also for locking. A wrongly estimated DTC gain causes serious spur problems in close-integer channels. Nevertheless, this problem can be solved by the LMS algorithm, as depicted in [17].
The measured reference design is rebuilt in the model to verify the analysis above, and the result is shown in Figure 2.14. The reference noise floor is -120 dBc/Hz and 𝛼 = 2 while 𝜌 = 2 . The channel is chosen as FCW = 38.001 and the reference clock is 32 MHz with RBW = 1 kHz. The DTC resolution is 22 ps, while the TDC resolution is 15 ps. All these parameters are set as they were during the measurement. The nonlinearity-induced fractional spurs are concentrated at multiples of 32 kHz. The INL of the DTC is plotted in Figure 2.13. The simulation result matches the analysis really well in the sense that:
[Figure 2.13: DTC nonlinearity (panel label: DNL, in LSB) versus digital control code (0 to 70).]
[Figure 2.14: Spectrum of the SSB phase noise at CKV [dBc/Hz] versus frequency [Hz] (FFT: len=4645250, rbw=1000).]
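The 32 kHz spur spacing quoted for this simulation follows from the fractional part of FCW multiplied by the reference frequency. The helper below is a minimal sketch of that calculation; folding to the nearer integer channel (taking the smaller of frac and 1 - frac) is an assumption commonly made for fractional-N synthesizers, and the function name is hypothetical.

```python
def fractional_spur_offset_hz(fcw, f_ref):
    """Offset of the fundamental fractional spur from the carrier, in Hz."""
    frac = fcw % 1.0
    # Assumption: the spur appears at the distance to the nearest integer channel.
    return min(frac, 1.0 - frac) * f_ref


spur = fractional_spur_offset_hz(38.001, 32e6)
harmonics = [k * spur for k in (1, 2, 3)]
print(f"fundamental spur offset: {spur / 1e3:.3f} kHz")
print("first harmonics (kHz):", [round(h / 1e3, 3) for h in harmonics])
```

For the close-integer channel FCW = 38.0001 at the same reference frequency, the same calculation would place the fundamental spur at about 3.2 kHz.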
[Figure: phase detector transfer characteristics, panels (a) and (b), with output levels +1, 0 and -1 and the resolution marked.]
could be as simple as a D flip-flop. The deadzone issue (the TDC outputs zero while the error is accumulating) in close-integer channels is no longer a problem for a Bang-Bang based DPLL. However, without random noise to dither the input signal, the strong nonlinearity of the Bang-Bang detector would also cause limit cycles and make the loop hard to analyse with a linear s-domain model.
Thanks to the thermal noise in the loop, which is mainly determined by the reference clock, the inputs are randomized and the transfer function is shaped by the thermal noise, as shown in Figure 2.16.
The transfer function of the Bang-Bang phase detector is then linearised by
the noise with a equivalent gain as
2 1
𝐾 =√ (2.51)
𝜋𝜎
where 𝜎 is the standard deviation of the thermal noise in the loop, under the assumption
that the noise is Gaussian distributed. The gain of the Bang-Bang detector now appears in
the closed-loop transfer function. Taking the type-I closed-loop transfer function as an example,
$$H_{cl}(s) = N \cdot \frac{\alpha K_{bb} f_R / M}{s + \alpha K_{bb} f_R / M} \qquad (2.52)$$
and the 3-dB bandwidth of the loop is now $f_{BW} = \alpha K_{bb} f_R/(2\pi M)$.
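The linearised gain of Eq. 2.51 can be checked numerically. The sketch below assumes zero-mean Gaussian noise of standard deviation sigma at the detector input, in which case the averaged output of an ideal ±1 detector is erf(x/(σ√2)). The function names and the 2 ps noise value are illustrative assumptions, not part of the design.

```python
import math


def mean_bb_output(phase_err, sigma):
    """Averaged output of an ideal +/-1 detector whose input is phase_err
    plus zero-mean Gaussian noise with standard deviation sigma."""
    return math.erf(phase_err / (sigma * math.sqrt(2.0)))


def small_signal_gain(sigma, rel_step=1e-4):
    """Numerical slope of the averaged characteristic around zero error."""
    h = rel_step * sigma
    return (mean_bb_output(h, sigma) - mean_bb_output(-h, sigma)) / (2.0 * h)


def linearised_gain(sigma):
    """Closed-form gain sqrt(2/pi)/sigma, as in Eq. 2.51."""
    return math.sqrt(2.0 / math.pi) / sigma


if __name__ == "__main__":
    sigma = 2e-12  # assumed 2 ps rms noise, for illustration only
    print(small_signal_gain(sigma), linearised_gain(sigma))
```

The two values agree only for phase errors well inside the noise spread, which is the validity condition discussed in the text.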
So far, two drawbacks of the Bang-Bang phase detector have to be clarified
here. First, the input phase difference should not be too large compared with
2.5. Implementation from System level .
Figure 2.16: (a) Simplified Bang-Bang detector. (b) Input edge shifted by thermal noise. (c) Gain linearised by
thermal noise.
the thermal noise in the loop; otherwise the transfer function is still strongly nonlinear.
The second drawback of the Bang-Bang phase detector is that the loop bandwidth depends
on the gain of the phase detector, which means the bandwidth depends on the
reference noise. However, since the quality of the reference clock can
be ensured with little variation, the bandwidth is expected to remain
stable.
Back to the first issue: a rather fine DTC with acceptable linearity is required
to ensure that the input of the Bang-Bang phase detector is always randomized by the loop
thermal noise. In this way the "TDC" always works in close-integer mode,
ensured by a good DTC. Instead of a mid-tread quantizer TDC, a mid-rise
Bang-Bang phase detector is adopted to take advantage of the thermal noise, so
that the conventional deadzone problem is avoided. Now the phase noise as
well as the fractional spurs are determined by the DTC. No dithering is needed,
and in theory the PLL is expected to work as a Bang-Bang DPLL does in
integer-N mode, as long as the DTC is extremely linear with picosecond resolution.
Hence, the whole burden of the TDC is transferred to the DTC, and there are good
reasons to do so in terms of power, mismatch and compatibility with digital
calibration (under the target of ultra-low power, a delay-chain based DTC or TDC is
preferred for low power and simplicity):
In terms of power: the DTC does not need a D flip-flop (DFF) to sample the
output value and consumes less power than a TDC thanks to its simplicity.
In terms of linearity: if the delay chains of the TDC and the DTC are the
same, the TDC gets additional mismatch from its array of DFFs.
1. 4-phase output from the divide-by-2. By choosing, from the four phases generated,
the phase closest to the reference edge, the required range of the DTC is
reduced to only a fraction of the original period 𝑇 .
2. TDC-assisted fast locking. A full-range TDC has the advantage that
the loop always observes a linearly quantized phase error, so locking is
rather fast. Before lock-in, the DTC does not perform the correct phase prediction,
which results in slower settling compared with full-range TDC based
ADPLL designs. However, a full-range TDC could help observe the fractional phase
error over the full 𝑇 period at the beginning, and could be switched
to 1-bit Bang-Bang operation after lock-in. Since spurs, power and
phase noise only matter after the loop locks, the resolution and linearity of the
full-range TDC have no impact at all. In this way, the advantages of both the TDC and
the Bang-Bang detector are combined.
References
[1] V. Chillara, Y.-H. Liu, B. Wang, A. Ba, M. Vidojkovic, K. Philips, H. de Groot, and
R. Staszewski, 9.8 An 860𝜇W 2.1-to-2.7GHz all-digital PLL-based frequency
modulator with a DTC-assisted snapshot TDC for WPAN (Bluetooth Smart and
ZigBee) applications, in Solid-State Circuits Conference Digest of Technical
Papers (ISSCC), 2014 IEEE International (2014) pp. 172–173.
[11] E. Temporiti, C. Weltin-Wu, D. Baldi, M. Cusmai, and F. Svelto, A 3.5 GHz
wideband ADPLL with fractional spur suppression through TDC dithering and
feedforward compensation, Solid-State Circuits, IEEE Journal of 45, 2723 (2010).
[19] D. Pfaff and Q. Huang, A quarter-micron CMOS, 1 GHz VCO/prescaler-set for very
low power applications, in Custom Integrated Circuits, 1999. Proceedings of
the IEEE 1999 (IEEE, 1999) pp. 649–652.
[21] N. Da Dalt, Linearized Analysis of a Digital Bang-Bang PLL and Its Validity
Limits Applied to Jitter Transfer and Jitter Generation, Circuits and Systems I:
Regular Papers, IEEE Transactions on 55, 3663 (2008).
Jitter or phase noise is always of great concern to the designers of PLL sys-
tem. Jitter and phase noise are different ways of describing an undesired
variation in the timing of events at the output of the PLL. They are difficult to
predict with traditional circuit simulators because the PLL generates repeti-
tive switching events as an essential part of its operation.
50 3. ADPLL Modelling and Simulation
which offer steep timing stamps. Table 3.1 summarises the differences between
the two methodologies mentioned above.
Table 3.2 shows the proposed mixed-signal simulation methodology used in this
thesis. All the design results presented in this thesis are based
on this flow. The AMS simulator in step 4 of Table 3.2 needs some explanation.
It is a mixed-signal simulator available in Cadence that combines Verilog-based
code with a schematic netlist in one simulation. The essential principle is to
simulate all schematic-netlist based parts with SpectreRF while simulating the
digital Verilog code with NCSim. The interaction between the digital code and the
voltage-domain analog netlist is handled by predefined conversion rules between
the analog and digital domains.
3.3.1. Mismatch
Mismatch among the delay stages would cause a serious spur problem,
and this mismatch is modelled by a Gaussian distribution. However, for a
certain INL shape we can also directly predefine the delay of each stage.
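The two options described above (Gaussian-distributed stage delays, or a predefined delay per stage) could be modelled along the following lines. This is only a sketch: the function names and the seed are assumptions, and DNL and INL are normalised here to the mean stage delay, which is a choice made for this example rather than something taken from the thesis model.

```python
import random


def gaussian_stage_delays(n_stages, nominal_ps, sigma_ps, seed=0):
    """Draw one delay per stage from a Gaussian mismatch model."""
    rng = random.Random(seed)  # fixed seed keeps the example repeatable
    return [rng.gauss(nominal_ps, sigma_ps) for _ in range(n_stages)]


def dnl_inl(delays):
    """Return (DNL, INL) lists in LSB, taking the mean stage delay as 1 LSB."""
    lsb = sum(delays) / len(delays)
    dnl = [(d - lsb) / lsb for d in delays]
    inl, acc = [], 0.0
    for step in dnl:
        acc += step  # INL is the running sum of the DNL
        inl.append(acc)
    return dnl, inl


if __name__ == "__main__":
    # Option 1: random mismatch (values are illustrative only).
    random_delays = gaussian_stage_delays(32, nominal_ps=2.0, sigma_ps=0.1)
    # Option 2: a predefined delay per stage for a chosen INL shape.
    predefined = [2.0, 2.5, 1.5, 2.0]
    print(dnl_inl(random_delays)[1][-1], dnl_inl(predefined))
```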
in Chapter 4.
3.3.4. Metastability
Metastability could be an issue for the Bang-Bang phase detector. Its effect on the
result is negligible, because the metastability window is much smaller than
the thermal noise in the loop, so the transfer function
of the Bang-Bang detector is hardly affected. Besides (as
will be explained later), in the retiming block CKV is resampled twice to create a
1.6 ns separation between the variable input (CKVG) of the Bang-Bang detector and the
resampled reference CKR. This leaves sufficient time for the output of the 1-bit
quantizer to settle to a well-defined level.
3.4. RF Modelling
3.4.1. DCO
The jitter and wander parts of the oscillator noise are modelled according to [2].
The flicker noise mainly impacts the integrated jitter but does not impact the
locking behaviour. Besides, since flicker noise is not a serious problem
for the design specification, it is not included in the model. For
jitter noise, the standard deviation of the Gaussian-distributed time-domain jitter is
$$\sigma_{\Delta t} = \frac{T_V}{2\pi}\sqrt{\mathcal{L}\, f_V} \qquad (3.1)$$
where ℒ is the noise floor and $T_V = 1/f_V$ is the oscillator period. For wander noise the standard deviation is
$$\sigma_{\Delta T} = \frac{\Delta f}{f_V}\sqrt{T_V}\sqrt{\mathcal{L}(\Delta f)} \qquad (3.2)$$
where Δ𝑓 represents the offset frequency from the output carrier. The noise is
added to the DCO output in the time domain based on a Gaussian distribution model.
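As a rough numerical illustration of Eq. 3.1 and Eq. 3.2, the sketch below converts a phase-noise level in dBc/Hz into the corresponding time-domain standard deviation. The function names and the example numbers (a 2.4 GHz carrier, a -150 dBc/Hz floor, -110 dBc/Hz at 1 MHz offset) are assumptions chosen for illustration, not simulation settings from this work.

```python
import math


def dbc_to_linear(level_dbc_hz):
    """Convert a phase-noise level from dBc/Hz to a linear ratio per Hz."""
    return 10.0 ** (level_dbc_hz / 10.0)


def jitter_sigma(floor_dbc_hz, f_v_hz):
    """Eq. 3.1: rms timing jitter for a flat phase-noise floor."""
    t_v = 1.0 / f_v_hz
    return t_v / (2.0 * math.pi) * math.sqrt(dbc_to_linear(floor_dbc_hz) * f_v_hz)


def wander_sigma(level_dbc_hz, offset_hz, f_v_hz):
    """Eq. 3.2: rms period wander from the phase noise at a given offset."""
    t_v = 1.0 / f_v_hz
    return (offset_hz / f_v_hz) * math.sqrt(t_v) * math.sqrt(dbc_to_linear(level_dbc_hz))


if __name__ == "__main__":
    f_v = 2.4e9  # assumed carrier frequency
    print("jitter sigma [s]:", jitter_sigma(-150.0, f_v))
    print("wander sigma [s]:", wander_sigma(-110.0, 1e6, f_v))
```

In an event-driven model, each of these values would be the standard deviation of a Gaussian sample added to the DCO edge time, the wander term being accumulated from cycle to cycle.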
[Figure: settling behaviour, simulated phase error at the TDC output versus time [us], Bang-Bang type-II locking.]
[Figure: Spectrum of the SSB phase noise at CKV [dBc/Hz] versus frequency [Hz] (FFT: len=9290284, rbw=300), with markers at X: 9845, Y: −81.99 and X: 1.106e+05, Y: −76.02.]
References
[1] R. B. Staszewski, C. Fernando, and P. T. Balsara, Event-driven simulation and
modeling of phase noise of an RF oscillator, Circuits and Systems I: Regular
Papers, IEEE Transactions on 52, 723 (2005).
In this chapter, the whole design of an ultra-low power ADPLL for BLE appli-
cation is presented in three parts. Section 4.1 demonstrates the digital flow
based synthesizable low speed digital logic blocks. Section 4.2 describes the
mixed signal blocks such as DTC and counter design. The DCO and divider
design which belongs to RF domain is covered in Section 4.3.
60 4. ADPLL Implementation in Transistor-Level
[Figure: top-level diagram of the digital logic, showing the TX interface, the phase detector fed by the Bang-Bang (BB) detector output, the loop filters for the DCO banks and the DEM block; labelled signals include FCW, SPI_FCW, TX data, PHE, OTW, OTW_data, DTC ctrl, bank_sel, bank_en, modn_on, IIR_EN, lambda, srst and CKR.]
where 𝑓 is the divided-down variable (feedback) high-frequency signal. The
number of bits in the integer part of the FCW determines the upper limit of the ADPLL
output frequency 2 ∗ 𝑓 according to Eq. 4.1, while its fractional part determines
the resolution of the output frequency. The available reference frequencies are 16
MHz and 32 MHz, and the targeted highest output frequency is at least 2.8 GHz. Thus
7 bits are allocated to the integer part. According to the modulation requirement,
an output resolution of about 1 kHz should be available. Thus 16 bits are
allocated to the fractional part of the FCW.
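The word-length choice can be reproduced with a few lines of arithmetic. The sketch below assumes that the FCW is the ratio of the divide-by-2 output frequency to the reference frequency, as the text around Eq. 4.1 suggests; the function names are illustrative.

```python
def integer_bits(f_out_max_hz, f_ref_hz, divider=2):
    """Bits needed for the largest integer part of FCW = (f_out / divider) / f_ref."""
    max_integer = int(f_out_max_hz / divider / f_ref_hz)
    return max(1, max_integer.bit_length())


def output_resolution_hz(f_ref_hz, frac_bits, divider=2):
    """Output frequency step corresponding to one LSB of the fractional FCW."""
    return divider * f_ref_hz / 2 ** frac_bits


if __name__ == "__main__":
    for f_ref in (16e6, 32e6):
        print(f_ref, integer_bits(2.8e9, f_ref), output_resolution_hz(f_ref, 16))
```

With the 16 MHz reference the integer part reaches 87, which needs 7 bits, and with 16 fractional bits the output step stays below 1 kHz for both reference frequencies.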
The state machine assists the switching between the three banks
of the DCO by determining the duration and order in which each bank
is activated after the frequency search is triggered. The block "tx interface"
receives the TX data and the FCW from the digital baseband and generates the
control inputs to the low-frequency and high-frequency paths to facilitate two-point
frequency modulation. The phase detector has four functions:
(1) Generate the integer phase error by comparing the outputs of the variable phase
counter and the reference phase accumulator.
(2) Predict the next variable phase by accumulating DTC control
words based on the accumulated fractional reference phase.
(3) Encode the Bang-Bang output, since the Bang-Bang linear range is set to double
the finest DTC step size.
(4) Observe the locking behaviour by looking at the integer phase error and the
frequency error deviation.
Based on (4), the loop can also be duty-cycled after lock-in, at the
cost of a higher in-band noise floor since the equivalent reference clock frequency
is reduced. A separate loop filter for each of the three banks then processes the
phase error to generate the appropriate control words for the DCO. On receiving the
DTC control word, an LFSR-based binary-to-thermometer rotating encoder
relaxes the nonlinearity issue by means of DEM. The implementation details of
each of these blocks are described briefly in the following sections.
Figure 4.2: Integer path of the differential-mode phase error detector.
the FCW to generate the instantaneous integer part of the frequency deviation;
when the loop finally locks, the integer part of the phase error is zero.
For the fractional part, a combination of a DTC and a Bang-Bang detector is used. As
discussed in Chapter 2, the accumulated fractional part of the FCW
generates the DTC control words. The overflow from the accumulation
of the fractional part of the FCW is added to the integer FCW. The scaling by "𝑖𝑛𝑣 " is
required to convert the delay normalization from the period of the divide-by-2 variable
clock (ckvd2) to the DTC unit delay. The value of "𝑖𝑛𝑣 " is obtained from the
kdtc calibration block. The total phase error is obtained by adding the output of the
Bang-Bang phase detector to the integer part of the phase error. An external control
word is provided so that the DTC can be tested alone.
The Bang-Bang output, either +1 or −1, has to be scaled by a gain equal to
the ratio between one fine DTC step and one ckvd2 period, as this is its linear detection
range.
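The fractional-path arithmetic described above can be sketched behaviourally as follows. This is a simplified model under stated assumptions: `steps_per_period` stands in for the scaling factor supplied by the kdtc calibration (the number of DTC unit delays per ckvd2 period), the value 400 is hypothetical, and the sign convention of the predicted delay is not taken from the actual RTL.

```python
from fractions import Fraction


def predict_dtc_codes(fcw_frac, steps_per_period, n_cycles):
    """Accumulate the fractional FCW once per reference cycle.

    Returns a list of (carry, dtc_code) pairs: carry is the overflow that is
    added to the integer FCW, dtc_code is the accumulated fraction scaled to
    DTC unit delays.
    """
    acc = Fraction(0)
    result = []
    for _ in range(n_cycles):
        acc += fcw_frac
        carry = 1 if acc >= 1 else 0  # overflow goes to the integer path
        acc -= carry
        result.append((carry, round(acc * steps_per_period)))
    return result


if __name__ == "__main__":
    # Hypothetical example: fractional FCW of 1/4 and 400 DTC steps per ckvd2 period.
    for cycle in predict_dtc_codes(Fraction(1, 4), 400, 8):
        print(cycle)
```

Exact rational arithmetic is used here only to keep the example free of rounding noise; a hardware accumulator would use the 16-bit fractional word.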
[Figure: TX interface with a low-frequency path and a high-frequency path; labelled signals include modn_on, nrst, CKR, dly_dcopath, SPI_FCW, FCW, OTW_data, frac, out, clk_sel and ckvd, together with a Mux and D flip-flops.]
to thermal DTC control words encoder starts to rotate the thermal output according
to the rotation index from the LFSR. An example is shown in Figure 4.6: the rotation
index is 3 and the encoded thermal control code is right-shifted by 3 bits. The
comparison of the fractional spur level with and without rotation is
shown in Figure 4.7.
The DEM implemented here pseudo-randomly varies the usage pattern of the
DTC in each reference period, so that the error arising from unit delay element
mismatches is scrambled from cycle to cycle. Thus, DEM improves the DTC linearity at the
expense of introducing more in-band noise in the output spectrum of the ADPLL.
However, the additional noise introduced by DEM is not a serious issue, especially
when the fractional spur requirement is more important according to certain
standards.
Figure 4.5: 4-bit Galois LFSR.
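The rotation-based DEM can be illustrated with a short behavioural sketch. The feedback polynomial (x^4 + x^3 + 1), the 8-cell line and the function names are assumptions made for this example; the taps of the implemented LFSR are not given in the text.

```python
def galois_lfsr4(state):
    """One step of a 4-bit Galois LFSR (assumed taps x^4 + x^3 + 1, period 15)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0b1100
    return state


def thermometer(code, n_cells):
    """Binary value -> thermometer code with `code` ones out of n_cells."""
    return [1] * code + [0] * (n_cells - code)


def rotate_right(bits, index):
    """Rotate the thermometer code to the right by `index` positions."""
    k = index % len(bits)
    return bits[-k:] + bits[:-k] if k else list(bits)


if __name__ == "__main__":
    state = 1
    for _ in range(4):
        state = galois_lfsr4(state)
        # The number of active cells is unchanged; only their position moves.
        print(state, rotate_right(thermometer(5, 8), state))
```

Because the rotation preserves the number of active cells, the nominal delay is unchanged while the set of cells contributing mismatch differs from cycle to cycle.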
[Figure: "DEM result", output spectrum with no DEM and with DEM versus frequency [Hz].]
[Figure: "DEM improvement" in dB versus frequency [Hz] (x 10^5).]
The effect of DEM could be observed from Figure 4.7 and Figure 4.8. The INL
of the example in this simulation is 3 LSB.
Figure 4.9 shows the top-level view of the digital loop filter. For reasons of area
and accuracy, the DCO capacitor array is implemented in a coarse-fine way with three
segments: the PVT, acquisition and tracking banks, as done in [1]. The former
two assist the frequency acquisition with large frequency steps, while
the last one determines the frequency accuracy of the DCO and hence of the PLL output
signal. Thus different filters are used to generate the control words of
the three banks from the input phase error bits (PHE). As mentioned
in the previous chapter, a type-I loop consists of only a proportional path, which gives
a large bandwidth and fast lock at the cost of less filtering (20 dB/decade
out-of-band attenuation), and is therefore used for the PVT and acquisition banks.
With an additional integral path, a type-II loop filters the DCO noise with an attenuation
of 40 dB/decade outside the bandwidth, at the cost of a narrower bandwidth and
slower locking. Besides, a switchable IIR filter is implemented
as a potential solution for cases where less noise is preferred. The control blocks
generate the zero-phase restart signals at mode switch-over and freeze the tuning
word once the DCO mode is changed. They offer three features:
1. Freeze the oscillator control word at the input of the related capacitor bank.
This is done with the bank selection indicator word 𝑏𝑎𝑛𝑘 , which sets the operation
mode chosen at the moment.
2. Generate the zero-phase restart signal. This is done with
an OR gate fed by the output control words to the DCO and a preset signal which is
"1" under a certain operation mode. Whenever 𝑏𝑎𝑛𝑘 changes, it
triggers a D flip-flop to generate a negative pulse with a width of one reference clock period.
This negative pulse clears the contents of the phase error accumulator, so that
its output is reset to 0.
3. Open-loop test. A 𝑏𝑎𝑛𝑘 signal with
value 0 selects the control word from the external SPI instead of passing the
filtered PHE result to the DCO, so that the open-loop performance of the DCO
can be tested.
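The difference between the type-I filters of the PVT and acquisition banks and the type-II filter of the tracking bank can be sketched as below. The class names and coefficient values are assumptions for illustration; the IIR stage, gain normalization and bank control logic are omitted.

```python
class TypeOneFilter:
    """Proportional path only: output = alpha * phase error."""

    def __init__(self, alpha):
        self.alpha = alpha

    def update(self, phe):
        return self.alpha * phe


class TypeTwoFilter:
    """Proportional plus integral path: output = alpha * phe + running sum of rho * phe."""

    def __init__(self, alpha, rho):
        self.alpha = alpha
        self.rho = rho
        self.integral = 0.0

    def update(self, phe):
        self.integral += self.rho * phe
        return self.alpha * phe + self.integral


if __name__ == "__main__":
    type_one = TypeOneFilter(alpha=0.5)
    type_two = TypeTwoFilter(alpha=0.5, rho=0.25)
    for _ in range(3):
        # A constant phase error gives a constant type-I output,
        # while the type-II output keeps growing through the integrator.
        print(type_one.update(1.0), type_two.update(1.0))
```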
[Figure: digital loop filter, with type-I loop filters (PROP path, gain normalization and control, OTW output) and a type-II loop filter (PROP and INT paths), driven by the phase detector and selected by Bank_sel.]
with TDC in ADPLL related literature, but it is the most essential component in the
phase detector of this design according to Figure 2.17. However, the delay line
(both voltage controlled and digitally controlled) is a very common topic and can
be found everywhere: from delay-line based TDCs [3] to delay-locked loops (DLL),
from VCOs (ring oscillators) to pulse width modulation (PWM) [4]. This simple but
meaningful block is a cornerstone of many systems.
For a delay-line based DTC design, both system-level (control logic) and
transistor-level (unit cell) concepts are discussed as follows:
a. Architecture of DTC
The resolution, complexity and power of the DTC are mostly determined at
system level. The requirement for the DTC in this design is to cover more than the
required range (800 ps, which corresponds to 1.2 GHz as the divide-by-2 of the
output) with a step size as fine as possible while burning as little power as possible.
As set by the goal proposed in Chapter 2, the DTC is required to cover more than
800 ps at a resolution of around 2 ps (the reference thermal noise level)
with a power lower than 40 𝜇W. What is more, tolerable linearity has to be ensured
for a sufficiently low worst-case fractional spur, according to the analysis in Chapter 2.
However, none of the existing popular DTC structures implemented in DPLLs
can meet such a need at low power. In [2], the resolution is as coarse as 15 ps
and the control logic is not simple either. The low power of that DTC is achieved
by using minimum-size inverters, at the cost of intolerable nonlinearity. What is more,
the control logic dictates that the delay is always generated from a certain direction
along the delay chain. This means that in certain channels, certain cells work
much more often than others. This is undesired because a certain mismatch
pattern is visited periodically, which introduces strong spurious tones at the
output. In [5], the resolution is good but the structure is rather complex because
of the huge number of MOS capacitors needed and the strong nonlinearity without
look-up-table (LUT) based calibration. Besides, due to the shunt-capacitor based
delay unit, the power consumed by that structure is not low. In other publications,
such as [6], the DTC resolution is not fine enough and the power of 200 𝜇W is also
too high compared with the proposed target.
In this design, all the design considerations on resolution, linearity and
power of a conventional TDC design are shifted to the DTC, as covered in Chapter 2.
Nevertheless, the sub-gate-delay resolution of the DTC seems hard to implement.
Though techniques such as passive delay lines [7] and resistive interpolation [8]
could offer such a fine resolution, these methods either rely heavily on passive
components or are analog intensive, and thus they are not area economical or digitally compatible.
Nevertheless, a delay-line based DTC should still be the solution. Delay lines can
generally be divided into two categories, absolute delay and relative delay, in terms
of the timing relation between input and output. Absolute delay means
$$\phi_{out} = \phi_{in} + n \cdot t_{res} \qquad (4.2)$$
where $\phi_{in}$ and $\phi_{out}$ denote the phases of the input and output signals, n is the digital
delay control word and $t_{res}$ is the resolution of the delay line. Single delay line or
pseudo-differential delay line based TDCs [3] and the common multiplexer-based DTC
(shown in Figure 4.10) all belong to this category. Yet sometimes the relative delay
between input and output phase is of more interest, and the working principle of a
relative-delay based delay line is
$$\phi_{in} + n t_1 = \phi_{out} + n t_0, \qquad d = t_1 - t_0 \qquad (4.3)$$
where the equivalent resolution d is realized by the relative difference between two
different delay states of the same delay unit. To achieve this relative delay in a TDC,
a so-called Vernier delay line, which consists of at least two delay lines, is required. The
mismatch between the two delay lines then creates a serious problem
once picosecond resolution is targeted. Besides, a fine resolution realized by such a
structure means that a very long delay line is unavoidable, which may
result in large power consumption as well as large mismatch. In [9], a coarse-fine
TDC is proposed. In terms of noise and spurs it is very well designed. However,
due to the large time amplifier array and the complex interface between the
coarse and fine TDC, the power is prohibitively high for the BLE application.
[Figure 4.10: Common multiplexer-based DTC (reference input, multiplexer).]
In this design, a DTC is used as an alternative block to replace the TDC for tracking
4.2. Mixed Signal Implementation .
the fractional phase changing. A Bang-Bang detector is used for the error detection.
The equivalent resolution of this structure is determined by the DTC step size. Unlike
in a TDC, with a DTC the relative (Vernier) delay concept can be realized with a single
delay line only. In this phase prediction scheme, any fixed delay between the reference
edge and the delayed reference edge can be considered as an initial phase offset, which
does not impact the locking of the loop. We can therefore tolerate a relative delay
instead of an absolute delay. Thus with a single delay line, sub-gate-delay resolution is
already possible, as shown in Figure 4.11. Assume that every delay cell has
two states: when the control signal is low, the corresponding delay is $t_0$, while
when the control signal is high, the corresponding delay is $t_1$. When no delay is
desired, there is a fixed delay of $n t_0$, while each control bit set to "1"
adds a step of $(t_1 - t_0)$. The equivalent
resolution of the DTC is then only $t_1 - t_0$ and could be arbitrarily fine as long as the
mismatch is tolerable. However, covering the required range in a low-power
regime with a single fine delay line of more than 400 stages is
impossible. Thus a coarse-fine structure is introduced to reduce the power consumption.
It is necessary to reduce the number of stages as much as possible in terms
of accumulated jitter, mismatch and power. Finally, 16 stages of coarse DTC with
60 ps resolution and 32 stages of fine DTC with 2 ps resolution are implemented
in the tape-out as an optimized option according to simulation. In this design, only
the ratio between the coarse stage and the fine stage needs to be taken care of. Power-hungry
blocks such as time amplifiers are avoided, in contrast to the
coarse-fine TDC implementation [9]. Besides, the ratio calibration
can be implemented simply by the LMS algorithm, as discussed in [10].
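The single-line Vernier delay and the coarse-fine split can be illustrated with ideal, mismatch-free numbers. In the sketch below the stage counts and step sizes are those quoted above, while the function names, the code ranges and the 20 ps and 22 ps cell delays are assumptions made for this example only.

```python
COARSE_STEP_PS, COARSE_STAGES = 60, 16
FINE_STEP_PS, FINE_STAGES = 2, 32


def line_delay(control_bits, t0_ps, t1_ps):
    """Single-line Vernier delay: fixed offset n*t0 plus (t1 - t0) per asserted bit."""
    return len(control_bits) * t0_ps + sum(control_bits) * (t1_ps - t0_ps)


def split_delay(target_ps):
    """Split a wanted relative delay into (coarse_code, fine_code), ideal steps assumed."""
    if not 0 <= target_ps <= COARSE_STEP_PS * COARSE_STAGES:
        raise ValueError("target delay outside the coarse range")
    coarse = min(target_ps // COARSE_STEP_PS, COARSE_STAGES)
    fine = round((target_ps - coarse * COARSE_STEP_PS) / FINE_STEP_PS)
    return coarse, min(fine, FINE_STAGES)


if __name__ == "__main__":
    coarse, fine = split_delay(800)
    print(coarse, fine, coarse * COARSE_STEP_PS + fine * FINE_STEP_PS)
    # Hypothetical 4-cell fine line with t0 = 20 ps and t1 = 22 ps.
    print(line_delay([1, 0, 1, 1], 20, 22))
```

The fine range of 32 x 2 ps = 64 ps slightly exceeds one 60 ps coarse step, which leaves margin for the coarse-to-fine ratio calibration mentioned above.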
[Figure 4.11: single delay line from X to Y with control bits $d_1, d_2, \ldots, d_n$.]
$$t_{delay} = n t_0 + \sum_{i=1}^{n} d_i\,(t_1 - t_0)$$
$$\tau = \ln(2)\, R_{eq} C_L \qquad (4.4)$$
where ln(x) is the natural logarithm of x. $C_L$ consists of the self load, the input
capacitance of any block driven by the cell, and the capacitance of the interconnect
wires. $R_{eq}$ depends on the size of the transistors in the inverter and is different for
NMOS and PMOS. The $R_{eq}$ of a transistor driven with a full-swing signal is given
by
$$R_{eq} = \frac{3}{4}\,\frac{V_{DD}\,(1 - 7\lambda V_{DD}/9)}{\beta\left((V_{DD} - V_T)V_{DSAT} - V_{DSAT}^2/2\right)} \qquad (4.5)$$
where $\beta = k\,W/L$, and W and L are the transistor width and length respectively. The
parameters k, $V_{DSAT}$ and 𝜆 and the threshold voltage $V_T$ are technology-dependent
constants. According to this analysis, $R_{eq}$ is proportional to L/W, and
within a delay chain consisting of identical unit cells, $C_L$ is approximately proportional
to WL. Thus 𝜏 is proportional to $L^2$ and independent of W to first order.
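The first-order scaling argument can be checked numerically from Eq. 4.4 and Eq. 4.5. All device parameters below (k, supply, threshold, saturation voltage, capacitance per area) are placeholder values chosen for illustration, not 40 nm process data, and the saturation voltage is treated as a constant here.

```python
import math


def r_eq(w, l, k=3e-4, vdd=1.1, vt=0.4, vdsat=0.3, lam=0.1):
    """Equivalent resistance of Eq. 4.5 with beta = k * W / L (placeholder parameters)."""
    beta = k * w / l
    i_dsat = beta * ((vdd - vt) * vdsat - vdsat ** 2 / 2)
    return 0.75 * vdd * (1 - 7 * lam * vdd / 9) / i_dsat


def tau(w, l, c_per_area=0.01):
    """Delay of Eq. 4.4 with a load capacitance proportional to W * L."""
    c_load = c_per_area * w * l
    return math.log(2) * r_eq(w, l) * c_load


if __name__ == "__main__":
    base = tau(1e-6, 40e-9)
    print("W doubled:", tau(2e-6, 40e-9) / base)  # ratio stays at about 1
    print("L doubled:", tau(1e-6, 80e-9) / base)  # ratio is about 4
```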
Based on this conclusion, delay unit cells can generally be divided into two
main categories: current-starved and shunt-capacitor based. The basic idea
of the shunt-capacitor cell is to change $C_L$ by shunting an equivalent switchable
capacitor at the output of an inverter, as shown in Figure 4.12. The
current-starved technique, on the other hand, manipulates the (dis)charging current of the
load capacitor. As shown in Figure 4.15, it actually changes $R_{eq}$ by changing
the W/L ratio. However, the DTC in the ADPLL mainly consumes dynamic power, which
is given by [11]
$$P = C_L V_{DD}^2 f \qquad (4.6)$$
where f is the gate toggling frequency. This equation implies that generating a
large delay by increasing the load capacitance increases the power consumption.
Thus the current-starved delay cell is chosen for the coarse stage, because it is
not wise to cover the whole ckvd2 range by enlarging $C_L$ via the shunt-capacitor
unit cell in terms of power. Besides, the current-starved structure also features
a wide tuning range. The capacitor-based delay cell is more suitable for
implementing the fine stage in terms of linearity. Benefiting from the single-line Vernier
concept, the resolution of this coarse-fine DTC is free from the impact of a low supply
voltage and is well suited to low-power, low-supply applications. (The resolution does not
depend on the absolute gate delay and is thus independent of the supply level.)
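$$\sigma_{V_T} = \frac{A_{V_T}}{\sqrt{WL}} \qquad (4.7)$$
and
$$\frac{\sigma_\beta}{\hat{\beta}} = \frac{A_\beta}{\sqrt{WL}} \qquad (4.8)$$
where $A_{V_T}$ and $A_\beta$ are technology constants and $\hat{\beta}$ is the nominal value.
Though this formula deviates from the real case, especially in the 40 nm technology
node, it can still be used for a qualitative estimation during design. According to
Eq. 4.4, under the assumption that the variations are small and uncorrelated, 𝜏 can be
approximated by a Gaussian distribution with a mean value equal to $\hat{\tau}$ and a standard
deviation of
$$\sigma_\tau = \sqrt{\left(\left.\frac{\partial\tau}{\partial V_T}\right|_{\hat{V}_T,\hat{\beta}}\sigma_{V_T}\right)^2 + \left(\left.\frac{\partial\tau}{\partial\beta}\right|_{\hat{V}_T,\hat{\beta}}\sigma_\beta\right)^2} \qquad (4.9)$$
which evaluates to
$$\sigma_\tau = \hat{\tau}\,\frac{A_\tau}{\sqrt{WL}} \qquad (4.10)$$
where
$$A_\tau = \sqrt{\frac{A_{V_T}^2}{(V_{DD} - \hat{V}_T - V_{DSAT}/2)^2} + A_\beta^2} \qquad (4.11)$$
The detailed derivation is given in the appendix. To first order this can be
considered a form of Pelgrom's law, except that $V_{DSAT}$ is proportional to L
and is therefore not a constant. However, as L decreases in modern technologies, the contribution
from $V_{DSAT}$ becomes trivial and even negligible in 40 nm. The important conclusion
from Eq. 4.10 is that W should be enlarged to reduce mismatch while L should
be kept small, since $\sigma_\tau$ is proportional to $L\sqrt{L}$.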
Consider an N-cell delay line in which all cells are identical, with $\hat{\tau}$ as the mean
delay and $\sigma_\tau$ as the standard deviation. It is then easy to conclude
that the largest standard deviation occurs at the end of the delay line and is
$$\sigma_n = \sigma_\tau\sqrt{n} \qquad (4.12)$$
where n is the number of delay cells turned on; n = N (all cells turned
on) gives the largest error. According to the definition of differential nonlinearity (DNL),
$$DNL_i = \tau_i - \hat{\tau} \qquad (4.13)$$
$$INL_n = \sum_{i=1}^{n}\tau_i - n\hat{\tau} = \sum_{i=1}^{n} DNL_i \qquad (4.14)$$
Thus $INL_n$ has zero mean and a standard deviation proportional to $\sigma_\tau\sqrt{n}$. This
analysis is a meaningful guideline for transistor sizing.
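The square-root growth follows in one step if the cell delays $\tau_i$ are assumed to be independent with common mean $\hat{\tau}$ and variance $\sigma_\tau^2$ (an assumption implied by the uncorrelated-variation model above, written out here as a sketch):

```latex
\begin{aligned}
\mathrm{E}[INL_n]   &= \sum_{i=1}^{n} \mathrm{E}[\tau_i - \hat{\tau}] = 0, \\
\mathrm{Var}[INL_n] &= \sum_{i=1}^{n} \mathrm{Var}[\tau_i] = n\,\sigma_\tau^{2}
\quad\Longrightarrow\quad \sigma_{INL_n} = \sigma_\tau \sqrt{n}.
\end{aligned}
```

As a purely hypothetical example, a per-cell spread of 1 ps over 16 cells would give an end-of-line spread of about 4 ps.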
[Figure: delay cell schematic with VDD and Vctrl.]
[Figure: Monte Carlo histogram of resolution_sf (pass): mu = 58.2435p, sd = 1.21292p, npass = 200, nfail = 0.]
[Figure: current-starved delay cell schematic with VDD, VctrlP and the current-starve transistor.]
[Figure: delay versus digital control code.]
[Figure: nonlinearity (LSB) versus digital control code.]
simulated result is shown in Figure 4.19: the integrated rms jitter is 156 fs, and
the phase noise is far below the reference level (-140 dBc/Hz).
[Figure panels: integrated jitter from the DTC delay chain (rms value = 156 fs), jitter [s] versus frequency [Hz]; phase noise of the DTC delay chain, noise [dBc/Hz] versus frequency [Hz].]
Figure 4.19: (1) JEE of DTC delay chain. (2) Phase noise of DTC delay chain.
The power of the whole DTC is only 20 𝜇W which is rather low. The INL would
be as large as 4 ps.
[Figure: noise [dBc/Hz] versus frequency [Hz].]
In this design, the fractional phase error is tracked by a 2-stage DTC instead
of a complex TDC, with good resolution and linearity at low power. The Bang-Bang
phase detector mainly serves to provide the feedback information. Compared
with the coarse-fine TDC proposed in [12], the DTC-based phase detector
benefits from phase prediction.
Figure 4.22: Supply noise monitor concept for single delay line based DTC.
here. The high-frequency variable clock CKV is first clock gated according
to the reference edge, as shown in Figure 4.23 (CKV has to go through a dummy path
before the time freezer).
[Figure: clock gating of CKVD2 by Fref with D flip-flops, producing CKV_gated; timing of Fref, Fref_dly and CKV.]
The gated CKV, "ckv_gated", is then fed into the replica supply noise monitor
mentioned above, as shown in Figure 4.24. In this way a supply noise monitor,
implemented as a clock-gated replica delay path, is obtained at low
power. Since only one OR gate toggles at the high CKV frequency, the power of
this structure is very low (the delay line still toggles at the reference clock rate). In
short, the potential supply-noise induced spur is suppressed to first order.
[Figure: replica supply noise monitor, with the DTC path (Fref to Fref_dly) and the replica path (CKV_gated to CKV_gated_dly) feeding the time freezer and the phase detector (CKVG).]
[Figure: counter built from toggle registers (inputs IN and clk, outputs Q and OUT).]
The counter block tracks the integer phase of the variable clock by
counting the number of ckvd2 clock cycles. As discussed previously, a 7-bit integer
counter is required. The counter can be implemented as either synchronous or
asynchronous logic. In a synchronous counter, all the flip-flops are clocked by the
input clock. This may lead to a large power consumption, but the outputs are
synchronized and the delay is small. Though an asynchronous counter (which behaves
like a chain of divide-by-2 stages) can save power, its robustness is not satisfactory.
The output of each stage only becomes available some delay after the output of the
preceding stage changes. To compensate for this delay, a long delay
chain is required for each output bit to ensure that the resampled clock samples the
right output. However, the delay chain is PVT sensitive, and the delay chains for the
first few high-frequency bits would burn a lot of power. Once the MSB is sampled at
an improper time, the PLL may lose lock.
Thus, for robustness, a synchronous counter is implemented as shown
in Figure 4.25. Although effort was spent on building it from standard cells only, the
standard DFF in 40 nm cannot handle a frequency of 1.5 GHz at 0.8 V in post-layout simulation.
Thus a custom-designed TSPC DFF is used to replace the standard DFF. The TSPC DFF
is shown in Figure 4.26. Though the core burns more power than the asynchronous
one, it still wins because no delay compensation is needed,
and the final power of the counter is 60 𝜇W.
[Figure: TSPC DFF with clk input and Q, QB outputs.]
[Figure: timing of FREFdly, CKVENB, CKVD2, CKVD2f and CKR, with the time error t_error marked.]
However, an additional phase offset is always introduced during the freezing;
that is, the phase error between the two input signals is generally enlarged
because they pass through different logic gates in this block. This causes a problem
for the phase error quantizer: once the distortion is so large that the error falls outside
the linear detection range, the loop cannot lock at all. In [2], delay chains of
buffers are introduced for compensation. However, this is very PVT sensitive, which
is still risky for the locking of the PLL. In this design, a dummy logic path identical to the
one seen by 𝐶𝐾𝑉𝐷2 is added for 𝐹𝑅𝐸𝐹 to generate a compensation signal, as
shown in the top path in Figure 4.28.
4.3. RF Implementation
The DCO is the soul of the ADPLL, as it provides the most important output carrier signal.
As shown in Figure 1.2, a DCO is at the heart of an all-digital phase-locked loop
(ADPLL). From a digital input called the oscillator tuning word (OTW), it produces a
sinusoidal output with a frequency proportional to the OTW. Metrics such as
power consumption, area and phase noise profile largely depend on this
block. If designing an ADPLL is compared to riding a horse, the DCO is the horse, without
which there is no point in talking about a "PLL" at all (it then actually becomes a
delay-locked loop (DLL)).
General design considerations are discussed first in this section, followed
by the implementation of key building blocks such as the inductor, the gm pair and the
capacitor array. Post-layout simulation results are also shown to verify the design idea.
[Figure: time freezer with the compensated reference path (Fref_dly to Fref_dly_compensated), gating of CKV into CKVG, and the window operation clocked by CKR.]
In this work, the target is a DCO covering a frequency range of 2.2 to 3 GHz with a phase noise of -110 dBc/Hz at 1 MHz offset from the oscillation frequency and a raw frequency resolution of 50 kHz. The design goal is to achieve this phase noise performance over the required tuning range with minimum power consumption. The design variables are the inductor and capacitor values and the width and length of the transistors.
a. Inductor
The power dissipated in the LC tank must be compensated to sustain the oscillation, which places a lower limit on the power consumption of the DCO. This loss can be calculated with a simple tank model. The loss in the inductor due to the finite resistance of the metal is represented as a series resistance $R_s$. The quality factor of on-chip inductors is usually worse than that of the capacitor banks, so the inductor dominates the overall energy loss of the LC tank and the capacitor losses are neglected. A negative resistance models the active element that compensates the LC tank loss. The maximum energies stored in the inductor and in the capacitor are equal and are given by

$$E = \frac{1}{2} L I_{peak}^2 = \frac{1}{2} C V_{peak}^2 \tag{4.16}$$

where $I_{peak}$ and $V_{peak}$ represent the peak current and peak voltage of the LC tank. The power loss due to $R_s$ can be expressed as

$$P_{loss} = \frac{1}{2} I_{peak}^2 R_s = \frac{1}{2} \frac{C V_{peak}^2 R_s}{L} = \frac{1}{2} \frac{V_{peak}^2 R_s}{L^2 \omega^2} = \frac{1}{2} \frac{V_{peak}^2}{L \omega Q} \tag{4.17}$$

where $\omega$ is the operating frequency and $Q$ is the quality factor of the inductor, defined as the ratio of the imaginary part of its impedance to the real part:

$$Q = \frac{\omega L}{R_s} \tag{4.18}$$
According to Eq. 4.17, a larger inductance and a better Q both favour lower power consumption. However, an upper limit has to be set on the inductance. For a given center frequency, a larger L leaves less headroom for the capacitance to be tuned, which limits the tuning range of the DCO. Moreover, a larger L means more coupling capacitance to the substrate and thus a lower self-resonant frequency, beyond which the coil can no longer be considered an inductor. Q, in turn, is more or less limited by the technology. After estimating the fixed parasitic capacitance of the capacitor banks and the active part from post-layout extraction, a value of 8 nH is chosen after some iterations. Figure 4.29 shows the inductance as a function of frequency. The self-resonant frequency, beyond which an inductor behaves like a capacitor, is sufficiently far (>5.5 GHz) from the required operating frequencies (2.3–3 GHz).
Figure 4.30 shows the quality factor of the inductor as a function of frequency. The quality factor is slightly less than 19.
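As a rough numerical illustration of Eq. 4.17, the sketch below evaluates the tank loss for the chosen 8 nH inductor with a Q of about 19. The function name `tank_loss_power` and the 0.25 V peak amplitude are assumptions made for illustration only and are not design values. The result is the loss in the tank alone, not the DC power drawn by the oscillator.

```python
import math


def tank_loss_power(v_peak, inductance, freq_hz, q_factor):
    """Tank loss per Eq. 4.17: P = V_peak^2 / (2 * L * omega * Q)."""
    omega = 2.0 * math.pi * freq_hz
    return v_peak ** 2 / (2.0 * inductance * omega * q_factor)


if __name__ == "__main__":
    V_PEAK = 0.25  # V, hypothetical amplitude (assumption, not a design value)
    for f_hz in (2.2e9, 3.0e9):
        p_loss = tank_loss_power(V_PEAK, 8e-9, f_hz, 19.0)
        print(f"f = {f_hz / 1e9:.1f} GHz: tank loss = {p_loss * 1e6:.1f} uW")
```

Doubling either L or Q in this model halves the loss, which is the trend the paragraph above relies on when arguing for a large inductance.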
b. Active Part
In terms of reducing power, it is often taken as a golden rule that the complementary NMOS-PMOS pair achieves lower power than a single-switch structure such as NMOS-only, because the current is reused by the additional gm. However, this holds only for a high supply. Because of the additional PMOS pair, the voltage headroom is reduced, and the complementary structure no longer wins under a low supply such as 0.4 V in GlobalFoundries 40 nm-LP. Besides, the output swing is also limited by the PMOS. For a single-switch structure such as NMOS-only, however, the swing can be as high as twice the supply, since the oscillator is DC biased at VDD. In this ADPLL, the dynamic divider can be used directly as a buffer for the feedback loop (to connect to the PA in the TX path, an additional AC-coupled buffer is certainly required, but it does not count towards the PLL power budget). However, such a dynamic divider requires a larger input driving swing than a CML-based divider. Considering that the buffer in [2] ended up consuming more than 130 μW, the NMOS-only structure is adopted here to obtain a peak-to-peak swing of at least 450 mV and drive the divider directly without a DCO buffer. The supply can be biased as low as 0.35 V, so much lower power is consumed in the RF part. The implemented circuit is shown in Figure 4.31. The current source is also removed to allow the lower DC bias.
[Figure 4.29: Inductance [H] versus frequency [Hz].]
[Figure 4.30: Q value versus frequency [Hz].]
c. Capacitor Array
Design of the capacitor array is rather challenging, since a tuning range from 2.2 GHz to 2.8 GHz has to be covered with a step size as fine as 50 kHz. This would result in an inconceivably large array if a single thermometer-coded structure were adopted. A segmented structure is therefore popular, dividing the capacitor array into 3 banks: a PVT bank that covers the full required tuning range with margin at the coarsest step, an Acquisition bank that covers one PVT step with ample margin at a medium step, and a Tracking bank that covers one acquisition step with even more headroom at the required resolution. If even finer resolution is required, a ΣΔ modulator can be used.
The step size requirement can be related to the capacitance step by Eq. 4.19. Based on Eq. 4.19, the frequency resolution of each bank is determined by the capacitance resolution of the capacitor array.
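To give a feel for the numbers behind this segmentation, the sketch below uses the textbook LC resonance relation f = 1/(2π√(LC)) and its small-signal form |ΔC| ≈ 2CΔf/f. This is a standard relation and is only assumed here to play the role of Eq. 4.19. The function names are hypothetical, and the 8 nH inductance is the value quoted in the inductor subsection.

```python
import math


def tank_capacitance(freq_hz, inductance):
    """Total tank capacitance that resonates at freq_hz: C = 1 / ((2*pi*f)^2 * L)."""
    return 1.0 / ((2.0 * math.pi * freq_hz) ** 2 * inductance)


def cap_step_for_freq_step(freq_hz, inductance, delta_f):
    """Small-signal capacitance step for a frequency step: |dC| ~= 2 * C * df / f."""
    return 2.0 * tank_capacitance(freq_hz, inductance) * delta_f / freq_hz


def unary_cells_needed(f_min, f_max, step):
    """Unit cells a single unary (thermometer-coded) bank would need."""
    return math.ceil((f_max - f_min) / step)


if __name__ == "__main__":
    L_TANK = 8e-9  # H, inductance quoted in the text
    for f_hz in (2.2e9, 2.8e9):
        d_c = cap_step_for_freq_step(f_hz, L_TANK, 50e3)
        print(f"f = {f_hz / 1e9:.1f} GHz: 50 kHz step needs about {d_c * 1e18:.1f} aF")
    print("unary cells for 2.2-2.8 GHz:", unary_cells_needed(2.2e9, 2.8e9, 50e3))
```

Under these assumptions a single unary bank would need on the order of ten thousand unit cells, which is why the array is split into PVT, Acquisition and Tracking banks.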
Another important requirement of the capacitor banks is their quality factor. The total tank quality factor is given by the parallel combination of the inductor and capacitor quality factors. Hence, the quality factor of the capacitor banks should be made much larger than that of the inductor, so that the overall Q is limited by the on-chip inductor. As mentioned earlier, a Q of around 20 is achievable for the inductor. Hence, a quality factor of at least 50 is targeted for the capacitor banks.
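The effect of this target can be checked with a few lines. The sketch below applies the parallel combination 1/Q_tank = 1/Q_L + 1/Q_C to the values quoted above; the function name `tank_q` is hypothetical.

```python
def tank_q(q_inductor, q_capacitor):
    """Parallel combination of quality factors: 1/Q_tank = 1/Q_L + 1/Q_C."""
    return 1.0 / (1.0 / q_inductor + 1.0 / q_capacitor)


if __name__ == "__main__":
    for q_cap in (20.0, 50.0, 200.0):
        print(f"Q_L = 20, Q_C = {q_cap:.0f}: tank Q = {tank_q(20.0, q_cap):.1f}")
```

With Q_L = 20 and Q_C = 50 the tank Q is 20·50/70, roughly 14.3, so the capacitor bank still costs a noticeable part of the tank Q even when the target is met.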
[Figure: capacitor bank unit cell schematic; labels C, En and Enb.]
Here 𝐶 is mainly contributed by the drain and source junction capacitances of the NMOS switch, the drain junction capacitances of the pull-down transistors, and the parasitics from the MoM capacitor to the substrate. The Q in the off-state is largely determined by that of the MoM capacitors, which is quite high.
The 5-bit control code is binary weighted, and the number of unit cells controlled by bit n is $2^n$. The on-off capacitance is simulated based on values extracted with PVS.
The quality factor is also verified, as shown in the Appendix.
𝐶
△𝐶 = (4.22)
2(𝐶 + 𝐶 )
Figure 4.35: Tracking bank unit cell (schematic labels: Cb, Cs, En, Enb).
Bank       Capacitance   Control bits
PVT        10 fF         5
Tracking   14 aF         9 (6 MSB + 3 LSB)
From this we see that a smaller 𝐶 and a larger 𝐶 give a finer resolution.
The tracking bank is also segmented, for less area and less fixed capacitance.
The on-off capacitance is simulated based on values extracted with PVS.
The layout is shown in Figure 4.36. Capacitance, Q and coverage range simulation results are attached in Appendix A.
The capacitor banks' performance is summarised in Table 4.1.
d. DCO summary
The top-level DCO layout is shown in Figure 4.37. The size is 270 μm × 390 μm, which is mainly determined by the inductor.
The power consumption is only 168 μW at the lowest output frequency, while offering a peak-to-peak output swing larger than 500 mV, as shown in Figure 4.38. From Figure 4.39, the phase noise is about -109 dBc/Hz at 1 MHz offset.
[Figure 4.38: DCO output waveform, amplitude [V] versus time [s].]
[Figure 4.39: DCO phase noise [dBc/Hz] versus frequency [Hz].]
[Figure: schematic with clock input CKV, reset signals rst and nrst, supply VDD and quadrature outputs I+, Q+, I- and Q-.]
edge. Instead of turning on more delay in the reference path, we can simply choose the closest variable phase according to how the accumulated fractional reference phase 𝑅 compares with 0.25, 0.5 or 0.75. The required coarse DTC range is thus reduced.
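The selection rule can be sketched as follows. This is a minimal illustration and not the implemented logic: the function name `select_phase`, the choice of the phase at or just below the fractional phase, and the index ordering of the four phases are all assumptions.

```python
def select_phase(frac_phase, num_phases=4):
    """Pick one of num_phases equally spaced CKV phases for a fractional phase.

    Returns (index, residue): index is the selected phase (0..num_phases-1) and
    residue is the remaining fraction of the period Tv left for the DTC to cover.
    """
    if not 0.0 <= frac_phase < 1.0:
        raise ValueError("frac_phase must lie in [0, 1)")
    # For four phases the decision thresholds are 0.25, 0.5 and 0.75.
    index = int(frac_phase * num_phases)
    residue = frac_phase - index / num_phases
    return index, residue


if __name__ == "__main__":
    for frac in (0.1, 0.3, 0.6, 0.9):
        idx, res = select_phase(frac)
        print(f"fractional phase {frac:.2f}: phase index {idx}, residue {res:.3f} Tv")
```

In this sketch the residue never exceeds a quarter of the period, which is the sense in which the required coarse DTC range shrinks.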
However, the layout should be drawn carefully, as shown in Figure 4.42.
In simulation and modelling, this architecture consumes similar power to the pure coarse-fine DTC structure, and the mismatch among the four phases can cause a serious fractional spur issue. This method is therefore not implemented in this tape-out. The phase noise of the divider should also be kept much lower than that of the DCO output, as shown in Figure 4.43.
[Figure: timing diagram labelled TDC, with coarse and fine quantization of FREF against the CKV phases CKV_0, CKV_3, CKV_2, CKV_1 and CKV_0_next over one period Tv.]
[Figure 4.43: Divider phase noise [dBc/Hz] versus frequency [Hz].]
References
[1] R. B. Staszewski and P. T. Balsara, All-digital frequency synthesizer in deep-
submicron CMOS (John Wiley & Sons, 2006).
[2] V. Chillara, Y.-H. Liu, B. Wang, A. Ba, M. Vidojkovic, K. Philips, H. de Groot, and
R. Staszewski, 9.8 An 860𝜇W 2.1-to-2.7GHz all-digital PLL-based frequency
modulator with a DTC-assisted snapshot TDC for WPAN (Bluetooth Smart and
ZigBee) applications, in Solid-State Circuits Conference Digest of Technical
Papers (ISSCC), 2014 IEEE International (2014) pp. 172–173.
In previous chapters, the ADPLL has been analysed at the system level in terms of its
basic transfer function and fractional spur estimation, with insight into the
building blocks, simulation methodology and performance analysis. Forging
all these design considerations into a feasible solution (ultimately a working chip)
takes significant effort at the top level of the system. This chapter presents
the top-level layout completion, summarized top-level simulation results and
the final test plan.
104 5. ADPLL Top Level Completion, Simulation and Test Plan
3. 𝑉𝐷𝐷 is the supply for all the synthesized digital blocks.
4. 𝑉𝐷𝐷 is the supply of the variable counter, kept separate to isolate the impact of its frequency toggling from the supply-sensitive DTC.
5. 𝑉𝐷𝐷 is the supply for the output buffer, kept separate since the power of the output buffer is not included in this ADPLL design.
5.3.2. Power
The power of the digital part is simulated at schematic level only. The other
parts are simulated with post-layout simulation based on the PVS-extracted file. The
power budget is shown in Figure 5.3 and Table 5.1. The total power is less
than 500 μW, and the pie chart shows that the power consumption of this PLL
is dominated by the DCO part.
Block                  Power (μW)
DCO+divider            210
DTC+BangBang           20
Counter+time freezer   65
as long as the gain is correctly estimated, and this can be observed in Figure 5.4. Since the 2 ps resolution of the DTC is at the same level as the thermal noise, the potential spurs due to quantization are already randomized into noise. The black curve is the theoretically estimated result based on the TDC-based ADPLL model. As mentioned before, the bandwidth of the Bang-Bang based DPLL is shaped by the thermal noise, so the real bandwidth is expected to differ slightly from the theoretical bandwidth of the TDC-based DPLL (depicted in black). The integrated RMS jitter is now 600 fs, even in the close-integer channel. This can be explained: as long as the DTC is fine enough and has good linearity, the Bang-Bang phase detector works almost as in integer-N mode, which is the mode in which it is known to perform well.
[Figure 5.3: Power breakdown. DCO+divider 49%, low-speed digital 30%, counter+time freezer 16%, DTC+BangBang 5%.]
Based on this, the PLL with no linearity issue would generate almost the same
phase noise for each channel according to simulation.
[Figure 5.4: Spectrum of the SSB phase noise at CKV [dBc/Hz] versus frequency [Hz].]
The fractional spur level in the worst case, including nonlinearity according to Monte Carlo post-layout simulation, is also good, as shown in Figure 5.5. The fractional spur level of -45 dB agrees with the formula derived in Chapter 2 to within 2 dB. This deviation mainly results from the fact that the real INL is no longer the ideal sinusoid assumed in the derivation. Nevertheless, this spur level is already good enough for most wireless standards (far better than the BLE specification requires). The integrated RMS jitter is around 1.2 ps.
The fractional spur levels in both the close-integer channels and the remaining channels are now reduced to a satisfactory level. The simulation is based on a reference input with a thermal noise level of -138 dBc/Hz.
To sum up, for the channel with the worst fractional spur, such as FCW=38.0001, the result is shown in Figure 5.5, with an integrated RMS jitter as large as 1.2 ps. Other channels with a fractional part larger than 0.001 are randomly checked, and their integrated jitter is around 0.8 ps (other normal channels, including a close-integer one such as FCW=38.001, are more or less the same). The result for channel FCW=38.001 is shown in Figure 5.6.
[Figure 5.5: Spectrum of the SSB phase noise at CKV [dBc/Hz] versus frequency [Hz] (FFT: len=4645142, rbw=300); data cursor at X: 2778, Y: −76.11.]
[Figure 5.6: Spectrum of the SSB phase noise at CKV [dBc/Hz] versus frequency [Hz] (FFT: len=4645251, rbw=300); data cursor at X: 3.111e+04, Y: −74.52.]
[Figure 5.7: Integrated jitter (ps^2) versus power (mW), comparing This Work with ISSCC'12 CP [1], ISSCC'14 [2], ISSCC'13 [3], JSSC'11 bang-bang [4], ISSCC'11 [5], JSSC'04 [6] and ISSCC'04 [7].]
$$FoM = 10 \log\left[\left(\frac{\sigma_t}{1\,\mathrm{s}}\right)^2 \cdot \left(\frac{P}{1\,\mathrm{mW}}\right)\right] \tag{5.1}$$
In the worst case (1.2 ps RMS jitter together with 460 μW power consumption), the estimated FoM could be as good as -241.7 dB, while in the good case (0.8 ps integrated jitter) the FoM could be as good as -245 dB. However, some deviation will always occur, which is why I drew a red range around the best FoM point to represent a tolerable and normal deviation between simulation and measurement, considering the noisy environment, complex nonlinearity and so on. This is shown in Figure 5.7.
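The quoted FoM values can be reproduced with a short calculation. The sketch below assumes the usual jitter-power figure of merit, in which the RMS jitter normalised to 1 s is squared before being multiplied by the power normalised to 1 mW; the function name `pll_fom_db` is hypothetical.

```python
import math


def pll_fom_db(jitter_rms_s, power_w):
    """FoM in dB: 10 * log10[(sigma / 1 s)^2 * (P / 1 mW)]."""
    return 10.0 * math.log10((jitter_rms_s / 1.0) ** 2 * (power_w / 1e-3))


if __name__ == "__main__":
    for jitter_s in (1.2e-12, 0.8e-12):
        fom = pll_fom_db(jitter_s, 460e-6)
        print(f"jitter = {jitter_s * 1e12:.1f} ps, P = 460 uW: FoM = {fom:.1f} dB")
```

With 460 μW, both cases land within a fraction of a dB of the -241.7 dB and -245 dB quoted above, which supports reading the jitter term as squared.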
DCO open loop test. The open-loop test mode of the DCO is available, as mentioned in Chapter 4. It is realized by the fact that the OTW can be read from the external DCO control signals via SPI from the FPGA. This test will be run before the closed-loop test of the ADPLL.
DTC external control. The DTC control word can likewise be read from the external DTC control signals via SPI from the FPGA.
Phase error observation. The phase error generated by the digital phase error detection logic can be saved to registers and read out via the SPI as well.
PCB. The PCB is implemented with OrCAD, and the measurements will start from the middle of December.
References
[1] Y.-H. Liu, X. Huang, M. Vidojkovic, K. Imamura, P. Harpe, G. Dolmans, and
H. De Groot, A 2.7 nJ/b multi-standard 2.3/2.4 GHz polar transmitter for wireless
sensor networks, in Solid-State Circuits Conference Digest of Technical Papers
(ISSCC), 2012 IEEE International (IEEE, 2012) pp. 448–450.
[3] J.-W. Lai, C.-H. Wang, K. Kao, A. Lin, Y.-H. Cho, L. Cho, M.-H. Hung, X.-Y. Shih,
C.-M. Lin, S.-H. Yan, Y.-H. Chung, P. Liang, G.-K. Dehng, H.-S. Li, G. Chien, and
R. Staszewski, A 0.27mm2 13.5dBm 2.4GHz all-digital polar transmitter using
34DPA in 40nm CMOS, in Solid-State Circuits Conference Digest of Technical
Papers (ISSCC), 2013 IEEE International (2013) pp. 342–343.
114 6. Conclusion
118 A. Appendix
3 inv_ktdc[11] 0
4 inv_ktdc[12] 0
5 inv_ktdc[13] 1
6 inv_ktdc[14] 0
7 inv_ktdc[15] 0
03 0 DTC_coarse[0] SPI 0
1 DTC_coarse[1] 0
2 DTC_coarse[2] 0
3 DTC_coarse[3] 0
4 DTC_fine[0] SPI 0
5 DTC_fine[1] 0
6 DTC_fine[2] 0
7 DTC_fine[3] 0
04 0 FCW[0] DEFAULT 38 CLOSE IN 0
1 FCW[1] 1
2 FCW[2] 1
3 FCW[3] 0
4 FCW[4] 0
5 FCW[5] 0
6 FCW[6] 0
7 FCW[7] 0
05 0 FCW[8] 0
1 FCW[9] 0
2 FCW[10] 0
3 FCW[11] 0
4 FCW[12] 0
5 FCW[13] 0
6 FCW[14] 0
7 FCW[15] 0
06 0 FCW[16] 0
1 FCW[17] 0
2 FCW[18] 1
3 FCW[19] 0
4 FCW[20] 0
5 FCW[21] 1
A.2. SPI register table 125
6 FCW[22] 0
7
07 0 inv_kdcomod[0] 1
1 inv_kdcomod[1] 0
2 inv_kdcomod[2] 0
3 inv_kdcomod[3] 0
4 inv_kdcomod[4] 1
5 inv_kdcomod[5] 0
6 mod_on 0
7
08 0 memdcop[0] 0
1 memdcop[1] 0
2 memdcop[2] 1
3 memdcop[3] 0
4 memdcop[4] 0
5
6
7
09 0 memdcoa[0] 0
1 memdcoa[1] 1
2 memdcoa[2] 1
3 memdcoa[3] 0
4 memdcoa[4] 0
5 memdcoa[5] 0
6
7
10 0 memdcot[0] 0
1 memdcot[1] 0
2 memdcot[2] 0
3 memdcot[3] 1
4 memdcot[4] 1
5 memdcot[5] 0
6 memdcot[6] 0
7 memdcot[7]
11 0 memdcot[8] 0
1
2 alphaad[0] 0
3 alphaad[1] 1
4 alphaa[0] 0
5 alphaa[1] 1
6 alphap[0] 0
7 alphap[1] 1
12 0
1 rho[1] 1
2 rho[2] 1
3 rho[3] 1
4 rho[4] 0
5 alphat[0] 0
6 alphat[1] 0
7 alphat[2] 1
13 0 pvt_mode[0] 1
1 pvt_mode[1] 0
2 pvt_mode[2] 0
3 BUFF_en 1
4 iiren 0
5 lambda[0] 0
6 lambda[1] 1
7 lambda[2] 0
14 0
1
2 kdcop[0] 0
3 kdcop[1] 1
4 kdcop[2] 1
5 abmode[0] 0
6 abmode[1] 1
7 abmode[2] 0
15 0 dtc_mu[0] 1
1 dtc_mu[1] 1
2 dtc_mu[2] 0
3 dtc_mu[3] 1
A.3. Derivation for Mismatch for DTC unit cell design 127
4 dtccal 0
5 kdcoa[0] 0
6 kdcoa[1] 0
7 kdcoa[2] 1
16 0 kdcot[0] 0
1 kdcot[1] 1
2 kdcot[2] 0
3 kdcot[3] 0
4 kdcot[4] 0
5 kdcot[5] 1
6 kdcot[6] 1
7 kdcot[7] 0
17 0 Kres[0] 0
1 Kres[1] 1
2 Kres[2] 0
3 Kres[3] 0
4 Kres[4] 0
5 Kres[5] 1
6 Kres[6] 1
7 Kres[7] 0
18 0 Kres[8] 0
1 rotate_en 1
2
3
4
5
6
7
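$$\sigma_{\tau} = \ln(2)\, C \sqrt{\left(\left.\frac{\partial R}{\partial V_T}\right|_{\hat{V}_T,\hat{\beta}} \sigma_{V_T}\right)^2 + \left(\left.\frac{\partial R}{\partial \beta}\right|_{\hat{V}_T,\hat{\beta}} \sigma_{\beta}\right)^2} \tag{A.1}$$

$$\frac{\sigma_{\tau}}{\hat{\tau}} = \frac{1}{\hat{R}} \sqrt{\left(\left.\frac{\partial R}{\partial V_T}\right|_{\hat{V}_T,\hat{\beta}} \sigma_{V_T}\right)^2 + \left(\left.\frac{\partial R}{\partial \beta}\right|_{\hat{V}_T,\hat{\beta}} \sigma_{\beta}\right)^2} \tag{A.2}$$

where $\left.\frac{\partial R}{\partial V_T}\right|_{\hat{V}_T,\hat{\beta}}$ is

$$\left.\frac{\partial R}{\partial V_T}\right|_{\hat{V}_T,\hat{\beta}} = \frac{\hat{R}}{V_{GS} - \hat{V}_T - V_{DS}/2} \tag{A.3}$$

and $\left.\frac{\partial R}{\partial \beta}\right|_{\hat{V}_T,\hat{\beta}}$ is

$$\left.\frac{\partial R}{\partial \beta}\right|_{\hat{V}_T,\hat{\beta}} = -\frac{\hat{R}}{\hat{\beta}} \tag{A.4}$$

Substituting them into Eq. A.2, we get Eq. 4.10.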
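The substitution can be checked numerically. In the sketch below, the two partial derivatives of Eqs. A.3 and A.4 are inserted into Eq. A.2, so that the nominal resistance cancels and the relative delay spread becomes the root-sum-square of a threshold term and a current-factor term. The function name and the lumped parameter `v_eff` (standing for the voltage in the denominator of Eq. A.3) are assumptions made for illustration, and the input values are hypothetical.

```python
import math


def relative_delay_sigma(sigma_vt, v_eff, rel_sigma_beta):
    """Relative delay spread after substituting Eqs. A.3 and A.4 into Eq. A.2.

    sigma_vt:       threshold-voltage standard deviation [V]
    v_eff:          denominator voltage of Eq. A.3 [V] (lumped, hypothetical name)
    rel_sigma_beta: sigma_beta / beta (dimensionless)
    """
    return math.hypot(sigma_vt / v_eff, rel_sigma_beta)


if __name__ == "__main__":
    # Hypothetical mismatch numbers, for illustration only.
    spread = relative_delay_sigma(sigma_vt=3e-3, v_eff=0.3, rel_sigma_beta=0.005)
    print(f"relative delay spread = {spread * 100:.2f} %")
```

The form shows that a small denominator voltage in Eq. A.3 amplifies the threshold mismatch term, while the current-factor term enters unchanged.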