Low Power VLSI Design: J.Ramesh ECE Department PSG College of Technology
J.Ramesh
ECE Department
PSG College of Technology
Introduction
Progress in semiconductor technology
Minimum Feature size
Consequences
- reduced device capacitances
- higher integration densities
- performance improvements
- increased circuit complexities
Power
In the past, the main design concerns were area, performance, cost and reliability.
In recent years, power has been given comparable weight to area and speed considerations.
TECHNICAL NEEDS
Packaging costs
Power supply rail design
Chip and system cooling costs
Noise immunity and system reliability
Battery life (in portable systems)
Environmental concerns
Office equipment accounted for 5% of
total US commercial energy usage in 1993
Energy Star compliant systems
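Short-circuit power
Leakage power
[Figure: switch model of the CMOS inverter with supply VDD, input Vin, output Vout and load CL. (a) Low-to-high transition, Vin = 0: CL charges through Rp. (b) High-to-low transition, Vin = VDD: CL discharges through Rn.]
$t_{pHL} = f(R_{on} C_L) = 0.69\, R_{on} C_L$
$i_c(t) = C_L \dfrac{dv_c(t)}{dt}$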
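During the charging cycle (t0 to t1), energy drawn from the supply:
$E_s = \int_{t_0}^{t_1} V\, i_c(t)\, dt = C_L V^2$
Energy stored in the capacitor:
$E_{cap} = \int_{t_0}^{t_1} v_c(t)\, i_c(t)\, dt = \frac{1}{2} C_L V^2$
Energy dissipated at Rc:
$E_c = E_s - E_{cap} = \frac{1}{2} C_L V^2$
During the discharging cycle (t1 to t2), the stored energy $E_{cap} = \frac{1}{2} C_L V^2$ is dissipated at Rd. Because $i_c(t)$ is negative while the capacitor discharges:
$E_d = -\int_{t_1}^{t_2} v_c(t)\, i_c(t)\, dt = \frac{1}{2} C_L V^2$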
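The charging-cycle energy split can be checked numerically. The following is a minimal sketch, assuming an ideal step input and a linear charging resistance; the function name and parameters are illustrative and not taken from the slides. It integrates the supply and capacitor energies and suggests that half of C_L V^2 is lost in the resistor regardless of its value.

```python
import math

def charging_energies(c_load, v_dd, r_charge, steps=100_000, t_end_taus=20.0):
    """Integrate one RC charging event; return (E_supply, E_cap, E_resistor)."""
    tau = r_charge * c_load
    dt = t_end_taus * tau / steps
    e_supply = 0.0
    e_cap = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                          # midpoint rule
        v_c = v_dd * (1.0 - math.exp(-t / tau))     # capacitor voltage
        i_c = (v_dd - v_c) / r_charge               # charging current
        e_supply += v_dd * i_c * dt                 # energy drawn from the supply
        e_cap += v_c * i_c * dt                     # energy stored on C_L
    return e_supply, e_cap, e_supply - e_cap
```

Calling `charging_energies(1e-12, 5.0, 1e3)` and `charging_energies(1e-12, 5.0, 1e5)` should give nearly identical energies, which illustrates that the dissipated energy does not depend on the resistance.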
Dynamic Power Consumption
[Figure: CMOS inverter with supply Vdd, input Vin, output Vout and load CL]
$P_{dyn} = C_L V_{dd}^2 f_{0 \to 1}$
Supply voltage: has been dropping with successive generations
Activity factor: how often, on average, do wires switch?
Clock frequency: increasing
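[Figure: CMOS inverter with input Vin, output Vout and load CL, showing the short-circuit current Isc]
Impact of CL on Psc
[Figure: two inverter cases. Large CL: Isc ≈ 0. Small CL: Isc ≈ Imax.]
Ipeak as a Function of CL
[Figure: Ipeak (A, x 10^-4) versus time (sec, x 10^-10) for CL = 20 fF, 100 fF and 500 fF; 500 psec input slope]
[Figure: normalized P versus tsin/tsout for VDD = 3.3 V, 2.5 V and 1.5 V; W/Lp = 1.125 µm/0.25 µm, W/Ln = 0.375 µm/0.25 µm, CL = 30 fF]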
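Leakage power = VDD x Ileakage
[Figure: inverter output node Vout showing drain junction leakage, gate leakage and sub-threshold current]
Leakage as a Function of VT
[Figure: leakage plotted on a logarithmic axis from 10^-12 to 10^-2]
A 90 mV/decade VT roll-off means that each 270 mV (3 x 90 mV) increase in VT gives 3 orders of magnitude reduction in leakage (but adversely affects performance).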
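The relation between a VT increase and the leakage reduction can be written out as a small calculation. This is a sketch under the assumption that sub-threshold leakage falls by exactly one decade per fixed number of millivolts of VT (90 mV/decade here); the function names are illustrative.

```python
def leakage_reduction_factor(delta_vt_mv, slope_mv_per_decade=90.0):
    """Factor by which leakage drops when VT is raised by delta_vt_mv."""
    return 10.0 ** (delta_vt_mv / slope_mv_per_decade)

def delta_vt_for_decades(decades, slope_mv_per_decade=90.0):
    """VT increase (mV) needed for the requested number of decades of reduction."""
    return decades * slope_mv_per_decade
```

With the default slope, `delta_vt_for_decades(3)` is 270 mV, and `leakage_reduction_factor(270)` is a factor of 1000.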
Leakage
[Figure: inverter with supply Vdd and output Vout showing drain junction leakage and sub-threshold current]
Sub-threshold current is one of the most compelling issues in low-energy circuit design!
[Figure: reverse leakage current through the p+ junctions with Vdd applied]
$I_{DL} = J_S \times A$
Occurs when the source or drain of an N transistor is at Vdd.
$J_S$ = 15 pA/µm² for a 1.2 µm CMOS technology.
PN junctions are formed at the source or drain of transistors because of a parasitic effect of the bulk CMOS device structure.
The junction current at the source or drain of transistors is picked up through the bulk or well contact.
$J_S$ = 10-100 pA/µm² at 25 °C for 0.25 µm CMOS.
$J_S$ doubles for every 9 °C increase in temperature!
Even though the transistor is logically turned off, there is a non-zero leakage current through the channel at the microscopic level.
Parameter                 CL018     CL018 LP   CL018 ULP   CL018 HS   CL015 HS   CL013 HS
Vdd                       1.8 V     1.8 V      1.8 V       2 V        1.5 V      1.2 V
Tox (effective, Å)        42        42         42          42         29         24
Lgate                     0.16 µm   0.16 µm    0.18 µm     0.13 µm    0.11 µm    0.08 µm
IDSat (n/p) (µA/µm)       600/260   500/180    320/130     780/360    860/370    920/400
Ioff (leakage) (pA/µm)    20        1.60       0.15        300        1,800      13,000
VTn                       0.42 V    0.63 V     0.73 V      0.40 V     0.29 V     0.25 V
FET Perf. (GHz)           30        22         14          43         52         80
[Figure: Ileakage (nA/µm) versus Temp (°C). From De, 1999]
[Figure: static current Istat in a gate with Vin = 5 V, output Vout and load CL]
Dynamic power (~90% today and decreasing relatively)
Short-circuit power (~8% today and decreasing absolutely)
Leakage power (~2% today and increasing)
Energy     Design Time                                     Non-active Modules                          Run Time (Variable Throughput/Latency)
Active     Logic Design, Reduced Vdd, Sizing, Multi-Vdd    Clock Gating                                DFS, DVS (Dynamic Freq, Voltage Scaling)
Leakage    + Multi-VT                                      Sleep Transistors, Multi-Vdd, Variable VT   + Variable VT
[Figure: CMOS inverter (both transistors W = 22u, L = 2u) with input vin, output vout and load C = 1 pF. Plots of v(vout) and v(vin), Voltage (V) versus Time (ns), 0 to 100 ns; CL = 1 pF]
[Figure: the same inverter with load C = 10 pF. Plots of v(vout) and v(vin), Voltage (V) versus Time (ns), 0 to 50 ns; CL = 10 pF]
[Figure: the same inverter with load C = 25 pF. Plots of v(vout) and v(vin), Voltage (V) versus Time (ns), 0 to 100 ns; CL = 25 pF]
POWER RESULTS
1.25 micron Technology
CL        POWER (Watts)
1 pF      3.16 x 10^-4
10 pF     3.6 x 10^-3
25 pF     1.486 x 10^-2
[Figure: inverter waveforms for 1.25 micron Technology: v(vout) and v(vin), Voltage (V) versus Time (ns), 0 to 100 ns]
[Figure: inverter waveforms for 0.18 micron Technology: v(vout) and v(vin), Voltage (V) versus Time (ns), 0 to 100 ns]
POWER RESULTS
Technology (micron)   POWER (Watts)
1.25                  2.9 x 10^-3
0.18                  6.73 x 10^-5
$E_{0 \to 1} = C_L V_{dd} (V_{dd} - V_t)$
[Figure: v(vout) and v(vin), Voltage (V) versus Time (ns), 0 to 50 ns, for the input bit streams SV = 11110000, SV = 10101010 and SV = 11101011]
Bit Stream    Number of Switching Transitions
1111100000    9.5 x 10^-5
1010101010    2.8 x 10^-4
1110000000    1.44 x 10^-4
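NAND GATE
clear all;
close all;
clc;
% P(0->1) = P(out = 0) * P(out = 1); for a NAND gate P(out = 0) = pa*pb
P = @(pa, pb) (1 - pa.*pb) .* (pa.*pb);
figure(1);
ezsurfc(P, [0, 1.0, 0, 1.0]);
title('NAND GATE');   % set the title after plotting so it is not overwritten
view(40, 60);
NOR GATE
clc;
clear all;
% For a NOR gate P(out = 1) = (1 - pa)*(1 - pb) = 1 - pa - pb + pa*pb
P = @(pa, pb) (1 - pa - pb + pa.*pb) .* (pa + pb - pa.*pb);
figure(3);
ezsurf(P, [0, 1.0, 0, 1.0]);
title('NOR GATE');
view(30, 30);
XOR GATE
clc;
clear all;
% For an XOR gate P(out = 1) = pa + pb - 2*pa*pb
P = @(pa, pb) (pa + pb - 2*pa.*pb) .* (1 - pa - pb + 2*pa.*pb);
figure(5);
ezsurfc(P, [0, 1.0, 0, 1.0]);
title('XOR GATE');
NOT GATE
clc;
clear all;
P = @(pa) (1 - pa) .* pa;
figure(7);
fplot(P, [0, 1.0]);   % single input, so plot a curve instead of a surface
title('NOT GATE');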
COMPARISON TABLE
GATE          EXPRESSION                         P0->1 FOR Pa=Pb=0.5
NAND / AND    PaPb(1-(PaPb))                     0.1875
NOR / OR      (Pa+Pb-PaPb)(1-Pa-Pb+PaPb)         0.1875
XNOR / XOR    (1-Pa-Pb+2PaPb)(Pa+Pb-2PaPb)       0.25
NOT           Pa(1-Pa)                           0.25
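The table entries can be reproduced with a few lines of Python. This is a minimal sketch, assuming statistically independent inputs and no temporal correlation, so that P0->1 is the product of the probability that the output is 0 and the probability that it is 1; the helper names are illustrative.

```python
def transition_probability(p_one):
    """P(0->1) = P(out = 0) * P(out = 1) for uncorrelated successive values."""
    return (1.0 - p_one) * p_one

# Probability that each gate output is 1, given independent input probabilities.
OUTPUT_ONE_PROBABILITY = {
    "AND":  lambda pa, pb: pa * pb,
    "NAND": lambda pa, pb: 1.0 - pa * pb,
    "OR":   lambda pa, pb: pa + pb - pa * pb,
    "NOR":  lambda pa, pb: 1.0 - pa - pb + pa * pb,
    "XOR":  lambda pa, pb: pa + pb - 2.0 * pa * pb,
    "XNOR": lambda pa, pb: 1.0 - pa - pb + 2.0 * pa * pb,
}

def gate_transition_probability(gate, pa, pb):
    """0->1 transition probability at the output of a two-input gate."""
    return transition_probability(OUTPUT_ONE_PROBABILITY[gate](pa, pb))

def inverter_transition_probability(pa):
    """0->1 transition probability at the output of an inverter."""
    return transition_probability(1.0 - pa)
```

For Pa = Pb = 0.5 this gives 0.1875 for NAND, AND, NOR and OR, and 0.25 for XOR, XNOR and NOT.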
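[Figure: (a) Chain structure and (b) Tree structure, each combining inputs x1, x2, x3 and x4 into output F]
Transition probability at each gate output (first gate, second gate, F):
Chain    3/16    7/64    15/256
Tree     3/16    3/16    15/256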
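The chain and tree values are consistent with two-input AND gates driven by independent inputs that are 1 with probability 1/2. The sketch below works under that assumption (the gate type is inferred from the numbers, not stated on the slide) and uses exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def p01(p_one):
    """0->1 transition probability for a node that is 1 with probability p_one."""
    return (1 - p_one) * p_one

def chain_activities(p_inputs):
    """Chain of two-input ANDs: ((x1 x2) x3) x4 ...; P(0->1) at each gate output."""
    p_one = p_inputs[0]
    activities = []
    for p in p_inputs[1:]:
        p_one = p_one * p              # AND output is 1 only if both inputs are 1
        activities.append(p01(p_one))
    return activities

def tree_activities(p_inputs):
    """Balanced tree for four inputs: (x1 x2), (x3 x4), then F."""
    x1, x2, x3, x4 = p_inputs
    first = x1 * x2
    second = x3 * x4
    return [p01(first), p01(second), p01(first * second)]
```

With all four inputs at `Fraction(1, 2)`, the chain gives 3/16, 7/64, 15/256 and the tree gives 3/16, 3/16, 15/256. Passing unequal probabilities shows how the preferred structure depends on the input statistics.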
[Figure: transistor-level schematics of the cells used in the chain implementation (stage with inputs bin and cin, INVERTER, AND stage with inputs ain and xin); all transistors W = 22u, L = 2u]
FUNCTION F=A(B+C)
[Figure: chain implementation block diagram: bin and cin are combined into B+C, which is combined with ain through the inverter and AND stages to produce out]
[Figure: simulated waveforms, Voltage (V) versus Time (ns), 0 to 100 ns, shown for two input cases: A-INPUT v(ain), B-INPUT v(bin), C-INPUT v(cin), OUTPUT v(out)]
TREE IMPLEMENTATION OF F=A(B+C)
[Figure: transistor-level schematics of the A NAND B and A NAND C gates, the INVERTER and the NOR gate; all transistors W = 22u, L = 2u]
TREE F=A(B+C)
[Figure: tree implementation block diagram: A NAND B and A NAND C, each followed by an INVERTER, then a NOR GATE and a final INVERTER producing out]
[Figure: simulated waveforms, Voltage (V) versus Time (ns), 0 to 100 ns, shown for two input cases: v(ain), B-INPUT v(bin), C-INPUT v(cin), OUTPUT v(out)]
CKT IMPLEMENTATION      I/P PROBABILITY: 0.5 PROBABILITY    I/P PROBABILITY: UNEQUAL PROBABILITY
CHAIN IMPLEMENTATION
TREE IMPLEMENTATION
POWER ANALYSIS
CIRCUIT LEVEL
GATE LEVEL
ARCHITECTURAL LEVEL
Circuit Level
Transistor and Gate Sizing
Network Restructuring
Special Latches and Flip Flops
[Figure: three inverter sizings, labelled with the W and L of the PMOS and NMOS transistors]
Wp/Lp      Wn/Ln     Trise    Tfall    Pleakage
20/1       10/1      1        1        1
20/1       10/2      1        2        0.1
20/0.5     10/1      0.5      1        1
Network Restructuring
Four different circuit implementations of Y = A(B + C)
[Figure: four transistor networks (a), (b), (c) and (d), each powered from VDD and implementing Y = A(B+C)]
Logic Level
Signal Gating
Logic Encoding
Precomputation Logic
GATE REORGANIZATION
Network reorganization is applied to the gate-level network to produce logically equivalent networks with different qualities of power, area and delay.
Logic reconstruction techniques.
LOCAL RECONSTRUCTION
Transform one logic circuit into another that is functionally equivalent.
Local reconstruction rules.
Gate reorganization applies a series of local transformations.
The best among the many circuits generated and evaluated is chosen.
TRANSFORMATION
OPERATORS
COMBINE -> hides high-frequency nodes inside the cell so that the node capacitance is not switched.
DECOMPOSE and DUPLICATE -> separate the critical path from the non-critical path.
DELETE -> reduces circuit size.
ADD -> provides an intermediate circuit that might eventually lead to a better one.
SIGNAL GATING
Mask unwanted switching activities.
Methods
AND or OR gates.
Latch or Flip flops.
Transmission gate or Tristate buffer.
[Figure: signal gating implementations, including (c) Latch / FF]
Binary Code                  Gray Code
Sequence    No. Toggles      Sequence    No. Toggles
000         3                000         1
001         1                001         1
010         2                011         1
011         1                010         1
100         3                110         1
101         1                111         1
110         2                101         1
111         1                100         1
Toggle Activities of Binary Vs Gray Code Counter
No. of bits n    Binary Bn = 2(2^n - 1)    Gray Gn = 2^n    Bn / Gn
2                6                         4                1.5
3                14                        8                1.75
4                30                        16               1.88
5                62                        32               1.94
6                126                       64               1.97
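The toggle counts can be reproduced by stepping an n-bit counter through one full cycle, including the wrap-around from the last code back to the first. This is a minimal sketch; the helper names are illustrative, and the Gray code used is the standard reflected binary code.

```python
def binary_to_gray(value):
    """Standard reflected binary Gray code of a non-negative integer."""
    return value ^ (value >> 1)

def count_toggles(sequence):
    """Total bit flips over one full cycle of the sequence (with wrap-around)."""
    total = 0
    for i, code in enumerate(sequence):
        nxt = sequence[(i + 1) % len(sequence)]
        total += bin(code ^ nxt).count("1")     # Hamming distance between codes
    return total

def counter_toggles(n_bits):
    """Return (binary_toggles, gray_toggles) for one cycle of an n-bit counter."""
    counts = list(range(2 ** n_bits))
    gray = [binary_to_gray(c) for c in counts]
    return count_toggles(counts), count_toggles(gray)
```

For three bits this returns (14, 8), matching Bn = 2(2^n - 1) and Gn = 2^n, and the ratio approaches 2 as n grows.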
Data Representation
2's Complement
Sign Magnitude
Sign Extension (MSB sign bits switch on positive-to-negative transitions)
LOGIC ENCODING
Num. bits    Regular bus E[P]    Invert bus E[Q]    Invert / Regular E[Q]/E[P]
2            1                   0.75               0.75
4            2                   1.56               0.781
8            4                   3.27               0.817
16           8                   6.83               0.854
32           16                  14.19              0.886
64           32                  29.27              0.915
128          64                  59.96              0.937
256          128                 122.1              0.954
∞                                                   1.00
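The expected toggle counts for the regular and inverted bus can be computed in closed form. The sketch below assumes uniformly random, uncorrelated data words and the usual bus-invert rule: if more than half of the n data lines would toggle, the inverted word is sent and one extra invert line toggles instead. The function names are illustrative.

```python
from math import comb

def expected_toggles_regular(n_bits):
    """E[P]: each line toggles with probability 1/2 for random data."""
    return n_bits / 2

def expected_toggles_bus_invert(n_bits):
    """E[Q]: expected toggles per transfer with bus-invert coding."""
    total = 0
    for h in range(n_bits + 1):            # h = Hamming distance to the new word
        if 2 * h <= n_bits:
            cost = h                       # send the word unchanged
        else:
            cost = n_bits - h + 1          # send it inverted; invert line toggles
        total += comb(n_bits, h) * cost
    return total / 2 ** n_bits
```

For example, `expected_toggles_bus_invert(4) / expected_toggles_regular(4)` is 0.78125, which rounds to the 0.781 listed for a 4-bit bus.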
[Figure: state transition graphs with edge transition probabilities (0.1, 0.3, 0.4) and state codes (00, 01, 11) for two state encodings, M1 and M2]
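Precomputation Logic
[Figure: precomputation architecture. The precomputed inputs are held in register R1 and the gated inputs in register R2; both feed the combinational logic f(x), which produces the outputs. The precomputation logic g(x) generates the load-disable signal for R2.]
[Figure: n-bit comparator example computing A > B, with input bits from An-1, Bn-1 down to A1, B1 held in R1 and R2. PRECOMPUTATION CONDITION: An ≠ Bn asserts Load Disable.]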
DESIGN ISSUES IN PRECOMPUTATION LOGIC
Select the precomputation architecture.
Determine the precomputed inputs R1 and the gated inputs R2, given the function f(x).
With R1 and R2 selected, find the precomputation logic g(x).
Evaluate the probability of the precomputation condition and the potential power savings.
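The comparator example can be modelled in software to evaluate how often the precomputation condition holds. This is a behavioural sketch only, assuming unsigned operands, the two most significant bits as the precomputed inputs, and uniformly distributed data; the function names are illustrative.

```python
def compare_with_precomputation(a, b, n_bits):
    """Return (a > b, gated_inputs_loaded) for n-bit unsigned a and b."""
    msb = n_bits - 1
    a_msb = (a >> msb) & 1
    b_msb = (b >> msb) & 1
    if a_msb != b_msb:
        # Precomputation condition: the MSBs decide, so R2 is not loaded.
        return a_msb > b_msb, False
    mask = (1 << msb) - 1
    return (a & mask) > (b & mask), True

def load_disable_probability(n_bits):
    """Fraction of all input pairs for which the gated register stays idle."""
    disabled = 0
    total = 0
    for a in range(2 ** n_bits):
        for b in range(2 ** n_bits):
            _, loaded = compare_with_precomputation(a, b, n_bits)
            disabled += 0 if loaded else 1
            total += 1
    return disabled / total
```

Under these assumptions the gated register is disabled for half of all input pairs, which gives a rough upper bound on the switching saved in the low-order comparator logic.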
[Figure: BLOCK A and BLOCK B connected through BUS A, BUS B and BUS C]
PBB =
n n En
n R
Operation Reduction
Pipelining and Parallelism
Retiming
Unfolding and Folding
Power and Performance Management
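[Figure: reference datapath between two registers; Frequency = f]
[Figure: parallel implementation with Datapath 1 and Datapath 2 between a DeMUX and a MUX, with input and output registers; Cap = 2.2C, Voltage = 0.6V, Frequency = 0.5f]
[Figure: parallel blocks B(1) and B(2) between a DeMUX and a MUX]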
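The effect of the parallel datapath on power follows from the dynamic power relation P = C V^2 f. A minimal sketch, assuming only dynamic power matters and using the scaling factors quoted for the parallel design (Cap = 2.2C, Voltage = 0.6V, Frequency = 0.5f); the function name is illustrative.

```python
def relative_dynamic_power(cap_factor, voltage_factor, frequency_factor):
    """Dynamic power relative to the reference design, using P = C * V^2 * f."""
    return cap_factor * voltage_factor ** 2 * frequency_factor

# Parallel datapath relative to the single datapath at C, V, f.
parallel_ratio = relative_dynamic_power(2.2, 0.6, 0.5)
```

Here `parallel_ratio` is about 0.396, so under these assumptions the parallel design consumes roughly 40% of the reference power at the same throughput.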
[Figure: data flow graphs for X^2 + XA + B: direct form computing X^2 and XA separately, versus factored form (X + A)X + B]
[Figure: data flow graphs for X^3 + X^2A + XB + C: direct form computing X^2, X^3, X^2A and XB, versus factored form ((X + A)X + B)X + C]
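The saving from factoring the polynomial can be seen by counting operations. The sketch below evaluates X^3 + X^2A + XB + C in both forms and returns the number of multiplications and additions alongside the value; the function names are illustrative.

```python
def poly_direct(x, a, b, c):
    """Term-by-term evaluation; returns (value, multiplications, additions)."""
    x2 = x * x          # multiplication 1
    x3 = x2 * x         # multiplication 2
    x2a = x2 * a        # multiplication 3
    xb = x * b          # multiplication 4
    return x3 + x2a + xb + c, 4, 3

def poly_factored(x, a, b, c):
    """Factored form ((X + A)X + B)X + C; returns (value, multiplications, additions)."""
    value = ((x + a) * x + b) * x + c
    return value, 2, 3
```

Both functions return the same value for any inputs, but the factored form needs two multiplications instead of four, which reduces the switched capacitance of the datapath.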
Power (watt)    Binary code      Chebychev distance    Hamming distance    Gray Code
Average         7.186733e-004    1.602692e-003         1.602683e-003       7.185032e-004
Maximum         2.710073e-002    2.319953e-002         1.837399e-002       1.837406e-002
Minimum         5.911938e-007    7.837975e-007         9.031063e-008       8.799572e-008

Power           (header not recovered)    (header not recovered)    Hamming distance    Gray code
Average         6.983956e-006             6.985183e-006             6.662661e-006       6.506795e-006
Maximum         4.742954e-002             4.742954e-002             1.678512e-002       1.689332e-002
Minimum         6.399358e-008             2.674323e-007             2.203997e-008       2.203997e-008
RETIMING
Retiming is a mapping from a given DFG, G, to a retimed DFG, Gr, such that the corresponding transfer functions of G and Gr differ by a pure delay $z^{-L}$.
Purposes
To reduce the clock cycle time
To reduce the number of registers needed
To reduce the power consumed by the circuit
Properties of Retiming
Retiming does NOT change the total number of delays in each cycle.
Retiming does not change the loop bound or the iteration bound of the DFG.
If a constant integer j is added to the retiming value of every node v in a DFG G, the retimed graph Gr is not affected. That is, the weights (# of delays) of the retimed graph remain the same.
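The retiming properties can be checked on a small example. The sketch below uses the standard retiming equation, in which an edge from U to V with w delays gets w + r(V) - r(U) delays after retiming; the three-node graph and the retiming values are hypothetical and chosen only for illustration.

```python
def retime(edges, r):
    """edges: {(u, v): delays}; r: {node: retiming value}.

    Returns the retimed edge weights w_r(u -> v) = w + r[v] - r[u].
    """
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def cycle_delays(edges, cycle):
    """Total number of delays around a cycle given as a list of nodes."""
    return sum(edges[(cycle[i], cycle[(i + 1) % len(cycle)])]
               for i in range(len(cycle)))

# Hypothetical DFG with a single loop a -> b -> c -> a.
example_edges = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 2}
example_r = {"a": 0, "b": -1, "c": 0}
retimed_edges = retime(example_edges, example_r)
```

In this example one delay moves from edge a -> b to edge b -> c, the loop still contains three delays, and adding the same constant j to every retiming value produces the same retimed graph. A retiming is only valid if every retimed edge weight is non-negative.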
Unfolding
It is a transformation technique that can be applied to a DSP Program to create
a new program describing more than one iteration of the original program.
Bit parallel adder designed by unfolding the bit serial adder using J = 4
Digit serial adder designed by unfolding the bit serial adder using J = 2
FOLDING
Clock Cycle    Adder input (left)    Adder input (top)    System Output
0              a(0)                  b(0)
1              a(0) + b(0)           c(0)
2              a(1)                  b(1)
3              a(1) + b(1)           c(1)
4              a(2)                  b(2)
5              a(2) + b(2)           c(2)
RESULTS
Estimation and Optimization
Analysis precedes optimization
Accurate analysis techniques must be developed so that they can serve as proper estimation functions for optimization tools
Optimization precedes synthesis
A strong foundation in optimization is required before synthesis can proceed to the next level
Transistor Level
Power estimation at the transistor level can be done by computing the current flow
SPICE is the widely accepted reference
EPIC PowerMill
Circuit-Level Power Optimization
Transistor Sizing:
Adjusting the size of each gate or transistor for minimum power
Voltage Scaling:
Lower supply voltages use less power, but circuits run slower
Voltage Islands:
Different blocks can be run at different voltages, saving power
Level shifters are required
Variable VDD:
The voltage for a single block can be varied during operation (high & low)
Multiple Threshold Voltages:
Modern processes can build transistors with different thresholds
Power can be saved by using a mixture of CMOS transistors with two or more different threshold voltages.
Power Gating:
Uses high-Vt sleep transistors which cut off a circuit block when the block is not switching
Also known as MTCMOS
Long Channel Transistors:
Transistors of more than minimum length leak less, but are bigger and slower
Stacking and Parking States:
Logic gates may leak differently during logically equivalent input states
State machines may have less leakage in certain states
Logic Styles:
Dynamic and static logic, for example, have different speed/power tradeoffs