Low Power VLSI Design: J.Ramesh ECE Department PSG College of Technology


Introduction
Progress in semiconductor technology
Minimum Feature size
Consequences
- reduced device capacitances
- higher integration densities
- performance improvements
- increased circuit complexities

Power
In the past, the major design concerns were
area, performance, cost and reliability.

In recent years,
power has been given comparable weight
to area and speed considerations.

Power and Energy Figures of Merit

Power consumption in Watts
- determines battery life in hours
Peak power
- determines power and ground wiring designs
- sets packaging limits
- impacts signal noise margin and reliability analysis

Energy efficiency in Joules
- power is the rate at which energy is consumed over time
- Energy = power * delay
- Joules = Watts * seconds
- a lower energy number means less power is needed to perform a computation at the same frequency
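As a rough numerical illustration of Energy = power * delay, the following Python sketch computes the energy of one computation and a battery-life estimate. The function names and the example figures (2 W, 5 ms, a 10 Wh battery) are assumptions chosen for illustration, not values from these slides.

```python
def energy_joules(power_watts, delay_seconds):
    """Energy consumed by one computation: E = P * t (Joules = Watts * seconds)."""
    return power_watts * delay_seconds


def battery_life_hours(capacity_watt_hours, average_power_watts):
    """Battery life in hours for a given average power draw."""
    return capacity_watt_hours / average_power_watts


if __name__ == "__main__":
    # Hypothetical numbers: a 2 W block that needs 5 ms per computation.
    print("energy per computation (J):", energy_joules(2.0, 5e-3))
    # Hypothetical 10 Wh battery feeding the same 2 W load.
    print("battery life (h):", battery_life_hours(10.0, 2.0))
```

Halving the average power at the same delay halves the energy per computation and doubles the estimated battery life.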

Low Power VLSI Design


Art of power analysis and
optimization

INTEREST IN LOW POWER CHIPS AND SYSTEMS

BUSINESS NEEDS
- Growing class of personal computing devices as well as wireless communications and imaging systems, which demand high-speed computations, complex functionalities and often real-time processing capabilities with low power consumption.

TECHNICAL NEEDS

- Excessive power consumption is becoming the limiting factor in integrating more transistors on a single chip or on a multichip module
- Unless power consumption is reduced, the resulting heat will limit the feasible packing and performance of VLSI circuits and systems

Need for Low Power VLSI Chips


Evolution forces of Integrated Circuits
Increased Market Demand
High Performance Computing Systems
Environmental concerns

Why Power Matters

Packaging costs
Power supply rail design
Chip and system cooling costs
Noise immunity and system reliability
Battery life (in portable systems)
Environmental concerns
Office equipment accounted for 5% of
total US commercial energy usage in 1993
Energy Star compliant systems

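CMOS Energy & Power Equations

Energy per clock cycle:
$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage} / f_{clock}$$

With $f_{0\to1} = P_{0\to1} \cdot f_{clock}$, the power is
$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$$

The three terms of P are, in order, the dynamic power, the short-circuit power and the leakage power.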
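The three terms of the power equation can be evaluated separately, as in the Python sketch below. The function name, argument names and the numbers in the usage example are assumptions for illustration only.

```python
def cmos_power(c_load, v_dd, f_clock, p_01, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts.

    c_load  : load capacitance C_L in farads
    v_dd    : supply voltage in volts
    f_clock : clock frequency in hertz
    p_01    : probability of a 0->1 output transition per clock cycle
    t_sc    : duration of the short-circuit current pulse in seconds
    i_peak  : peak short-circuit current in amperes
    i_leak  : leakage current in amperes
    """
    f_01 = p_01 * f_clock                      # f_0->1 = P_0->1 * f_clock
    dynamic = c_load * v_dd ** 2 * f_01        # C_L * VDD^2 * f_0->1
    short_circuit = t_sc * v_dd * i_peak * f_01
    leakage = v_dd * i_leak
    return dynamic, short_circuit, leakage


if __name__ == "__main__":
    # Hypothetical operating point, not taken from the slides.
    dyn, sc, leak = cmos_power(1e-12, 2.0, 100e6, 0.5, 1e-10, 1e-4, 1e-9)
    print("dynamic:", dyn, "short-circuit:", sc, "leakage:", leak)
```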

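Dynamic Power Dissipation

[Figure: CMOS inverter supplied from Vdd, with input Vin and output Vout driving the load capacitance CL]

$$\text{Energy/transition} = C_L V_{dd}^2$$
$$\text{Power} = \text{Energy/transition} \times \text{frequency} = C_L V_{dd}^2 f$$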

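CMOS Inverter: Transient Response

$$t_{pHL} = f(R_{on} \cdot C_L) = 0.69\, R_{on} C_L$$

[Figure: switch model of the CMOS inverter. (a) Low-to-high: Vin = 0, CL is charged from VDD through Rp. (b) High-to-low: Vin = VDD, CL is discharged through Rn.]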

Dynamic power: Charging and Discharging of a Capacitance

According to the laws of physics,
$$i_c(t) = C_L \frac{dv_c(t)}{dt}$$
During the charging cycle, the energy drawn from the supply is
$$E_s = \int_{t_0}^{t_1} V\, i_c(t)\, dt = C_L V^2$$

Dynamic power contd.

Energy stored in the capacitor:
$$E_{cap} = \int_{t_0}^{t_1} v_c(t)\, i_c(t)\, dt = \frac{1}{2} C_L V^2$$
Energy dissipated at Rc:
$$E_c = E_s - E_{cap} = \frac{1}{2} C_L V^2$$

Dynamic power contd.

During the discharge cycle, the energy stored in the capacitor,
$$E_{cap} = \int_{t_0}^{t_1} v_c(t)\, i_c(t)\, dt = \frac{1}{2} C_L V^2,$$
is dissipated in Rd:
$$E_d = -\int_{t_1}^{t_2} v_c(t)\, i_c(t)\, dt = \frac{1}{2} C_L V^2$$
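The charging-cycle energies can be checked numerically. The Python sketch below integrates the charging of CL through a resistor and compares the result with C_L V^2 and (1/2) C_L V^2. It assumes an ideal step input and a linear charging resistance; the function name and component values are illustrative assumptions.

```python
import math


def charging_energies(c_load, v_dd, r_charge, steps=20000, n_tau=20.0):
    """Integrate one charging cycle of C_L through r_charge (midpoint rule).

    Returns (energy drawn from the supply, energy stored in C_L,
    energy dissipated in the resistor), all in joules.
    """
    tau = r_charge * c_load
    dt = n_tau * tau / steps
    e_supply = 0.0
    e_cap = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        v_c = v_dd * (1.0 - math.exp(-t / tau))   # capacitor voltage
        i_c = (v_dd - v_c) / r_charge             # charging current
        e_supply += v_dd * i_c * dt               # integral of V * i_c dt
        e_cap += v_c * i_c * dt                   # integral of v_c * i_c dt
    return e_supply, e_cap, e_supply - e_cap


if __name__ == "__main__":
    c_load, v_dd = 1e-12, 5.0   # hypothetical 1 pF load and 5 V supply
    for r in (1e3, 1e4):        # two hypothetical charging resistances
        print(r, charging_energies(c_load, v_dd, r))
    print("C*V^2 =", c_load * v_dd ** 2)
```

In this model the dissipated energy comes out as half of the supplied energy for both resistance values, which is why the result does not depend on the size of the charging transistor.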

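Dynamic Power Consumption

[Figure: CMOS inverter supplied from Vdd, with input Vin and output Vout driving the load capacitance CL]

$$\text{Energy/transition} = C_L V_{DD}^2 P_{0\to1}$$
$$P_{dyn} = \text{Energy/transition} \times f = C_L V_{DD}^2 P_{0\to1} f = C_L V_{DD}^2 f_{0\to1}$$
$$P_{dyn} = C_{EFF} V_{DD}^2 f, \quad \text{where } C_{EFF} = P_{0\to1} C_L$$

Not a function of transistor sizes!
Data dependent - a function of switching activity!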

Lowering Dynamic Power

$$P_{dyn} = C_L V_{DD}^2 P_{0\to1} f$$

Capacitance: function of fan-out, wire length, transistor sizes
Supply voltage: has been dropping with successive generations
Activity factor: how often, on average, do wires switch?
Clock frequency: increasing

Short Circuit Power Consumption

[Figure: CMOS inverter with input Vin, short-circuit current Isc, output Vout and load capacitance CL]

The finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching, when both the NMOS and PMOS transistors are conducting.

Short Circuit Currents: Determinants

$$E_{sc} = t_{sc} V_{DD} I_{peak} P_{0\to1}$$
$$P_{sc} = t_{sc} V_{DD} I_{peak} f_{0\to1}$$

- tsc is set by the duration and slope of the input signal
- Ipeak is determined by the saturation current of the P and N transistors, which depends on their sizes, process technology, temperature, etc.
- Ipeak is a strong function of the ratio between input and output slopes
- Ipeak is a function of CL

Impact of CL on Psc

[Figure: two inverters driven by the same input Vin, one with a large and one with a small load capacitance CL]

Large capacitive load: Isc ≈ 0. Output fall time significantly larger than input rise time.
Small capacitive load: Isc ≈ Imax. Output fall time substantially smaller than the input rise time.

Ipeak as a Function of CL

[Figure: Ipeak (A, scale x 10^-4) versus time (sec, scale x 10^-10) for CL = 20 fF, 100 fF and 500 fF, with a 500 psec input slope]

When load capacitance is small, Ipeak is large.
Short circuit dissipation is minimized by matching the rise/fall times of the input and output signals - slope engineering.

Psc as a Function of Rise/Fall Times

[Figure: normalized power P versus tsin/tsout for VDD = 3.3 V, 2.5 V and 1.5 V; W/Lp = 1.125 µm/0.25 µm, W/Ln = 0.375 µm/0.25 µm, CL = 30 fF; normalized with respect to zero input rise-time dissipation]

When load capacitance is small (tsin/tsout > 2 for VDD > 2 V) the power is dominated by Psc.
If VDD < VTn + |VTp| then Psc is eliminated since both devices are never on at the same time.

Leakage (Static) Power Consumption

[Figure: inverter with the leakage current (VDD Ileakage) paths at Vout: drain junction leakage, gate leakage and sub-threshold current]

Sub-threshold current is the dominant factor.
All increase exponentially with temperature!

Leakage as a Function of VT

Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.

[Figure: leakage plotted on a logarithmic axis with ticks at 10^-2, 10^-7 and 10^-12]

A 90 mV/decade VT roll-off - so each 255 mV increase in VT gives about 3 orders of magnitude reduction in leakage (but adversely affects performance)
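The relation between a VT increase and the leakage reduction can be sketched in Python as below. It assumes a purely exponential subthreshold characteristic with a constant slope; the function name is an illustrative assumption. With a slope of exactly 90 mV/decade, three full decades correspond to 270 mV, while 255 mV gives somewhat less than three decades, so the figure on the slide is best read as approximate.

```python
def leakage_reduction_factor(delta_vt_mv, slope_mv_per_decade=90.0):
    """Factor by which subthreshold leakage drops when VT rises by delta_vt_mv.

    Assumes leakage falls by 10x for every `slope_mv_per_decade` of VT increase.
    """
    return 10.0 ** (delta_vt_mv / slope_mv_per_decade)


if __name__ == "__main__":
    for delta in (90.0, 255.0, 270.0):
        print(delta, "mV ->", leakage_reduction_factor(delta), "x less leakage")
```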

Leakage

[Figure: inverter supplied from Vdd with drain junction leakage and sub-threshold current paths at Vout]

Sub-threshold current is one of the most compelling issues in low-energy circuit design!
Sub-threshold current is the dominant factor.

Reverse-Biased Diode Leakage

[Figure: transistor cross-section showing the gate, the p+ regions and the reverse leakage current of the junction biased at Vdd]

$$I_{DL} = J_S \times A$$

- Occurs when the source or drain of an N transistor is at Vdd
- JS = 15 pA/µm^2 for a 1.2 µm CMOS technology
- PN junctions are formed at the source or drain of transistors because of a parasitic effect of the bulk CMOS device structure
- Junction current at the source or drain of transistors is picked up through the bulk or well contact
- JS = 10-100 pA/µm^2 at 25 deg C for 0.25 µm CMOS
- JS doubles for every 9 deg C increase in temperature!
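The temperature dependence stated above (JS doubling for every 9 deg C) together with IDL = JS x A can be sketched in Python as follows. The function names, the 25 deg C reference point and the junction area in the usage example are assumptions for illustration.

```python
def junction_leakage_density(js_at_25c, temp_c, doubling_step_c=9.0):
    """Leakage current density at temp_c, doubling every `doubling_step_c` degrees.

    js_at_25c is the density at 25 deg C (for example in pA/um^2).
    """
    return js_at_25c * 2.0 ** ((temp_c - 25.0) / doubling_step_c)


def diode_leakage_current(js, area):
    """I_DL = J_S * A, in the current unit of js (area in the matching unit)."""
    return js * area


if __name__ == "__main__":
    # Hypothetical junction: 10 pA/um^2 at 25 deg C, 2 um^2 area.
    for temp in (25.0, 34.0, 43.0, 85.0):
        js = junction_leakage_density(10.0, temp)
        print(temp, "deg C:", diode_leakage_current(js, 2.0), "pA")
```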

Subthreshold Leakage Component

Even though the transistor is logically turned off, there is a non-zero leakage current through the channel at the microscopic level.

TSMC Processes: Leakage and VT

                         CL018 G   CL018 LP  CL018 ULP  CL018 HS  CL015 HS  CL013 HS
Vdd                      1.8 V     1.8 V     1.8 V      2 V       1.5 V     1.2 V
Tox (effective)          42        42        42         42        29        24
Lgate                    0.16 µm   0.16 µm   0.18 µm    0.13 µm   0.11 µm   0.08 µm
IDSat (n/p) (µA/µm)      600/260   500/180   320/130    780/360   860/370   920/400
Ioff (leakage) (pA/µm)   20        1.60      0.15       300       1,800     13,000
VTn                      0.42 V    0.63 V    0.73 V     0.40 V    0.29 V    0.25 V
FET Perf. (GHz)          30        22        14         43        52        80

Exponential Increase in Leakage Currents

[Figure: Ileakage (nA/µm) versus temperature (deg C). From De, 1999]

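Static Power Consumption

[Figure: gate drawing a static current Istat from Vdd while Vin = 5 V; output Vout loaded by CL]

$$P_{stat} = P(In=1) \cdot V_{dd} \cdot I_{stat}$$

- Dominates over dynamic consumption
- Not a function of switching frequency
- Wasted energy: should be avoided in almost all cases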

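Review: Energy & Power Equations

Energy per clock cycle:
$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage} / f_{clock}$$
$$f_{0\to1} = P_{0\to1} \cdot f_{clock}$$
$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$$

- Dynamic power (first term): ~90% today and decreasing relatively
- Short-circuit power (second term): ~8% today and decreasing absolutely
- Leakage power (third term): ~2% today and increasing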

Power and Energy Design Space

          Constant Throughput/Latency                    Variable Throughput/Latency
Energy    Design Time              Non-active Modules    Run Time
Active    Logic Design             Clock Gating          DFS, DVS
          Reduced Vdd                                    (Dynamic Freq, Voltage Scaling)
          Sizing
          Multi-Vdd
Leakage   + Multi-VT               Sleep Transistors     + Variable VT
                                   Multi-Vdd
                                   Variable VT

Approaches for Low Power

Reducing Chip and Package Capacitance


- achieved through process development such as SOI
- Closer packing of P and N transistors
- Lower Parasitic substrate Capacitance
- Achieved through advanced interconnect structures

Scaling down the supply voltage


- Power Consumption is quadratically dependent on supply
voltage
- Supporting circuits can be employed

Employing better power Management techniques


- Careful Management of performance and throughput of the
system based on its computational needs

Employing better design techniques

CMOS INVERTER with CL = 1 pF

[Figure: CMOS inverter schematic; both transistors W = 22u, L = 2u; input vin, output vout, load C = 1 pF]

Simulation output of Inverter with CL = 1 pF

[Figure: simulated v(vout) and v(vin) for CL = 1 pF; voltage (V) versus time (0 to 100 ns)]

INVERTER with CL = 10 pF

[Figure: CMOS inverter schematic; both transistors W = 22u, L = 2u; input vin, output vout, load C = 10 pF]

Simulation output of Inverter with CL = 10 pF

[Figure: simulated v(vout) and v(vin) for CL = 10 pF; voltage (V) versus time (0 to 50 ns)]

CMOS INVERTER with CL = 25 pF

[Figure: CMOS inverter schematic; both transistors W = 22u, L = 2u; input vin, output vout, load C = 25 pF]

Simulation output of Inverter with CL = 25 pF

[Figure: simulated v(vout) and v(vin) for CL = 25 pF; voltage (V) versus time (0 to 100 ns)]

POWER RESULTS (1.25 micron Technology)

CL       POWER (Watts)
1 pF     3.16 x 10^-4
10 pF    3.6 x 10^-3
25 pF    1.486 x 10^-2

Simulation output of Inverter (1.25 micron Technology)

[Figure: simulated v(vout) and v(vin) in 1.25 micron technology; voltage (V) versus time (0 to 100 ns)]

Simulation output of Inverter (0.18 micron Technology)

[Figure: simulated v(vout) and v(vin) in 0.18 micron technology; voltage (V) versus time (0 to 100 ns)]

POWER RESULTS

Technology (micron)    POWER (Watts)
1.25                   2.9 x 10^-3
0.18                   6.73 x 10^-5

Modification for Circuits with Reduced Swing

[Figure: load CL charged from Vdd to a reduced swing of Vdd - Vt]

$$E_{0\to1} = C_L V_{dd} (V_{dd} - V_t)$$

Can exploit reduced swing to lower power (e.g., reduced bit-line swing in memory)

Simulation output of Inverter with input 1111100000

[Figure: simulated v(vout) and v(vin), panels labelled SV=11110000; voltage (V) versus time (0 to 50 ns)]

Simulation output of Inverter with input 1010101010

[Figure: simulated v(vout) and v(vin), panels labelled SV=10101010; voltage (V) versus time (0 to 50 ns)]

Simulation output of Inverter with input 1111100000

[Figure: simulated v(vout) and v(vin), panels labelled SV=11101011; voltage (V) versus time (0 to 50 ns)]

Bit Stream    Number of Switching Transitions    Average Power (Watts)
1111100000                                       9.5 x 10^-5
1010101010                                       2.8 x 10^-4
1110000000                                       1.44 x 10^-4
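The number of switching transitions in an input bit stream can be counted directly, as in the Python sketch below. It counts every change between consecutive bits, rising or falling, and ignores any wrap-around from the last bit to the first; both choices and the function name are assumptions for illustration.

```python
def count_transitions(bits):
    """Number of positions where consecutive bits of the stream differ."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)


if __name__ == "__main__":
    for stream in ("1111100000", "1010101010", "1110000000"):
        print(stream, count_transitions(stream))
```

The alternating stream switches at every bit, which is consistent with it showing the highest average power in the table.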

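NAND GATE
clear all;
close all;
clc;
% Transition probability P(0->1) = P(out=0)*P(out=1) of a 2-input NAND gate;
% pa and pb are the probabilities that inputs A and B are 1.
P = @(pa,pb) (1-pa.*pb).*(pa.*pb);
figure(1);
ezsurfc(P,[0, 1.0, 0, 1.0]);
title('NAND GATE');   % set after plotting, because ezsurfc writes its own title
view(40,60);

NOR GATE
clc;
clear all;
% P(out=1) = (1-pa)(1-pb) = 1-pa-pb+pa*pb for a 2-input NOR gate.
P = @(pa,pb) (1-pa-pb+pa.*pb).*(pa+pb-pa.*pb);
figure(3);
ezsurf(P,[0, 1.0, 0, 1.0]);
title('NOR GATE');
view(30,30);

XOR GATE
clc;
clear all;
% P(out=1) = pa+pb-2*pa*pb for a 2-input XOR gate.
P = @(pa,pb) (pa+pb-2*pa.*pb).*(1-pa-pb+2*pa.*pb);
figure(5);
ezsurfc(P,[0, 1.0, 0, 1.0]);
title('XOR GATE');

NOT GATE
clc;
clear all;
% The inverter has a single input, so a line plot replaces the surface plot.
P = @(pa) (1-pa).*pa;
figure(7);
fplot(P,[0, 1.0]);
title('NOT GATE');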

COMPARISON TABLE

GATE          EXPRESSION                       P0->1 FOR Pa=Pb=0.5
NAND / AND    PaPb(1-PaPb)                     0.1875
NOR / OR      (Pa+Pb-PaPb)(1-Pa-Pb+PaPb)       0.1875
XNOR / XOR    (1-Pa-Pb+2PaPb)(Pa+Pb-2PaPb)     0.25
NOT           Pa(1-Pa)                         0.25
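The entries of the comparison table can be reproduced with a few lines of Python. The sketch assumes independent inputs and uses P0->1 = P(out=0) * P(out=1); the function names are illustrative.

```python
def transition_probability(p_out_one):
    """P(0->1) = P(out = 0) * P(out = 1) for a node with signal probability p_out_one."""
    return (1.0 - p_out_one) * p_out_one


def p_and(pa, pb):
    """Probability that the output of a 2-input AND is 1 (independent inputs)."""
    return pa * pb


def p_or(pa, pb):
    """Probability that the output of a 2-input OR is 1."""
    return pa + pb - pa * pb


def p_xor(pa, pb):
    """Probability that the output of a 2-input XOR is 1."""
    return pa + pb - 2.0 * pa * pb


if __name__ == "__main__":
    pa = pb = 0.5
    # An inverting gate (NAND, NOR, XNOR) has the same P(0->1) as its
    # non-inverting counterpart, because p and 1 - p give the same product.
    print("NAND / AND:", transition_probability(p_and(pa, pb)))
    print("NOR / OR  :", transition_probability(p_or(pa, pb)))
    print("XNOR / XOR:", transition_probability(p_xor(pa, pb)))
    print("NOT       :", transition_probability(pa))
```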

SWITCHING POWER DISSIPATION

[Figure: four-input AND gate built using two-input AND gates, with inputs x1 to x4 and output F: (a) chain structure, (b) tree structure]

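Probabilities for Tree and Chain Topologies

P(0->1) at the two intermediate nodes and at the output F, for independent inputs that are 1 with probability 0.5:

         Intermediate node 1   Intermediate node 2   Output F
Chain    3/16                  7/64                  15/256
Tree     3/16                  3/16                  15/256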
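The chain and tree probabilities can be derived with exact fractions, as in the Python sketch below. It assumes independent inputs and ignores glitching; the function names are illustrative.

```python
from fractions import Fraction


def p01(p_one):
    """P(0->1) of a node whose probability of being 1 is p_one."""
    return (1 - p_one) * p_one


def and_chain(input_probs):
    """P(0->1) at each AND output of a chain x1*x2, (x1*x2)*x3, ..."""
    p = input_probs[0]
    result = []
    for q in input_probs[1:]:
        p = p * q                 # signal probability after the next AND gate
        result.append(p01(p))
    return result


def and_tree4(input_probs):
    """P(0->1) at the two first-level ANDs and at the output of a 4-input tree."""
    a, b, c, d = input_probs
    left, right = a * b, c * d
    return [p01(left), p01(right), p01(left * right)]


if __name__ == "__main__":
    half = [Fraction(1, 2)] * 4
    print("chain:", and_chain(half), "sum =", sum(and_chain(half)))
    print("tree :", and_tree4(half), "sum =", sum(and_tree4(half)))
```

For equiprobable inputs the summed transition probability is 91/256 for the chain and 111/256 for the tree, so under these assumptions the chain switches less in total, while the output activity is the same.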

Implementing the function F=A(B+C) by using TREE and CHAIN

CHAIN IMPLEMENTATION

[Figure: transistor-level schematic diagrams of the B+C stage (inputs bin, cin), the AND stage (inputs xin, ain) and the inverter; all transistors W = 22u, L = 2u]

FUNCTION F=A(B+C)

[Figure: block diagram of the chain implementation with inputs ain, bin and cin, built from the B+C stage, the AND stage and inverters]

WAVEFORM 0.5 PROBABILITY

[Figure: simulated waveforms of the chain implementation: A-INPUT v(ain), B-INPUT v(bin), C-INPUT v(cin) and OUTPUT v(out); voltage (V) versus time (0 to 100 ns)]

FUNCTION F=A(B+C) : (UNEQUAL PROBABILITY)

WAVEFORM FOR UNEQUAL PROBABILITY

[Figure: simulated waveforms of the chain implementation: A-INPUT v(ain), B-INPUT v(bin), C-INPUT v(cin) and OUTPUT v(out); voltage (V) versus time (0 to 100 ns)]

TREE IMPLEMENTATION OF F=A(B+C)

[Figure: transistor-level schematic diagrams of the A NAND B stage, the A NAND C stage, the NOR gate (inputs XIN, YIN) and the inverter; all transistors W = 22u, L = 2u]

TREE F=A(B+C)

[Figure: block diagram of the tree implementation: A NAND B and A NAND C stages, each followed by an inverter, feeding a NOR gate and an output inverter]

WAVEFORM 0.5 PROBABILITY

[Figure: simulated waveforms of the tree implementation: A-INPUT v(ain), B-INPUT v(bin), C-INPUT v(cin) and OUTPUT v(out); voltage (V) versus time (0 to 100 ns)]

WAVEFORM FOR UNEQUAL PROBABILITY

[Figure: simulated waveforms of the tree implementation: A-INPUT v(ain), B-INPUT v(bin), C-INPUT v(cin) and OUTPUT v(out); voltage (V) versus time (0 to 100 ns)]

COMPARISON OF POWER FOR VARIOUS CONFIGURATIONS

FUNCTION F=A(B+C)

CKT IMPLEMENTATION      0.5 PROBABILITY       UNEQUAL PROBABILITY
CHAIN IMPLEMENTATION    1.245 e-003 watts     3.321 e-004 watts
TREE IMPLEMENTATION     1.913 e-003 watts     5.282 e-004 watts

POWER ANALYSIS
CIRCUIT LEVEL
GATE LEVEL
ARCHITECTURAL LEVEL

REDUCING THE POWER DISSIPATION AT THE DEVICE AND CIRCUIT LEVELS

AT THE DEVICE LEVEL,
Silicon on insulator (SOI) technology.
Place and route optimization.
Transistor sizing.
Using submicron devices.
Reducing the sub threshold voltages.

AT THE CIRCUIT AND LOGIC LEVELS,
Reduce gate capacitance.
Reduced logic swing.
Low power support circuitry.
Logic level power down.
Multi threshold circuit technology.
Scaled multi buffer stages.

Circuit Level
Transistor and Gate Sizing
Network Restructuring
Special Latches and Flip Flops

Transistor sizing for leakage power reduction and speed increase

[Figure: three inverters with different transistor sizings]

            Sizing 1   Sizing 2   Sizing 3
Wp/Lp       20/1       20/1       20/0.5
Wn/Ln       10/1       10/2       10/1
Trise       1          1          0.5
Tfall       1          2          1
Pleakage    1          0.1        1

For all cases static probability is 0.99

Network Restructuring
Four different circuit implementations of Y = A (B + C)

[Figure: four transistor-level implementations, (a) to (d), of Y = A(B+C)]

Logic Level

Signal Gating
Logic Encoding
Precomputation Logic

POWER REDUCTION AT LOGIC LEVEL
Logic level power optimization techniques.
Reduction of switching activities.
Logic Encoding
Data representation
Boolean function implementation
Elimination of stray switching activities

GATE REORGANIZATION
Network reorganization is applied to
the gate level network to produce
logically equivalent networks with
different qualities of Power, Area and
Delay.
Logic Reconstruction Techniques.

LOCAL RECONSTRUCTION
Transform one logic circuit to another
that is functionally equivalent.
Local reconstruction rules.
Gate reorganization applies series of
local transformations.
Best among the many generated and
evaluated circuit is chosen.

TRANSFORMATION
OPERATORS
COMBINE -> hide high frequency nodes
inside the cell so that node capacitance
is not being switched.
DECOMPOSE and DUPLICATE ->
separate critical path from non critical
path.
DELETE -> reduces circuit size.
ADD -> provides an intermediate circuit that
might eventually lead to a better one.

SIGNAL GATING
Mask unwanted switching activities.
Methods
AND or OR gates.
Latch or Flip flops.
Transmission gate or Tristate buffer.

Clock signals, address bus, data bus,
signals with high frequency or glitches
are good candidates for gating.

Signal Gating

[Figure: signal gating implementations: (a) simple gate, (b) tri-state buffer, (c) latch / FF, (d) transmission gate]

Signal gating masks unwanted switching activities so that they do not propagate forward and cause unnecessary power dissipation.

Different ways of implementation:
1. Put an AND / OR gate in the signal path to stop the propagation of the signal when it needs to be masked.
2. Use a latch / FF to block the propagation of the signal.
3. A transmission gate or a tri-state buffer can be used in place of a latch if charge leakage is not a concern.

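Binary and Gray Code Counting Sequences

Toggles are counted from the previous code word, with the sequence wrapping around.

Binary Code                  Gray Code
Sequence    No. Toggles      Sequence    No. Toggles
000         3                000         1
001         1                001         1
010         2                011         1
011         1                010         1
100         3                110         1
101         1                111         1
110         2                101         1
111         1                100         1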

Toggle Activities of Binary Vs Gray Code Counter

No. of bits n    Binary Bn = 2(2^n - 1)    Gray Gn = 2^n    Bn / Gn
2                6                         4                1.5
3                14                        8                1.75
4                30                        16               1.88
5                62                        32               1.94
6                126                       64               1.97
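The toggle counts of the two counters can be checked by brute force. The Python sketch below counts bit toggles over one full counting cycle, including the wrap-around to the first code word; the function names are illustrative.

```python
def binary_sequence(n_bits):
    """Counting sequence of an n-bit binary counter."""
    return list(range(1 << n_bits))


def gray_sequence(n_bits):
    """Counting sequence of an n-bit reflected Gray code counter."""
    return [i ^ (i >> 1) for i in range(1 << n_bits)]


def total_toggles(sequence):
    """Bit toggles over one full cycle, including the wrap-around step."""
    following = sequence[1:] + sequence[:1]
    return sum(bin(a ^ b).count("1") for a, b in zip(sequence, following))


if __name__ == "__main__":
    for n in range(2, 7):
        b = total_toggles(binary_sequence(n))
        g = total_toggles(gray_sequence(n))
        print(n, b, g, round(b / g, 2))
```

As n grows, the ratio Bn / Gn approaches 2, so a Gray code counter toggles about half as many bits as a binary counter.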

Data Representation

2's Complement: sign extension (the MSB sign bits switch for positive to negative transitions)
Sign Magnitude: one bit allocated for the sign bit (switching is minimum)

LOGIC ENCODING

Maximum toggling reduces from n to n/2.
For mutually uncorrelated signals,

Num. bits   Regular bus E[P]   Invert bus E[Q]   Invert / Regular E[Q]/E[P]
2           1                  0.75              0.75
4           2                  1.56              0.781
8           4                  3.27              0.817
16          8                  6.83              0.854
32          16                 14.19             0.886
64          32                 29.27             0.915
128         64                 59.96             0.937
256         128                122.1             0.954
infinity                                         1.00
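The expected toggle counts in the table can be reproduced with a simple model, sketched below in Python. The model is an assumption, not something stated on the slide: each new word differs from the current bus value in k of n bits with binomial probability; if k > n/2 the word is sent inverted, which costs n - k toggles on the data lines plus one toggle on the invert line. Function names are illustrative.

```python
from math import comb


def expected_regular(n_bits):
    """Expected toggles per transfer on a regular bus with random data: n/2."""
    return n_bits / 2


def expected_bus_invert(n_bits):
    """Expected toggles per transfer with invert coding under the model above."""
    total = 0
    for k in range(n_bits + 1):
        # Send as-is when at most half the bits change; otherwise invert the
        # word (n - k data toggles) and toggle the invert line (+1).
        cost = k if 2 * k <= n_bits else n_bits - k + 1
        total += comb(n_bits, k) * cost
    return total / 2 ** n_bits


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32, 64):
        e_p = expected_regular(n)
        e_q = expected_bus_invert(n)
        print(n, e_p, round(e_q, 2), round(e_q / e_p, 3))
```

The relative saving shrinks as the bus gets wider, which is why wide buses are often split into narrower groups that are each given their own invert line.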

STATE MACHINE ENCODING

State transition graph.
Encoding of state machine.

TRANSITION ANALYSIS OF STATE ENCODING
Expected number of bit transitions in the state register.
Expected number of transitions of the output.
E[M] <- expected number of state bit transitions.
Lower E[M] ---> more power efficient.
Fewer transitions of the state register.
Fewer transitions are propagated into the combinational logic of the machine.

[Figure: state transition graphs of two encodings, M1 and M2, using the state codes 00, 01 and 11, with edges labelled with transition probabilities 0.1, 0.3 and 0.4]

OUTPUT DON'T CARE ENCODING
Proper assignment of don't care signals.
Reduce the expected number of transitions in the output signal.

PRE COMPUTATION LOGIC

Trade area for power in synchronous digital circuits.
Identify logic conditions on some inputs of a combinational logic block under which the output is invariant to the remaining inputs.
Under those conditions, the transitions of the remaining inputs can be gated or disabled.

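Precomputation Logic

[Figure: precomputation architecture. The precomputed inputs are held in register R1 and the gated inputs in register R2; both feed the combinational logic f(x), which drives the outputs. The precomputed logic g(x) generates the load disable signal of R2.]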

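Binary Comparator using Precomputation Logic

[Figure: n-bit comparator computing A > B. The most significant bits An and Bn are held in R1; the bits An-1, Bn-1 down to A1, B1 are held in R2. The precomputation condition An =/= Bn drives the load disable of R2.]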
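The behaviour of the precomputed comparator can be modelled in a few lines of Python. The sketch assumes unsigned operands and uniformly distributed inputs; the function names and the returned "loaded" flag are illustrative assumptions, not part of the slides.

```python
def compare_with_precomputation(a, b, n_bits):
    """Return (a > b, lower_bits_loaded) for unsigned n_bits-wide operands."""
    msb = n_bits - 1
    a_msb = (a >> msb) & 1
    b_msb = (b >> msb) & 1
    if a_msb != b_msb:
        # Precomputation condition holds: the MSBs alone decide A > B,
        # so the register holding the lower bits (R2) need not be loaded.
        return a_msb == 1, False
    mask = (1 << msb) - 1
    return (a & mask) > (b & mask), True


def load_disable_fraction(n_bits):
    """Fraction of all input pairs for which loading of R2 is disabled."""
    disabled = 0
    total = 0
    for a in range(1 << n_bits):
        for b in range(1 << n_bits):
            result, loaded = compare_with_precomputation(a, b, n_bits)
            assert result == (a > b)   # gating must not change the function
            total += 1
            if not loaded:
                disabled += 1
    return disabled / total


if __name__ == "__main__":
    print("R2 load disabled for", load_disable_fraction(4), "of all input pairs")
```

With uniformly random operands the most significant bits differ half of the time, so in this model the lower-order register is left unloaded for half of the comparisons.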

DESIGN ISSUES IN
PRECOMPUTATION LOGIC
Select pre computation architecture.
Determine the pre computed inputs R1
and gated inputs R2 given the function
f(x).
With R1 and R2 selected, find pre
computation logic g(x).
Evaluate the probability of the pre
computation condition and the
potential power savings.

ARCHITECTURAL LEVEL POWER ESTIMATION

[Figure: BUS A, BLOCK A, BUS B, BLOCK B and BUS C connected in sequence]

Simplified system consisting of two building blocks and interconnection buses

ESTIMATING THE POWER DISSIPATION

BLOCK ACTIVITY FACTOR (no. of evaluations per second).
OUTPUT SIGNAL ACTIVITY FACTOR.
NORMALIZED BLOCK ENERGY En.
THE TOTAL POWER DISSIPATED BY THE BUILDING BLOCKS IS GIVEN BY

$$P_{BB} = \sum_{n \in R} (\text{block activity factor})_n \, (\text{output signal activity factor})_n \, E_n$$

Architecture and System Level

Operation Reduction
Pipelining and Parallelism
Retiming
Unfolding and Folding
Power and Performance Management

Pipelining and Parallel Processing


Pipelining
leads to a reduction in the critical path
Either increases the clock speed (or sampling speed) or reduces the power
consumption at same speed in a DSP system
Parallel Processing
Multiple outputs are computed in parallel in a clock period
The effective sampling speed is increased by the level of parallelism
Can also be used to reduce the power consumption

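Two Cascaded Operations:

[Figure: (a) non-pipelined architecture; (b) two-stage pipelined architecture with a register between the two operations]

Two-stage pipeline: Cap = 1.2C, Voltage = 0.6V, Frequency = f

A two-datapath parallel system:

[Figure: DeMUX feeding Datapath 1 and Datapath 2, whose results are recombined by a MUX]

Two-datapath parallel system: Cap = 2.2C, Voltage = 0.6V, Frequency = 0.5f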

Power dissipation of parallel and pipelined systems

$$P_{par} = 2.2C\,(0.6V)^2\,(0.5f) = 0.396\,P_{uni}$$
$$P_{pip} = (1.2C)\,(0.6V)^2\,f = 0.432\,P_{uni}$$
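Both results follow from P = C V^2 f once capacitance, voltage and frequency are expressed relative to the original design. The Python sketch below assumes that Puni = C V^2 f is the power of the original single datapath; the function name is illustrative.

```python
def relative_power(cap_factor, voltage_factor, freq_factor):
    """Power relative to the reference design, using P = C * V^2 * f."""
    return cap_factor * voltage_factor ** 2 * freq_factor


if __name__ == "__main__":
    # Two-datapath parallel system: 2.2C, 0.6V, 0.5f
    print("P_par / P_uni =", relative_power(2.2, 0.6, 0.5))
    # Two-stage pipelined system: 1.2C, 0.6V, f
    print("P_pip / P_uni =", relative_power(1.2, 0.6, 1.0))
```

In both cases the saving comes from the quadratic voltage term; the extra capacitance of the registers or of the duplicated datapath only partly offsets it.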

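Combining parallelism with pipelining to balance pipe-stage delays:

[Figure: pipelined architecture with registers between stages, in which one stage is duplicated as parallel blocks B(1) and B(2) placed between a DeMUX and a MUX]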

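Reducing operations maintaining throughput

[Figure: data-flow graphs computing X^2 + XA + B. Direct form: X^2 and XA are computed separately and added to B. Restructured form: (X + A)X + B.]

Reducing operations with less throughput

[Figure: data-flow graphs computing X^3 + X^2 A + XB + C. Direct form: X^2, X^3, X^2 A and XB are computed separately and summed with C. Restructured form: ((X + A)X + B)X + C.]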
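The operation counts of the two forms of the cubic can be compared with the Python sketch below. The function names are illustrative, and the multiplication and addition counts are written out by hand in the comments rather than measured.

```python
def cubic_direct(x, a, b, c):
    """X^3 + X^2*A + X*B + C computed term by term.

    Returns (value, multiplications, additions).
    """
    x2 = x * x                          # multiplication 1
    x3 = x2 * x                         # multiplication 2
    value = x3 + x2 * a + x * b + c     # multiplications 3 and 4, 3 additions
    return value, 4, 3


def cubic_restructured(x, a, b, c):
    """((X + A)*X + B)*X + C: same polynomial with fewer multiplications.

    Returns (value, multiplications, additions).
    """
    value = ((x + a) * x + b) * x + c   # 2 multiplications, 3 additions
    return value, 2, 3


if __name__ == "__main__":
    print(cubic_direct(3, 2, 5, 7))
    print(cubic_restructured(3, 2, 5, 7))
```

The restructured form halves the number of multiplications, but its operations form one long dependency chain, which matches the slide's remark that throughput is lower.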

Table : Power results for benchmark circuit (C17)

Power (watt)   Binary code      Chebychev distance   Hamming distance   Gray Code
Average        7.186733e-004    1.602692e-003        1.602683e-003      7.185032e-004
Maximum        2.710073e-002    2.319953e-002        1.837399e-002      1.837406e-002
Minimum        5.911938e-007    7.837975e-007        9.031063e-008      8.799572e-008

Power results for 5 bit even parity generator

Power          Binary code      Chebychev distance   Hamming distance   Gray code
Average        6.983956e-006    6.985183e-006        6.662661e-006      6.506795e-006
Maximum        4.742954e-002    4.742954e-002        1.678512e-002      1.689332e-002
Minimum        6.399358e-008    2.674323e-007        2.203997e-008      2.203997e-008

The power dissipation of the unit-distance (Gray code) based reordering method is lower than that of straight binary, Chebychev distance and Hamming distance ordering.
Hence it can be concluded that Gray code can be employed for reducing the power dissipation of combinational logic circuits during testing.
Also, reordering the test vectors using distance-based techniques does not alter the fault coverage.

RETIMING
Retiming is a mapping from a given DFG, G, to a retimed DFG, Gr,
such that the corresponding transfer functions of G and Gr differ by a pure delay
z^{-L}.

Purposes
To reduce clock cycle time
To reduce number of registers needed.
To reduce the power consumed by the circuit.

Properties of Retiming

The weight of the retimed path p = V0 -> V1 -> ... -> Vk is given by

$$w_r(p) = w(p) + r(V_k) - r(V_0)$$

Retiming does NOT change the total number of delays in each cycle.
Retiming does not change the loop bound or the iteration bound of the DFG.
If a constant integer j is added to the retiming value of every node v in a DFG G, the retimed graph Gr is not affected. That is, the
weights (# of delays) of the retimed graph remain the same.
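These properties can be checked on a small example. The Python sketch below applies the per-edge form of the path formula, w_r(e) = w(e) + r(V) - r(U) for an edge U -> V, to a three-node loop. The graph, the node names and the retiming values are hypothetical.

```python
def retime(edges, r):
    """Retime a DFG given as a list of (u, v, delays) edges.

    r maps each node to its retiming value; each edge u -> v gets
    w_r(e) = w(e) + r(v) - r(u).
    """
    return [(u, v, w + r[v] - r[u]) for (u, v, w) in edges]


def total_delays(edges):
    """Total number of delays on the given edges."""
    return sum(w for (_, _, w) in edges)


if __name__ == "__main__":
    # Hypothetical single-loop DFG A -> B -> C -> A with two delays on C -> A.
    loop = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]
    r = {"A": 0, "B": 1, "C": 1}
    print(retime(loop, r))                      # one delay moved onto A -> B
    print(total_delays(loop), total_delays(retime(loop, r)))
    shifted = {node: value + 5 for node, value in r.items()}
    print(retime(loop, shifted) == retime(loop, r))
```

A retiming is only valid if every retimed edge weight is non-negative, which holds for the values chosen here.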

Unfolding
It is a transformation technique that can be applied to a DSP Program to create
a new program describing more than one iteration of the original program.

Bit parallel adder designed by unfolding the bit serial adder using J = 4

Digit serial adder designed by unfolding the bit serial adder using J = 2

FOLDING

Clock Cycle   Adder input (left)   Adder input (top)   System Output
0             a(0)                 b(0)                -
1             a(0) + b(0)          c(0)                -
2             a(1)                 b(1)                a(0) + b(0) + c(0)
3             a(1) + b(1)          c(1)                -
4             a(2)                 b(2)                a(1) + b(1) + c(1)
5             a(2) + b(2)          c(2)                -

RESULTS

Estimation and
Optimization
Analysis precedes optimization
Accurate analysis techniques must be developed so that they can serve as
proper estimation functions for optimization tools
Optimization precedes Synthesis
A strong foundation in optimization is required before synthesis can proceed
to the next level

State of the art in Commercial EDA tools for low power

Transistor level
Power estimation at the transistor level can be done by
computing the current flow
SPICE is the widely accepted reference
EPIC PowerMill

Circuit-Level Power Optimization

Transistor Sizing :
Adjusting the size of each gate or transistor for minimum power
Voltage Scaling :
Lower supply voltages use less power, but run slower
Voltage islands :
Different blocks can be run at different voltages, saving power
Level shifters are required

Variable VDD:
The voltage for a single block can be varied during operation (high & low)
Multiple threshold voltages :
Modern processes can build transistors with different thresholds
Power can be saved by using a mixture of CMOS transistors with two or more different threshold voltages.

Power Gating :
Uses high-Vt sleep transistors which cut off a circuit block when the block is not switching
Also known as MTCMOS
Long Channel Transistors :
Transistors of more than minimum length leak less, but are bigger and slower
Stacking and Parking states :
Logic gates may leak differently during logically equivalent input states
State machines may have less leakage in certain states

Logic Styles :
Dynamic and static logic, for example, have different speed/power tradeoffs