Pipelining and Parallel Processing


VLSI Digital Signal Processing Systems

Pipelining and Parallel Processing


Lan-Da Van, Ph.D.
Department of Computer Science
National Chiao Tung University
Taiwan, R.O.C.
Spring, 2007
http://www.cs.nctu.edu.tw/~ldvan/
Outline
Introduction
Pipelining of FIR Digital Filter
Parallel Processing
Pipelining and Parallel Processing for Low Power
Conclusions
Introduction
Pipelining
Reduce the critical path
Increase the clock speed or sample speed
Reduce power consumption
Parallel processing
Does not reduce the critical path
Does not increase the clock speed, but increases the sample speed
Reduce power consumption
A 3-tap FIR Filter
Direct-form structure
$$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$
Sample period:
$$T_{sample} \ge T_M + 2T_A$$
Sampling frequency:
$$f_{sample} \le \frac{1}{T_M + 2T_A}$$
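As a minimal sketch of the direct-form structure, the Python below evaluates y(n) = a x(n) + b x(n-1) + c x(n-2) sample by sample and computes the sample-period bound T_M + 2T_A. The function names and the zero initial state are assumptions made for illustration; the delay values are the ones used later in these slides.

```python
def fir3_direct(x, a, b, c):
    """Direct-form 3-tap FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), zero initial state."""
    y = []
    x1 = x2 = 0  # delay elements holding x(n-1) and x(n-2)
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x1, x2 = xn, x1  # shift the delay line
    return y


def min_sample_period(t_m, t_a):
    """Critical path of the direct form: one multiplier followed by two adders."""
    return t_m + 2 * t_a


print(fir3_direct([1, 2, 3, 4], a=1, b=2, c=3))  # [1, 4, 10, 16]
print(min_sample_period(t_m=10, t_a=2))          # 14
```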
Pipelining and Parallel Concept
Pipelining
Introduce pipelining latches along the datapath
Parallel processing
Duplicate the hardware
Pipelining FIR Filter
Critical path: $2T_A + T_M \rightarrow T_A + T_M$
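The effect of the pipelining latches can be sketched in Python. The latch placement here is an assumption, since the slide's figure is not available: one latch after the first adder and one on the path into the c multiplier, which together form a feed-forward cutset. Under that assumption each stage contains one multiplier and one adder, and the output is the direct-form output delayed by one clock.

```python
def fir3_direct(x, a, b, c):
    """Reference direct-form filter with zero initial state."""
    y, x1, x2 = [], 0, 0
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x1, x2 = xn, x1
    return y


def fir3_pipelined(x, a, b, c):
    """Two-stage pipelined version (assumed latch placement, see lead-in)."""
    out = []
    x1 = x2 = 0      # delay line: x(n-1), x(n-2)
    latch_sum = 0    # pipelining latch after the first adder
    latch_x2 = 0     # pipelining latch on the path into the c multiplier
    for xn in x:
        # Stage 2 works on the values latched in the previous clock cycle.
        out.append(latch_sum + c * latch_x2)
        # Stage 1 computes a*x(n) + b*x(n-1) and latches it for the next cycle.
        latch_sum = a * xn + b * x1
        latch_x2 = x2
        x1, x2 = xn, x1
    return out


x = [1, 2, 3, 4]
print(fir3_direct(x, 1, 2, 3))     # [1, 4, 10, 16]
print(fir3_pipelined(x, 1, 2, 3))  # [0, 1, 4, 10]
```

The pipelined output is the same sequence shifted by one sample, which is the added latency that comes with the shorter critical path.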
Pipelining (1/2)
Drawbacks
Increases the number of delay elements (registers/latches) along the critical path
Increases latency
Clock period limitation: the critical path is the longest path between
An input and a latch
A latch and an output
Two latches
An input and an output
Pipelining latches can only be placed across a feed-forward cutset of the graph
Pipelining (2/2)
Cutset: A cutset is a set of edges of a graph such that
if these edges are removed from the graph, the graph
becomes disjoint.
Feed-forward cutset: A cutset is called a feed-forward
cutset if the data move in the forward direction on all
the edges of the cutset.
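The two definitions can be checked mechanically on a small graph. The sketch below is illustrative only: the function name, the edge-list representation, and the three-node example graph are assumptions, and it assumes the candidate cut splits the graph into exactly two parts.

```python
def is_feed_forward_cutset(nodes, edges, cutset):
    """True if removing `cutset` disconnects the graph and all cut edges point the same way."""
    remaining = [e for e in edges if e not in cutset]
    # Flood-fill one side of the cut, ignoring edge direction.
    start = nodes[0]
    side = {start}
    frontier = [start]
    while frontier:
        u = frontier.pop()
        for a, b in remaining:
            for p, q in ((a, b), (b, a)):
                if p == u and q not in side:
                    side.add(q)
                    frontier.append(q)
    if len(side) == len(nodes):
        return False  # the graph is still connected, so this is not a cutset
    # Feed-forward: every cut edge crosses the cut in the same direction.
    forward = all(a in side and b not in side for a, b in cutset)
    backward = all(a not in side and b in side for a, b in cutset)
    return forward or backward


# Hypothetical graph: A feeds B, and B and C form a feedback loop.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("C", "B")]
print(is_feed_forward_cutset(nodes, edges, [("A", "B")]))              # True
print(is_feed_forward_cutset(nodes, edges, [("B", "C"), ("C", "B")]))  # False
```

The second cut separates C from the rest, but its edges run in both directions, so latches cannot be placed there without changing the function.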
Example 3.2.1
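[Figure: pipelining example with panels labeled "4 u.t." (original critical path), "Error!" (invalid cutset), and "2 u.t." (pipelined critical path).]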
Transposition Theorem
Reversing the direction of all edges in a given SFG
and interchanging the input and output ports preserve
the functionality of the system.
Data-Broadcast Structure
Critical path is reduced to $T_M + T_A$.
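A short Python sketch of the data-broadcast (transposed) form for the 3-tap filter y(n) = a x(n) + b x(n-1) + c x(n-2) follows. The function name and register names are assumptions for illustration. The input is broadcast to all three multipliers, and the delays sit between the adders, so only one multiplier and one adder lie between any two registers.

```python
def fir3_broadcast(x, a, b, c):
    """Data-broadcast 3-tap FIR: x(n) drives all multipliers in the same cycle."""
    y = []
    s1 = s2 = 0  # registers between the adders, zero initial state
    for xn in x:
        y.append(a * xn + s1)  # s1 currently holds b*x(n-1) + c*x(n-2)
        s1 = b * xn + s2       # s2 currently holds c*x(n-1)
        s2 = c * xn            # becomes c*x(n-1) as seen in the next cycle
    return y


print(fir3_broadcast([1, 2, 3, 4], a=1, b=2, c=3))  # [1, 4, 10, 16]
```

The output matches the direct form for the same input, as the transposition theorem predicts, with no added latency.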
Fine-Grain Pipelining
Let $T_M = 10$ u.t., $T_A = 2$ u.t., and the desired clock period be 6 u.t.
Break the multiplier into 2 smaller units with processing times of 6 u.t. and 4 u.t.
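The arithmetic behind the 6 u.t. clock period can be sketched as follows. The stage assignment is an assumption consistent with the later low-power example: the first stage holds the 6 u.t. multiplier part alone, and the second stage holds the 4 u.t. part followed by the adder. The helper name `clock_period` is hypothetical.

```python
def clock_period(stages):
    """Clock period of a pipeline: the longest stage, where a stage is a list of unit delays."""
    return max(sum(stage) for stage in stages)


T_M, T_A = 10, 2
original = [[T_M, T_A]]       # multiplier followed by adder in a single stage
fine_grain = [[6], [4, T_A]]  # m1 | latch | m2 followed by adder

print(clock_period(original))    # 12
print(clock_period(fine_grain))  # 6
```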
Parallel Processing
Parallel processing and pipelining are duals of each other.
If a computation can be pipelined, it can also be processed in parallel.
Parallelism converts a single-input single-output (SISO) system into a multiple-input multiple-output (MIMO) system.
Parallel Processing of 3-Tap FIR Filter (1/2)
Sequential filter:
$$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$
3-parallel filter:
$$\begin{aligned}
y(3k) &= a\,x(3k) + b\,x(3k-1) + c\,x(3k-2) \\
y(3k+1) &= a\,x(3k+1) + b\,x(3k) + c\,x(3k-1) \\
y(3k+2) &= a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)
\end{aligned}$$
Iteration period:
$$T_{iter} = T_{sample} = \frac{1}{L} T_{clk} \ge \frac{1}{3}(T_M + 2T_A)$$
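The block equations for y(3k), y(3k+1), and y(3k+2) can be sketched in Python as a 3-parallel filter that consumes three input samples and produces three outputs per iteration. The function name, the zero initial state, and the requirement that the input length be a multiple of 3 are assumptions made to keep the sketch short.

```python
def fir3_parallel3(x, a, b, c):
    """3-parallel FIR: iteration k reads x(3k), x(3k+1), x(3k+2) and emits three outputs."""
    assert len(x) % 3 == 0, "sketch assumes whole blocks of 3 samples"
    y = []
    xm1 = xm2 = 0  # x(3k-1) and x(3k-2), carried over from the previous block
    for k in range(0, len(x), 3):
        x0, x1, x2 = x[k], x[k + 1], x[k + 2]
        y0 = a * x0 + b * xm1 + c * xm2  # y(3k)
        y1 = a * x1 + b * x0 + c * xm1   # y(3k+1)
        y2 = a * x2 + b * x1 + c * x0    # y(3k+2)
        y.extend([y0, y1, y2])
        xm1, xm2 = x2, x1
    return y


print(fir3_parallel3([1, 2, 3, 4, 5, 6], a=1, b=2, c=3))  # [1, 4, 10, 16, 22, 28]
```

Each output still passes through one multiplier and two adders, so the clock period is unchanged; the gain comes from producing three samples per clock.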
Parallel Processing of 3-Tap FIR Filter (2/2)
Complete Parallel Processing System
The critical path remains unchanged, but the iteration period is reduced.
S/P and P/S Converter
[Figure: S/P and P/S converter circuits, both annotated "Edge Trigger!"]
Why Parallel Processing?
Parallel processing duplicates the hardware many times, which increases cost. Why use it?
The answer is that the fundamental limit to pipelining lies at the I/O bottleneck, referred to as the communication bound, which is composed of the I/O pad delay and the wire delay.
[Figure: parallel transmission]
Combined Fine-Grain Pipelining and Parallel Processing
$$T_{iter} = T_{sample} = \frac{1}{LM} T_{clk} = \frac{1}{6}(T_M + 2T_A)$$
Underlying Low Power Concept
Propagation delay:
$$T_{pd} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$
Power consumption:
$$P = C_{total} V_0^2 f$$
Sequential filter:
$$f = \frac{1}{T_{seq}}, \qquad T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad P_{seq} = C_{total} V_0^2 f$$
Pipelining for Low-Power (1/2)
M-level pipelined system
The critical path is reduced to 1/M of its original length, so the capacitance to be charged in a single clock cycle is reduced to $C_{charge}/M$.
If the clock frequency is maintained, the power supply can be reduced to $\beta V_0$ ($0 < \beta < 1$).
Pipelining for Low-Power (2/2)
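Power consumption:
$$P_{pip} = C_{total} \beta^2 V_0^2 f = \beta^2 P_{seq}$$
Propagation delay:
$$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad T_{pip} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$
Setting $T_{seq} = T_{pip}$ gives
$$M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$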
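The condition M(βV_0 − V_t)² = β(V_0 − V_t)² is a quadratic in β, so it can be solved numerically. The sketch below is a minimal illustration: the function name `solve_beta` and the parameter name `n` (standing for the pipelining level M) are assumptions, and the values V_0 = 5 V and V_t = 0.6 V are borrowed from the examples in these slides. Only the root with βV_0 > V_t is physically feasible.

```python
import math


def solve_beta(n, v0, vt):
    """Solve n*(beta*v0 - vt)**2 = beta*(v0 - vt)**2 and return the feasible root."""
    # Expanded form: n*v0^2 * beta^2 - (2*n*v0*vt + (v0 - vt)^2) * beta + n*vt^2 = 0
    qa = n * v0 ** 2
    qb = -(2 * n * v0 * vt + (v0 - vt) ** 2)
    qc = n * vt ** 2
    disc = math.sqrt(qb ** 2 - 4 * qa * qc)
    roots = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]
    # The reduced supply beta*v0 must stay above the threshold voltage.
    feasible = [r for r in roots if r * v0 > vt]
    return feasible[0]


for m in (2, 3, 4):
    beta = solve_beta(m, v0=5.0, vt=0.6)
    # Power scales as beta**2 relative to the sequential filter.
    print(m, round(beta, 4), round(beta ** 2, 4))
```

Deeper pipelining allows a lower β, and the power ratio β² falls accordingly.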
Example 3.4.1 (1/2)
Consider an original 3-tap FIR filter and its fine-grain pipelined version shown in the following figures. Assume $T_M = 10$ u.t., $T_A = 2$ u.t., $V_t = 0.6$ V, $V_0 = 5$ V, and $C_M = 5C_A$. In the fine-grain pipelined filter, the multiplier is broken into 2 parts, m1 and m2, with computation times of 6 u.t. and 4 u.t., respectively, and with capacitances 3 times and 2 times that of an adder, respectively. (a) What is the supply voltage of the pipelined filter if the clock period remains unchanged? (b) What is the power consumption of the pipelined filter as a percentage of the original filter?
Example 3.4.1 (2/2)
Solution:
Original: $C_{charge} = C_M + C_A = 6C_A$
Fine-grain: $C_{charge} = C_{m1} = C_{m2} + C_A = 3C_A$
$$2(5\beta - 0.6)^2 = \beta(5 - 0.6)^2 \;\Rightarrow\; \beta = 0.6033 \text{ or } 0.0239 \text{ (infeasible)}$$
$$V_{pip} = \beta V_0 = 3.0165\ \text{V}$$
$$\text{Ratio} = \beta^2 = 36.4\%$$
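The intermediate algebra for this example can be written out as a worked step. The factor 2 is the ratio of the charging capacitances, 6C_A / 3C_A, and the smaller root is rejected because it would put the supply below the threshold voltage.

```latex
\begin{align*}
2(5\beta - 0.6)^2 &= \beta(5 - 0.6)^2 \\
50\beta^2 - 12\beta + 0.72 &= 19.36\,\beta \\
50\beta^2 - 31.36\,\beta + 0.72 &= 0 \\
\beta &= \frac{31.36 \pm \sqrt{31.36^2 - 4 \cdot 50 \cdot 0.72}}{100}
       \approx \frac{31.36 \pm 28.973}{100} \\
\beta &\approx 0.6033 \quad \text{or} \quad 0.0239
\end{align*}
% The root 0.0239 gives beta * V_0 = 0.12 V, which is below V_t = 0.6 V, so it is infeasible.
```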
Comparison
System                 Sequential FIR   Pipelined FIR            Pipelined FIR
                       (original)       (without reducing V_0)   (with reducing V_0)
Power                  Ref              2 Ref                    0.364 Ref
Clock period (u.t.)    12               6                        12
Sample period (u.t.)   12               6                        12
Thinking Again!
Parallel Processing for Low-Power
L-parallel system
To maintain the same sample rate, the clock period is increased to $L\,T_{seq}$.
This means that $C_{charge}$ is charged in $L\,T_{seq}$, so the power supply can be reduced to $\beta V_0$.
Parallel Processing for Low-Power
Power consumption:
$$P_{par} = (L\,C_{total})(\beta V_0)^2 \frac{f}{L} = \beta^2 P_{seq}$$
Propagation delay:
$$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad T_{par} = L\,T_{seq} = \frac{C_{charge}\, \beta V_0}{k (\beta V_0 - V_t)^2}$$
which gives
$$L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
Example 3.4.2 (1/2)
Consider a 4-tap FIR filter shown in Fig. 3.18(a) and its 2-parallel version in Fig. 3.18(b). The two architectures are operated at a sample period of 9 u.t. Assume $T_M = 8$ u.t., $T_A = 1$ u.t., $V_t = 0.45$ V, $V_0 = 3.3$ V, and $C_M = 8C_A$. (a) What is the supply voltage of the 2-parallel filter? (b) What is the power consumption of the 2-parallel filter as a percentage of the original filter?
Solution:
Original: $C_{charge} = C_M + C_A = 9C_A$
2-parallel: $C_{charge} = C_M + 2C_A = 10C_A$
$$9(3.3\beta - 0.45)^2 = 5\beta(3.3 - 0.45)^2 \;\Rightarrow\; \beta = 0.6589 \text{ or } 0.0282 \text{ (infeasible)}$$
$$V_{par} = \beta V_0 = 2.1743\ \text{V}$$
$$\text{Ratio} = \beta^2 = 43.41\%$$
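The coefficients 9 and 5 in the voltage equation for the 2-parallel filter follow from equating the parallel clock period to twice the sequential one. This is a worked sketch, using charging capacitances of 9C_A for the original filter and 10C_A for the 2-parallel filter, and the threshold voltage V_t = 0.45 V on both sides.

```latex
\begin{align*}
T_{seq} &= \frac{9 C_A V_0}{k (V_0 - V_t)^2}, \qquad
T_{par} = \frac{10 C_A\, \beta V_0}{k (\beta V_0 - V_t)^2} = 2\,T_{seq} \\
10\,\beta (V_0 - V_t)^2 &= 18\,(\beta V_0 - V_t)^2 \\
9(\beta V_0 - V_t)^2 &= 5\,\beta (V_0 - V_t)^2 \\
98.01\,\beta^2 - 67.3425\,\beta + 1.8225 &= 0
  \qquad (V_0 = 3.3,\ V_t = 0.45) \\
\beta &\approx 0.6589 \quad \text{or} \quad 0.0282
\end{align*}
```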
Example 3.4.2 (2/2)
Example 3.4.3 (1/2)
A more efficient structure than the previous one is depicted in Fig. 3.18(c). (a) What is the supply voltage of the efficient 2-parallel filter? (b) What is the power consumption of the efficient 2-parallel filter as a percentage of the original filter?
Solution:
Original: $C_{charge} = C_M + C_A = 9C_A$
New 2-parallel: $C_{charge} = C_M + 4C_A = 12C_A$
$$2 \cdot 9(3.3\beta - 0.45)^2 = 12\beta(3.3 - 0.45)^2 \;\Rightarrow\; \beta = 0.745 \text{ or } 0.025 \text{ (infeasible)}$$
$$V_{par} = \beta V_0 = 2.45857\ \text{V}$$
$$\text{Ratio} = \frac{P_{par}}{P_{seq}} = \frac{55 C_A\, \beta^2 V_0^2 \cdot \frac{1}{2} f_s}{35 C_A\, V_0^2 f_s} = 43.6\%$$
Example 3.4.3 (2/2)
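Combining Pipelining and Parallel Processing
Parallel-pipelined structure
$$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad T_{pp} = L\,T_{seq} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$
which gives
$$ML (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
For $M = L = 2$, $V_0 = 5$ V, and $V_t = 0.6$ V: $\beta \approx 0.4$ and $\beta^2 \approx 0.16$.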
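The quoted values for M = L = 2, V_0 = 5 V, and V_t = 0.6 V can be checked by expanding ML(βV_0 − V_t)² = β(V_0 − V_t)² with ML = 4. This is a worked sketch; the slide's 0.4 and 0.16 are the rounded results.

```latex
\begin{align*}
4(5\beta - 0.6)^2 &= \beta(5 - 0.6)^2 \\
100\beta^2 - 24\beta + 1.44 &= 19.36\,\beta \\
100\beta^2 - 43.36\,\beta + 1.44 &= 0 \\
\beta &= \frac{43.36 \pm \sqrt{43.36^2 - 4 \cdot 100 \cdot 1.44}}{200}
       \approx \frac{43.36 \pm 36.112}{200} \\
\beta &\approx 0.397 \quad (\text{the root } 0.036 \text{ is infeasible}), \qquad
\beta^2 \approx 0.158
\end{align*}
```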


Conclusions
Methodologies for pipelining a 3-tap FIR filter
Methodologies for parallel processing of a 3-tap FIR filter
Methodologies for using pipelining and parallel processing to reduce power
Pipelining and parallel processing of recursive digital filters using look-ahead techniques are addressed in Chapter 10.
Self-Test Exercises
STE1: Problem 8 of Chapter 3 in the textbook.
STE2: Problem 9 of Chapter 3 in the textbook.
STE3: Problem 10 of Chapter 3 in the textbook.
