RTL Design

The key takeaways are about Register-Transfer Level (RTL) design and how to capture behavior and convert it to a circuit.

RTL design is a method for creating custom processors where the first step is to capture the desired behavior using a high-level state machine and the remaining steps are to convert it to a circuit.

The RTL design method involves first capturing the desired behavior using a high-level state machine, and then converting it to a circuit by creating a datapath and connecting it to a controller.

Register-Transfer Level (RTL) Design

Recall
Chapter 2: Combinational Logic Design
First step: Capture behavior (using equation or truth table)
Remaining steps: Convert to circuit

Chapter 3: Sequential Logic Design
First step: Capture behavior (using FSM)
Remaining steps: Convert to circuit

RTL Design (the method for creating custom processors)
First step: Capture behavior (using high-level state machine, to be introduced)
Remaining steps: Convert to circuit

RTL Design Method

Step 1: Laser-Based Distance Measurer


[Figure: a laser pulses toward an object of interest at distance D; a sensor detects the reflection after T seconds, so 2D = T sec * 3*10^8 m/sec]

Example of how to create a high-level state machine to describe desired processor behavior
Laser-based distance measurement: pulse laser, measure time T to sense reflection
Laser light travels at speed of light, 3*10^8 m/sec
Distance is thus D = T sec * 3*10^8 m/sec / 2
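
The distance formula above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `distance_m` and the sample time are assumptions, not part of the slides.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8  # approximate value used in the slides

def distance_m(round_trip_s: float) -> float:
    """Return the one-way distance D (meters) for a round-trip time T (seconds)."""
    # The pulse travels to the object and back, so the path is 2D.
    return round_trip_s * SPEED_OF_LIGHT_M_PER_S / 2

# Hypothetical measurement: reflection sensed 2 microseconds after the pulse.
print(distance_m(2e-6))
```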

Step 1: Laser-Based Distance Measurer


[Figure: block diagram of the laser-based distance measurer, with input B from button, output L to laser, input S from sensor, and 16-bit output D to display]

Inputs/outputs
B: bit input, from button to begin measurement
L: bit output, activates laser
S: bit input, senses laser reflection
D: 16-bit output, displays computed distance

Step 1: Laser-Based Distance Measurer


Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)

[State diagram: S0 with actions L = 0 (laser off), D = 0 (distance = 0)]

Step 1: Create high-level state machine
Begin by declaring inputs and outputs
Create initial state, name it S0
Initialize laser to off (L=0)
Initialize displayed distance to 0 (D=0)

Step 1: Laser-Based Distance Measurer


Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)

[State diagram: S0 (L=0, D=0) -> S1; S1 loops to itself on B' (button not pressed) and goes to S2 on B (button pressed)]

Add another state, call it S1, that waits for a button press
B': stay in S1, keep waiting
B: go to a new state S2

Q: What should S2 do?
A: Turn on the laser

Step 1: Laser-Based Distance Measurer


Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)

[State diagram: S0 (L=0, D=0) -> S1 (stay on B', leave on B) -> S2 (L=1, laser on) -> S3 (L=0, laser off)]

Add a state S2 that turns on the laser (L=1)
Then turn off the laser (L=0) in a state S3
Q: What to do next?
A: Start timer, wait to sense reflection

Step 1: Laser-Based Distance Measurer


Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)

[State diagram: S0 (L=0, D=0) -> S1 (Dctr = 0, reset cycle count) -> S2 (L=1) -> S3 (L=0, Dctr = Dctr + 1, count cycles); S3 loops to itself on S' (no reflection); the S (reflection) transition leads to a state not yet defined]

Stay in S3 until reflection is sensed (S)
To measure time, count cycles for which we are in S3
To count, declare local register Dctr
Increment Dctr each cycle in S3
Initialize Dctr to 0 in S1. S2 would have been O.K. too

Step 1: Laser-Based Distance Measurer


Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)

[State diagram: S0 (L=0, D=0) -> S1 (Dctr = 0; stay on B', leave on B) -> S2 (L=1) -> S3 (L=0, Dctr = Dctr + 1; stay on S', leave on S) -> S4 (D = Dctr / 2, calculate D) -> S1]

Once reflection detected (S), go to new state S4
Calculate distance
Assuming clock frequency is 3x10^8 Hz, light travels one meter per clock cycle, so Dctr holds number of meters traveled and D=Dctr/2

After S4, go back to S1 to wait for button again
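
The high-level state machine built up over the previous slides can be sketched as a cycle-by-cycle simulation. This is a minimal model under stated assumptions: the function name `run_measurer` and the list-based inputs are invented for illustration, and each trace entry records the values as they stand at the end of that cycle.

```python
def run_measurer(b_inputs, s_inputs):
    """Simulate the state machine, one clock cycle per (B, S) input pair.

    Returns a list of (state, L, D) tuples, one per cycle.
    """
    state, L, D, dctr = "S0", 0, 0, 0
    trace = []
    for b, s in zip(b_inputs, s_inputs):
        if state == "S0":            # initialize outputs
            L, D = 0, 0
            nxt = "S1"
        elif state == "S1":          # reset cycle count, wait for button
            dctr = 0
            nxt = "S2" if b else "S1"
        elif state == "S2":          # laser on
            L = 1
            nxt = "S3"
        elif state == "S3":          # laser off, count cycles until reflection
            L = 0
            dctr = (dctr + 1) & 0xFFFF   # Dctr is a 16-bit register
            nxt = "S4" if s else "S3"
        else:                        # S4: calculate D
            D = dctr // 2
            nxt = "S1"
        trace.append((state, L, D))
        state = nxt
    return trace

# Hypothetical stimulus: button pressed in cycle 2, reflection sensed in cycle 7.
b = [0, 0, 1] + [0] * 7
s = [0] * 7 + [1] + [0] * 2
for entry in run_measurer(b, s):
    print(entry)
```

With this stimulus the machine spends four cycles in S3, so Dctr reaches 4 and S4 sets D to 2.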

Step 2: Create a Datapath


Datapath must
Implement data storage
Implement data computations

Look at high-level state machine, do three substeps
(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

Instantiate: to introduce a new component into a design.

Step 2: Laser-Based Distance Measurer


(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)

[State diagram: S0 (L=0, D=0) -> S1 (Dctr = 0) -> S2 (L=1) -> S3 (L=0, Dctr = Dctr + 1) -> S4 (D = Dctr / 2, calculate D)]

[Datapath: Dctr, a 16-bit up-counter with control inputs clear (Dctr_clr) and count (Dctr_cnt) and output Q; Dreg, a 16-bit register with control inputs clear (Dreg_clr) and load (Dreg_ld), data input I, and 16-bit output Q driving D]

Step 2: Laser-Based Distance Measurer


(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)

[State diagram: S0 (L=0, D=0) -> S1 (Dctr = 0) -> S2 (L=1) -> S3 (L=0, Dctr = Dctr + 1) -> S4 (D = Dctr / 2, calculate D)]

[Datapath: the 16-bit output Q of Dctr (16-bit up-counter, controls Dctr_clr and Dctr_cnt) passes through a shift-right-by-1 component (>>1), which implements the division by 2, into data input I of Dreg (16-bit register, controls Dreg_clr and Dreg_ld); the 16-bit output Q of Dreg drives D]
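
The datapath described above (up-counter, shift-right-by-1, output register) can be modeled in software to see how the control signals act on it. This is a behavioral sketch, not the slides' circuit: the class name `Datapath`, the `clock` method, and the choice that clear takes priority over load/count are assumptions.

```python
class Datapath:
    """Dctr (16-bit up-counter) -> >>1 -> Dreg (16-bit register) -> D."""

    def __init__(self):
        self.dctr = 0
        self.dreg = 0

    def clock(self, Dreg_clr=0, Dreg_ld=0, Dctr_clr=0, Dctr_cnt=0):
        """Apply one rising clock edge with the given control signals."""
        shifted = self.dctr >> 1          # combinational: Dctr / 2
        # Assumption: clear has priority over load and count.
        if Dreg_clr:
            next_dreg = 0
        elif Dreg_ld:
            next_dreg = shifted
        else:
            next_dreg = self.dreg
        if Dctr_clr:
            next_dctr = 0
        elif Dctr_cnt:
            next_dctr = (self.dctr + 1) & 0xFFFF   # wrap at 16 bits
        else:
            next_dctr = self.dctr
        # Both registers update together on the clock edge.
        self.dctr, self.dreg = next_dctr, next_dreg

    @property
    def D(self):
        return self.dreg

dp = Datapath()
dp.clock(Dreg_clr=1)          # clear the output register
dp.clock(Dctr_clr=1)          # clear the counter
for _ in range(6):
    dp.clock(Dctr_cnt=1)      # count six cycles
dp.clock(Dreg_ld=1)           # load Dctr / 2 into Dreg
print(dp.D)                   # prints 3
```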

Step 3: Connecting the Datapath to a Controller


Laser-based distance measurer example
Easy: just connect all control signals between controller and datapath

[Figure: the controller takes B (from button) and S (from sensor), drives L (to laser), and sends Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt to the datapath; the datapath drives the 16-bit output D (to display); both are driven by a 300 MHz clock]

[Datapath: Dctr (16-bit up-counter, controls clear and count) feeds a >>1 shifter, which feeds input I of Dreg (16-bit register, controls clear and load); output Q of Dreg drives D]

Step 4: Deriving the Controller's FSM

[Figure: controller and datapath connected as in Step 3, with a 300 MHz clock]

High-level state machine
Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)
[State diagram: S0 (L=0, D=0) -> S1 (Dctr = 0; stay on B', leave on B) -> S2 (L=1) -> S3 (L=0, Dctr = Dctr + 1; stay on S', leave on S) -> S4 (D = Dctr / 2, calculate D) -> S1]

FSM has same structure as high-level state machine
Inputs/outputs all bits now
Replace data operations by bit operations using datapath

Controller FSM
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
S0: L=0, Dreg_clr = 1, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 0 (laser off) (clear D reg)
S1: L=0, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 1, Dctr_cnt = 0 (clear count)
S2: L=1, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 0 (laser on)
S3: L=0, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 1 (laser off) (count up)
S4: L=0, Dreg_clr = 0, Dreg_ld = 1, Dctr_clr = 0, Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)

Step 4: Deriving the Controller's FSM

Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

FSM with every output listed in every state
S0: L=0, Dreg_clr = 1, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 0 (laser off) (clear D reg)
S1: L=0, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 1, Dctr_cnt = 0 (clear count); stay on B', leave on B
S2: L=1, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 0 (laser on)
S3: L=0, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 1 (laser off) (count up); stay on S', leave on S
S4: L=0, Dreg_clr = 0, Dreg_ld = 1, Dctr_clr = 0, Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)

Using the shorthand that outputs not assigned are implicitly assigned 0
S0: L=0, Dreg_clr = 1 (laser off) (clear D reg)
S1: Dctr_clr = 1 (clear count)
S2: L=1 (laser on)
S3: L=0, Dctr_cnt = 1 (laser off) (count up)
S4: Dreg_ld = 1, Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)
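
The controller FSM and its shorthand output table can be sketched as a lookup table driving a small register model. The names `OUTPUTS`, `next_state`, and `simulate` are hypothetical, and the inline counter/register behavior is a simplified stand-in for the datapath, assuming clear takes priority over load and count.

```python
SIGNALS = ("L", "Dreg_clr", "Dreg_ld", "Dctr_clr", "Dctr_cnt")

# Shorthand table: any signal not listed for a state is implicitly 0.
OUTPUTS = {
    "S0": {"Dreg_clr": 1},   # laser off, clear D reg
    "S1": {"Dctr_clr": 1},   # clear count
    "S2": {"L": 1},          # laser on
    "S3": {"Dctr_cnt": 1},   # laser off, count up
    "S4": {"Dreg_ld": 1},    # load D reg with Dctr/2, stop counting
}

def next_state(state, B, S):
    if state == "S0":
        return "S1"
    if state == "S1":
        return "S2" if B else "S1"
    if state == "S2":
        return "S3"
    if state == "S3":
        return "S4" if S else "S3"
    return "S1"  # after S4, wait for the button again

def simulate(b_inputs, s_inputs):
    """Return (history, final D); history holds (state, L, D) seen in each cycle."""
    state, dctr, dreg = "S0", 0, 0
    history = []
    for B, S in zip(b_inputs, s_inputs):
        ctl = {name: OUTPUTS[state].get(name, 0) for name in SIGNALS}
        history.append((state, ctl["L"], dreg))
        # Registers update on the clock edge, using values from before the edge.
        new_dreg = 0 if ctl["Dreg_clr"] else (dctr >> 1 if ctl["Dreg_ld"] else dreg)
        new_dctr = 0 if ctl["Dctr_clr"] else ((dctr + 1) & 0xFFFF if ctl["Dctr_cnt"] else dctr)
        dctr, dreg = new_dctr, new_dreg
        state = next_state(state, B, S)
    return history, dreg

# Hypothetical stimulus: button in cycle 2, reflection in cycle 7.
history, d = simulate([0, 0, 1] + [0] * 7, [0] * 7 + [1] + [0] * 2)
print(d)  # prints 2
```

Keeping the outputs in a table mirrors the shorthand on the slide: only the asserted signals are written down, and the lookup supplies the implicit zeros.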

Step 4

[Figure: the controller takes B (from button) and S (from sensor), drives L (to laser), and sends Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt to the datapath; the datapath drives the 16-bit output D (to display); 300 MHz clock]

[Datapath: Dctr (16-bit up-counter, controls clear and count) feeds a >>1 shifter, which feeds input I of Dreg (16-bit register, controls clear and load); output Q of Dreg drives D]

Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
S0: L=0, Dreg_clr = 1 (laser off) (clear D reg)
S1: Dctr_clr = 1 (clear count)
S2: L=1 (laser on)
S3: L=0, Dctr_cnt = 1 (laser off) (count up)
S4: Dreg_ld = 1, Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)

Implement FSM as state register and logic (Ch3) to complete the design

RTL Example: Video Compression - Sum of Absolute Differences

[Figure (a): Frame 1 and Frame 2, where the only difference is a moving ball; digitized frame 1 and digitized frame 2 take 1 Mbyte each. Figure (b): just send the difference; digitized frame 1 takes 1 Mbyte and the difference of frame 2 from frame 1 takes 0.01 Mbyte]

Video is a series of frames (e.g., 30 per second)
Most frames similar to previous frame
Compression idea: just send difference from previous frame

RTL Example: Video Compression - Sum of Absolute Differences

[Figure: corresponding blocks of Frame 1 and Frame 2 are compared]

Need to quickly determine whether two frames are similar enough to just send difference for second frame
Compare corresponding 16x16 blocks
Treat 16x16 block as 256-byte array
Compute the absolute value of the difference of each array item
Sum those differences; if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)

Assume each pixel is represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)
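
The block comparison described above can be sketched in a few lines. The function names and the threshold value are assumptions chosen for illustration; the slides do not specify a threshold.

```python
def sad(block_a, block_b):
    """Sum of absolute differences of two equal-length byte arrays."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def needs_full_frame(block_a, block_b, threshold):
    """True if the blocks differ too much to send only a difference."""
    return sad(block_a, block_b) > threshold

# Two hypothetical 16x16 blocks, each flattened to a 256-byte array.
frame1_block = [10] * 256
frame2_block = [12] * 256
print(sad(frame1_block, frame2_block))                      # prints 512
print(needs_full_frame(frame1_block, frame2_block, 1000))   # prints False
```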

RTL Example: Video Compression - Sum of Absolute Differences

[Figure: SAD component with two 256-byte array inputs A and B, bit input go, and integer output sad]

Want fast sum-of-absolute-differences (SAD) component
When go=1, sums the absolute differences of element pairs in arrays A and B, outputs that sum

RTL Example: Video Compression - Sum of Absolute Differences

Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)

[State diagram: S0 stays on !go and goes to S1 on go; S1 (sum = 0, i = 0) -> S2; S2 goes to S3 when i<256 and to S4 when !(i<256); S3 (sum=sum+abs(A[i]-B[i]), i=i+1) -> S2; S4 (sad_reg = sum) -> S0]

S0: wait for go
S1: initialize sum and index
S2: check if done (i>=256)
S3: add difference to sum, increment index
S4: done, write to output sad_reg
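
The SAD state machine above can be stepped through in software, one state per clock cycle, to see how many cycles a run takes. This is a sketch under assumptions: the function name `sad_hlsm` is invented, and the simulation starts in S1, as if go had just been asserted.

```python
N = 256  # array length from the slides (one 16x16 block)

def sad_hlsm(A, B):
    """Run states S1..S4 once; return (sad_reg, cycles spent in S1..S4)."""
    state, total, i, cycles = "S1", 0, 0, 0
    while True:
        cycles += 1
        if state == "S1":        # initialize sum and index
            total, i = 0, 0
            state = "S2"
        elif state == "S2":      # check if done
            state = "S3" if i < N else "S4"
        elif state == "S3":      # add difference to sum, increment index
            total += abs(A[i] - B[i])
            i += 1
            state = "S2"
        else:                    # S4: write result to sad_reg
            return total, cycles

result, cycles = sad_hlsm([5] * N, [7] * N)
print(result, cycles)  # prints 512 515
```

The 515 cycles break down as 256 passes through S2 and S3 (512 cycles) plus one cycle each for S1, the final S2 check, and S4.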

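RTL Example: Video Compression - Sum of Absolute Differences

Step 2: Create datapath

Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)

[State diagram: S0 stays on !go and goes to S1 on go; S1 (sum = 0, i = 0) -> S2; S2 goes to S3 when i<256 and to S4 when !(i<256); S3 (sum=sum+abs(A[i]-B[i]), i=i+1) -> S2; S4 (sad_reg=sum) -> S0]

[Datapath: register i (9 bits, controls i_inc and i_clr) drives AB_addr and a <256 comparator that produces i_lt_256; A_data and B_data (8 bits each) are subtracted and passed through an abs component; the 8-bit result is added to the 32-bit sum register (controls sum_ld and sum_clr); sad_reg (32 bits, control sad_reg_ld) loads sum and drives output sad]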

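RTL Example: Video Compression - Sum of Absolute Differences

Step 3: Connect to controller
Step 4: Replace high-level state machine by FSM

Controller FSM (each high-level action is replaced by datapath control signals)
S0: wait for go; stay on !go, go to S1 on go
S1: sum=0 becomes sum_clr=1; i=0 becomes i_clr=1
S2: the condition i<256 becomes the datapath signal i_lt_256; go to S3 on i_lt_256, go to S4 on !i_lt_256
S3: sum=sum+abs(A[i]-B[i]) becomes sum_ld=1, AB_rd=1; i=i+1 becomes i_inc=1
S4: sad_reg=sum becomes sad_reg_ld=1

[Datapath: as in Step 2, with AB_rd and AB_addr going to the A and B memories, A_data and B_data (8 bits each) returning, and control signals i_inc, i_clr, sum_ld, sum_clr, and sad_reg_ld coming from the controller; i_lt_256 goes back to the controller]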

RTL Example: Video Compression - Sum of Absolute Differences

Comparing software and custom circuit SAD
Circuit: Two states (S2 & S3) for each i, and there are 256 values of i, so about 512 clock cycles
Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i; say about 6 cycles per array item, so 256*6 = 1536 cycles
Circuit is about 3 times as fast (1536/512 = 3)

[State diagram excerpt: S2 goes to S3 when i<256 (i_lt_256) and exits when !(i<256); S3: sum=sum+abs(A[i]-B[i]), i=i+1]
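
The cycle estimate above is simple arithmetic and can be reproduced directly. The variable names are illustrative, and the figure of 6 cycles per item is the slide's own rough estimate for software.

```python
ITEMS = 256                      # array elements in one 16x16 block
CIRCUIT_CYCLES_PER_ITEM = 2      # states S2 and S3
SOFTWARE_CYCLES_PER_ITEM = 6     # rough estimate from the slide

circuit_cycles = ITEMS * CIRCUIT_CYCLES_PER_ITEM
software_cycles = ITEMS * SOFTWARE_CYCLES_PER_ITEM
speedup = software_cycles / circuit_cycles

print(circuit_cycles, software_cycles, speedup)  # prints 512 1536 3.0
```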

Control vs. Data Dominated RTL Design


Designs often categorized as control-dominated or data-dominated
Control-dominated design: Controller contains most of the complexity
Data-dominated design: Datapath contains most of the complexity
General, descriptive terms; no hard rule separates the two types of designs
Laser-based distance measurer: control dominated
SAD circuit: mix of control and data
Now let's do a data dominated design

Data Dominated RTL Design Example: FIR Filter


Filter concept
Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
That 240 is probably wrong!
Could be electrical noise

[Figure: digital filter block with 12-bit input X, 12-bit output Y, and clock input clk]

Filter should remove such noise in its output Y
Simple filter: Output average of last N values
Small N: less filtering
Large N: more filtering, but less sharp output
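
The "average of last N values" filter can be tried on the slide's own input sequence. This is a sketch with assumptions: the function name `moving_average` is invented, and during the first cycles it averages over however many samples have arrived so far.

```python
from collections import deque

def moving_average(samples, n):
    """Return, for each sample, the average of the last n samples seen."""
    window = deque(maxlen=n)     # oldest sample drops out automatically
    out = []
    for x in samples:
        window.append(x)
        out.append(sum(window) / len(window))
    return out

readings = [180, 180, 181, 240, 180, 181]   # sequence from the slide
for n in (1, 3):
    print(n, [round(y, 1) for y in moving_average(readings, n)])
```

With N = 1 the spike of 240 passes straight through; with N = 3 it is reduced to about 200 but smeared over three outputs, which illustrates the trade-off between small and large N.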

Data Dominated RTL Design Example: FIR Filter


FIR filter
Finite Impulse Response
Simply a configurable weighted sum of past input values
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
Above known as "3 tap"
Tens of taps more common
Very general filter: User sets the constants (c0, c1, c2) to define specific filter

[Figure: digital filter block with 12-bit input X, 12-bit output Y, and clock input clk]

RTL design
Step 1: Create high-level state machine
But there really is none! Data dominated indeed.
Go straight to step 2
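
The 3-tap equation can be evaluated directly in software as a reference for the circuit. This sketch assumes that input values before the start of the sequence are 0; the function name `fir3` is illustrative.

```python
def fir3(xs, c0, c1, c2):
    """Compute y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) for each t."""
    ys = []
    for t in range(len(xs)):
        x_t = xs[t]
        x_t1 = xs[t - 1] if t >= 1 else 0   # assumption: x is 0 before t = 0
        x_t2 = xs[t - 2] if t >= 2 else 0
        ys.append(c0 * x_t + c1 * x_t1 + c2 * x_t2)
    return ys

print(fir3([180, 181, 240], 1, 1, 1))  # prints [180, 361, 601]
```

Setting all constants equal gives a scaled moving sum, while (1, 0, 0) passes the input through unchanged, which shows how the constants define the specific filter.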

Data Dominated RTL Design Example: FIR Filter


Step 2: Create datapath
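Begin by creating chain of xt registers to hold past values of X
Suppose sequence is: 180, 181, 240

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: chain of registers xt0, xt1, xt2 fed by X and clocked by clk; as the sequence arrives, xt0 holds 180, then 181, then 240; xt1 holds 180, then 181; xt2 holds 180]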

Data Dominated RTL Design Example: FIR Filter


Step 2: Create datapath (cont.)
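Instantiate registers for c0, c1, c2
Instantiate multipliers to compute c*x values

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter datapath; X feeds the register chain xt0 (x(t)), xt1 (x(t-1)), xt2 (x(t-2)); registers c0, c1, c2 hold the constants; one multiplier per tap computes c0*x(t), c1*x(t-1), and c2*x(t-2); output Y]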

Data Dominated RTL Design Example: FIR Filter


Step 2: Create datapath (cont.)
Instantiate adders

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter datapath; register chain xt0, xt1, xt2 and constant registers c0, c1, c2 feed three multipliers; two adders sum the three products to form Y]

Data Dominated RTL Design Example: FIR Filter


Step 2: Create datapath (cont.)
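Add circuitry to allow loading of particular c register

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter datapath; a 2x4 decoder with enable input e driven by CL and select inputs Ca1, Ca0 generates the load signals for the c registers, one decoder output per register; input C supplies the constant to load; register chain xt0, xt1, xt2 and registers c0, c1, c2 feed three multipliers and two adders; the sum is stored in register yreg, which drives Y]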
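
The completed FIR datapath can be modeled register by register to see how values move on each clock edge. This is a behavioral sketch with several assumptions: the class name `Fir3Datapath` is invented, Ca stands for the 2-bit value Ca1 Ca0 and is assumed to select c0, c1, or c2 by index, and bit widths are ignored.

```python
class Fir3Datapath:
    """Register-level model: xt0..xt2 chain, c0..c2 constants, yreg output."""

    def __init__(self):
        self.xt = [0, 0, 0]   # xt0, xt1, xt2
        self.c = [0, 0, 0]    # c0, c1, c2
        self.yreg = 0

    def clock(self, X, CL=0, Ca=0, C=0):
        """Apply one clock edge; all registers load from pre-edge values."""
        # Multipliers and adders are combinational on the current registers.
        next_y = sum(c * x for c, x in zip(self.c, self.xt))
        # The x registers form a shift chain fed by X.
        next_xt = [X, self.xt[0], self.xt[1]]
        # Decoder: when CL is 1, load the c register selected by Ca.
        next_c = list(self.c)
        if CL and Ca < 3:     # assumption: decoder output 3 is unused
            next_c[Ca] = C
        self.xt, self.c, self.yreg = next_xt, next_c, next_y

    @property
    def Y(self):
        return self.yreg

fir = Fir3Datapath()
for index in range(3):
    fir.clock(X=0, CL=1, Ca=index, C=1)   # load c0 = c1 = c2 = 1
for sample in (180, 181, 240, 0):
    fir.clock(X=sample)
print(fir.Y)  # prints 601
```

Because yreg loads on the same edge that shifts the chain, each output appears one clock cycle after its newest input has been captured in xt0.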

Data Dominated RTL Design Example: FIR Filter


Step 3 & 4: Connect to controller, Create FSM
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

No controller needed
Extreme data-dominated example
(Example of an extreme control-dominated design: an FSM, with no datapath)

Comparing the FIR circuit to a software implementation

Circuit
Assume adder has 2-gate delay, multiplier has 20-gate delay
Longest path goes through one multiplier and two adders
20 + 2 + 2 = 24-gate delay
100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path

Software
100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
(100*2 + 100*2)*10 = 4000 gate delays

Circuit is more than 100 times faster (4000/34 is about 118).
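
The delay figures above can be reproduced with a short calculation. One assumption is made explicit here: the adders are arranged as a balanced tree, so the longest path crosses ceil(log2(taps)) adders, which gives 2 adders for 3 taps and 7 adders for 100 taps, matching the slide's numbers. Function names are illustrative.

```python
ADDER_DELAY = 2          # gate delays, from the slide
MULTIPLIER_DELAY = 20    # gate delays, from the slide
INSTRUCTION_DELAY = 10   # gate delays per instruction, from the slide

def circuit_delay(taps):
    """Longest path: one multiplier plus one adder per adder-tree level."""
    adder_levels = (taps - 1).bit_length()   # equals ceil(log2(taps)) for taps >= 1
    return MULTIPLIER_DELAY + adder_levels * ADDER_DELAY

def software_delay(taps, instr_per_mult=2, instr_per_add=2):
    """One multiplication and one addition per tap, executed sequentially."""
    return (taps * instr_per_mult + taps * instr_per_add) * INSTRUCTION_DELAY

print(circuit_delay(3))     # prints 24
print(circuit_delay(100))   # prints 34
print(software_delay(100))  # prints 4000
print(software_delay(100) / circuit_delay(100))
```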
