ASM Design Example Bin Mult
Introduction
An algorithmic state machine (ASM) is a finite state machine that uses a sequential circuit (the Controller) to coordinate a series of operations among other functional units such as counters, registers and adders (the Datapath). The series of operations implements an algorithm. The Controller passes control signals, which can be Moore or Mealy outputs of the Controller, to the Datapath. The Datapath returns status information to the Controller, which uses it to determine the sequence of states. Both the Controller and the Datapath may have external inputs and outputs, and both are clocked simultaneously, as shown in the following figure:
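[Figure: block diagram of an ASM. The Controller sends Control signals to the Datapath, the Datapath returns Status signals to the Controller, each block has its own external Inputs and Outputs, and both blocks share a common clock.]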
Think about this: A microprocessor may be considered as a (large!) ASM with many inputs, states and outputs. A program (any software) is really just a method for specifying its initial state.

The two basic strategies for the design of a controller are:
1. hardwired control, which includes techniques such as one-hot-state (also known as "one flipflop per state") and decoded sequence registers.
2. microprogrammed control, which uses a memory device to produce a sequence of control words for a datapath.

Since hardwired control is, generally speaking, fast compared with microprogramming strategies, most modern microprocessors incorporate hardwired control to help achieve their high performance (or, in some cases, a combination of hardwired and microprogrammed control). The early generations of microprocessors used microprogramming almost exclusively. We will discuss some basic concepts in microprogramming later in the course; for now we concentrate on a design example of hardwired control. The ASM we will design is an n-bit unsigned binary multiplier.
Binary Multiplication
The design of binary multiplication strategies has a long history. Multiplication is such a fundamental and frequently used operation in digital signal processing (for example in filtering, coding and compression for telecommunications and control applications, as well as many others) that most modern DSP chips have dedicated multiplication hardware to maximize performance. Multiplier units must be fast!
The first example that we considered (in class), which used a repeated addition strategy, is not always fast. In fact, the time required to multiply two numbers is variable and depends on the value of the multiplier itself. For example, the calculation of 5 x 9 as 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 requires more clock pulses than the calculation of 5 x 3 = 5 + 5 + 5. The larger the multiplier, the more iterations are required. This is not practical.

Think about this: How many iterations are required for multiplying, say, two 16-bit numbers in the worst case?

Another approach to achieve fast multiplication is the look-up table (LUT). The multiplier and multiplicand are used to form an address in memory at which the corresponding, pre-computed value of the product is stored. For an n-bit multiplier (that is, multiplying an n-bit number by an n-bit number), a $2^{n+n} \times 2n$-bit memory is required to hold all possible products. For example, a 4-bit x 4-bit multiplier requires $2^{8} \times 8 = 2048$ bits. For an 8-bit x 8-bit multiplier, a $2^{8+8} \times 16$-bit = 1 Mbit memory is required. This approach is conceptually simple and has a fixed multiply time equal to the access time of the memory device, regardless of the data being multiplied. But it is also impractical for larger values of n.

Think about this: What memory capacity is required for multiplying two 16-bit numbers? Two 32-bit numbers?

Most multiplication hardware units use iterative algorithms implemented as an ASM for which the worst-case multiplication time can be guaranteed. The algorithm we present here is similar to the pencil-and-paper technique that we naturally use for multiplying in base 10. Consider the following example:

     123      (the multiplicand)
   x 432      (the multiplier)
   -----
     246      (1st partial product)
    369       (2nd partial product)
   492        (3rd partial product)
   -----
   53136      (the product)
Each digit of the multiplier is multiplied by the multiplicand to form a partial product. Each partial product is shifted left (that is, multiplied by the base) by an amount equal to the power of the corresponding multiplier digit. In the example above, 246 is actually $246 \times 10^{0}$, 369 is $369 \times 10^{1} = 3690$ and 492 is actually $492 \times 10^{2} = 49200$, etc. There are as many partial products as there are digits in the multiplier. Binary multiplication can be done in exactly the same way:

       1100     (the multiplicand)
     x 1011     (the multiplier)
     ------
       1100     (1st partial product)
      1100      (2nd partial product)
     0000       (3rd partial product)
    1100        (4th partial product)
   --------
   10000100     (the product)
However, with binary digits we can make some important observations:
- Since we multiply by only 1 or 0, each partial product is either a copy of the multiplicand shifted by the appropriate number of places, or it is 0.
- The number of partial products is the same as the number of bits in the multiplier.
- The number of bits in the product is twice the number of bits in the multiplicand. Multiplying two n-bit numbers produces a 2n-bit product.
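The observations above can be checked with a few lines of Python. This is only an illustrative sketch of the pencil-and-paper method, not part of the hardware design; the function name `partial_products` is a hypothetical helper introduced here.

```python
def partial_products(multiplicand: int, multiplier: int, n: int) -> list:
    """Return the n partial products of an n-bit by n-bit multiplication.

    Each partial product is either 0 or the multiplicand shifted left
    by the position of the corresponding multiplier bit.
    """
    products = []
    for i in range(n):
        bit = (multiplier >> i) & 1              # i-th multiplier bit, lsb first
        products.append((multiplicand << i) if bit else 0)
    return products


# The binary example from the text: 1100 x 1011
pp = partial_products(0b1100, 0b1011, 4)
total = sum(pp)
for i, p in enumerate(pp, start=1):
    print(f"partial product {i}: {p:08b}")
print(f"product:           {total:08b}")         # prints 10000100
```

There is one partial product per multiplier bit, and the sum always fits in 2n bits.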
We could then design datapath hardware using a 2n-bit adder plus some other components (as in the example of Figure 10.17 of Brown and Vranesic) that emulates this manual procedure. However, the hardware requirement can be reduced by considering the multiplication in a different light. Our algorithm may be informally described as follows. Consider each bit of the multiplier from right to left. When a bit is 1, the multiplicand is added to the running total that is then shifted right. When the multiplier bit is 0, no add is necessary since the partial product is 0 and then only the shift takes place. After n cycles of this strategy (once for each bit in the multiplier) the final answer is produced. Consider the previous example again:
    1100        (the multiplicand)
    1011        (the multiplier)
    ----
    0000        (initial partial product, start with 0000)
    1100        (1st multiplier bit is 1, so add the multiplicand)
    ----
    1100        (sum)
    01100       (shift sum one position to the right)
    1100        (2nd multiplier bit is 1, so add multiplicand again)
    ----
   100100       (sum, with a carry generated on the left)
    100100      (shift sum once to the right, including carry)
    0100100     (3rd multiplier bit is 0, so skip add, shift once)
    1100        (4th multiplier bit is 1, so add multiplicand again)
    ----
   10000100     (sum, with a carry generated on the left)
    10000100    (shift sum once to the right, including carry)
Notice that all the adds take place in the same four bit positions, so we need only a 4-bit adder! We also need shifting capability to capture the bits moving to the right, as well as a way to store the carries resulting from the additions. The final answer (the product) consists of the accumulated sum and the bits shifted out to the right. A hardware design that can implement this algorithm is described in the next section.
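The add-and-shift procedure worked through above can be sketched in software as follows. This is a minimal behavioural model under stated assumptions: the names `c`, `a` and `q` stand for a carry bit, an n-bit accumulator and an n-bit register holding the multiplier, and the function name `shift_add_multiply` is hypothetical.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two unsigned n-bit numbers using only an n-bit add and a shift."""
    mask = (1 << n) - 1
    c, a, q = 0, 0, multiplier & mask
    for _ in range(n):                       # one pass per multiplier bit
        if q & 1:                            # current multiplier bit is 1: add
            total = a + (multiplicand & mask)
            c, a = total >> n, total & mask  # keep the carry separately
        else:                                # bit is 0: skip the add
            c = 0
        # Shift the combined c|a|q one position to the right.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                      # 2n-bit product


print(f"{shift_add_multiply(0b1100, 0b1011, 4):08b}")   # prints 10000100
```

Every addition involves only n bits plus a carry, which is why a 2n-bit adder is not needed.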
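[Figure: datapath of the binary multiplier. An n-bit parallel adder adds the Multiplicand to the contents of Register A; its Cout is stored in the carry flipflop C and its n-bit SUM is loaded into Register A. Register A and Register Q are n-bit shift registers: C feeds the left serial input of Register A, and the lsb of Register A feeds the left serial input of Register Q. Register Q is loaded with the Multiplier. Counter P drives the zero detect output Z. Register A holds the msb's of the Product and Register Q holds the lsb's.]

Control inputs: Clear (for the carry flipflop); Load, Shift and Clear (for each shift register); Init (for the counter).
Status outputs: Z (zero detect) and Q0 (the lsb of Register Q, which presents each bit of the Multiplier in succession).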
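[Figure: ASM chart. State IDLE: if G = 1, then C ← 0, A ← 0, P ← n - 1, Q ← multiplier, and the next state is MUL0; otherwise the ASM stays in IDLE. State MUL0: if Q0 = 1, then A ← A + multiplicand and C ← Cout; otherwise C ← 0; the next state is MUL1. State MUL1: C|A|Q ← shr(C|A|Q) and P ← P - 1; if Z = 1 the next state is IDLE, otherwise MUL0.]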
The process is achieved with 3 states (IDLE, MUL0 and MUL1). Each state provides control signals to the Datapath to perform the multiplication sequence. The process is started with an input G. As long as G remains 0, the ASM remains in state IDLE. When G = 1, the multiplication process is started. As the ASM moves to state MUL0, the carry flipflop is cleared (C ← 0), Register A is cleared (A ← 0), the counter is preset to n - 1 (P ← n - 1) and Register Q is loaded with the multiplier (Q ← multiplier). In state MUL0, the value of the current bit of the multiplier (available on Q0) determines whether the multiplicand is added (Q0 = 1) or not (Q0 = 0). For the case Q0 = 0, the carry flipflop is cleared; for the case Q0 = 1, the Cout from the adder is stored in the carry flipflop. The next state is always MUL1.
In MUL1, the carry flipflop, Register A and Register Q are treated as a single (1 + n + n)-bit register and shifted one position to the right, together. This is indicated with the notation C|A|Q ← shr(C|A|Q) in the ASM chart. The counter is also decremented (P ← P - 1). The value of Z then determines whether to return to state MUL0 (Z = 0) to continue iterating, or to return to state IDLE (Z = 1), thus completing the process. Remember that Z = 1 means that the counter has counted down from n - 1 to 0 and therefore n iterations have been completed. The state variable IDLE = 0 therefore indicates that the multiplier is busy, and when the ASM returns to state IDLE (IDLE = 1), the multiplication is complete.

At this point in the design process, the control signals must be identified and their names chosen. This is done by inspection of the ASM chart and the datapath circuit. In state IDLE (when G = 1), the operations P ← n - 1, A ← 0 and Q ← multiplier are all independent of one another in the datapath, so they can be done simultaneously and can share a common control signal (Initialize). However, the operation C ← 0 must have its own control signal (Clear_C) since it occurs in both state IDLE and state MUL0. The operations C ← Cout and A ← A + multiplicand, required in state MUL0, can share a control signal (Load) since they are also independent functions in the datapath. Similarly, the shifting of registers C|A|Q and the decrementing of counter P can share a common control signal (Shift_dec), since they are independent operations in the datapath and are both required in state MUL1. The names of the control signals are, of course, a matter of design choice.

We can summarize all the operations that must take place on each component in the datapath, together with the corresponding control signal names that should be passed to the datapath, in the following table:
Datapath component   Operation                     Control signal
Carry flipflop       C ← 0                         Clear_C
                     C ← Cout (from the adder)     Load
Counter P            P ← n - 1                     Initialize
                     P ← P - 1                     Shift_dec
Register A           A ← 0                         Initialize
                     A ← A + multiplicand          Load
                     C|A|Q ← shr(C|A|Q)            Shift_dec
Register Q           Q ← multiplier                Initialize
                     C|A|Q ← shr(C|A|Q)            Shift_dec
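The effect of each control signal on the datapath can be modelled behaviourally as below. This is a sketch under stated assumptions, not a description of the actual circuit: the class name `Datapath` and its method names are hypothetical, the counter is assumed to be just wide enough to hold n - 1, and the right shift is assumed to bring a 0 into C.

```python
class Datapath:
    """Behavioural model of the multiplier datapath (hypothetical class)."""

    def __init__(self, n):
        self.n = n
        self.mask = (1 << n) - 1
        # Assumption: the counter has just enough bits to hold n - 1.
        self.p_mask = (1 << max(1, (n - 1).bit_length())) - 1
        self.c = self.a = self.q = self.p = 0

    @property
    def z(self):                     # status: zero detect on counter P
        return int(self.p == 0)

    @property
    def q0(self):                    # status: lsb of Register Q
        return self.q & 1

    def clock(self, multiplicand=0, multiplier=0,
              initialize=0, clear_c=0, load=0, shift_dec=0):
        """Apply one clock pulse with the given control signals asserted."""
        c, a, q, p = self.c, self.a, self.q, self.p
        if initialize:               # A <- 0, Q <- multiplier, P <- n - 1
            a, q, p = 0, multiplier & self.mask, (self.n - 1) & self.p_mask
        if clear_c:                  # C <- 0
            c = 0
        if load:                     # A <- A + multiplicand, C <- Cout
            total = self.a + (multiplicand & self.mask)
            c, a = total >> self.n, total & self.mask
        if shift_dec:                # C|A|Q <- shr(C|A|Q), P <- P - 1
            q = (self.q >> 1) | ((self.a & 1) << (self.n - 1))
            a = (self.a >> 1) | (self.c << (self.n - 1))
            c = 0
            p = (self.p - 1) & self.p_mask
        self.c, self.a, self.q, self.p = c, a, q, p


dp = Datapath(4)
dp.clock(multiplier=0b0101, initialize=1, clear_c=1)
dp.clock(multiplicand=0b1100, load=1)
dp.clock(shift_dec=1)
print(f"C={dp.c} A={dp.a:04b} Q={dp.q:04b} P={dp.p:02b} Z={dp.z}")
```

All new register values are computed from the values held before the clock pulse, which mirrors the way the real registers update together on the clock edge.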
The state transition diagram for the controller for this ASM is shown below. Note that only the inputs are shown; the outputs are not indicated:
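[Figure: state transition diagram. IDLE stays in IDLE while G = 0 and goes to MUL0 when G = 1. MUL0 always goes to MUL1. MUL1 goes to MUL0 when Z = 0 and to IDLE when Z = 1.]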
From inspection of the state transition diagram, the input equations for the D flipflops (using one flipflop per state) are easily formed:

\[
\begin{aligned}
D_{\mathrm{IDLE}} &= \overline{G} \cdot \mathrm{IDLE} + \mathrm{MUL1} \cdot Z \\
D_{\mathrm{MUL0}} &= \mathrm{IDLE} \cdot G + \mathrm{MUL1} \cdot \overline{Z} \\
D_{\mathrm{MUL1}} &= \mathrm{MUL0}
\end{aligned}
\]

From the ASM chart and the table above, the equations for the control signal outputs of the controller are formed:

\[
\begin{aligned}
\mathrm{Initialize} &= G \cdot \mathrm{IDLE} \\
\mathrm{Clear\_C} &= G \cdot \mathrm{IDLE} + \mathrm{MUL0} \cdot \overline{Q_0} \\
\mathrm{Load} &= \mathrm{MUL0} \cdot Q_0 \\
\mathrm{Shift\_dec} &= \mathrm{MUL1}
\end{aligned}
\]

Finally, to provide a mechanism to force the state machine to state IDLE (such as at power-up), an input Reset_to_IDLE is connected to the asynchronous inputs of the flipflops. The circuit for the controller is then simply an implementation of all of these equations, as follows:
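[Figure: controller circuit. Three D flipflops, one each for states IDLE, MUL0 and MUL1, with inputs Go, Z and Q0, outputs Initialize, Clear_C, Load and Shift_dec, and the asynchronous Reset to IDLE input.]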
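The one-flipflop-per-state equations can be checked by writing them as Boolean expressions. The sketch below is only an illustration: the function name `controller` and the dictionary of outputs are hypothetical, and all signals are represented as the integers 0 and 1.

```python
def controller(idle, mul0, mul1, g, z, q0):
    """Evaluate the one-hot next-state and output equations for one clock period.

    Returns the D inputs of the three state flipflops and the control outputs.
    """
    d_idle = ((1 - g) & idle) | (mul1 & z)
    d_mul0 = (idle & g) | (mul1 & (1 - z))
    d_mul1 = mul0
    outputs = {
        "Initialize": g & idle,
        "Clear_C": (g & idle) | (mul0 & (1 - q0)),
        "Load": mul0 & q0,
        "Shift_dec": mul1,
    }
    return (d_idle, d_mul0, d_mul1), outputs


# Starting in IDLE with G = 1: the next state is MUL0 and the datapath is initialized.
state = (1, 0, 0)
state, out = controller(*state, g=1, z=0, q0=1)
print(state, out)
```

For any legal one-hot state and any input combination, exactly one of the three D inputs is 1, so the controller always remains in exactly one state.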
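Combining the controller and the datapath to form the top level of our design, the binary multiplier may be viewed as:

[Figure: top-level block diagram. The Controller and the Datapath exchange the control signals and the status signals Z and Q0 and share the clock. Seen from outside, the Binary Multiplier has the n-bit Multiplicand and Multiplier inputs, the Go, Reset to IDLE and Clock inputs, and the 2n-bit Product and IDLE outputs.]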
Note that the IDLE state variable has been brought to the top level since it can be used to indicate when the Binary Multiplier is busy. The Go and IDLE lines are called handshaking lines and are used to coordinate the operation of the multiplier with the external world. If IDLE = 1, a multiply can be started by putting the numbers to be multiplied on the Multiplier and Multiplicand inputs and setting Go = 1, at which time the state machine jumps to state MUL0 (and therefore, simultaneously, IDLE changes to 0) to start the process. When IDLE returns to 1, the answer is available on the Product output and another multiplication can be started. No multiplication should be attempted while IDLE is 0.
Conclusion
This design of a Binary Multiplier is valid for any value of n. For example, for n = 16 (the multiplication of two 16-bit numbers), the datapath components would simply be extended to accommodate 16 bits in Registers A and Q, and the counter would require $\log_2(16) = 4$ bits. The adder would also be required to be 16 bits wide. However, the same controller implementation can be used since its design is independent of n. The multiplication time for n = 16 would be $2(16) + 1 = 33$ clocks. The product would contain 32 bits.

Further refinements can be made to enhance the speed and capability of the ASM. For example, in our algorithm, each 0 in the multiplier input data causes a shift without an add, each taking a clock pulse. If the multiplier input contains runs of consecutive 0s, a barrel shifter could be used to implement all of the required shifts (equal to the length of the run of 0s) in a single clock.
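The figures quoted in these notes (look-up-table size, number of clocks and counter width) can be tabulated for several values of n with the short sketch below. The function names are hypothetical, and `counter_bits` assumes the counter only needs to hold values up to n - 1, which agrees with $\log_2(n)$ when n is a power of two.

```python
def lut_bits(n: int) -> int:
    """Memory size in bits of a look-up-table multiplier: 2^(n+n) words of 2n bits."""
    return 2 ** (n + n) * (2 * n)


def shift_add_clocks(n: int) -> int:
    """Clocks used by the ASM: one to initialize plus two per multiplier bit."""
    return 2 * n + 1


def counter_bits(n: int) -> int:
    """Bits needed for counter P, assuming it must hold values up to n - 1."""
    return max(1, (n - 1).bit_length())


for n in (4, 8, 16):
    print(f"n={n:2d}  LUT bits={lut_bits(n)}  clocks={shift_add_clocks(n)}  "
          f"counter bits={counter_bits(n)}")
```

The look-up table grows exponentially with n, while the multiplication time of the ASM grows only linearly.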
Think about this: What modifications to our design would be required in order to handle signed numbers?
Example: 12 x 5

Clock pulse        -     1     2     3     4     5     6     7     8     9

Counter P          11    11    11    10    10    01    01    00    00    11
Z                  0     0     0     0     0     0     0     1     1     0
C                  x     0     0     0     0     0     0     0     0     0
Reg A              xxxx  0000  1100  0110  0110  0011  1111  0111  0111  0011
Reg Q              xxxx  0101  0101  0010  0010  0001  0001  1000  1000  1100

States
  IDLE             1     0     0     0     0     0     0     0     0     1
  MUL0             0     1     0     1     0     1     0     1     0     0
  MUL1             0     0     1     0     1     0     1     0     1     0

Control Signals
  Initialize       1     0     0     0     0     0     0     0     0     0
  Clear_C          1     1     0     1     0     0     0     1     0     0
  Load             0     1     0     0     0     1     0     0     0     0
  Shift_dec        0     0     1     0     1     0     1     0     1     0
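The register contents in the example can be reproduced with a compact behavioural simulation of the three-state ASM. This is a sketch under stated assumptions rather than a model of the actual circuit: the function name `trace_multiply` is hypothetical, Go is assumed to be 1 only for the first clock pulse, and each row records the values held after the corresponding clock pulse.

```python
def trace_multiply(multiplicand, multiplier, n):
    """Step the IDLE/MUL0/MUL1 state machine and record (pulse, state, C, A, Q, P)."""
    mask = (1 << n) - 1
    p_mask = (1 << max(1, (n - 1).bit_length())) - 1   # assumed counter width
    state, c, a, q, p = "IDLE", 0, 0, 0, 0
    go, pulse, rows = 1, 0, []
    while True:
        pulse += 1
        if state == "IDLE":
            if go:                                     # Initialize and Clear_C
                c, a, q, p = 0, 0, multiplier & mask, (n - 1) & p_mask
                state = "MUL0"
        elif state == "MUL0":
            if q & 1:                                  # Load
                total = a + (multiplicand & mask)
                c, a = total >> n, total & mask
            else:                                      # Clear_C
                c = 0
            state = "MUL1"
        else:                                          # MUL1: Shift_dec
            z = (p == 0)                               # Z is sampled before the decrement
            q = (q >> 1) | ((a & 1) << (n - 1))
            a = (a >> 1) | (c << (n - 1))
            c = 0
            p = (p - 1) & p_mask
            state = "IDLE" if z else "MUL0"
        go = 0
        rows.append((pulse, state, c, a, q, p))
        if state == "IDLE":
            return rows


rows = trace_multiply(0b1100, 0b0101, 4)               # 12 x 5
for pulse, state, c, a, q, p in rows:
    print(f"pulse {pulse}: next state {state:4s}  C={c}  A={a:04b}  Q={q:04b}  P={p:02b}")
product = (rows[-1][3] << 4) | rows[-1][4]
print(f"product = {product:08b}")                      # prints 00111100
```

The simulation takes 2n + 1 = 9 clock pulses for n = 4 and finishes with A|Q = 00111100, that is, 60.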