Superscalar Processors Questions


Superscalar processors

Review
Dependence graph

• Nodes: instructions
• Edges: ordered relations among the instructions
• Any ordering-based transformation that does not change
the dependencies of the program is guaranteed not
to change the result of the program.

• Example
• S1: Load R1, A / R1 ← Memory(A) /
• S2: Add R2, R1 / R2 ← R2 + R1 /

S1 → S2
Data dependency

• Flow dependence: Statement 2 uses a variable computed by Statement 1.
Statement 1 must store/send the variable before Statement 2 fetches it.
S1 → S2 (flow)
• Output dependence: Statement 1 and Statement 2 both compute the same
variable, and Statement 2's value must be stored/sent after Statement 1's.
S1 → S2 (output)
• Antidependence: Statement 1 reads from a location into which Statement 2
stores. S1 → S2 (anti)

• Example
S1: Load R1, A / R1 ← Memory(A) /
S2: Add R2, R1 / R2 ← R2 + R1 /
S3: Move R1, R3 / R1 ← R3 /
S4: Store B, R1 / Memory(B) ← R1 /

Dependence graph: S1 → S2 (flow, R1); S1 → S3 (output, R1); S2 → S3 (anti, R1); S3 → S4 (flow, R1)
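The dependences in this four-statement example can be derived mechanically from what each statement reads and writes. The sketch below is a minimal illustration, not a compiler's actual algorithm: it assumes each statement is described by hypothetical read/write sets (memory words A and B treated like registers), and it assumes that an intervening write to the same location breaks a direct edge, so only direct dependences are reported.

```python
# Each entry: (label, locations written, locations read).
# Memory words A and B are treated like registers for this purpose.
PROGRAM = [
    ("S1", {"R1"}, {"A"}),         # Load R1, A
    ("S2", {"R2"}, {"R2", "R1"}),  # Add R2, R1
    ("S3", {"R1"}, {"R3"}),        # Move R1, R3
    ("S4", {"B"}, {"R1"}),         # Store B, R1
]


def dependences(program):
    """Return (earlier, later, kind, location) for every direct dependence."""
    edges = []
    for i, (a, writes_a, reads_a) in enumerate(program):
        for j in range(i + 1, len(program)):
            b, writes_b, reads_b = program[j]
            # A location overwritten by an instruction strictly between the two
            # breaks the direct link, so it is excluded (assumption).
            killed = set().union(*(w for _, w, _ in program[i + 1:j]))
            for loc in sorted((writes_a & reads_b) - killed):
                edges.append((a, b, "flow", loc))
            for loc in sorted((reads_a & writes_b) - killed):
                edges.append((a, b, "anti", loc))
            for loc in sorted((writes_a & writes_b) - killed):
                edges.append((a, b, "output", loc))
    return edges


for a, b, kind, loc in dependences(PROGRAM):
    print(f"{a} -> {b}: {kind} dependence on {loc}")
```

Note that S1 and S4 are not linked even though both touch R1: S3 rewrites R1 in between, so S4 depends on S3 instead.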
EXAMPLE
• How long would the following sequence of instructions
take to execute on a superscalar processor with two
execution units, each of which can execute any
instruction? Load operations have a latency of two
cycles, and all other operations have a latency of one
cycle. Assume that the pipeline depth is 5 stages.

LD r1, (r2)
ADD r3, r1, r4
SUB r5, r6, r7
MUL r8, r9, r10
Example (cont.)
• In-order execution
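• There are five pipeline stages, and a load has a latency of 2
clock cycles
• Fetch, Decode, Execution, Memory access, and Write
back are the pipeline stages
• LD issues in cycle 1; ADD must wait for the load result and issues in cycle 3, together with SUB; MUL issues in cycle 4
• Total number of cycles is 4 + (5 - 1) = 8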
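The in-order schedule can be reproduced with a small issue model. This is a sketch under stated assumptions, not a cycle-accurate simulator: the `Instr` record and function names are hypothetical, only RAW dependences are checked (this sequence has no WAR or WAW hazards), and an instruction is assumed to leave the pipeline depth - 1 cycles after it issues, plus any latency beyond one cycle, which matches the slide's count.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]
    srcs: tuple
    latency: int


PROGRAM = [
    Instr("LD r1, (r2)", "r1", ("r2",), 2),
    Instr("ADD r3, r1, r4", "r3", ("r1", "r4"), 1),
    Instr("SUB r5, r6, r7", "r5", ("r6", "r7"), 1),
    Instr("MUL r8, r9, r10", "r8", ("r9", "r10"), 1),
]
WIDTH = 2  # two execution units, each able to execute any instruction
DEPTH = 5  # fetch, decode, execute, memory access, write back


def schedule_in_order(program, width=WIDTH):
    """Issue cycle of each instruction under in-order issue (RAW checks only)."""
    ready = {}   # register -> first cycle its new value can be consumed
    issue = []
    cycle, used = 1, 0
    for ins in program:
        earliest = max((ready.get(r, 1) for r in ins.srcs), default=1)
        if earliest > cycle:          # stall: every later instruction waits too
            cycle, used = earliest, 0
        if used == width:             # both execution units busy this cycle
            cycle, used = cycle + 1, 0
        issue.append(cycle)
        used += 1
        if ins.dest:
            ready[ins.dest] = cycle + ins.latency
    return issue


def total_cycles(program, issue, depth=DEPTH):
    # Assumption: an instruction finishes depth - 1 cycles after it issues,
    # plus any latency beyond one cycle.
    return max(t + depth - 1 + ins.latency - 1 for t, ins in zip(issue, program))


issue = schedule_in_order(PROGRAM)
for t, ins in zip(issue, PROGRAM):
    print(f"cycle {t}: {ins.text}")
print("total cycles:", total_cycles(PROGRAM, issue))
```

Because `cycle` never decreases, the stalled ADD also holds back SUB and MUL, which is exactly what in-order issue means.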
Example (cont.)
• Out-of-order execution
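• There are five pipeline stages, and a load has a latency of 2
clock cycles
• Fetch, Decode, Execution, Memory access, and Write
back are the pipeline stages
• LD and SUB issue in cycle 1, MUL in cycle 2, and ADD in cycle 3, once the load result is available
• Total number of cycles is 3 + (5 - 1) = 7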
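Out-of-order issue lets SUB and MUL bypass the stalled ADD. The sketch below uses an oldest-ready-first policy and assumes the whole sequence fits in the instruction window; the names are hypothetical, only RAW dependences are tracked (the sequence has no WAR or WAW hazards), and completion time uses the same assumption as the in-order case.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]
    srcs: tuple
    latency: int


PROGRAM = [
    Instr("LD r1, (r2)", "r1", ("r2",), 2),
    Instr("ADD r3, r1, r4", "r3", ("r1", "r4"), 1),
    Instr("SUB r5, r6, r7", "r5", ("r6", "r7"), 1),
    Instr("MUL r8, r9, r10", "r8", ("r9", "r10"), 1),
]


def schedule_out_of_order(program, width=2):
    """Oldest-ready-first issue; a ready instruction may bypass a stalled one."""
    # Indices of the earlier instructions producing each instruction's sources.
    producers, last_writer = [], {}
    for i, ins in enumerate(program):
        producers.append([last_writer[r] for r in ins.srcs if r in last_writer])
        if ins.dest:
            last_writer[ins.dest] = i
    issue = [None] * len(program)
    cycle = 1
    while None in issue:
        slots = width
        for i in range(len(program)):
            if slots == 0:
                break
            if issue[i] is not None:
                continue
            # Every producer must have issued and finished its latency.
            if all(issue[p] is not None and issue[p] + program[p].latency <= cycle
                   for p in producers[i]):
                issue[i] = cycle
                slots -= 1
        cycle += 1
    return issue


def total_cycles(program, issue, depth=5):
    # Assumption: finish = issue + depth - 1, plus latency beyond one cycle.
    return max(t + depth - 1 + ins.latency - 1 for t, ins in zip(issue, program))


issue = schedule_out_of_order(PROGRAM)
for i in sorted(range(len(PROGRAM)), key=lambda i: (issue[i], i)):
    print(f"cycle {issue[i]}: {PROGRAM[i].text}")
print("total cycles:", total_cycles(PROGRAM, issue))
```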

• Solutions
Register renaming
• On an out-of-order superscalar processor with 8 execution units,
what is the execution time of the following sequence with and
without register renaming if any execution unit can execute any
instruction and the latency of all instructions is one cycle? Assume
that the hardware register file contains enough registers to remap
each destination register to a different hardware register and that
the pipeline depth is 5 stages.

• LD r7, (r8)
• MUL r1, r7, r2
• SUB r7, r4, r5
• ADD r9, r7, r8
• LD r8, (r12)
• DIV r10, r8, r10
Solution
• In this example, WAR dependencies (SUB overwrites the r7 that MUL reads, and the second LD
overwrites the r8 that ADD reads) are a significant limitation on parallelism, forcing
the DIV to issue 3 cycles after the first LD, for a total execution time of 8 cycles (the
MUL and the SUB can execute in parallel, as can the ADD and the second LD). After
register renaming, the program becomes
• LD hw7, (hw8)
• MUL hw1, hw7, hw2
• SUB hw17, hw4, hw5
• ADD hw9, hw17, hw8
• LD hw18, (hw12)
• DIV hw10, hw18, hw10
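Renaming can be done in a single pass: read each source through the current mapping, then give the destination a fresh hardware register. The sketch below is one simple policy, assuming (hypothetically) that hardware registers from hw16 upward are free and that registers not yet written map to hwN with the same number. It renames every destination, so its register numbers differ from the listing above, but the dependence structure is the same.

```python
PROGRAM = [
    ("LD", "r7", ["r8"]),
    ("MUL", "r1", ["r7", "r2"]),
    ("SUB", "r7", ["r4", "r5"]),
    ("ADD", "r9", ["r7", "r8"]),
    ("LD", "r8", ["r12"]),
    ("DIV", "r10", ["r8", "r10"]),
]


def rename(program, first_free=16):
    """Give every destination a fresh hardware register (hwN)."""
    mapping = {}          # architectural register -> current hardware register
    next_free = first_free
    renamed = []
    for op, dest, srcs in program:
        # Sources are looked up before the destination is remapped, so
        # DIV r10, r8, r10 still reads the old r10. Registers not yet
        # written keep an identity mapping rN -> hwN.
        new_srcs = [mapping.get(r, "hw" + r[1:]) for r in srcs]
        mapping[dest] = f"hw{next_free}"
        next_free += 1
        renamed.append((op, mapping[dest], new_srcs))
    return renamed


def fmt(op, dest, srcs):
    if op == "LD":                        # LD dest, (address register)
        return f"LD {dest}, ({srcs[0]})"
    return f"{op} {dest}, {', '.join(srcs)}"


for ins in rename(PROGRAM):
    print(fmt(*ins))
```

After renaming, no two instructions write the same register and no instruction overwrites a value an earlier one still needs, so only the RAW edges remain.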

• (The choice of hardware register numbers is arbitrary.)


• With register renaming, the program has been broken into three sets of two
dependent instructions (LD and MUL, SUB and ADD, LD and DIV). The SUB and the
second LD instruction can now issue in the same cycle as the first LD. The MUL,
ADD, and DIV instructions all issue in the next cycle, for a total execution time of 2 + (5 - 1) = 6
cycles.
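Both execution times can be checked with a scheduler that enforces RAW, WAR, and WAW ordering. This is a sketch under assumptions chosen to match the solution's reasoning: 8 execution units, one-cycle latency, a window holding the whole sequence, a WAR-dependent writer allowed to issue in the same cycle as the earlier reader (as with MUL and SUB), and total time taken as the last issue cycle plus depth - 1. The function and list names are hypothetical.

```python
# (destination, sources) for each instruction; all latencies are one cycle.
ORIGINAL = [
    ("r7", ("r8",)),         # LD  r7, (r8)
    ("r1", ("r7", "r2")),    # MUL r1, r7, r2
    ("r7", ("r4", "r5")),    # SUB r7, r4, r5
    ("r9", ("r7", "r8")),    # ADD r9, r7, r8
    ("r8", ("r12",)),        # LD  r8, (r12)
    ("r10", ("r8", "r10")),  # DIV r10, r8, r10
]
RENAMED = [
    ("hw7", ("hw8",)),
    ("hw1", ("hw7", "hw2")),
    ("hw17", ("hw4", "hw5")),
    ("hw9", ("hw17", "hw8")),
    ("hw18", ("hw12",)),
    ("hw10", ("hw18", "hw10")),
]


def schedule(program, width=8, latency=1):
    issue = [None] * len(program)
    cycle = 1
    while None in issue:
        slots = width
        for j, (dest, srcs) in enumerate(program):
            if issue[j] is not None or slots == 0:
                continue
            ok = True
            for i in range(j):
                d_i, s_i = program[i]
                raw = d_i in srcs        # j reads what i writes
                war = dest in s_i        # j overwrites what i reads
                waw = dest == d_i        # both write the same register
                if not (raw or war or waw):
                    continue
                if issue[i] is None:     # an older instruction it depends on is still waiting
                    ok = False
                    break
                # RAW/WAW: wait for i's result; WAR alone: same cycle as i is allowed.
                need = issue[i] + latency if (raw or waw) else issue[i]
                if cycle < need:
                    ok = False
                    break
            if ok:
                issue[j] = cycle
                slots -= 1
        cycle += 1
    return issue


DEPTH = 5
for name, prog in (("without renaming", ORIGINAL), ("with renaming", RENAMED)):
    issue = schedule(prog)
    print(f"{name}: issue cycles {issue}, total {max(issue) + DEPTH - 1} cycles")
```

Without renaming, the WAR edges chain the second LD behind the ADD, which pushes the DIV to cycle 4; with renaming, only the three RAW pairs remain, and every instruction issues in cycle 1 or 2.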
Example
• The figure on the next slide shows an example of a
superscalar processor organization. The processor can
issue two instructions per cycle if there is no resource
conflict and no data dependence problem. There are
essentially two pipelines, with four processing stages
(fetch, decode, execute, and store). Each pipeline has its
own fetch, decode, and store unit. Four functional units
(multiplier, adder, logic unit, and load unit) are available
for use in the execute stage and are shared by the two
pipelines on a dynamic basis. The two store units can be
dynamically used by the two pipelines, depending on
availability at a particular cycle. There is a lookahead
window with its own fetch and decoding logic. This
window is used for instruction lookahead for out-of-order
instruction issue.
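The dual-issue rule ("no resource conflict and no data dependence problem") can be expressed as a check on each pair of decoded instructions. This sketch assumes one unit of each of the four kinds and a hypothetical opcode-to-unit mapping (the slide names the units but not the opcodes they serve); it checks only the resource conflict, flow (RAW), and output (WAW) dependences between the pair, and ignores the lookahead window.

```python
# Hypothetical opcode -> functional-unit mapping.
UNIT = {
    "MUL": "multiplier",
    "ADD": "adder", "SUB": "adder",
    "AND": "logic", "OR": "logic",
    "LD": "load",
}


def can_dual_issue(first, second):
    """first, second: (opcode, dest, sources) in program order."""
    op1, dest1, _ = first
    op2, dest2, srcs2 = second
    if UNIT[op1] == UNIT[op2]:
        return False   # resource conflict: only one unit of each kind
    if dest1 in srcs2:
        return False   # flow (RAW) dependence: second needs first's result
    if dest1 == dest2:
        return False   # output (WAW) dependence
    return True


print(can_dual_issue(("LD", "r1", ("r2",)), ("MUL", "r3", ("r4", "r5"))))
print(can_dual_issue(("ADD", "r1", ("r2", "r3")), ("SUB", "r4", ("r5", "r6"))))
print(can_dual_issue(("MUL", "r1", ("r2", "r3")), ("ADD", "r4", ("r1", "r5"))))
```

The first pair can issue together; the second collides on the shared adder, and the third has a RAW dependence on r1.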
Example (cont.)
• What dependencies exist in the program?
• Show the pipeline activity for this program on
the processor using in-order issue with in-order
completion policies and using a presentation
similar to the Figure.
• Repeat for in-order issue with out-of-order
completion.
• Repeat for out-of-order issue with out-of-order
completion.
