Superscalar Processors vs. VLIW: Computer Science
7.1 Introduction
7.2 Parallel decoding
7.3 Superscalar instruction issue
7.4 Shelving
7.5 Register renaming
7.6 Parallel execution
7.7 Preserving the sequential consistency of instruction execution
7.8 Preserving the sequential consistency of exception processing
7.9 Implementation of superscalar CISC processors using a superscalar RISC core
7.10 Case studies of superscalar processors
+ Predecoding shortens the overall cycle time or reduces the number of cycles needed for decoding
Issue policies
Direct Issue
Scope of shelving
[Figure: scope of shelving relative to the register files]
Dispatch policy
Selection rule: specifies when instructions are considered executable, e.g. the dataflow principle of operation:
- Those instructions whose operands are available are executable.
Arbitration rule: needed when more instructions are eligible for execution than can be dispatched, e.g. choose the oldest instruction.
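A minimal sketch of the selection and arbitration rules, assuming a shelving buffer modeled as a list of hypothetical `(seq, name, sources)` tuples where a smaller `seq` means an older instruction. Selection follows the dataflow principle (all operands available); arbitration keeps the oldest eligible instructions when more are executable than `max_dispatch` allows.

```python
def select_and_arbitrate(window, ready_regs, max_dispatch):
    """Pick instructions to dispatch from a shelving buffer.

    window: list of (seq, name, sources) tuples; smaller seq = older.
    ready_regs: set of registers whose values are available.
    """
    # Selection rule (dataflow principle): executable iff all operands are available
    executable = [ins for ins in window if all(src in ready_regs for src in ins[2])]
    # Arbitration rule: if too many are eligible, prefer the oldest ones
    executable.sort(key=lambda ins: ins[0])
    return [name for _, name, _ in executable[:max_dispatch]]


window = [
    (3, "i3", ("r1",)),
    (1, "i1", ("r2",)),
    (2, "i2", ("r9",)),       # r9 not yet available: not executable
    (0, "i0", ("r1", "r2")),
]
print(select_and_arbitrate(window, {"r1", "r2"}, max_dispatch=2))  # ['i0', 'i1']
```

Three instructions are executable here, but only two can be dispatched, so the oldest two win.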
Dispatch order: determines whether a non-executable instruction prevents all subsequent instructions from being dispatched (in-order dispatch) or not (out-of-order dispatch).
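The effect of the dispatch order can be sketched as follows, assuming the same hypothetical `(seq, name, sources)` tuples and an oldest-first scan. With `in_order=True` the first non-executable instruction blocks every younger one; with `in_order=False` younger executable instructions may bypass it.

```python
def dispatch(window, ready_regs, max_dispatch, in_order):
    """Return names of instructions dispatched this cycle.

    window: list of (seq, name, sources); scanned oldest first.
    in_order=True: the first non-executable instruction blocks all younger ones.
    in_order=False: younger executable instructions may bypass it.
    """
    chosen = []
    for _, name, sources in sorted(window, key=lambda ins: ins[0]):
        if len(chosen) == max_dispatch:
            break
        if all(src in ready_regs for src in sources):
            chosen.append(name)
        elif in_order:
            break  # blocking: nothing younger may be dispatched
    return chosen


window = [(0, "i0", ("r1",)), (1, "i1", ("r9",)), (2, "i2", ("r2",))]
ready = {"r1", "r2"}
print(dispatch(window, ready, max_dispatch=3, in_order=True))   # ['i0']
print(dispatch(window, ready, max_dispatch=3, in_order=False))  # ['i0', 'i2']
```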
Maximum issue rate <= maximum dispatch rate; the issue rate reaches its maximum more often than the dispatch rate does.
Example overview
Cycle i: Issue of the mul instruction into the reservation station and fetching of the corresponding operands
Cycle i+1: Checking for executable instructions and dispatching of the mul instruction
Cycle i+1 (2nd phase): Issue of the subsequent two ad instructions into the reservation station
Cycle i+2: Checking for executable instructions (mul not yet completed)
Cycle i+3: Updating the FX register file with the result of the mul instruction
Cycle i+3 (2nd phase): Checking for executable instructions and dispatching the older ad instruction
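The timeline above can be reproduced with a small cycle-level sketch. The notes do not give the operands, so the register names are hypothetical, and several behaviors are assumptions: both ad instructions read the mul result, the mul result reaches the register file two cycles after dispatch, the reservation station dispatches at most one instruction per cycle, and arbitration picks the oldest. Each cycle first writes back results, then checks for executable instructions and dispatches, and finally issues new instructions in its second phase.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    seq: int        # program order (smaller = older)
    text: str
    dest: str
    srcs: tuple     # source registers
    latency: int    # cycles from dispatch to register-file update (assumed)


def simulate(issue_plan, ready_regs, dispatch_width=1, cycles=4):
    """issue_plan maps a cycle offset (0 = cycle i) to the instructions issued in it."""
    ready = set(ready_regs)  # registers whose values are available
    station = []             # reservation station contents
    pending = {}             # cycle -> instructions whose results are written then
    log = []
    for cycle in range(cycles):
        # 1st phase: write back results that become available in this cycle
        for ins in pending.pop(cycle, []):
            ready.add(ins.dest)
            log.append((cycle, f"update {ins.dest} with result of {ins.text}"))
        # Check for executable instructions (dataflow), oldest first, then dispatch
        executable = sorted(
            (ins for ins in station if all(s in ready for s in ins.srcs)),
            key=lambda ins: ins.seq,
        )
        for ins in executable[:dispatch_width]:
            station.remove(ins)
            pending.setdefault(cycle + ins.latency, []).append(ins)
            log.append((cycle, f"dispatch {ins.text}"))
        # 2nd phase: issue new instructions into the reservation station
        for ins in issue_plan.get(cycle, []):
            station.append(ins)
            log.append((cycle, f"issue {ins.text}"))
    return log


mul = Instr(0, "mul r1, r2, r3", "r1", ("r2", "r3"), latency=2)
ad1 = Instr(1, "ad r4, r1, r5", "r4", ("r1", "r5"), latency=1)
ad2 = Instr(2, "ad r6, r1, r7", "r6", ("r1", "r7"), latency=1)
log = simulate({0: [mul], 1: [ad1, ad2]}, ready_regs={"r2", "r3", "r5", "r7"})
for cycle, event in log:
    print(f"cycle i+{cycle}: {event}")
```

Nothing is dispatched in cycle i+2, and in cycle i+3 both ad instructions become executable but only the older one is dispatched.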
Chronology of the introduction of register renaming (renaming is highly complex: the Sparc64 used 371K transistors for it, more than the entire i386)
Instruction format:
op Rd, Rs1, Rs2
Assumptions: a separate rename register file, associative access, and operand fetching during renaming
Structure of the rename buffers and their assumed initial contents (figure). Latest bit: 1 for the most recent rename of a register, 0 for previous renames.
Renaming steps
1. Allocation of a free rename register to a destination register
2. Accessing a valid source register value or a register value that is not yet available
3. Re-allocation of a destination register
4. Updating a particular rename buffer with a computed result
5. De-allocation of a rename buffer that is no longer needed
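A minimal sketch of the five renaming steps, assuming each rename buffer entry holds the hypothetical fields `entry_valid`, `dest_reg`, `value`, `value_valid` and `latest` (the latest bit as described above). Lookup is a linear associative search; free entries are found with a simple first-free scan rather than a circular buffer, and on deallocation the value is assumed to be copied into the architectural register. Register values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RenameEntry:
    entry_valid: bool = False  # entry is currently allocated
    dest_reg: int = -1         # architectural destination register
    value: int = 0             # computed result (meaningful only if value_valid)
    value_valid: bool = False  # result has been produced
    latest: bool = False       # 1 = most recent rename of dest_reg, 0 = previous


class RenameBuffers:
    def __init__(self, size, arch_regs):
        self.entries = [RenameEntry() for _ in range(size)]
        self.arch = arch_regs  # architectural register file

    def allocate(self, dest_reg):
        """Steps 1 and 3: allocate a free entry; older renames of dest_reg lose 'latest'."""
        free = next((i for i, e in enumerate(self.entries) if not e.entry_valid), None)
        if free is None:
            raise RuntimeError("no free rename buffer: issue must stall")
        for e in self.entries:
            if e.entry_valid and e.dest_reg == dest_reg:
                e.latest = False
        self.entries[free] = RenameEntry(True, dest_reg, 0, False, True)
        return free

    def read_source(self, reg):
        """Step 2: ('value', v) if available, else ('tag', index) of the pending entry."""
        for i, e in enumerate(self.entries):  # associative search
            if e.entry_valid and e.dest_reg == reg and e.latest:
                return ("value", e.value) if e.value_valid else ("tag", i)
        return ("value", self.arch[reg])      # not renamed: architectural value

    def update(self, index, value):
        """Step 4: write a computed result into its rename buffer."""
        self.entries[index].value = value
        self.entries[index].value_valid = True

    def deallocate(self, index):
        """Step 5: on retirement, copy the result to its register and free the entry."""
        e = self.entries[index]
        self.arch[e.dest_reg] = e.value
        self.entries[index] = RenameEntry()


rb = RenameBuffers(size=4, arch_regs=[10, 20, 30, 40])
_, a = rb.read_source(0)          # mul r2, r0, r1: read both sources
_, b = rb.read_source(1)
t0 = rb.allocate(2)               # r2 renamed to entry t0
print(rb.read_source(2))          # ('tag', 0): result not yet available
rb.update(t0, a * b)
print(rb.read_source(2))          # ('value', 200)
t1 = rb.allocate(2)               # r2 renamed again: entry 0 is no longer latest
rb.deallocate(t0)                 # retire the mul: r2 := 200
print(rb.arch[2], rb.read_source(2))  # 200 ('tag', 1)
```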
Allocation of a new rename buffer to a destination register (the rename buffers form a circular buffer with Head and Tail pointers; the figure shows the state before allocation, with index 3)
Updating the rename buffers with the computed result of mul r2, r0, r1 (the result, 0, is written into the rename buffer allocated to r2)
Deallocation of rename buffer no. 0 when the ROB retires the corresponding instruction (the tail pointer is updated)
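The Head/Tail management can be sketched as a circular queue. The pointer convention here is an assumption read from the notes: new entries are allocated at Head, and the oldest entry is freed at Tail when the ROB retires its instruction, so deallocation happens in program order.

```python
class CircularRenameBuffers:
    """Rename buffers managed as a circular queue (assumed Head/Tail convention)."""

    def __init__(self, size):
        self.size = size
        self.dest = [None] * size  # destination register per entry (None = free)
        self.head = 0              # next entry to allocate
        self.tail = 0              # oldest allocated entry
        self.count = 0             # number of allocated entries

    def allocate(self, dest_reg):
        if self.count == self.size:
            raise RuntimeError("rename buffers full: issue must stall")
        index = self.head
        self.dest[index] = dest_reg
        self.head = (self.head + 1) % self.size
        self.count += 1
        return index

    def deallocate(self):
        """Free the oldest entry (instructions retire in program order)."""
        if self.count == 0:
            raise RuntimeError("no allocated rename buffer to free")
        index = self.tail
        self.dest[index] = None
        self.tail = (self.tail + 1) % self.size
        self.count -= 1
        return index


buf = CircularRenameBuffers(size=4)
for reg in (2, 5, 7):
    buf.allocate(reg)          # entries 0, 1, 2 in use; head is now 3
print(buf.allocate(2))         # 3: the entry at head
print(buf.deallocate())        # 0: the oldest entry, at tail
print(buf.allocate(4))         # 0: head wrapped around to the freed entry
```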
To finish: the operation of the instruction is accomplished, except for writing back the result into
- the architectural register or memory location specified, and/or
- updating the status bits
To complete: writing back the results
To retire (ROB): writing back the results and deleting the completed instruction from the last ROB entry
7.7 Preserving the sequential consistency of instruction execution
With multiple EUs operating in parallel, the overall instruction execution should mimic sequential execution with respect to:
- the order in which instructions are completed
- the order in which memory is accessed
Using the Re-Order Buffer (ROB) to preserve the order in which instructions are completed:
1. Instructions are written into the ROB in strict program order: one new entry is allocated for each active instruction.
3. An instruction is allowed to retire only if it has finished and all previous instructions are already retired.
Retiring in strict program order: only retiring instructions are permitted to complete, that is, to update the program state
- by writing their result into the referenced architectural register or memory.
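The ROB rules above can be sketched with a queue, assuming instructions are identified by hypothetical tags and that up to `max_retire` instructions may retire per cycle. Entries are appended in program order, execution may finish them in any order, and retirement only ever removes finished entries from the oldest end.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # program order: oldest on the left
        self.finished = set()

    def issue(self, tag):
        """1. Allocate one entry per instruction, in strict program order."""
        self.entries.append(tag)

    def finish(self, tag):
        """Execution units may finish instructions in any order."""
        self.finished.add(tag)

    def retire(self, max_retire):
        """3. Retire only finished instructions whose predecessors have all retired."""
        retired = []
        while (self.entries and len(retired) < max_retire
               and self.entries[0] in self.finished):
            tag = self.entries.popleft()
            self.finished.discard(tag)
            retired.append(tag)  # only now may it update the program state
        return retired


rob = ReorderBuffer()
for tag in ("i0", "i1", "i2"):
    rob.issue(tag)
rob.finish("i2")
rob.finish("i0")
first = rob.retire(max_retire=4)
print(first)   # ['i0']: unfinished i1 holds back the already finished i2
rob.finish("i1")
second = rob.retire(max_retire=4)
print(second)  # ['i1', 'i2']
```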
7.8 Preserving the sequential consistency of exception processing
When instructions are executed in parallel, interrupt requests, which are caused by exceptions arising during instruction execution, are also generated out of order.
Precise interrupts: handling interrupts consistently with the state of a sequential processor
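One common way to obtain precise interrupts is to record an exception in the instruction's ROB entry when it finishes and to act on it only when that instruction reaches the head of the ROB. The sketch below assumes this technique (the notes do not spell out the mechanism) and uses hypothetical tags and exception names: all older instructions retire, the faulting one does not complete, and all younger ones are discarded.

```python
from collections import deque


class PreciseROB:
    def __init__(self):
        # Each entry: [tag, finished, exception]; kept in program order.
        self.entries = deque()

    def issue(self, tag):
        self.entries.append([tag, False, None])

    def finish(self, tag, exception=None):
        """Mark an instruction finished; an exception is only recorded here."""
        for entry in self.entries:
            if entry[0] == tag:
                entry[1], entry[2] = True, exception
                return
        raise KeyError(tag)

    def retire(self):
        """Retire in program order; act on an exception only at the head of the ROB."""
        retired = []
        while self.entries and self.entries[0][1]:
            tag, _, exception = self.entries.popleft()
            if exception is not None:
                squashed = [entry[0] for entry in self.entries]
                self.entries.clear()  # younger instructions must not change state
                return retired, (tag, exception), squashed
            retired.append(tag)
        return retired, None, []


rob = PreciseROB()
for tag in ("i0", "i1", "i2", "i3"):
    rob.issue(tag)
rob.finish("i3", exception="overflow")    # generated first, out of order
rob.finish("i2", exception="page fault")
rob.finish("i1")
rob.finish("i0")
retired, fault, squashed = rob.retire()
print(retired, fault, squashed)  # ['i0', 'i1'] ('i2', 'page fault') ['i3']
```

Although i3 raised its exception first, the reported exception is that of i2, exactly what a sequential processor would report.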
7.9 Implementation of superscalar CISC processors using a superscalar RISC core
CISC instructions are first converted into RISC-like instructions during decoding.
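- Simple CISC register-to-register instructions are converted to a single RISC operation (1-to-1).
- CISC ALU instructions referring to memory are converted to two or more RISC operations (1-to-(2-4)). For example, SUB EAX, [EDI] is converted to, e.g., a load of the memory operand [EDI] into a temporary register, followed by a register-to-register SUB that subtracts the temporary from EAX.
- More complex CISC instructions are converted to long sequences of RISC operations (1-to-(more than 4)).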
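A toy sketch of the 1-to-1 and 1-to-n conversions, assuming two-operand CISC ALU instructions written as `(op, dst, src)` with memory operands in brackets. The internal register `tmp` and the `LOAD`/`STORE` operation names are hypothetical, the read-modify-write case is a simplified 1-to-3 split, and long sequences for complex instructions are not modeled.

```python
def crack(op, dst, src):
    """Convert a two-operand CISC ALU instruction into RISC-like operations.

    Memory operands are written as "[...]"; "tmp" is a hypothetical
    internal register, and LOAD/STORE are hypothetical operation names.
    """
    if dst.startswith("["):   # memory destination (read-modify-write): 1-to-3
        return [("LOAD", "tmp", dst), (op, "tmp", src), ("STORE", dst, "tmp")]
    if src.startswith("["):   # memory source operand: 1-to-2
        return [("LOAD", "tmp", src), (op, dst, "tmp")]
    return [(op, dst, src)]   # register-to-register: 1-to-1


program = [("ADD", "EAX", "EBX"), ("SUB", "EAX", "[EDI]"), ("ADD", "[EDI]", "EAX")]
for instr in program:  # converted one after another, in program order
    print(instr, "->", crack(*instr))
```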
Pentium Pro: decoding/converting CISC instructions into RISC operations is done in program order.