CS 346: Code Generation: Resource
Resource: Textbook
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
“Compilers: Principles, Techniques, and Tools”,
Addison-Wesley, 1986.
Compiler Architecture
[Figure: compiler architecture diagram; surviving labels: intermediate language (appearing twice), symbol table]
Code Generator
Severe requirements imposed
Output must be correct and of high quality
Generator should run efficiently
Generating optimal code is undecidable
Must rely on heuristics
Choice of heuristic is very important
Details depend on
target language
operating system
Certain generic issues are inherent in the design of basically all
code generators
Input to Code Generator
Input to the code generator consists of:
Intermediate code produced by the front end (and perhaps
optimizer)
Remember that intermediate code can come in many forms
Three-address codes are more popular but several
techniques apply to other possibilities as well
Information in the symbol table
used to determine run-time addresses of data objects
Code generator typically assumes that:
Input is free of errors
Type checking has taken place and necessary type-conversion
operators have already been inserted
Output of Code Generator
Target program: output of the code generator
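Example: statement-by-statement translation
    a := b + c        MOV b, R0
                      ADD c, R0
                      MOV R0, a
    d := a + e        MOV a, R0
                      ADD e, R0
                      MOV R0, d
Redundant statements: the 4th, and also the 3rd if a is not used afterwards
Example: choice of instructions
    a := a + 1        MOV a, R0        or simply:  INC a
                      ADD #1, R0
                      MOV R0, a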
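The redundancy in the first example can be reproduced with a small sketch. It assumes a hypothetical translator that handles each statement in isolation and always uses R0, followed by a simple pass that drops a MOV which merely reverses the MOV just before it. The function names and the tuple encoding are illustrative assumptions, and the pass is only safe inside straight-line code with no jump target between the two instructions.

```python
def naive_codegen(statements):
    """Translate each (dst, left, right) triple, meaning dst := left + right,
    in isolation, always using register R0 (an assumption of this sketch)."""
    code = []
    for dst, left, right in statements:
        code.append(("MOV", left, "R0"))   # load the left operand
        code.append(("ADD", right, "R0"))  # add the right operand
        code.append(("MOV", "R0", dst))    # store the result
    return code


def drop_redundant_moves(code):
    """Drop a MOV that merely reverses the MOV immediately before it."""
    out = []
    for op, src, dst in code:
        if out and op == "MOV" and out[-1] == ("MOV", dst, src):
            continue  # both locations already hold the same value
        out.append((op, src, dst))
    return out


statements = [("a", "b", "c"), ("d", "a", "e")]
naive = naive_codegen(statements)
improved = drop_redundant_moves(naive)
for op, src, dst in improved:
    print(f"{op} {src}, {dst}")
```

The pass removes the 4th instruction (MOV a, R0). Removing the 3rd as well would need the extra knowledge that a is not used later, which this sketch does not track.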
Register Allocation
Main memory
Stores a large amount of data
Slower access
Registers
Hold a very small amount of data
Faster access
[Figure: main memory divided into a code area and a data area (global/static area, stack, free space, heap), with the registers shown separately]
General-purpose registers
Used for calculation
Special purpose registers
Program counter (pc)
Stack pointer (sp)
Frame pointer (fp)
Argument pointer (ap)
Addresses in Target Code (partitions)
A program runs in its own logical address space, partitioned into:
Static
Holds global constants and other data
Size of these entities determined at compile time
Heap
Holds data objects created and freed during program
execution
Size of heap can’t be determined at compile time
Stack
Dynamically managed area for activation records (created
and destroyed during procedure calls and returns)
Size can’t be determined at compile time
Activation Records
Activation records store information needed during the execution of a
procedure
Two possible storage-allocation strategies for activation records:
static allocation (decision can be taken by looking at program)
stack allocation (decision taken at run time)
An activation record has fields which hold:
result and parameters
machine-status information
local data and temporaries
Size and layout of activation records are communicated to code
generator via information in the symbol table
Sample Activation Records
Assume that run-time memory has areas for code, static data,
and optionally a stack
Heap is not being used in these examples
Static Allocation
A call statement in the intermediate code is implemented by
two target-machine instructions:
MOV #here+20, callee.static_area
GOTO callee.code_area
MOV instruction saves the return address (#here+20 is the address of the
instruction following the GOTO)
GOTO instruction transfers control to the target code of the called
procedure
Determination of leaders:
The first statement is a leader
Any statement that is the target of a conditional or unconditional goto is a
leader
Any statement immediately following a conditional or unconditional goto is a
leader
A basic block:
Starts with a leader
Includes all statements up to but not including the next leader
Basic Block Example
Source program:
    begin
        prod := 0;
        i := 1;
        do
        begin
            prod := prod + a[i] * b[i];
            i := i + 1
        end
        while i <= 20
    end
Three-address code:
    (1) prod := 0
    (2) i := 1
    (3) t1 := 4 * i
    (4) t2 := a[t1]
    (5) t3 := 4 * i
    (6) t4 := b[t3]
    (7) t5 := t2 * t4
    (8) t6 := prod + t5
    (9) prod := t6
    (10) t7 := i + 1
    (11) i := t7
    (12) if i <= 20 goto 3
    (13) …
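Applying the three leader rules to statements (1) to (12) of this example can be sketched as follows. The encoding of each statement as a (text, jump_target) pair and the function names are assumptions made for illustration; statement (13) is left out because its content is elided.

```python
def find_leaders(code):
    """code: list of (text, jump_target) pairs, numbered from 1.
    jump_target is the statement number a goto jumps to, or None."""
    n = len(code)
    leaders = {1}                          # rule 1: the first statement
    for i, (_, target) in enumerate(code, start=1):
        if target is not None:
            leaders.add(target)            # rule 2: target of a goto
            if i + 1 <= n:
                leaders.add(i + 1)         # rule 3: statement after a goto
    return sorted(leaders)


def basic_blocks(code):
    """Return (first, last) statement numbers of each basic block."""
    leaders = find_leaders(code)
    # A block runs up to, but not including, the next leader.
    ends = [leader - 1 for leader in leaders[1:]] + [len(code)]
    return list(zip(leaders, ends))


code = [
    ("prod := 0", None),
    ("i := 1", None),
    ("t1 := 4 * i", None),
    ("t2 := a[t1]", None),
    ("t3 := 4 * i", None),
    ("t4 := b[t3]", None),
    ("t5 := t2 * t4", None),
    ("t6 := prod + t5", None),
    ("prod := t6", None),
    ("t7 := i + 1", None),
    ("i := t7", None),
    ("if i <= 20 goto 3", 3),
]

for first, last in basic_blocks(code):
    print(f"block: statements {first}-{last}")
```

Statements (1) and (3) are the leaders here, so the code splits into one block for (1) and (2) and one block for (3) to (12). With statement (13) present, it would start a third block under rule 3.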
Transformations on Basic Blocks
Basic block computes a set of expressions
Expressions: values of names that are live on exit from the
block
Two basic blocks are equivalent if they compute the same
set of expressions
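Common-subexpression elimination
    Before            After
    a := b + c        a := b + c
    b := a - d        b := a - d
    c := b + c        c := b + c
    d := a - d        d := b
The second a - d recomputes the value already held in b
The two b + c are not common: b is redefined between them
Dead-code elimination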
Suppose the statement x := y + z appears in a basic block and x is dead
(i.e. never used again)
This statement can be safely removed from the code
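Dead-code elimination inside a single basic block can be sketched as one backward pass that tracks which names are still needed. The quadruple encoding (dst, op, left, right), the function name, and the example block are assumptions for illustration; the sketch also assumes that statements have no side effects and that the set of names live on exit is already known.

```python
def eliminate_dead_code(block, live_on_exit):
    """Remove statements whose result is never used afterwards.

    block: list of (dst, op, left, right) quadruples.
    live_on_exit: names whose values are needed after the block.
    """
    live = set(live_on_exit)
    kept = []
    for dst, op, left, right in reversed(block):
        if dst not in live:
            continue                   # dst is dead here: drop the statement
        live.discard(dst)              # this statement defines dst ...
        live.update((left, right))     # ... and needs both operands
        kept.append((dst, op, left, right))
    kept.reverse()
    return kept


block = [
    ("a", "+", "b", "c"),
    ("x", "+", "y", "z"),   # x is never used again and is not live on exit
    ("d", "-", "a", "e"),
]
optimized = eliminate_dead_code(block, live_on_exit={"d"})
for dst, op, left, right in optimized:
    print(f"{dst} := {left} {op} {right}")
```

The statement defining a is kept even though a is not live on exit, because the later statement defining d still uses it.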
Example Transformations (2)
Renaming of Temporary variables
Statement t := b + c appears in a basic block and t is a
temporary
We can safely rename all instances of this t to u, where u is a new
temporary
A normal-form block uses a new temporary for every statement
that defines a temporary
Interchange of two independent adjacent statements
Suppose we have a block with two adjacent statements:
t1 := b + c
t2 := x + y
If t1 is distinct from x and y and t2 is distinct from b and c, we
can safely change the order of these two statements
Next-Use Information
Defining use of a name in a three-address statement:
Suppose statement i assigns a value to x
Suppose statement j has x as an operand
We say that j uses the value of x computed at i if:
there is a path through which control can flow from
statement i to statement j
path has no intervening assignments to x
Knowing when the value of a variable will be used next is essential for
generating good code
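Within one basic block, next-use information is commonly gathered by scanning the statements backwards. The sketch below follows that idea under stated assumptions: statements are (dst, left, right) triples, indices start at 0, and the caller supplies the names live on exit. The function name and the sample block are illustrative only.

```python
def next_use_info(block, live_on_exit):
    """For every statement, record (is_live, next_use_index) for each of its
    names, as seen just after that statement.

    block: list of (dst, left, right) triples, indexed from 0.
    """
    live = set(live_on_exit)
    next_use = {}                       # name -> index of its next use
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        dst, left, right = block[i]
        info[i] = {name: (name in live, next_use.get(name))
                   for name in (dst, left, right)}
        # The definition kills dst first, then the operands become live,
        # so that a statement like x := x + y is handled correctly.
        live.discard(dst)
        next_use.pop(dst, None)
        for name in (left, right):
            live.add(name)
            next_use[name] = i
    return info


block = [
    ("t1", "a", "b"),     # 0: t1 := a + b
    ("t2", "c", "d"),     # 1: t2 := c + d
    ("t3", "e", "t2"),    # 2: t3 := e - t2
    ("t4", "t1", "t3"),   # 3: t4 := t1 - t3
]
info = next_use_info(block, live_on_exit={"t4"})
for i, entry in enumerate(info):
    print(i, entry)
```

For example, the entry for statement 0 reports that t1 is live with its next use at statement 3, which tells a code generator that the register holding t1 is still needed.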
Basic blocks
B1: 1
B2: 2
B3: 3-9
B4: 10-11
B5: 12
B6: 13-17
Flow Graph
Representations of Flow Graphs
A basic block can be represented by a record consisting of:
A pointer to the leader of the block
A count of the number of quadruples in the block
(or, alternatively, a pointer to the last instruction)
Lists of the block's predecessors and successors
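One way to picture such a record is the sketch below. The class name, field names, and the two-block example are assumptions for illustration; statement numbers stand in for pointers, and the last statement is derived from the leader and the count.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    name: str
    leader: int                     # number of the block's first quadruple
    count: int                      # number of quadruples in the block
    predecessors: list = field(default_factory=list)
    successors: list = field(default_factory=list)

    @property
    def last(self):
        """Number of the last quadruple, derived from leader and count."""
        return self.leader + self.count - 1


def add_edge(src, dst):
    """Record a flow-graph edge src -> dst in both blocks (by name)."""
    src.successors.append(dst.name)
    dst.predecessors.append(src.name)


# Hypothetical two-block flow graph: B1 falls through to B2, B2 loops on itself.
b1 = BasicBlock("B1", leader=1, count=2)
b2 = BasicBlock("B2", leader=3, count=10)
add_edge(b1, b2)
add_edge(b2, b2)
print(b2.last, b2.predecessors, b2.successors)
```

Storing block names rather than direct references keeps the printed form of a record finite even when the flow graph contains a cycle.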
Set of nodes-L
    t1 := a * a        t1 := a * a
    t2 := a * b        t2 := a * b
    t3 := 2 * t2       t3 := 2 * t2
    t4 := t1 + t3      t1 := t1 + t3
    t5 := b * b        t2 := b * b
    t6 := t4 + t5      t1 := t1 + t2
Directed Acyclic Graphs (DAGS)
DAG: Represents basic block
Directed acyclic graph such that:
leaves represent the initial values of names
Labeled by unique identifiers, either variable names or
constants
Operator applied to name determines if l-value (address) or
r-value (contents) is needed; usually it is r-value
Interior nodes are labeled by the operators
Nodes are optionally also given a sequence of identifiers
(identifiers that are assigned their values)
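A DAG with these properties can be built in one forward pass over the block. The sketch below is a minimal version under stated assumptions: statements are (dst, op, left, right) quadruples, every operator is binary, and the class and function names are illustrative. A node is reused whenever the same operator is applied to the same child nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                          # operator, or name for a leaf
    children: tuple = ()                # ids of the operand nodes
    ids: list = field(default_factory=list)   # names attached to this node


def build_dag(block):
    """Return (nodes, current) where current maps a name to its node id."""
    nodes, current, interior = [], {}, {}

    def node_for(name):
        if name not in current:         # first mention: leaf with initial value
            nodes.append(Node(label=name))
            current[name] = len(nodes) - 1
        return current[name]

    for dst, op, left, right in block:
        key = (op, node_for(left), node_for(right))
        if key not in interior:         # no existing node computes this value
            nodes.append(Node(label=op, children=key[1:]))
            interior[key] = len(nodes) - 1
        n = interior[key]
        old = current.get(dst)
        if old is not None and dst in nodes[old].ids:
            nodes[old].ids.remove(dst)  # dst no longer names the old node
        nodes[n].ids.append(dst)
        current[dst] = n
    return nodes, current


block = [
    ("a", "+", "b", "c"),
    ("b", "-", "a", "d"),
    ("c", "+", "b", "c"),
    ("d", "-", "a", "d"),
]
nodes, current = build_dag(block)
for i, node in enumerate(nodes):
    print(i, node.label, node.children, node.ids)
```

In this block the first and last subtractions map to the same node, so b and d end up attached to one interior node. The two additions get separate nodes because b names a different node by the time the second one is reached.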
    Original order        Rearranged order
    t1 := a + b           t2 := c + d
    t2 := c + d           t3 := e - t2
    t3 := e - t2          t1 := a + b
    t4 := t1 - t3         t4 := t1 - t3
Generating Code from DAGS (2)
Assume only t4 is live on exit of previous example and two registers
(R0 and R1) exist
Code generation algorithm discussed earlier leads to the following two
code sequences, one per statement order (the second saves two instructions)
    MOV a, R0        MOV c, R0
    ADD b, R0        ADD d, R0
    MOV c, R1        MOV e, R1
    ADD d, R1        SUB R0, R1
    MOV R0, t1       MOV a, R0
    MOV e, R0        ADD b, R0
    SUB R1, R0       SUB R1, R0
    MOV t1, R1       MOV R0, t4
    SUB R0, R1
    MOV R1, t4
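A tiny interpreter makes it easy to check that both sequences compute the same t4 and to count their instructions. It is a sketch under stated assumptions: registers and memory share one dictionary, 'OP src, dst' means dst := src for MOV, dst := dst + src for ADD, and dst := dst - src for SUB, and the input values are arbitrary test data.

```python
def run(program, memory):
    """Execute a MOV/ADD/SUB listing and return the final machine state."""
    state = dict(memory)
    for line in program.strip().splitlines():
        op, operands = line.split(None, 1)
        src, dst = [part.strip() for part in operands.split(",")]
        if op == "MOV":
            state[dst] = state[src]
        elif op == "ADD":
            state[dst] = state[dst] + state[src]
        elif op == "SUB":
            state[dst] = state[dst] - state[src]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return state


first = """
MOV a, R0
ADD b, R0
MOV c, R1
ADD d, R1
MOV R0, t1
MOV e, R0
SUB R1, R0
MOV t1, R1
SUB R0, R1
MOV R1, t4
"""

second = """
MOV c, R0
ADD d, R0
MOV e, R1
SUB R0, R1
MOV a, R0
ADD b, R0
SUB R1, R0
MOV R0, t4
"""

memory = {"a": 7, "b": 3, "c": 2, "d": 5, "e": 20}   # arbitrary test values
expected = (memory["a"] + memory["b"]) - (memory["e"] - (memory["c"] + memory["d"]))
result_first = run(first, memory)["t4"]
result_second = run(second, memory)["t4"]
print(result_first, result_second, expected)
print(len(first.strip().splitlines()), len(second.strip().splitlines()))
```

The first sequence has to spill t1 to memory and reload it because both registers are occupied when t3 is computed; the rearranged order avoids that store and load.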