18 Code Gen
Code Generation
[Figure: phases of a compiler. Source Program → Lexical Analysis → Token stream → Syntax Analysis → Abstract Syntax tree → Semantic Analysis → Intermediate Code → Code Generation → Target Program]
Code generation and Instruction Selection
[Figure: input → Front end → Intermediate Code generator → Code generator → output, with a Symbol table]
Requirements
• output code must be correct
• output code must be of high quality
• code generator should run efficiently
Design of code generator: Issues
• Input: Intermediate representation with symbol table
– assume that input has been validated by the front end
• Target programs:
– absolute machine language: fast for small programs
– relocatable machine code: requires linker and loader
– assembly code: requires assembler, linker, and loader
More Issues…
• Instruction selection
– Uniformity
– Completeness
– Instruction speed, power consumption
• Register allocation
– Instructions with register operands are faster
– keep values with long lifetimes and counters in registers
– temporary locations
– Even-odd register pairs
• Evaluation order
Instruction Selection
• Straightforward code if efficiency is not an issue
a=b+c    Mov b, R0
         Add c, R0
         Mov R0, a
d=a+e    Mov a, R0    (can be eliminated)
         Add e, R0
         Mov R0, d
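The elimination of the redundant load can be sketched as a small peephole pass. This is only an illustrative sketch: the helper names `parse` and `drop_redundant_loads` are hypothetical, and it assumes all instructions belong to one basic block, so that no jump can land between a store and the reload that follows it.

```python
def parse(instr):
    """Split 'Mov b, R0' into ('Mov', 'b', 'R0')."""
    op, rest = instr.split(None, 1)
    src, dst = [part.strip() for part in rest.split(",")]
    return op, src, dst

def drop_redundant_loads(code):
    """Drop 'Mov X, R' when the previous kept instruction was 'Mov R, X'."""
    out = []
    for instr in code:
        op, src, dst = parse(instr)
        if out:
            prev_op, prev_src, prev_dst = parse(out[-1])
            # The value was just copied from dst to src, so copying it
            # back would not change anything.
            if op == prev_op == "Mov" and src == prev_dst and dst == prev_src:
                continue
        out.append(instr)
    return out

naive = ["Mov b, R0", "Add c, R0", "Mov R0, a",
         "Mov a, R0", "Add e, R0", "Mov R0, d"]
improved = drop_redundant_loads(naive)
```

Here `improved` keeps the five remaining instructions in their original order.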
Example Target Machine
• Byte addressable with 4 bytes per word
• n registers R0, R1, ..., Rn-1
• Two address instructions of the form
opcode source, destination
• Usual opcodes like move, add, sub etc.
• Addressing modes
MODE                FORM      ADDRESS
Absolute            M         M
register            R         R
index               c(R)      c+content(R)
indirect register   *R        content(R)
indirect index      *c(R)     content(c+content(R))
literal             #c        c
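The ADDRESS column of the table can be read as a small lookup procedure. The sketch below is an assumption-laden illustration, not part of the slides: `resolve` is a hypothetical helper, registers are modelled as a dict `regs`, memory as a dict `mem`, and an absolute operand M is written as a plain number.

```python
import re

def resolve(operand, regs, mem):
    """Return ('lit', value), ('reg', name) or ('mem', address) for an operand."""
    if operand.startswith("#"):                    # literal  #c
        return ("lit", int(operand[1:]))
    indirect = operand.startswith("*")
    body = operand[1:] if indirect else operand
    m = re.fullmatch(r"(-?\d+)\((R\d+)\)", body)
    if m:                                          # index  c(R)  /  *c(R)
        address = int(m.group(1)) + regs[m.group(2)]
        return ("mem", mem[address]) if indirect else ("mem", address)
    if re.fullmatch(r"R\d+", body):                # register  R  /  *R
        return ("mem", regs[body]) if indirect else ("reg", body)
    if indirect:
        raise ValueError("the table has no indirect absolute mode")
    return ("mem", int(body))                      # absolute  M

regs = {"R0": 100}
mem = {104: 200}
```

With these sample values, `4(R0)` resolves to address 104 and `*4(R0)` to the address stored there.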
Flow Graph
• Graph representation of three-address code
• Useful for understanding code generation (and for optimization)
• Nodes represent computation
• Edges represent flow of control
Basic blocks
• A maximal sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end
Algorithm to identify basic blocks
• determine the leaders
– the first statement is a leader
– any target of a goto statement (conditional or unconditional) is a leader
– any statement that follows a goto statement is a leader
• for each leader, its basic block consists of the leader and all statements up to, but not including, the next leader or the end of the program
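The leader rules can be sketched in a few lines. This is a minimal illustration under stated assumptions: statements are plain strings, jumps are written `goto (N)` or `if ... goto (N)` with N a 1-based statement number, and the function names and the sample program are hypothetical.

```python
import re

GOTO = re.compile(r"goto\s+\(?(\d+)\)?")

def find_leaders(code):
    """Return the sorted 0-based indices of the leaders."""
    leaders = {0} if code else set()          # first statement
    for i, stmt in enumerate(code):
        m = GOTO.search(stmt)
        if m:
            leaders.add(int(m.group(1)) - 1)  # target of the goto
            if i + 1 < len(code):
                leaders.add(i + 1)            # statement after the goto
    return sorted(leaders)

def basic_blocks(code):
    """Split code into blocks, each running up to the next leader."""
    bounds = find_leaders(code) + [len(code)]
    return [code[start:end] for start, end in zip(bounds, bounds[1:])]

program = [
    "prod = 0",             # 1
    "i = 1",                # 2
    "t1 = 4 * i",           # 3
    "t2 = a[t1]",           # 4
    "prod = prod + t2",     # 5
    "i = i + 1",            # 6
    "if i <= 20 goto (3)",  # 7
    "x = prod",             # 8
]
blocks = basic_blocks(program)
```

For this sample, statements 1, 3 and 8 are the leaders, which gives three blocks.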
Flow graphs
• add control flow information to basic blocks
• nodes are the basic blocks
• there is a directed edge from B1 to B2 if B2 can follow B1 in some execution sequence, that is, if
– there is a jump from the last statement of B1 to the first statement of B2, or
– B2 follows B1 in natural order of execution and B1 does not end in an unconditional jump
• initial node: block with first statement as leader
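The two edge rules can be sketched as follows. This is an illustrative sketch only: `flow_graph` is a hypothetical helper, statements are strings with jumps written `goto (N)` (N is a 1-based statement number), and the list of leaders is assumed to have been computed already.

```python
import re

GOTO = re.compile(r"goto\s+\(?(\d+)\)?")

def flow_graph(code, leaders):
    """Map each block index to the set of its successor block indices.

    `leaders` is the sorted list of 0-based leader indices of `code`.
    """
    bounds = leaders + [len(code)]
    block_of = {start: b for b, start in enumerate(leaders)}
    edges = {b: set() for b in range(len(leaders))}
    for b in range(len(leaders)):
        last = code[bounds[b + 1] - 1].strip()
        m = GOTO.search(last)
        if m:                                  # jump edge
            edges[b].add(block_of[int(m.group(1)) - 1])
        # Fall-through edge, unless the block ends in an unconditional goto.
        if not last.startswith("goto") and b + 1 < len(leaders):
            edges[b].add(b + 1)
    return edges

program = [
    "prod = 0", "i = 1",
    "t1 = 4 * i", "t2 = a[t1]", "prod = prod + t2", "i = i + 1",
    "if i <= 20 goto (3)",
    "x = prod",
]
graph = flow_graph(program, [0, 2, 7])
```

In this sample the loop block has an edge to itself (the jump) and an edge to the final block (the fall-through).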
Next use information
• used for register and temporary allocation
• remove variables from registers if they are not used again
• statement X = Y op Z defines X and uses Y and Z
• scan each basic block backwards
• assume all temporaries are dead on exit and all user variables are live on exit
Computing next use information
Suppose we are scanning
i : X := Y op Z
in backward scan
1. attach to statement i the information currently in the symbol table about X, Y, Z
2. set X to “not live” and “no next use” in the symbol table
3. set Y and Z to be “live” and next use as i in the symbol table
Steps 2 and 3 must be done in this order, since X may be the same name as Y or Z.
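The three steps can be sketched as a backward loop over one basic block. This is a minimal sketch under assumptions: statements are tuples `(x, y, op, z)`, names are strings, constants are integers, a copy has `op` and `z` set to `None`, and `compute_next_use` is a hypothetical name. The sample block is the seven-statement example used in these slides.

```python
def compute_next_use(block, live_on_exit):
    """Return, per statement, {name: (live, next_use)} as attached in step 1.

    Statement numbers in the result are 1-based.
    """
    table = {}  # symbol table entries changed during the scan

    def lookup(name):
        # Before it is touched, a name is live only if it is live on exit.
        return table.get(name, (name in live_on_exit, None))

    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, y, _, z = block[i]
        names = [n for n in (x, y, z) if isinstance(n, str)]
        info[i] = {n: lookup(n) for n in names}   # step 1
        table[x] = (False, None)                  # step 2
        for n in (y, z):                          # step 3
            if isinstance(n, str):
                table[n] = (True, i + 1)
    return info

block = [
    ("t1", "a", "*", "a"),
    ("t2", "a", "*", "b"),
    ("t3", 2, "*", "t2"),
    ("t4", "t1", "+", "t3"),
    ("t5", "b", "*", "b"),
    ("t6", "t4", "+", "t5"),
    ("X", "t6", None, None),
]
info = compute_next_use(block, live_on_exit={"a", "b", "X"})
```

For example, `info[0]["t1"]` records that after statement 1 the temporary t1 is live with its next use in statement 4.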
Example
1: t1 = a * a
2: t2 = a * b
3: t3 = 2 * t2
4: t4 = t1 + t3
5: t5 = b * b
6: t6 = t4 + t5
7: X = t6
Example
STATEMENT
1: t1 = a * a
2: t2 = a * b
3: t3 = 2 * t2
4: t4 = t1 + t3
5: t5 = b * b
6: t6 = t4 + t5
7: X = t6

Backward scan
7: no temporary is live
6: t6:use(7), t4 t5 not live
5: t5:use(6)
4: t4:use(6), t1 t3 not live
3: t3:use(4), t2 not live
2: t2:use(3)
1: t1:use(4)

Symbol Table
t1    dead    Use in 4
t2    dead    Use in 3
t3    dead    Use in 4
t4    dead    Use in 6
t5    dead    Use in 6
t6    dead    Use in 7
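Example …
[Figure: live ranges of the temporaries over statements 1 to 7: t1 from 1 to 4, t2 from 2 to 3, t3 from 3 to 4, t4 from 4 to 6, t5 from 5 to 6, t6 from 6 to 7]
STATEMENT            With temporaries reused
1: t1 = a * a        1: t1 = a * a
2: t2 = a * b        2: t2 = a * b
3: t3 = 2 * t2       3: t2 = 2 * t2
4: t4 = t1 + t3      4: t1 = t1 + t2
5: t5 = b * b        5: t2 = b * b
6: t6 = t4 + t5      6: t1 = t1 + t2
7: X = t6            7: X = t1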
Code Generator
• consider each statement
• remember if operands are in registers
• Register descriptor
– Keep track of what is currently in each register.
– Initially all the registers are empty
• Address descriptor
– Keep track of location where current value of
the name can be found at runtime
– The location might be a register, stack,
memory address or a set of those
Code Generation Algorithm
for each X = Y op Z do
• invoke a function getreg to determine the location L where X must be stored. Usually L is a register.
• Consult the address descriptor of Y to determine Y', the current location of Y. Prefer a register for Y'. If the value of Y is not already in L, generate
Mov Y', L
• Generate
op Z', L
where Z' is the current location of Z. Again prefer a register for Z'. Update the address descriptor of X to indicate that X is in L.
• If L is a register, update its descriptor to indicate that it contains X and remove X from all other register descriptors.
• If the current values of Y and/or Z have no next use, are dead on exit from the block, and are in registers, change the register descriptors to indicate that they no longer contain Y and/or Z.
Function getreg
1. If Y is in a register (that holds no other values) and Y is not live and has no next use after
X = Y op Z
then return the register of Y for L.
2. Failing (1), return an empty register.
3. Failing (2), if X has a next use in the block or op requires a register, then get a register R, store its content into M (by Mov R, M) and use it.
4. Else select memory location X as L.
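The algorithm and getreg can be put together in a small generator. This is a simplified sketch, not the slides' exact procedure: every statement is assumed to be a binary `x = y op z` on names, rule 4 is omitted (the sketch always takes a register in rule 3), liveness is found by scanning forward instead of using stored next-use information, and all function and variable names are hypothetical.

```python
OPCODES = {"+": "add", "-": "sub", "*": "mul"}

def generate(block, live_on_exit, num_regs=2):
    """block: list of (x, y, op, z) meaning x = y op z."""
    code = []
    reg = {f"R{k}": set() for k in range(num_regs)}  # register descriptor
    addr = {}  # address descriptor for names computed in this block

    def in_reg(name):
        return next((r for r in reg if name in reg[r]), None)

    def live_after(name, i):
        for x, y, _, z in block[i + 1:]:
            if name in (y, z):
                return True
            if name == x:
                return False
        return name in live_on_exit

    def spill(r):
        for v in reg[r]:
            if v not in addr[v]:          # memory copy is out of date
                code.append(f"mov {r},{v}")
                addr[v].add(v)
            addr[v].discard(r)
        reg[r] = set()

    def getreg(i, y, z):
        r = in_reg(y)
        if r and reg[r] == {y} and not live_after(y, i):  # rule 1
            return r
        for r in reg:                                     # rule 2
            if not reg[r]:
                return r
        r = next((r for r in reg if z not in reg[r]), "R0")  # rule 3
        spill(r)
        return r

    for i, (x, y, op, z) in enumerate(block):
        L = getreg(i, y, z)
        y_loc = in_reg(y) or y
        if y_loc != L:
            code.append(f"mov {y_loc},{L}")
        z_loc = in_reg(z) or z
        code.append(f"{OPCODES[op]} {z_loc},{L}")
        for v in reg[L]:                  # L no longer holds its old value
            addr[v].discard(L)
        reg[L] = {x}
        for r in reg:
            if r != L:
                reg[r].discard(x)
        addr[x] = {L}
        for v in (y, z):                  # free registers of dead operands
            r = in_reg(v)
            if r and v != x and not live_after(v, i):
                reg[r].discard(v)
                addr[v].discard(r)
    for name in sorted(live_on_exit):     # store live values at block end
        if name in addr and name not in addr[name]:
            code.append(f"mov {in_reg(name)},{name}")
            addr[name].add(name)
    return code

block = [("t1", "a", "-", "b"), ("t2", "a", "-", "c"),
         ("t3", "t1", "+", "t2"), ("d", "t3", "+", "t2")]
asm = generate(block, live_on_exit={"a", "b", "c", "d"})
```

The final store of live values at the end of the block is a choice made in this sketch so that d ends up in memory.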
Example
Stmt        code          reg desc          addr desc
t1=a-b      mov a,R0      R0 contains t1    t1 in R0
            sub b,R0
t2=a-c      mov a,R1      R0 contains t1    t1 in R0
            sub c,R1      R1 contains t2    t2 in R1
t3=t1+t2    add R1,R0     R0 contains t3    t3 in R0
                          R1 contains t2    t2 in R1
d=t3+t2     add R1,R0     R0 contains d     d in R0
            mov R0,d                        d in R0 and memory
DAG representation of basic blocks
• useful data structure for implementing transformations on basic blocks
• gives a picture of how the value computed by a statement is used in subsequent statements
• good way of determining common subexpressions
• A dag for a basic block has the following labels on the nodes
– leaves are labeled by unique identifiers, either variable names or constants
– interior nodes are labeled by an operator symbol
– nodes are also optionally given a sequence of identifiers for labels
DAG representation: example
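1. t1 := 4 * i
2. t2 := a[t1]
3. t3 := 4 * i
4. t4 := b[t3]
5. t5 := t2 * t4
6. t6 := prod + t5
7. prod := t6
8. t7 := i + 1
9. i := t7
10. if i <= 20 goto (1)
[Figure: DAG for the block. Leaves: a, b, 4, i0, 1, 20, prod0. Interior nodes: * over 4 and i0, labeled t1, t3; [ ] over a and the t1, t3 node, labeled t2; [ ] over b and the t1, t3 node, labeled t4; * over the t2 and t4 nodes, labeled t5; + over prod0 and the t5 node, labeled t6, prod; + over i0 and 1, labeled t7, i; <= over the t7, i node and 20, with target (1)]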
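One way to build such a DAG is to reuse a node whenever the same operator is applied to the same child nodes. The sketch below illustrates this under assumptions: statements are tuples `(x, op, y, z)`, a copy has `op` and `z` set to `None`, array indexing is treated as an operator `"[]"`, the final conditional jump is left out, and `build_dag` is a hypothetical name.

```python
def build_dag(block):
    """Return (nodes, labels) for a basic block.

    nodes[k] is ('leaf', operand, None) or (op, left_id, right_id);
    labels[k] lists the names currently attached to node k.
    """
    nodes, index, current, labels = [], {}, {}, {}

    def intern(key):
        # Create the node only if an identical one does not exist yet.
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def node_for(operand):
        # An operand not yet defined in the block is a leaf (initial value).
        if operand not in current:
            current[operand] = intern(("leaf", operand, None))
        return current[operand]

    for x, op, y, z in block:
        if op is None:                       # copy: x shares y's node
            n = node_for(y)
        else:
            n = intern((op, node_for(y), node_for(z)))
        for names in labels.values():        # x leaves its old node
            if x in names:
                names.remove(x)
        current[x] = n
        labels.setdefault(n, []).append(x)
    return nodes, labels

block = [
    ("t1", "*", 4, "i"), ("t2", "[]", "a", "t1"),
    ("t3", "*", 4, "i"), ("t4", "[]", "b", "t3"),
    ("t5", "*", "t2", "t4"), ("t6", "+", "prod", "t5"),
    ("prod", None, "t6", None), ("t7", "+", "i", 1),
    ("i", None, "t7", None),
]
nodes, labels = build_dag(block)
```

Because statements 1 and 3 compute the same expression, t1 and t3 end up as labels of a single node, which is how the common subexpression 4 * i shows up.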
Code Generation from DAG
S1 = 4 * i              S1 = 4 * i
S2 = addr(A)-4          S2 = addr(A)-4
S3 = S2[S1]             S3 = S2[S1]
S4 = 4 * i
S5 = addr(B)-4          S5 = addr(B)-4
S6 = S5[S4]             S6 = S5[S1]
S7 = S3 * S6            S7 = S3 * S6
S8 = prod+S7            prod = prod + S7
prod = S8
S9 = i+1                i = i + 1
i = S9
If i <= 20 goto (1)     If i <= 20 goto (1)
Rearranging order of the code
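• Consider the following basic block
t1 = a + b
t2 = c + d
t3 = e - t2
X = t1 - t3
[Figure: DAG for the block. The root is - labeled X, with children t1 and t3; t1 is + over a and b; t3 is - over e and t2; t2 is + over c and d]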