Jawaharlal Nehru Engineering College: Laboratory Manual


Jawaharlal Nehru Engineering College

Laboratory Manual

of

Principle of Compiler Design

For

Final Year Students


Dept: Computer Science & Engineering

FOREWORD

It is my great pleasure to present this laboratory manual for final year
engineering students for the subject of Principle of Compiler Design, keeping
in view the vast coverage required for understanding the concepts of compiler
design.

As students, many of you may have questions in your mind regarding the
subject, and this manual attempts to answer exactly those questions.

As you may be aware, MGM has already been awarded ISO 9000 certification,
and it is our endeavour to technically equip our students by taking
advantage of the procedural aspects of ISO 9000 certification.

Faculty members are also advised that covering these aspects at the initial
stage itself will greatly relieve them in future, as much of the load will be taken
care of by the enthusiastic energies of the students once they are conceptually clear.

Dr. S.D.Deshmukh,
Principal

LABORATORY MANUAL CONTENTS

This manual is intended for the final year students of the IT & CSE branches
for the subject of PCD. It contains practical/lab sessions related to PCD,
covering various aspects of the subject to enhance understanding, strengthen
knowledge of a procedural programming language, and further develop skills in
software development using a procedural language.

This course will also be helpful for students in understanding the design of a
compiler. We have made an effort to cover various aspects of the subject:
these labs encompass the regular material as well as some advanced experiments
useful in real-life applications. The programming aspects are complete in
themselves, with elaborated and understandable concepts and conceptual
visualization.

Students are advised to go through this manual thoroughly, rather than only
the topics mentioned in the syllabus, as practical aspects are the key to
understanding and conceptual visualization of the theoretical aspects covered
in the books.

Dr. V.B. Musande                    Ms. Saroj S. Date, Mr. Mahendra K. Ugale
HOD, CSE                            CSE Dept.

MGM’s

Jawaharlal Nehru Engineering College, Aurangabad

Department of Computer Science and Engineering

Vision of CSE Department:


To develop computer engineers with the necessary analytical ability and human values who can
creatively design and implement a wide spectrum of computer systems for the welfare of society.

Mission of the CSE Department:


I. Preparing graduates to work on multidisciplinary platforms associated with their
professional position both independently and in a team environment.

II. Preparing graduates for higher education and research in computer science and engineering
enabling them to develop systems for society development.

Programme Educational Objectives:

Graduates will be able to:


I. Analyze, design and provide optimal solutions for Computer Science & Engineering and
multidisciplinary problems.
II. Pursue higher studies and research by applying knowledge of mathematics and
fundamentals of computer science.
III. Exhibit professionalism and communication skills, and adapt to current trends by engaging
in lifelong learning.
Programme Outcomes (POs):

Engineering Graduates will be able to:

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering


fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.

3. Design/development of solutions: Design solutions for complex engineering problems and


design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.

4. Conduct investigations of complex problems: Use research-based knowledge and research


methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.

5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering


solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.

9. Individual and team work: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the


engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.

12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
SUBJECT INDEX

1. Program to convert Non-deterministic finite automaton (NFA) to Deterministic


finite automaton (DFA).

2. Program to generate lexical tokens.

3. Study of LEX/FLEX tool and write LEX program to identify tokens: integer
numbers, decimal numbers, identifiers, keywords, arithmetic operators, relational
operators.

4. Program to implement LR parser.

5. Study of YACC tool.

6. Program to implement any one code optimization technique.

7. Implementation of any one method of Intermediate Code Generator.

8. Implementation of code generator.

DOs and DON’Ts in the Laboratory:

1. Make entry in the Log Book as soon as you enter the Laboratory.

2. All the students should sit according to their roll numbers starting from their left
to right.

3. All the students are supposed to enter the terminal number in the log book.

4. Do not change the terminal on which you are working.

5. All the students are expected to get at least the algorithm of the program/concept
to be implemented.

6. Strictly observe the instructions given by the teacher/Lab Instructor.

Instruction for Laboratory Teachers:

1. Submission related to whatever lab work has been completed should be done
during the next lab session. Arrangements for printouts related to the
submission should be made on the day of the practical assignment.

2. Students should be taught to take printouts under the observation of the lab
teacher.

3. The promptness of submission should be encouraged by way of marking and


evaluation patterns that will benefit the sincere students.

1. Lab Exercise

Aim: Program to convert a Non-deterministic Finite Automaton (NFA) to a
Deterministic Finite Automaton (DFA).

TOOLS: gcc/c compiler

STANDARD PROCEDURE:
THEORY:
FINITE AUTOMATA: Machine Model that Recognizes Regular Languages
A finite automaton (FA) is a machine M defined by the 5-tuple M = (Q, ∑, δ, q0, F), where:
 the alphabet is ∑ = {0, 1};
 the set of states is Q = {s0, s1, s2};
 the starting state is s0; the set of final states is F = {s2};
 the transitions δ are defined by the table below.

δ    | 0        | 1
s0   | s0       | s1
s1   | {s1, s2} | s1
s2   | s2       | Ø

M can also be represented by the transition graph corresponding to the table
above: s0 loops to itself on 0 and moves to s1 on 1; s1 loops to itself on
both 0 and 1, and also moves to s2 on 0; s2 loops to itself on 0.

This automaton (which corresponds to the transition table above) is a non-deterministic
finite automaton, NFA. In the transition graph, the big circles represent states
and the double circles represent accepting or final states. The state with an
unlabeled incoming arrow is the starting state.
NFA vs. DFA

Deterministic Finite Automaton (DFA)


• For every state there is exactly one outgoing edge per alphabet symbol.

• For each symbol in the alphabet there is a corresponding transition, and there is only
one.
Non-Deterministic Finite Automaton (NFA)
• At least one of the states has more than one outgoing edge for the same alphabet
symbol.

(The transition function is not a function; it is a relation.)

• There may be ε-transitions (transitions that occur without consuming an input
symbol from the alphabet).
ALGORITHM TO SIMULATE A DFA
Algorithm 1

Input: A string x terminated by an EOF character, and a DFA defined as a 5-tuple

with s0 as its initial state and F as its set of accepting states.

Output: "YES" if the DFA accepts x, "NO" otherwise.

Method

Apply the pseudo-code algorithm below to the input string x. The function
move(s, c) gives the state to which there is a transition from state s on input
character c. The function nextchar returns the next character of the input string x.
s = s0;
c = nextchar;
while c != EOF
    s = move(s, c);
    c = nextchar;
end
if s is in F
    then return "YES"
    else return "NO"
end
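A minimal C sketch of this driver is given below. As an assumption for illustration, it hard-codes the equivalent DFA derived in the example that follows ([q0], [q1], [q1q2] over the alphabet {0, 1}, with [q1q2] accepting) and reads a string of 0s and 1s from standard input.

#include <stdio.h>

#define NSTATES 3
#define START   0

/* Hard-coded DFA: states 0,1,2 stand for [q0], [q1], [q1q2] of the
   equivalent DFA derived in the example below; only [q1q2] accepts. */
int delta[NSTATES][2] = {
    { 0, 1 },      /* [q0]  : on '0' -> [q0],   on '1' -> [q1] */
    { 2, 1 },      /* [q1]  : on '0' -> [q1q2], on '1' -> [q1] */
    { 2, 1 },      /* [q1q2]: on '0' -> [q1q2], on '1' -> [q1] */
};
int accepting[NSTATES] = { 0, 0, 1 };

int main(void)
{
    int s = START, c;

    while ((c = getchar()) != EOF && c != '\n') {
        if (c != '0' && c != '1') {
            printf("NO (bad input symbol)\n");
            return 1;
        }
        s = delta[s][c - '0'];       /* s = move(s, c) */
    }
    printf(accepting[s] ? "YES\n" : "NO\n");
    return 0;
}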
For example, convert the given NFA to its equivalent DFA:

State | 0        | 1
q0    | {q0}     | {q1}
q1    | {q1, q2} | {q1}
*q2   | {q2}     | Ø

The equivalent DFA is given by:

State   | 0      | 1
[q0]    | [q0]   | [q1]
[q1]    | [q1q2] | [q1]
*[q1q2] | [q1q2] | [q1]

(* marks an accepting state.)
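The conversion itself is the subset construction: starting from {q0}, compute for each discovered state set and each input symbol the set of NFA states reachable, and treat each distinct set as one DFA state. A C sketch using bit masks follows; the hard-coded delta table encodes the example NFA above, and the encoding itself is an assumption for illustration.

#include <stdio.h>

#define NSTATES 3                /* NFA states q0,q1,q2 -> bits 0,1,2 */
#define NSYMS   2                /* input symbols '0' and '1'         */

/* delta[q][c] = bit mask of NFA states reachable from q on symbol c */
unsigned delta[NSTATES][NSYMS] = {
    { 1u << 0,               1u << 1 },  /* q0: {q0}, {q1}    */
    { (1u << 1) | (1u << 2), 1u << 1 },  /* q1: {q1,q2}, {q1} */
    { 1u << 2,               0       },  /* q2: {q2}, empty   */
};

/* image of a whole state set under symbol c */
unsigned move(unsigned set, int c)
{
    unsigned out = 0;
    for (int q = 0; q < NSTATES; q++)
        if (set & (1u << q))
            out |= delta[q][c];
    return out;
}

int main(void)
{
    unsigned dstates[1 << NSTATES];      /* discovered DFA states */
    int ndstates = 0, done = 0;

    dstates[ndstates++] = 1u << 0;       /* start state {q0} */
    while (done < ndstates) {
        unsigned s = dstates[done++];
        for (int c = 0; c < NSYMS; c++) {
            unsigned t = move(s, c);
            int known = 0;
            for (int i = 0; i < ndstates; i++)
                if (dstates[i] == t)
                    known = 1;
            if (t != 0 && !known)        /* empty set (dead state) not added */
                dstates[ndstates++] = t;
            printf("delta(%#x, %d) = %#x\n", s, c, t);
        }
    }
    return 0;
}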

CONCLUSIONS:
With the help of the given procedure and information about finite automata,
we can write a program to convert a non-deterministic finite automaton to a
deterministic finite automaton.
2. Lab Exercise

Aim: Program to generate lexical tokens.

STANDARD PROCEDURE:
TOOLS: gcc/c compiler

THEORY:
THE ROLE OF LEXICAL ANALYZER

The lexical analyzer is the first phase of a compiler. Its main task is to read
the input characters and produce as output a sequence of tokens that the parser uses
for syntax analysis. Upon receiving a “get next token” command from the parser,
the lexical analyzer reads input characters until it can identify the next token.

Since the lexical analyzer is the part of the compiler that reads the source
text, it may also perform certain secondary tasks at the user interface. One such
task is stripping out from the source program comments and white spaces in
the form of blank, tab, and new line characters. Another is correlating error
messages from the compiler with the source program.
Sometimes lexical analyzers are divided into a cascade of two phases,
the first called "scanning" and the second "lexical analysis". The scanner is responsible
for doing simple tasks, while the lexical analyzer proper does the more complex
operations.
Algorithm:

1. Declare an array of characters as a buffer to store the tokens, say 'lexbuffer'.

2. Get a token from the user and put it into a character variable, say 'c'.

3. If 'c' is a blank, then do nothing.

4. If 'c' is the newline character, line = line + 1.

5. If 'c' is a digit, set token_val, the value assigned for a digit, and return 'NUMBER'.

6. If 'c' is a proper token, then assign the token value.

7. Print the complete table with

a. the token entered by the user

b. the associated token value.
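A C sketch of this routine is given below; the token code NUMBER and the helper name lexan are assumptions for illustration.

#include <stdio.h>
#include <ctype.h>

#define NUMBER 256               /* token code for numbers (assumed) */

int line = 1;                    /* current line number */
int token_val;                   /* value of the last NUMBER token */

int lexan(void)                  /* returns the next token code */
{
    int c;
    while ((c = getchar()) != EOF) {
        if (c == ' ' || c == '\t')
            ;                    /* step 3: blanks, do nothing */
        else if (c == '\n')
            line = line + 1;     /* step 4: count lines */
        else if (isdigit(c)) {   /* step 5: collect a number */
            token_val = c - '0';
            while (isdigit(c = getchar()))
                token_val = token_val * 10 + (c - '0');
            ungetc(c, stdin);
            return NUMBER;
        }
        else
            return c;            /* step 6: the character itself is the token */
    }
    return EOF;
}

int main(void)
{
    int t;
    while ((t = lexan()) != EOF) {
        if (t == NUMBER)
            printf("NUMBER %d (line %d)\n", token_val, line);
        else
            printf("token '%c'\n", t);
    }
    return 0;
}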

Output:
Enter the Statement

if (a == 1) then b++;

Token | Code | Value
--------------------------------
if    | 1    | -
(     | 5    | 1
a     | 8    | pointer to symbol table
==    | 6    | 1
1     | 9    | pointer to literal table
)     | 5    | 2
then  | 2    | -
b     | 8    | pointer to symbol table
++    | 6    | 2
;     | 7    | 1
Algorithm to check whether the string is a KEYWORD or not
1. Start.
2. Declare a character array storing the keywords,
s[6][10] = {"if", "else", "for", "int", "goto", "return"}, and another character
array st[] to store the string to be compared; initialize integer variables i,
flag = 0, m.
3. Input the string to be compared into st[].
4. Repeat steps 5 to 6 till counter i becomes equal to the number of keywords
stored in the array.
5. Compare the string entered by the user with the strings in the character
array using m = strcmp(st, s[i]); the strcmp() function returns 0 if the
strings are equal, so set flag = 1 when m equals 0.
6. i = i + 1.
7. If flag equals one, then it is a keyword.
8. Else it is not a keyword.
9. Stop.
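A C sketch corresponding to this algorithm (note that strcmp() returns 0, not TRUE, when the strings are equal):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[6][10] = { "if", "else", "for", "int", "goto", "return" };
    char st[10];
    int i, m, flag = 0;

    printf("Enter the string : ");
    scanf("%9s", st);
    for (i = 0; i < 6; i++) {
        m = strcmp(st, s[i]);        /* 0 means the strings are equal */
        if (m == 0)
            flag = 1;
    }
    if (flag == 1)
        printf("It is KEYWORD\n");
    else
        printf("It is not KEYWORD\n");
    return 0;
}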

Output:
Enter the string : return
It is KEYWORD

Enter the string : hello


It is not KEYWORD

Algorithm to find whether the string is CONSTANT or not


1. Start.
2. Declare the character array str[] and initialize integer variables len, a = 0.
3. Input the string from the user.
4. Find the length of the string.
5. Repeat steps 6 to 7 while a < len.
6. If str[a] is a numeric character, then a++.
7. Else it is not a constant; break from the loop and go to step 9.
8. If a == len, then print that the entered string is a constant.
9. Else it is not a constant.
10. Stop.
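A C sketch corresponding to this algorithm:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void)
{
    char str[20];
    int len, a = 0;

    printf("Input a string : ");
    scanf("%19s", str);
    len = (int)strlen(str);
    while (a < len && isdigit((unsigned char)str[a]))
        a++;                         /* advance over numeric characters */
    if (a == len)
        printf("It is a CONSTANT\n");
    else
        printf("It is NOT a CONSTANT\n");
    return 0;
}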
Output:
Input a string : 24

It is a CONSTANT

Input a string : a34

It is NOT a CONSTANT

Conclusion:
With the help of the given procedure and information about the lexical analyzer phase,
we can write a program to perform simulation of an FA for the implementation of the
lexical analyzer phase.
3. Lab Exercise

Aim: Study of LEX/FLEX tool and write LEX program to identify tokens:
integer numbers, decimal numbers, identifiers, keywords, arithmetic
operators, relational operators.

STANDARD PROCEDURE:

TOOLS: Flex Tool


Operating System: Linux

THEORY:
Lex is officially known as a "Lexical Analyzer". It’s main job is to break
up an input stream into more into meaningful units, or tokens. For example,
consider breaking a text file up into individual words.
More pragmatically, Lex is a tool for automatically generating a lexer ( also
known as scanner) starting from a lex specification (*.l file 2 ).

The skeleton of a lex specification file is given in Figure 3.


The rules section is composed of tuples of the form <pattern, action>. As it can be
seen from the following examples, are specified by regular expressions.

Example:
<pattern>   <action to take when matched>
[A-Za-z]+   printf("this is a word");
[0-9]+      printf("this is a number");


+ matches 1 or more instance

? matches 0 or 1 of the preceding regular expression

| matches the preceding or following regular expression

[] defines a character class

() groups enclosed regular expression into a new regular expression

"..." matches everything within the " " literally

x|y x or y

{i} definition of i

x/y x, only if followed by y (y not removed from input)

x{m,n} m to n occurrences of x

ˆx x, but only at beginning of line


x$ x, but only at end of line

"s" exactly what is in the quotes (except for "\" and following character)
Algorithm:
1. Open file in text editor
2. Enter keywords, rules for identifier and constant, operators and relational
operators. In the following format
a) %{
Definition of constant /header files

%}

b) Regular Expressions

%%

Transition rules

%%
c) Auxiliary Procedure (main( ) function)
3. Save file with .l extension e.g. Mylex.l
4. Call lex tool on the terminal e.g. [root@localhost]# lex Mylex.l This lex tool
will convert “.l” file into “.c” language code file i.e. lex.yy.c
5. Compile the file lex.yy.c, e.g. gcc lex.yy.c. After compiling the file
lex.yy.c, this will create the output file a.out
6. Run the file a.out e.g. ./a.out
7. Give input on the terminal to the a.out file upon processing output will be
displayed.
Example:
%{
#include <stdio.h>
%}

digit    [0-9]
id       [a-z][a-z0-9]*

%%
if|then|else|begin|end   { printf("Keyword is: %s\n", yytext); }
{id}                     { printf("identifier is: %s\n", yytext); }
{digit}+                 { printf("Constant is: %s\n", yytext); }
.|\n                     { /* skip anything else */ }
%%

int yywrap(void) { return 1; }

int main(void)
{
    yylex();
    return 0;
}

Output:
For lexical analyzer
[root@localhost]# lex Mylex.l
[root@localhost]# gcc lex.yy.c
[root@localhost]# ./a.out
123
Constant is 123
a
identifier is a
Conclusion:

With the help of the given procedure and information about the Lex tool, we can write
a lex program to identify different tokens.
4. Lab Exercise

Aim: Program to implement LR parser.


STANDARD PROCEDURE:
TOOLS: gcc/c compiler
Operating System: Linux or Windows

THEORY:

LR Parser:
LR parsing is a bottom-up syntax analysis technique that can be applied to a large
class of context-free grammars. L stands for left-to-right scanning of the input and R
for constructing a rightmost derivation in reverse.

General Framework:

Let x1 x2 x3 ... xk be the string to be parsed by the LR parsing method.

A configuration of an LR parser is a pair whose first component (A) is the stack,
holding the part of the input already considered for parsing interleaved with
states, and whose second component (B) is the unexpended part of the input string:

(s0 Y1 s1 Y2 s2 ... Yj sj,  xp ... xk)

Shift: if action[sj, xp] is "shift s", the parser pushes the input symbol xp and
the new state s (taken from the table) onto the stack:

(s0 Y1 s1 ... Yj sj xp s,  xp+1 ... xk)

Reduce: if action[sj, xp] is "reduce A → α", with |α| = N, the parser pops 2N
elements (N grammar symbols and the N states on top of them), exposing some state
s'; it then pushes A and the state s = goto[s', A]:

(s0 ... s' A s,  xp ... xk)

Example:
Consider the following grammar

1. S’  S
2. S  aABe
3. A  Abc
4. A  b
5. B  d
The table of parsing action and goto functions for the given grammar is:

STATE |  a  |  b  |  c  |  d  |  e  |  $  |  S  |  A  |  B
I0    |  s2 |     |     |     |     |     |  1  |     |
I1    |     |     |     |     |     | acc |     |     |
I2    |     |  s4 |     |     |     |     |     |  3  |
I3    |     |  s6 |     |  s7 |     |     |     |     |  5
I4    |     |  r4 |     |  r4 |     |     |     |     |
I5    |     |     |     |     |  s8 |     |     |     |
I6    |     |     |  s9 |     |     |     |     |     |
I7    |     |     |     |     |  r5 |     |     |     |
I8    |     |     |     |     |     |  r2 |     |     |
I9    |     |  r3 |     |  r3 |     |     |     |     |

Where
1. rj means reduce by production numbered j,
2. acc means accept
3. si means shift and stack state i.
4. blank means error.

Stack       | Input     | Action

0           | abbcbcde$ | shift
0a2         | bbcbcde$  | shift
0a2b4       | bcbcde$   | reduce A → b
0a2A3       | bcbcde$   | shift
0a2A3b6     | cbcde$    | shift
0a2A3b6c9   | bcde$     | reduce A → Abc
0a2A3       | bcde$     | shift
0a2A3b6     | cde$      | shift
0a2A3b6c9   | de$       | reduce A → Abc
0a2A3       | de$       | shift
0a2A3d7     | e$        | reduce B → d
0a2A3B5     | e$        | shift
0a2A3B5e8   | $         | reduce S → aABe
0S1         | $         | accepted

Moves of LR parser
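The moves above can be reproduced mechanically. Below is a C sketch of the table-driven LR driver for this grammar; the integer encoding (positive entries shift, negative entries reduce, ACC accepts, 0 is an error) and the index mapping are assumptions of this sketch, not part of the standard construction.

#include <stdio.h>

/* Productions (numbered as above): 2: S->aABe  3: A->Abc  4: A->b  5: B->d
   Terminal indices: a=0 b=1 c=2 d=3 e=4 $=5.  Nonterminals: S=0 A=1 B=2. */
enum { ACC = 99 };

int action_tab[10][6] = {
    /*  a    b    c    d    e    $  */
    {   2,   0,   0,   0,   0,   0 },   /* I0 */
    {   0,   0,   0,   0,   0, ACC },   /* I1 */
    {   0,   4,   0,   0,   0,   0 },   /* I2 */
    {   0,   6,   0,   7,   0,   0 },   /* I3 */
    {   0,  -4,   0,  -4,   0,   0 },   /* I4 */
    {   0,   0,   0,   0,   8,   0 },   /* I5 */
    {   0,   0,   9,   0,   0,   0 },   /* I6 */
    {   0,   0,   0,   0,  -5,   0 },   /* I7 */
    {   0,   0,   0,   0,   0,  -2 },   /* I8 */
    {   0,  -3,   0,  -3,   0,   0 },   /* I9 */
};
int goto_tab[10][3] = { { 1, 0, 0 }, { 0, 0, 0 }, { 0, 3, 0 }, { 0, 0, 5 } };
int rhslen[6] = { 0, 1, 4, 3, 1, 1 };   /* |right-hand side| of each production */
int lhs[6]    = { 0, 0, 0, 1, 1, 2 };   /* left-hand-side nonterminal index */

int termindex(char c) { return c == '$' ? 5 : c - 'a'; }

int main(void)
{
    const char *w = "abbcbcde$";        /* the sentence traced above */
    int stack[100], top = 0, i = 0;

    stack[0] = 0;                       /* initial state */
    for (;;) {
        int a = action_tab[stack[top]][termindex(w[i])];
        if (a == ACC) { printf("accepted\n"); return 0; }
        if (a == 0)   { printf("error\n");    return 1; }
        if (a > 0) {                    /* shift: push state, advance input */
            stack[++top] = a;
            i++;
        } else {                        /* reduce by production -a */
            int p = -a;
            printf("reduce by production %d\n", p);
            top -= rhslen[p];           /* pop |rhs| states */
            stack[top + 1] = goto_tab[stack[top]][lhs[p]];
            top++;
        }
    }
}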
Forming the Parsing table:
1. Item:
An item of a grammar G is a production of G with a dot at some position on the right
side.
The production A → XYZ has the following four items:
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
2. Item closure:
If A → X.YZ belongs to Is and Y → β is a production,
then add Y → .β to Is.

Item closure for the above example:

I0 = {S’ → .S, S → .aABe}
I1 = {S’ → S.}
I2 = {S → a.ABe, A → .Abc, A → .b}
I3 = {S → aA.Be, A → A.bc, B → .d}
I4 = {A → b.}
I5 = {S → aAB.e}
I6 = {A → Ab.c}
I7 = {B → d.}
I8 = {S → aABe.}
I9 = {A → Abc.}

Constructing the GOTO graph from the LR(0) items derived above:
1. Enlist all the collections of items: C = (I0, I1, ...).
2. Put all the transitions between the items into the GOTO graph.
Rules for putting the label on a transition:
If there is a transition from
A → α.Xβ to A → αX.β,
then the transition in the GOTO graph is labeled X.
If there is a transition from
A → α.Bβ to B → .γ,
then the transition is labeled ε, and since we take the ε-closure of all items, these
productions lie in the same item set as A → α.Bβ.
So, the GOTO graph of the given grammar is produced as follows (the figure is
reconstructed here as a list of transitions between the item sets):

goto(I0, S) = I1        goto(I0, a) = I2
goto(I2, A) = I3        goto(I2, b) = I4
goto(I3, B) = I5        goto(I3, b) = I6
goto(I3, d) = I7        goto(I5, e) = I8
goto(I6, c) = I9

The GOTO Graph for the given grammar.


From the GOTO graph we now start filling in the parsing table for the SLR (simple
LR) parser, which contains state i as the row index and action & goto as the
column indices.
The rules for filling the action entries in the table are:
1. If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here
a must be a terminal.
2. If [A → α.] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A);
here A must not be S’.
3. If [S’ → S.] is in Ii, then set action[i, $] to "accept".

If any conflicting actions are generated by the given rules, then we say the grammar
is not SLR or LR(0) and we fail to produce a parser.
We then fill the goto entries.
The goto transitions for the state i are constructed for all the non-terminals A
using the Rule:
If goto( Ii, A ) = Ij , then goto [i , A ] = j .

Note : goto (Ii, A) is an entry in graph, whereas goto[i, A] is the corresponding


entry in parsing table .
Also, the initial state of the parser is the one constructed from the set of items containing
[S’ → .S].
So, the parsing table for the grammar comes out to be the one given above.
Using this, we go ahead with the parsing action.
A grammar having a SLR parsing table is said to be a SLR grammar, and it has a
lookahead of 1.
We have a noteworthy point in the grammar we just used. We have the starting
production S’ → S, and then S has its own set of productions. Intuitively, we see that there
arises no need for the extra production rule S’ → S, and this rule might be treated as a
dummy rule; we could always do away with S’ and have S as the start symbol.
However, closer inspection brings to light the fact that, while starting the parse,
we may not have any problem, but on making a reduction to the start symbol (remember we
are parsing bottom-up!), we might just end up having reduce conflicts. So, a need to
have a 'one-level' higher hierarchy is necessitated, and it is provided by the rule S’ → S
whenever the start symbol S has multiple productions.
We now look into the problems underlying the SLR grammar, and move on to
greener pastures where we could possibly cope with these problems.
Problems with SLR parsers:
The SLR parser discussed above has certain flaws.
1. A state may include a final item (one in which the position marker is at the
end) and a non-final item. This is called a shift-reduce conflict.
2. A state may include two different final items. This is called a reduce-reduce
conflict.
• Can we have reduce-reduce conflicts in unambiguous grammars?
• Can we have a shift-shift conflict?
An SLR parser reduces only when the next token is in FOLLOW of the left-hand side
of the production. Such choices may help to resolve reduce-reduce
conflicts. However, similar conflicts still exist:
-> Shift-reduce conflicts include the case in which FOLLOW of the
left-hand side of the final item overlaps with FIRST of the remainder.
-> Reduce-reduce conflicts include the case in which the FOLLOW sets of both
left-hand sides overlap.

Conclusion:
With the help of the given procedure and information about the LR parser, we can write
a program for the implementation of an LR parser.
5. Lab Exercise

Aim :Study of YACC tool.

STANDARD PROCEDURE:

TOOLS: yacc
Operating System: Linux

THEORY:
YACC stands for Yet Another Compiler Compiler. Its GNU version is called
Bison. YACC translates any grammar of a language into a parser for that
language. Grammars for YACC are described using a variant of Backus Naur
Form (BNF). A BNF grammar can be used to express Context-free languages. By
convention, a YACC file has the suffix .y.

Structure of a yacc file


A yacc file looks much like a lex file:
...definitions...
%%
...rules...
%%
...code...
Definitions: All code between %{ and %} is copied to the beginning of the
resulting C file.
Rules: A number of combinations of pattern and action; if the
action is more than a single command, it needs to be in braces.
Code: This can be very elaborate, but the main ingredient is the call to
yylex, the lexical analyzer. If the code segment is left out, a default main is
used which only calls yylex.
Yacc File Structure

%{

#include <stdio.h>

int yylex(void);

void yyerror(char *);

%}

%token INTEGER

%%

program:
    program expr '\n'    { printf("%d\n", $2); }
    |
    ;

expr:
    INTEGER              { $$ = $1; }
    | expr '+' expr      { $$ = $1 + $3; }
    | expr '-' expr      { $$ = $1 - $3; }
    ;

%%

int main()
{
yyparse();
}
void yyerror(char *s)
{
printf("%s",s);
}

Algorithm:

1. Open a file in a text editor.

2. Specify grammar rules and associated actions in the
following format:

a. %{
Declarations (include statements, optional)

%}

b. Lexical tokens, grammar precedence and associated information

%%

Grammar, rules and actions

%%

c. Auxiliary procedures (main( ) function)

3. Save the grammar file with a .y extension, e.g. parser.y, and the companion
lexical specification with a .l extension, e.g. parser.l.

4. Call the yacc tool on the terminal, e.g. [root@localhost]# yacc -d parser.y.
This converts the ".y" file into C code, y.tab.c (and a token header y.tab.h);
calling lex parser.l similarly produces lex.yy.c.

5. Compile the generated files, e.g. cc lex.yy.c y.tab.c -ll. After compiling,
this will create the output file a.out.

6. Run the file a.out, e.g. ./a.out.

7. Give input on the terminal to the a.out file; upon processing, the output will be
displayed.
<parse.l>
%{
#include<stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+ {yylval.dval=atof(yytext);
return DIGIT;
}
\n|. return yytext[0];
%%

<parser.y>
%{
#include<stdio.h>
%}
%union
{
double dval;
}

%token <dval> DIGIT


%type <dval> expr
%type <dval> term
%type <dval> factor

%%
line: expr '\n'    { printf("%g\n", $1); }
    ;
expr: expr '+' term {$$=$1 + $3 ;}
| term
;

term: term '*' factor {$$=$1 * $3 ;}


| factor
;
factor: '(' expr ')' {$$=$2 ;}
| DIGIT
;

%%
int main()
{
yyparse();
}

void yyerror(char *s)
{
printf("%s",s);
}

Output:

#lex parser.l
#yacc -d parser.y
#cc lex.yy.c y.tab.c -ll -lm
#./a.out
2+3
5

Conclusion:
With the help of the given information and procedure, we can write a Yacc program
for the construction of a compiler.
6. Lab Exercise

Aim: Program to implement any one code optimization technique.

STANDARD PROCEDURE:

Programming Language: C
Operating System: Linux or Windows

Theory:

To create an efficient target language program, a programmer needs
more than an optimizing compiler. In this section, we review the options available
to a programmer and a compiler for creating efficient target programs. We mention
code-improving transformations that a programmer and a compiler writer can be
expected to use to improve the performance of a program.

Criteria for Code-Improving Transformations

The best program transformations are those that yield the most benefit for the
least effort. The transformations provided by an optimizing compiler should
have several properties.

First, a transformation must preserve the meaning of programs. That is, an


“optimization” must not change the output produced by a program for a given
input, or cause an error, such as division by zero, that was not present in the
original version of the source program.

Second, a transformation must, on the average, speed up programs by a
measurable amount.

Third, a transformation must be worth the effort. It does not make sense for a
compiler writer to expend the intellectual effort to implement a code-improving
transformation that yields hardly any improvement or is applied only rarely.

Getting Better Performance

Dramatic improvements in the running time of a program, such as cutting the
running time from a few hours to a few seconds, are usually obtained by improving
the program at all levels.

Fig: Places for improvements by the user and the compiler

1 Eliminating Loop Invariant Computations

To eliminate loop-invariant computations, we first identify the invariant
computations and then move them outside the loop if the move does not lead to a
change in the program's meaning. Identification of loop-invariant computations
requires the detection of loops in the program. Whether a loop exists in the
program or not depends on the program's control flow, therefore requiring a
control flow analysis. For loop detection, a graphical representation, called a
"program flow graph," shows how control flows in the program. To obtain such a
graph, we must partition the intermediate code into basic blocks. This requires
identifying leader statements, which are defined as follows:
1. The first statement is a leader statement.
2. The target of a conditional or unconditional goto is a leader.
3. A statement that immediately follows a conditional goto is a leader.

A basic block is a sequence of three-address statements that can be entered
only at the beginning; control stays within the block until after the execution of
the last statement, without a halt or any possibility of branching, except at
the end.
2 Algorithm to Partition Three-Address Code into Basic Blocks

To partition three-address code into basic blocks, we must identify the leader
statements in the three-address code and then include all the statements, starting
from a leader, and up to, but not including, the next leader. The basic blocks into
which the three-address code is partitioned constitute the nodes or vertices of the
program flow graph. The edges in the flow graph are decided as follows. If B1 and
B2 are the two blocks, then add an edge from B1 to B2 in the program flow graph,
if the block B2 follows B1 in an execution sequence. The block B2 follows B1 in
an execution sequence if and only if:

1. The first statement of block B2 immediately follows the last statement of block
B1 in the three-address code, and the last statement of block B1 is not an
unconditional goto statement.
2. The last statement of block B1 is either a conditional or unconditional goto
statement, and the first statement of block B2 is the target of the last statement of
block B1.
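Before turning to the example below, here is a small C sketch of the leader-finding step. The is_jump/target encoding of the statements is an assumption for illustration (filled in for the factorial fragment used in the example); note that it conservatively marks the statement after any goto, conditional or not, as a leader.

#include <stdio.h>

#define N 8                          /* statements of the example fragment */

/* is_jump[i] = 1 if statement i is a goto (conditional or not);
   target[i] = statement it jumps to (0 for "goto calling program"). */
int is_jump[N + 1] = { 0, 0, 0, 1, 0, 0, 0, 1, 1 };
int target[N + 1]  = { 0, 0, 0, 8, 0, 0, 0, 3, 0 };

int main(void)
{
    int leader[N + 1] = { 0 };

    leader[1] = 1;                           /* rule 1: first statement */
    for (int i = 1; i <= N; i++) {
        if (is_jump[i]) {
            if (target[i])
                leader[target[i]] = 1;       /* rule 2: jump target */
            if (i < N)
                leader[i + 1] = 1;           /* rule 3: statement after a goto */
        }
    }
    for (int i = 1; i <= N; i++)
        if (leader[i])
            printf("statement %d is a leader\n", i);
    return 0;
}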

For example, consider the following program fragment:

Fact(x)
{
    int f = 1;
    for (i = 2; i <= x; i++)
        f = f * i;
    return (f);
}
The three-address-code representation for the program fragment above is:

1. f = 1
2. i = 2
3. if i > x goto(8)
4. f = f * i
5. t1 = i + 1
6. i = t1
7. goto(3)
8. goto calling program
The leader statements are:
Statement number 1, because it is the first statement.
Statement number 3, because it is the target of a goto.
Statement number 4, because it immediately follows a conditional
goto statement.
Statement number 8, because it is a target of a conditional goto
statement.
Therefore, the basic blocks into which the above code can be partitioned
are as follows, and the program flow graph is shown in Figure 1.

Block B1: statements 1 and 2

Block B2: statement 3

Block B3: statements 4, 5, 6 and 7

Block B4: statement 8

Fig: Program Flow Graph
A loop is a cycle in the flow graph that satisfies two properties:

1. It should have a single entry node or header, so that it will be possible to


move all of the loop invariant computations to a unique place, called a
"preheader," which is a block/node placed outside the loop, just in front of
the header.
2. It should be strongly connected; that is, it should be possible to go from
any node of the loop to any other node while staying within the loop. This is
required if at least some part of the loop is to be executed repeatedly.

If the flow graph contains one or more back edges, then one or more loops/cycles
exist in the program. Therefore, we must identify any back edges in the
flow graph.

4 Identification of the Back Edges


To identify the back edges in the flow graph, we compute the
dominators of every node of the program flow graph. A node a is a dominator of
node b if all the paths starting at the initial node of the graph that reach node b go
through a. For example, consider the flow graph in Figure 2. In this flow graph, the
dominator of node 3 is only node 1, because not all of the paths reaching node 3
from node 1 go through node 2.

Figure : The flow graph back edges are identified by computing the dominators.

Dominator (dom) relationships have the following properties:

1. They are reflexive; that is, every node dominates itself.


2. They are transitive; that is, if a dom b and b dom c, this implies a dom c.
5 Reducible Flow Graphs

Several code-optimization transformations are easy to perform on reducible flow


graphs. A flow graph G is reducible if and only if we can partition the edges into
two disjointed groups, forward edges and back edges, with the following two
properties:

1. The forward edges form an acyclic graph in which every node can be
reached from the initial node of G.
2. The back edges consist only of edges whose heads dominate their tails.

For example, consider the flow graph shown in Figure 3. This flow graph has no
back edges, because no edge's head dominates the tail of that edge. Hence, it would
have been a reducible graph if the entire graph had been acyclic. But that is not the
case; therefore, it is not a reducible flow graph.

Figure :A flow graph with no back edges

After identifying the back edges, if any, the natural loop of every back edge must
be identified. The natural loop of a back edge a → b is the set of all those
nodes that can reach a without going through b, including node b itself.
Therefore, to find a natural loop of the back edge n →d, we start with node n and
add all the predecessors of node n to the loop. Then we add the predecessors of the
nodes that were just added to the loop; and we continue this process until we reach
node d. These nodes plus node d constitute the set of all those nodes that can reach
node n without going through node d. This is the natural loop of the edge n → d.
Therefore, the algorithm for detecting the natural loop of a back edge is:

Input : back edge n→ d.

Output: set loop, which is a set of nodes forming the natural loop of the back edge
n → d.
main()
{
    loop = { d };      /* initialize by adding node d to the set loop */
    insert(n);         /* call a procedure insert with the node n */
}

procedure insert(m)
{
    if m is not in loop then
    {
        loop = loop ∪ { m }
        for every predecessor p of m do
            insert(p);
    }
}
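The insert procedure translates directly into C if the set loop is represented as a bit mask. The sketch below assumes, for illustration, the predecessor lists of the flow graph of Figure 1 (B1→B2, B2→B3, B2→B4, B3→B2); the representation itself is an assumption of this sketch.

#include <stdio.h>

#define MAXB 16
unsigned loop;                     /* bit b set => block Bb is in the loop */
int npred[MAXB];                   /* number of predecessors of each block */
int pred[MAXB][MAXB];              /* pred[m][k] = k-th predecessor of Bm */

void insert(int m)                 /* the procedure insert of the algorithm */
{
    if (!(loop & (1u << m))) {     /* if m is not in the loop */
        loop |= 1u << m;           /* loop = loop U {m} */
        for (int k = 0; k < npred[m]; k++)
            insert(pred[m][k]);    /* add every predecessor p of m */
    }
}

unsigned natural_loop(int n, int d)  /* natural loop of back edge n -> d */
{
    loop = 1u << d;                /* initialize by adding node d */
    insert(n);
    return loop;
}

int main(void)
{
    /* Flow graph of Figure 1: B1->B2, B2->B3, B2->B4, B3->B2 */
    npred[2] = 2; pred[2][0] = 1; pred[2][1] = 3;
    npred[3] = 1; pred[3][0] = 2;
    npred[4] = 1; pred[4][0] = 2;

    unsigned l = natural_loop(3, 2);   /* back edge B3 -> B2 */
    for (int b = 1; b <= 4; b++)
        if (l & (1u << b))
            printf("B%d is in the loop\n", b);
    return 0;
}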

For example in the flow graph shown in Figure 1, the back edges are edge B3
→ B2, and the loop is comprised of the blocks B2 and B3.

After the natural loops of the back edges are identified, the next task is to identify
the loop-invariant computations. The three-address statement x = y op z, which
exists in basic block B (a part of the loop), is a loop-invariant statement if all
possible definitions of y and z that reach this statement are outside the loop,
or if y and z are constants, because then the computation y op z will be the same
each time the statement is encountered in the loop. Hence, to decide whether the
statement x = y op z is loop invariant or not, we must compute the u−d
(use−definition) chaining information. The u−d chaining information is computed by
doing a global data flow analysis of the flow graph. All of the definitions that are
capable of reaching a point immediately before the start of basic block B are
computed, and we call the set of all such definitions IN(B). The set of all the
definitions capable of reaching a point immediately after the last statement of
block B is called OUT(B). We compute IN(B) and OUT(B) for every block B using
GEN(B) and KILL(B), which are defined as:

GEN(B): The set of all the definitions generated in block B.


KILL(B): The set of all the definitions outside block B that define the same
variables as are defined in block B.

Consider the flow graph in Figure 4.The GEN and KILL sets for the basic blocks
are as shown in Table 1.

Table 1: GEN and KILL sets for Figure 4 Flow Graph
Block GEN KILL
B1 {1,2} {6,10,11}
B2 {3,4} {5,8}
B3 {5} {4,8}
B4 {6,7} {2,9,11}
B5 {8,9} {4,5,7}
B6 {10,11} {1,2,6}

Figure 4: Flow graph with GEN and KILL block sets.

IN(B) and OUT(B) are defined by the following set of equations, which are
called "data flow equations":

IN(B) = ∪ OUT(P), taking the union over all predecessors P of B
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)

The next step, therefore, is to solve these equations. If there are n nodes, there will
be 2n equations in 2n unknowns. The solution to these equations is not generally
unique. This is because we may have a situation like that shown in Figure 5, where
a block B is a predecessor of itself.

Figure 5: Nonunique solution to a data flow equation, where B is a predecessor of
itself.

If there is a solution to the data flow equations for block B, and if the solution is
IN(B) = IN0 and OUT(B) = OUT0, then IN0 ∪ {d} and OUT0 ∪ {d}, where d is
any definition not in IN0, OUT0, or KILL(B), also satisfy the equations. This is
because if we take OUT0 ∪ {d} as the value of OUT(B), then since B is one of the
predecessors of itself, according to IN(B) = ∪ OUT(P), d gets added to IN(B); and
because d is not in KILL(B), we get IN(B) = IN0 ∪ {d}. Then, according to
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B), OUT(B) = OUT0 ∪ {d} is also satisfied.
Therefore, IN0, OUT0 is one of the solutions, whereas IN0 ∪ {d}, OUT0 ∪ {d} is
another solution to the equations, so there is no unique solution. What we are
interested in is finding the smallest solution, that is, the smallest IN(B) and OUT(B)
for every block B, consisting of the values that are in all solutions.

The algorithm for computing the smallest IN(B) and OUT(B) is as follows:

1. for each block B do
   {
       IN(B) = Ø
       OUT(B) = GEN(B)
   }
2. flag = true
3. while (flag) do
   {
       flag = false
       for each block B do
       {
           INnew(B) = Ø
           for each predecessor P of B do
               INnew(B) = INnew(B) ∪ OUT(P)
           if INnew(B) ≠ IN(B) then
           {
               flag = true
               IN(B) = INnew(B)
               OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)
           }
       }
   }

Initially, we take IN(B) for every block to be an empty set, and we take
OUT(B) to be GEN(B), and we compute INnew(B). If it is different from IN(B), we
compute a new OUT(B) and go for the next iteration. This is continued until IN(B)
comes out to be the same for every B in the previous and current iterations.
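The iteration can be coded directly in C with bit sets. The sketch below uses a small made-up diamond flow graph and made-up GEN/KILL sets (assumptions for illustration, not the sets of Table 1, whose flow graph edges are not reproduced here); definitions d1..d4 are encoded as bits 0..3.

#include <stdio.h>

#define NB 4                      /* blocks B1..B4 -> indices 0..3 */

/* Made-up diamond flow graph (B1->B2, B1->B3, B2->B4, B3->B4)
   with made-up GEN/KILL sets. */
unsigned GEN[NB]  = { 0x1, 0x2, 0x4, 0x8 };
unsigned KILL[NB] = { 0x2, 0x1, 0x0, 0x0 };
int npred[NB]     = { 0, 1, 1, 2 };
int pred[NB][NB]  = { { 0 }, { 0 }, { 0 }, { 1, 2 } };

unsigned IN[NB], OUT[NB];

int main(void)
{
    int flag = 1;

    for (int b = 0; b < NB; b++) {     /* step 1: IN = empty, OUT = GEN */
        IN[b]  = 0;
        OUT[b] = GEN[b];
    }
    while (flag) {                     /* iterate until nothing changes */
        flag = 0;
        for (int b = 0; b < NB; b++) {
            unsigned in_new = 0;
            for (int k = 0; k < npred[b]; k++)
                in_new |= OUT[pred[b][k]];          /* union of OUT(P) */
            if (in_new != IN[b]) {
                flag = 1;
                IN[b]  = in_new;
                OUT[b] = (IN[b] & ~KILL[b]) | GEN[b];
            }
        }
    }
    for (int b = 0; b < NB; b++)
        printf("IN(B%d) = %#x   OUT(B%d) = %#x\n", b + 1, IN[b], b + 1, OUT[b]);
    return 0;
}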

Conclusion:

With the help of the given information and procedure, we can implement code
optimizations such as common sub-expression elimination and loop-invariant code
movement.

7. Lab Exercise
Aim: Implementation of any one method of Intermediate Code Generator.
STANDARD PROCEDURE:

Programming Language: C
Operating System: Linux or Windows

THEORY:
Program to convert Infix expression to Postfix

While translating a source program into a functionally equivalent object code
representation, a parser may first generate an intermediate representation. This
makes retargeting of the code possible and allows some optimizations to be carried
out that would otherwise not be possible.

The following are commonly used intermediate representations:

1. Postfix notation

2. Syntax tree

3. Three-address code

Postfix Notation

In postfix notation, the operator follows the operands. For example, for the expression
(a − b) * (c + d) + (a − b), the postfix representation is: ab−cd+*ab−+

Syntax Tree

The syntax tree is nothing more than a condensed form of the parse tree. The
operator and keyword nodes of the parse tree (Figure 1) are moved to their parent,
and a chain of single productions is replaced by a single link (Figure 2).

Figure 1: Parse tree for the string id+id*id.

Figure 2: Syntax tree for id+id*id.

Three-Address Code

Three-address code is a sequence of statements of the form x = y op z. Since a
statement involves no more than three references, it is called a "three-address
statement," and a sequence of such statements is referred to as three-address code.
For example, the three-address code for the expression a + b * c + d is:

t1 = b * c
t2 = a + t1
t3 = t2 + d

Sometimes a statement might contain fewer than three references, but it is still called
a three-address statement. The following are the three-address statements used to
represent various programming language constructs:

Used for representing arithmetic expressions:
x = y op z
x = op y
x = y

Used for representing Boolean expressions:
if x relop y goto L
goto L

Used for representing array references and dereferencing operations:
x = y[i]
x[i] = y
x = *y
*x = y
x = &y

Used for representing a procedure call:
param x1
...
param xn
call p, n

Infix Expression:
Any expression in the standard form, like "2*3-4/5", is an infix (inorder) expression.
Postfix Expression:
The postfix (postorder) form of the above expression is "23*45/-".
Infix to Postfix Conversion:
In normal algebra we use the infix notation, like a + b*c. The corresponding postfix
notation is abc*+.
The algorithm for the conversion is as follows:
 Scan the infix string from left to right.
 Initialise an empty stack.
 If the scanned character is an operand, add it to the postfix string. If the
scanned character is an operator and the stack is empty, push the character onto
the stack.
 If the scanned character is an operator and the stack is not empty, compare the
precedence of the character with the element on top of the stack (topStack). If
topStack has higher precedence over the scanned character, pop the stack and add
the popped element to the postfix string; else push the scanned character onto the
stack. Repeat this step as long as the stack is not empty and topStack has
precedence over the character.
 Repeat the steps above till all the characters are scanned.
 After all characters are scanned, we have to add any characters that the stack
may have to the postfix string: if the stack is not empty, add topStack to the postfix
string and pop the stack.
 Repeat this step as long as the stack is not empty.
 Return the postfix string.
Example :

Let us see how the above algorithm will be implemented using an example.

Infix String : a+b*c-d


Initially the Stack is empty and our Postfix string has no characters. Now, the first
character scanned is 'a'. 'a' is added to the Postfix string. The next character scanned
is '+'. It being an operator, it is pushed to the stack.
The next character scanned is 'b', which will be placed in the postfix string. The next
character is '*', which is an operator. Now, the top element of the stack is '+', which
has lower precedence than '*', so '*' will be pushed onto the stack.

The next character is 'c', which is placed in the postfix string. The next character scanned
is '-'. The topmost character in the stack is '*', which has a higher precedence than '-'.
Thus '*' will be popped out from the stack and added to the postfix string. Even now
the stack is not empty. Now the topmost element of the stack is '+', which has equal
priority to '-'. So pop the '+' from the stack and add it to the postfix string. The '-' will
be pushed onto the stack.

Next character is 'd' which is added to Postfix string. Now all characters have been
scanned so we must pop the remaining elements from the stack and add it to the
Postfix string. At this stage we have only a '-' in the stack. It is popped out and added
to the Postfix string. So, after all characters are scanned, this is how the stack and
Postfix string will be :

End result :
Infix String : a+b*c-d
Postfix String : abc*+d-

Algorithm:
1. Take a stack OPSTK and initialize it to be empty.
2. Read the entire string in infix form, e.g. A+B*C.
3. Read the string character by character into the variable symbol.
i) If symbol is an operand, add it to the postfix string.
ii) If stack OPSTK is not empty and the precedence of the top-of-stack symbol is
greater than that of the recently read symbol, then pop OPSTK:
topsymbol = pop(OPSTK), and add this popped topsymbol to the postfix string.
iii) Repeat step ii while the stack is not empty and the precedence of the
top-of-stack symbol is greater than that of the recently read symbol.
iv) Push symbol onto OPSTK.
4. Output any remaining operators:
pop OPSTK till it is empty, and add each popped symbol to the postfix string.
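A C sketch of this algorithm follows. It additionally handles parentheses (popping back to the matching '('), which the steps above leave implicit but the sample run below requires; the operator precedences are the usual ones for +, -, * and /.

#include <stdio.h>
#include <ctype.h>

int prec(char op)                      /* precedence of an operator */
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    }
    return 0;                          /* '(' and anything else */
}

int main(void)
{
    char in[80], post[80], opstk[80];
    int top = -1, j = 0;

    printf("Enter the Infix Notation : ");
    scanf("%79s", in);
    for (int i = 0; in[i] != '\0'; i++) {
        char c = in[i];
        if (isalnum((unsigned char)c))
            post[j++] = c;             /* operands go straight out */
        else if (c == '(')
            opstk[++top] = c;
        else if (c == ')') {           /* pop until the matching '(' */
            while (top >= 0 && opstk[top] != '(')
                post[j++] = opstk[top--];
            top--;                     /* discard '(' */
        } else {                       /* operator: pop higher/equal precedence */
            while (top >= 0 && prec(opstk[top]) >= prec(c))
                post[j++] = opstk[top--];
            opstk[++top] = c;
        }
    }
    while (top >= 0)                   /* flush remaining operators */
        post[j++] = opstk[top--];
    post[j] = '\0';
    printf("Postfix Notation is: %s\n", post);
    return 0;
}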
Output:
------------------------------------------
Enter the Infix Notation : (A+B)*C
Postfix Notation is: AB+C*

Conclusion:

With the help of the given information and procedure, we can implement one of the
intermediate code representations, such as infix to postfix conversion.

8. Lab Exercise

Aim: Implementation of code generator.

STANDARD PROCEDURE:

Programming Language: C, C++
Operating System: Linux or Windows
THEORY:
Code generation phase-
 Final phase in the compiler model
 Takes as input:
◦ intermediate representation (IR) code
◦ symbol table information
 Produces output:
◦ semantically equivalent target program

 Compilers need to produce efficient target programs


 Includes an optimization phase prior to code generation
 May make multiple passes over the IR before generating the target program

 The back-end code generator of a compiler may generate different forms of


code, depending on the requirements:
◦ Absolute machine code (executable code)
◦ Relocatable machine code (object files for linker)
◦ Assembly language (facilitates debugging)
◦ Byte code forms for interpreters (e.g. JVM)
 Code generator has three primary tasks:
◦ Instruction selection
◦ Register allocation and assignment
◦ Instruction ordering

 Instruction selection
◦ choose appropriate target-machine instructions to implement the IR
statements
 Register allocation and assignment
◦ decide what values to keep in which registers
 Instruction ordering
◦ decide in what order to schedule the execution of instructions
 Design of all code generators involve the above three tasks
 Details of code generation are dependent on the specifics of IR, target
language, and run-time system

The Target Machine (Machine Model)


Working of simple code generator

 Implementing code generation requires thorough understanding of the target


machine architecture and its instruction set
 Our (hypothetical) machine:
◦ Byte-addressable (word = 4 bytes)
◦ Has n general purpose registers R0, R1, …, Rn-1
◦ Two-address instructions of the form

op source, destination
◦ op is the op-code; source and destination are data fields
 Op-codes (op), for example
 MOV (move content of source to destination)
 ADD (add content of source to destination)
 SUB (subtract content of source from
destination)

There are also other ops

Mode Form Address

Absolute M M

Register R R

Indexed c(R) c+contents(R)

Indirect register *R contents(R)

Indirect indexed *c(R) contents(c+contents(R))

Literal #c N/A
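As a minimal sketch of instruction selection on this hypothetical machine, the routine below emits target code for a single three-address statement x = y op z, always routing the computation through R0 (no register tracking is attempted; the function name gen and the two supported operators are assumptions of this sketch).

#include <stdio.h>

/* Emit code for x = y op z: bring y into R0, apply op with z,
   store R0 into x.  '+' maps to ADD, anything else to SUB here. */
void gen(const char *x, const char *y, char op, const char *z)
{
    printf("MOV %s, R0\n", y);
    printf("%s %s, R0\n", op == '+' ? "ADD" : "SUB", z);
    printf("MOV R0, %s\n", x);
}

int main(void)
{
    gen("a", "b", '+', "c");     /* a = b + c */
    return 0;
}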

Register Allocation and Assignment

 Efficient utilization of the limited set of registers is important to generate


good code
 Registers are assigned by
◦ Register allocation to select the set of variables that will reside in
registers at a point in the code
◦ Register assignment to pick the specific register that a variable will
reside in
 Finding an optimal register assignment in general is NP-complete

Example: for the sequence t = a + b followed by u = t + c, allocating t to
register R0 lets the second statement use the result directly, avoiding a store
and a reload:

MOV a, R0
ADD b, R0        (t is now in R0)
ADD c, R0        (u is computed directly from R0)
MOV R0, u

Choice of Evaluation Order


When instructions are independent, their evaluation order can be changed

Conclusion:
With the help of the given information and procedure, we can implement the code
generation phase.

