Jawaharlal Nehru Engineering College: Laboratory Manual
FOREWORD
As you may be aware, MGM has already been awarded ISO 9000
certification, and it is our endeavour to technically equip our students, taking
advantage of the procedural aspects of ISO 9000 certification.
Faculty members are also advised that covering these aspects at the initial
stage itself will greatly relieve them in future, as much of the load will be taken
care of by the enthusiastic energies of the students once they are conceptually clear.
Dr. S.D.Deshmukh,
Principal
LABORATORY MANUAL CONTENTS
This manual is intended for the final-year students of the IT & CSE branches.
This course will also be helpful for students in understanding the design of a
compiler. We have made efforts to cover various aspects of the subject.
Students are advised to thoroughly go through this manual rather than relying only
on the books.
II. Preparing graduates for higher education and research in computer science and engineering,
enabling them to develop systems for societal development.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
SUBJECT INDEX
3. Study of LEX/FLEX tool and write LEX program to identify tokens: integer
numbers, decimal numbers, identifiers, keywords, arithmetic operators, relational
operators.
DOs and DON'Ts in the Laboratory:
1. Make entry in the Log Book as soon as you enter the Laboratory.
2. All the students should sit according to their roll numbers starting from their left
to right.
3. All the students are supposed to enter the terminal number in the log book.
5. All the students are expected to prepare at least the algorithm of the program/concept
to be implemented.
1. Submission of whatever lab work has been completed should be done
during the next lab session. Arrangements for printouts related to the
submission should be made on the day of the practical assignment.
2. Students should take printouts under the observation of the lab
teacher.
1. Lab Exercise
STANDARD PROCEDURE:
THEORY:
FINITE AUTOMATA: Machine Model that Recognizes Regular Languages
The finite automata, (FA), machine M defined by the 5-tuple M = {Q, ∑, δ ,q0,
F}; where the
alphabet is: ∑ = {0,1};
Set of states is: Q = {s0, s1, s2};
Starting state is s0; the final state is: F = {s2};
Transitions δ are defined by the table below.
δ  |    0     |  1
s0 |    s0    |  s1
s1 | {s1, s2} |  s1
s2 |    s2    |  Ø
(Transition diagram of the machine over states s0, s1, s2.)
• For each state and each symbol in the alphabet there is exactly one
outgoing transition.
Non-Deterministic Finite Automaton (NFA)
• At least one of the states has more than one outgoing edge for the same
symbol from the alphabet.
ALGORITHM TO SIMULATE A DFA
Algorithm
Method
Apply the "pseudo code" algorithm below to the input string x. The function
move(s, c) gives the state to which there is a transition from state s on input
character c. The function nextchar returns the next character of the input string x.
s = s0;
c = nextchar;
while c != EOF
    s = move(s, c);
    c = nextchar;
end
if s is in F
    then return "yes"
    else return "no"
end
For example: convert the given NFA to its equivalent DFA.
CONCLUSIONS:
With the help of the given procedure and information about finite automata,
we can write a program to convert a Non-deterministic Finite Automaton to a
Deterministic Finite Automaton.
Aim: Program to generate lexical tokens.
STANDARD PROCEDURE:
TOOLS: gcc/c compiler
THEORY:
THE ROLE OF LEXICAL ANALYZER
The lexical analyzer is the first phase of a compiler. Its main task is to read
the input characters and produce as output a sequence of tokens that the parser uses
for syntax analysis. Upon receiving a “get next token” command from the parser,
the lexical analyzer reads input characters until it can identify the next token.
Since the lexical analyzer is the part of the compiler that reads the source
text, it may also perform certain secondary tasks at the user interface. One such
task is stripping out from the source program comments and white spaces in
the form of blank, tab, and new line characters. Another is correlating error
messages from the compiler with the source program.
Sometimes lexical analyzers are divided into a cascade of two phases,
the first called "scanning" and the second "lexical analysis". The scanner is responsible
for doing simple tasks, while the lexical analyzer proper does the more complex
operations.
Algorithm:
2. Get a token from the user and put it into a character variable, say 'c'.
5. If 'c' is a digit, set token_val to the value assigned for the digit and return 'NUMBER'.
Output:
Enter the Statement
--------------------------------
Lexeme | Token code | Attribute
If     | 1          | -
(      | 5          | 1
A      | 8          | Pointer to symbol table
==     | 6          | 1
1      | 9          | Pointer to literal table
)      | 5          | 2
then   | 2          | -
B      | 8          | Pointer to literal table
++     | 6          | 2
;      | 7          | 1
Algorithm to check whether a string is a KEYWORD or not
1. Start
2. Declare the character array storing the keywords,
s[6][10] = {"if", "else", "for", "int", "goto", "return"}, and another character
array st[] to store the string to be compared; initialize integer variables i = 0,
flag = 0, m.
3. Input the string that is to be compared, st[].
4. Repeat steps 5 to 6 till the counter i becomes equal to the number of keywords
stored in the array.
5. Compare the string entered by the user with the strings in the character
array using m = strcmp(st, s[i]); strcmp() returns 0 if the
strings are equal, and in that case set flag = 1.
6. i = i + 1
7. If flag equals one, then it is a keyword.
8. Else it is not a keyword.
9. Stop
Output:
Enter the string : return
It is KEYWORD
It is a CONSTANT
It is NOT a CONSTANT
Conclusion:
With the help of the given procedure and information about the lexical analyzer
phase, we can write a program that simulates an FA to implement the lexical
analyzer phase.
3. Lab Exercise
Aim: Study of LEX/FLEX tool and write LEX program to identify tokens:
integer numbers, decimal numbers, identifiers, keywords, arithmetic
operators, relational operators.
STANDARD PROCEDURE:
THEORY:
Lex is officially known as a "Lexical Analyzer". Its main job is to break
up an input stream into meaningful units, or tokens. For example,
consider breaking a text file up into individual words.
More pragmatically, Lex is a tool for automatically generating a lexer (also
known as a scanner) starting from a lex specification (a *.l file).
Example
<pattern>   <action to take when matched>
[A-Za-z]+   printf("this is a word");
x|y         x or y
{i}         definition of i
x{m,n}      m to n occurrences of x
"s"         exactly what is in the quotes (except for "\" and the following character)
Algorithm:
1. Open file in text editor
2. Enter keywords, rules for identifier and constant, operators and relational
operators. In the following format
a) %{
Definition of constant /header files
%}
b) Regular Expressions
%%
Transition rules
%%
c) Auxiliary Procedure (main( ) function)
3. Save file with .l extension e.g. Mylex.l
4. Call the lex tool on the terminal, e.g. [root@localhost]# lex Mylex.l. This lex tool
will convert the ".l" file into C language code, i.e. lex.yy.c.
5. Compile the file lex.yy.c, e.g. gcc lex.yy.c. After compiling the file
lex.yy.c, this will create the output file a.out.
6. Run the file a.out e.g. ./a.out
7. Give input on the terminal to the a.out file upon processing output will be
displayed.
Example:
%{
#include <stdio.h>
%}
id      [a-z][a-z0-9]*
digit   [0-9]
%%
if|then|else|begin|end  { printf("Keyword is: %s\n", yytext); }
{id}                    { printf("Identifier is: %s\n", yytext); }
{digit}+                { printf("Constant is: %s\n", yytext); }
\n                      ;
.                       { printf("Invalid token\n"); }
%%
int main()
{
    yylex();
    return 0;
}
int yywrap() { return 1; }
Output:
For lexical analyzer
[root@localhost]# lex Mylex.l
[root@localhost]# gcc lex.yy.c
[root@localhost]# ./a.out
123
Constant is 123
a
identifier is a
Conclusion:
With the help of the given procedure and information about the Lex tool, we can
write a lex program to identify different tokens.
4. Lab Exercise
THEORY:
LR Parser:
LR parsing is a bottom-up syntax analysis technique that can be applied to a large
class of context-free grammars. "L" is for left-to-right scanning of the input and "R"
for constructing a rightmost derivation in reverse.
General Framework:
At any point the parser configuration is a stack of states (and grammar symbols)
together with the remaining part of the input string:
STACK: [s0 ... si]          INPUT: xp ... xk
Shift:
Push the next input symbol xp and the new state s(i+1):
[s0 ... si xp s(i+1)]       xp+1 ... xk
Reduce:
Reduce by A → α, |α| = N:
Pop 2*N elements, including states
Push A
Push a state s = goto(si, A)
[s0 ... si A s]             xp ... xk
Example:
Consider the following grammar
1. S' → S
2. S → aABe
3. A → Abc
4. A → b
5. B → d
Table for parsing action and goto function for the given grammar is:-
Where
1. rj means reduce by production numbered j,
2. acc means accept
3. si means shift and stack state i.
4. blank means error.
Moves of LR parser
Forming the Parsing table:
1. Item:
An item of a grammar G is a production of G with a dot at some position of the right
side.
The production A → XYZ has the following four items:
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
2. Item closure:
If A → X.YZ belongs to Is and Y → β is a production,
then add Y → .β to Is.
Constructing the GOTO graph from the LR(0) items derived above:
1. Enlist all the collections of items: C = (I0, I1, .....)
2. Put all the transitions between the items in the GOTO graph.
Rules for putting the label on a transition:
If there is a transition from
A → α.Xβ to A → αX.β
then the transition in the GOTO graph is labeled X.
If there is a transition from
A → α.Bβ to B → .γ
then the transition is labeled ε, and since we take the ε-closure of all items, these
productions lie in the same item set as A → α.Bβ.
So, the GOTO graph of the given grammar is produced as follows (item sets I0-I9):
I0: S' → .S ; S → .aABe
I1 = goto(I0, S): S' → S.
I2 = goto(I0, a): S → a.ABe ; A → .Abc ; A → .b
I3 = goto(I2, b): A → b.
I4 = goto(I2, A): S → aA.Be ; A → A.bc ; B → .d
I5 = goto(I4, d): B → d.
I6 = goto(I4, B): S → aAB.e
I7 = goto(I4, b): A → Ab.c
I8 = goto(I6, e): S → aABe.
I9 = goto(I7, c): A → Abc.
If any conflicting actions are generated by the given rules, then we say the grammar
is not in SLR or LR (0) and we fail to produce a parser.
We then fill the goto entries.
The goto transitions for state i are constructed for all the non-terminals A
using the rule:
If goto(Ii, A) = Ij, then goto[i, A] = j.
Conclusion:
With the help of the given procedure and information about the LR parser, we can
write a program to implement an LR parser.
5. Lab Exercise
STANDARD PROCEDURE:
TOOLS: yacc
Operating System: Linux
THEORY:
YACC stands for Yet Another Compiler Compiler. Its GNU version is called
Bison. YACC translates any grammar of a language into a parser for that
language. Grammars for YACC are described using a variant of Backus Naur
Form (BNF). A BNF grammar can be used to express Context-free languages. By
convention, a YACC file has the suffix .y.
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *s);
%}
%token INTEGER
%%
program:
        expr { printf("%d\n", $1); }
        ;
expr:
        INTEGER { $$ = $1; }
        ;
%%
int main()
{
yyparse();
}
void yyerror(char *s)
{
printf("%s",s);
}
Algorithm:
4. Call the lex tool on the terminal, e.g. [root@localhost]# lex Mylex.l. This lex tool
will convert the ".l" file into C language code, i.e. lex.yy.c.
5. Compile the file lex.yy.c, e.g. gcc lex.yy.c. After compiling the file
lex.yy.c, this will create the output file a.out.
7. Run the file a.out, e.g. ./a.out.
8. Give input on the terminal to the a.out file; upon processing, the output will be
displayed.
<parse.l>
%{
#include<stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+ {yylval.dval=atof(yytext);
return DIGIT;
}
\n|. return yytext[0];
%%
<parser.y>
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *s);
%}
%union
{
    double dval;
}
%token <dval> DIGIT
%type <dval> expr term
%%
line : expr '\n' { printf("%g\n", $1); }
     ;
expr : expr '+' term { $$ = $1 + $3; }
     | term
     ;
term : DIGIT
     ;
%%
int main()
{
    yyparse();
    return 0;
}
void yyerror(char *s)
{
    printf("%s", s);
}
Output:
#lex parser.l
#yacc -d parser.y
#cc lex.yy.c y.tab.c -ll -lm
#./a.out
2+3
5
Conclusion:
With the help of the given information and procedure, we can write a Yacc program
for the construction of a compiler.
6. Lab Exercise
STANDARD PROCEDURE:
Programming Language: C
Operating System: Linux or Windows
Theory:
The best program transformations are those that yield the most benefit for the
least effort. The transformations provided by an optimizing compiler should
have several properties.
Third, a transformation must be worth the effort. It does not make sense for a
compiler writer to expend the intellectual effort to implement a code-improving
transformation that yields little improvement in the generated code.
To partition three-address code into basic blocks, we must identify the leader
statements in the three-address code and then include all the statements, starting
from a leader, and up to, but not including, the next leader. The basic blocks into
which the three-address code is partitioned constitute the nodes or vertices of the
program flow graph. The edges in the flow graph are decided as follows. If B1 and
B2 are the two blocks, then add an edge from B1 to B2 in the program flow graph,
if the block B2 follows B1 in an execution sequence. The block B2 follows B1 in
an execution sequence if and only if:
1. The first statement of block B2 immediately follows the last statement of block
B1 in the three-address code, and the last statement of block B1 is not an
unconditional goto statement.
2. The last statement of block B1 is either a conditional or unconditional goto
statement, and the first statement of block B2 is the target of the last statement of
block B1.
int fact(int x)
{
    int f = 1, i;
    for (i = 2; i <= x; i++)
        f = f * i;
    return f;
}
The three-address-code representation for the program fragment above is:
1. f = 1;
2. i = 2
3. if i <= x goto(8)
4. f = f *i
5. t1 = i + 1
6. i = t1
7. goto(3)
8. goto calling program
The leader statements are:
Statement number 1, because it is the first statement.
Statement number 3, because it is the target of a goto.
Statement number 4, because it immediately follows a conditional
goto statement.
Statement number 8, because it is a target of a conditional goto
statement.
Therefore, the basic blocks into which the above code can be partitioned
are as follows, and the program flow graph is shown in Figure 1.
Block B1: statements 1-2
Block B2: statement 3
Block B3: statements 4-7
Block B4: statement 8
Fig-Program Flow Graph
A loop is a cycle in the flow graph that satisfies two properties:
If the flow graph contains one or more back edges, then one or more loops/cycles
exist in the program. Therefore, we must identify any back edges in the
flow graph.
Figure : The flow graph back edges are identified by computing the dominators.
1. The forward edges form an acyclic graph in which every node can be
reached from the initial node of G.
2. The back edges consist only of edges whose heads dominate their tails.
For example, consider the flow graph shown in Figure 3. This flow graph has no
back edges, because no edge's head dominates the tail of that edge. Hence, it could
have been a reducible graph if the entire graph had been acyclic. But that is not the
case. Therefore, it is not a reducible flow graph.
After identifying the back edges, if any, the natural loop of every back edge must
be identified. The natural loop of a back edge a → b is the set of all those
nodes that can reach a without going through b, including node b itself.
Therefore, to find a natural loop of the back edge n →d, we start with node n and
add all the predecessors of node n to the loop. Then we add the predecessors of the
nodes that were just added to the loop; and we continue this process until we reach
node d. These nodes plus node d constitute the set of all those nodes that can reach
node n without going through node d. This is the natural loop of the edge n → d.
Therefore, the algorithm for detecting the natural loop of a back edge is:
Output: set loop, which is a set of nodes forming the natural loop of the back edge
n → d.
main()
{
loop = { d } / * Initialize by adding node d to the set loop*/
insert(n); /* call a procedure insert with the node n */
}
procedure insert(m)
{
if m is not in the loop then
{
loop = loop ∪ { m }
for every predecessor p of m do
insert(p);
}
}
For example, in the flow graph shown in Figure 1, the back edge is B3 → B2,
and the loop comprises the blocks B2 and B3.
After the natural loops of the back edges are identified, the next task is to identify
the loop-invariant computations. The three-address statement x = y op z, which
exists in basic block B (a part of the loop), is a loop-invariant statement if all
possible definitions of y and z that reach up to this statement are outside the loop,
or if y and z are constants, because then the calculation y op z will be the same
each time the statement is encountered in the loop. Hence, to decide whether the
statement x = y op z is loop invariant or not, we must compute the u−d chaining
information. The u−d chaining information is computed by doing a global data
flow analysis of the flow graph. All of the definitions that are capable of reaching
a point immediately before the start of a basic block are computed, and we call
the set of all such definitions for a block B the IN(B). The set of all the definitions
capable of reaching a point immediately after the last statement of block B will
be called OUT(B). We compute both IN(B) and OUT(B) for every block B, using
GEN(B) and KILL(B), which are defined as:
GEN(B): the set of definitions generated in block B that reach the end of block B.
KILL(B): the set of definitions outside block B that define a variable also defined in B.
Consider the flow graph in Figure 4.The GEN and KILL sets for the basic blocks
are as shown in Table 1.
Table 1: GEN and KILL sets for Figure 4 Flow Graph
Block GEN KILL
B1 {1,2} {6,10,11}
B2 {3,4} {5,8}
B3 {5} {4,8}
B4 {6,7} {2,9,11}
B5 {8,9} {4,5,7}
B6 {10,11} {1,2,6}
IN(B) and OUT(B) are defined by the following set of equations, which are
called "data flow equations":
IN(B) = ∪ OUT(P), taken over all predecessors P of B
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)
The next step, therefore, is to solve these equations. If there are n nodes, there will
be 2n equations in 2n unknowns. The solution to these equations is not generally
unique. This is because we may have a situation like that shown in Figure 5, where
a block B is a predecessor of itself.
If there is a solution to the data flow equations for block B, and if the solution is
IN(B) = IN0 and OUT(B) = OUT0, then IN0 ∪ {d} and OUT0 ∪ {d}, where d is
any definition not in IN0, OUT0, or KILL(B), also satisfy the equations. If
we take OUT0 ∪ {d} as the value of OUT(B), then since B is one of the predecessors
of itself, according to IN(B) = ∪ OUT(P), d gets added to IN(B), because d is not
in KILL(B). Hence, we get IN(B) = IN0 ∪ {d}. And according to OUT(B) =
(IN(B) − KILL(B)) ∪ GEN(B), OUT(B) = OUT0 ∪ {d} gets satisfied. Therefore,
IN0, OUT0 is one of the solutions, whereas IN0 ∪ {d}, OUT0 ∪ {d} is
another solution to the equations: there is no unique solution. What we are interested in is
finding the smallest solution, that is, the smallest IN(B) and OUT(B) for every block B,
which consist of the values that are in all solutions.
The algorithm for computing the smallest IN(B) and OUT(B) is as follows:
for each block B do
{
    IN(B) = Φ
    OUT(B) = GEN(B)
}
flag = true
while flag do
{
    flag = false
    for each block B do
    {
        INnew(B) = Φ
        for each predecessor P of B do
            INnew(B) = INnew(B) ∪ OUT(P)
        if INnew(B) ≠ IN(B) then
        {
            flag = true
            IN(B) = INnew(B)
            OUT(B) = (IN(B) - KILL(B)) ∪ GEN(B)
        }
    }
}
Initially, we take IN(B) to be the empty set for every block B and OUT(B) to be
GEN(B), and we compute INnew(B). If it differs from IN(B), we
compute a new OUT(B) and go for the next iteration. This is continued until
INnew(B) comes out the same as IN(B) for every block B in an iteration.
Conclusion:
With the help of the given information and procedure, we can partition three-address
code into basic blocks, build the program flow graph, and perform the data flow
analysis needed for code optimization.
7. Lab Exercise
Aim: Implementation of any one method of Intermediate Code Generator.
STANDARD PROCEDURE:
Programming language-C
Operating System: Linux or Windows
THEORY:
Program to convert Infix expression to Postfix
Using an intermediate representation makes retargeting possible and allows some
optimizations to be carried out that would otherwise not be possible. Commonly
used intermediate forms are:
1. Postfix notation
2. Syntax tree
3. Three-address code
Postfix Notation
In postfix notation, the operator follows the operand. For example, for the expression
(a − b) * (c + d) + (a − b), the postfix representation is: ab−cd+*ab−+
Syntax Tree
The syntax tree is nothing more than a condensed form of the parse tree. The
operator and keyword nodes of the parse tree (Figure 1) are moved to their parent,
and a chain of single productions is collapsed.
Figure 1: Parse tree for the string id+id*id.
Three-Address Code
Sometimes a statement might contain fewer than three references, but it is still called
a three-address statement. The following are the three-address statements used to
represent various programming language constructs:
Used for representing arithmetic expressions:
Infix Expression :
Any expression in the standard form like "2*3-4/5" is an Infix(Inorder) expression.
Postfix Expression :
The Postfix(Postorder) form of the above expression is "23*45/-".
Infix to Postfix Conversion :
In normal algebra we use the infix notation like a+b*c. The corresponding postfix
notation is abc*+.
The algorithm for the conversion is as follows :
Scan the Infix string from left to right.
Initialise an empty stack.
If the scanned character is an operand, add it to the Postfix string. If the
scanned character is an operator and the stack is empty, push the character to the
stack.
If the scanned character is an operator and the stack is not empty, compare the
precedence of the character with the element on top of the stack (topStack). If
topStack has higher precedence over the scanned character, pop the stack; else
push the scanned character to the stack. Repeat this step as long as the stack is not
empty and topStack has precedence over the character.
Repeat this step till all the characters are scanned.
(After all characters are scanned, we have to add any character that the stack
may have to the Postfix string.) If stack is not empty add topStack to Postfix
string and Pop the stack.
Repeat this step as long as stack is not empty.
Return the Postfix string.
Example :
Let us see how the above algorithm will be implemented using an example.
The next character is 'c' which is placed in the Postfix string. Next character scanned
is '-'. The topmost character in the stack is '*' which has a higher precedence than '-'.
Thus '*' will be popped out from the stack and added to the Postfix string. Even now
the stack is not empty. Now the topmost element of the stack is '+' which has equal
priority to '-'. So pop the '+' from the stack and add it to the Postfix string. The '-' will
be pushed to the stack.
Next character is 'd' which is added to Postfix string. Now all characters have been
scanned so we must pop the remaining elements from the stack and add it to the
Postfix string. At this stage we have only a '-' in the stack. It is popped out and added
to the Postfix string. So, after all characters are scanned, this is how the stack and
Postfix string will be :
End result :
Infix String : a+b*c-d
Postfix String : abc*+d-
Algorithm:
1. Take a stack OPSTK and initialize it to be empty.
2. Read the entire string in infix form, e.g. A+B*C.
3. Read the string character by character into the variable symbol.
i) If symbol is an operand, add it to the postfix string.
ii) If stack OPSTK is not empty and the precedence of the top-of-stack symbol is
greater than that of the recently read symbol, then pop OPSTK:
topsymbol = pop(OPSTK), and add this popped topsymbol to the postfix string.
iii) Repeat step ii till the stack is empty or the precedence of the top-of-stack
symbol is no longer greater than that of the recently read symbol.
iv) Push symbol onto OPSTK.
4. Output any remaining operators:
pop OPSTK till it is empty and add each top symbol to the postfix string.
Output:
------------------------------------------
Enter the Infix Notation : (A+B)*C
Postfix Notation is: AB+C*
Conclusion:
With the help of the given information and procedure, we can implement one form
of intermediate code: the conversion of infix to postfix notation.
8. Lab Exercise
STANDARD PROCEDURE:
Programming Language: C, C++
Operating System: Linux or Windows
THEORY:
Code generation phase-
Final phase in the compiler model
Takes as input:
◦ intermediate representation (IR) code
◦ symbol table information
Produces output:
◦ semantically equivalent target program
Instruction selection
◦ choose appropriate target-machine instructions to implement the IR
statements
Register allocation and assignment
◦ decide what values to keep in which registers
Instruction ordering
◦ decide in what order to schedule the execution of instructions
The design of all code generators involves the above three tasks.
Details of code generation are dependent on the specifics of the IR, the target
language, and the run-time system.
The target-machine instructions have the form:
op source, destination
where op is an op-code and source and destination are data fields.
Op-codes (op) include, for example:
MOV (move content of source to destination)
ADD (add content of source to destination)
SUB (subtract content of source from destination)
There are also other op-codes. Addressing modes:
Mode     | Form | Address
Absolute | M    | M
Register | R    | R
Literal  | #c   | N/A
Example
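As a sketch only (the three-address statements t1 = a + b and t2 = t1 - c and the register choice are our assumptions, using the MOV/ADD/SUB op-codes described above), generated target code might look like:

```
t1 = a + b          MOV a, R0
t2 = t1 - c         ADD b, R0      ; R0 now holds a + b (t1)
                    SUB c, R0      ; R0 now holds t1 - c
                    MOV R0, t2     ; store the result
```

Keeping t1 in register R0 avoids a store and a reload; this is exactly the register allocation and assignment decision described above.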
Conclusion:
With the help of the given information and procedure, we can implement the code
generation phase.