Compiler Design
(3170701)
Enrolment No: 200160107129
Name: Vaishnav Vishal Dipakbhai
Branch: Computer Engineering
Academic Term: 2023-24
Institute Name: Government Engineering College, Modasa
Government Engineering College, Modasa
Computer Engineering
CERTIFICATE
This is to certify that the student of this Institute (GTU Code: 016) has satisfactorily completed the Practical / Tutorial work for the subject Compiler Design (3170701).
Place:
Date:
Preface
Compiler Design is an essential subject for computer science and engineering students. It
deals with the theory and practice of developing a program that can translate source code
written in one programming language into another language. The main objective of this
subject is to teach students how to design and implement a compiler, which is a complex
software system that converts high-level language code into machine code that can be
executed on a computer. The design of compilers is an essential aspect of computer science,
as it helps in bridging the gap between human-readable code and machine-executable code.
This lab manual is designed to help students understand the concepts of compiler design and
develop hands-on skills in building a compiler. The manual provides step-by-step instructions
for implementing a simple compiler using C and other applicable programming languages,
covering all the essential components such as the lexical analyzer, parser, symbol table,
intermediate code generator, and code optimizer.
The manual is divided into several sections, each focusing on a specific aspect of compiler
design. The first section provides an introduction to finite automata and the phases of a compiler,
covering the basic concepts of lexical analysis. The subsequent sections cover parsing, code
generation, and a study of learning basic block scheduling heuristics. Each section includes detailed
instructions for completing the lab exercises and programming assignments, along with
examples and code snippets.
The lab manual also includes a set of challenging programming assignments and quizzes that
will help students test their understanding of the subject matter. Additionally, the manual
provides a list of recommended books and online resources for further study.
This manual is intended for students studying Compiler Design and related courses. It is also
useful for software developers and engineers who want to gain a deeper understanding of
compiler design and implementation. We hope that this manual will be a valuable resource
for students and instructors alike and will contribute to the learning and understanding of
compiler design.
OBJECTIVE:
This laboratory course is intended to make the students experiment with the basic techniques
of compiler construction and tools that can be used to perform syntax-directed translation of
a high-level programming language into an executable code. Students will design and
implement language processors in C by using tools to automate parts of the
implementation process. This will provide deeper insights into the more advanced
semantic aspects of programming languages, code generation, machine-independent
optimizations, dynamic memory allocation, and object orientation.
OUTCOMES:
Upon the completion of Compiler Design practical course, the student will be able to:
1. Understand the working of the lex and yacc tools for debugging of programs.
2. Understand and define the role of the lexical analyzer, the use of regular expressions and transition
diagrams.
3. Understand and use context-free grammars and parse tree construction.
4. Learn and use the new tools and technologies used for designing a compiler.
5. Develop programs for solving parser problems.
6. Learn how to write programs that execute faster.
DTE’s Vision
Institute’s Vision
Institute’s Mission
Department’s Vision
Department’s Mission
Sr. No.   Title of experiment                                              CO1   CO2   CO3   CO4
1.        Implementation of Finite Automata and String Validation.         √
Compiler Design is a vital subject in computer science and engineering that focuses on the design
and implementation of compilers. Here are some industry-relevant skills that students can
develop while studying Compiler Design:
• Proficiency in programming languages: A good understanding of programming languages is
essential for building compilers. Students should be proficient in programming languages
such as C/C++, Java, and Python.
• Knowledge of data structures and algorithms: Compiler Design involves the implementation
of various data structures and algorithms. Students should have a good understanding of data
structures such as stacks, queues, trees, and graphs, and algorithms such as lexical analysis,
parsing, and code generation.
• Familiarity with compiler tools: Students should be familiar with compiler tools such as Lex
and Yacc. These tools can help automate the process of creating a compiler, making it more
efficient and error-free.
• Debugging skills: Debugging is an essential skill for any programmer, and it is particularly
important in Compiler Design. Students should be able to use debugging tools to find and
fix errors in their code.
• Optimization techniques: Code optimization is a critical component of Compiler Design.
Students should be familiar with optimization techniques such as constant folding, dead code
elimination, and loop unrolling, which can significantly improve the performance of the
compiled code.
• Collaboration and communication skills: Compiler Design is a complex subject that requires
collaboration and communication between team members. Students should develop good
communication and collaboration skills to work effectively with their peers and instructors.
By developing these industry-relevant skills, students can become proficient in Compiler Design
and be better equipped to meet the demands of the industry.
Guidelines for Faculty members:
1. Teacher should provide the guideline with demonstration of practical to the students with
all features.
2. Teacher shall explain basic concepts/theory related to the experiment to the students before
the start of each practical.
3. Involve all the students in performance of each experiment.
4. Teacher is expected to share the skills and competencies to be developed in the students and
ensure that the respective skills and competencies are developed in the students after the
completion of the experimentation.
5. Teachers should give opportunity to students for hands-on experience after the
demonstration.
6. Teacher may provide additional knowledge and skills to the students even though not
covered in the manual but are expected from the students by concerned industry.
7. Give practical assignments and assess the performance of students based on the task assigned.
Instructions for Students:
1. Students are expected to carefully listen to all the theory classes delivered by the faculty members
and understand the COs, content of the course, teaching and examination scheme, skill set to be
developed etc.
2. Students will have to perform experiments considering C or other applicable programming
language using Lex tool or Yacc.
3. Students are instructed to submit practical list as per given sample list shown on next page.
Students have to show output of each program in their practical file.
4. Student should develop a habit of submitting the experimentation work as per the schedule and
she/he should be well prepared for the same.
Common Safety Instructions:
1. Handle equipment with care: When working in the lab, students should handle equipment and
peripherals with care. This includes using the mouse and keyboard gently, avoiding pulling or
twisting network cables, and handling any hardware devices carefully.
2. Avoid water and liquids: Students should avoid using wet hands or having any liquids near the
computer equipment. This will help prevent damage to the devices and avoid any safety
hazards.
3. Shut down the PC properly: At the end of the lab session, students should shut down the
computer properly. This includes closing all programs and applications, saving any work, and
following the correct shutdown procedure for the operating system.
4. Obtain permission for laptops: If a student wishes to use their personal laptop in the lab, they
should first obtain permission from the Lab Faculty or Lab Assistant. They should follow
all lab rules and guidelines and ensure that their laptop is properly configured for the lab
environment.
Index
(Progressive Assessment Sheet)
Experiment No - 1
Aim: To study and implement Finite Automata and validate strings using it.
Date:
Objectives:
1. To understand the concept of Finite Automata.
2. To implement Finite Automata using a programming language.
3. To validate strings using Finite Automata.
Theory:
Finite Automata is a mathematical model that consists of a finite set of states and a set of
transitions between these states. It is used to recognize patterns or validate strings. A Finite
Automaton has five components:
1. A set of states
2. An input alphabet
3. A transition function
4. A start state
5. A set of final (or accepting) states
In the implementation of Finite Automata and string validation, we need to create a Finite
Automata that recognizes a specific pattern or set of patterns. The Finite Automata consists
of states, transitions between the states, and a set of accepting states. The input string is then
validated by passing it through the Finite Automata, starting at the initial state, and following
the transitions until the string is either accepted or rejected.
String validation using Finite Automata is useful in a variety of applications, including pattern
matching, text processing, and lexical analysis in programming languages. It is an efficient
method for validating strings and can handle large inputs with minimal memory and time
complexity.
Example: Consider a finite automaton that accepts strings containing an even number of a's, where Σ = {a, b, c}.
Solution:
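One possible solution (a minimal sketch): two states are enough, q0 for "an even number of a's seen so far" (start and accepting state) and q1 for "an odd number of a's seen so far"; reading 'a' toggles between the two states, while 'b' and 'c' leave the state unchanged.
State     a     b     c
→ *q0     q1    q0    q0
   q1     q0    q1    q1
(→ marks the start state, * marks the accepting state.)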
Program:
Program-1: Create a program in Python that implements a Finite Automata to validate
strings that start with 'a' and end with 'b'.
Code:
def transition(state, input_symbol):
    # initial state: A, final state: C, dead state: N
    transition_table = {
        'A,a': 'B',
        'A,b': 'N',
        'B,a': 'B',
        'B,b': 'C',
        'C,a': 'B',
        'C,b': 'C',
        'N,a': 'N',
        'N,b': 'N'
    }
    return transition_table[state + ',' + input_symbol]

def validate(string):
    # start in the initial state and follow one transition per input symbol
    state = 'A'
    for symbol in string:
        state = transition(state, symbol)
    # the string is valid only if we finish in the final state C
    if state == 'C':
        return True
    return False

def display(inp):
    if validate(inp):
        print('{} is a valid string'.format(inp))
    else:
        print('{} is not a valid string'.format(inp))

input1 = 'abbabababa'
input2 = 'abbabababb'
display(input1)
display(input2)
Code:
# Define the Finite Automata for pattern matching:
# state q = number of pattern characters matched so far
def compute_transition_function(pattern, alphabet):
    m = len(pattern)
    transitions = {}
    for q in range(m + 1):
        for a in alphabet:
            # longest prefix of pattern that is also a suffix of pattern[:q] + a
            k = min(m, q + 1)
            while k > 0 and not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            transitions[(q, a)] = k
    return transitions
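As a small worked example (using the corrected function above), for the pattern "ab" over the alphabet {a, b} the computed transition table is δ(0,a)=1, δ(0,b)=0, δ(1,a)=1, δ(1,b)=2, δ(2,a)=1, δ(2,b)=0; state 2 is the accepting state, reached exactly when the pattern has just been matched.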
Sample Program-1:
In this implementation of Finite Automata and string validation, I have learned how to create
a Finite Automata that recognizes a specific pattern or set of patterns, and how to validate
strings using the Finite Automata. I have also learned how to implement a Finite Automata
using programming language, and how to test it with different inputs. By using Finite
Automata, I can efficiently validate strings and recognize patterns, making it a powerful
tool in computer science and related fields.
Sample Program-2:
Quiz:
1. What is a Finite Automata?
A Finite Automaton (FA) consists of a set of states, an input alphabet, and transition rules. The two
main types are the DFA and the NFA. It is crucial in computer science for language theory, regular
expressions, and compiler design.
5. What are the advantages of using a finite automaton to search for patterns in text
strings?
Finite automata offer efficient text pattern matching with constant time complexity and memory
efficiency, making them ideal for recurring pattern search tasks.
Suggested Reference:
Rubrics: Knowledge (2) | Problem Recognition (2) | Implementation (2) | Testing & Debugging (2) | Creativity in logic/code (2) | Total
Scale: Good (2) / Avg. (1) for each rubric
Marks:
Experiment No - 2
Aim: Implement the following programs using Lex:
a. Count the frequency of each word (word histogram) in a given file
b. Convert the given text into its Caesar Cipher form
c. Extract single and multiline comments from a C program
Date:
Competency and Practical Skills: Understanding of the Lex tool and its usage in
compiler design, understanding of regular expressions and data structures, improving
programming skills to develop programs using the Lex tool
Objectives:
1. To introduce students to Lex tool and its usage in compiler design
2. To provide practical knowledge of regular expressions and their use in pattern matching
3. To enhance students' understanding of data structures such as arrays, lists, and trees
4. To develop students' problem-solving skills in developing and implementing programs using
Lex tool
5. To develop students' debugging skills to identify and resolve program errors and issues
Theory:
❖ COMPILER:
• A compiler is a translator that converts a high-level language into machine language.
• The high-level language is written by a developer, and the machine language can be understood by
the processor. A compiler is also used to show errors to the programmer.
• The main purpose of a compiler is to change the code written in one language into another without
changing the meaning of the program.
• When you execute a program written in an HLL programming language, it is processed in
two parts.
• In the first part, the source program is compiled and translated into the object program (low-level
language).
• In the second part, the object program is translated into the target program through the
assembler.
❖ LEX:
• Lex is a program that generates lexical analyzers. It is used with a YACC parser generator.
• The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
• It reads the Lex specification and produces, as output, C source code that implements the
lexical analyzer.
• During the first phase the compiler reads the input and converts strings in the source to
tokens.
• With regular expressions we can specify patterns to Lex so it can generate code that will
allow it to scan and match strings in the input. Each pattern specified in the input to Lex has an
associated action.
associated action.
• Typically an action returns a token that represents the matched string for subsequent use by
the parser. Initially we will simply print the matched string rather than return a token value.
➢ Function of LEX:
• Firstly, the lexical analyzer specification is written as a program lex.l in the Lex language. Then the Lex
compiler runs on the lex.l program and produces a C program lex.yy.c.
• Finally, the C compiler compiles the lex.yy.c program and produces an object program a.out.
• a.out is a lexical analyzer that transforms an input stream into a sequence of tokens.
STEPS:
• Step 1 : An input file that describes the lexical analyzer to be generated, named lex.l, is written in
the Lex language. The Lex compiler transforms lex.l into a C program, in a file that is always named
lex.yy.c.
• Step 2 : The C compiler compiles the lex.yy.c file into an executable file called a.out.
• Step 3 : The output file a.out takes a stream of input characters and produces a stream of
tokens.
• Program Structure:
Rules Section: The rules section contains a series of rules in the form: pattern action. The
pattern must be un-indented and the action must begin on the same line in {} brackets. The rules
section is enclosed in “%% %%”.
Syntax:
%%
pattern action
%%
User Code Section: This section contains C statements and additional functions. We can also
compile these functions separately and load with the lexical analyzer.
Program:
Program-1: Count the frequency (histogram) of each word in a given input file.
Code:
%{
#include<stdio.h>
#include<string.h>
#define MAX 1000
/* Declarations (kept inside %{ %} so they are copied verbatim) */
int count = 0;            /* number of distinct words seen       */
char words[MAX][MAX];     /* stores the distinct words           */
%}
/* Rule Section */
%%
[a-zA-Z]+ {
    int i, flag = 0;
    for(i = 0; i < count; i++) {
        if(strcmp(words[i], yytext) == 0) {
            flag = 1;
            break;
        }
    }
    if(flag == 0) {
        strcpy(words[count++], yytext);
    }
}
. ;
%%
/* Code Section */
int main(int argc, char **argv)
{
    if(argc != 2) {
        printf("Usage: ./a.out <filename>\n");
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if(fp == NULL) {
        printf("Cannot open file!\n");
        return 1;
    }
    yyin = fp;
    yylex();
    int i;
    printf("\nWord\t\tFrequency\n");
    for(i = 0; i < count; i++) {
        int freq = 0;
        rewind(fp);
        /* re-scan the file and count occurrences of words[i] */
        while(fscanf(fp, "%s", words[MAX-1]) == 1) {
            if(strcmp(words[MAX-1], words[i]) == 0) {
                freq++;
            }
        }
        printf("%-15s %d\n", words[i], freq);
    }
    fclose(fp);
    return 0;
}
int yywrap()
{
    return 1;
}
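The program can be built and run with the usual Lex steps: save it with a .l extension (a name such as histogram.l is only illustrative), then run lex histogram.l, gcc lex.yy.c and ./a.out <input file>.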
Program-2: Convert the given text into its Caesar Cipher form.
Code:
/* simple lex program that takes in a string of letters */
/* and outputs the Caesar Cipher version of it - note   */
/* that initially the program shifts letters by 3, but  */
/* it can easily be modified to implement different shifts */
/* rule section */
%%
[a-z] {
    char ch = yytext[0];
    ch += 3;
    if (ch > 'z') ch -= ('z' + 1 - 'a');
    printf("%c", ch);
}
[A-Z] {
    char ch = yytext[0];
    ch += 3;
    if (ch > 'Z') ch -= ('Z' + 1 - 'A');
    printf("%c", ch);
}
%%
int main(void) {
    return yylex();
}
int yywrap(void) {
    return 1;
}
Program-3: Extract single and multiline comments from a C program.
Code:
%{
#include <stdio.h>
#include <string.h>
%}
%option noyywrap
%%
"//"(.|\n)*                  { printf("Single Line Comment: %s\n", yytext); }
"/*"([^*]|\*+[^*/])*\*+"/"   { printf("Multi-line Comment: %s\n", yytext); }
.|\n ;
%%
int main() {
    char input[1000];
    printf("Enter C code (type 'exit' on a new line to quit):\n");
    while(1) {
        fgets(input, sizeof(input), stdin);
        if (strcmp(input, "exit\n") == 0) {
            break;
        }
        yy_scan_string(input);
        yylex();
    }
    return 0;
}
Program-1:
In this program, I have implemented a histogram of words using lex tool. The program counts
the frequency of each word in a given input file. It uses an array words to store all the distinct
words and counts the frequency of each word by iterating through the words array and
comparing it with the input file. The program also checks for errors such as invalid input file.
This program can be used to analyze the most frequent words in a text file or a document.
This program can be extended to handle large files by implementing a dynamic array to store
the distinct words instead of a fixed size array.
Program-2:
Program-3:
Quiz:
1. What is Lex tool used for?
Lex is a tool used for generating lexical analyzers or tokenizers, typically used in compiler
construction to break down source code into tokens.
4. What is the purpose of the "Extract single and multiline comments from C Program"
program in Lex?
It extracts comments (both single-line and multiline) from C code for documentation,
analysis, or other purposes.
Suggested Reference:
Experiment No - 3
Aim: Implement the following programs using Lex:
a. Convert Roman to Decimal
b. Check whether a given statement is compound or simple
c. Extract HTML tags from a .html file
Date:
Competency and Practical Skills: Understanding of the Lex tool and its usage in
compiler design, understanding of regular expressions and data structures, improving
programming skills to develop programs using the Lex tool
Objectives:
1. To introduce students to Lex tool and its usage in compiler design
2. To provide practical knowledge of regular expressions and their use in pattern matching
3. To enhance students' understanding of data structures such as arrays, lists, and trees
4. To develop students' problem-solving skills in developing and implementing programs using
Lex tool
5. To develop students' debugging skills to identify and resolve program errors and issues
Theory:
❖ LEX:
• Lex is a program that generates lexical analyzers. It is used with a YACC parser generator.
• The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
• It reads the Lex specification and produces, as output, C source code that implements the
lexical analyzer.
• During the first phase the compiler reads the input and converts strings in the source to
tokens.
• With regular expressions we can specify patterns to Lex so it can generate code that will
allow it to scan and match strings in the input. Each pattern specified in the input to Lex has an
associated action.
associated action.
• Typically an action returns a token that represents the matched string for subsequent use by
the parser. Initially we will simply print the matched string rather than return a token value.
➢ Function of LEX:
• Firstly, the lexical analyzer specification is written as a program lex.l in the Lex language. Then the Lex
compiler runs on the lex.l program and produces a C program lex.yy.c.
• Finally, the C compiler compiles the lex.yy.c program and produces an object program a.out.
• a.out is a lexical analyzer that transforms an input stream into a sequence of tokens.
• The general format of a Lex source file is as follows:
%{ definitions %}
%%
{ rules }
%%
{ user subroutines }
• Definitions include declarations of constants, variables and regular definitions.
• Rules define statements of the form p1 {action1} p2 {action2} ... pn {actionn}.
• Where pi describes a regular expression and actioni describes the action the lexical
analyzer should take when pattern pi matches a lexeme.
• User subroutines are auxiliary procedures needed by the actions. The subroutines can be
loaded with the lexical analyzer and compiled separately.
STEPS:
• Step 1 : An input file that describes the lexical analyzer to be generated, named lex.l, is written in
the Lex language. The Lex compiler transforms lex.l into a C program, in a file that is always named
lex.yy.c.
• Step 2 : The C compiler compiles the lex.yy.c file into an executable file called a.out.
• Step 3 : The output file a.out takes a stream of input characters and produces a stream of
tokens.
• Program Structure:
Rules Section: The rules section contains a series of rules in the form: pattern action. The
pattern must be un-indented and the action must begin on the same line in {} brackets. The rules
section is enclosed in “%% %%”.
Syntax:
%%
pattern action
%%
User Code Section: This section contains C statements and additional functions. We can also
compile these functions separately and load with the lexical analyzer.
How to run the program:
To run the program, it should first be saved with the extension .l or .lex. Run the commands
below on the terminal in order to run the program file.
• Step 1: lex filename.l or lex filename.lex, depending on the extension with which the file is saved.
• Step 2: gcc lex.yy.c
• Step 3: ./a.out
• Step 4: Provide the input to program in case it is required
Program:
Program-1: Convert Roman to Decimal
Code:
%{
#include <stdio.h>
#include <stdlib.h>
int decimal = 0;   /* accumulated decimal value */
%}
/* Rules section: Lex prefers the longest match, so the two-letter
   subtractive forms (IV, IX, XL, XC, CD, CM) are matched before the
   single-letter symbols. */
%%
IV  { decimal += 4; }     /* If the symbol is 'IV', add 4 */
IX  { decimal += 9; }     /* If the symbol is 'IX', add 9 */
XL  { decimal += 40; }    /* If the symbol is 'XL', add 40 */
XC  { decimal += 90; }    /* If the symbol is 'XC', add 90 */
CD  { decimal += 400; }   /* If the symbol is 'CD', add 400 */
CM  { decimal += 900; }   /* If the symbol is 'CM', add 900 */
I   { decimal += 1; }     /* If the symbol is 'I', add 1 */
V   { decimal += 5; }     /* If the symbol is 'V', add 5 */
X   { decimal += 10; }    /* If the symbol is 'X', add 10 */
L   { decimal += 50; }    /* If the symbol is 'L', add 50 */
C   { decimal += 100; }   /* If the symbol is 'C', add 100 */
D   { decimal += 500; }   /* If the symbol is 'D', add 500 */
M   { decimal += 1000; }  /* If the symbol is 'M', add 1000 */
\n  { return 0; }         /* stop reading at the end of the line */
.   { printf("Invalid Roman numeral\n"); exit(1); }  /* any other symbol: exit with an error */
%%
/* Code section */
int main()
{
    yylex();
    printf("Decimal value: %d\n", decimal);
    return 0;
}
int yywrap()
{
    return 1;
}
Program-2: Check whether a given statement is compound or simple
Code:
/* Program to recognize whether a given sentence is simple or compound. */
%{
#include<stdio.h>
int flag = 0;
%}
%%
and |
or |
but |
because |
if |
then |
nevertheless { flag = 1; }
. ;
\n { return 0; }
%%
int main()
{
    printf("Enter the sentence:\n");
    yylex();
    if(flag == 0)
        printf("Simple sentence\n");
    else
        printf("Compound sentence\n");
    return 0;
}
int yywrap()
{
    return 1;
}
Program-3: Extract HTML tags from a .html file.
Code:
%{
#include<stdio.h>
%}
%%
\<[^>]*\> fprintf(yyout,"%s\n",yytext);
.|\n;
%%
int yywrap()
{
return 1;
}
int main()
{
yyin=fopen("input7.html","r");
yyout=fopen("output7.txt","w");
yylex();
return 0;
}
Program-1:
The Lex program scans the input Roman numeral and converts it into decimal by matching
each symbol with the corresponding decimal value. If an invalid symbol is encountered, the
program exits with an error message. Lex provides an efficient and easy way to define the
rules for the conversion.
Program-2:
Program-3:
Quiz:
1. What is Lex tool?
Lex is a tool for tokenizing input text in compiler construction and text processing.
2. What is the purpose of Lex tool?
Lex is used to create tokenizers for breaking down text into meaningful units known as
tokens, which is vital in tasks like compiler construction and text processing.
4. How does the program check whether a given statement is compound or simple?
The program distinguishes between compound and simple statements by examining the
syntax, particularly the presence of block delimiters or specific language constructs.
5. What is the purpose of the program to extract HTML tags from an HTML file?
The program extracts HTML tags from an HTML file, enabling tasks like web scraping,
content analysis, and data extraction.
Suggested Reference:
1. Aho, A.V., Sethi, R., & Ullman, J.D. (1986). Compilers: Principles, Techniques, and Tools.
Addison-Wesley.
2. Levine, J.R., Mason, T., & Brown, D. (2009). lex & yacc. O'Reilly Media, Inc.
3. Lex - A Lexical Analyzer Generator. Retrieved from https://www.gnu.org/software/flex/manual/
4. Lexical Analysis with Flex. Retrieved from https://www.geeksforgeeks.org/flex-fast-lexical-analyzer-generator/
5. The Flex Manual. Retrieved from https://westes.github.io/flex/manual/
Experiment No - 4
Aim: Introduction to YACC and generate Calculator Program
Date:
Objectives:
By the end of this experiment, the students should be able to:
➢ Understand the concept of YACC and its significance in compiler construction
➢ Write grammar rules for a given language
➢ Implement a calculator program using YACC
Theory:
YACC (Yet Another Compiler Compiler) is a tool that is used for generating parsers. It is
used in combination with Lex to generate compilers and interpreters. YACC takes a set of
rules and generates a parser that can recognize and process the input according to those rules.
The grammar rules that are defined using YACC are written in BNF (Backus-Naur Form)
notation. These rules describe the syntax of a programming language.
INPUT FILE:
→ The YACC input file is divided into three parts.
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
Definition Part:
→ The definition part includes information about the tokens used in the syntax definition.
Rule Part:
→ The rules part contains the grammar definition in a modified BNF form. Actions are C code in
{ } and can be embedded inside the rules (translation schemes).
Auxiliary Routines Part:
→ The auxiliary routines part is only C code.
→ It includes function definitions for every function needed in the rules part.
→ It can also contain the main() function definition if the parser is going to be run as a
program.
→ The main() function must call the function yyparse().
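A minimal sketch of such a calculator specification is shown below for reference. The file layout follows the three parts described above; the file name calc.y, the token name NUMBER, the grammar and the hand-written yylex() are illustrative assumptions rather than the exact program expected in the practical (in practice the scanner is often generated with Lex instead).

%{
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s);
%}

%token NUMBER
%left '+' '-'
%left '*' '/'

%%
input   : /* empty */
        | input line
        ;
line    : '\n'
        | expr '\n'          { printf("Result: %d\n", $1); }
        ;
expr    : expr '+' expr      { $$ = $1 + $3; }
        | expr '-' expr      { $$ = $1 - $3; }
        | expr '*' expr      { $$ = $1 * $3; }
        | expr '/' expr      { $$ = $1 / $3; }
        | '(' expr ')'       { $$ = $2; }
        | NUMBER             { $$ = $1; }
        ;
%%

/* A minimal hand-written lexer: returns NUMBER for digit sequences and
   returns operators, parentheses and newline as single characters. */
int yylex(void)
{
    int c = getchar();
    while (c == ' ' || c == '\t')
        c = getchar();
    if (isdigit(c)) {
        int val = 0;
        while (isdigit(c)) {
            val = val * 10 + (c - '0');
            c = getchar();
        }
        ungetc(c, stdin);
        yylval = val;        /* YYSTYPE defaults to int */
        return NUMBER;
    }
    if (c == EOF)
        return 0;            /* end of input */
    return c;
}

void yyerror(const char *s)
{
    fprintf(stderr, "Error: %s\n", s);
}

int main(void)
{
    return yyparse();        /* main() must call yyparse() */
}

Such a file can be built with yacc calc.y (or bison calc.y) followed by gcc y.tab.c, where calc.y is the assumed file name.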
Quiz:
1. What is YACC?
YACC, or "Yet Another Compiler Compiler," generates parsers for processing
programming language syntax based on context-free grammars.
2. What is the purpose of YACC?
YACC generates parsers for processing programming language syntax, aiding in compiler
construction, syntax analysis, and abstract syntax tree generation.
3. What is the output of YACC?
The primary output of YACC is a parser implemented in C, which parses input according
to defined grammar rules.
4. What is a syntax analyzer?
A syntax analyzer (parser) checks code structure against language grammar, creating a
structured representation for further processing and reporting syntax errors.
Suggested Reference:
1. "Lex & Yacc" by John R. Levine, Tony Mason, and Doug Brown
2. "The Unix Programming Environment" by Brian W. Kernighan and Rob Pike
References used by the students:
"Lex & Yacc" by John R. Levine, Tony Mason, and Doug Brown
Experiment No - 5
Aim: Implement a program for constructing
a. LL(1) Parser
b. Predictive Parser
Date:
Objectives:
By the end of this experiment, the students should be able to:
➢ Understand the concept of parsers and their significance in compiler construction
➢ Write the FIRST and FOLLOW sets for a given grammar
➢ Implement LL(1) and predictive parsing for a given grammar using a top-down parser
Software/Equipment: C compiler
Theory:
❖ LL(1) Parsing: Here the first L represents that the scanning of the input will be done from
left to right, and the second L shows that in this parsing technique we are going to use the
leftmost derivation tree. Finally, the 1 represents the number of look-ahead symbols, which
means how many symbols you are going to see when you want to make a decision.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree.
Both the stack and the input contain an end symbol $ to denote that the stack is empty and
the input is consumed. The parser refers to the parsing table to take any decision on the input
and stack element combination.
In recursive descent parsing, the parser may have more than one production to choose from
for a single instance of input, whereas in a predictive parser each step has at most one
production to choose. There might be instances where no production matches the
input string, causing the parsing procedure to fail.
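As a quick worked example (a standard textbook grammar, not the one read from the input file by the program below), consider:
E → TE',  E' → +TE' | ε,  T → FT',  T' → *FT' | ε,  F → (E) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }       FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { *, +, ), $ }
Each LL(1) table cell M[A, a] then holds the production A → α with a ∈ FIRST(α); when ε ∈ FIRST(α), the production is also placed under every a ∈ FOLLOW(A).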
Program-1:
#include<stdio.h>
#include<string.h>
#define TSIZE 128
// table[i][j] stores
// the index of the production that must be applied on
// the ith variable if the input is
// the jth terminal
int table[100][TSIZE];
// stores the list of terminals:
// the ASCII value is used to index terminals
// terminal[i] = 1 means the character with
// ASCII value i is a terminal
char terminal[TSIZE];
// stores the list of nonterminals:
// only upper case letters from 'A' to 'Z'
// can be nonterminals
// nonterminal[i] = 1 means the ith alphabet is present as a
// nonterminal in the grammar
char nonterminal[26];
// structure to hold each production
// str[] stores the production
// len is the length of the production
struct product {
    char str[100];
    int len;
} pro[20];
// no of productions in form A->ß
int no_pro;
char first[26][TSIZE];
char follow[26][TSIZE];
// stores FIRST of each production in form A->ß
char first_rhs[100][TSIZE];
// check if the symbol is a nonterminal
int isNT(char c) {
    return c >= 'A' && c <= 'Z';
}
// reading data from the file
void readFromFile() {
    FILE* fptr;
    fptr = fopen("text.txt", "r");
    char buffer[255];
    int i;
    int j;
    while (fgets(buffer, sizeof(buffer), fptr)) {
        printf("%s", buffer);
        j = 0;
        nonterminal[buffer[0] - 'A'] = 1;
        for (i = 0; i < strlen(buffer) - 1; ++i) {
            if (buffer[i] == '|') {
                ++no_pro;
                pro[no_pro - 1].str[j] = '\0';
                pro[no_pro - 1].len = j;
                pro[no_pro].str[0] = pro[no_pro - 1].str[0];
                pro[no_pro].str[1] = pro[no_pro - 1].str[1];
                pro[no_pro].str[2] = pro[no_pro - 1].str[2];
                j = 3;
            }
            else {
                pro[no_pro].str[j] = buffer[i];
                ++j;
                if (!isNT(buffer[i]) && buffer[i] != '-' && buffer[i] != '>') {
                    terminal[buffer[i]] = 1;
                }
            }
        }
        pro[no_pro].len = j;
        ++no_pro;
    }
}
void add_FIRST_A_to_FOLLOW_B(char A, char B)
{
    int i;
    for (i = 0; i < TSIZE; ++i)
    {
        if (i != '^')
            follow[B - 'A'][i] = follow[B - 'A'][i] || first[A - 'A'][i];
    }
}
void add_FOLLOW_A_to_FOLLOW_B(char A, char B)
{
int i;
for (i = 0; i < TSIZE; ++i)
{
if (i != '^')
follow[B - 'A'][i] = follow[B - 'A'][i] || follow[A - 'A'][i];
}
}
void FOLLOW()
{
    int t = 0;
    int i, j, k, x;
    while (t++ < no_pro)
    {
        for (k = 0; k < 26; ++k) {
            if (!nonterminal[k]) continue;
            char nt = k + 'A';
            for (i = 0; i < no_pro; ++i) {
                for (j = 3; j < pro[i].len; ++j) {
                    if (nt == pro[i].str[j]) {
                        for (x = j + 1; x < pro[i].len; ++x) {
                            char sc = pro[i].str[x];
                            if (isNT(sc)) {
                                add_FIRST_A_to_FOLLOW_B(sc, nt);
                                if (first[sc - 'A']['^'])
                                    continue;
                            }
                            else {
                                follow[nt - 'A'][sc] = 1;
                            }
                            break;
                        }
                        /* if every symbol after nt derives ^, copy FOLLOW of the LHS */
                        if (x == pro[i].len)
                            add_FOLLOW_A_to_FOLLOW_B(pro[i].str[0], nt);
                    }
                }
            }
        }
    }
}
void FIRST() {
    int i, j;
    int t = 0;
    while (t < no_pro) {
                continue;
            }
            else {
                first_rhs[i][sc] = 1;
            }
            if (j == pro[i].len)
                first_rhs[i]['^'] = 1;
        }
        ++t;
    }
}
int main() {
    readFromFile();
    follow[pro[0].str[0] - 'A']['$'] = 1;
    FIRST();
    FOLLOW();
    FIRST_RHS();
int i, j, k;
    // display FIRST of each variable
    printf("\n");
    for (i = 0; i < no_pro; ++i) {
        if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
            char c = pro[i].str[0];
            printf("FIRST OF %c: ", c);
            for (j = 0; j < TSIZE; ++j) {
                if (first[c - 'A'][j]) {
                    printf("%c ", j);
                }
            }
            printf("\n");
        }
    }
    // display FOLLOW of each variable
    printf("\n");
    for (i = 0; i < no_pro; ++i) {
        if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
            char c = pro[i].str[0];
            printf("FOLLOW OF %c: ", c);
            for (j = 0; j < TSIZE; ++j) {
                if (follow[c - 'A'][j]) {
                    printf("%c ", j);
                }
            }
            printf("\n");
        }
    }
    // display FIRST of each production's right-hand side ß
    // in form A->ß
    printf("\n");
    for (i = 0; i < no_pro; ++i) {
        printf("FIRST OF %s: ", pro[i].str);
        for (j = 0; j < TSIZE; ++j) {
            if (first_rhs[i][j]) {
                printf("%c ", j);
            }
        }
        printf("\n");
    }
    // the parse table contains '$'
    // set terminal['$'] = 1
    // to include '$' in the parse table
    terminal['$'] = 1;
    // the parse table does not read '^' as input
    // so we set terminal['^'] = 0
    // to remove '^' from terminals
    terminal['^'] = 0;
    // printing parse table
    printf("\n");
    printf("\n\t**************** LL(1) PARSING TABLE *******************\n");
    printf("\t \n");
    printf("%-10s", "");
    for (i = 0; i < TSIZE; ++i) {
        if (terminal[i]) printf("%-10c", i);
    }
    printf("\n");
    int p = 0;
    for (i = 0; i < no_pro; ++i) {
        if (i != 0 && (pro[i].str[0] != pro[i - 1].str[0]))
            p = p + 1;
        for (j = 0; j < TSIZE; ++j) {
            if (first_rhs[i][j] && j != '^') {
                table[p][j] = i + 1;
            }
            else if (first_rhs[i]['^']) {
                for (k = 0; k < TSIZE; ++k) {
                    if (follow[pro[i].str[0] - 'A'][k]) {
                        table[p][k] = i + 1;
                    }
                }
            }
        }
    }
    k = 0;
    for (i = 0; i < no_pro; ++i) {
        if (i == 0 || (pro[i - 1].str[0] != pro[i].str[0])) {
            printf("%-10c", pro[i].str[0]);
            for (j = 0; j < TSIZE; ++j) {
                if (table[k][j]) {
                    printf("%-10s", pro[table[k][j] - 1].str);
                }
                else if (terminal[j]) {
                    printf("%-10s", "");
                }
            }
            ++k;
            printf("\n");
        }
    }
}
Observations and Conclusion:
Program -1:
In the above example, the grammar is given as input and the FIRST and FOLLOW sets of the
nonterminals are identified. Further, the LL(1) parsing table is constructed.
Program-2:
#include <stdio.h>
#include <string.h>
char prol[7][10] = { "S", "A", "A", "B", "B", "C", "C" };
char pror[7][10] = { "A", "Bb", "Cd", "aB", "@", "Cc", "@" };
char prod[7][10] = { "S->A", "A->Bb", "A->Cd", "B->aB", "B->@", "C->Cc", "C->@" };
char first[7][10] = { "abcd", "ab", "cd", "a@", "@", "c@", "@" };
char follow[7][10] = { "$", "$", "$", "a$", "b$", "c$", "d$" };
char table[5][6][10];
int numr(char c)
{
    switch (c)
    {
        case 'S': return 0;
        case 'A': return 1;
        case 'B': return 2;
        case 'C': return 3;
        case 'a': return 0;
        case 'b': return 1;
        case 'c': return 2;
        case 'd': return 3;
        case '$': return 4;
    }
    return (2);
}
int main()
{
int i, j, k;
strcpy(table[0][1], "a");
strcpy(table[0][2], "b");
strcpy(table[0][3], "c");
strcpy(table[0][4], "d");
strcpy(table[0][5], "$");
strcpy(table[1][0], "S");
strcpy(table[2][0], "A");
strcpy(table[3][0], "B");
strcpy(table[4][0], "C");
printf("\n \n");
for (i = 0; i < 5; i++)
    for (j = 0; j < 6; j++)
    {
        printf("%-10s", table[i][j]);
        if (j == 5)
            printf("\n \n");
    }
}
Quiz:
1. What is a parser and state the Role of it?
A parser analyzes the syntax of programming language source code, checking for
correctness, reporting errors, and creating structured representations (e.g., parse trees or
ASTs). It also handles operator precedence and may perform semantic actions. Its role is
crucial in the compilation process, serving as a bridge between source code and further
compilation phases.
2. LL Parser: ANTLR (e.g., ANTLR4).
3. LR Parser: Bison, Yacc.
4. LALR Parser: LALR parser generator in GNU Bison.
5. GLR Parser: Elkhound.
6. Earley Parser: Marpa.
7. Chart Parser: CYK, Earley.
8. Top-Down vs. Bottom-Up Parsers: Recursive descent parsers (top-down) and LR parsers
(bottom-up).
3. What are the Tools available for implementation?
Tools for parser and compiler implementation include Yacc/Bison, ANTLR, Lex/Flex,
JavaCC, PLY, Jison, PEG.js, Coco/R, Elkhound, Marpa, Ragel, Parsec, and ANTLR4:JS.
These tools cover a range of programming languages and parsing needs.
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. Geeks for geeks: https://www.geeksforgeeks.org/construction-of-ll1-parsing-table/
3. http://www.cs.ecu.edu/karl/5220/spr16/Notes/Top-down/LL1.html
Rubrics: Knowledge of parsing techniques (2) | Problem implementation (2) | Code Quality (2) | Completeness and accuracy (2) | Presentation (2) | Total
Scale: Good (2) / Avg. (1) for each rubric
Marks:
Experiment No - 06
Aim: Implement a program for constructing
a. Recursive Decent Parser (RDP)
b. LALR Parser
Date:
Objectives:
By the end of this experiment, the students should be able to:
➢ Understand RDP, the broad classification of bottom-up parsers and their significance in
compiler construction
➢ Verify whether a string is accepted by the RDP, and parse a given grammar using LR
parsers
➢ Implement a RDP and LALR parser
➢ Software/Equipment: C compiler
➢ Theory:
Recursive Descent Parser uses the technique of top-down parsing without backtracking. It
can be defined as a parser that uses various recursive procedures to process the input string
with no backtracking. It can be implemented simply using a recursive language. The first
symbol of the string on the R.H.S of a production will uniquely determine the correct alternative
to choose.
The major approach of recursive-descent parsing is to relate each non-terminal with a
procedure. The objective of each procedure is to read a sequence of input characters that can
be produced by the corresponding non-terminal, and return a pointer to the root of the parse
tree for the non- terminal. The structure of the procedure is prescribed by the productions for
the equivalent non- terminal.
The recursive procedures can be simple to write and adequately effective if written in a
language that implements procedure calls efficiently. There is a procedure for each non-
terminal in the grammar. It can consider a global variable lookahead, holding the current input
token, and a procedure match(ExpectedToken) which recognizes the next token in
the parsing process and advances the input stream pointer, such that lookahead points to the
next token to be parsed. match() is effectively a call to the lexical analyzer to get the next
token.
For example, if the input stream is a + b$:
lookahead == a → match()
lookahead == + → match()
lookahead == b → match()
...
In this manner, parsing can be done.
LALR refers to lookahead LR. To construct the LALR(1) parsing table, we use the
canonical collection of LR(1) items.
In LALR(1) parsing, the LR(1) items which have the same productions but different look-
aheads are combined to form a single set of items.
LALR(1) parsing is the same as CLR(1) parsing; the only difference is in the parsing table.
Example
S → AA
A → aA | b
Add the augment production, insert the '•' symbol at the first position for every production in G, and
also add the lookahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add the augment production to the I0 state and compute the closure: I0 = Closure(S` → •S).
Add all productions starting with S to the I0 state because "•" is followed by the non-terminal.
So, the I0 state becomes
I0 = S` → •S, $
     S → •AA, $
Add all productions starting with A to the modified I0 state because "•" is followed by the non-
terminal. So, the I0 state becomes
I0 = S` → •S, $   S → •AA, $   A → •aA, a/b   A → •b, a/b
I1 = Go to (I0, S) = Closure(S` → S•, $) = S` → S•, $
I2 = Go to (I0, A) = Closure(S → A•A, $)
Add all productions starting with A to the I2 state because "•" is followed by the non-terminal.
So, the I2 state becomes
I2 = S → A•A, $   A → •aA, $   A → •b, $
I3 = Go to (I0, a) = Closure(A → a•A, a/b)
Add all productions starting with A to the I3 state because "•" is followed by the non-terminal.
So, the I3 state becomes
I3 = A → a•A, a/b   A → •aA, a/b   A → •b, a/b
Go to (I3, a) = Closure(A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure(A → b•, a/b) = (same as I4)
I4 = Go to (I0, b) = Closure(A → b•, a/b) = A → b•, a/b
I5 = Go to (I2, A) = Closure(S → AA•, $) = S → AA•, $
I6 = Go to (I2, a) = Closure(A → a•A, $)
Add all productions starting with A to the I6 state because "•" is followed by the non-terminal.
So, the I6 state becomes
I6 = A → a•A, $   A → •aA, $   A → •b, $
Go to (I6, a) = Closure(A → a•A, $) = (same as I6)
Go to (I6, b) = Closure(A → b•, $) = (same as I7)
I7 = Go to (I2, b) = Closure(A → b•, $) = A → b•, $
I8 = Go to (I3, A) = Closure(A → aA•, a/b) = A → aA•, a/b
I9 = Go to (I6, A) = Closure(A → aA•, $) = A → aA•, $
If we analyze the LR(0) items, I3 and I6 are the same; they differ only in their lookahead.
I3 = { A → a•A, a/b
       A → •aA, a/b
       A → •b, a/b }
I6 = { A → a•A, $
       A → •aA, $
       A → •b, $ }
Clearly I3 and I6 have the same LR(0) items but differ in their lookahead, so we can combine
them and call the result I36.
I36 = { A → a•A, a/b/$
        A → •aA, a/b/$
        A → •b, a/b/$ }
I4 and I7 are the same but differ only in their lookahead, so we can combine them and call the
result I47.
I47 = { A → b•, a/b/$ }
I8 and I9 are the same but differ only in their lookahead, so we can combine them and call the
result I89.
I89 = { A → aA•, a/b/$ }
Drawing the DFA:
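From the merged states I0, I1, I2, I36, I47, I5 and I89, the LALR(1) parsing table below follows; it is a standard reconstruction for this grammar (s = shift, r = reduce, acc = accept; productions numbered 1: S → AA, 2: A → aA, 3: A → b):
State     a        b        $        A      S
I0        s36      s47               2      1
I1                          acc
I2        s36      s47               5
I36       s36      s47               89
I47       r3       r3       r3
I5                          r1
I89       r2       r2       r2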
Program-1:
#include<stdio.h>
#include<string.h>
#include<ctype.h>
/* Global input buffer, current position and error flag
   (declarations added so the fragment compiles). */
char input[100];
int i = 0, error = 0;
void E();
void Eprime();
void T();
void Tprime();
void F();
void E()
{
    T();
    Eprime();
}
void Eprime()
{
    if(input[i] == '+')
    {
        i++;
        T();
        Eprime();
    }
}
void T()
{
    F();
    Tprime();
}
void Tprime()
{
    if(input[i] == '*')
    {
        i++;
        F();
        Tprime();
    }
}
void F()
{
    if(isalnum(input[i]))
        i++;
    else if(input[i] == '(')
    {
        i++;
        E();
        if(input[i] == ')')
            i++;
        else
            error = 1;
    }
    else
        error = 1;
}
/* Driver (reconstructed): read an expression and report accept/reject. */
int main()
{
    printf("Enter an arithmetic expression: ");
    scanf("%s", input);
    E();
    if(strlen(input) == i && error == 0)
        printf("Accepted\n");
    else
        printf("Rejected\n");
    return 0;
}
Program-2:
int goto_table[][2] = {
    {1, 2},   // State 0
    {-1, -1}, // State 1
    {-1, -1}  // State 2
};

// Input tokens
int input[] = {TERMINAL_A, TERMINAL_A, TERMINAL_B, TERMINAL_B, TERMINAL_B};

int main() {
    int current_state = 0;
    int i = 0;
    current_state = stack[stack_top];
    printf("Parsing successful\n");
    return 0;
}
Program -1:
In the above output, as per the grammar provided, the input is parsed by the corresponding
calling procedures; the strings that are successfully parsed are accepted and the others are
rejected.
Program-2:
Quiz:
1. What do you mean by shift reduce parsing?
Shift-reduce parsing is a bottom-up parsing method used in language processing. It
involves shifting input symbols onto a stack and then reducing them based on grammar
rules, gradually building the parse tree or abstract syntax tree for the input.
1. Identify sets with similar core items (production rules with position markers).
2. Combine item sets with identical core items.
3. Compute new lookahead symbols by uniting the lookahead sets.
4. Update transitions between sets as needed.
5. Continue merging until no further merges are possible.
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
Experiment No - 07
Aim: Implement a program for constructing Operator Precedence Parsing.
Date:
Objectives:
By the end of this experiment, the students should be able to:
➢ Understand the concept of OPG its significance in compiler construction
➢ Write precedence relations for grammar
➢ Implement a OPG using C compiler
➢ Software/Equipment: C compiler
➢ Theory:
Operator precedence parsing is also a type of bottom-up parsing that can be applied to a
class of grammars known as operator grammars.
Relation    Meaning
a <. b      a "yields precedence to" b
a =. b      a "has the same precedence as" b
a .> b      a "takes precedence over" b
Depending upon these precedence Relations, we can decide which operations will be
executed or parsed first.
Association and Precedence Rules
• If operators have different precedence: since * has higher precedence than +
Example −
In the statement a + b * c
∴ + <. *
In the statement a * b + c
∴ * .> +
• If operators have equal precedence, then use the association rules.
(a) Example − In the statement a + b + c, the + operators have equal precedence. As
'+' is left associative in a + b + c,
∴ (a + b) will be computed first, and then it will be added to c, i.e., (a + b) + c
∴ + .> +
Similarly, '*' is left associative in a * b * c.
(b) Example − In the statement a ↑ b ↑ c, ↑ is a right associative operator.
∴ It will become a ↑ (b ↑ c)
∴ (b ↑ c) will be computed first.
∴ ↑ <. ↑
• An identifier has higher precedence than all operators and symbols.
∴ θ <. id, $ <. id, ( <. id, id .> θ, id .> $, id .> )
• $ has lower precedence than all other operators and symbols.
$ <. (, $ <. +, $ <. *, id .> $, ) .> $
Example 2 − Construct the precedence relation table for the grammar: E → E + E | E * E | id
Solution:
Operator-Precedence Relations
         id      +       *       $
id               .>      .>      .>
+        <.      .>      <.      .>
*        <.      .>      .>      .>
$        <.      <.      <.
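As a short worked illustration of how these relations drive parsing, consider the input id + id * id $ (E denotes a reduced operand; the relation compared is always between the topmost terminal on the stack and the next input symbol):
$              $ <. id    shift id
$ id           id .> +    reduce E → id
$ E            $ <. +     shift +
$ E +          + <. id    shift id
$ E + id       id .> *    reduce E → id
$ E + E        + <. *     shift *
$ E + E *      * <. id    shift id
$ E + E * id   id .> $    reduce E → id
$ E + E * E    * .> $     reduce E → E * E
$ E + E        + .> $     reduce E → E + E
$ E                       accept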
Advantages of Operator Precedence Parsing
• It is simple to implement.
Disadvantages of Operator Precedence Parsing
• An operator like minus can be unary or binary, so it can have different precedences in
different statements.
• Operator precedence parsing applies to only a small class of grammars.
Program:
    else {
        flag = 0;
        f();
    }
    if (c == '$') {
        flag = 0;
        f();
    }
    c = grm[i][++j];
        }
    }
    if (flag == 1)
        printf("Operator grammar");
}
In the above example, the grammar is analysed as per the operator grammar rules and the output
is against the rules of OPG, so it is not an operator grammar.
Input: 2  A=A/A  B=A+A
In the above example, the grammar is analysed as per the operator grammar rules and the output
favours the rules of OPG (an operator is present between two non-terminals), so it is an operator
grammar.
Quiz:
1. Define operator grammar.
An operator grammar defines how operators and operands combine in expressions,
specifying rules for parsing and evaluating expressions in programming languages. It
focuses on the structure and interactions of operators within expressions.
Suggested Reference:
1. https://www.gatevidyalay.com/operator-precedence-parsing/
2. https://www.geeksforgeeks.org/role-of-operator-precedence-parser/
Rubrics: Knowledge of parsing techniques (2) | Knowledge of precedence table (2) | Implementation (2) | Completeness and accuracy (2) | Presentation (2) | Total
Scale: Good (2) / Avg. (1) for each rubric
Marks:
Experiment No - 08
Aim: Generate 3-tuple intermediate code for given infix expression.
Date:
Objectives:
By the end of this experiment, the students should be able to:
➢ Understand the different intermediate code representations and its significance in compiler
construction
➢ Write intermediate code for given infix expression
Software/Equipment: C compiler
Theory:
Three address code is a type of intermediate code which is easy to generate and can be easily
converted to machine code. It makes use of at most three addresses and one operator to
represent an expression, and the value computed at each instruction is stored in a temporary
variable generated by the compiler. The compiler decides the order of operations given by the
three address code.
General representation –
a = b op c
Where a, b or c represents operands like names, constants or compiler generated temporaries
and op represents the operator
Example-2: Write three address code for the following code:
for(i = 1; i <= 10; i++)
{
a[i] = x * 5;
}
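One possible three-address translation of this loop is sketched below (assuming 4-byte array elements; the temporaries and labels are illustrative):
      i = 1
L1:   if i > 10 goto L2
      t1 = x * 5
      t2 = i * 4
      a[t2] = t1
      i = i + 1
      goto L1
L2: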
Advantage –
• Easy to rearrange code for global optimization.
• One can quickly access value of temporary variables using symbol table.
Disadvantage –
• Contain lot of temporaries.
• Temporary variable creation increases time and space complexity.
1. Quadruples – This representation uses four fields: op, arg1, arg2 and result, where result holds the compiler-generated temporary.
Example – Consider expression a = b * – c + b * – c
2. Triples – This representation doesn't make use of an extra temporary variable to represent a
single operation; instead, when a reference to another triple's value is needed, a pointer to
that triple is used. So, it consists of only three fields, namely op, arg1 and arg2.
Disadvantage –
• Temporaries are implicit and difficult to rearrange code.
• It is difficult to optimize because optimization involves moving intermediate code. When a
triple is moved, any other triple referring to it must be updated also. With help of pointer
one can directly access symbol table entry.
Example – Consider expression a = b * – c + b * – c
3. Indirect Triples – This representation makes use of pointers to a separately stored listing of all
references to computations. It is similar in utility to the quadruple representation but requires
less space. Temporaries are implicit and it is easier to rearrange code.
Example – Consider expression a = b * – c + b * – c
Question – Write the quadruple, triples and indirect triples for the following expression: (x + y) *
(y + z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
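For this three-address code, the three representations look as follows (a worked sketch):
Quadruples:
      op    arg1   arg2   result
(0)   +     x      y      t1
(1)   +     y      z      t2
(2)   *     t1     t2     t3
(3)   +     t1     z      t4
(4)   +     t3     t4     t5
Triples (temporaries replaced by references to earlier triples):
      op    arg1   arg2
(0)   +     x      y
(1)   +     y      z
(2)   *     (0)    (1)
(3)   +     (0)    z
(4)   +     (2)    (3)
Indirect triples use the same triple table together with a separate list of statement pointers, e.g. statements (35)–(39) pointing to triples (0)–(4) in order, so that reordering the code only moves the pointers.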
Program:
void main()
{
    printf("Enter the expression:");
    scanf("%s", j);
    printf("\tThe Intermediate code is:\n");
    small();
}
    if(j[i]=='*')
        printf("\tt%d=%s*%s\n", c, a, b);
    if(j[i]=='/')
        printf("\tt%d=%s/%s\n", c, a, b);
    if(j[i]=='+')
        printf("\tt%d=%s+%s\n", c, a, b);
    if(j[i]=='-')
        printf("\tt%d=%s-%s\n", c, a, b);
    if(j[i]=='=')
        printf("\t%c=t%d", j[i-1], --c);
    sprintf(ch, "%d", c);
    j[i]=ch[0];
    c++;
    small();
}
void small()
{
    pi=0; l=0;
    for(i=0;i<strlen(j);i++)
    {
        for(m=0;m<5;m++)
            if(j[i]==sw[m])
                if(pi<=p[m])
                {
                    pi=p[m];
                    l=1;
                    k=i;
                }
    }
Observations and Conclusion:
In the above example, the user is asked to enter an infix expression and the output is the
generated intermediate code (3-address code).
Quiz:
1. What are the different implementation methods for three-address code?
Different implementation methods for three-address code (TAC) include quadruples,
triples, indirect triples, and direct execution. These methods vary in how they represent
and manipulate intermediate code in compilers.
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft, Rajeev
Motwani, and Jeffrey D. Ullman.
2. https://www.geeksforgeeks.org/introduction-to-intermediate-representationir/
Rubric wise marks obtained:
Rubrics: Knowledge (2) | Problem Recognition (2) | Logic Building (2) | Completeness and accuracy (2) | Ethics (2) | Total
Scale: Good (2) / Avg. (1) for each rubric
Marks:
Experiment No - 09
Aim: Extract Predecessor and Successor from given Control Flow Graph.
Date:
Objectives:
By the end of this experiment, the students should be able to:
➢ Understand the concept control structure (in blocks) in compiler
➢ Write the predecessor and successor for given graph.
Theory:
A basic block is a simple combination of statements. Except for entry and exit, the basic
blocks do not have any branches like in and out. It means that the flow of control enters at the
beginning and it always leaves at the end without any halt. The execution of a set of instructions
of a basic block always takes place in the form of a sequence.
The first step is to divide a group of three-address codes into the basic block. The new basic
block always begins with the first instruction and continues to add instructions until it reaches
a jump or a label. If no jumps or labels are identified, the control will flow from one instruction
to the next in sequential order.
The algorithm for the construction of the basic block is described below step by step:
Algorithm: The algorithm used here is partitioning the three-address code into basic blocks.
Input: A sequence of three-address codes will be the input for the basic blocks.
Output: A list of basic blocks with each three address statements, in exactly one block,
is considered as the output.
Method: We’ll start by identifying the intermediate code’s leaders. The following are some
guidelines for identifying leaders:
1. The first instruction in the intermediate code is generally considered as a leader.
2. The instructions that are the target of a conditional or unconditional jump statement can be
considered as leaders.
3. Any instructions that are just after a conditional or unconditional jump statement can be
considered as a leader.
Each leader’s basic block will contain all of the instructions from the leader until the
instruction right before the following leader’s start.
Example of basic block:
Three Address Code for the expression a = b + c – d is:
T1 = b + c
T2 = T1 - d
a = T2
This represents a basic block in which all the statements execute in a sequence one after the
other.
Basic Block Construction:
Let us understand the construction of basic blocks with an example:
Example:
1. PROD = 0
2. I = 1
3. T2 = addr(A) – 4
4. T4 = addr(B) – 4
5. T1 = 4 x I
6. T3 = T2[T1]
7. T5 = T4[T1]
8. T6 = T3 x T5
9. PROD = PROD + T6
10. I = I + 1
11. IF I <=20 GOTO (5)
Using the algorithm given above, we can identify the number of basic blocks in the above
three- address code easily-
There are two Basic Blocks in the above three-address code:
• B1 – Statement 1 to 4
• B2 – Statement 5 to 11
Transformations on Basic blocks:
Transformations on basic blocks can be applied to a basic block. While applying transformations, we
don't need to change the set of expressions computed by the block.
There are two types of basic block transformations. These are as follows:
1. Structure-Preserving Transformations
Structure preserving transformations can be achieved by the following methods:
1. Common sub-expression elimination
2. Dead code elimination
3. Renaming of temporary variables
4. Interchange of two independent adjacent statements
2. Algebraic Transformations
In the case of algebraic transformation, we basically change the set of expressions into an
algebraically equivalent set.
For example, an expression x := x + 0 or x := x * 1 can be eliminated from a basic block without
changing the set of expressions it computes.
Flow Graph:
A flow graph is simply a directed graph. For the set of basic blocks, a flow graph shows
the flow of control information. A control flow graph is used to depict how the program
control is being parsed among the blocks. A flow graph is used to illustrate the flow of
control between basic blocks once an intermediate code has been partitioned into basic blocks.
When the beginning instruction of the Y block follows the last instruction of the X block,
an edge might flow from one block X to another block Y.
Let’s make the flow graph of the example that we used for basic block formation:
Firstly, we compute the basic blocks (which is already done above). Secondly, we assign
the flow control information.
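For the two blocks identified above, the flow graph is: B1 → B2 and B2 → B2 (the conditional jump IF I <= 20 GOTO (5) at statement 11 loops back to the leader at statement 5). Hence B1 has no predecessor (it is the entry block) and its only successor is B2, while B2 has predecessors B1 and B2 and successors B2 and the exit of the program.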
Program:
// C++ program to find predecessor and successor in a BST
#include <iostream>
using namespace std;

// BST Node
struct Node
{
    int key;
    struct Node *left, *right;
};

// This function finds the predecessor and successor of key in the BST.
// It sets pre and suc as the predecessor and successor respectively.
void findPreSuc(Node* root, Node*& pre, Node*& suc, int key)
{
    // Base case
    if (root == NULL) return;

    // If key is present at root
    if (root->key == key)
    {
        // the maximum value in left subtree is predecessor
        if (root->left != NULL)
        {
            Node* tmp = root->left;
            while (tmp->right)
                tmp = tmp->right;
            pre = tmp;
        }
        // the minimum value in right subtree is successor
        if (root->right != NULL)
        {
            Node* tmp = root->right;
            while (tmp->left)
                tmp = tmp->left;
            suc = tmp;
        }
        return;
    }
    // If key is smaller than root's key, go to the left subtree
    if (root->key > key)
    {
        suc = root;
        findPreSuc(root->left, pre, suc, key);
    }
    else // go to the right subtree
    {
        pre = root;
        findPreSuc(root->right, pre, suc, key);
    }
}

// A utility function to create a new BST node
Node* newNode(int item)
{
    Node* temp = new Node;
    temp->key = item;
    temp->left = temp->right = NULL;
    return temp;
}

// A utility function to insert a new node with the given key in the BST
Node* insert(Node* node, int key)
{
    if (node == NULL) return newNode(key);
    if (key < node->key)
        node->left = insert(node->left, key);
    else
        node->right = insert(node->right, key);
    return node;
}

// Driver program
int main()
{
    int key = 65;    // key to search (value chosen for illustration)

    Node* root = NULL;
    root = insert(root, 50);
    insert(root, 30);
    insert(root, 20);
    insert(root, 40);
    insert(root, 75);
    insert(root, 60);
    insert(root, 80);

    Node* pre = NULL, *suc = NULL;
    findPreSuc(root, pre, suc, key);

    if (pre != NULL)
        cout << "Predecessor is " << pre->key << endl;
    else
        cout << "No Predecessor";

    if (suc != NULL)
        cout << "Successor is " << suc->key;
    else
        cout << "No Successor";
    return 0;
}
Observations and Conclusions:
In the above example, the user obtains the predecessor and successor of the given node in the BST.
Quiz:
1. What is a flow graph?
A flowgraph is a graphical representation of a program's control flow or execution flow. It
typically depicts the sequence of statements, functions, or program components, along with
the decision points and control transfers (e.g., loops, branches, and function calls) between
them. Flowgraphs are used for analyzing, visualizing, and understanding the structure and
behavior of software programs, making them a valuable tool in software engineering and
program analysis.
2. Define DAG.
A Directed Acyclic Graph (DAG) is a finite directed graph that contains no directed cycles.
In a DAG, edges have a direction, meaning they go from one vertex to another, but you cannot
follow a sequence of edges and return to the same vertex. DAGs are often used in various
computer science and mathematical applications, including scheduling algorithms, dependency
analysis, and topological sorting.
3. Define backpatching.
Backpatching is a technique used during code generation. It involves filling in, or updating,
addresses in generated intermediate code or assembly code to
resolve and connect control flow statements, such as conditional branches or jumps, to their
corresponding target locations. Backpatching is particularly useful for handling jumps to labels
or addresses that are not known at the time the code is initially generated. It ensures that
control flow is correctly directed to the intended destinations during program execution.
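To make the idea concrete, here is a minimal C++ sketch of backpatching (the quadruple layout and the makelist/mergelists/backpatch helpers follow the usual textbook convention, but the concrete data structures and the tiny example are assumptions made only for illustration):

// Sketch of backpatching: jumps are emitted with an unresolved target and the
// indices of those jumps are kept on lists until the real target is known.
#include <iostream>
#include <list>
#include <string>
#include <vector>
using namespace std;

struct Quad { string op; int target; };          // target = -1 means unresolved
vector<Quad> code;                               // generated intermediate code

list<int> makelist(int i) { return list<int>{i}; }            // new list with one jump
list<int> mergelists(list<int> a, const list<int>& b) {        // union of two lists
    a.insert(a.end(), b.begin(), b.end());                     // (not used in this tiny example)
    return a;
}
void backpatch(const list<int>& l, int label) {               // fill in the real target
    for (int i : l) code[i].target = label;
}

int main() {
    // Emit two jumps whose targets are not yet known.
    code.push_back({"if a < b goto", -1});        // quad 0
    code.push_back({"goto", -1});                 // quad 1
    list<int> truelist  = makelist(0);
    list<int> falselist = makelist(1);

    backpatch(truelist, 2);                       // true branch starts at quad 2
    code.push_back({"t = 1", -1});                // quad 2
    backpatch(falselist, 3);                      // false branch starts at quad 3
    code.push_back({"t = 0", -1});                // quad 3

    for (size_t i = 0; i < code.size(); ++i) {
        cout << i << ": " << code[i].op;
        if (code[i].op.find("goto") != string::npos) cout << " " << code[i].target;
        cout << "\n";
    }
    return 0;
}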
Suggested Reference:
1. Introduction to Automata Theory, Languages and Computation by John E. Hopcroft,
Rajeev Motwani, and Jeffrey D. Ullman
2. https://www.geeksforgeeks.org/data-flow-analysis-compiler/
Rubrics: Knowledge (2) | Problem Recognition (2) | Implementation (2) | Correctness (2) | Documentation and Presentation (2) | Total
Each criterion: Good (2) / Avg. (1)
Marks:
Experiment No - 10
Aim: Study of Learning Basic Block Scheduling Heuristics from Optimal Data.
Date:
Objectives:
By the end of this experiment, the students should be able to:
➢ Understanding the concept of basic block scheduling and its importance in
compiler optimization.
➢ Understanding the various heuristics used for basic block scheduling.
➢ Analyzing optimal data to learn the basic block scheduling heuristics.
➢ Comparing the performance of the implemented basic block scheduler with other
commonly used basic block schedulers.
Theory:
Instruction scheduling is an important step for improving the performance of object code
produced by a compiler. Basic block scheduling is important in its own right and also as a
building block for scheduling larger groups of instructions such as superblocks. The basic
block instruction scheduling problem is to find a minimum length schedule for a basic
block (a straight-line sequence of code with a single entry point and a single exit point) subject
to precedence, latency, and resource constraints. Solving the problem exactly is known to be
difficult, and most compilers use a greedy list scheduling algorithm coupled with a heuristic.
The heuristic is usually hand-crafted, a potentially time-consuming process. Modern
architectures are pipelined and can issue multiple instructions per time cycle. On such
processors, the order that the instructions are scheduled can significantly impact performance.
For example, consider multiple-issue pipelined processors. On such processors, there are
multiple functional units and multiple instructions can be issued (begin execution) each clock
cycle. Associated with each instruction is a delay or latency between when the instruction is
issued and when its result is available to other instructions that use it. Here, we assume
that all functional units are fully pipelined and that instructions are typed.
Examples of types of instructions are load/store, integer, floating point, and branch
instructions. We use the standard labelled directed acyclic graph (DAG) representation of a
basic block. Each node corresponds to an instruction, and there is an edge from i to j labelled
with a positive integer l(i, j) if j must not be issued until i has executed for l(i, j) cycles.
Given a labelled dependency DAG for
a basic block, a schedule for a multiple-issue processor specifies an issue or start time for each
instruction or node such that the latency constraints are satisfied and the resource constraints
are satisfied. The latter are satisfied if, at every time cycle, the number of instructions of each
type issued at that cycle does not exceed the number of functional units that can execute
instructions of that type. The length of a schedule is the number of cycles needed for the
schedule to complete; i.e., each instruction has been issued at its start time and, for each
instruction with no successors, enough cycles have elapsed that the result for the instruction
is available. The basic block instruction scheduling problem is to construct a schedule with
minimum length.
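These definitions can be checked mechanically. The following C++ sketch (the DAG, the single instruction type, the issue width, and the proposed schedule are all invented for illustration) verifies the latency and resource constraints of a schedule and computes its length:

// Sketch: check a schedule for a dependency DAG against latency and
// resource constraints, and compute the schedule length.
#include <algorithm>
#include <iostream>
#include <map>
#include <vector>
using namespace std;

struct Edge { int from, to, latency; };

int main() {
    int n = 4;                                     // instructions 0..3, all one type
    int unitsPerCycle = 2;                         // functional units of that type
    vector<Edge> dag = { {0, 2, 1}, {1, 2, 2}, {2, 3, 1} };
    vector<int> latencyOut = {1, 2, 1, 1};         // latency of each instruction's result
    vector<int> start = {0, 0, 2, 3};              // proposed issue cycle per instruction

    bool ok = true;
    for (const Edge& e : dag)                      // latency constraints
        if (start[e.to] < start[e.from] + e.latency) ok = false;

    map<int, int> issuedAt;                        // resource constraints per cycle
    for (int i = 0; i < n; ++i)
        if (++issuedAt[start[i]] > unitsPerCycle) ok = false;

    // Length: enough cycles must elapse that every instruction's result is available.
    int length = 0;
    for (int i = 0; i < n; ++i)
        length = max(length, start[i] + latencyOut[i]);

    cout << (ok ? "schedule is valid" : "schedule violates a constraint")
         << ", length = " << length << " cycles\n";
    return 0;
}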
Instruction scheduling for basic blocks is known to be NP-complete for realistic architectures.
The most popular method for scheduling basic blocks continues to be list scheduling. A list
scheduler takes a set of instructions as represented by a dependency DAG and builds a
schedule using a best-first greedy heuristic. A list scheduler generates the schedule by
determining, at each time step, all instructions that can be scheduled at that step (the ready list), and
uses the heuristic to determine the best instruction on the list. The selected instruction is then
added to the partial schedule and the scheduler determines if any new instructions can be
added to the ready list.
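The list-scheduling loop described above can be sketched as follows in C++ (the dependency DAG, the single-issue machine model, and the fixed priority values standing in for the heuristic are assumptions made for this illustration):

// Sketch of a greedy list scheduler for a single-issue, fully pipelined unit.
// Ready instructions are ranked by a precomputed priority (the heuristic).
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct Edge { int to, latency; };

int main() {
    int n = 4;
    vector<vector<Edge>> succ = { {{2, 1}}, {{2, 2}}, {{3, 1}}, {} };
    vector<int> priority = {3, 4, 2, 1};     // heuristic value per node (assumed)
    vector<int> indeg(n, 0), earliest(n, 0), start(n, -1);
    for (int i = 0; i < n; ++i)
        for (const Edge& e : succ[i]) ++indeg[e.to];

    int scheduled = 0, cycle = 0;
    while (scheduled < n) {
        int best = -1;
        for (int i = 0; i < n; ++i)          // build the ready list for this cycle
            if (start[i] < 0 && indeg[i] == 0 && earliest[i] <= cycle)
                if (best < 0 || priority[i] > priority[best]) best = i;

        if (best >= 0) {                     // issue the best ready instruction
            start[best] = cycle;
            ++scheduled;
            for (const Edge& e : succ[best]) {
                --indeg[e.to];
                earliest[e.to] = max(earliest[e.to], cycle + e.latency);
            }
        }
        ++cycle;                             // single-issue: at most one instruction per cycle
    }

    for (int i = 0; i < n; ++i)
        cout << "instruction " << i << " issued at cycle " << start[i] << "\n";
    return 0;
}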
The heuristic in a list scheduler generally consists of a set of features and an order for testing
the features. Some standard features are as follows. The path length from a node i to a node
j in a DAG is the maximum number of edges along any path from i to j. The critical-path
distance from a node i to a node j in a DAG is the maximum sum of the latencies along any
path from i to j. Note that both the path length and the critical-path distance from a node i to
itself are zero. A node j is a descendant of a node i if there is a directed path from i to j; if the
path consists of a single edge, j is also called an immediate successor of i. The earliest start
time of a node i is a lower bound on the earliest cycle in which the instruction i can be
scheduled.
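The critical-path distance from each node to the leaves of the DAG, which is the form commonly used as a scheduling priority, can be computed with one reverse pass over a topologically ordered DAG. A minimal C++ sketch (the DAG and its latencies are invented for the example):

// Sketch: compute the critical-path distance from every node to the DAG's
// leaves, i.e. the maximum sum of latencies along any outgoing path.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct Edge { int to, latency; };

// Assumes nodes are numbered in a topological order (edges go low -> high).
int main() {
    int n = 4;
    vector<vector<Edge>> succ = { {{2, 1}}, {{2, 2}}, {{3, 1}}, {} };

    vector<int> cp(n, 0);                        // critical path to any leaf
    for (int i = n - 1; i >= 0; --i)             // reverse topological order
        for (const Edge& e : succ[i])
            cp[i] = max(cp[i], e.latency + cp[e.to]);

    for (int i = 0; i < n; ++i)
        cout << "critical-path distance from node " << i << " = " << cp[i] << "\n";
    return 0;
}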
In supervised learning of a classifier from examples, one is given a training set of instances,
where each instance is a vector of feature values and the correct classification for that instance,
and is to induce a classifier from the instances. The classifier is then used to predict the class
of instances that it has not seen before. Many algorithms have been proposed for supervised
learning. One of the most widely used is decision tree learning. In a decision tree the internal
nodes of the tree are labelled with features, the edges to the children of a node are labelled
with the possible values of the feature, and the leaves of the tree are labelled with a
classification. To classify a new example, one starts at the root and repeatedly tests the feature
at a node and follows the appropriate branch until a leaf is reached. The label of the leaf is
the predicted classification of the new example.
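A learned decision tree of this kind reduces to a few nested feature tests at scheduling time. The tiny C++ sketch below (the features, the tests, and the tie-breaking rule are entirely invented, not a tree learned from real data) shows how such a tree could decide which of two ready instructions to prefer:

// Sketch: using a (hand-written) decision tree to choose between two ready
// instructions a and b based on simple DAG features.
#include <iostream>
using namespace std;

struct Features {
    int criticalPath;   // critical-path distance to the leaves
    int earliestStart;  // lower bound on the issue cycle
};

// Returns true if instruction a should be preferred over b.
// Internal nodes test one feature; leaves give the classification.
bool prefer(const Features& a, const Features& b) {
    if (a.criticalPath != b.criticalPath)          // root test
        return a.criticalPath > b.criticalPath;
    if (a.earliestStart != b.earliestStart)        // second-level test
        return a.earliestStart < b.earliestStart;
    return true;                                   // leaf: break ties toward a
}

int main() {
    Features a{5, 1}, b{3, 0};
    cout << (prefer(a, b) ? "schedule a first" : "schedule b first") << "\n";
    return 0;
}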
Algorithm:
This experiment studies how a good heuristic for basic block scheduling can be learned
automatically using supervised machine learning techniques. The novelty of the approach lies
in the quality of the training data (training instances are obtained from very large basic blocks,
and an extensive, systematic analysis is performed to identify the best features and to
synthesize new ones) and in the emphasis on learning a simple yet accurate heuristic.
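One simplified way to picture how such training data could be produced (this is an illustration, not the authors' actual pipeline): each decision point in an optimal schedule yields labelled pairs of the form "the chosen instruction is preferred over every other instruction that was ready at that point". A C++ sketch with invented data:

// Simplified illustration: turn an optimal schedule into training pairs
// "preferred instruction vs. passed-over instruction".
#include <iostream>
#include <vector>
using namespace std;

struct Sample { int chosen, other; };          // indices of two ready instructions

int main() {
    // Assume an optimal issue order and, for each step, the ready list at that step.
    vector<int> optimalOrder = {1, 0, 2, 3};
    vector<vector<int>> readyAtStep = { {0, 1}, {0}, {2}, {3} };

    vector<Sample> training;
    for (size_t step = 0; step < optimalOrder.size(); ++step) {
        int chosen = optimalOrder[step];
        for (int other : readyAtStep[step])
            if (other != chosen)
                training.push_back({chosen, other});   // chosen preferred over other
    }

    for (const Sample& s : training)
        cout << "prefer " << s.chosen << " over " << s.other << "\n";
    return 0;
}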
Quiz:
1. What is the basic block instruction scheduling problem?
The basic block instruction scheduling problem involves reordering and optimizing
instructions within a basic block of code to maximize hardware resource usage, reduce
execution latency, optimize control flow, manage registers efficiently, and handle instruction
dependencies. It's a crucial compiler optimization technique for improving program
performance on modern processors.
3. What are the constraints that need to be considered in solving the basic block
instruction scheduling problem?
Constraints in basic block instruction scheduling include data dependencies, resource
limitations, control flow, instruction timing, register pressure, code size, readability, and
target architecture compatibility.
Suggested Reference:
1. https://dl.acm.org/doi/10.5555/1105634.1105652
2. https://www.worldcat.org/title/1032888564
Rubrics: Knowledge (2) | Problem Recognition (2) | Documentation (2) | Presentation (2) | Ethics (2) | Total
Each criterion: Good (2) / Avg. (1)
Marks: