CD PPTS 2

Lexical Analysis
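The role of the lexical analyzer
[Figure: source program -> lexical analyzer -> token -> parser -> to semantic analysis. The parser requests each token with getNextToken; the lexical analyzer and the parser both use the symbol table.]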
Why separate lexical analysis and parsing?
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Tokens, Patterns and Lexemes

A token is a pair: a token name and an optional attribute value
A pattern is a description of the form that the lexemes of a token may take
A lexeme is a sequence of characters in the source program that matches the pattern for a token
Example
Token        Informal description                     Sample lexemes
if           Characters i, f                          if
else         Characters e, l, s, e                    else
comparison   < or > or <= or >= or == or !=           <=, !=
id           Letter followed by letters and digits    pi, score, D2
number       Any numeric constant                     3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "s         "core dumped"

printf("total = %d\n", score);
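The token classes in the table above can be tried out with a small regex-driven scanner. This is only a sketch: the name `tokenize`, the `other` catch-all for punctuation, and the exact regular expressions are assumptions made for illustration, not part of the slides.

```python
import re

# (token name, pattern) pairs modelled on the table. Keywords come first so
# that "if" and "else" are not reported as identifiers.
PATTERNS = [
    ("if", re.compile(r"if\b")),
    ("else", re.compile(r"else\b")),
    ("comparison", re.compile(r"<=|>=|==|!=|<|>")),
    ("number", re.compile(r"\d+(\.\d+)?([eE][+-]?\d+)?")),
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("literal", re.compile(r'"[^"]*"')),
]


def tokenize(text):
    """Return a list of (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1                      # whitespace separates lexemes
            continue
        for name, pattern in PATTERNS:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # Assumption: characters outside the table become "other" tokens.
            tokens.append(("other", text[pos]))
            pos += 1
    return tokens
```

For the printf statement on the slide, the string between the double quotes comes back as a single `literal` token, while `printf` and `score` come back as `id` tokens.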
Attributes for tokens

E = M * C ** 2
<id, pointer to symbol table entry for E>
<assign-op>
<id, pointer to symbol table entry for M>
<mult-op>
<id, pointer to symbol table entry for C>
<exp-op>
<number, integer value 2>
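The token stream above can be reproduced with a short sketch. It assumes that a "pointer to symbol table entry" may be modelled as an index into a Python list; the function name `scan` and the `OPERATORS` mapping are hypothetical helpers introduced only for this example.

```python
import re

# Operator lexemes and the token names used on the slide.
OPERATORS = {"=": "assign-op", "*": "mult-op", "**": "exp-op"}


def scan(source):
    """Return (tokens, symbol_table) for a tiny expression language."""
    symtab = []   # symbol table: one entry per distinct identifier
    tokens = []
    # "**" is listed before "*" so the longer operator wins.
    for lexeme in re.findall(r"[A-Za-z]\w*|\d+|\*\*|[=*]", source):
        if lexeme in OPERATORS:
            tokens.append((OPERATORS[lexeme], None))   # no attribute needed
        elif lexeme.isdigit():
            tokens.append(("number", int(lexeme)))     # attribute: the value
        else:
            if lexeme not in symtab:
                symtab.append(lexeme)
            # Assumption: the list index stands in for the pointer.
            tokens.append(("id", symtab.index(lexeme)))
    return tokens, symtab
```

Calling `scan("E = M * C ** 2")` yields seven tokens, with the three identifiers pointing at entries 0, 1 and 2 of the symbol table.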
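Input buffering
Sometimes the lexical analyzer needs to look ahead some symbols to decide which token to return
In C: after reading -, = or <, we must look at the next character to decide which token to return
In Fortran: blanks are not significant, so DO 5 I = 1.25 is an assignment to the variable DO5I, while DO 5 I = 1,25 begins a loop; the two cannot be told apart until the . or , is read
The lexical analyzer reads its input from an input buffer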
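The C case above can be illustrated with a one-function sketch that peeks at the following characters before committing to a token. The name `scan_operator` and the operator list are assumptions for this example; the list covers only some of the C operators that begin with -, = or <.

```python
# Candidate operators, longest first, so that "<=" is preferred over "<".
OPERATORS = ["<<=", "->", "--", "-=", "==", "<=", "<<", "-", "=", "<"]


def scan_operator(text, pos):
    """Return (lexeme, next_pos) for the operator starting at pos, or None."""
    for op in OPERATORS:
        # startswith looks ahead without consuming input.
        if text.startswith(op, pos):
            return op, pos + len(op)
    return None
```

Because the candidates are tried longest first, the scanner only settles on `<` or `-` after it has seen that the next character does not extend the operator.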
Scheme used to buffer input
The buffer is divided into two halves
One pointer marks the beginning of the token
A lookahead pointer scans ahead until the token is discovered
Lookahead can be large
Declare(a1, a2, a3, a4) in a PL/I program: is Declare a keyword or an array name?
[Figure: input buffer with the token-beginning pointer and the lookahead pointer]
Two-buffer scheme
To handle large lookaheads safely
switch (*lookahead++)
{
case eof:   /* sentinel stored at the end of each buffer */
    if (lookahead is at end of first buffer)
    {
        reload second buffer;
        lookahead = beginning of second buffer;
    }
    else if (lookahead is at end of second buffer)
    {
        reload first buffer;
        lookahead = beginning of first buffer;
    }
    else /* eof inside a buffer marks the end of the input */
        terminate lexical analysis;
    break;
cases for the other characters;
}
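The reload logic above can be exercised with a small runnable sketch. It makes several assumptions that are not in the slides: the class name `TwoBufferInput`, a NUL character as the sentinel, a tiny default buffer size so the reloads are easy to observe, and `None` as the end-of-input result. The token-beginning pointer and retraction are left out to keep the sketch short.

```python
import io

EOF = "\0"   # sentinel; assumed never to occur in the source text


class TwoBufferInput:
    """Reads characters through two buffer halves, each ended by a sentinel."""

    def __init__(self, stream, half_size=4):
        self.stream = stream
        self.n = half_size
        # Layout: [half 0][sentinel][half 1][sentinel]
        self.buf = [EOF] * (2 * (half_size + 1))
        self._reload(0)
        self.forward = 0          # the lookahead pointer

    def _reload(self, half):
        start = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        # A short chunk leaves the sentinel inside the half: end of input.
        self.buf[start + len(chunk)] = EOF

    def next_char(self):
        ch = self.buf[self.forward]
        if ch != EOF:
            self.forward += 1
            return ch
        if self.forward == self.n:              # end of first half
            self._reload(1)
            self.forward = self.n + 1
        elif self.forward == 2 * self.n + 1:    # end of second half
            self._reload(0)
            self.forward = 0
        else:
            return None                         # sentinel inside a half
        return self.next_char()
```

The sentinel is what lets the common case cost a single comparison per character: the position checks run only when a sentinel has actually been read.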
Specification of tokens
Regular expressions are used to formalize the specification of tokens
Regular expressions are a means of specifying regular languages
Example:
Identifier = letter (letter | digit)*
Keyword = begin | end | if | then | else
Constant = digit+
Relop = < | <= | = | <> | > | >=
Each regular expression is a pattern specifying the form of strings
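The four definitions above translate almost directly into Python's `re` syntax. The sketch below assumes ASCII letters and digits and checks keywords before identifiers; the names `PATTERNS` and `classify` are hypothetical.

```python
import re

LETTER = "[A-Za-z]"   # assumption: ASCII letters only
DIGIT = "[0-9]"

# Keyword is listed before identifier because every keyword also matches
# the identifier pattern.
PATTERNS = {
    "keyword": re.compile(r"begin|end|if|then|else"),
    "identifier": re.compile(f"{LETTER}({LETTER}|{DIGIT})*"),
    "constant": re.compile(f"{DIGIT}+"),
    "relop": re.compile(r"<=|<>|>=|<|=|>"),
}


def classify(lexeme):
    """Return the name of the first pattern matching the whole lexeme."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(lexeme):
            return name
    return None
```

`fullmatch` requires the entire string to fit the pattern, which mirrors the idea that a pattern specifies the complete form of a lexeme.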
Token recognition
Transition diagrams (a kind of flowchart)
Transition diagram for reserved words and identifiers
[Figures: transition diagrams for keywords, identifiers, constants and relational operators]
Code for the transition diagram for identifiers
State 9:  C = Getchar();
          if Letter(C) then goto state 10
          else Fail();
Letter(): procedure that returns true iff C is a letter
Fail(): routine that retracts the lookahead pointer and starts the next transition diagram
State 10: C = Getchar();
          if Letter(C) or Digit(C) then goto state 10
          else if Delimiter(C) then goto state 11
          else Fail();
Delimiter(): procedure that returns true whenever C is a character that could follow an identifier
State 11: Retract();
          return (id, Install());
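The three states above can be written as a runnable loop. This is a sketch under assumptions: the delimiter set, the use of a list index as the value returned by `install`, and the convention of returning `None` for Fail() are all choices made for this example.

```python
# Assumption: characters that may follow an identifier ("" means end of input).
DELIMITERS = set(" \t\n()+-*/=<>;,") | {""}


def install(lexeme, symtab):
    """Add the lexeme to the symbol table if new; return its entry index."""
    if lexeme not in symtab:
        symtab.append(lexeme)
    return symtab.index(lexeme)


def recognize_identifier(text, start, symtab):
    """Return (("id", entry), next_pos), or None when the diagram fails."""
    forward = start
    state = 9
    while True:
        if state in (9, 10):
            # Getchar(): read one character and advance the lookahead pointer.
            c = text[forward] if forward < len(text) else ""
            forward += 1
        if state == 9:
            if c.isalpha():
                state = 10
            else:
                return None           # Fail(): caller restarts from `start`
        elif state == 10:
            if c.isalnum():
                state = 10
            elif c in DELIMITERS:
                state = 11
            else:
                return None           # Fail()
        else:                         # state 11
            forward -= 1              # Retract(): give the delimiter back
            return ("id", install(text[start:forward], symtab)), forward
```

The retraction in state 11 matters: the delimiter was read only to detect the end of the identifier, so it must still be available to the next token.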
Regular expressions = specification

Finite automata = implementation
A finite automaton consists of
An input alphabet
A set of states S
A start state n
A set of accepting states F ⊆ S
A set of transitions state --input--> state
A transition diagram is a finite automaton

Nondeterministic Finite Automaton (NFA)
A set of states
A set of input symbols
A transition function, move(), that maps state-symbol pairs to sets of states
A start state S0
A set of states F as accepting (final) states
[Figure: NFA recognizing the language (a | b)*abb]
The set of states = {0, 1, 2, 3}
Input symbols = {a, b}
The start state is S0; the accepting state is S3
The language defined by an NFA is the set of strings it accepts
Transition function
The transition function can be implemented as a transition table.
State    Input symbol
         a        b
0        {0,1}    {0}
1        --       {2}
2        --       {3}
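A transition table like the one above can be stored as a dictionary and used to simulate the NFA by tracking the set of states it could be in. The sketch assumes the standard four-state NFA for (a | b)*abb, in which state 3 is accepting and has no outgoing transitions; the names `MOVE` and `accepts` are chosen for this example.

```python
# move(): (state, symbol) -> set of next states. Missing keys mean "--".
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def accepts(string):
    """Return True if the NFA accepts the string."""
    current = {START}
    for symbol in string:
        following = set()
        for state in current:
            following |= MOVE.get((state, symbol), set())
        current = following
    # Accept if at least one possible run ended in an accepting state.
    return bool(current & ACCEPTING)
```

Tracking a set of states sidesteps the nondeterminism: instead of guessing which a starts the final abb, the simulation follows every choice at once.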
Converting an RE to an automaton
We can convert an RE to an NFA
Inductive construction
Start with a simple basis, use that to build more complex parts of the NFA
RE to NFA
Basis:
R = a
R = ε
R = S+T
R = ST
R = S*
RE to ε-NFA Example
Convert R = (ab+a)* to an NFA
We proceed in stages, starting from simple elements and working our way up
[Figures: NFA fragments for a, b and ab]
RE to NFA Example
[Figures: NFA fragments for ab+a and (ab+a)*]
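The staged construction can be sketched in code by giving each RE operator a function that combines smaller NFA fragments. This follows the usual Thompson-style construction, with the variation that concatenation links the two fragments by an ε-edge instead of merging states. All names here (`Frag`, `symbol`, `concat`, `union`, `star`, `accepts`) are hypothetical.

```python
import itertools

_ids = itertools.count()   # fresh state numbers


class Frag:
    """NFA fragment with one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start = start
        self.accept = accept
        self.edges = edges   # list of (source, label, target); None is ε


def symbol(a):                                  # basis: R = a
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, a, f)])


def concat(x, y):                               # R = ST
    return Frag(x.start, y.accept,
                x.edges + y.edges + [(x.accept, None, y.start)])


def union(x, y):                                # R = S+T
    s, f = next(_ids), next(_ids)
    extra = [(s, None, x.start), (s, None, y.start),
             (x.accept, None, f), (y.accept, None, f)]
    return Frag(s, f, x.edges + y.edges + extra)


def star(x):                                    # R = S*
    s, f = next(_ids), next(_ids)
    extra = [(s, None, x.start), (s, None, f),
             (x.accept, None, x.start), (x.accept, None, f)]
    return Frag(s, f, x.edges + extra)


def closure(states, edges):
    """All states reachable from `states` using ε-edges only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    current = closure({nfa.start}, nfa.edges)
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(moved, nfa.edges)
    return nfa.accept in current


# Stages from the slides: a, b, ab, ab+a, (ab+a)*
ab = concat(symbol("a"), symbol("b"))
nfa = star(union(ab, symbol("a")))
```

Each function mirrors one case of the construction, so the last two lines read as the stages themselves: build ab, take its union with a, then apply the star.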
