
CoSc3112 Compiler Design Chapter II

CHAPTER II: LEXICAL ANALYSIS

2.1 NEED AND ROLE OF LEXICAL ANALYZER
Lexical analysis is the first phase of a compiler. It reads the input characters of the source program from left to right, one character at a time.
It generates a sequence of tokens, one for each lexeme. Each token is a logically cohesive unit such as an identifier, keyword, operator, or punctuation mark.
The lexical analyzer enters lexemes into the symbol table and also reads information back from it.
These interactions are suggested in Figure 2.1.

Figure 2.1: Interactions between the lexical analyzer and the parser
Since the lexical analyzer is the part of the compiler that reads the source text, it may perform
certain other tasks besides identification of lexemes. One such task is stripping out comments and
whitespace (blank, newline, tab). Another task is correlating error messages generated by the
compiler with the source program.
Needs / Roles / Functions of lexical analyzer
 It produces a stream of tokens.
 It eliminates comments and whitespace.
 It keeps track of line numbers.
 It reports any errors encountered while generating tokens.
 It stores information about identifiers, keywords, constants, and so on in the symbol table.
Lexical analyzers are divided into two processes:
a) Scanning consists of the simple processes that do not require tokenization of the input, such
as deletion of comments and compaction of consecutive whitespace characters into one.
b) Lexical analysis is the more complex portion, where the scanner produces the sequence of
tokens as output.
Lexical Analysis versus Parsing / Issues in Lexical analysis
1. Simplicity of design: This is the most important consideration. The separation of lexical and
syntactic analysis often allows us to simplify at least one of these tasks; for example, whitespace
and comments are removed by the lexical analyzer, so the parser need not handle them.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply
specialized techniques that serve only the lexical task, not the job of parsing. In addition,
specialized buffering techniques for reading input characters can speed up the compiler
significantly.
3. Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to
the lexical analyzer.
Tokens, Patterns, and Lexemes
A token is a pair consisting of a token name and an optional attribute value. The token name is an
abstract symbol representing a kind of single lexical unit, e.g., a particular keyword, or a
sequence of input characters denoting an identifier. Operators, special symbols and constants are
also typical tokens.
A pattern is a description of the form that the lexemes of a token may take; it is a set of rules
that describe the token. A lexeme is a sequence of characters in the source program that matches the
pattern for a token.
Table 2.1: Tokens and Lexemes

TOKEN       INFORMAL DESCRIPTION (PATTERN)            SAMPLE LEXEMES
if          characters i, f                           if
else        characters e, l, s, e                     else
comparison  < or > or <= or >= or == or !=            <=, !=
id          letter, followed by letters and digits    pi, score, D2, sum, id_1, AVG
number      any numeric constant                      35, 3.14159, 0, 6.02e23
literal     anything surrounded by " "                "Core", "Design", "Appasami"

In many programming languages, the following classes cover most or all of the tokens:
1. One token for each keyword. The pattern for a keyword is the same as the keyword itself.
2. Tokens for the operators, either individually or in classes such as the token comparison
mentioned in table 2.1.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers and literal strings.
5. Tokens for each punctuation symbol, such as left and right parentheses, comma, and
semicolon.
Attributes for Tokens
When more than one lexeme can match a pattern, the lexical analyzer must provide the subsequent
compiler phases additional information about the particular lexeme that matched.
The lexical analyzer returns to the parser not only a token name, but an attribute value that
describes the lexeme represented by the token.
The token name influences parsing decisions, while the attribute value influences translation of
tokens after the parse.
Information about an identifier - e.g., its lexeme, its type, and the location at which it is first found
(in case an error message must be issued) - is kept in the symbol table.
Thus, the appropriate attribute value for an identifier is a pointer to the symbol-table entry for that
identifier.
Example: The token names and associated attribute values for the Fortran statement E=M
* C ** 2 are written below as a sequence of pairs.
<id, pointer to symbol-table entry for E>
< assign_op >
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2 >
Note that in certain pairs, especially operators, punctuation, and keywords, there is no need for an
attribute value. In this example, the token number has been given an integer-valued attribute.
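The <token, attribute> pairs above can be produced mechanically. The following Python sketch (my own illustration, not the book's implementation) tokenizes the statement E = M * C ** 2; symbol-table "pointers" are modeled as integer indices, and the token names follow the example above.

```python
import re

symtab = {}                    # identifier lexeme -> symbol-table index

def tokenize(src):
    # Order matters: '**' must be tried before '*'
    spec = [('id', r'[A-Za-z_][A-Za-z0-9_]*'), ('number', r'\d+'),
            ('exp_op', r'\*\*'), ('mult_op', r'\*'), ('assign_op', r'='),
            ('ws', r'\s+')]
    pos, out = 0, []
    while pos < len(src):
        for name, pat in spec:
            m = re.match(pat, src[pos:])
            if m:
                lexeme = m.group()
                if name == 'id':
                    # attribute: pointer (index) into the symbol table
                    out.append(('id', symtab.setdefault(lexeme, len(symtab))))
                elif name == 'number':
                    out.append(('number', int(lexeme)))
                elif name != 'ws':     # operators carry no attribute
                    out.append((name, None))
                pos += len(lexeme)
                break
        else:
            raise SyntaxError(f'lexical error at position {pos}')
    return out

print(tokenize('E = M * C ** 2'))
```

The output mirrors the pair sequence shown above: identifiers carry symbol-table indices, number carries its integer value, and operator tokens carry no attribute.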
2.2 LEXICAL ERRORS


It is hard for a lexical analyzer to tell that there is a source-code error without the aid of other
components.
Consider a C program statement fi ( a == f(x)). The lexical analyzer cannot tell whether fi is a
misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the
token id, the lexical analyzer must return the token id to the parser.
However, suppose a situation arises in which the lexical analyzer is unable to proceed because none
of the patterns for tokens matches any prefix of the remaining input. The simplest recovery strategy is "panic mode" recovery.
We delete successive characters from the remaining input, until the lexical analyzer can find a
well-formed token at the beginning of what input is left.
Other possible error-recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
Transformations like these may be tried in an attempt to repair the input. The simplest such strategy
is to see whether a prefix of the remaining input can be transformed into a valid lexeme by a single
transformation.
In practice most lexical errors involve a single character. A more general correction strategy is to
find the smallest number of transformations needed to convert the source program into one that
consists only of valid lexemes.
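Both strategies can be sketched in a few lines. The following Python fragment (an illustration under an assumed token pattern, not the book's code) shows panic-mode recovery and a generator of all single-character repairs:

```python
import re

# Assumed token pattern: identifiers, numbers, and a few operators
TOKEN = re.compile(r'[A-Za-z_]\w*|\d+|[=+*();]')
ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def panic_mode(s):
    """Delete successive characters until some token pattern matches."""
    while s and not TOKEN.match(s):
        s = s[1:]
    return s

def single_edits(s):
    """All strings reachable from s by one of the four repair actions."""
    yield s[1:]                          # 1. delete one character
    for c in ALPHABET:
        yield c + s                      # 2. insert a missing character
        yield c + s[1:]                  # 3. replace a character
    if len(s) >= 2:
        yield s[1] + s[0] + s[2:]        # 4. transpose adjacent characters

print(panic_mode('#@x = 1'))             # 'x = 1'
```

Note that transposing the first two characters of the erroneous fi yields the keyword if, which is why a single-transformation repair often suffices in practice.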

2.3 EXPRESSING TOKENS BY REGULAR EXPRESSIONS

Specification of Tokens
Regular expressions are an important notation for specifying lexeme patterns. Although they cannot
express all possible patterns, they are very effective in specifying the kinds of patterns that we
actually need for tokens.
Strings and Languages
An alphabet is any finite set of symbols. Examples of symbols are letters, digits, and punctuation.
The set {0, 1} is the binary alphabet. ASCII is an important example of an alphabet.
A string (sentence or word) over an alphabet is a finite sequence of symbols drawn from that
alphabet. The length of a string s, usually written |s|, is the number of occurrences of symbols in s.
For example, banana is a string of length six. The empty string, denoted ε, is the string of length
zero.
A language is any countable set of strings over some fixed alphabet. Abstract languages like Φ,
the empty set, or { ε }, the set containing only the empty string, are languages under this
definition.
Parts of Strings:
1. A prefix of string s is any string obtained by removing zero or more symbols from the
end of s. For example, ban, banana, and ε are prefixes of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols from the
beginning of s. For example, nana, banana, and ε are suffixes of banana.
3. A substring of s is obtained by deleting any prefix and any suffix from s. For instance,
banana, nan, and ε are substrings of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes,
and substrings, respectively, of s that are not ε and are not equal to s itself.
5. A subsequence of s is any string formed by deleting zero or more not necessarily
consecutive positions of s. For example, baan is a subsequence of banana.
6. If x and y are strings, then the concatenation of x and y, denoted xy, is the string formed
by appending y to x.
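The six definitions above can be checked directly on the running example banana. A small Python sketch (my own illustration):

```python
from itertools import combinations

s = 'banana'
prefixes   = {s[:i] for i in range(len(s) + 1)}            # remove from the end
suffixes   = {s[i:] for i in range(len(s) + 1)}            # remove from the start
substrings = {s[i:j] for i in range(len(s) + 1)            # remove a prefix
              for j in range(i, len(s) + 1)}               # ... and a suffix
subsequences = {''.join(t) for n in range(len(s) + 1)      # drop any positions
                for t in combinations(s, n)}

print('ban' in prefixes, 'nana' in suffixes,
      'nan' in substrings, 'baan' in subsequences)         # True True True True
```

Note that the empty string ε (here '') is a prefix, suffix, and substring of every string, and concatenation is simply 'ba' + 'nana' == 'banana'.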
Operations on Languages
In lexical analysis, the most important operations on languages are union, concatenation, and
closure, which are defined in table 2.2.
Table 2.2: Definitions of operations on languages

Example: Let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits
{0, 1, ..., 9}. Other languages can be constructed from L and D:
1. L U D is the set of letters and digits - strictly speaking, the language with 62 strings of
length one, each of which is either one letter or one digit.
2. LD is the set of 520 strings of length two, each consisting of one letter followed by one
digit.
3. L4 is the set of all four-letter strings.
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L U D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.
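The operations behind these constructions (union, concatenation, closure) are easy to model on tiny stand-ins for L and D. A Python sketch (my own illustration; the closure is truncated to a maximum length, since L* is infinite):

```python
L = {'a', 'b'}          # letters, reduced to two for illustration
D = {'0', '1'}          # digits, reduced to two

union  = L | D                                   # L U D
concat = {x + y for x in L for y in D}           # LD

def star(lang, max_len=3):
    """Kleene closure of lang, truncated to strings of length <= max_len."""
    result, frontier = {''}, {''}
    while True:
        frontier = {x + y for x in frontier for y in lang
                    if len(x + y) <= max_len}
        if frontier <= result:                   # no new strings: done
            return result
        result |= frontier

print(len(union), len(concat), sorted(star(L, 1)))   # 4 4 ['', 'a', 'b']
```

With the full 52-letter L and 10-digit D, the same expressions give |L U D| = 62 and |LD| = 520, matching items 1 and 2 above.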
Regular expression
A regular expression is a sequence of symbols and characters expressing a string or pattern to be
searched for. Regular expressions are a mathematical notation that describes the set of strings of a
specific language.
The regular expression for identifiers can be written as letter_ ( letter_ | digit )*. The vertical bar
means union, the parentheses are used to group subexpressions, and the star means "zero or more
occurrences of".
Each regular expression r denotes a language L(r), which is also defined recursively from the
languages denoted by r's subexpressions.
The following rules define the regular expressions over some alphabet Σ.
Basis rules:
1. ε is a regular expression, and L(ε) is { ε }.
2. If a is a symbol in Σ , then a is a regular expression, and L(a) = {a}, that is,
the language with one string of length one.
Induction rules: Suppose r and s are regular expressions denoting languages L(r) and L(s),
respectively.
1. (r) | (s) is a regular expression denoting the language L(r) U L(s).
2. (r) (s) is a regular expression denoting the language L(r) L(s) .
3. (r) * is a regular expression denoting (L (r)) * .
4. (r) is a regular expression denoting L(r). That is, additional pairs of parentheses
around an expression do not change the language it denotes.

Example: Let Σ = {a, b}.

REGULAR EXPRESSION   LANGUAGE                              MEANING
a|b                  {a, b}                                a single a or b
(a|b)(a|b)           {aa, ab, ba, bb}                      all strings of length two over Σ
a*                   {ε, a, aa, aaa, ...}                  all strings of zero or more a's
(a|b)*               {ε, a, b, aa, ab, ba, bb, aaa, ...}   all strings of zero or more instances of a or b
a|a*b                {a, b, ab, aab, aaab, ...}            the string a, and all strings of zero or more a's ending in b
A language that can be defined by a regular expression is called a regular set. If two regular
expressions r and s denote the same regular set, we say they are equivalent and write r = s. For
instance, (a|b) = (b|a), (a|b)* = (a*b*)*, (b|a)* = (a|b)*, and (a|b)(b|a) = aa|ab|ba|bb.
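Such equivalences can be checked exhaustively on all short strings, using Python's re module as an oracle (a brute-force sketch of my own, not a proof; it only tests strings up to a bounded length):

```python
import re
from itertools import product

def agree(r, s, max_len=4):
    """True iff r and s match exactly the same strings over {a,b} up to max_len."""
    for n in range(max_len + 1):
        for t in product('ab', repeat=n):
            w = ''.join(t)
            if bool(re.fullmatch(r, w)) != bool(re.fullmatch(s, w)):
                return False
    return True

print(agree(r'(a|b)*', r'(a*b*)*'),      # True
      agree(r'(b|a)*', r'(a|b)*'),       # True
      agree(r'a|a*b', r'a*b|a'))         # True
```

A non-equivalence such as a* versus (a|b)* is caught immediately, since the string b distinguishes them.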

Algebraic laws
Algebraic laws that hold for arbitrary regular expressions r, s, and t:
LAW                              DESCRIPTION
r|s = s|r                        | is commutative
r|(s|t) = (r|s)|t                | is associative
r(st) = (rs)t                    concatenation is associative
r(s|t) = rs|rt; (s|t)r = sr|tr   concatenation distributes over |
εr = rε = r                      ε is the identity for concatenation
r* = (r|ε)*                      ε is guaranteed in a closure
r** = r*                         * is idempotent

Extensions of Regular Expressions

A few notational extensions, first incorporated into Unix utilities such as Lex, are particularly
useful in the specification of lexical analyzers.
1. One or more instances: The unary postfix operator + represents the positive closure of
a regular expression and its language. If r is a regular expression, then (r)+ denotes the
language (L(r))+. Two useful algebraic laws are r* = r+|ε and r+ = rr* = r*r.
2. Zero or one instance: The unary postfix operator ? means "zero or one occurrence."
That is, r? is equivalent to r|ε, and L(r?) = L(r) U {ε}.
3. Character classes: A regular expression a1|a2|…|an, where the ai's are each symbols of
the alphabet, can be replaced by the shorthand [a1a2…an]. Thus, [abc] is shorthand
for a|b|c, and [a-z] is shorthand for a|b|…|z.
Example: Regular definition for C identifiers
letter_ → [A-Za-z_]
digit → [0-9]
id → letter_ ( letter_ | digit )*
Example: Regular definition for unsigned numbers
digit → [0-9]
digits → digit+
number → digits ( . digits )? ( E [+-]? digits )?
Note: The operators *, +, and ? have the same precedence and associativity.
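These two regular definitions translate almost directly into Python re patterns (re syntax stands in here for the Lex notation above; I allow both E and e in the exponent, a small liberty):

```python
import re

letter_ = r'[A-Za-z_]'
digit   = r'[0-9]'

# id     -> letter_ ( letter_ | digit )*
id_re = re.compile(rf'{letter_}({letter_}|{digit})*')

# number -> digits ( . digits )? ( E [+-]? digits )?
number_re = re.compile(rf'{digit}+(\.{digit}+)?([Ee][+-]?{digit}+)?')

print(bool(id_re.fullmatch('sum_1')),
      bool(number_re.fullmatch('6.02e23')),
      bool(number_re.fullmatch('3.')))        # True True False
```

Note that 3. is rejected: the optional fraction ( . digits )? requires at least one digit after the decimal point.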
2.4 CONVERTING REGULAR EXPRESSION TO DFA

To construct a DFA directly from a regular expression, we construct its syntax tree and then
compute four functions: nullable, firstpos, lastpos, and followpos, defined as follows. Each
definition refers to the syntax tree for a particular augmented regular expression (r)#.
1. nullable(n) is true for a syntax-tree node n if and only if the subexpression represented
by n has ε in its language. That is, the subexpression can be "made null" or the empty
string, even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first
symbol of at least one string in the language of the subexpression rooted at n.
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last
symbol of at least one string in the language of the subexpression rooted at n.
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that
there is some string x = a1a2 …an in L((r)#) such that for some i, there is a way to
explain the membership of x in L((r)#) by matching ai to position p of the syntax tree
and ai+1 to position q.
We can compute nullable, firstpos, and lastpos by a straightforward recursion on the height of the
tree. The basis and inductive rules for nullable and firstpos are summarized in table.
The rules for lastpos are essentially the same as for firstpos, but the roles of children c1 and c2 must
be swapped in the rule for a cat-node.
There are only two rules for computing followpos:
1. If n is a cat-node with left child c1 and right child c2, then for every position i in
lastpos(c1), all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are
in followpos(i).
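The four functions can be computed by a direct recursion on the syntax tree. The following Python sketch (helper names and the tuple encoding are my own) builds the tree for (a|b)*abb# with positions 1-6 and computes all four functions:

```python
def leaf(pos, sym): return ('leaf', pos, sym)
def cat(l, r):      return ('cat', l, r)
def alt(l, r):      return ('or', l, r)
def star(c):        return ('star', c)

# Syntax tree for (a|b)*abb#, leaf positions 1..6 as in Figure 2.2
tree = cat(cat(cat(cat(star(alt(leaf(1, 'a'), leaf(2, 'b'))),
                       leaf(3, 'a')), leaf(4, 'b')), leaf(5, 'b')),
           leaf(6, '#'))

def nullable(n):
    k = n[0]
    if k == 'leaf': return False
    if k == 'star': return True
    if k == 'or':   return nullable(n[1]) or nullable(n[2])
    return nullable(n[1]) and nullable(n[2])              # cat-node

def firstpos(n):
    k = n[0]
    if k == 'leaf': return {n[1]}
    if k == 'star': return firstpos(n[1])
    if k == 'or':   return firstpos(n[1]) | firstpos(n[2])
    # cat-node: add firstpos of the right child if the left child is nullable
    return firstpos(n[1]) | (firstpos(n[2]) if nullable(n[1]) else set())

def lastpos(n):
    k = n[0]
    if k == 'leaf': return {n[1]}
    if k == 'star': return lastpos(n[1])
    if k == 'or':   return lastpos(n[1]) | lastpos(n[2])
    # cat-node: roles of the children are swapped relative to firstpos
    return lastpos(n[2]) | (lastpos(n[1]) if nullable(n[2]) else set())

def compute_followpos(n, table):
    k = n[0]
    if k == 'cat':                     # rule 1
        for i in lastpos(n[1]):
            table[i] |= firstpos(n[2])
    elif k == 'star':                  # rule 2
        for i in lastpos(n):
            table[i] |= firstpos(n)
    for child in n[1:]:
        if isinstance(child, tuple):   # recurse into subtrees only
            compute_followpos(child, table)
    return table

fp = compute_followpos(tree, {i: set() for i in range(1, 7)})
print(sorted(firstpos(tree)), sorted(fp[1]), sorted(fp[4]))  # [1, 2, 3] [1, 2, 3] [5]
```

The computed sets reproduce the worked example below: firstpos of the root is {1, 2, 3}, and followpos(3) = {4}, followpos(4) = {5}, followpos(5) = {6}.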

Converting a Regular Expression Directly to a DFA

Algorithm: Construction of a DFA from a regular expression r.
INPUT: A regular expression r.
OUTPUT: A DFA D that recognizes L(r).
METHOD:
1. Construct a syntax tree T from the augmented regular expression (r)#.
2. Compute nullable, firstpos, lastpos, and followpos for T.
3. Construct Dstates, the set of states of DFA D, and Dtran, the transition function for D:

initialize Dstates to contain only the unmarked state firstpos(n0),
where n0 is the root of syntax tree T for (r)#;
while ( there is an unmarked state S in Dstates )
{
    mark S;
    for ( each input symbol a )
    {
        let U be the union of followpos(p) for all p in S that correspond to a;
        if ( U is not in Dstates )
            add U as an unmarked state to Dstates;
        Dtran[S, a] = U;
    }
}

By the above procedure, the states of D are sets of positions in T. Initially, each state is "unmarked,"
and a state becomes "marked" just before we consider its out-transitions. The start state of D is
firstpos(n0), where node n0 is the root of T. The accepting states are those containing the position
for the endmarker symbol #.

Example: Construct a DFA for the regular expression r = (a|b)*abb

Figure 2.2: Syntax tree for (a|b)*abb#

Figure 2.3 : firstpos and lastpos for nodes in the syntax tree for (a|b)*abb#
We must also apply rule 2 to the star-node. That rule tells us positions 1 and 2 are in both
followpos(1) and followpos(2), since both firstpos and lastpos for this node are {1, 2}. The complete
followpos sets are summarized in the table below.
NODE n   followpos(n)
1        {1, 2, 3}
2        {1, 2, 3}
3        {4}
4        {5}
5        {6}
6        {}

Figure 2.4: Directed graph for the function followpos


nullable is true only for the star-node, and we exhibited firstpos and lastpos in Figure 2.3. The value
of firstpos for the root of the tree is {1,2,3}, so this set is the start state of D. Call this set of states A.
We must compute Dtran[A, a] and Dtran[A, b]. Among the positions of A, 1 and 3 correspond to a,
while 2 corresponds to b. Thus, Dtran[A, a] = followpos(1) U followpos(3) = {1, 2, 3, 4}, and
Dtran[A, b] = followpos(2) = {1, 2, 3}.
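Continuing the construction mechanically produces the whole DFA. The Python sketch below (my own illustration) hardcodes the followpos sets and position symbols from the worked example and runs the Dstates/Dtran loop of the algorithm:

```python
sym = {1: 'a', 2: 'b', 3: 'a', 4: 'b', 5: 'b', 6: '#'}
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
start = frozenset({1, 2, 3})             # firstpos of the root

dstates, dtran, work = {start}, {}, [start]
while work:
    S = work.pop()                       # "mark" state S
    for a in 'ab':
        U = frozenset(q for p in S if sym[p] == a for q in followpos[p])
        if U not in dstates:
            dstates.add(U)
            work.append(U)               # new unmarked state
        dtran[S, a] = U

def accepts(w):
    s = start
    for c in w:
        s = dtran[s, c]
    return 6 in s                        # contains position of '#': accepting

print(len(dstates), accepts('abb'), accepts('ab'))   # 4 True False
```

The loop discovers exactly the four states A = {1,2,3}, B = {1,2,3,4}, {1,2,3,5}, and {1,2,3,6} of Figure 2.5, the last being accepting because it contains position 6 of the endmarker #.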

Figure 2.5: DFA constructed for (a|b)*abb#


The latter is state A, and so does not have to be added to Dstates, but the former, B = {1,2,3,4}, is
new, so we add it to Dstates and proceed to compute its transitions. The complete DFA is shown in
Figure 2.5.
Example: Construct an ε-NFA for (a|b)*abb and convert it to a DFA by subset construction.

Figure 2.6: ε-NFA for (a|b)*abb


Figure 2.7: NFA for (a|b)*abb

Figure 2.8: Result of applying the subset construction to Figure 2.6

2.5 MINIMIZATION OF DFA


There can be many DFA's that recognize the same language. For instance, the DFAs of Figure 2.5
and 2.8 both recognize the same language L((a|b)*abb).
We would generally prefer a DFA with as few states as possible, since each state requires entries in
the table that describes the lexical analyzer.
Algorithm: Minimizing the number of states of a DFA.
INPUT: A DFA D with set of states S, input alphabet Σ, initial state s0, and set of accepting states F.
OUTPUT: A DFA D' accepting the same language as D and having as few states as possible.
METHOD:
1. Start with an initial partition Π with two groups, F and S - F, the accepting and
nonaccepting states of D.
2. Apply the following procedure to construct a new partition Πnew:

initially, let Πnew = Π;
for ( each group G of Π )
{
    partition G into subgroups such that two states s and t are in the same subgroup
    if and only if, for all input symbols a, states s and t have transitions on a to
    states in the same group of Π; /* at worst, a state will be in a subgroup by itself */
    replace G in Πnew by the set of all subgroups formed;
}

3. If Πnew = Π, let Πfinal = Π and continue with step (4). Otherwise, repeat step (2) with
Πnew in place of Π.
4. Choose one state in each group of Πfinal as the representative for that group. The
representatives will be the states of the minimum-state DFA D'.
5. The other components of D' are constructed as follows:
(a) The start state of D' is the representative of the group containing the start state of D.
(b) The accepting states of D' are the representatives of those groups that contain an
accepting state of D.
(c) Let s be the representative of some group G of Πfinal, and let the transition of D from
s on input a be to state t. Let r be the representative of t's group H. Then in D', there
is a transition from s to r on input a.
Example: Let us reconsider the DFA of Figure 2.8 for minimization.
STATE a b
A B C
B B D
C B C
D B E
(E) B C
The initial partition consists of the two groups {A, B, C, D} and {E}, which are respectively the
nonaccepting states and the accepting states.
To construct Πnew, the procedure considers both groups and inputs a and b. The group {E}
cannot be split, because it has only one state, so {E} will remain intact in Πnew.
The other group {A, B, C, D} can be split, so we must consider the effect of each input symbol. On
input a, each of these states goes to state B, so there is no way to distinguish these states using
strings that begin with a. On input b, states A, B, and C go to members of group {A, B, C, D},
while state D goes to E, a member of another group.
Thus, in Πnew, group {A, B, C, D} is split into {A, B, C} and {D}, and Πnew for this round is {A, B,
C}{D}{E}.
In the next round, we can split {A, B, C} into {A, C}{B}, since A and C each go to a
member of {A, B, C} on input b, while B goes to a member of another group, {D}. Thus, after the
second round, Πnew = {A, C}{B}{D}{E}.
For the third round, we cannot split the one remaining group with more than one state, since A and
C each go to the same state (and therefore to the same group) on each input. We conclude that
Πfinal = {A, C}{B}{D}{E}.
Now, we shall construct the minimum-state DFA. It has four states, corresponding to the four
groups of Πfinal, and let us pick A, B, D, and E as the representatives of these groups. The
initial state is A, and the only accepting state is E.
Table: Transition table of the minimum-state DFA
STATE   a   b
A       B   A
B       B   D
D       B   E
(E)     B   A
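The refinement rounds above can be run mechanically. The Python sketch below (my own illustration of the partition-refinement step) minimizes the five-state DFA of Figure 2.8 and recovers the groups {A, C}{B}{D}{E}:

```python
delta = {('A', 'a'): 'B', ('A', 'b'): 'C', ('B', 'a'): 'B', ('B', 'b'): 'D',
         ('C', 'a'): 'B', ('C', 'b'): 'C', ('D', 'a'): 'B', ('D', 'b'): 'E',
         ('E', 'a'): 'B', ('E', 'b'): 'C'}
states, accepting = set('ABCDE'), {'E'}

partition = [accepting, states - accepting]      # initial Π = {F, S - F}
while True:
    def group_of(s):
        return next(i for i, g in enumerate(partition) if s in g)
    new = []
    for g in partition:
        # two states stay together iff, for every input symbol, their
        # transitions lead into the same group of the current partition
        buckets = {}
        for s in g:
            key = tuple(group_of(delta[s, a]) for a in 'ab')
            buckets.setdefault(key, set()).add(s)
        new.extend(buckets.values())
    if len(new) == len(partition):               # Πnew = Π: fixed point
        break
    partition = new

print(sorted(sorted(g) for g in partition))      # [['A', 'C'], ['B'], ['D'], ['E']]
```

The loop splits {A, B, C, D} into {A, B, C}{D} in the first round and {A, C}{B} in the second, exactly as in the hand computation.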

2.6 LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS - LEX

There is a wide range of tools for constructing lexical analyzers based on regular expressions. Lex
is a tool (computer program) that generates lexical analyzers from regular expressions that describe
the patterns for tokens. Its input notation is referred to as the Lex language, and the tool itself as
the Lex compiler.

Use of Lex
 The Lex compiler transforms the input patterns into a transition diagram and
generates code.
 An input file “lex.l” is written in the Lex language and describes the lexical analyzer
to be generated. The Lex compiler transforms “lex.l” to a C program, in a file that is
always named “lex.yy.c”.
 The file “lex.yy.c” is compiled by the C compiler into a file “a.out”. The C compiler's
output is a working lexical analyzer that can take a stream of input characters and
produce a stream of tokens.
 The attribute value, whether it be another numeric code, a pointer to the symbol
table, or nothing, is placed in a global variable yylval, which is shared between the
lexical analyzer and the parser.

Figure 2.9: Creating a lexical analyzer with Lex

Structure of Lex Programs


A Lex program has the following form:
declarations
%%
translation rules
%%
auxiliary functions

The declarations section includes declarations of variables, manifest constants (identifiers declared
to stand for a constant, e.g., the name of a token), and regular definitions.
The translation rules of a Lex program each have the form Pattern { Action }:
P1 { Action A1 }
P2 { Action A2 }

Pn { Action An }
Each pattern is a regular expression. The actions are fragments of code typically written in C
language.
The third section holds whatever additional functions are used in the actions. Alternatively, these
functions can be compiled separately and loaded with the
lexical analyzer.
The lexical analyzer begins reading its remaining input, one character at a time, until it finds the
longest prefix of the input that matches one of the patterns Pi. It then executes the associated action
Ai. Typically, Ai will return to the parser, but if it does not (e.g., because Pi describes whitespace or
comments), then the lexical analyzer proceeds to find additional lexemes, until one of the
corresponding actions causes a return to the parser. The lexical analyzer returns a single value, the
token name, to the parser, but uses the shared, integer variable yylval to pass additional information
about the lexeme found.
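The longest-prefix discipline described above can be sketched in Python (an illustration of the matching rule, not Lex's generated C code; the rule set is an assumption). On a tie in length, the earlier rule wins, which is how the keyword pattern beats the identifier pattern:

```python
import re

rules = [('if', r'if'), ('id', r'[A-Za-z_]\w*'),
         ('number', r'\d+'), ('relop', r'<=|>=|==|!=|<|>'),
         ('ws', r'[ \t\n]+')]

def lex(src):
    tokens, pos = [], 0
    while pos < len(src):
        best = None
        for name, pat in rules:
            m = re.match(pat, src[pos:])
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())   # strictly longer match wins;
                                           # ties keep the earlier rule
        if best is None:                   # no pattern matches: lexical error
            raise SyntaxError(f'lexical error at position {pos}')
        name, lexeme = best
        pos += len(lexeme)
        if name != 'ws':                   # whitespace produces no token
            tokens.append((name, lexeme))
    return tokens

print(lex('if x1 <= 42'))
```

Note that on input ifx the identifier pattern matches the longer prefix ifx, so no keyword token is produced; this is exactly the longest-match behavior described above.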
2.7 DESIGN OF LEXICAL ANALYZER FOR A SAMPLE LANGUAGE


A lexical-analyzer generator such as Lex is architected around an automaton simulator. The
implementation of the Lex compiler can be based on either an NFA or a DFA.
2.7.1 The Structure of the Generated Analyzer
Figure 2.10 shows the architecture of a lexical analyzer generated by Lex. A Lex program is
converted into a transition table and actions which are used by a finite Automaton simulator.
The program that serves as the lexical analyzer includes a fixed program that simulates an
automaton; the automaton is deterministic or nondeterministic. The rest of the lexical analyzer
consists of components that are created from the Lex program by Lex itself.

Figure 2.10: A Lex program is turned into a transition table and actions, which are used by a finite-
automaton simulator

These components are:


1. A transition table for the automaton.
2. Those functions that are passed directly through Lex to the output.
3. The actions from the input program, which appear as fragments of code to be invoked at the
appropriate time by the automaton simulator.

2.7.2 Pattern Matching Based on NFA's


To construct the automaton for several regular expressions, we combine all the NFAs into one by
introducing a new start state with ε-transitions to each of the start states of the NFAs Ni
for pattern pi, as shown in Figure 2.11.

Figure 2.11: An NFA constructed from a Lex program


Example: Consider the following three patterns and their associated actions:

a    { action A1 for pattern p1 }
abb  { action A2 for pattern p2 }
a*b+ { action A3 for pattern p3 }

Figure 2.12: NFA's for a, abb, and a*b+

Figure 2.13: Combined NFA

Figure 2.14: Sequence of sets of states entered when processing input aaba
Figure 2.15: Transition graph for DFA handling the patterns a, abb, and a*b+
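The behavior shown in Figure 2.14 can be imitated with Python's re module as a stand-in for the combined automaton (a sketch of the longest-match resolution only, not an NFA simulation): each pattern reports the longest prefix it matches, and the longest overall wins, with ties going to the earliest pattern.

```python
import re

patterns = [('p1', r'a'), ('p2', r'abb'), ('p3', r'a*b+')]

def longest_match(src):
    """Return (pattern name, lexeme) for the longest prefix match of src."""
    best = None
    for name, pat in patterns:
        m = re.match(pat, src)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(longest_match('aaba'))    # ('p3', 'aab')
```

On input aaba, pattern p1 matches the prefix a and p3 matches the longer prefix aab, so action A3 would be executed, matching the sequence of state sets in Figure 2.14.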
