
Compiler Design Part 2

This document discusses compiler design and lexical analysis. It covers the following key points: - Lexical analysis is the first phase of a compiler that identifies lexemes in source code and produces tokens. It is implemented using patterns and finite automata. - The main tasks of a lexical analyzer are to read source characters, group them into lexemes, and produce tokens that are passed to the parser. - Regular expressions are used to define patterns for tokens. Finite automata are constructed to recognize the regular languages defined by patterns. - Attributes may be associated with tokens to provide additional information to later compiler phases. Lexical errors are also handled. - Methods for minimizing the number of states in a DFA are also described.

Uploaded by KDCreatives
Copyright © All Rights Reserved

COMPILER DESIGN

INTRODUCTION
Finite automata and lexical Analysis:
⚫A lexical analyzer can be built automatically by specifying the
lexeme patterns to a lexical-analyzer generator and
compiling those patterns into code that functions as a
lexical analyzer.
⚫This also speeds up the process of implementing the
lexical analyzer, since the programmer specifies the
software at the very high level of patterns and relies on
the generator to produce the detailed code.
⚫A widely used lexical-analyzer generator is called Lex.
The Role of the Lexical Analyzer:
⚫As the first phase of a compiler, the main task of the lexical
analyzer is to read the input characters of the source program,
group them into lexemes, and produce as output a sequence
of tokens for each lexeme in the source program. The stream
of tokens is sent to the parser for syntax analysis.
⚫Commonly, the interaction is implemented by having the
parser call the lexical analyzer.
⚫The call, suggested by the getNextToken command, causes
the lexical analyzer to read characters from its input until it
can identify the next lexeme and produce for it the next
token, which it returns to the parser.
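The interaction described above can be sketched in code. The following is a minimal illustration (not from the slides): the names `Lexer`, `get_next_token`, and the token specification are hypothetical, chosen only to show how the parser-side call returns one token per lexeme.

```python
import re

# Hypothetical token specification: each entry is (token name, pattern).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),          # digits form a number lexeme
    ("ID",     r"[A-Za-z_]\w*"), # identifiers
    ("OP",     r"[+\-*/=]"),     # single-character operators
    ("SKIP",   r"\s+"),          # whitespace separates lexemes; never returned
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

class Lexer:
    def __init__(self, text):
        self.pos, self.text = 0, text

    def get_next_token(self):
        # Read characters until the next lexeme is identified,
        # then return its token to the caller (the parser).
        while self.pos < len(self.text):
            m = MASTER.match(self.text, self.pos)
            self.pos = m.end()
            if m.lastgroup != "SKIP":
                return (m.lastgroup, m.group())
        return ("EOF", "")

lex = Lexer("count = count + 1")
tokens = []
while True:
    tok = lex.get_next_token()
    tokens.append(tok)
    if tok[0] == "EOF":
        break
# tokens: ID, OP, ID, OP, NUMBER, then EOF
```

Each call to `get_next_token` consumes exactly one lexeme, which is the pull-style interaction the slide describes.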
The Role of the Lexical Analyzer:

⚫ Since the lexical analyzer is the part of the compiler that reads the source
text, it may perform certain other tasks besides identification of lexemes.
⚫ One such task is stripping out comments and whitespace (blank, newline,
tab, and perhaps other characters that are used to separate tokens in the
input).
Sometimes lexical analyzers are divided into a cascade of two
processes:

⚫ Scanning consists of the simple processes that do not require
tokenization of the input, such as deletion of comments and
compaction of consecutive whitespace characters into one.
⚫ Lexical analysis proper is the more complex portion, which
produces tokens from the output of the scanner.
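The scanning half of the cascade can be sketched as a small preprocessing function. This is an illustrative sketch, assuming C-style `/* */` and `//` comments; the function name `scan` is hypothetical.

```python
import re

def scan(source):
    # Scanning stage: delete comments and compact runs of
    # consecutive whitespace characters into a single blank.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # /* ... */
    source = re.sub(r"//[^\n]*", " ", source)                    # // to end of line
    return re.sub(r"\s+", " ", source).strip()

cleaned = scan("x = 1;  /* init */\n// done\ny = 2;")
# cleaned == "x = 1; y = 2;"
```

Lexical analysis proper would then tokenize `cleaned` without ever seeing comments or whitespace runs.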
A) Lexical Analysis Versus Parsing
⚫ There are a number of reasons why the analysis portion of a
compiler is normally separated into lexical analysis and parsing
(syntax analysis) phases.
I. Simplicity of design is the most important consideration.
The separation of lexical and syntactic analysis often
allows us to simplify at least one of these tasks.
II. Compiler efficiency is improved.
III.Compiler portability is enhanced. Input-device-specific
peculiarities can be restricted to the lexical analyzer.
B) Tokens, Patterns, and Lexemes
⚫ A token is a pair consisting of a token name and an optional attribute
value. The token name is an abstract symbol representing a kind of
lexical unit. The token names are the input symbols that the parser
processes.
⚫ A pattern is a description of the form that the lexemes of a token
may take. In the case of a keyword as a token, the pattern is just the
sequence of characters that form the keyword. For identifiers and
some other tokens, the pattern is a more complex structure that is
matched by many strings.
⚫ A lexeme is a sequence of characters in the source program that
matches the pattern for a token and is identified by the lexical
analyzer as an instance of that token.
Example
⚫ To see how these concepts are used in practice, in the C
statement
printf("Total = %d\n", score);
both printf and score are lexemes matching the pattern for token
id, and "Total = %d\n" is a lexeme matching literal.
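The token/pattern/lexeme distinction can be made concrete with two patterns. The sketch below is illustrative: the patterns for the tokens **id** and **literal** are simplified assumptions, not a full C lexer.

```python
import re

ID_PATTERN = re.compile(r"[A-Za-z_]\w*")  # pattern for token "id"
LITERAL_PATTERN = re.compile(r'"[^"]*"')  # simplified pattern for token "literal"

stmt = r'printf("Total = %d\n", score);'

# The string literal is one lexeme matching the pattern for "literal".
literal = LITERAL_PATTERN.search(stmt).group()

# Remove it first so identifiers inside the quotes are not matched,
# then every remaining match of ID_PATTERN is a lexeme of token "id".
ids = ID_PATTERN.findall(stmt.replace(literal, ""))
# ids == ["printf", "score"]
```

Here `"id"` and `"literal"` are token names, the regexes are patterns, and `printf`, `score`, and `"Total = %d\n"` are the lexemes.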
C) Attributes for Tokens
⚫ When more than one lexeme can match a pattern, the lexical analyzer
must provide the subsequent compiler phases additional information
about the particular lexeme that matched.
⚫ Example: the token names and associated attribute values for the
statement

E = M * C**2

⚫ are written below as a sequence of pairs:

<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
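A sketch of how such pairs might be produced follows. The function and token names are hypothetical; the attribute of an **id** is taken to be its index in a toy symbol table, and the attribute of a **number** is its value.

```python
import re

symtab = []  # toy symbol table: a list of identifier names

def attr_tokens(text):
    out = []
    # "**" must be tried before "*" so exponentiation wins.
    for m in re.finditer(r"\*\*|[A-Za-z_]\w*|\d+|=|\*", text):
        lx = m.group()
        if lx == "=":
            out.append(("assign_op", None))
        elif lx == "**":
            out.append(("exp_op", None))
        elif lx == "*":
            out.append(("mult_op", None))
        elif lx.isdigit():
            out.append(("number", int(lx)))       # attribute = numeric value
        else:
            if lx not in symtab:
                symtab.append(lx)
            out.append(("id", symtab.index(lx)))  # attribute = symbol-table index
    return out

pairs = attr_tokens("E = M * C ** 2")
```

Tokens like `assign_op` carry no attribute, while `id` and `number` carry the extra information later phases need.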


D) Lexical Errors
⚫ It is hard for a lexical analyzer to tell, without the aid of other
components, that there is a source-code error.
⚫ For instance, if the string fi is encountered for the first time in a C
program in the context:
fi ( a == f(x)) ...
⚫ A lexical analyzer cannot tell whether fi is a misspelling of the
keyword if or an undeclared function identifier.
⚫ Other possible error-recovery actions are:
a) Delete one character from the remaining input.
b) Insert a missing character into the remaining input.
c) Replace a character by another character.
d) Transpose two adjacent characters.
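The four recovery actions above can be sketched as single-edit repairs of a lexeme. This is an illustrative sketch, assuming a toy keyword set; the helper name `single_edit_repairs` is hypothetical.

```python
import string

KEYWORDS = {"if", "else", "while"}  # toy keyword set for the sketch

def single_edit_repairs(w):
    letters = string.ascii_lowercase
    deletes    = [w[:i] + w[i+1:] for i in range(len(w))]                       # a)
    inserts    = [w[:i] + c + w[i:] for i in range(len(w) + 1) for c in letters]  # b)
    replaces   = [w[:i] + c + w[i+1:] for i in range(len(w)) for c in letters]  # c)
    transposes = [w[:i] + w[i+1] + w[i] + w[i+2:] for i in range(len(w) - 1)]   # d)
    return set(deletes + inserts + replaces + transposes)

# Transposing the two adjacent characters of "fi" recovers the keyword "if".
repairs = single_edit_repairs("fi") & KEYWORDS
# repairs == {"if"}
```

In practice such repairs are attempted conservatively, since the analyzer cannot be sure which action the programmer intended.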
Regular Expression
⚫ Notations
If r and s are regular expressions denoting the
languages L(r) and L(s), then
⚫ Union : (r)|(s) is a regular expression denoting L(r) U
L(s)
⚫ Concatenation : (r)(s) is a regular expression
denoting L(r)L(s)
⚫ Kleene closure : (r)* is a regular expression denoting
(L(r))*
⚫ Parentheses : (r) is a regular expression denoting L(r)
Precedence and Associativity

⚫ *, concatenation (.), and | (pipe sign) are left
associative
⚫ * has the highest precedence
⚫ Concatenation (.) has the second highest
precedence.
⚫ | (pipe sign) has the lowest precedence of all.
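These precedence rules mean that a|bc* is read as a|(b(c*)). A quick check with Python's `re` module, which uses the same precedence:

```python
import re

# * binds tightest, then concatenation, then |, so a|bc* == a|(b(c*)).
pat = re.compile(r"a|bc*")

assert pat.fullmatch("a")            # left alternative
assert pat.fullmatch("b")            # c* matches zero c's
assert pat.fullmatch("bccc")         # c* matches many c's
assert pat.fullmatch("ac") is None   # NOT (a|b)c* and NOT (a|bc)*
```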
Finite Automata Construction
⚫ States : States of FA are represented by circles.
State names are written inside circles.
⚫ Start state : The state from which the automaton
starts is known as the start state. The start state has an
arrow pointed towards it.
⚫ Intermediate states : All intermediate states
have at least two arrows; one pointing to and
another pointing out from them.
⚫ Final state : If the input string is successfully parsed, the
automaton is expected to be in this state. A final state is
represented by double circles.
⚫ Transition : The transition from one state to another state
happens when a desired symbol in the input is found. Upon
transition, the automaton can either move to the next state or
stay in the same state. Movement from one state to another
is shown as a directed arrow, where the arrow points to the
destination state. If the automaton stays in the same state, an
arrow pointing from the state to itself is drawn.
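The construction above can be sketched as a transition table. The DFA below is an illustrative example (not from the slides): over the alphabet {0, 1} it accepts strings ending in 1, with start state s0 and final (double-circle) state s1.

```python
# Transition table: (current state, input symbol) -> next state.
DFA = {
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s0", ("s1", "1"): "s1",
}
START, FINAL = "s0", {"s1"}

def accepts(s):
    state = START
    for ch in s:                 # each input symbol triggers one transition
        state = DFA[(state, ch)]
    return state in FINAL        # accept iff we end in a final state

assert accepts("01101")          # ends in 1 -> accepted
assert not accepts("100")        # ends in 0 -> rejected
```

The dictionary plays the role of the arrows in the diagram: self-loops are entries whose next state equals the current state.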
Minimizing the Number of States of a DFA

⚫ There are two methods for minimizing the number of
states and transitions:

1. Empirical Method
2. Formal Method
Empirical Method
⚫ This method minimizes the number of states
and transitions according to the following
steps:-
❑ We merge two states in the DFA into one state if they
have the same important states. (An important
state is one with a non-ε out-transition.)
❑ We merge two states if they either both include or
both exclude accepting states of the NFA.
Example: Minimize the DFA using the Empirical
method.
2- Formal Method
⚫ In this method we do not need experience to decide
which states are legal to merge; instead
we use the following algorithm:-
⚫ Algorithm: - Minimizing the number of states of a DFA
⚫ Input: - A DFA M with set of states S, input alphabet Σ, transitions
defined for all states and inputs, initial state s0, and set
of final states F.
⚫ Output: - A DFA M' accepting the same language as M and
having as few states as possible.
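The formal method is usually realized by partition refinement: start with the partition {final states, non-final states} and split any group whose members disagree on which group some input symbol leads to. The sketch below illustrates this under assumed data; the example DFA (states A..E, final state E) is hypothetical, and `minimize` is a simple, unoptimized version of the algorithm.

```python
# Hypothetical example DFA over {0, 1}.
states = {"A", "B", "C", "D", "E"}
sigma = {"0", "1"}
delta = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "B", ("B", "1"): "D",
         ("C", "0"): "B", ("C", "1"): "C",
         ("D", "0"): "B", ("D", "1"): "E",
         ("E", "0"): "B", ("E", "1"): "C"}
finals = {"E"}

def minimize(states, sigma, delta, finals):
    # Initial partition: final vs. non-final states.
    partition = [finals, states - finals]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for group in partition:
            # Two states stay together only if, for every input symbol,
            # they transition into the same group of the current partition.
            buckets = {}
            for s in group:
                key = tuple(
                    next(i for i, g in enumerate(partition) if delta[(s, a)] in g)
                    for a in sorted(sigma)
                )
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())
            if len(buckets) > 1:
                changed = True
        partition = new_partition
    return partition  # each group becomes one state of the minimized DFA M'

groups = minimize(states, sigma, delta, finals)
# A and C are indistinguishable and merge; the minimized DFA has 4 states.
```

Each surviving group becomes one state of M', so M' accepts the same language with as few states as possible.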
THANK YOU
