
CDUnit 1

Uploaded by

kusumelekhaz

UNIT-I: Introduction

• Language Processors
• The Structure of a Compiler
• Lexical Analysis: The Role of the Lexical Analyzer
• Specification of Tokens
• Recognition of Tokens
• The Lexical-Analyzer Generator Lex

Compiler Design 1
Compiler :

A compiler is a program that reads a program in one language (the
source language) and translates it into an equivalent program in
another language (the target language).

If the target program is an executable machine-language
program, it can then be called by the user to process inputs
and produce outputs

Interpreter :
An interpreter is another common kind of language
processor. Instead of producing a target program as a
translation, an interpreter appears to directly execute the
operations specified in the source program on inputs
supplied by the user

For example, Java language processors combine compilation and
interpretation. A Java source program may first be compiled into an
intermediate form called bytecodes. The bytecodes are then
interpreted by a virtual machine.

A benefit of this arrangement is that bytecodes compiled on one
machine can be interpreted on another machine, perhaps across a
network. To achieve faster processing of inputs to outputs, some Java
compilers, called just-in-time compilers, translate the bytecodes into
machine language immediately before running the program.

Language Processors:
In addition to a compiler, several other programs may be required to
create an executable target program as shown in Fig

Preprocessor :
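A preprocessor collects the source program, which may be divided into
modules stored in separate files. The preprocessor may also expand
shorthands, called macros, into source-language statements. The
modified source program is then fed to a compiler.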

Compiler :
The compiler may produce an assembly-language program as its
output, because assembly language is easier to produce as output and
is easier to debug.

Assembler :
The assembly language is then processed by a program called an
assembler that produces relocatable machine code as its output.

Linkers and Loaders :


Large programs are often compiled in pieces, so the relocatable
machine code may have to be linked together with other relocatable
object files and library files into the code that actually runs on the
machine.
The linker resolves external memory addresses, where the code in one
file may refer to a location in another file.
The loader then puts all of the executable object files into memory
for execution.
Structure of a compiler :

A compiler has two major parts:

• Analysis
• Synthesis

In the analysis phase, an intermediate representation is created from
the given source program.
The Lexical Analyzer, Syntax Analyzer, and Semantic Analyzer are the
parts of this phase.
In the synthesis phase, the equivalent target program is created from
this intermediate representation.
The Intermediate Code Generator, Code Optimizer, and Code Generator
are the parts of this phase.

Phases of a compiler:

• A compiler consists of 6 phases.

• Each phase transforms the source program from one representation
into another.

• The phases communicate with the error handler.

• The phases communicate with the symbol table.

Lexical Analysis :

• Lexical analysis is the first phase of the compilation process.

• The lexical analyzer reads the stream of characters making up the
source program and groups the characters into meaningful sequences
called lexemes.

• For each lexeme, the lexical analyzer produces a token of the form
<token-name, attribute-value>
that it passes on to the subsequent phase, syntax analysis.

• token-name: an abstract symbol that is used during syntax analysis.

• attribute-value: points to an entry in the symbol table for this token.

• The lexical analyzer puts information about identifiers into the
symbol table.

• Example: position = initial + rate * 60
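The example statement can be traced with a small scanner. The following is a minimal sketch, not a production lexer: the function name `tokenize`, the pattern `TOKEN_RE`, and the choice to number symbol-table entries from 1 are all assumptions made for illustration.

```python
import re

# Hypothetical patterns covering only this example: identifiers, integers, and = + *
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=+*]))")

def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    source = source.strip()
    symbol_table = []   # identifier number i is stored at index i - 1
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"no token pattern matches at position {pos}")
        pos = m.end()
        if m.group("id"):
            name = m.group("id")
            if name not in symbol_table:
                symbol_table.append(name)      # install the identifier once
            tokens.append(("id", symbol_table.index(name) + 1))
        elif m.group("number"):
            tokens.append(("number", int(m.group("number"))))
        else:
            tokens.append((m.group("op"), None))   # operators need no attribute
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
```

Here `tokens` holds one `<token-name, attribute-value>` pair per lexeme, and the three identifiers occupy symbol-table entries 1, 2, and 3 in order of first appearance.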

Syntax analysis :

• Syntax analysis is the second phase of the compilation process.

• It takes tokens as input and generates a parse tree as output. In
the syntax analysis phase, the parser checks whether the expression
made by the tokens is syntactically correct.

• A typical representation is a syntax tree in which each interior
node represents an operation and the children of the node
represent the arguments of the operation.
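As an illustration of a syntax tree, the sketch below parses an assignment whose right-hand side uses only + and *, and returns nested tuples of the form (operator, left child, right child). The grammar in the comments and the function name `parse_assignment` are assumptions for this example; real parsers handle far more.

```python
def parse_assignment(tokens):
    """Parse `id = expr`, where expr uses + and * and * binds tighter."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                      # factor -> id | number
        tok = advance()
        if tok in ("=", "+", "*"):
            raise SyntaxError(f"unexpected operator {tok!r}")
        return tok

    def term():                        # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                        # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse_assignment(["position", "=", "initial", "+", "rate", "*", "60"])
```

The interior nodes of `tree` are the operators =, +, and *; the multiplication sits deepest because it must be evaluated first.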

Semantic analysis :

• Semantic analysis is the third phase of the compilation process.

• It checks whether the parse tree follows the rules of the language.
• The semantic analyzer keeps track of identifiers, their types, and
expressions.

• The output of the semantic analysis phase is the annotated syntax tree.

Intermediate Code Generation :

• In intermediate code generation, the compiler translates the source
program into intermediate code.

• Intermediate code lies between the high-level language and the
machine language.

• The intermediate code should be generated in such a way that it can
be easily translated into the target machine code.
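One common intermediate form is three-address code, in which each instruction has at most one operator on its right-hand side. The sketch below is an assumption-laden illustration: the nested-tuple tree format, the temporary names t1, t2, ..., and the function name `three_address_code` are all hypothetical.

```python
from itertools import count

def three_address_code(tree):
    """Emit three-address instructions for a nested-tuple syntax tree."""
    code = []
    temps = count(1)                    # source of fresh temporary numbers

    def gen(node):
        if isinstance(node, str):       # leaf: identifier or constant
            return node
        op, left, right = node
        lhs = gen(left)
        rhs = gen(right)
        if op == "=":
            code.append(f"{lhs} = {rhs}")
            return lhs
        temp = f"t{next(temps)}"        # hold the intermediate result
        code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    gen(tree)
    return code

tree = ("=", "position", ("+", "initial", ("*", "rate", "60")))
code = three_address_code(tree)
```

Each line of `code` maps closely onto a machine instruction or a short instruction sequence, which is what makes the later translation to target code straightforward.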

Code Optimization :

• Code optimization improves the intermediate code so that the
resulting target program runs faster and takes less space.
• It removes unnecessary lines of code and rearranges the sequence of
statements in order to speed up program execution.

Code Generation :

• Code generation is the final stage of the compilation process. It
takes the optimized intermediate code as input and maps it to the
target machine language.
• The code generator translates the intermediate code into the machine
code of the specified computer.

Lexical Analysis :
• Lexical analysis is the first phase of a compiler.
• The main task of the lexical analyzer is to read the input characters
of the source program, group them into lexemes, and produce as output
a token for each lexeme in the source program.

• The stream of tokens is sent to the parser for syntax analysis.

• It is common for the lexical analyzer to interact with the symbol table.
• The lexical analyzer may also perform tasks other than identifying
lexemes. One such task is stripping out comments and whitespace (blank,
newline, tab, and perhaps other characters that are used to separate
tokens in the input).
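Stripping comments and whitespace can be sketched with a single regular expression. This example assumes C-style comments (`/* ... */` and `// ...`), which the slides do not specify, and it ignores the complication that comment markers may appear inside string literals.

```python
import re

# Assumed comment syntax: /* ... */ and // to end of line.
# The outer (?: ... )+ merges adjacent comments and whitespace into one match.
COMMENT_OR_WS = re.compile(r"(?:/\*.*?\*/|//[^\n]*|\s+)+", re.DOTALL)

def strip_comments_and_whitespace(source):
    """Replace every run of comments/whitespace with a single separator."""
    return COMMENT_OR_WS.sub(" ", source).strip()

cleaned = strip_comments_and_whitespace("x = 1; /* set x */\n\ty = 2; // set y\n")
```

A single space is kept between lexemes so that neighbouring tokens are not accidentally glued together.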

The role of lexical analyzer :

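[Figure: Source program -> Lexical Analyzer -> Parser -> Parse tree.
The lexical analyzer returns a token to the parser each time the
parser calls getNextToken, and it interacts with the symbol table.]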

Lexical Analysis Versus Parsing
Lexical analysis is separated from parsing for several reasons:
• Simplicity of design is the most important consideration.
• Compiler efficiency is improved.
• Compiler portability is enhanced.

Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an optional
attribute value.
• A pattern is a description of the form that the lexemes of a token
may take.
• A lexeme is a sequence of characters in the source program that
matches the pattern for a token.

Example:
In many programming languages, the following classes cover most or all of
the tokens:

Token        Informal description                     Sample lexemes

if           Characters i, f                          if
else         Characters e, l, s, e                    else
comparison   < or > or <= or >= or == or !=           <=, !=
id           Letter followed by letters and digits    pi, score, D2
number       Any numeric constant                     3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "          "core dumped"

In the C statement
printf("total = %d\n", score);
both printf and score are lexemes matching the pattern for token id, and
"total = %d\n" is a lexeme matching the pattern for token literal.
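The token classes in the table can be tried out on the printf statement. The sketch below is illustrative only: the regular expressions are one plausible reading of the informal descriptions, and the extra `punct` and `ws` classes are assumptions added so that the sample line can be scanned completely.

```python
import re

# (token name, pattern) pairs, tried in order at each position.
TOKEN_SPEC = [(name, re.compile(pattern)) for name, pattern in [
    ("if",         r"if\b"),
    ("else",       r"else\b"),
    ("comparison", r"<=|>=|==|!=|<|>"),
    ("id",         r"[A-Za-z][A-Za-z0-9]*"),
    ("number",     r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("literal",    r'"[^"]*"'),
    ("punct",      r"[(),;]"),   # assumption: not in the table
    ("ws",         r"\s+"),      # assumption: skipped, never returned
]]

def classify(source):
    """Return a list of (token name, lexeme) pairs for source."""
    tokens, pos = [], 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            m = pattern.match(source, pos)
            if m:
                break
        else:
            raise ValueError(f"no pattern matches at position {pos}")
        if name != "ws":
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens

result = classify('printf("total = %d\\n", score);')
```

Both `printf` and `score` come back as `id`, and the quoted format string comes back as a single `literal` lexeme.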

Attributes for tokens
The token names and attribute values for the statement
E = M * C ** 2
are:
<id, pointer to symbol table entry for E>
<assign-op>
<id, pointer to symbol table entry for M>
<mult-op>
<id, pointer to symbol table entry for C>
<exp-op>
<number, integer value 2>

Lexical errors
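Some errors are beyond the power of the lexical analyzer to recognize:
fi (a == b) …
Here fi is a valid lexeme for the token id, so the lexical analyzer
cannot tell whether it is a misspelling of if or an undeclared identifier.
However, it may be able to recognize errors like:
d = 2r
Such errors are recognized when no pattern for tokens matches a
character sequence.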

Error recovery
• Panic mode: successive characters are ignored until the lexical
analyzer reaches a well-formed token
• Delete one character from the remaining input
• Insert a missing character into the remaining input
• Replace a character by another character
• Transpose two adjacent characters
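Panic-mode recovery is the simplest of these strategies to sketch. In the example below, the token pattern and the function name `scan_with_panic_mode` are assumptions; whenever no pattern matches, the scanner drops one character and tries again from the next position.

```python
import re

# Hypothetical token set: identifiers, integers, and a few operators.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[=+\-*/()]")

def scan_with_panic_mode(source):
    """Return (tokens, skipped characters) using panic-mode recovery."""
    tokens, skipped = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(source, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
        else:
            skipped.append(source[pos])   # panic mode: discard and retry
            pos += 1
    return tokens, skipped

tokens, skipped = scan_with_panic_mode("d = 2 $# r")
```

Recording the skipped characters lets the compiler report the error while still continuing to scan the rest of the input.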

Specification of tokens
In the theory of compilation, regular expressions are used to
formalize the specification of tokens.
Regular expressions are a means of specifying regular languages.
Example:
• letter (letter | digit)*

Each regular expression is a pattern specifying the form of strings.

Regular expressions
• ε is a regular expression, L(ε) = {ε}
• If a is a symbol in Σ, then a is a regular expression, L(a) = {a}
• (r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
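The language operations behind these rules can be tried on small finite sets of strings. This is a sketch under stated assumptions: the helper names are hypothetical, and since L* is infinite whenever L contains a non-empty string, `star` only builds strings made of at most `max_repeats` pieces.

```python
from itertools import product

def union(L, M):
    """L(r) ∪ L(s)"""
    return L | M

def concat(L, M):
    """L(r)L(s): every string of L followed by every string of M."""
    return {x + y for x, y in product(L, M)}

def star(L, max_repeats):
    """Finite approximation of L*: zero up to max_repeats pieces from L."""
    result = {""}            # zero repetitions gives the empty string ε
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, L)
        result |= layer
    return result

letter = {"a", "b"}
digit = {"0", "1"}
# letter (letter | digit)* restricted to at most one trailing symbol
ids = concat(letter, star(union(letter, digit), 1))
```

With these tiny alphabets, `ids` contains strings such as a, b, and a0, but never a string that starts with a digit.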

Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn

Example:
letter -> A | B | … | Z | a | b | … | z
digit -> 0 | 1 | … | 9
id -> letter (letter | digit)*

Extensions
One or more instances: (r)+
Zero or one instance: r?
Character classes: [abc]

Example:
letter_ -> [A-Za-z_]
digit -> [0-9]
id -> letter_ (letter_ | digit)*
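These definitions translate almost directly into Python's `re` syntax, which supports the same character-class shorthand. The variable names below mirror the definitions; `is_identifier` is a hypothetical helper added for this sketch.

```python
import re

# Regular definitions written as reusable pattern fragments.
letter_ = r"[A-Za-z_]"
digit = r"[0-9]"
id_pattern = re.compile(f"{letter_}(?:{letter_}|{digit})*")

def is_identifier(text):
    """True when the whole string matches the id definition."""
    return id_pattern.fullmatch(text) is not None
```

Building `id_pattern` from the smaller fragments plays the same role as a regular definition: each name stands for an expression that later definitions may reuse.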

Recognition of tokens
The starting point is the language grammar, which shows the tokens
that must be recognized:
stmt -> if expr then stmt
      | if expr then stmt else stmt
expr -> term relop term
      | term
term -> id
      | number

Recognition of tokens (cont.)
• The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
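The patterns for number, relop, and ws can be checked against sample strings. This sketch assumes the usual reading of the number pattern, with one or more digits before the optional fraction and in the exponent. Note that Python's `|` picks the first alternative that matches rather than the longest, so the two-character operators are listed first.

```python
import re

digits = r"[0-9]+"
# number: digits, optional fraction, optional exponent
number = re.compile(rf"{digits}(?:\.{digits})?(?:E[+-]?{digits})?")
# relop: longer operators first because Python alternation is ordered
relop = re.compile(r"<=|>=|<>|<|>|=")
# ws: one or more blanks, tabs, or newlines
ws = re.compile(r"[ \t\n]+")

def is_number(text):
    """True when the whole string matches the number pattern."""
    return number.fullmatch(text) is not None
```

Strings such as 60, 3.14, and 6.02E23 are accepted, while a bare fraction like .5 or a trailing dot like 1. is rejected by this pattern.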

Transition diagrams
Transition diagram for relop
relop -> < | > | <= | >= | = | <>
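A transition diagram for relop can be coded as a function that inspects one character, then at most one more. The attribute names LT, LE, NE, EQ, GT, and GE and the function name `scan_relop` are assumptions for this sketch. When the second character does not extend the operator, it is left unconsumed, which corresponds to retracting the input by one position.

```python
def scan_relop(text, pos=0):
    """Return (attribute, next position) for a relop at pos, else None."""
    ch = text[pos] if pos < len(text) else ""
    nxt = text[pos + 1] if pos + 1 < len(text) else ""
    if ch == "<":
        if nxt == "=":
            return ("LE", pos + 2)
        if nxt == ">":
            return ("NE", pos + 2)
        return ("LT", pos + 1)     # other character: retract, keep only '<'
    if ch == "=":
        return ("EQ", pos + 1)
    if ch == ">":
        if nxt == "=":
            return ("GE", pos + 2)
        return ("GT", pos + 1)     # other character: retract, keep only '>'
    return None                    # not a relational operator
```

Each `if` corresponds to an edge out of a state, and each `return` corresponds to an accepting state of the diagram.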

Transition diagrams (cont.)
Transition diagram for reserved words and
identifiers
id -> letter (letter|digit)*
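Because reserved words also match the id pattern, one common approach is to scan the lexeme as an identifier and then look it up in a table of reserved words. The sketch below assumes that approach; the set `RESERVED` and the function name `scan_word` are hypothetical.

```python
import re

RESERVED = {"if", "then", "else"}           # keywords from the grammar
ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")  # letter (letter | digit)*

def scan_word(text, pos=0):
    """Return (token name, lexeme, next position), or None if no id starts at pos."""
    m = ID.match(text, pos)
    if m is None:
        return None
    lexeme = m.group()
    # A reserved word is its own token name; anything else is an id.
    name = lexeme if lexeme in RESERVED else "id"
    return (name, lexeme, m.end())
```

The lookup happens only after the whole lexeme has been read, so a word like thenx is an ordinary identifier rather than the keyword then followed by x.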

Transition diagrams (cont.)
Transition diagram for unsigned numbers
number -> digits (. digits)? (E [+-]? digits)?

Transition diagrams (cont.)
Transition diagram for whitespace
ws -> (blank | tab | newline)+

Lexical Analyzer Generator - Lex

• Lex is a tool that allows one to specify a lexical analyzer by
writing regular expressions that describe the patterns for tokens.

• The input notation is the Lex language; a program written in it is
a Lex specification.

• The Lex compiler transforms the input patterns into a transition
diagram and generates code in a file called lex.yy.c.

Lexical analyzer with LEX

Lex source program (lex.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens

Structure of Lex programs :

A Lex program has the following form:
declarations
%%
translation rules
%%
auxiliary functions

• The translation rules each have the form
Pattern { Action }

• The declarations section includes declarations of variables, manifest
constants (identifiers declared to stand for a constant, e.g., the name of a
token), and regular definitions.
• The translation rules each have the form Pattern { Action }.
• Each pattern is a regular expression.
• Each action is a fragment of code written in C.
• The third section holds whatever additional functions are used in the actions.
• Alternatively, these functions can be compiled separately and loaded with
the lexical analyzer.

Conflict Resolution in Lex
There are two rules that Lex uses to decide on the proper lexeme to
select when several prefixes of the input match one or more
patterns:

1. Always prefer a longer prefix to a shorter prefix.
2. If the longest possible prefix matches two or more
patterns, prefer the pattern listed first in the Lex program.
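The two rules can be imitated in a few lines. The sketch below is not how Lex is implemented (Lex compiles all patterns into one automaton); it simply tries every rule at the current position, keeps the longest match, and breaks ties in favour of the rule listed first. The rule list and the function name `next_token` are assumptions.

```python
import re

# Rules in priority order, as they would be listed in a Lex program.
RULES = [(name, re.compile(pattern)) for name, pattern in [
    ("if", r"if"),
    ("lt", r"<"),
    ("le", r"<="),
    ("id", r"[A-Za-z_][A-Za-z_0-9]*"),
]]

def next_token(text, pos=0):
    """Return (rule name, lexeme) chosen by longest match, then rule order."""
    best_name, best_match = None, None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strictly longer wins (rule 1); on a tie the earlier rule stays (rule 2).
        if m and (best_match is None or m.end() > best_match.end()):
            best_name, best_match = name, m
    if best_match is None:
        return None
    return (best_name, best_match.group())
```

With these rules, `<=` is one token even though `<` is listed earlier, `if` is the keyword because it ties with id and comes first, and `iffy` is an identifier because the id rule matches a longer prefix.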

The Lookahead Operator
• Lex automatically reads one character ahead of the last character that forms
the selected lexeme, and then retracts the input so only the lexeme itself is
consumed from the input.

• However, sometimes we want a certain pattern to be matched to the input only
when it is followed by certain other characters. If so, we may use the slash (/)
in a pattern to indicate the end of the part of the pattern that matches the
lexeme.

• What follows / is an additional pattern that must be matched before we can
decide that the token in question was seen, but what matches this second
pattern is not part of the lexeme.
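Python's `re` module has no slash operator, but its lookahead assertion `(?=...)` behaves in a comparable way and can be used to experiment with the idea. The sketch below is an analogy, not Lex syntax. It assumes a Fortran-like setting, not taken from the slides, in which IF is a keyword only when it is followed by a parenthesized condition and then a letter.

```python
import re

# Roughly the Lex pattern  IF / \( .* \) letter  written with a lookahead.
IF_KEYWORD = re.compile(r"IF(?=\(.*\)[A-Za-z])")

def match_if_keyword(text):
    """Return the matched lexeme, or None when the lookahead fails."""
    m = IF_KEYWORD.match(text)
    return m.group() if m else None

keyword = match_if_keyword("IF(I.EQ.J)K=1")    # condition followed by a statement
not_keyword = match_if_keyword("IF(I,J)=3")    # assignment to an array named IF
```

In the first call the lexeme is just IF: the text matched by the lookahead is checked but stays in the input for later tokens.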
