
UNIT - 1
Compiler Design (CD)
What is a Compiler?
A compiler is software that converts a program written in a high-level language (the source language) into a low-level language (the object, target, or machine language).

It can be pictured with the following diagram:

    source program  -->  Compiler  -->  target program
Phases of a Compiler
Compilation has two major phases, each of which in turn has many parts. Each phase takes its input from the output of the previous one, and they work in a coordinated way.

• Analysis Phase – an intermediate representation is created from the given source code.
• Synthesis Phase – the equivalent target program (machine code) is created from the intermediate representation.
Analysis Phase includes:
• Lexical Analyzer
• Syntax Analyzer
• Semantic Analyzer
• Intermediate Code Generator
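For example, for the assignment statement position = initial + rate * 60, the lexical analyzer groups the characters into a token stream along the following lines (the token names here are illustrative):

    <id, position> <=> <id, initial> <+> <id, rate> <*> <num, 60>

The syntax analyzer then builds a parse tree from this stream, the semantic analyzer checks it (for instance, that the integer 60 may be combined with floating-point operands), and the intermediate code generator emits an intermediate form such as three-address code.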
A LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS
There is a wide range of tools for constructing lexical analyzers. Two of the best known are:
• Lex
• YACC

Lex is a computer program that generates lexical analyzers; it is commonly used with the yacc parser generator (YACC itself generates parsers rather than lexical analyzers).
Creating a lexical analyzer

• First, a specification of a lexical analyzer is prepared by creating a program lex.l in the Lex language. Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.

• Finally, lex.yy.c is run through the C compiler to produce an object program a.out, which is the lexical analyzer that transforms an input stream into a sequence of tokens.

Fig 1.11 Creating a lexical analyzer with Lex
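In practice these two steps correspond to commands like the following on a Unix-like system (with Flex the library flag is -lfl rather than -ll):

    lex lex.l             produces lex.yy.c
    cc lex.yy.c -ll       produces a.out, the lexical analyzer
    ./a.out < input.txt   runs the analyzer on an input stream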

Lex Specification
A Lex program consists of three parts:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
Definitions include declarations of variables, constants, and regular definitions.

• Rules are statements of the form

    p1 {action1}
    p2 {action2}
    ...
    pn {actionn}

where each pi is a regular expression and each actioni describes what action the lexical analyzer should take when pattern pi matches a lexeme. Actions are written in C code.

• User subroutines are auxiliary procedures needed by the actions. These can be compiled separately and loaded with the lexical analyzer.
Flex regular expressions

In addition to the usual regular expressions, Flex introduces some new notations.

[abcd]
A bracketed expression describes a set of characters; [abcd] is equivalent to (a|b|c|d).

[0-9]
In brackets, a dash indicates a range of characters. For example, [a-zA-Z] matches any single letter. If you want a dash as one of the characters, put it first.

[^abcd]
A leading caret complements the set: this matches any character except a, b, c, or d. For example, [^a-zA-Z] matches any non-letter.
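Combining these notations, one can write patterns such as the following (hypothetical examples, shown only to illustrate the notation):

    [a-zA-Z_][a-zA-Z0-9_]*    a C-style identifier
    [0-9]+(\.[0-9]+)?         an integer or a simple decimal number
    [^ \t\n]+                 any run of non-whitespace characters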
Input characters are read from secondary storage, but reading from secondary storage one character at a time is costly. Hence a buffering technique is used: a block of data is first read into a buffer and then scanned by the lexical analyzer.
There are two methods used in this context:
1. One Buffer Scheme
2. Two Buffer Scheme
One Buffer Scheme:
In this scheme, only one buffer is used to store the input string. The problem with this scheme is that if a lexeme is very long and crosses the buffer boundary, the buffer has to be refilled to scan the rest of the lexeme, which overwrites the first part of the lexeme.
Two Buffer Scheme:
To overcome the problem of the one buffer scheme, this method uses two buffers to store the input string. The two buffers are scanned alternately: when the end of the current buffer is reached, the other buffer is filled.
Initially both bp (the begin pointer) and fp (the forward pointer) point to the first character of the first buffer. fp then moves to the right in search of the end of the lexeme. As soon as a blank character is recognized, the string between bp and fp is identified as the corresponding token. To identify the boundary of the first buffer, an end-of-buffer character is placed at its end; the end of the second buffer is likewise marked by an end-of-buffer character at its end. When fp encounters the first eof, the end of the first buffer is recognized and filling of the second buffer begins. In the same way, the second eof indicates the end of the second buffer. The two buffers are filled alternately in this way until the end of the input program is reached and the stream of tokens has been identified.
This eof character introduced at the end is called a sentinel, and it is used to identify the end of a buffer.
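A minimal C sketch of the two-buffer scheme with sentinels follows. The buffer size N, the choice of '\0' as the sentinel character, and the function names are all assumptions made for illustration; the bp (lexeme-begin) pointer is omitted for brevity:

    #include <stdio.h>

    #define N 4096              /* size of each buffer half -- an assumed value  */
    #define SENTINEL '\0'       /* '\0' assumed as the end-of-buffer (eof) mark  */

    static char buf[2 * (N + 1)];   /* two halves, each with one extra sentinel slot */
    static char *fp = buf;          /* forward pointer */
    static FILE *in;                /* input stream; open it and call fill(buf) before scanning */

    /* Read up to N characters into one half and terminate it with the sentinel. */
    static char *fill(char *half) {
        size_t n = fread(half, 1, N, in);
        half[n] = SENTINEL;
        return half;
    }

    /* Return the next input character, switching halves whenever a sentinel is hit. */
    static int next_char(void) {
        char c = *fp++;
        if (c != SENTINEL)
            return (unsigned char)c;
        if (fp == buf + N + 1)              /* sentinel at the end of the first half   */
            fp = fill(buf + N + 1);         /* refill second half, continue there      */
        else if (fp == buf + 2 * (N + 1))   /* sentinel at the end of the second half  */
            fp = fill(buf);                 /* refill first half, wrap around          */
        else
            return EOF;                     /* sentinel inside a half: true end of input */
        return next_char();                 /* re-read from the freshly filled half    */
    }

The benefit of the sentinel is that the common case costs only one comparison per character; the buffer-boundary tests run only when a sentinel is actually seen.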
