UNIT I BKS Lesson 3 Lexical Analysis and Role of Lexical Analyzer

Compiler notes

Uploaded by

siyadogra98

Learn Compiler Design: From B. K. Sharma

UNIT I

Lexical Analysis: Token, Lexeme, Pattern and Role of Lexical Analyzer

Unit I: Syllabus
• Introduction to Compiler
• Structure of a compiler
• Lexical Analysis
• Role of Lexical Analyzer
• Input Buffering
• Specification of Tokens
• Recognition of Tokens
• Lex
• Finite Automata
• Regular Expressions to Automata
• Minimizing DFA

Active Learning Activity: Diagnostic Assessment

One-Minute Paper

List the 6 phases of a compiler.

List the 8 components of the structure of a compiler.

Summary of Lesson 2: Structure of a Compiler

The compiler's structure is modular, consisting of the following 8 components:

1: Lexical Analyzer takes as input a stream of characters and produces as output a stream of tokens (words or phrases) along with their associated syntactic categories.

2: Syntax Analyzer takes as input a stream of tokens and recognizes their structure according to the grammar of the language, producing a parse tree or syntax tree as output.

3: Semantic Analyzer checks the static semantics of the language and annotates the syntax tree with type information.

4: Intermediate Code Generator produces machine-independent, and therefore portable, code.

5: Code Optimizer produces better but semantically equivalent code.

6: Target Code Generator produces the final low-level target code, which is in assembly language or machine language.

7: Symbol Table is a data structure used to store information about identifiers, their types, and other attributes.

8: Error Handler ensures the compiler can detect and report errors to the programmer, aiding in code correction and improvement.

Mapping of Lesson with Course Outcome (CO)

Lesson: Lesson 3: Lexical Analysis and Role of Lexical Analyzer
CO: Apply the knowledge of theory of computation in specifying and recognizing tokens

Token, Lexeme and Pattern
Token:
A group of characters having a collective meaning.

The smallest individual unit of a language that is recognized.

Token types or classes:

Identifiers, Keywords, Operators, Numbers, Delimiters, Parentheses

Lexical Analysis and Lexical Analyzer

[Figure: Source code → Scanner → Tokens → Parser → IR, with Errors reported]

Lexical Analysis:
The task of breaking an input into its smallest meaningful units, called tokens.
Lexical Analyzer (Scanner):
A program that reads input characters and produces a sequence of tokens as output.

Tasks or Role of Lexical Analyzer (Scanner)

Main Role:
Read the characters of the source program (a stream of characters) and break it up into tokens, the smallest meaningful units of the source language.

Other Roles:
• Remove white space and tabs
• Remove comments
• Interpret compiler directives
• Insert tokens into the symbol table (ST)
• Generate (report) lexical errors
• Send tokens to the parser

Token, Lexeme and Pattern
Lexeme:

A lexeme is a particular instance of a token.

e.g. Token: identifier, lexeme: a, x, y

Token: number, lexeme: 2.5

Token: operator, lexeme: +, -, *, /

Token: parenthesis, lexeme: (, )

Token: keyword, lexeme: int, if (note that main is an ordinary identifier in C, not a keyword)

Token, Lexeme and Pattern
Pattern:

The rule describing how the lexemes of a token can be formed.

e.g. Identifier: ([a-z]|[A-Z]) ([a-z]|[A-Z]|[0-9])*

a letter followed by zero or more letters and digits

e.g. Number: [0-9]+

one or more digits
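
As a quick check of the two patterns above, they can be written as Python regular expressions and tried on a few lexemes. The helper name `classify` and the sample lexemes are assumptions made for this sketch.

```python
import re

# The two patterns from the slide, written as Python regular expressions.
IDENTIFIER = re.compile(r"([a-z]|[A-Z])([a-z]|[A-Z]|[0-9])*")
NUMBER = re.compile(r"[0-9]+")

def classify(lexeme):
    """Return the token class whose pattern matches the whole lexeme, or None."""
    if IDENTIFIER.fullmatch(lexeme):
        return "identifier"
    if NUMBER.fullmatch(lexeme):
        return "number"
    return None

for lexeme in ["x", "count2", "42", "2x", "2.5"]:
    print(lexeme, "->", classify(lexeme))
```

Note that `2x` matches neither pattern because an identifier must start with a letter, and `2.5` is rejected because `[0-9]+` describes unsigned integers only.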


Lexical Analysis: Token Example

Let us consider a C-language statement:

if (x==3)

Tokens are:
KEYWORD, LPAR, IDENT, EQ, NUMBER, RPAR

Token and lexeme pairs are:

<KEYWORD, "if"> <LPAR, "("> <IDENT, "x">

<EQ, "=="> <NUMBER, "3"> <RPAR, ")">


Lexical Analysis: Token Example

Character stream: i f ( x = = 3 )

Scanner

Token stream: IF, LPAR, IDENT, EQ, NUMBER, RPAR, ..., EOF (must end with EOF)

Lexical Analysis: Token Attribute Example

Input to the lexical analyzer: y := 31 + 28*x

Output passed to the parser, as <token, tokenval> pairs (tokenval is the token attribute):

<id, "y"> <assign, ":="> <num, 31> <+, > <num, 28> <*, > <id, "x">
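
The token and attribute pairs for this example can be reproduced with a short Python sketch. The function name `tokenize` and the choice of `None` for an empty attribute are assumptions made for illustration. Numbers carry their integer value as the attribute, and operators carry no attribute.

```python
import re

# One alternative per token class; leading white space is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<num>[0-9]+)|(?P<id>[A-Za-z][A-Za-z0-9]*)|(?P<assign>:=)|(?P<op>[+*]))"
)

def tokenize(text):
    """Return a list of (token, tokenval) pairs for the given input."""
    text = text.rstrip()
    pos, pairs = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"lexical error at offset {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group(m.lastgroup)
        if kind == "num":
            pairs.append(("num", int(lexeme)))   # attribute is the numeric value
        elif kind == "op":
            pairs.append((lexeme, None))         # the operator is its own token
        else:
            pairs.append((kind, lexeme))         # id and assign keep the lexeme
    return pairs

print(tokenize("y := 31 + 28*x"))
```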
Lexical Analysis: Token Example

Let us consider another C statement:
if (x==y)
    z = 12;
else
    z = 3;

The scanner sees it as the character stream:
i f ( x = = y ) \n \t z = 1 2 ; \n e l s e \n \t z = 3 ; \n

and produces the token stream (white space is discarded):

<KEY, "if"> <LPAR> <ID, "x"> <LOP, "=="> <ID, "y">

<RPAR> <ID, "z"> <OP, "="> <INT, "12"> <SEMIC>

<KEY, "else"> <ID, "z"> <OP, "="> <INT, "3"> <SEMIC>
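
A minimal Python sketch of a scanner for the if/else statement above is given here. It uses the token names from the example; the keyword set, the function name `tokens_of` and the way errors are raised are assumptions made for this sketch. Keywords are recognized by first matching the identifier pattern and then looking the lexeme up in a keyword set.

```python
import re

KEYWORDS = {"if", "else"}   # assumption: only the keywords used in this example

# Order matters: "==" must be tried before "=".
SPEC = [
    ("WS",    r"[ \t\n]+"),
    ("INT",   r"[0-9]+"),
    ("ID",    r"[A-Za-z][A-Za-z0-9]*"),
    ("LOP",   r"=="),
    ("OP",    r"="),
    ("LPAR",  r"\("),
    ("RPAR",  r"\)"),
    ("SEMIC", r";"),
]
SCANNER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))

def tokens_of(source):
    """Return the (token, lexeme) pairs of source, discarding white space."""
    result, pos = [], 0
    while pos < len(source):
        m = SCANNER.match(source, pos)
        if m is None:
            raise SyntaxError(f"lexical error at offset {pos}: {source[pos]!r}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "WS":
            continue                    # blanks, tabs and newlines are dropped
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEY"                # reserved words are not identifiers
        result.append((kind, lexeme))
    return result

source = "if (x==y)\n\tz =12;\nelse\n\tz = 3;\n"
for token in tokens_of(source):
    print(token)
```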

Active Learning Activity

One-Minute Paper
Consider the following code in C language:
printf("i=%d, j=%f, &i=%x\n", i, j, &i);

The number of tokens found by the lexical analyzer is?

a) 10 b) 35 c) 12 d) 46
Active Learning Activity

One-Minute Paper

Everything inside " " in printf() is counted as a single token.
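
The rule above can be checked with a small Python sketch that treats a whole string literal as one token and counts the rest individually. The pattern covers only the token kinds that appear in the printf statement of the question; the names `C_TOKEN` and `count_tokens` are assumptions made for this example, and characters the pattern does not know are silently skipped.

```python
import re

C_TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"        # a string literal, escapes included: ONE token
  | [A-Za-z_][A-Za-z0-9_]*   # identifier or keyword
  | [0-9]+                   # integer constant
  | [(){};,&]                # single-character punctuators used in the example
''', re.VERBOSE)

def count_tokens(statement):
    """Count the tokens in statement; white space between tokens is ignored."""
    return len(C_TOKEN.findall(statement))

# Raw string, so \n stays as the two characters written in the C source.
statement = r'printf("i=%d, j=%f, &i=%x\n", i, j, &i);'
print(C_TOKEN.findall(statement))
print(count_tokens(statement))
```

For the statement in the question this sketch reports 12 tokens: printf, (, the string literal, three commas, i, j, &, i, ) and ;.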
Active Learning Activity

One-Minute Paper
Consider the following code in C language:
#include<stdio.h>
int main()
{
    printf("%d + %d =%d",3,1,4);
    return 0;
}

The number of lexemes after pre-processing is?

a) 10 b) 12 c) 20 d) 5
Active Learning Activity

One-Minute Paper
During pre-processing, file inclusion and macro substitution are performed, and pre-processing directives and comments are removed.

What is the need for separating parser from scanner?

[Figure: Source code → Scanner → Tokens → Parser, with Errors reported]

The lexical analyzer does not have to be a separate phase, but having a separate phase simplifies the design and improves efficiency and portability.

Reasons for separating the two analyses:

1) Simpler design.

Separation allows one or the other to be simplified.

Example: A parser that must also deal with comments and white space is more complex.

2) Compiler efficiency is improved.

Lexical analysis can be optimized on its own, which pays off because a large amount of time is spent reading the source program and partitioning it into tokens.

3) Compiler portability is enhanced.

Only the scanner needs to communicate with the outside world.

Input alphabet peculiarities and other device-specific anomalies can be restricted to the lexical analyzer.

4) Specialization.

Specialized techniques can be applied to improve the lexical analysis process.

Summary of Lesson 3: Lexical Analysis and Role of Lexical Analyzer
1: A group of characters having a collective meaning, or the smallest individual unit of a language that is recognized, is called a token.
2: A lexeme is a particular instance of a token.
3: The rule describing how a token can be formed is called a pattern.
4: The primary task of the lexical analyzer is to read the characters of the source program (a stream of characters) and break it up into tokens, the smallest meaningful units of the source language.
5: The secondary tasks of the lexical analyzer are: remove white space and tabs, remove comments, interpret compiler directives, insert tokens into the symbol table, report lexical errors, and send tokens to the parser.
6: Separating the scanner from the parser simplifies the design and improves efficiency and portability.

Active Learning Activity

One-Minute Paper

When the expression sum=3+2 is tokenized, what is the token category of 3?

a) Identifier b) Assignment operator

c) Integer Literal d) Addition Operator

Lexical Analysis: Questions

Explain the terms token, lexeme and pattern with examples.

What is the need for separating the parser from the scanner?

Active Learning Activity: After-Learning Process to Check Important Takeaways and Collect Feedback for Improvement in the Teaching and Learning Process
One-Minute Paper
1) What was the most interesting part of the session?
2) What was the most confusing part of the session?
3) Give a score out of 10 for this session:
If it is not 10 out of 10, tell me why not.
