Lexical Analyzer

Using Lex
Lexical Analysis
• First phase of a Compiler
• Also called Scanning
• Scans the character stream of the Source
program
• Groups them into meaningful sequences
– Output: A sequence of tokens
Role of Lexical Analyzer
Identify Tokens
Remove Whitespace
Install lexemes in the symbol table
Return tokens to the parser

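[Figure: Source Program → Lexical Analyzer → (Token) → Parser → to semantic analysis. The parser requests each token with getNextToken. The Lexical Analyzer and the Parser both use the Symbol Table. The Lexical Analyzer performs the functions listed above.]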
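The "install lexeme in symbol table" step can be sketched in C as follows. This is only a minimal illustration, not the method the slides prescribe: the name install, the fixed-size array, and the limits MAX_SYMBOLS and MAX_LEXEME are all assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 256   /* assumed capacity, not from the slides */
#define MAX_LEXEME  64    /* assumed maximum lexeme length + 1 */

static char symtab[MAX_SYMBOLS][MAX_LEXEME];
static int  symcount = 0;

/* Returns the table index of lexeme, inserting it first if it is new.
 * Returns -1 if the table is full or the lexeme is too long. */
int install(const char *lexeme)
{
    assert(lexeme != NULL);

    /* Linear search: reuse the existing entry if the lexeme was seen before. */
    for (int i = 0; i < symcount; i++)
        if (strcmp(symtab[i], lexeme) == 0)
            return i;

    if (symcount == MAX_SYMBOLS || strlen(lexeme) >= MAX_LEXEME)
        return -1;

    strcpy(symtab[symcount], lexeme);
    return symcount++;
}
```

Calling install("Study") twice returns the same index both times, so each distinct lexeme is stored once. A real analyzer would more likely use a hash table, but the linear search keeps the sketch short.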

Tokens
Source Program: a program in any language

do {                        do → token DO
    Study;                  Study → token ID (install in Symbol Table)
} while (t_CGPA < 3.90)
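To make the example concrete, here is a hand-written sketch in C of what a getNextToken routine might do with that input. The token codes other than DO and ID, the function name get_next_token, and its signature are assumptions made for illustration. A Lex-generated analyzer is organized differently.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; the slide only names DO and ID. */
enum token { TOK_EOF, TOK_DO, TOK_WHILE, TOK_ID, TOK_NUM, TOK_OTHER };

/* Scans one token starting at *src, copies its lexeme into lexeme[]
 * (at most cap-1 characters), advances *src, and returns the token code. */
enum token get_next_token(const char **src, char *lexeme, size_t cap)
{
    assert(src != NULL && *src != NULL && lexeme != NULL && cap > 1);
    const char *p = *src;
    size_t n = 0;
    enum token tok;

    while (isspace((unsigned char)*p))          /* remove whitespace */
        p++;

    if (*p == '\0') {
        tok = TOK_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* identifier or keyword: letters, digits, underscores */
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n + 1 < cap) lexeme[n++] = *p;
            p++;
        }
        tok = TOK_ID;
    } else if (isdigit((unsigned char)*p)) {
        /* digits with an optional fraction (no exponent in this sketch) */
        while (isdigit((unsigned char)*p)) {
            if (n + 1 < cap) lexeme[n++] = *p;
            p++;
        }
        if (*p == '.' && isdigit((unsigned char)p[1])) {
            do {
                if (n + 1 < cap) lexeme[n++] = *p;
                p++;
            } while (isdigit((unsigned char)*p));
        }
        tok = TOK_NUM;
    } else {
        lexeme[n++] = *p++;                     /* any other single character */
        tok = TOK_OTHER;
    }
    lexeme[n] = '\0';

    /* Keywords look like identifiers, so check them after scanning. */
    if (tok == TOK_ID) {
        if (strcmp(lexeme, "do") == 0)         tok = TOK_DO;
        else if (strcmp(lexeme, "while") == 0) tok = TOK_WHILE;
    }
    *src = p;
    return tok;
}
```

Called repeatedly on the do/while fragment, this returns TOK_DO for do, TOK_ID for Study and t_CGPA, TOK_NUM for 3.90, and TOK_OTHER for each punctuation character.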
Lexical Analyzer
• No need to write the analyzer code by hand
• Tools exist that produce the analyzer
– Lex
Lex Tool
lex.l (Lex source) → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
Source program → a.out → Tokens
Token, Pattern, Lexeme
• Token: Set of strings that represent a particular construct in the source language
• Pattern: Rule that describes that set of strings
– It matches each string in the set
• Lexeme: Sequence of characters that is matched by the pattern for a token
Example
Token        Sample Lexemes                     Pattern Description
WHILE        while                              while
RELOP        <, <=, >, >=, <>, ==               < or <= or > or >= or <> or ==
ID           count, account, flag2              letter followed by letters and digits
C comment    /* hubi jabi/* aro habi jabi */    anything between /* and */
NUM          3.14, 3.2E+5, 5.9E-2               sequence of digits having fraction and exponent
Structure of Lex Programs
%{
#include <stdio.h>
/* anything here is copied directly to lex.yy.c */
int Word_count;
%}
Declarations // regular definitions
%%
Translation rules // token matching & actions
%%
Auxiliary functions / user subroutines // any other functions
Translation rules
• Pattern { Action }
– Pattern: a regular expression, to match the token
– Action: C code, to do the work for that token
Regular Expressions
• Specifies a set of strings to match
• One expression for each token pattern
• Some expressions
– [ \t\n] // a single delimiter
– [ \t\n]+ // white space
– a(b)* // a followed by zero or more occurrences of b: a, ab, abb, abbb
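These expressions can be tried outside Lex with the POSIX regex library. The sketch below is an assumption-laden illustration, not part of Lex itself: the helper name matches is made up, and the anchors ^ and $ are added so that the whole string must match (Lex instead matches the longest prefix at the current input position).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches the POSIX extended regular expression, else 0.
 * An invalid pattern is treated as "no match" in this sketch. */
int matches(const char *pattern, const char *s)
{
    assert(pattern != NULL && s != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;

    int ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

For example, matches("^a(b)*$", "abbb") succeeds while matches("^a(b)*$", "ba") does not, and matches("^[ \t\n]+$", " \t\n") succeeds because the C compiler turns \t and \n into the actual tab and newline characters before the pattern reaches regcomp.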
Actions
• Specify what to do when a pattern matches
• Basically C code
• Examples
%%
[a-zA-Z] {
    printf("I found a letter");
}
[0-9] {
    printf("I found a digit");
}
[ \t\n] {
    /* do nothing */
}
%%
Structure of Lex Programs
%{
#include <stdio.h>
int Word_count;
%}
Declarations // regular definitions
%%
[0-9] {
    printf("I found a digit");
}
%%
Auxiliary functions // any other functions
Regular definitions
• Give symbolic names to regular expressions
• Written in the declarations section
• Examples

delim [ \t\n]
ws {delim}+
digit [0-9]
number {digit}+
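Complete Lex Source
%{
#include <stdio.h>
int word_count = 0;
%}
delim [ \t\n]
digit [0-9]
%%
{delim}+ { /* no action */ }
{digit}+ { printf("Here I found a digit\n");
           word_count++; }
%%
/* yywrap() is supplied by the library linked with -ll */
int main(void)
{
    yylex(); /* scan the whole input */
    printf("Total Count: %d\n", word_count);
    return 0;
}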
Assignment
• Write a lexical analyzer for a C program that:
– Ignores white space
– Matches all identifiers (keywords, variables, etc.)
• Inserts variables in the symbol table
• Does not insert keywords; just prints them to the console
– Matches all numbers and inserts them in the symbol table
– Finds all comments
– Finds all double-quoted strings
– Counts line numbers
Assignment
– Variables start with a letter or underscore (_)
• Ex: a, a9bc, _abc but not 8cde
– Numbers may contain an optional fraction or exponent
• Ex: 3, 3.056, 3.45E5, 3.45E-2, 3E+2
– Comments start with // and end with a newline
– Relational operators are =, <>, <, <=, >=, >
• Insert the lexeme in the symbol table and print the token RELOP
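The number rule can be read as: digits, then an optional fraction, then an optional exponent. The C sketch below checks that reading against the examples given. The function name match_number, the requirement of at least one digit after the dot, and the uppercase-only E are assumptions drawn from the examples, not stated rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of s that has the form
 *   digits ( '.' digits )? ( 'E' ('+'|'-')? digits )?
 * or 0 if s does not start with a digit. */
size_t match_number(const char *s)
{
    assert(s != NULL);
    size_t i = 0;

    if (!isdigit((unsigned char)s[i]))
        return 0;
    while (isdigit((unsigned char)s[i]))
        i++;

    /* Optional fraction: only taken if a digit follows the dot. */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i]))
            i++;
    }

    /* Optional exponent: only taken if it ends in at least one digit. */
    if (s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-')
            j++;
        if (isdigit((unsigned char)s[j])) {
            while (isdigit((unsigned char)s[j]))
                j++;
            i = j;
        }
    }
    return i;
}
```

All five sample numbers are matched in full, for instance match_number("3.45E-2") returns 7. For 8cde it returns 1, so only the 8 is taken as a number and cde is left for the next token.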
Assignment
– Addition operators are +, - and the keyword or
– Multiplication operators are *, /, div, mod and the keyword and
• Insert the lexeme and print the token ADMULOP
– Other tokens to match
• := // assignment operator, token name is ASSIGNOP
• [, ], (, )
• .. // token name is DOTDOT
• ,
• ; and :
Assignment
• Keywords to match
– program
– if
– not
– end
– begin
– else
– then
– do
– while
– function
– procedure
– integer
– real
– var
– of
– array
– write
• Print the corresponding token name and the line number where it occurs
• The token name for the parser is the keyword in capital letters
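One way to derive the token name from a keyword is a table lookup followed by upper-casing. The C sketch below assumes a function name keyword_token and uses only a few of the keywords to stay short. It is an illustration rather than the required solution.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* A subset of the keyword list, for illustration only. */
static const char *const keywords[] = {
    "program", "if", "begin", "end", "while", "do"
};

/* If lexeme is a keyword, writes its token name (the keyword in capital
 * letters, truncated to cap-1 characters) into out and returns 1.
 * Otherwise returns 0 and leaves out untouched. */
int keyword_token(const char *lexeme, char *out, size_t cap)
{
    assert(lexeme != NULL && out != NULL && cap > 0);

    size_t count = sizeof keywords / sizeof keywords[0];
    for (size_t k = 0; k < count; k++) {
        if (strcmp(lexeme, keywords[k]) != 0)
            continue;

        size_t i = 0;
        for (; lexeme[i] != '\0' && i + 1 < cap; i++)
            out[i] = (char)toupper((unsigned char)lexeme[i]);
        out[i] = '\0';
        return 1;
    }
    return 0;
}
```

With this, the lexeme while yields the token name WHILE, and an ordinary identifier such as count is reported as not being a keyword, so it can go to the symbol table instead.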
Compilation code
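• flex example.l // lex source; generates lex.yy.c
• g++ lex.yy.c -o example -ll // builds the executable (with flex, -lfl may be needed instead of -ll)
• ./example < file.txt > target.txt
// file.txt contains the source program; the output is written to target.txt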
