
Lexical Analyzer: Using Flex by Dr. S. M. Farhad

The document describes a lexical analyzer and how to build one with Flex. It explains that the lexical analyzer is the first phase of a compiler: it scans the character stream of the source program, groups the characters into meaningful tokens, and outputs a sequence of tokens. Flex is a tool that generates a lexical analyzer from a file of regular-expression rules. The document provides examples of token patterns, actions, and the overall structure of a lexical analysis program in Flex.
Copyright © All Rights Reserved

Lexical Analyzer

Using Flex
By Dr. S. M. Farhad
Lexical Analysis
• First phase of a compiler
• Also called scanning
• Scans the character stream of the source program
• Groups the characters into meaningful sequences (lexemes)
– Output: a sequence of tokens
Role of Lexical Analyzer
• Identify tokens
• Remove whitespace
• Install lexemes in the symbol table
• Return tokens to the parser

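[Figure: Source program → Lexical Analyzer → token → Parser → to semantic analysis; the parser requests each token with getNextToken, and the Symbol Table is shared by the lexical analyzer and the parser]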
Lexical Analyzer
• Performs those functions on a source program written in any language, e.g.:

do {
    Study;
} while (t_CGPA < 3.90)

[Figure: the program above, annotated "Tokens" and "Install in Symbol Table"]
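To make those roles concrete, here is a minimal hand-written sketch in C of what an analyzer does with the example program: skip whitespace, recognize one token per call, install identifiers in a symbol table, and hand the token back to the caller. The token codes (T_DO, T_ID and so on), set_source, install_id and the fixed-size table are hypothetical names for this illustration only; only do and while are treated as keywords, and only <, <=, > and >= as relational operators.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum token { T_EOF, T_DO, T_WHILE, T_ID, T_NUM, T_RELOP, T_PUNCT };

#define MAX_SYMS 64
#define MAX_LEN  32

/* A tiny symbol table: each identifier is stored once. */
static char sym_table[MAX_SYMS][MAX_LEN];
static int sym_count = 0;

static const char *src = "";  /* current scan position */
char lexeme[MAX_LEN];         /* text of the most recent token */

void set_source(const char *text) { src = text; }

/* Install an identifier; return its index, or -1 if the table is full. */
int install_id(const char *name) {
    for (int i = 0; i < sym_count; i++)
        if (strcmp(sym_table[i], name) == 0)
            return i;                         /* already installed */
    if (sym_count == MAX_SYMS)
        return -1;
    strncpy(sym_table[sym_count], name, MAX_LEN - 1);
    sym_table[sym_count][MAX_LEN - 1] = '\0';
    return sym_count++;
}

/* Copy the characters in [start, end) into lexeme, truncating if needed. */
static void save_lexeme(const char *start, const char *end) {
    size_t n = (size_t)(end - start);
    if (n >= MAX_LEN)
        n = MAX_LEN - 1;
    memcpy(lexeme, start, n);
    lexeme[n] = '\0';
}

/* Called by the parser: skip whitespace, then return the next token. */
enum token getNextToken(void) {
    while (isspace((unsigned char)*src))
        src++;                                /* remove whitespace */
    if (*src == '\0')
        return T_EOF;

    const char *start = src;
    enum token tok;
    if (isalpha((unsigned char)*src) || *src == '_') {
        while (isalnum((unsigned char)*src) || *src == '_')
            src++;
        save_lexeme(start, src);
        if (strcmp(lexeme, "do") == 0) return T_DO;
        if (strcmp(lexeme, "while") == 0) return T_WHILE;
        install_id(lexeme);                   /* identifiers go into the table */
        return T_ID;
    }
    if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        if (*src == '.' && isdigit((unsigned char)src[1])) {
            src++;                            /* optional fraction */
            while (isdigit((unsigned char)*src)) src++;
        }
        tok = T_NUM;
    } else if (*src == '<' || *src == '>') {
        src++;
        if (*src == '=') src++;               /* <= or >= */
        tok = T_RELOP;
    } else {
        src++;                                /* any other single character */
        tok = T_PUNCT;
    }
    save_lexeme(start, src);
    return tok;
}
```

A parser would call getNextToken() in a loop until it returns T_EOF. Flex generates the equivalent function, yylex(), from a rules file, which is why none of this has to be written by hand.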
Lexical Analyzer

• No need to write the analyzer code by hand
• Tools can produce the analyzer from a specification
– Lex (and Flex, a widely used free implementation of it)
Lex Tool

lex.l (Lex source) → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
Source program → a.out → Tokens
Token, Pattern, Lexeme
• Token: a set of strings that represents a particular construct in the source language
• Pattern: a rule that describes that set of strings
– It matches each string in the set
• Lexeme: a sequence of characters that is matched by the pattern for a token
Example
Token      | Sample lexemes                     | Pattern description
WHILE      | while                              | while
RELOP      | <, <=, >, >=, <>, ==               | < or <= or > or >= or <> or ==
ID         | count, account, flag2              | letter followed by letters and digits
C comment  | /* gibberish /* more gibberish */  | anything between /* and */
NUM        | 3.14, 3.2E+5, 5.9E-2               | sequence of digits with optional fraction and exponent
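A sketch in C of how two patterns from the table can be recognized by hand. Each function returns the length of the longest lexeme at the start of the input, or 0 if the pattern does not match there. The names match_relop and match_id are hypothetical; a Lex-generated scanner does the same job with a table-driven automaton built from the patterns.

```c
#include <ctype.h>

/* Length of the longest RELOP lexeme at the start of s, or 0.
   Patterns from the table: <  <=  <>  >  >=  == */
int match_relop(const char *s) {
    switch (s[0]) {
    case '<': return (s[1] == '=' || s[1] == '>') ? 2 : 1;
    case '>': return (s[1] == '=') ? 2 : 1;
    case '=': return (s[1] == '=') ? 2 : 0;   /* a lone = is not in the table */
    default:  return 0;
    }
}

/* Length of the ID lexeme at the start of s (a letter followed by
   letters and digits), or 0 if s does not start with a letter. */
int match_id(const char *s) {
    int n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}
```

Note that while matches both the WHILE and the ID pattern. Lex resolves such conflicts by taking the longest match and, among equally long matches, the rule listed first, which is why keyword rules are placed before the identifier rule.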
Structure of Lex Programs
%{
#include <stdio.h>
// anything here is copied directly to lex.yy.c
int Word_count;
%}
Declarations          // regular definitions
%%
Transition rules      // token matching & actions
%%
Auxiliary functions   // any other functions
Transition rules
• Pattern { Action }
– Pattern: a regular expression to match the token
– Action: C code to perform the required functions
Regular Expressions
• Specifies a set of strings to match
• One expression for each token pattern
• Some examples
– [ \t\n]    // a delimiter: blank, tab or newline
– [ \t\n]+   // white space: one or more delimiters
– a(b)*      // a followed by zero or more occurrences of b: a, ab, abb, abbb, ...
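The expressions above can be tried out with the POSIX regex.h interface, whose extended syntax accepts these particular expressions unchanged. This sketch assumes a POSIX system; full_match is a hypothetical helper that anchors the pattern so the whole string must belong to the set the expression specifies.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if all of text matches the POSIX extended regular
   expression pattern, 0 if it does not, and -1 if pattern is invalid. */
int full_match(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    /* Wrap as ^( ... )$ so a partial match inside text does not count. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern) >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For example, full_match("a(b)*", "abbb") is 1, while full_match("a(b)*", "ba") is 0 because the string does not start with a.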
Actions
• Specify what to do if a rule matches a token
• Basically C code
• Examples
%%
[a-zA-Z]    {
                printf("I found a letter\n");
            }
[0-9]       {
                printf("I found a digit\n");
            }
[ \t\n]     {
                // actually I do nothing
            }
%%
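For comparison, a hand-written C loop with the same behavior as these three rules. run_rules is a hypothetical name; it uses isalpha, which in the default C locale matches exactly [a-zA-Z], and it echoes any character that no rule matches, which is what Lex's default rule does.

```c
#include <ctype.h>
#include <stdio.h>

/* Apply the three single-character rules to input, counting the
   letters and digits found. */
void run_rules(const char *input, int *letters, int *digits) {
    *letters = *digits = 0;
    for (const char *p = input; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c)) {                      /* rule [a-zA-Z] */
            printf("I found a letter\n");
            ++*letters;
        } else if (isdigit(c)) {               /* rule [0-9] */
            printf("I found a digit\n");
            ++*digits;
        } else if (c == ' ' || c == '\t' || c == '\n') {
            /* rule [ \t\n]: do nothing */
        } else {
            putchar(c);                        /* default rule: ECHO */
        }
    }
}
```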
Structure of Lex Programs
%{
#include <stdio.h>
int Word_count;
%}
Declarations          // regular definitions
%%
[0-9]   {
            printf("I found a digit");
        }
%%
Auxiliary functions   // any other functions
Regular Definitions
• Give symbolic names to regular expressions
• Examples

delim [ \t\n]
ws {delim}+
digit [0-9]
number {digit}+
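Regular definitions behave like textual macros: each {name} is replaced by its definition before the pattern is compiled. The sketch below imitates that with C string-literal concatenation and tests the result with POSIX regex.h; the macro names and matches_all are hypothetical. One difference: Flex in general treats each expansion as if it were parenthesized, which matters for definitions containing |, but not for these bracket expressions.

```c
#include <regex.h>

/* Regular definitions as string macros: like {delim} and {digit} in a
   Lex file, each name is expanded textually where it is used. */
#define DELIM  "[ \t\n]"
#define WS     DELIM "+"      /* ws     {delim}+ */
#define DIGIT  "[0-9]"
#define NUMBER DIGIT "+"      /* number {digit}+ */

/* Return 1 if the leftmost-longest match of pattern starts at text[0]
   and covers all of text, 0 otherwise. */
int matches_all(const char *pattern, const char *text) {
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    int ok = regexec(&re, text, 1, &m, 0) == 0
             && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&re);
    return ok;
}
```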
Complete Lex Source
%{
#include <stdio.h>
int word_count = 0;
%}
delim   [ \t\n]
digit   [0-9]
%%
{delim}+    { /* no action */ }
{digit}+    { printf("Here I found a digit\n");
              word_count++; }
%%
int main(void)
{
    yylex();
    printf("Total Count: %d\n", word_count);
    return 0;
}
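A hand-written C counterpart of this Lex source, useful for checking what the generated scanner should print. count_numbers is a hypothetical name; it counts each maximal run of digits once, as the {digit}+ rule does under Lex's longest-match rule, and echoes any other character as Lex's default rule would.

```c
#include <ctype.h>
#include <stdio.h>

/* Skip delimiters, report each run of digits, echo everything else,
   and return the number of digit runs found. */
int count_numbers(const char *p) {
    int word_count = 0;
    while (*p != '\0') {
        if (*p == ' ' || *p == '\t' || *p == '\n') {
            p++;                                    /* {delim}+: no action */
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p))
                p++;                                /* longest {digit}+ run */
            printf("Here I found a digit\n");
            word_count++;
        } else {
            putchar(*p++);                          /* default rule: ECHO */
        }
    }
    printf("Total Count: %d\n", word_count);
    return word_count;
}
```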
Assignment
• Write a lexical analyzer for a subset of Pascal.
– Ignore white space
– Match all identifiers (keywords, variables, etc.)
• Insert variables in the symbol table
• No need to insert keywords; just print them to the console
– Match all numbers and insert them in the symbol table
– Find all comments
– Find all double-quoted strings
– Count line numbers
Assignment
– Variables start with a letter or underscore (_)
• Ex: a, a9bc, _abc, but not 8cde
– Numbers may contain an optional fraction and/or exponent
• Ex: 3, 3.056, 3.45E5, 3.45E-2, 3E+2
– Comments are anything between { and }; they may not contain a { and may appear after any token
– Relational operators are =, <>, <, <=, >=, >
• Insert the lexeme in the symbol table and print the token RELOP
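One possible way to write the identifier and number patterns, checked with POSIX regex.h. The exact patterns are an assumption read from the examples above (for instance, that a fraction needs at least one digit after the point and that the exponent marker may be E or e); ID_RE, NUM_RE and whole_match are hypothetical names.

```c
#include <regex.h>
#include <stdio.h>

/* Assumed patterns, not official ones: an identifier is a letter or
   underscore followed by letters, digits or underscores; a number is
   digits with an optional fraction and an optional signed exponent. */
#define ID_RE  "[A-Za-z_][A-Za-z0-9_]*"
#define NUM_RE "[0-9]+(\\.[0-9]+)?([Ee][+-]?[0-9]+)?"

/* Return 1 if all of text matches pattern, 0 otherwise
   (including when the pattern fails to compile). */
int whole_match(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern) >= (int)sizeof anchored)
        return 0;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    int ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

In the Flex rules section the number pattern would be written with a single backslash: [0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?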
Assignment
– Addition operators: +  -  or
– Multiplication operators: *  /  div  mod  and
• Insert the lexeme and print the token MULOP
– Other tokens to match
• := (assignment operator; token name is ASSIGNOP)
• [  ]  (  )
• .. (token name is DOTDOT)
• ,  ;  :
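Symbols such as := and .. share their first character with : and a lone ., so the scanner must prefer the longer lexeme. A sketch of that longest-match decision in C: op_token is a hypothetical helper, and only RELOP, MULOP, ASSIGNOP and DOTDOT come from the assignment; ADDOP, COLON, LBRACK and the other names are hypothetical. The word operators or, div, mod and and look like identifiers, so they are probably easier to handle together with keyword recognition.

```c
#include <string.h>

/* Classify the symbolic token at the start of s using the longest match.
   Sets *len to the lexeme length and returns a token name, or returns
   NULL with *len = 0 if no listed symbol starts here. */
const char *op_token(const char *s, int *len) {
    *len = 1;
    switch (s[0]) {
    case '+': case '-': return "ADDOP";          /* hypothetical name */
    case '*': case '/': return "MULOP";
    case '=':           return "RELOP";
    case '<':
        if (s[1] == '=' || s[1] == '>') *len = 2; /* <= or <> */
        return "RELOP";
    case '>':
        if (s[1] == '=') *len = 2;                /* >= */
        return "RELOP";
    case ':':
        if (s[1] == '=') { *len = 2; return "ASSIGNOP"; }
        return "COLON";
    case '.':
        if (s[1] == '.') { *len = 2; return "DOTDOT"; }
        break;                                    /* a lone . is not listed */
    case '[': return "LBRACK";
    case ']': return "RBRACK";
    case '(': return "LPAREN";
    case ')': return "RPAREN";
    case ',': return "COMMA";
    case ';': return "SEMICOLON";
    }
    *len = 0;
    return NULL;
}
```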
Assignment
• Keywords to match
– program
– if
– not
– end
– begin
– else
– then
– do
– while
– function
– procedure
– integer
– real
– var
– of
– array
– write
• Print the corresponding token name and the line number where it occurs
• The token name for the parser is the keyword in capital letters (e.g. WHILE for while)
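A minimal sketch of keyword recognition in C: look the lexeme up in a table and, if it is found, form the token name by capitalizing it; otherwise treat it as an identifier. keyword_token is a hypothetical helper; the table holds the keywords listed above, with of spelled as in Pascal, and it matches case-sensitively, which is an assumption (standard Pascal ignores letter case in keywords).

```c
#include <ctype.h>
#include <string.h>

/* Keywords from the assignment list, in the lower case used in source. */
static const char *const keywords[] = {
    "program", "if", "not", "end", "begin", "else", "then", "do", "while",
    "function", "procedure", "integer", "real", "var", "of", "array", "write"
};

/* If lexeme is a keyword, store its token name (the keyword in capital
   letters) in name and return 1. Return 0 for a non-keyword (treat it as
   an identifier) and -1 if name is too small. */
int keyword_token(const char *lexeme, char *name, size_t size) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(lexeme, keywords[i]) != 0)
            continue;
        size_t n = strlen(lexeme);
        if (n + 1 > size)
            return -1;
        for (size_t j = 0; j <= n; j++)           /* copies the '\0' too */
            name[j] = (char)toupper((unsigned char)lexeme[j]);
        return 1;
    }
    return 0;
}
```

In a Flex scanner, the identifier rule's action could call this on yytext and print yylineno, which Flex maintains when %option yylineno is given.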
Additional Requirement

• Identify multi-line comments in C, for example:
– /* abrabrabr*/
– /*abrabrabr/*abrabrabr*****abr**/
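Because C comments do not nest, the second /* in the second example is ordinary text, and the comment ends at the first */. A hand-written check in C follows; c_comment_length is a hypothetical helper. In Flex the same job is commonly done with a start condition, or with the single pattern "/*"([^*]|\*+[^*/])*\*+"/".

```c
#include <string.h>

/* If s starts with a C comment, return the number of characters up to
   and including the closing star-slash. Return 0 if s does not start
   with a comment or the comment is never closed. Comments do not nest:
   a second slash-star inside is ordinary text. */
size_t c_comment_length(const char *s) {
    if (s[0] != '/' || s[1] != '*')
        return 0;
    for (size_t i = 2; s[i] != '\0'; i++)
        if (s[i] == '*' && s[i + 1] == '/')
            return i + 2;
    return 0;                                     /* unterminated comment */
}
```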
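Compilation code
flex -t sample.l > sample.c     # generate the scanner (-t writes it to standard output)
g++ -c -o sample.o sample.c     # compile the generated scanner
g++ -o samp sample.o -ll        # link; use -lfl instead where only the Flex library is installed
./samp < in.txt > out.txt       # read in.txt, write the results to out.txt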
