Compiler Construction Lec 1b

The document discusses the role of lexical analysis in compiler construction, detailing its purpose of tokenization, whitespace removal, and error detection. It explains the components involved, such as the lexer, tokens, and symbol table, as well as challenges like ambiguity and efficiency. Additionally, it covers lexical errors and their handling, providing examples of common errors encountered during the lexical analysis phase.

Uploaded by usamajaved425

Compiler Construction

Lecture 1b
Instructor: Alisha Farman
LEXICAL ANALYZER
Lexical analysis is the first phase of a compiler. It converts the
sequence of characters in the source code into a sequence of tokens.
Purpose of Lexical Analysis
1. Tokenization: The primary goal is to break down the source
code into tokens. Tokens are the smallest units of meaning,
such as keywords, identifiers, operators, and symbols.
2. Removing Whitespace and Comments: The lexical analyzer
(or lexer) strips out unnecessary whitespace and comments,
which are not needed for further analysis.
3. Error Detection: It identifies and reports lexical errors, such
as illegal characters or malformed tokens.
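The whitespace- and comment-stripping step can be sketched in C as below. This is a minimal illustration, not the lecture's own code: the function name skip_ws_and_comments is hypothetical, the source is assumed to be a NUL-terminated string, and only C-style line and block comments are recognised.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the first character at or after
 * `pos` that is neither whitespace nor part of a C comment. */
size_t skip_ws_and_comments(const char *src, size_t pos)
{
    for (;;) {
        if (isspace((unsigned char)src[pos])) {
            pos++;                              /* discard whitespace */
        } else if (src[pos] == '/' && src[pos + 1] == '/') {
            /* line comment: skip to the end of the line (or input) */
            while (src[pos] != '\0' && src[pos] != '\n')
                pos++;
        } else if (src[pos] == '/' && src[pos + 1] == '*') {
            /* block comment: skip until the closing star-slash pair */
            pos += 2;
            while (src[pos] != '\0' && !(src[pos] == '*' && src[pos + 1] == '/'))
                pos++;
            if (src[pos] != '\0')
                pos += 2;                       /* consume the closing pair */
            /* an unterminated comment simply runs to the end of input here */
        } else {
            return pos;                         /* start of the next token */
        }
    }
}
```

A real lexer would also report the unterminated-comment case as an error instead of silently stopping at the end of input.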
Components of Lexical Analysis
Lexer: The lexer processes the input buffer to recognize and extract
tokens. It uses patterns to identify different types of tokens.
Token: A token is a categorized unit of the source code. Each token
has a type (e.g., keyword, identifier) and a value (e.g., the actual
string of the identifier).
Symbol Table: A symbol table is used to keep track of identifiers and
their attributes (e.g., data type, scope).
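A token and a symbol-table entry might be represented in C as follows. The type names (TokenType, Token, Symbol), the fixed capacity and the linear search are assumptions made for brevity; production compilers usually use a hash table for the symbol table.

```c
#include <string.h>

#define SYMTAB_SIZE 100   /* assumed capacity */

/* Hypothetical token categories for a small C-like language. */
typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_OPERATOR,
    TOK_INT_CONST, TOK_SEPARATOR, TOK_ERROR
} TokenType;

/* A token pairs its type with its value (the lexeme text). */
typedef struct {
    TokenType type;
    char lexeme[32];
} Token;

/* One symbol-table entry; attributes such as data type and scope are
 * filled in by later phases. */
typedef struct {
    char name[32];        /* names longer than 31 characters are truncated */
    char data_type[16];   /* e.g. "int"; empty until known */
    int  scope;           /* e.g. block nesting depth */
} Symbol;

static Symbol symtab[SYMTAB_SIZE];
static int symtab_count = 0;

/* Return the index of `name`, inserting it first if it is new;
 * returns -1 when the table is full. */
int symtab_lookup_or_insert(const char *name)
{
    for (int i = 0; i < symtab_count; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return i;                    /* already recorded */
    if (symtab_count == SYMTAB_SIZE)
        return -1;
    Symbol *s = &symtab[symtab_count];
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->data_type[0] = '\0';
    s->scope = 0;
    return symtab_count++;
}
```

Each identifier is stored once, however many times it appears, which is why the lexer (or parser) looks it up before inserting.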
Input buffer
An input buffer is a storage area used by the compiler's lexical
analyzer to read source code efficiently.
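The lexical analyzer scans the input from left to right, one character
at a time. It uses two pointers, the begin pointer (bp) and the forward
pointer (fp), to keep track of the portion of the input scanned: bp marks
the start of the current lexeme, and fp moves ahead until the end of the
lexeme is found.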
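The begin/forward pointer idea can be sketched as a function that scans one identifier. The function scan_identifier is hypothetical, positions are array indices rather than raw pointers, and identifiers are the only pattern handled.

```c
#include <ctype.h>
#include <string.h>

/* bp marks where the lexeme begins; fp advances one character at a time
 * until the identifier pattern no longer matches. The lexeme src[bp..fp-1]
 * is copied into `out` (out_size must be at least 1) and fp is returned,
 * which is where the next lexeme begins. */
size_t scan_identifier(const char *src, size_t bp, char *out, size_t out_size)
{
    size_t fp = bp;                          /* forward pointer starts at bp */
    if (isalpha((unsigned char)src[fp]) || src[fp] == '_') {
        fp++;
        while (isalnum((unsigned char)src[fp]) || src[fp] == '_')
            fp++;                            /* extend the lexeme */
    }
    size_t len = fp - bp;
    if (len >= out_size)
        len = out_size - 1;                  /* truncate to fit the buffer */
    memcpy(out, src + bp, len);
    out[len] = '\0';
    return fp;
}
```

For "num1 + num2" with bp = 0, fp stops at index 4 (the space), so the lexeme is "num1" and the next scan starts at 4.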
One Buffer Scheme: In this scheme, only one buffer is used to store the input string. The problem with
this scheme is that if a lexeme is very long, it crosses the buffer boundary; to scan the rest of the lexeme the
buffer has to be refilled, which overwrites the first part of the lexeme.
Two Buffer Scheme: To overcome the problem of the one buffer scheme, this method
uses two buffers to store the input string. The first and second buffers are
scanned alternately; when the end of the current buffer is reached, the other buffer is filled.
Buffering with EOF

● When EOF is encountered, the lexer stops processing further input.

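● In double buffering, an eof character can be placed at the end of each
buffer as a sentinel. It marks the end of the buffer, and when it appears
anywhere else it marks the real end of the input.
● Because the sentinel is the only character that needs special handling,
the lexer performs one test per character instead of also checking for the
end of the buffer each time.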
Example input: int x = 10; int y = 20; // (a long line that fills the buffer)
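The two-buffer scheme with sentinels can be sketched as below. To keep the example self-contained and testable, the "file" is an in-memory string, each buffer half holds only 4 characters so that refills happen on a short input, and '\0' plays the role of the eof sentinel (which assumes the source contains no NUL characters). All names (Input, input_init, next_char, HALF) are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define HALF 4          /* tiny buffer half so refills are visible */
#define SENTINEL '\0'   /* stands in for the eof sentinel (assumption) */

/* Two halves, each followed by one sentinel slot. */
typedef struct {
    char buf[2 * (HALF + 1)];
    const char *src;    /* remaining unread source text */
    char *forward;      /* the forward pointer (fp) */
} Input;

/* Load up to HALF characters into `half` and put the sentinel after them. */
static void fill(Input *in, char *half)
{
    size_t n = strlen(in->src);
    if (n > HALF)
        n = HALF;
    memcpy(half, in->src, n);
    half[n] = SENTINEL;
    in->src += n;
}

void input_init(Input *in, const char *text)
{
    in->src = text;
    fill(in, in->buf);                  /* load the first half */
    in->forward = in->buf;
}

/* Return the next character, or EOF at the real end of input. */
int next_char(Input *in)
{
    for (;;) {
        char c = *in->forward++;
        if (c != SENTINEL)
            return (unsigned char)c;    /* common case: one test per char */
        char *end1 = in->buf + HALF;            /* sentinel slot, half 1 */
        char *end2 = in->buf + 2 * HALF + 1;    /* sentinel slot, half 2 */
        if (in->forward - 1 == end1) {
            fill(in, end1 + 1);         /* first half done: load the second */
        } else if (in->forward - 1 == end2) {
            fill(in, in->buf);          /* second half done: reload the first */
            in->forward = in->buf;
        } else {
            in->forward--;              /* sentinel inside a half: real end */
            return EOF;
        }
    }
}
```

Reading "int x = 10;" with this sketch loads "x = " into the second half after "int ", reloads the first half with "10;", and then returns EOF. A full lexer would also keep a begin pointer and must not overwrite the half that still holds the start of the current lexeme.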
Challenges in Lexical Analysis
1. Ambiguity: Tokens must be unambiguously defined to avoid
confusion. For example, distinguishing between an identifier
and a keyword.
2. Efficiency: The lexer should efficiently handle large inputs and
complex token patterns.
3. Error Handling: Properly identifying and reporting errors while
continuing the analysis process.
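A common way to resolve the keyword/identifier ambiguity is to scan every word with the identifier pattern and then look it up in a table of reserved words. The sketch below assumes a small subset of C's keywords; is_keyword is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* A subset of C's reserved words, for illustration only. */
static const char *const KEYWORDS[] = {
    "int", "float", "char", "if", "else", "while", "for", "return", "void"
};

/* A lexeme that matched the identifier pattern is a keyword only if it
 * appears in the reserved-word table. */
bool is_keyword(const char *lexeme)
{
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strcmp(lexeme, KEYWORDS[i]) == 0)
            return true;
    return false;
}
```

With this check, main and inttt are classified as identifiers, while the lexeme "while" is reported as a keyword.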
C tokens fall into the following types (most languages use similar
categories):
1. Identifier (ID): letters (a–z, A–Z), digits and '_', not starting with a digit
2. Assignment operator (=, +=, -=, *=)
3. Arithmetic operator (+, -, *, /, %)
4. Integer constant (decimal digits 0–9)
5. Relational operator (==, <, >, <=, >=, !=)
6. Semicolon separator (;)
7. Float constant (66.78)
8. String ("alisha")
9. Keyword (int, float, if, while, …)
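Several of these operators share a prefix (< and <=, = and ==, + and +=). Lexers usually resolve this with the longest-match ("maximal munch") rule: prefer the longest operator that matches. A minimal sketch, assuming only the operators listed above; match_operator is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

static const char *const TWO_CHAR_OPS[] = { "==", "!=", "<=", ">=", "+=", "-=", "*=" };
static const char ONE_CHAR_OPS[] = "=<>+-*/%";

/* Length of the operator starting at `s`: 2 if a two-character operator
 * matches (tried first), 1 for a single-character one, 0 for none. */
size_t match_operator(const char *s)
{
    for (size_t i = 0; i < sizeof TWO_CHAR_OPS / sizeof TWO_CHAR_OPS[0]; i++)
        if (strncmp(s, TWO_CHAR_OPS[i], 2) == 0)
            return 2;
    if (s[0] != '\0' && strchr(ONE_CHAR_OPS, s[0]) != NULL)
        return 1;
    return 0;
}
```

Trying the two-character forms first is what makes "a<=b" produce the single token <= rather than < followed by =.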
LEXICAL ANALYZER
• The lexical analyzer is the first phase of the compiler.
• Its input is the source code.
• It identifies the different lexical units in the source code.
• Common lexical classes (token types):
– Identifiers
– Constants
– Keywords
– Operators
• Example: sum = num1 + num2 ;
The lexical analyzer produces the following tokens:

Lexeme   Token type
sum      id
=        op
num1     id
+        op
num2     id
;        separator
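The whole process for this statement can be sketched as a tiny C lexer. It is an illustration under simplifying assumptions: only identifiers, decimal integer constants, the operators = + - * / % and the ';' separator are recognised, keywords are not distinguished from identifiers, and the function name tokenize and the category strings are hypothetical (they loosely follow the table above).

```c
#include <ctype.h>
#include <string.h>

/* Stores up to `max` tokens in lexemes/types and returns how many were
 * found. Lexemes longer than 31 characters are truncated. */
int tokenize(const char *src, char lexemes[][32], const char *types[], int max)
{
    int n = 0;
    size_t i = 0;
    while (src[i] != '\0' && n < max) {
        if (isspace((unsigned char)src[i])) {   /* whitespace is discarded */
            i++;
            continue;
        }
        size_t start = i;                       /* lexeme begins here */
        if (isalpha((unsigned char)src[i]) || src[i] == '_') {
            while (isalnum((unsigned char)src[i]) || src[i] == '_')
                i++;
            types[n] = "id";
        } else if (isdigit((unsigned char)src[i])) {
            while (isdigit((unsigned char)src[i]))
                i++;
            types[n] = "const";
        } else if (strchr("=+-*/%", src[i]) != NULL) {
            i++;
            types[n] = "op";
        } else if (src[i] == ';') {
            i++;
            types[n] = "separator";
        } else {
            i++;
            types[n] = "error";                 /* unrecognised character */
        }
        size_t len = i - start;
        if (len > 31)
            len = 31;
        memcpy(lexemes[n], src + start, len);
        lexemes[n][len] = '\0';
        n++;
    }
    return n;
}
```

Calling tokenize("sum = num1 + num2 ;", ...) yields six tokens, and printing each pair with printf("%-6s %s\n", lexemes[i], types[i]) reproduces the rows of the table.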
Example 1
int main() {
int x = 10;
printf("Hello, world!");
}
int (Keyword)
main (Identifier)
( (Left Parenthesis)
) (Right Parenthesis)
{ (Left Brace)
int (Keyword)
x (Identifier)
= (Assignment Operator)
10 (Integer Literal)
; (Semicolon)
printf (Identifier)
( (Left Parenthesis)
"Hello, world!" (String Literal)
) (Right Parenthesis)
; (Semicolon)
} (Right Brace)
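In this example the whole sequence "Hello, world!", including its comma and space, becomes one String Literal token. A string-literal scanner can be sketched as follows; the name scan_string and the error return value (size_t)-1 are assumptions, and escape handling is reduced to "a backslash skips the next character".

```c
#include <stddef.h>

/* src[pos] must be the opening double quote. Returns the index just past
 * the closing quote, or (size_t)-1 if a newline or the end of input comes
 * first, i.e. the literal is unterminated (a lexical error). */
size_t scan_string(const char *src, size_t pos)
{
    pos++;                                   /* skip the opening quote */
    while (src[pos] != '\0' && src[pos] != '\n' && src[pos] != '"') {
        if (src[pos] == '\\' && src[pos + 1] != '\0')
            pos++;                           /* skip the escaped character */
        pos++;
    }
    if (src[pos] != '"')
        return (size_t)-1;                   /* unterminated literal */
    return pos + 1;                          /* one past the closing quote */
}
```

The same function detects the incomplete string literal discussed under lexical errors, because it reaches the end of the line before finding a closing quote.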
Example:
#include<stdio.h>
void main()
{
int a,b,c;
printf("Enter ");
while(a<b)
a=b+c;
c=a;
}
TASK 1: CREATE THE TOKENS OF THE CODE
main()
{
int a="abcxyz=55";
if(a>b)
a+=b;
//c-variable//
}
TASK 2: CREATE THE TOKENS OF THE CODE
int main()
{
int x, y, total;
x = 10, y = 20;
total = x + y;
Printf ("Total = %d \n", total);
}
LEXICAL ERROR
Lexical errors occur during the lexical analysis phase of
compilation when the lexer encounters a sequence of characters
that do not match any valid token in the programming language.
These errors typically arise from typos, incorrect usage of
symbols, or invalid characters.
How Error Tokens Are Handled
1. Discarding the Invalid Token
○ The lexer detects an invalid token and discards it, possibly issuing a
warning or error message.
2. Replacing with a Special Token
○ Some compilers generate a special error token (ERROR_TOKEN) to
handle the issue in later stages.
3. Attempting Recovery
○ The compiler might attempt to recover by suggesting corrections or
ignoring minor mistakes.
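Strategy 2 (a special error token) can be sketched as below. The token kinds and the set of accepted punctuation are simplified assumptions rather than a full C lexer, and the names Kind, ERROR_TOKEN and next_token are hypothetical.

```c
#include <ctype.h>
#include <string.h>

typedef enum { ID_OR_KEYWORD, NUMBER, PUNCT, ERROR_TOKEN, END } Kind;

/* Classify the token starting at *pos and advance *pos past it. An
 * unrecognised character becomes ERROR_TOKEN instead of stopping the scan. */
Kind next_token(const char *src, size_t *pos)
{
    size_t i = *pos;
    while (isspace((unsigned char)src[i]))
        i++;
    Kind k;
    if (src[i] == '\0') {
        k = END;
    } else if (isalpha((unsigned char)src[i]) || src[i] == '_') {
        while (isalnum((unsigned char)src[i]) || src[i] == '_')
            i++;
        k = ID_OR_KEYWORD;
    } else if (isdigit((unsigned char)src[i])) {
        while (isdigit((unsigned char)src[i]))
            i++;
        k = NUMBER;
    } else if (strchr("=+-*/%<>;,(){}", src[i]) != NULL) {
        i++;
        k = PUNCT;
    } else {
        i++;                  /* consume only the offending character */
        k = ERROR_TOKEN;
    }
    *pos = i;
    return k;
}
```

On "@a = a + 1;" the first call returns ERROR_TOKEN for '@' alone and the second returns the identifier a, so scanning continues after the error. This sketch would split 123abc into a number followed by an identifier; catching that case needs the extra check discussed under Example 4.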
Example: int 3var = 10; // '3var' is not a valid identifier: identifiers cannot begin with a digit
Example 1: Invalid Character
int main() {
int a = 5;
@a = a + 1; // Invalid character '@'
return 0;
}
Error Description: The '@' character is not recognized as a valid token in C,
leading to a lexical error.
Example 2: Unrecognized Token
int main() {
int a = 5;
int b = a $ 3; // Unrecognized token '$'
return 0;
}

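Error Description: The '$' symbol is not part of C's basic source character set, so a strict C lexer reports it as a lexical error. (Some compilers, such as GCC, accept '$' in identifiers as an extension.)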
Example 3: Incomplete String Literal
int main() {
char* str = "Hello; // Incomplete string literal
return 0;
}
Error Description: The string literal is not properly closed with a double quote,
leading to a lexical error.
Example 4: Invalid Number Format
int main() {
int num = 123abc; // Invalid number format
return 0;
}
Error Description: The sequence '123abc' is not a valid integer literal in C,
resulting in a lexical error.
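The check behind this example can be sketched as follows: after the digits of an integer constant, a letter or '_' immediately following them makes the whole lexeme malformed. The helper scan_integer is hypothetical, and C's integer suffixes (10u, 10L), hexadecimal and floating-point forms are deliberately ignored, so 10u would also be rejected here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Scan an integer constant starting at src[pos]. Returns false if letters
 * or '_' follow the digits (e.g. 123abc, 3var); *end is set one past the
 * whole lexeme in either case, so the lexer can skip it and continue. */
bool scan_integer(const char *src, size_t pos, size_t *end)
{
    while (isdigit((unsigned char)src[pos]))
        pos++;
    bool ok = !(isalpha((unsigned char)src[pos]) || src[pos] == '_');
    while (isalnum((unsigned char)src[pos]) || src[pos] == '_')
        pos++;                               /* swallow the malformed tail */
    *end = pos;
    return ok;
}
```

Treating "123abc" as one malformed lexeme, rather than the number 123 followed by the identifier abc, is what lets the lexer report a single, accurate error.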
Example 5: Unmatched Comment Delimiter
int main() {
int a = 5;
/* This is a comment
return 0;
}
Error Description: The comment is not properly closed with '*/', causing the
lexer to throw a lexical error.
Example 6: Use of Reserved Keyword
int main() {
int for = 10; // 'for' is a reserved keyword
return 0;
}
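Error Description: 'for' is a reserved keyword for loop constructs and cannot be
used as an identifier. Note that the lexer itself tokenizes 'for' correctly as a keyword;
the mistake is detected by the parser, so strictly this is a syntax error rather than a
lexical error.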
Handling Lexical Errors
Lexical analyzers typically handle these errors by:
● Reporting the Error: Providing a meaningful error message indicating
the type of error and its location in the source code.
● Skipping the Invalid Token: Moving past the invalid token to continue
scanning the rest of the source code.
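Example:
int main() {
int a = 5;
inttt b = 10; // 'inttt' is a misspelling of 'int'
return 0;
}
Total Count
● Valid Tokens: 18
● Invalid Tokens: 1
(Here 'inttt' is counted as the invalid token. Strictly, 'inttt' matches the identifier
pattern, so a C lexer would emit it as an identifier and the mistake (an unknown type
name) would be reported by a later phase; counted that way, all 19 tokens are lexically valid.)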
Example:
int main() {
int a = 5;
@a = a + 1; // Invalid character '@'
return 0;
}

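Total Count
● Valid Tokens: 19 (counting '@a' as a single invalid token; a lexer that discards only
the '@' and then scans 'a' as an identifier finds 20 valid tokens)
● Invalid Tokens: 1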
