Lexical and Syntax Analysis
Lexical analysis involves reading the source code characters from left to right and organizing them into tokens. It aims to read the input code and break it into meaningful elements, called tokens, that a computer can process easily. It also eliminates comments and whitespace within the source code. A lexical analyzer collects characters into logical groupings and assigns internal codes to the groupings based on their structure. These logical groupings are called lexemes, and the internal codes for the categories of these groupings are the tokens.
In programming languages, tokens can be described using regular expressions. A lexical analyzer uses a Deterministic Finite Automaton (DFA) to recognize these tokens, since DFAs can identify regular languages. Each final state of the DFA corresponds to a specific token type, allowing the analyzer to classify the input. The process of creating a DFA from regular expressions can be automated, making token recognition easier to implement. Specifically, a lexical analyzer works based on the following processes (see the sketch after this list):
• Input Preprocessing: Involves cleaning up the input text and preparing it for lexical analysis. This covers the removal of comments, whitespace, and other non-essential characters from the input text.
• Tokenization: Involves breaking the input text into a sequence of tokens. This is done by matching the characters in the input text against a set of patterns or regular expressions that define the different types of tokens.
• Token Classification: The analyzer determines the type of each token. For instance, the analyzer might classify keywords, identifiers, operators, and punctuation symbols as separate token types.
• Token Validation: The analyzer checks whether each token is valid based on the rules of the programming language. For instance, the analyzer might check that a variable name is a valid identifier or that an operator has the correct syntax.
• Output Generation: The analyzer generates the output of the lexical analysis process, typically a list or sequence of tokens (token stream). This token stream is then passed to the next stage of compilation or interpretation, where it is sent to the parser for syntax analysis.
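To make these steps concrete, below is a minimal sketch of a lexical analyzer in C. It is illustrative only (the token names and the character-level logic are this sketch's own, not part of the module): each branch plays the role of a DFA accepting state for one token category.

/* A minimal sketch of a lexical analyzer: skips whitespace
   (a non-token), groups characters into lexemes, and assigns
   each lexeme a token category. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OP, TOK_PUNCT, TOK_EOF } TokenType;

const char *type_name(TokenType t) {
    switch (t) {
        case TOK_IDENT:  return "IDENT";
        case TOK_NUMBER: return "NUMBER";
        case TOK_OP:     return "OP";
        case TOK_PUNCT:  return "PUNCT";
        default:         return "EOF";
    }
}

/* Scan one token starting at *src; copy its lexeme into buf,
   advance the cursor past it, and return its category. */
TokenType next_token(const char **src, char *buf) {
    const char *p = *src;
    int n = 0;
    while (isspace((unsigned char)*p)) p++;        /* skip whitespace */
    if (*p == '\0') { *src = p; buf[0] = '\0'; return TOK_EOF; }

    if (isalpha((unsigned char)*p) || *p == '_') { /* identifier/keyword */
        while (isalnum((unsigned char)*p) || *p == '_') buf[n++] = *p++;
        buf[n] = '\0'; *src = p; return TOK_IDENT;
    }
    if (isdigit((unsigned char)*p)) {              /* integer literal */
        while (isdigit((unsigned char)*p)) buf[n++] = *p++;
        buf[n] = '\0'; *src = p; return TOK_NUMBER;
    }
    buf[0] = *p++; buf[1] = '\0'; *src = p;        /* single-char symbol */
    return strchr("+-*/=", buf[0]) ? TOK_OP : TOK_PUNCT;
}

int main(void) {
    const char *code = "x = 10 + y;";
    char lexeme[64];
    TokenType t;
    while ((t = next_token(&code, lexeme)) != TOK_EOF)
        printf("%-6s '%s'\n", type_name(t), lexeme);
    return 0;
}

Running it on the input x = 10 + y; prints one classified token per line, which is exactly the kind of token stream a parser would consume.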
Tokens can be individual words or symbols in a sentence, such as keywords, variable names, numbers, and punctuation. Tokens can be specified in different sets:
• Alphabets: Any finite set of symbols a language draws from. For example, {0-9} is a set of decimal alphabets, while {0-9, a-f, A-F} is a set of hexadecimal alphabets.
• Strings: The collection of different alphabets occurring continuously. The string length is defined by the number of characters or alphabets occurring together. For example, the length of |STIisthebest| is 12 since there are 12 characters.
• Symbols: High-level programming languages contain special symbols, such as arithmetic operators (+, -, *, /), punctuation (, ;), and assignment (=).
• Non-tokens: Comments, preprocessor directives, macros, blanks, tabs, and newlines.
Lexemes are the sequences of characters matched by a pattern to form a token, that is, the sequence of input characters that comprises a single token. Lexemes are recognized by matching the input character string against character-string patterns, while tokens are represented as internal integer codes. Using this assignment statement as an example:
result = oldsum - value / 50;
its lexemes and their token categories (the names here are conventional) are:
Lexeme    Token
result    IDENT
=         ASSIGN_OP
oldsum    IDENT
-         SUB_OP
value     IDENT
/         DIV_OP
50        INT_LIT
;         SEMICOLON
Using this program as an example:
int main() {
    // 2 variables
    int x, y;
    x = 10;
    return 0;
}
There are 18 valid tokens in this program: 'int' 'main' '(' ')' '{' 'int' 'x' ',' 'y' ';' 'x' '=' '10' ';' 'return' '0' ';' '}'. Notice how the comment is omitted. Note that everything inside double quotes ("") in print statements is counted as a single token. For example, println("Walking is a good exercise"); has five (5) tokens: 'println' '(' '"Walking is a good exercise"' ')' and ';'.
The code snippet below has 27 tokens:
int main() {
    int x = 15, y = 40;
    printf("sum is:%d", x + y);
    return 0;
}
As mentioned, the output (token stream) generated from lexical analysis is sent to the syntax analyzer for syntax analysis.
Parsing
Syntax analysis, or parsing, is the process of analyzing a string of symbols according to the rules of a formal grammar. It checks the source code to ensure that it follows the correct syntax of the programming language it is written in. Syntax errors are identified and flagged in this phase and must be corrected before the program can be successfully compiled. As mentioned, and as seen in Figure 2, it is the phase after lexical analysis in the compiling process. A syntax analyzer or parser takes the token stream from a lexical analyzer and analyzes it against production rules to detect errors in the code. A parse tree or Abstract Syntax Tree (AST) is the output of this phase, representing the program's structure.
A lexical analyzer can identify tokens using regular expressions and pattern rules, but it cannot check the syntax of a given sentence, since regular expressions cannot match balanced constructs such as nested parentheses. As a result, syntax analysis uses a context-free grammar (CFG) to define the syntax rules of a programming language. CFGs include production rules that describe how valid strings (token streams) are formed, and they specify the grammar of a language to ensure that the source code adheres to the language's syntax. A small example grammar follows.
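As an illustration (this toy grammar is the sketch's own, not one defined in the module), a CFG for arithmetic expressions could be written in BNF-style productions:

<expr>   ::= <expr> + <term>   | <term>
<term>   ::= <term> * <factor> | <factor>
<factor> ::= ( <expr> )        | id

The production <factor> ::= ( <expr> ) is recursive and matches arbitrarily deep nesting of parentheses, which is exactly what a regular expression cannot express.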
The parser accomplishes the following steps:
• Parsing: The tokens are analyzed based on the grammar rules of the programming language. A parse tree or AST is constructed to represent the hierarchical structure of the program.
• Error Handling: If the input program contains syntax errors, the syntax analyzer detects and flags them to the user, indicating where the error occurred.
• Symbol Table Creation: The syntax analyzer creates a symbol table, a data structure that stores information about the identifiers used in the program, such as type, scope, and location.
Derivation
Derivation is the process of applying the rules of a context-free grammar to generate a sequence of tokens that forms a valid structure. Simply put, it is the sequence of production rules used to obtain the input string for the parser. There are two (2) decisions to make for some sentential form of input during parsing:
o Deciding on the non-terminal to be replaced
o Deciding the production rule by which the non-terminal will be replaced
There are two (2) options for deciding which non-terminal to replace with a production rule: left-most and right-most derivation.
• It is called a left-most derivation if the sentential form of an input is scanned and replaced from left to right. Its derived sentential form is called the left-sentential form.
• It is called a right-most derivation if the sentential form of an input is scanned and replaced from right to left. Its derived sentential form is called the right-sentential form.
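For example, take the toy productions E → E + E and E → id (illustrative, in the spirit of the grammar sketch above). The string id + id can then be derived both ways:

Left-most derivation (always expand the left-most non-terminal):
E → E + E
E → id + E
E → id + id

Right-most derivation (always expand the right-most non-terminal):
E → E + E
E → E + id
E → id + id

Both derivations produce the same string; they differ only in the order in which the non-terminals are replaced.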
Parse Tree
A parse tree is the graphical representation of a derivation. It is a convenient way to see how strings are derived from the start symbol, which becomes the root of the parse tree. In a parse tree, all leaf nodes are terminals, while all interior nodes are non-terminals. Also, an in-order traversal gives the original input string. A parse tree represents the associativity and precedence of operators: the deepest sub-tree is traversed first, so the operator in that sub-tree gets precedence over the operators in the parent nodes. For example, a left-most derivation of a + b * c (with each identifier represented by the token id) is:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id
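Drawn as a sketch, this derivation corresponds to the parse tree below. Note that in this ambiguous toy grammar the + sub-tree happens to be the deepest, so this particular derivation groups the expression as (id + id) * id; an unambiguous grammar, like the BNF sketch earlier, would instead force the usual grouping id + (id * id).

            E
         /  |  \
        E   *   E
      / | \     |
     E  +  E    id
     |     |
     id    id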
Associativity
When an operand has operators on both sides, the side on which an operator takes the operand is decided by the associativity of those operators. The operand is taken by the left operator if the operation is left-associative, and by the right operator if the operation is right-associative. Left-associative operations include addition, multiplication, subtraction, and division. For example:
id op id op id will be evaluated as (id op id) op id
Simply, 2 + 3 + 4 will be evaluated as (2 + 3) + 4.
Right-associative operations, such as exponentiation, evaluate the same shape of expression the other way. For example:
id op id op id will be evaluated as id op (id op id)
Simply, 2 ^ 3 ^ 4 will be evaluated as 2 ^ (3 ^ 4).
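A quick check of these groupings in C (note that C has no exponentiation operator; ^ in C is bitwise XOR, so assignment, which is right-associative in C, stands in for the right-associative case):

#include <stdio.h>

int main(void) {
    /* Subtraction is left-associative: groups as (10 - 4) - 3. */
    printf("%d\n", 10 - 4 - 3);    /* prints 3 */
    printf("%d\n", 10 - (4 - 3));  /* prints 9: the grouping matters */

    /* Assignment is right-associative: a = (b = (c = 5)). */
    int a, b, c;
    a = b = c = 5;
    printf("%d %d %d\n", a, b, c); /* prints 5 5 5 */
    return 0;
}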
Precedence
When two (2) different operators share a common operand, the precedence of the operators decides which one takes the operand. For example, 2 + 3 * 4 can have two (2) different parse trees: one for (2 + 3) * 4 and another for 2 + (3 * 4). This ambiguity can be removed by setting precedence among the operators. As in the previous example, mathematically, multiplication (*) has precedence over addition (+), so the expression 2 + 3 * 4 will always be interpreted as 2 + (3 * 4). The sketch below shows how a parser's grammar can enforce this.
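Below is a minimal sketch of a recursive-descent expression evaluator in C (illustrative only; the function names and grammar layering are this sketch's own). Precedence falls out of the grammar's structure: expr handles + and -, term handles * and /, so * and / bind tighter. The left-recursive productions from the BNF sketch earlier are rewritten as loops, since recursive descent cannot handle left recursion directly, and those loops make each level left-associative.

#include <ctype.h>
#include <stdio.h>

static const char *p;            /* cursor into the input string */

static int expr(void);           /* forward declaration */

static int factor(void) {        /* factor -> NUMBER | '(' expr ')' */
    while (*p == ' ') p++;
    if (*p == '(') {
        p++;                     /* consume '(' */
        int v = expr();
        while (*p == ' ') p++;
        p++;                     /* consume ')' */
        return v;
    }
    int v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

static int term(void) {          /* term -> factor { ('*'|'/') factor } */
    int v = factor();
    for (;;) {
        while (*p == ' ') p++;
        if (*p == '*')      { p++; v *= factor(); }
        else if (*p == '/') { p++; v /= factor(); }
        else return v;
    }
}

static int expr(void) {          /* expr -> term { ('+'|'-') term } */
    int v = term();
    for (;;) {
        while (*p == ' ') p++;
        if (*p == '+')      { p++; v += term(); }
        else if (*p == '-') { p++; v -= term(); }
        else return v;
    }
}

int main(void) {
    p = "2 + 3 * 4";
    printf("%d\n", expr());      /* prints 14, i.e., 2 + (3 * 4) */
    return 0;
}

Because term() runs to completion before expr() looks for the next + or -, the input 2 + 3 * 4 evaluates to 14, matching the grouping 2 + (3 * 4).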
In Python, some operators are evaluated before others; this is called the hierarchy of priorities.
This table enumerates the operators in order from the highest (1) to the lowest (4) priority.