
Lecture 3

Introduction to Syntax Analysis

Compiler Construction- NUN 2024 Austin Olom Ogar


Introduction to Syntax Analysis
Definition:
• Syntax Analysis, also known as Parsing, is the process by which a compiler verifies the syntax of
a programming language’s source code. It checks whether the input string of symbols (typically
tokens generated during lexical analysis) conforms to the formal grammar rules of the
language.
• The syntax analyzer takes tokens from the lexical analyzer as input and produces a parse tree
or syntax tree as output, representing the syntactic structure of the input according to a formal
grammar.
Goals of Syntax Analysis:
• To validate the syntactical correctness of a program by ensuring it follows the defined grammar rules.
• To construct a parse tree that visually represents the program structure, which aids in further
stages of compilation such as semantic analysis and code generation.
Example:
If we have a mathematical expression like 3 + (5 * 2), the syntax analyzer will check if the
arrangement of tokens 3, +, (, 5, *, 2, and ) follows the rules of arithmetic expression grammar.
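
A minimal sketch in Python of the tree such a parser might build for this expression (the tuple-based node layout and the label names are illustrative assumptions, not the lecture's notation):

# Parse tree for 3 + (5 * 2) as nested tuples: ("op", left, right).
tree = ("+", ("num", 3), ("*", ("num", 5), ("num", 2)))

def evaluate(node):
    # Leaves carry numbers; inner nodes carry an operator and two subtrees.
    if node[0] == "num":
        return node[1]
    left, right = evaluate(node[1]), evaluate(node[2])
    return left + right if node[0] == "+" else left * right

print(evaluate(tree))  # prints 13, confirming the tree mirrors the expression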
Role of Syntax Analysis in Compiler Design
Role and Importance:
• Syntax Analysis is the second phase of compilation, following Lexical Analysis. Its main role is to interpret the
syntactic structure of the program to ensure it is meaningful and adheres to language rules.
• Syntax Analysis bridges the gap between the lexically analyzed tokens and the semantic meaning of the code by
structuring them into a coherent parse tree.
• The parser’s output is used by the subsequent phases, particularly Semantic Analysis, which checks for
meaningfulness beyond the syntax and applies the semantic rules of the language.
Main Functions:
• Grammar Enforcement: Syntax Analysis checks that the source code follows language-specific grammatical rules.
• Error Detection and Recovery: It can detect syntax errors (such as missing parentheses or unclosed brackets) early in the
compilation process and, where possible, apply error recovery techniques to continue parsing.
• Generation of Parse Tree: By organizing tokens into a hierarchical structure, the parser makes it easier to handle complex
expressions and statements, which simplifies tasks in subsequent compilation stages.

Types of Parsers:
• There are two main types of parsers used in compiler design: Top-Down Parsers (e.g., Recursive Descent) and Bottom-Up Parsers
(e.g., LR Parsers). Each type has specific advantages depending on the structure of the grammar it is handling.
Difference between Syntax Analysis and Lexical Analysis
Purpose:
• Lexical Analysis: Transforms the input character stream into tokens (the smallest meaningful units) by
identifying keywords, operators, identifiers, etc.
• Syntax Analysis: Takes the stream of tokens produced by the lexical analyzer and arranges them into a
hierarchical structure (parse tree) according to the language’s grammar.
Output:
• Lexical Analysis: Produces a series of tokens (e.g., IDENTIFIER, NUMBER, PLUS).
• Syntax Analysis: Produces a parse tree or syntax tree that represents the syntactic structure of the
source code.
Error Detection:
• Lexical Analysis: Primarily detects errors in token structure, such as invalid characters or unrecognized
symbols.
• Syntax Analysis: Detects errors related to the sequence and structure of tokens, such as misplaced
operators or unbalanced parentheses.

Example of a Flow in Compilation:


• If the source code is int x = 5 + ;, the lexical analyzer would identify the tokens as int, x, =, 5, +, ;.
• The syntax analyzer would then detect an error in the syntax, as the + operator lacks an operand.
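
A hedged sketch of this flow in Python (the token names and patterns are illustrative assumptions): lexical analysis succeeds, and only a syntax-level check catches the missing operand.

import re

source = "int x = 5 + ;"
spec = [("KEYWORD", r"\bint\b"), ("IDENT", r"[A-Za-z_]\w*"),
        ("NUMBER", r"\d+"), ("ASSIGN", r"="), ("PLUS", r"\+"),
        ("SEMI", r";"), ("SKIP", r"\s+")]
master = re.compile("|".join("(?P<%s>%s)" % p for p in spec))

# Lexical analysis: succeeds, yielding int, x, =, 5, +, ;
tokens = [(m.lastgroup, m.group()) for m in master.finditer(source)
          if m.lastgroup != "SKIP"]
print(tokens)

# Syntax analysis (simplified check): '+' must be followed by an operand.
for prev, curr in zip(tokens, tokens[1:]):
    if prev[0] == "PLUS" and curr[0] not in ("NUMBER", "IDENT"):
        print("Syntax error: '+' is missing its right operand")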
Grammar and Language Theory
Grammar in Compiler Design:
• Grammar in compiler design is a formal framework that specifies the structure of the strings in a given
language. It defines the syntax rules for arranging symbols, operators, and keywords within a
programming language.
• A grammar is often represented as a set of production rules, which are used by a parser to analyze the
syntactical structure of source code and generate a parse tree, reflecting the hierarchical organization
of the language.
Components of Grammar:

• Non-Terminal Symbols: Abstract symbols representing groups or patterns in the language (e.g.,
expressions, statements).
• Terminal Symbols: The basic symbols or tokens of the language, like keywords and operators.
• Production Rules: Define how non-terminal symbols can be replaced by groups of terminal and/or
non-terminal symbols.
• Start Symbol: A special non-terminal symbol from which the production process begins.
Example:
• In a simple arithmetic language, a grammar might define an expression (E) that can consist of an
expression plus a term (T), or just a term, with production rules such as:
• E → E + T
• E → T
• T → integer
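
E → E + T is left-recursive, which a top-down (recursive descent) parser cannot consume directly; a minimal Python sketch (function and token names are assumptions) first rewrites it as E → T ('+' T)* and then uses one function per non-terminal:

def parse_E(tokens, pos=0):
    # E -> T ('+' T)*  (left recursion removed for top-down parsing)
    pos = parse_T(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        pos = parse_T(tokens, pos + 1)
    return pos

def parse_T(tokens, pos):
    # T -> integer
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    raise SyntaxError("expected an integer at token %d" % pos)

tokens = ["3", "+", "5", "+", "2"]
assert parse_E(tokens) == len(tokens)  # the whole input conforms to the grammar
print("parse succeeded")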
Types of Grammar (Regular, Context-Free, Context-Sensitive, Unrestricted)
According to the Chomsky Hierarchy, grammars are classified into four main types, each with increasing levels of
expressiveness:
Type 3: Regular Grammar:
• The simplest type, used to define regular languages. Its production rules are of the form A → aB or A → a, where A and B
are non-terminal symbols and a is a terminal symbol.
• Regular grammar can be represented by finite automata and is often used in lexical analysis to define token patterns.
Type 2: Context-Free Grammar (CFG):
• In CFG, each production rule has a single non-terminal symbol on the left-hand side (e.g., A → γ, where A is a
non-terminal, and γ is a string of terminals and/or non-terminals).
• CFGs are powerful enough to describe most programming language constructs and are typically used for syntax analysis
(parsing).
Type 1: Context-Sensitive Grammar:
• These grammars allow more complex production rules, where the left-hand side can contain multiple symbols, and the
right side may vary based on the surrounding context. Productions follow the form αAβ → αγβ, where A is a non-
terminal, and α and β can be empty or non-empty strings of terminals/non-terminals.
• Context-sensitive grammars are less restrictive and are used to model context-dependent structures.

Type 0: Unrestricted Grammar:


• The most general form of grammar, unrestricted grammars have no constraints on production rules. They are as
powerful as Turing machines and can represent any computable function.
• Type-0 grammars are rarely used in practical compiler design due to their complexity.
Type 3 Grammar: Overview and Characteristics
In the Chomsky Hierarchy, Type 3 grammars represent the simplest form of formal grammar, also known as
Regular Grammar. This type of grammar is used to define regular languages, which can be processed by finite
automata. Regular grammars are fundamental in automata theory and computational linguistics, particularly in
applications where simple, linear patterns must be recognized, such as in lexical analysis within compilers.
Characteristics of Type 3 Grammar:
Grammar Structure:

• Rules in Type 3 grammar follow a very restricted format. Each rule must take one of two forms:
• A → aB (where A and B are non-terminal symbols and a is a terminal), or
• A → a, where A is a non-terminal and a is a terminal.
• This format ensures that productions proceed in a linear and non-nested manner, making them simpler than other types
of grammar.
Direction of Production:

• Type 3 grammars can either be right-linear or left-linear:
• Right-linear grammar: All productions are of the form A → xB or A → x, where non-terminals appear to the right.
• Left-linear grammar: All productions are of the form A → Bx or A → x, where non-terminals appear to the left.
• However, left-linear grammars are less commonly used since they can often be converted to right-linear grammars, which
align more naturally with finite state machines.
Type 3 Grammar: Overview and Characteristics cont.
Limitations:
• Regular grammars cannot handle nested structures or recursive patterns like parenthesis matching or
palindromes. For example, a Type 3 grammar cannot express languages where symbols must be balanced or
deeply embedded.
Applications in Computing:
• Type 3 grammars and regular languages are widely used in:
• Lexical Analysis: Recognizing tokens (keywords, identifiers, literals) in programming languages.
• Pattern Matching: Used extensively in tools like regular expressions for searching and validating strings.
• Finite Automata Design: Since regular languages can be represented by regular expressions and by
deterministic or non-deterministic finite automata (DFAs or NFAs), they are essential in designing efficient
parsers and scanners in compilers.

Examples of Type 3 Grammar:


• A typical regular language could be represented by a grammar like:
• S → aS, S → bS, S → a, or S → b
• This grammar generates strings consisting of any combination of "a"s and "b"s, like "ab", "aabb", or "bb".
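
Since this grammar generates exactly the strings of one or more symbols from {a, b}, a two-state finite automaton recognizes the language; a minimal Python sketch (the state names are illustrative):

def accepts(s):
    state = "start"            # nothing consumed yet: not accepting
    for ch in s:
        if ch not in "ab":
            return False       # symbol outside the alphabet
        state = "accept"       # any 'a' or 'b' reaches the accepting state
    return state == "accept"   # rejects the empty string

for s in ["ab", "aabb", "bb", "", "abc"]:
    print(repr(s), accepts(s))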
Introduction to Regular Expressions (Regex)
A Regular Expression (often abbreviated as regex or regexp) is a powerful tool used to define search patterns in text. It allows
users to find, match, and manipulate specific text strings efficiently within a larger text body. Regular expressions are widely
applied in data validation, text processing, lexical analysis in compilers, search engines, and text editors.
Components of Regular Expressions:
• Regular expressions use various characters to define patterns:
• Literals: Characters or strings that match themselves, like abc matching "abc".
• Meta-characters: Special characters with specific functions, such as:
• . (dot) matches any character except a newline.
• * (asterisk) matches zero or more occurrences of the preceding element.

• + (plus) matches one or more occurrences of the preceding element.
• ? matches zero or one occurrence.
• | represents an OR operator, such as (cat|dog) to match "cat" or "dog".
• Anchors: Indicate positions in a string.
• ^ matches the start of a string.
• $ matches the end of a string.
• Character Classes: Define sets of characters to match.
• [abc] matches any of "a", "b", or "c".
• \d matches any digit, \w matches any word character, and \s matches whitespace.
• Quantifiers: Specify the number of times an element should appear.
• {n} exactly n occurrences.
• {n,} at least n occurrences.
• {n,m} between n and m occurrences.
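
A brief sketch using Python's re module to exercise these operators (the sample strings are invented for illustration):

import re

print(re.findall(r"cat|dog", "cat, dog, cow"))    # ['cat', 'dog']
print(bool(re.fullmatch(r"a*b+", "aaabb")))       # True: zero+ a's, one+ b's
print(bool(re.fullmatch(r"colou?r", "color")))    # True: '?' makes the u optional
print(bool(re.search(r"^Hello", "Hello world")))  # True: '^' anchors the start
print(re.findall(r"\d{2,3}", "7 42 123 9876"))    # ['42', '123', '987']
print(re.findall(r"[abc]", "cab rides"))          # ['c', 'a', 'b']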
Introduction to Regular Expressions (Regex) cont.
Applications of Regular Expressions
• Text Search and Replacement: Regular expressions allow for advanced search patterns and replacements in editors and
command-line tools.
• Data Validation: Common in form input validation for email, phone numbers, or zip codes.
• Tokenization in Programming: Lexical analyzers in compilers use regex to categorize tokens in code.

Examples of Regular Expressions for a Compiler Design (Tokenization):


• Identifiers: In many programming languages, identifiers consist of letters followed by letters or digits.
• Regex: [a-zA-Z][a-zA-Z0-9]*

• This matches strings starting with a letter and followed by any combination of letters and digits.
• Keywords: Reserved words, such as "if" or "while," can be matched as exact patterns.
• Regex for "if": \bif\b
• The \b denotes word boundaries, ensuring that "if" is matched as a whole word and not as a part of another word.
• Integer Constants: In languages like C, integers consist of sequences of digits.
• Regex: [0-9]+
• This matches one or more digits.
• Floating-point Numbers: Regular expressions for floating-point numbers recognize decimal patterns.
• Regex: [0-9]+\.[0-9]+
• This matches numbers with a decimal point, such as "123.45".
• Operators: Many operators are single characters, like +, -, *, or =.
• Regex for operators: [+\-*/=]
• This matches any single-character operator.
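
Combining the patterns above, a hedged sketch of a tiny tokenizer in Python (the token names and the keyword set are assumptions for illustration); ordering matters, since keyword and floating-point patterns must be tried before identifier and integer patterns:

import re

TOKEN_PATTERNS = [
    ("KEYWORD", r"\b(?:if|while)\b"),
    ("FLOAT",   r"[0-9]+\.[0-9]+"),
    ("INT",     r"[0-9]+"),
    ("IDENT",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("OP",      r"[+\-*/=]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % p for p in TOKEN_PATTERNS))

def tokenize(code):
    # Scan left to right, trying each alternative in the order listed above.
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(code) if m.lastgroup != "SKIP"]

print(tokenize("if x1 = 3.14 + 42"))
# [('KEYWORD', 'if'), ('IDENT', 'x1'), ('OP', '='),
#  ('FLOAT', '3.14'), ('OP', '+'), ('INT', '42')]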
