SEN 317 Lecture 3
Types of Parsers:
• There are two main types of parsers used in compiler design: Top-Down Parsers (e.g., Recursive Descent) and Bottom-Up Parsers
(e.g., LR Parsers). Each type has specific advantages depending on the structure of the grammar it is handling.
Difference between Syntax Analysis and Lexical Analysis
Purpose:
• Lexical Analysis: Transforms the input character stream into tokens (the smallest meaningful units) by
identifying keywords, operators, identifiers, etc.
• Syntax Analysis: Takes the stream of tokens produced by the lexical analyzer and arranges them into a
hierarchical structure (parse tree) according to the language’s grammar.
Output:
• Lexical Analysis: Produces a series of tokens (e.g., IDENTIFIER, NUMBER, PLUS).
• Syntax Analysis: Produces a parse tree or syntax tree that represents the syntactic structure of the
source code.
Error Detection:
• Lexical Analysis: Primarily detects errors in token structure, such as invalid characters or unrecognized
symbols.
• Syntax Analysis: Detects errors related to the sequence and structure of tokens, such as misplaced
operators or unbalanced parentheses.
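The split between the two phases can be sketched with a small hand-written scanner in Python. The function name `scan` is a hypothetical helper, and the scanner only knows identifiers, unsigned integers, and "+"; it is a minimal illustration under those assumptions, not how any particular compiler does it.

```python
def scan(text):
    """Turn a character stream into (kind, lexeme) tokens.

    Hypothetical helper for illustration: handles only identifiers,
    unsigned integers, '+', and whitespace.
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens but is not a token itself
        elif ch.isalpha():
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1
            tokens.append(("IDENTIFIER", text[start:i]))
        elif ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif ch == "+":
            tokens.append(("PLUS", ch))
            i += 1
        else:
            # Lexical error: this character cannot start any token.
            raise ValueError(f"invalid character {ch!r} at position {i}")
    return tokens
```

Note that `scan("x + + 1")` succeeds, because every token in it is well formed. The misplaced operator is an error in the sequence of tokens, so it is left for the syntax analyzer to report.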
• Non-Terminal Symbols: Abstract symbols representing groups or patterns in the language (e.g.,
expressions, statements).
• Terminal Symbols: The basic symbols or tokens of the language, like keywords and operators.
• Production Rules: Define how non-terminal symbols can be replaced by groups of terminal and/or
non-terminal symbols.
• Start Symbol: A special non-terminal symbol from which the production process begins.
Example of a Flow in Compilation:
• In a simple arithmetic language, a grammar might define an expression (E) as either an expression followed by
a plus sign and a term (T), or just a term, with production rules such as:
• E → E + T
• E → T
• T → integer
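A minimal sketch of how these three rules could drive a parser is given below. The function name `parse_expression` and the nested-tuple tree format are assumptions made for illustration. Because E → E + T is left-recursive, the sketch uses a loop instead of a direct recursive call, which produces the same left-leaning parse tree.

```python
def parse_expression(tokens):
    """Parse tokens such as ["1", "+", "2"] with E -> E + T | T, T -> integer.

    Returns a nested-tuple parse tree. Hypothetical helper for illustration.
    """
    pos = 0

    def parse_term():
        nonlocal pos
        # T -> integer
        if pos < len(tokens) and tokens[pos].isdigit():
            node = ("T", tokens[pos])
            pos += 1
            return node
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected integer, found {found!r}")

    # E -> T is the base case.
    tree = ("E", parse_term())
    # Each further "+ T" applies E -> E + T, wrapping the tree built so far.
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        tree = ("E", tree, "+", parse_term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For the input `["1", "+", "2"]` the result is `("E", ("E", ("T", "1")), "+", ("T", "2"))`: the start symbol E at the root, with the non-terminals as inner nodes and the terminals as leaves.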
Types of Grammar (Regular, Context-Free, Context-Sensitive, Unrestricted)
According to the Chomsky Hierarchy, grammars are classified into four main types, each with increasing levels of
expressiveness:
Type 3: Regular Grammar:
• The simplest type, used to define regular languages. Its production rules are of the form A → aB or A → a, where A and B
are non-terminal symbols and a is a terminal symbol.
• Regular grammars can be represented by finite automata and are often used in lexical analysis to define token patterns.
Type 2: Context-Free Grammar (CFG):
• In a CFG, each production rule has a single non-terminal symbol on the left-hand side (e.g., A → γ, where A is a
non-terminal, and γ is a string of terminals and/or non-terminals).
• CFGs are powerful enough to describe most programming language constructs and are typically used for syntax analysis
(parsing).
Type 1: Context-Sensitive Grammar:
• These grammars allow more complex production rules: the left-hand side can contain multiple symbols, so a non-terminal
may be rewritten only when it appears in a particular surrounding context. Productions follow the form αAβ → αγβ, where A
is a non-terminal, α and β are (possibly empty) strings of terminals and/or non-terminals, and γ is a non-empty string.
• Context-sensitive grammars are less restrictive than context-free grammars and are used to model context-dependent structures.
• Type 3 grammars can either be right-linear or left-linear:
• Right-linear grammar: All productions are of the form A → xB or A → x, where non-terminals appear to
the right.
• Left-linear grammar: All productions are of the form A → Bx or A → x, where non-terminals appear to
the left.
• However, left-linear grammars are less commonly used, since they can always be converted to equivalent right-linear
grammars, which align more naturally with finite state machines.
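The link between right-linear grammars and finite state machines can be sketched as follows: each non-terminal becomes a state, each production A → xB becomes a transition from A to B on x, and each production A → x becomes a transition into an accepting state. The grammar below (strings over {a, b} that start with "a") and the names `GRAMMAR` and `accepts` are made up for illustration.

```python
# Hypothetical right-linear grammar over {a, b}: strings that start with "a".
# A production A -> xB is stored as (x, "B"); A -> x is stored as (x, None).
GRAMMAR = {
    "S": [("a", "A"), ("a", None)],
    "A": [("a", "A"), ("b", "A"), ("a", None), ("b", None)],
}

ACCEPT = None  # stands for the single accepting state of the automaton


def accepts(word, grammar=GRAMMAR, start="S"):
    """Simulate the NFA whose states are the grammar's non-terminals."""
    states = {start}
    for symbol in word:
        next_states = set()
        for state in states:
            # The accepting state has no outgoing transitions.
            for terminal, target in grammar.get(state, []):
                if terminal == symbol:
                    next_states.add(target)
        states = next_states
    return ACCEPT in states
```

The simulation keeps a set of current states because the automaton is non-deterministic: on reading "a" in state S, it may either continue in A or finish.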
Type 3 Grammar: Overview and Characteristics cont.
Limitations:
• Regular grammars cannot handle nested structures or recursive patterns like parenthesis matching or
palindromes. For example, a Type 3 grammar cannot express languages where symbols must be balanced or
deeply embedded.
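One way to see the limitation is to look at what a balanced-parentheses check needs. The sketch below (the name `is_balanced` is hypothetical) tracks the current nesting depth in a counter. Since the depth can grow without bound, no fixed, finite set of states can stand in for that counter, which is why a finite automaton, and therefore a Type 3 grammar, cannot do this job.

```python
def is_balanced(text):
    """Check that every '(' has a matching ')' using an unbounded counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' appeared before its matching '('
    return depth == 0  # every '(' must have been closed
```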
Applications in Computing:
• Type 3 grammars and regular languages are widely used in:
• Lexical Analysis: Recognizing tokens (keywords, identifiers, literals) in programming languages.
• Pattern Matching: Used extensively in tools like regular expressions for searching and validating strings.
• Finite Automata Design: Since regular languages can be represented by regular expressions and by
deterministic or non-deterministic finite automata (DFA or NFA), they are essential in designing efficient
scanners (lexical analyzers) in compilers.
• + (plus) matches one or more occurrences of the preceding element.
• ? matches zero or one occurrence.
• | represents an OR operator, such as (cat|dog) to match "cat" or "dog".
• Anchors: Indicate positions in a string.
• ^ matches the start of a string.
• $ matches the end of a string.
• Character Classes: Define sets of characters to match.
• [abc] matches any of "a", "b", or "c".
• \d matches any digit, \w matches any word character, and \s matches whitespace.
• Quantifiers: Specify the number of times an element should appear.
• {n} exactly n occurrences.
• {n,} at least n occurrences.
• {n,m} between n and m occurrences.
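The constructs listed above can be tried out with Python's standard `re` module. The helper `matches` and the sample strings are assumptions chosen for illustration; `re.fullmatch` requires the whole string to match, while `re.search` looks for a match anywhere in the string.

```python
import re


def matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


results = {
    "plus":      matches(r"ab+", "abbb"),           # one or more "b"
    "plus_zero": matches(r"ab+", "a"),              # "+" needs at least one "b"
    "optional":  matches(r"colou?r", "color"),      # the "u" may be absent
    "either":    matches(r"(cat|dog)", "dog"),      # alternation
    "class":     matches(r"[abc]+", "cabba"),       # only a, b, c allowed
    "shorthand": matches(r"\w+\s\d+", "room 101"),  # word chars, space, digits
    "exactly":   matches(r"\d{3}", "123"),          # exactly three digits
    "at_least":  matches(r"\d{2,}", "7"),           # needs two or more digits
    "range":     matches(r"\d{2,4}", "12345"),      # five digits is too many
}

# Anchors matter with re.search, which otherwise matches anywhere in the text.
anchored = re.search(r"^\d+$", "abc123") is None       # no match: not all digits
unanchored = re.search(r"\d+", "abc123") is not None   # match: "123" is inside
```

Here "plus_zero", "at_least", and "range" come out False and every other entry comes out True; both `anchored` and `unanchored` are True.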
Introduction to Regular Expressions (Regex) cont.
Applications of Regular Expressions
• Text Search and Replacement: Regular expressions allow for advanced search patterns and replacements in editors and
command-line tools.
• Data Validation: Common in form input validation for email, phone numbers, or zip codes.
• Tokenization in Programming: Lexical analyzers in compilers use regex to categorize tokens in code.
• Identifiers: Matched by a pattern for strings that start with a letter, followed by any combination of letters and digits.
• Keywords: Reserved words, such as "if" or "while," can be matched as exact patterns.
• Regex for "if": \bif\b
• The \b denotes word boundaries, ensuring that "if" is matched as a whole word and not as a part of another word.
• Integer Constants: In languages like C, integers consist of sequences of digits.
• Regex: [0-9]+
• This matches one or more digits.
• Floating-point Numbers: Regular expressions for floating-point numbers recognize decimal patterns.
• Regex: [0-9]+\.[0-9]+
• This matches numbers with a decimal point, such as "123.45".
• Operators: Many operators are single characters, like +, -, *, or =.
• Regex for operators: [+\-*/=]
• This matches any one of the single-character operators +, -, *, / or =.
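The token patterns above can be combined into one small regex-based tokenizer. This is a sketch under stated assumptions: the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical, the identifier pattern `[A-Za-z][A-Za-z0-9]*` is inferred from the description of identifiers, and the keyword set is limited to "if" and "while". The order of the patterns matters: FLOAT is tried before INTEGER so that "3.14" is not split, and KEYWORD before IDENTIFIER so that "if" is not read as an ordinary name.

```python
import re

# (token kind, pattern) pairs, tried in this order at each position.
TOKEN_SPEC = [
    ("FLOAT",      r"[0-9]+\.[0-9]+"),
    ("INTEGER",    r"[0-9]+"),
    ("KEYWORD",    r"\b(?:if|while)\b"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),  # assumed identifier pattern
    ("OPERATOR",   r"[+\-*/=]"),
    ("SKIP",       r"\s+"),                   # whitespace, not reported
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split `source` into (kind, lexeme) pairs, rejecting unknown characters."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"invalid character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Thanks to the word boundaries in the keyword pattern, `tokenize("iffy")` yields a single IDENTIFIER token, not the keyword "if" followed by "fy".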