WHAT IS PARSING?
Parsing means dividing something into parts in order to examine each part individually. For example, you can take a sentence, split it into its grammatical components, and then identify those components and the relations between them. A common example in web development is URL parsing, which involves breaking a URL down into components such as the protocol, domain, path, and query parameters. Parsing the URL https://www.example.com/page?query=123 would extract the protocol ("https"), the domain (www.example.com), the path ("/page"), and the query parameter ("query=123").
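As a quick illustration, Python's standard urllib.parse module performs this kind of URL parsing (the variable names below are just for the example):

from urllib.parse import urlparse, parse_qs

url = "https://www.example.com/page?query=123"
parts = urlparse(url)

print(parts.scheme)           # 'https'            (the protocol)
print(parts.netloc)           # 'www.example.com'  (the domain)
print(parts.path)             # '/page'            (the path)
print(parse_qs(parts.query))  # {'query': ['123']} (the query parameters)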
WHAT IS A PARSER?
A parser is a program that is usually part of a compiler. It receives input in the form of sequential source program instructions, interactive online commands, markup tags, or some other defined interface, and breaks that input into parts such as objects, methods, and their attributes or options. As a practical example, think of a parser reading and understanding a shopping list: the list has items like apples, milk, and bread, and you want to organise it into categories like fruits, dairy, and bread products. The parser can do this by:
1. Reading the list.
2. Understanding categories.
3. Organizing the list.
4. Creating a structured list.
The parser carries out the steps above and makes your shopping easier by dividing the list into sections, as sketched below.
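Here is a toy sketch of those four steps in Python (the category table, the function name, and the comma-separated input format are assumptions made purely for illustration):

# Hypothetical category table for the example items.
CATEGORIES = {
    "apples": "fruits",
    "milk": "dairy",
    "bread": "bread products",
}

def parse_shopping_list(text):
    structured = {}
    for item in text.split(","):                          # 1. Read the list.
        item = item.strip().lower()
        category = CATEGORIES.get(item, "other")          # 2. Understand categories.
        structured.setdefault(category, []).append(item)  # 3. Organize the list.
    return structured                                     # 4. Return a structured list.

print(parse_shopping_list("apples, milk, bread"))
# {'fruits': ['apples'], 'dairy': ['milk'], 'bread products': ['bread']}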
WHAT DO PARSING AND PARSERS ENTAIL?
The word "parse" means to analyse an object specifically. It is commonly used
in computer science to refer to reading program code. For example, after a
program is written, whether it be in C++, Java, or any other language, the code
needs to be parsed by the compiler in order to be compiled.
Parsing can also refer to breaking up ordinary text. For example, search
engines typically parse search phrases entered by users so that they can more
accurately search for each word.
Parsing can also be described as the process of transforming data from one format to another. In the world of software, every entity has its own criteria for the data it processes, so parsing transforms the data into a form that a specific piece of software can understand. This transformation is carried out by the parser, the component of the translator that organises linear text according to a set of defined rules known as a grammar.
Parsing, in general, refers to the process of analysing a string of symbols
according to the rules of a formal grammar. It’s a fundamental concept used in
various fields, including linguistics, mathematics, and computer science. When
we talk about parsing in programming, we’re specifically referring to the
process of analysing strings of text according to the syntax rules of a
programming language or a data format.
Types of Parsing:
There are two types of Parsing:
Top-Down Parsing:
Top-down parsing is a parsing technique that first looks at the highest level of the parse tree and works down the parse tree by using the rules of grammar. In top-down parsing, you start from the root of the tree and work your way down towards the leaves. It's like starting from the top of the tree and trying to figure out how it branches out.
Let's take this grammar:
S -> aABe, A -> Abc | a, B -> d, where S is the start symbol and A and B are the other variables.
The string aabcde can be derived from S top-down, expanding the leftmost variable at each step:
S => aABe => aAbcBe => aabcBe => aabcde
Process:
You begin at the top node (the root) of the tree, which represents the start
symbol of the grammar.
You recursively descend the tree, following branches based on production
rules in the grammar.
At each node, you try to match the current input with the expected symbol or
pattern.
If there's a match, you continue descending further down the tree. If not, you
backtrack and try another branch.
Similitude (Analogy):
Imagine you're trying to identify a specific type of tree by starting with its
general characteristics (like the shape of its leaves, the color of its bark). Then,
based on these characteristics, you narrow down your search to find the exact
type of tree.
Example:
Let's say you're parsing an arithmetic expression. Starting with the root symbol
for an expression, you recursively break it down into smaller components like
terms, factors, and individual tokens until you reach the leaves, which
represent the tokens in the expression.
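For instance, with the arithmetic grammar used later in these notes (E -> E + T | T, T -> T * F | F, F -> ( E ) | id), a top-down parse of id + id * id amounts to a leftmost derivation, expanding from the start symbol down to the tokens:

E => E + T => T + T => F + T => id + T => id + T * F => id + F * F => id + id * F => id + id * id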
Bottom-Up Parsing:
Bottom-up parsing is a parsing technique that first looks at the lowest level of the parse tree and works up the parse tree by using the rules of grammar. In bottom-up parsing, you start from the leaves of the tree and work your way up towards the root. It's like starting with individual pieces and gradually assembling them into larger structures.
In a production, the LHS (Left-Hand Side) can be rewritten as the RHS (Right-Hand Side). Bottom-up parsing works in the opposite direction: the RHS is reduced to the LHS, and we keep reducing until we reach the start symbol. For example, aabcde is reduced step by step:
aabcde => aAbcde (A -> a) => aAde (A -> Abc) => aABe (B -> d) => S (S -> aABe)
Process:
You begin with the individual tokens (leaves) of the tree, treating each token as
a small parse tree by itself.
You combine adjacent tokens according to production rules to create larger
and larger parse trees.
Eventually, you merge these smaller parse trees together until you reach the
root of the tree, which represents the entire input.
Similitude (Analogy):
Think of it as assembling a jigsaw puzzle from individual pieces. You start with
the pieces (tokens) and gradually connect them based on their shapes and how
they fit together, until you form the complete picture (the parsed input).
Example:
Using the arithmetic expression example again, you start with individual tokens
representing numbers, operators, and parentheses. Then, based on grammar
rules, you combine adjacent tokens into larger components like factors, terms,
and finally, the entire expression.
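With the same arithmetic grammar (E -> E + T | T, T -> T * F | F, F -> ( E ) | id), a bottom-up parse of id + id * id repeatedly reduces the handle in the input, tracing a rightmost derivation in reverse:

id + id * id => F + id * id => T + id * id => E + id * id => E + F * id => E + T * id => E + T * F => E + T => E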
Conclusion:
Both top-down and bottom-up parsing approaches offer different perspectives
on how to understand and process the structure of a language. While top-
down parsing starts with the big picture and breaks it down into smaller parts,
bottom-up parsing begins with the individual pieces and builds them up into
larger structures. Each approach has its own strengths and weaknesses,
depending on the specific grammar and parsing requirements.
Top-Down Parsing vs Bottom-Up Parsing:
Top-Down Parsing looks at the highest level of the parse tree and works down the parse tree by using the rules of grammar. This parsing technique uses Left Most Derivation.
Bottom-Up Parsing looks at the lowest level of the parse tree and works up the parse tree by using the rules of grammar. This parsing technique uses Right Most Derivation.
Consider the following arithmetic grammar, which is left-recursive, and its equivalent after removing left recursion (required before a top-down parser can use it):

Original grammar:
E -> E + T | T
T -> T * F | F
F -> ( E ) | id

After removing left recursion:
E -> T E'
E' -> + T E' | e
T -> F T'
T' -> * F T' | e
F -> ( E ) | id

Here e is Epsilon (the empty string).
For a Recursive Descent Parser, we write one procedure (function) for every variable (non-terminal) of the grammar, as sketched below.
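Here is a minimal recursive-descent sketch in Python for the left-recursion-free grammar above, with one procedure per variable (the Parser class, the tokens-as-a-list assumption, and the method names are illustrative choices, not a fixed implementation):

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens          # e.g. ['id', '+', 'id', '*', 'id']
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.peek() == expected:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")

    def E(self):                      # E  -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):                # E' -> + T E' | e
        if self.peek() == '+':
            self.match('+')
            self.T()
            self.E_prime()
        # otherwise take the epsilon alternative: do nothing

    def T(self):                      # T  -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):                # T' -> * F T' | e
        if self.peek() == '*':
            self.match('*')
            self.F()
            self.T_prime()

    def F(self):                      # F  -> ( E ) | id
        if self.peek() == '(':
            self.match('(')
            self.E()
            self.match(')')
        else:
            self.match('id')

    def parse(self):
        self.E()
        if self.peek() is not None:
            raise SyntaxError("unexpected trailing input")
        return True

print(Parser(['id', '+', 'id', '*', 'id']).parse())   # True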
In LL(1), the first L stands for scanning the input from Left to right, the second L stands for Leftmost Derivation, and the 1 stands for the number of lookahead tokens used by the parser while parsing a sentence. An LL(1) parser is constructed from a grammar that is free from left recursion, common prefixes, and ambiguity.
Let's take a practical example. Imagine you have a recipe, and you want to follow it step by step to cook a dish. Each step in the recipe corresponds to a rule in our grammar. Our goal is to understand the recipe (grammar) and follow it correctly to make the dish (parse the input). Now, let's say our recipe (grammar) is written in such a way that each step (rule) tells us exactly what ingredient or action to take next, without any ambiguity. This means that, for any step we're on and any ingredient or action we're considering, there's only one choice to make to move forward.
Here comes the LL(1) part: "LL" means we read the input from left to right and construct a leftmost derivation (we start from the start symbol and keep replacing the leftmost non-terminal with one of its productions until only terminals remain and the input sentence is derived). And "1" means we only need to look at the next 1 token in the input to make decisions.
So, in our cooking scenario, being LL(1) means that for each step in the recipe,
we only need to look at the current ingredient or action we're dealing with to
know exactly what to do next. There's no need to look ahead multiple steps or
backtrack. To put it simply, LL(1) parsing is like following a super clear and
straightforward recipe where each step tells you exactly what to do next based
on what you're currently dealing with, without any confusion or need to guess.
For example, consider this small toy grammar:
Program -> Statement*
Statement -> VariableDeclaration | Assignment | IfStatement | WhileLoop |
FunctionDefinition
VariableDeclaration -> 'var' Identifier ';'
Assignment -> Identifier '=' Expression ';'
IfStatement -> 'if' '(' Expression ')' '{' Statement* '}' ('else' '{' Statement* '}' )?
WhileLoop -> 'while' '(' Expression ')' '{' Statement* '}'
FunctionDefinition -> 'function' Identifier '(' Parameters ')' '{' Statement* '}'
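As a hedged sketch of the one-token-lookahead idea using the toy grammar above (the function name, the keyword table, and returning only the name of the chosen production are illustrative assumptions):

# The LL(1) "prediction" step for the Statement rule: one lookahead token
# is enough to decide which production to use.
KEYWORD_TO_PRODUCTION = {
    'var': 'VariableDeclaration',
    'if': 'IfStatement',
    'while': 'WhileLoop',
    'function': 'FunctionDefinition',
}

def predict_statement(lookahead):
    """Return the Statement production selected by a single lookahead token."""
    if lookahead in KEYWORD_TO_PRODUCTION:
        return KEYWORD_TO_PRODUCTION[lookahead]
    return 'Assignment'   # any other identifier starts an Assignment in this grammar

print(predict_statement('var'))     # VariableDeclaration
print(predict_statement('while'))   # WhileLoop
print(predict_statement('total'))   # Assignment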
LR Parser
LR parsing is a general form of shift-reduce parsing. The L stands for scanning the input from left to right, and the R stands for constructing a rightmost derivation in reverse.
Benefits of LR parsing:
1. Many programming languages are parsed using some variation of an LR parser (C++ and Perl are notable exceptions).
2. LR parsers can be implemented very efficiently.
3. Of all the parsers that scan their input from left to right, LR parsers detect syntactic errors as soon as possible.
Imagine you have a bunch of Lego blocks scattered on a table, and you
want to build a spaceship out of them. Each Lego block represents a part
of the spaceship, like a wing, a fuselage, or an engine. Now, you have a
set of assembly instructions (grammar rules) that tell you how to put
these blocks together to build the spaceship.
In essence, LR parsing is like building a Lego spaceship by scanning
through the blocks, following assembly instructions, and gradually
assembling them into larger structures until you've constructed the
entire spaceship. It's a bottom-up process because you start with
individual blocks and build up to the final structure, just like how
bottom-up parsing starts with individual tokens and builds up to the
complete parse tree.
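To make the shift-reduce idea concrete, here is a small Python sketch that replays a bottom-up parse of aabcde with the grammar S -> aABe, A -> Abc | a, B -> d from earlier in these notes. The sequence of actions is written out by hand for readability; a real LR parser would choose each shift or reduce from a parsing table built from the grammar.

# Replaying a shift-reduce parse of "aabcde".
GRAMMAR = {
    'S': ['a', 'A', 'B', 'e'],   # S -> aABe
    'A->Abc': ['A', 'b', 'c'],   # A -> Abc
    'A->a': ['a'],               # A -> a
    'B': ['d'],                  # B -> d
}

def reduce_top(stack, lhs, rhs):
    # Replace the handle (the rule's RHS) on top of the stack with its LHS.
    assert stack[-len(rhs):] == rhs, "handle is not on top of the stack"
    del stack[-len(rhs):]
    stack.append(lhs)

tokens = list("aabcde")
actions = ['shift', 'shift', ('A', GRAMMAR['A->a']),
           'shift', 'shift', ('A', GRAMMAR['A->Abc']),
           'shift', ('B', GRAMMAR['B']),
           'shift', ('S', GRAMMAR['S'])]

stack, pos = [], 0
for act in actions:
    if act == 'shift':
        stack.append(tokens[pos])
        pos += 1
    else:
        lhs, rhs = act
        reduce_top(stack, lhs, rhs)
    print(stack, '|', ''.join(tokens[pos:]))

# The final stack is ['S'] with no input left, so "aabcde" is accepted.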