
Tiny Language Parser

Narayana Chakravarthi (CSE20113), Omshith R (CSE20117), Rayapudi Venkata Rahul (CSE20140)
Dept. of Computer Science & Engineering
Amrita School of Computing
Bengaluru, India

Abstract— This project presents a small parser that operates in two stages: lexical and syntactic analysis. Programming languages are governed by grammar rules covering constructs such as assignments, loops, conditional comparisons, input/output operations, and expressions. The parser demonstrates the essential steps for turning source code into a well-organized form that is ready for further processing. By following the defined grammar rules, the parser ensures that code conforms to the requirements of its programming language. Through this project, programmers learn the basics of how parsers work and gain a better understanding of lexical and grammatical rules. Finally, the parser shows how programming languages are processed in practice, which supports further study of language design and parser construction.

Keywords—syntactic, lexical, analysis, grammar, parser

I. INTRODUCTION

Parsers are important in language processing because they translate high-level code into instructions that computers can understand. A tiny parser is a simplified parser intended to demonstrate the basics of grammar and lexical analysis. While syntax analysis focuses on the structural and grammatical rules of code, lexical analysis divides source code into smaller parts called tokens. In a language, the arrangement of words (its syntax) must follow the rules of its grammar. The 'Program' is the top-level construct of anything written in the language. It contains statement lists ('StmtList'), which are themselves built from smaller parts that combine into meaningful statements. A statement list can hold statements of every kind: conditionals ('IfStmt'), loops such as while and repeat ('WhileStmt'/'RepeatStmt'), assignment statements that store values ('AssignStmt'), and statements that read input from the user ('ReadStmt') or write results ('WriteStmt').

Within the field of compiler construction, the creation of a small parser using a unique top-down parsing technique is a noteworthy investigation into the basic principles of language processing. This parser breaks down and makes sense of the structure of a programming language's source code by combining syntactical and lexical analysis. It employs a top-down parsing approach, starting its analysis from the highest-level structures and working its way recursively through the language's grammar rules to decipher the nuances of the code.

This study explores the fundamentals of language interpretation with the goal of demonstrating how lexical and syntactical analysis are essential to the parsing of source code. Understanding how programming languages are processed and interpreted rests on the synergy between lexical analysis, which divides the code into tokens, and syntactical analysis, which determines the hierarchical structure of these tokens. This project offers an informative journey into the fundamentals of compiler design through the creation of a bespoke top-down parser, opening the door to a better understanding of language structures and the complex art of parsing.

II. GRAMMAR USED IN THE TINY PARSER

It is impossible to exaggerate the significance of grammar in a compact parser, especially one that uses a unique top-down parsing strategy with syntactical and lexical analysis. Grammar is the blueprint that specifies the rules and structure necessary to write acceptable programs in a programming language.

The following are some major points that emphasize the importance of grammar in this context:

1. Syntax Definition:

Grammar establishes the language's syntax by describing the acceptable token groupings and their hierarchical connections. It is the basis for the parser's syntactical analysis, which makes it possible to recognize and verify the structural components contained in the source code.

2. Rules for Tokenization:

Grammar rules are essential because they define the patterns that correspond to legitimate tokens in lexical analysis. This guarantees that components such as operators, literals, identifiers, and keywords are correctly recognized and categorized by the lexer. Well-defined grammar rules make exact tokenization possible.
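To make these tokenization rules concrete, the sketch below shows one way such grammar-driven token patterns could be realized with regular expressions. It is illustrative only: the paper's own lexer code is not listed, and the names used here (TOKEN_SPEC, tokenize) are assumptions, though the identifier and number patterns follow the grammar given in this section.

```python
import re

# Illustrative sketch only; not the paper's implementation. The Identifier
# and Number patterns mirror the grammar's rules, and longer operators are
# listed before their one-character prefixes so ':=' and '<=' match whole.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("KEYWORD",    r"\b(?:read|write|if|then|elseif|else|endif|"
                   r"while|do|endwhile|repeat|until)\b"),
    ("IDENTIFIER", r"[A-Za-z]\w*"),
    ("ASSIGN",     r":="),
    ("COMPARISON", r"<=|>=|==|<>|<|>"),
    ("ADDOP",      r"[+-]"),
    ("MULOP",      r"[*/]"),
    ("PUNCT",      r"[();]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan left to right, classifying each lexeme and dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if not match:
            raise SyntaxError(f"illegal character at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, tokenize("x := 3.5 + y ;") returns [('IDENTIFIER', 'x'), ('ASSIGN', ':='), ('NUMBER', '3.5'), ('ADDOP', '+'), ('IDENTIFIER', 'y'), ('PUNCT', ';')], the classified token stream that the syntactical stage consumes.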



These grammar rules also make syntactical analysis easier.

3. Parsing Logic:

Grammar rules play a major role in the top-down parsing strategy, helping the parser navigate the code's hierarchical structure. In the parsing process, each grammar rule relates to a production rule that controls how higher-level constructions are decomposed into simpler parts. Building the parse tree, also known as the abstract syntax tree, requires this parsing logic.

4. Error Identification:

Syntax error detection relies heavily on grammar rules. Syntax problems can be quickly detected and reported by the parser when it encounters code that does not follow the prescribed grammar. For developers, this feature is essential, since it allows them to fix problems and improve the general quality of the code.

5. Language Understanding:

A clearly defined grammar makes the programming language easier to grasp overall. It clarifies how particular language structures should be expressed and acts as a reference for both users and parser developers. Building reliable and accurate compilers requires knowledge of this fundamental concept.

6. Consistency and Standardization:

Grammar offers a uniform and well-defined means of expressing programs. Developers can ensure that their code follows an organized and generally recognized format by adhering to a specified set of standards. This uniformity makes the code easier to understand and maintain, in addition to helping with processing.

The following are the grammar rules used in the Tiny parser:

grammar_rules = {
    'Program': ['StmtList'],
    'StmtList': ['Stmt StmtList', ''],
    'Stmt': ['AssignStmt', 'ReadStmt', 'WriteStmt', 'IfStmt',
             'WhileStmt', 'RepeatStmt'],
    'AssignStmt': ['Identifier := Expr ;'],
    'ReadStmt': ['read Identifier ;'],
    'WriteStmt': ['write Expr ;'],
    'IfStmt': ['if Condition then StmtList ElseStmt endif'],
    'ElseStmt': ['elseif Condition then StmtList ElseStmt',
                 'else StmtList', ''],
    'WhileStmt': ['while Condition do StmtList endwhile'],
    'RepeatStmt': ['repeat StmtList until Condition'],
    'Condition': ['Expr ComparisonOp Expr'],
    'ComparisonOp': ['<', '>', '<=', '>=', '==', '<>'],
    'Expr': ['Term AddOp Expr', 'Term'],
    'AddOp': ['+', '-'],
    'Term': ['Factor MulOp Term', 'Factor'],
    'MulOp': ['*', '/'],
    'Factor': ['( Expr )', 'Number', 'Identifier'],
    'Identifier': [r'[A-Za-z]\w*'],
    'Number': [r'\d+(\.\d+)?']
}

III. LITERATURE SURVEY

The first work highlights the comparison and evaluation of LR parsers in relation to compiler design. The study likely examines shift-reduce algorithms, or LR parsing techniques, and their use in compiler development. It is reasonable to assume that the research provides insights into LR parsing strategies and their efficacy in the context of compiler construction, potentially with implications for languages like TINY, even though the abstract provided lacks specific details. This comparison helps clarify the parser choices made for languages comparable to TINY and provides insight into the effectiveness, performance, and suitability of LR parsers for small languages [1]. The results may offer useful guidance for the development and refinement of parsers for simple languages with particular grammar rules, like TINY.

The second work presents a high-level domain-specific language created for building compiler optimizers. Its abstract does not specifically mention the Tiny language; however, the idea of compiler optimizers is essential to improving the effectiveness and performance of compilers for languages such as Tiny [2]. These optimizers are crucial in transforming and improving the code produced by compilers, influencing program execution speed and resource utilization. The results of this research are likely to provide valuable considerations for the creation of domain-specific languages intended for compiler optimization, as well as optimizations applicable to languages with characteristics similar to Tiny's.

The third work describes the creation of lexer and parser components for a compiler that uses Python to target the instruction set of the GAMA32 processor. The ideas of lexer and parser design presented in the context of compiler creation apply to the wider area of compiler construction, including parsers for languages such as Tiny, even though the study concentrates on a particular processor architecture. The work likely contributes something useful to the implementation of syntactic and lexical analysis.
Such analysis components are essential to parsing simple-syntax languages like Tiny [3]. The approaches and strategies discussed there may provide insightful viewpoints on the development and use of lexers and parsers for processors or languages that have similarities with the Tiny language.

The fourth work examines how information retrieval systems may be improved by developing a tokenizer and parser for the Mizar language. While Mizar is limited to formal mathematics, addressing sophisticated and organized language structures presents issues that can be solved by developing a versatile tokenizer and parser [4]. The knowledge gained from this work may benefit the larger area of parser design by taking into account adaptability and flexibility, two factors that are essential when creating parsers for languages like Tiny. Through an analysis of the paper's handling of the complexities of the Mizar language, one may draw comparisons and obtain insights relevant to the development of parsers for languages with less complex syntax, like Tiny.

The fifth paper focuses on natural language processing problems; however, parsers that work with programming languages such as Tiny can benefit from its study of lexical ambiguity and disambiguation strategies [5]. Insight gained from the paper's approach to word sense disambiguation and homonymy resolution can help strengthen Tiny language parsers' resilience and ensure that statements and expressions are correctly interpreted even when they contain similar lexical structures with different meanings.

By investigating and proposing techniques for parsing and analysing the EI language, the sixth work advances the subject of compiler design by illuminating the difficulties and solutions related to lexical and syntactic processing. The work likely covers the essential components of syntactic analysis, which includes grammar and parsing rules, and of lexical analysis, which includes tokenization, even if the precise details of the EI language are not given [6]. Since these elements entail comparable underlying concepts in compiler building, understanding them is essential for the development of parsers, particularly those for languages like TINY. The study broadens our understanding of language translation and compiler design, two critical fields of inquiry for practitioners and scholars involved in compiler technology and programming languages.

IV. METHODOLOGY

Source code for a programming language is interpreted by a small parser that combines lexical and syntactical analysis using a unique top-down parsing technique. As it works through the code, the parser in this approach creates an abstract syntax tree or a parse tree by processing the input from left to right.

A. Lexical analysis:

Lexical analysis is the process of dissecting the source code into tokens, which are the smallest meaningful units of the language, such as operators, literals, keywords, and identifiers. A custom top-down parser uses rules to identify and classify these tokens, generating a structured sequence that conforms to the lexical specifications of the language.

The first step in a computer language's compilation process is lexical analysis. Its main objective is to divide the source code into basic components called tokens. This process is sometimes referred to as scanning or tokenization. The smallest meaningful linguistic building blocks are represented by these tokens, which include literals, operators, keywords, and identifiers.

Lexical analysis involves scanning and analyzing the source code character by character in a sequential manner. The method finds patterns in the code that match predetermined lexical items. These patterns are often defined using regular expressions, which enable the lexer to match and extract characters according to predefined criteria.

The lexical analyzer, sometimes called a lexer or scanner, eliminates unnecessary components such as whitespace and comments so that it recognizes and classifies only legitimate tokens. The token stream that is produced serves as input for the compiler's later stages, especially the syntax analysis stage. Lexical analysis ensures that the source code follows the lexical structure of the language and prepares it for the compiler to interpret and process further.

B. Syntactical analysis:

Using a unique top-down parsing approach, the syntactical analysis step checks the token sequence and applies the language's syntax rules. The program or statement list is usually the highest-level grammatical rule that this parser begins with. It then recursively breaks the input down into smaller components until individual tokens are reached. A set of production rules that reflect the language's hierarchical syntax serves as the guideline for this procedure.

Syntactic analysis, also referred to as parsing, is an essential step in a computer language's compilation process. Its main goal is to use the grammar rules of the language to analyze the source code's structure. This stage comes after the source code has been divided into tokens by lexical analysis.

In syntactic analysis, the parser looks at how tokens are arranged and checks whether they follow the language's grammar rules. Valid programs are defined by their syntactic structure, or syntax, according to the grammar rules. The parser creates a hierarchical representation of the code using standard parsing techniques.
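As an illustration of the recursive-descent style described above, the sketch below hand-codes the expression rules of the grammar (Expr -> Term AddOp Expr | Term, Term -> Factor MulOp Term | Factor, Factor -> ( Expr ) | Number | Identifier). It is a minimal sketch under stated assumptions, not the paper's actual parser; the class name ExprParser and the tuple-based tree shape are inventions for this example.

```python
# Sketch of recursive-descent parsing for the grammar's expression rules.
# Each parse_* method corresponds to one nonterminal; the result is a
# nested-tuple parse tree such as ('+', 'x', ('*', '3', 'y')).
class ExprParser:
    def __init__(self, tokens):
        self.tokens = tokens          # e.g. ["x", "+", "3", "*", "y"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse_expr(self):             # Expr -> Term AddOp Expr | Term
        node = self.parse_term()
        if self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.parse_expr())
        return node

    def parse_term(self):             # Term -> Factor MulOp Term | Factor
        node = self.parse_factor()
        if self.peek() in ("*", "/"):
            op = self.eat()
            node = (op, node, self.parse_term())
        return node

    def parse_factor(self):           # Factor -> ( Expr ) | Number | Identifier
        if self.peek() == "(":
            self.eat("(")
            node = self.parse_expr()
            self.eat(")")
            return node
        return self.eat()             # a Number or Identifier leaf
```

Note that because the production Expr -> Term AddOp Expr is right-recursive, operators come out right-associative here: ExprParser(["x", "+", "3", "*", "y"]).parse_expr() returns ('+', 'x', ('*', '3', 'y')).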
Such parsing techniques include recursive descent and bottom-up parsing, and the resulting representation is frequently an abstract syntax tree or parse tree.

The syntactic structure of the program is reflected in the parse tree, which shows the relationships between the various language components. The parser builds the parse tree successfully if the source code complies with the grammar rules. Otherwise, the parser finds and reports grammatical flaws in the code, assisting developers in fixing and improving it.

Syntactic analysis is essential to guarantee that the source code complies with the grammatical rules of the programming language and to provide a structured representation that can be utilized in further compilation stages.

The custom top-down parser may employ techniques such as recursive descent parsing or predictive parsing, where it predicts the next production rule based on the current token, efficiently navigating the grammar to identify the syntactic structure of the source code. By combining lexical and syntactical analysis in a top-down approach, this tiny parser provides a systematic and structured means of interpreting and understanding programming language constructs.

The grammar of the tiny parser language is described by the following:

Program: The program is made up of a set of statements called a statement list.

Statement List: A statement list is either an empty string or a statement followed by another statement list.

Statement: There are several types of statements: assignment statements, read statements, write statements, if statements, while statements, and repeat statements.

Assignment Statement: An assignment statement consists of an identifier, ":=", and an expression, terminated by a semicolon.

Read Statement: A read statement consists of the keyword "read" followed by an identifier and a semicolon.

Write Statement: A write statement consists of the keyword "write" followed by an expression or the keyword "endl", and is terminated by a semicolon.

If Statement: An if statement consists of the keyword "if" followed by a condition, the keyword "then", a list of statements, an optional else-if statement, an optional else statement, and the keyword "endif".

Else-If Statement: An else-if statement consists of the keyword "elseif" followed by a condition, the term "then", and a list of statements. It may be empty, or it may be followed by another else-if statement or an else statement.

Else Statement: An else statement consists of the keyword "else" followed by a list of statements; the keyword "endif" then closes the enclosing if statement.

While Statement: A while statement consists of the keyword "while", a condition, the keyword "do", a statement list, and the keyword "endwhile".

Repeat Statement: A repeat statement consists of the keyword "repeat", a statement list, the keyword "until", and a condition.

Condition: A condition is made up of an expression, a comparison operator, and another expression.

Comparison Operator: A comparison operator can be "<", ">", "<=", ">=", "==", or "<>", used to compare two integer variables.

Expression: An expression may consist of a single term, or of a term followed by an addition operator and another expression.

Addition Operator: The addition operator "+" is used to add two or more integer variables.

Subtraction Operator: The subtraction operator "-" is used to subtract integer variables.

Term: A term may consist of a single factor, or of a factor followed by a multiplication operator and another term.

Multiplication Operator: The multiplication operator "*" is used for multiplication.

Division Operator: The division operator "/" is used for division of integer variables.

Factor: A factor can be an expression enclosed in parentheses, a number, or an identifier.

Identifier: An identifier starts with a letter and can be followed by any combination of letters and digits.

Number: A number is a sequence of digits, possibly including a decimal point.

V. CONCLUSION

In summary, a compact parser with a unique top-down parsing strategy that combines syntactical and lexical analysis turns out to be useful.
It is a valuable tool for comprehending the basic procedures involved in language interpretation. A thorough study of the grammar rules of the programming language is made possible by the combination of lexical analysis, which dissects source code into understandable tokens, and syntactic analysis, which determines the hierarchical structure of these tokens.

Using a top-down parsing approach, the parser begins with high-level constructions and proceeds to identify and structure the source code by recursively navigating through the grammar rules. This not only helps identify and report syntactic problems, but also offers a methodical and user-friendly approach to understanding the connections between various linguistic structures.

With its insights into the complexities of compiler design and the significance of lexical and syntactical analysis in processing programming languages, the little parser doubles as an instructional tool. By giving developers a hands-on opportunity to investigate the nuances of language processing, its practical use paves the way for future research into language design and compiler architecture. All in all, the little parser serves as a reminder of the value of organized parsing methods for understanding and analyzing computer languages.

VI. REFERENCES

[1] M. Shah, N. Chitroda, S. Dharmajwala and A. Vasant, "Comparative Analysis of LR Parsers in Designing of Compilers," 2022 International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), Trichy, India, 2022, pp. 707-714, doi: 10.1109/ICAISS55157.2022.10010906.

[2] S. Venkat and P. Kanwal, "COpt: A High-Level Domain-Specific Language to Generate Compiler Optimizers," 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), Bhopal, India, 2018, pp. 1-6, doi: 10.1109/ICACAT.2018.8933593.

[3] W. Jordan, A. Bejo and A. G. Persada, "The Development of Lexer and Parser as Parts of Compiler for GAMA32 Processor's Instruction-set using Python," 2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, 2019, pp. 450-455, doi: 10.1109/ISRITI48646.2019.9034617.

[4] K. Nakasho, "Development of a Flexible Mizar Tokenizer and Parser for Information Retrieval System," 2019 Federated Conference on Computer Science and Information Systems (FedCSIS), 2019, pp. 77-80.

[5] G. Chetverikov, O. Puzik and O. Tyshchenko, "Analysis of the Problem of Homonyms in the Hyperchains Construction for Lexical Units of Natural Language," 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, 2018, pp. 356-359, doi: 10.1109/STC-CSIT.2018.8526663.

[6] A. A. Maliavko, "The Lexical and Syntactic Analyzers of the Translator for the EI Language," 2018 XIV International Scientific-Technical Conference on Actual Problems of Electronics Instrument Engineering (APEIE), Novosibirsk, Russia, 2018, pp. 360-364, doi: 10.1109/APEIE.2018.8545874.
