
Lexical and Syntax Analysis

 There are three different ways of implementing programming languages: compilation, pure interpretation, and hybrid implementation.
 The compilation approach uses a program called a compiler, which translates programs written in a high-level
programming language into machine code.
 Compilation is typically used to implement programming languages that are used for large applications, often written in
languages such as C++ and COBOL.
 Pure interpretation systems perform no translation; rather, programs are interpreted in their original form by a software interpreter.
 HTML and JavaScript are examples of pure interpretation, used where high execution efficiency is not required.
 Hybrid implementation systems translate programs written in high-level languages into intermediate forms, which are
interpreted.
 In recent years the use of Just-in-Time (JIT) compilers has become widespread, particularly for Java programs and
programs written for the Microsoft .NET system.
 A JIT compiler, which translates intermediate code to machine code, is used on methods at the time they are first called.
 A JIT compiler effectively transforms a hybrid system into a delayed compilation system.
 Syntax analyzers, or parsers, are always based on a formal description of the syntax of programs.
 There are three compelling advantages to using a BNF description:
o First, BNF descriptions of the syntax of programs are clear and concise.
o Second, the BNF description can be used as the direct basis for the syntax analyzer.
o Third, implementations based on BNF are relatively easy to maintain because of their modularity.
 Compilers separate the task of analyzing syntax into two distinct parts, named
o lexical analysis and
o syntax analysis
 The lexical analyzer deals with small-scale language constructs, such as names and numeric literals.
 The syntax analyzer deals with the large-scale constructs, such as expressions, statements, and program units.
 There are three reasons why lexical analysis is separated from syntax analysis:
o 1. Simplicity—Techniques for lexical analysis are less complex than those required for syntax analysis, so the
lexical-analysis process can be simpler if it is separate. Also, removing the low-level details of lexical analysis
from the syntax analyzer makes the syntax analyzer both smaller and less complex.
o 2. Efficiency—Lexical analysis requires a significant portion of total compilation time, so it pays to optimize the lexical analyzer; it is not fruitful to optimize the syntax analyzer. Separating the two facilitates this selective optimization.
o 3. Portability—Because the lexical analyzer reads input program files and often includes buffering of that
input, it is somewhat platform dependent. However, the syntax analyzer can be platform independent. It is
always good to isolate machine-dependent parts of any software system.
o A lexical analyzer serves as the front end of a syntax analyzer. As the first phase of compilation, it converts the input sequence of characters into a sequence of tokens.
o Technically, lexical analysis is a part of syntax analysis.
o A lexical analyzer performs syntax analysis at the lowest level of program structure.
o An input program appears to a compiler as a single string of characters.
o The lexical analyzer collects characters into logical groupings and assigns internal codes to the groupings
according to their structure.
o These logical groupings are named lexemes, and the internal codes for the categories of these groupings are named tokens.
o Lexical analyzers extract lexemes from a given input string and produce the corresponding tokens.
o The lexical-analysis process includes skipping comments and white space outside lexemes, as they are not relevant to the meaning of the program.
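o For example, the statement sum = oldsum - value / 100; would be scanned into the lexemes sum, =, oldsum, -, value, /, 100, and ;. The corresponding tokens might be IDENT, ASSIGN_OP, IDENT, SUB_OP, IDENT, DIV_OP, INT_LIT, and SEMICOLON (these token names are illustrative; actual names vary by implementation).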
 There are three approaches to building a lexical analyzer:
o Write a formal description of the token patterns of the language using a descriptive language related to regular expressions. These descriptions are used as input to a software tool that automatically generates a lexical analyzer.
o Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.
o Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the state diagram.
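As a small illustration of the first approach (the pattern names here are assumptions, not the syntax of any particular tool), token patterns might be described as:

    identifier  →  letter (letter | digit)*
    int_literal →  digit digit*
    add_op      →  + | -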

Steps to draw a state diagram:

 Identify the initial state and the final (terminating) states.
 Identify the possible states in which the object can exist (boundary values corresponding to different attributes guide us in identifying different states).
 Label the events which trigger the transitions between states.
 Convenient utility subprograms:
o getChar – gets the next character of input, puts it in nextChar, determines its character class, and puts the class in charClass
o addChar – puts the character from nextChar into the place where the lexeme is being accumulated, lexeme
o lookup – determines whether the string in lexeme is a reserved word (returns a code)
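A minimal C sketch of a state-diagram-driven lexical analyzer built from these subprograms. The character classes, token codes, and buffer size are assumptions for illustration; the caller is expected to open in_fp and call getChar() once before the first call to lex():

    #include <stdio.h>
    #include <ctype.h>

    #define LETTER   0          /* assumed character classes */
    #define DIGIT    1
    #define UNKNOWN 99
    #define IDENT   10          /* assumed token codes */
    #define INT_LIT 11

    static char lexeme[100];
    static char nextChar;
    static int  charClass;
    static int  lexLen;
    static FILE *in_fp;         /* input file, opened by the caller */

    /* getChar - read the next character and determine its class */
    static void getChar(void) {
        int c = getc(in_fp);
        if (c == EOF) { charClass = EOF; return; }
        nextChar = (char)c;
        if (isalpha(c))      charClass = LETTER;
        else if (isdigit(c)) charClass = DIGIT;
        else                 charClass = UNKNOWN;
    }

    /* addChar - append nextChar to the lexeme being accumulated */
    static void addChar(void) {
        if (lexLen < 99) { lexeme[lexLen++] = nextChar; lexeme[lexLen] = '\0'; }
    }

    /* lookup - stub: a real version would search a reserved-word table */
    static int lookup(const char *s) { return IDENT; }

    /* lex - each branch corresponds to a path in the state diagram */
    int lex(void) {
        lexLen = 0;
        while (charClass != EOF && isspace((unsigned char)nextChar))
            getChar();                     /* skip white space outside lexemes */
        if (charClass == EOF) return EOF;  /* end of input */
        if (charClass == LETTER) {         /* identifier: letter (letter|digit)* */
            addChar(); getChar();
            while (charClass == LETTER || charClass == DIGIT) { addChar(); getChar(); }
            return lookup(lexeme);
        }
        if (charClass == DIGIT) {          /* integer literal: digit+ */
            addChar(); getChar();
            while (charClass == DIGIT) { addChar(); getChar(); }
            return INT_LIT;
        }
        addChar(); getChar();              /* single-character token */
        return UNKNOWN;
    }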

 The Parsing Problem
o Goals of the parser, given an input program:
o Find all syntax errors; for each, produce an appropriate diagnostic message and recover quickly.
o Produce the parse tree, or at least a trace of the parse tree, for the program.
 Introduction to Parsing
o Parsers for programming languages construct parse trees for given programs.
o Both parse trees and derivations include all of the syntactic information needed by a language processor.
 There are two distinct goals of syntax analysis:
o First, the syntax analyzer must check the input program to determine whether it is syntactically correct.
o The second goal of syntax analysis is to produce a complete parse tree, or at least trace the structure of the
complete parse tree, for syntactically correct input.
o Parsers are categorized according to the direction in which they build parse trees.
o The two broad classes of parsers are top-down, in which the tree is built from the root downward to the
leaves, and bottom-up, in which the parse tree is built from the leaves upward to the root.
o We use a small set of notational conventions for grammar symbols and strings to make the discussion less
cluttered.
o 1. Terminal symbols—lowercase letters at the beginning of the alphabet (a, b, . . .)
o 2. Nonterminal symbols—uppercase letters at the beginning of the alphabet (A, B, . . .)
o 3. Terminals or non-terminals—uppercase letters at the end of the alphabet (W, X, Y, Z)
o 4. Strings of terminals—lowercase letters at the end of the alphabet (w, x, y, z)
 Lexemes are terminals.
 Nonterminals are written in angle brackets, for example <while_statement>, <expr>, and <function_def>.
 The sentences of a language (programs, in the case of a programming language) are strings of terminals.
 Mixed strings describe right-hand sides (RHSs) of grammar rules and are used in parsing algorithms.
 Top-Down Parsers
o Top-down parsing is a strategy in which one first looks at the highest level of the parse tree and works down the tree using the rewriting rules of a formal grammar.
o Each node is visited before its branches are followed, and branches from a particular node are followed in left-to-right order. This corresponds to a leftmost derivation.
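A small worked example (the grammar is an assumption for illustration): given E → T + E | T and T → id, a top-down parse of id + id expands the leftmost nonterminal at each step, building the tree from the root toward the leaves:

    E => T + E => id + E => id + T => id + id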
 Bottom-Up Parsers
o A bottom-up parser constructs a parse tree by beginning at the leaves and progressing toward the root.
o This parse order corresponds to the reverse of a rightmost derivation.
o For example, the first step for a bottom-up parser is to determine which substring of the given sentence is the RHS that must be reduced to its corresponding LHS to obtain the previous (second-to-last) sentential form in the derivation.
o The process of finding the correct RHS to reduce is complicated by the fact that a given right sentential form
may include more than one RHS from the grammar of the language being parsed.
o The correct RHS is called the handle.
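A small worked example (this expression grammar is an assumption for illustration): given E → E + T | T, T → T * F | F, and F → ( E ) | id, a bottom-up parse of id + id * id reduces the handle at each step, tracing the rightmost derivation in reverse:

    id + id * id
    F + id * id      (handle: the leftmost id)
    T + id * id
    E + id * id
    E + F * id       (handle: the second id)
    E + T * id
    E + T * F        (handle: the third id)
    E + T            (handle: T * F)
    E                (handle: E + T)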
 The Complexity of Parsing
o Parsers that work for any unambiguous grammar are complex and inefficient (O(n³), where n is the length of the input).
o Compilers use parsers that work for only a subset of all unambiguous grammars, but do so in linear time (O(n), where n is the length of the input).
 Recursive-Descent Parsing
o A recursive-descent parser has a subprogram for each nonterminal in the grammar, which can parse sentences that can be generated by that nonterminal.
o A grammar for simple expressions:
o <expr> → <term> {(+ | -) <term>}
o <term> → <factor> {(* | /) <factor>}
o <factor> → id | int_constant | ( <expr> )

o Assume we have a lexical analyzer named lex, which puts the next token code in nextToken.
o The coding process when there is only one RHS (see the C sketch below):
o For each terminal symbol in the RHS, compare it with the next input token; if they match, continue; otherwise there is an error.
o For each nonterminal symbol in the RHS, call its associated parsing subprogram.
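A minimal C sketch of the subprograms for <expr> and <term> from the grammar above. The token codes and the lex and error routines are assumptions for illustration:

    #define ADD_OP  21          /* assumed token codes */
    #define SUB_OP  22
    #define MULT_OP 23
    #define DIV_OP  24

    extern int nextToken;       /* set by the lexical analyzer */
    void lex(void);             /* assumed: puts the next token code in nextToken */
    void error(const char *msg);
    void term(void);
    void factor(void);          /* sketched further below */

    /* expr - parses <expr> -> <term> {(+ | -) <term>} */
    void expr(void) {
        term();                                  /* parse the first term */
        while (nextToken == ADD_OP || nextToken == SUB_OP) {
            lex();                               /* consume the operator */
            term();                              /* parse the next term */
        }
    }

    /* term - parses <term> -> <factor> {(* | /) <factor>} */
    void term(void) {
        factor();
        while (nextToken == MULT_OP || nextToken == DIV_OP) {
            lex();
            factor();
        }
    }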

o A nonterminal that has more than one RHS requires an initial process to determine which RHS to parse.
o The correct RHS is chosen on the basis of the next token of input (the lookahead).
o The next token is compared with the first token that can be generated by each RHS until a match is found.
o If no match is found, it is a syntax error, as the <factor> sketch below illustrates.
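A C sketch of the subprogram for <factor>, which has three RHSs and must therefore examine the lookahead token before choosing one. The token codes are the same illustrative assumptions as above:

    #define IDENT       10      /* assumed token codes */
    #define INT_LIT     11
    #define LEFT_PAREN  25
    #define RIGHT_PAREN 26

    /* factor - parses <factor> -> id | int_constant | ( <expr> ) */
    void factor(void) {
        if (nextToken == IDENT || nextToken == INT_LIT) {
            lex();                               /* consume the id or literal */
        } else if (nextToken == LEFT_PAREN) {
            lex();                               /* consume '(' */
            expr();                              /* parse the inner expression */
            if (nextToken == RIGHT_PAREN)
                lex();                           /* consume ')' */
            else
                error("right parenthesis expected");
        } else {
            error("id, int_constant, or '(' expected");
        }
    }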

o Furthermore, parsers must recover from syntax errors so that the parsing process can continue.
