Lexical and Syntax Analysis
There are three different ways of implementing programming languages: compilation, pure interpretation, and hybrid implementation.
The compilation approach uses a program called a compiler, which translates programs written in a high-level
programming language into machine code.
Compilation is typically used to implement programming languages that are used for large applications, often written in
languages such as C++ and COBOL.
Pure interpretation systems perform no translation; rather, programs are interpreted in their original form by a software interpreter.
HTML and JavaScript are examples of pure interpretation, used where execution efficiency is not required.
Hybrid implementation systems translate programs written in high-level languages into intermediate forms, which are
interpreted.
In recent years the use of Just-in-Time (JIT) compilers has become widespread, particularly for Java programs and
programs written for the Microsoft .NET system.
A JIT compiler, which translates intermediate code to machine code, is used on methods at the time they are first called.
A JIT compiler transforms a hybrid system into a delayed compilation system.
Syntax analyzers, or parsers, are nearly always based on a formal description of the syntax of programs.
There are three compelling advantages to using a BNF description:
o First, BNF descriptions of the syntax of programs are clear and concise.
o Second, the BNF description can be used as the direct basis for the syntax analyzer.
o Third, implementations based on BNF are relatively easy to maintain because of their modularity.
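As an illustration (a sketch, not the grammar of any particular language), a small BNF description of arithmetic expressions might read:

    <expr>   -> <expr> + <term> | <term>
    <term>   -> <term> * <factor> | <factor>
    <factor> -> id | ( <expr> )

Each rule names a nonterminal on the left; the alternatives on the right, separated by |, are its right-hand sides (RHSs).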
Compilers separate the task of analyzing syntax into two distinct parts, named
o lexical analysis and
o syntax analysis
The lexical analyzer deals with small-scale language constructs, such as names and numeric literals.
The syntax analyzer deals with the large-scale constructs, such as expressions, statements, and program units.
There are three reasons why lexical analysis is separated from syntax analysis:
o 1. Simplicity—Techniques for lexical analysis are less complex than those required for syntax analysis, so the
lexical-analysis process can be simpler if it is separate. Also, removing the low-level details of lexical analysis
from the syntax analyzer makes the syntax analyzer both smaller and less complex.
o 2. Efficiency—Lexical analysis requires a significant portion of total compilation time, so it pays to
optimize the lexical analyzer; it is not fruitful to optimize the syntax analyzer in the same way. Separation
allows this selective optimization.
o 3. Portability—Because the lexical analyzer reads input program files and often includes buffering of that
input, it is somewhat platform dependent. However, the syntax analyzer can be platform independent. It is
always good to isolate machine-dependent parts of any software system.
o A lexical analyzer serves as the front end of a syntax analyzer. As the first phase of compilation, it
converts a sequence of characters into a sequence of tokens.
o Technically, lexical analysis is a part of syntax analysis.
o A lexical analyzer performs syntax analysis at the lowest level of program structure.
o An input program appears to a compiler as a single string of characters.
o The lexical analyzer collects characters into logical groupings and assigns internal codes to the groupings
according to their structure.
o These logical groupings are named lexemes, and the internal codes for categories of these groupings are
named tokens.
o Lexical analyzers extract lexemes from a given input string and produce the corresponding tokens.
o The lexical-analysis process includes skipping comments and white space outside lexemes, as they are not
relevant to the meaning of the program.
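For example, given the statement below, a lexical analyzer would produce lexeme/token pairs such as the following (the token names here are illustrative, not fixed by any standard):

    result = oldsum - value / 100;

    Lexeme     Token
    result     IDENT
    =          ASSIGN_OP
    oldsum     IDENT
    -          SUB_OP
    value      IDENT
    /          DIV_OP
    100        INT_LIT
    ;          SEMICOLON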
There are three approaches to building a lexical analyzer:
o Write a formal description of the token patterns of the language using a descriptive language related to
regular expressions. These descriptions are used as input to a software tool that automatically generates a
lexical analyzer.
o Design a state transition diagram that describes the token patterns of the language and write a program that
implements the diagram (a C sketch of this approach follows this list).
o Design a state transition diagram that describes the token patterns of the language and hand-construct a
table-driven implementation of the state diagram.
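The C sketch below illustrates the second approach: a hand-written lexer implementing a simple state transition diagram that recognizes identifiers, integer literals, and single-character tokens. All the names here (lex, nextToken, the token codes) are assumptions of this sketch, not part of any required interface:

    #include <ctype.h>
    #include <stdio.h>

    /* Illustrative token codes (assumed names, not from any standard) */
    #define INT_LIT 10
    #define IDENT   11
    #define EOF_TOK -1

    static FILE *in;          /* source input                  */
    static char lexeme[100];  /* characters of current lexeme  */
    static int  lexLen;       /* current lexeme length         */
    static int  nextChar;     /* lookahead character           */
    int nextToken;            /* code of the current token     */

    static void addChar(void) {            /* append nextChar to lexeme */
        lexeme[lexLen++] = (char)nextChar; /* (no overflow check here)  */
        lexeme[lexLen] = '\0';
    }

    static void getChar(void) { nextChar = getc(in); }

    /* lex: recognize the next token and put its code in nextToken.
       Each branch corresponds to leaving the start state of the diagram
       on a letter, a digit, or some other character; the loops are the
       diagram's self-transitions. */
    int lex(void) {
        lexLen = 0;
        lexeme[0] = '\0';
        while (isspace(nextChar))              /* skip white space */
            getChar();
        if (nextChar == EOF) {
            nextToken = EOF_TOK;
        } else if (isalpha(nextChar)) {        /* id: letter (letter|digit)* */
            while (isalnum(nextChar)) { addChar(); getChar(); }
            nextToken = IDENT;
        } else if (isdigit(nextChar)) {        /* integer literal: digit+ */
            while (isdigit(nextChar)) { addChar(); getChar(); }
            nextToken = INT_LIT;
        } else {                               /* single-character token:  */
            addChar();                         /* use the character itself */
            nextToken = nextChar;              /* as its token code        */
            getChar();
        }
        return nextToken;
    }

    int main(void) {
        in = stdin;
        getChar();                             /* prime the lookahead */
        while (lex() != EOF_TOK)
            printf("token %d, lexeme \"%s\"\n", nextToken, lexeme);
        return 0;
    }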
o Assume we have a lexical analyzer named lex, which puts the next token code in nextToken.
o The coding process when there is only one RHS (see the sketch after this list):
o For each terminal symbol in the RHS, compare it with the next input token; if they match, continue;
otherwise, report a syntax error.
o For each nonterminal symbol in the RHS, call its associated parsing subprogram.
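As a minimal sketch of this process, assuming the lex and nextToken above plus illustrative token codes and helper routines (error, expr), the subprogram for a single-RHS rule <assign> -> id = <expr> ; could be coded in C as:

    /* Assumed to exist elsewhere (see the lexer sketch above): */
    extern int nextToken;                 /* set by lex()                 */
    int  lex(void);                       /* advances nextToken           */
    void error(void);                     /* reports a syntax error       */
    void expr(void);                      /* parsing subprogram for <expr> */
    enum { IDENT = 11, ASSIGN_OP = 20, SEMICOLON = 21 };  /* illustrative */

    /* <assign> -> id = <expr> ;   (a rule with a single RHS) */
    void assign(void) {
        if (nextToken == IDENT) lex();     /* terminal: compare, then consume */
        else error();
        if (nextToken == ASSIGN_OP) lex(); /* terminal: compare, then consume */
        else error();
        expr();                            /* nonterminal: call its subprogram */
        if (nextToken == SEMICOLON) lex(); /* terminal: compare, then consume */
        else error();
    }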
o A nonterminal that has more than one RHS requires an initial step to determine which RHS to parse.
o The correct RHS is chosen on the basis of the next token of input (the lookahead).
o The next token is compared with the first token that can be generated by each RHS until a match is found.
o If no match is found, it is a syntax error (see the sketch after this list).
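A sketch of a subprogram for a nonterminal with two RHSs, say <factor> -> id | ( <expr> ), under the same assumptions as the previous sketch:

    extern int nextToken;                 /* set by lex()           */
    int  lex(void);
    void error(void);
    void expr(void);
    enum { IDENT = 11, LEFT_PAREN = 25, RIGHT_PAREN = 26 };  /* illustrative */

    /* <factor> -> id | ( <expr> )
       The lookahead in nextToken selects which RHS to parse. */
    void factor(void) {
        if (nextToken == IDENT) {             /* first RHS begins with id */
            lex();
        } else if (nextToken == LEFT_PAREN) { /* second RHS begins with ( */
            lex();
            expr();
            if (nextToken == RIGHT_PAREN) lex();
            else error();                     /* missing closing parenthesis */
        } else {
            error();            /* no RHS can begin with the current token */
        }
    }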
o Furthermore, when a syntax error is detected, the parser must recover from it so that the parsing process
can continue.