Chap-1 Language Processors
Dhamdhere
Systems Programming by John J. Donovan
Software:
Computer software, or simply software, refers to the non-tangible components of computers, known as computer programs. The term is used in contrast with computer hardware, which denotes the physical, tangible components of computers.
Software can be classified into:
◦ System software: computer software designed to operate and control the computer hardware and to provide a platform for running application software.
◦ Application software: software designed to fulfil specialized user requirements. Examples: MS Office, Adobe Photoshop.
System software works as an intermediate layer between application software and hardware:
Application software
System software
Hardware
Language processors (Why?)
◦ Language processing activities arise due to the differences between the manner in which a software designer describes the ideas concerning the behavior of software and the manner in which these ideas are implemented in a computer system. This difference is called the semantic gap.
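[Figure: application domain and execution domain, separated by the semantic gap]
[Figure: application domain, PL domain and execution domain; the specification gap lies between the application domain and the PL domain, the execution gap between the PL domain and the execution domain]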
Types of language processors:
◦ Language translator
◦ De-translator
◦ Preprocessor
◦ Language migrator
[Figure: Preprocessor: C++ program → C program, with errors reported]
[Figure: Translator: C++ program → machine language program, with errors reported]
An interpreter is a language processor which bridges an execution gap without generating a machine language program. In effect, the execution gap vanishes totally.
[Figure: application domain, PL domain and execution domain, with the interpreter domain covering the PL domain and the execution domain]
The three consequences of the semantic gap (long development time, large development effort and poor software quality) are in fact consequences of the specification gap.
A classical solution is to develop a PL such that the PL domain is very close or identical to the application domain.
Such PLs can only be used for specific applications; they are called problem oriented languages.
A procedure oriented language provides general purpose facilities required in most application domains.
[Figure: application domain, problem oriented language domain and execution domain; a small specification gap and a large execution gap]
The fundamental language processing activities are divided into those that bridge the specification gap and those that bridge the execution gap:
◦ Program generation activities
◦ Program execution activities
Program generation activities
◦ A program generation activity aims at the automatic generation of a program.
◦ The source language is a specification language of an application domain, and the target language is a procedure oriented PL.
◦ The program generator introduces a new domain between the application domain and the PL domain; call this the program generator domain.
◦ The specification gap is now the gap between the application domain and the program generator domain. Reducing the specification gap increases the reliability of the generated program.
◦ This arrangement also reduces the testing effort.
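The idea of a program generator can be illustrated with a minimal sketch. The specification format below (field name mapped to type and range) and the names `SPEC`, `generate_validator` and `validate` are hypothetical, chosen only for illustration; the generator turns the specification into the source text of a procedure oriented program, which is then executed.

```python
# Hypothetical specification language: field name -> (type, minimum, maximum).
SPEC = {
    "age": (int, 0, 150),
    "salary": (float, 0.0, 1e7),
}

def generate_validator(spec):
    """Generate the source text of a validate(record) function from the spec."""
    lines = ["def validate(record):", "    errors = []"]
    for name, (typ, lo, hi) in spec.items():
        lines.append(f"    value = record.get({name!r})")
        lines.append(
            f"    if not isinstance(value, {typ.__name__}) "
            f"or not ({lo!r} <= value <= {hi!r}):"
        )
        lines.append(f"        errors.append({name!r})")
    lines.append("    return errors")
    return "\n".join(lines)

# The generated program is ordinary source text in the target language.
source = generate_validator(SPEC)
print(source)

# Execute the generated text so that the generated function can be called.
namespace = {}
exec(source, namespace)
validate = namespace["validate"]

print(validate({"age": 30, "salary": 5000.0}))  # no invalid fields
print(validate({"age": -1, "salary": 5000.0}))  # 'age' is out of range
```

Only the short specification has to be written and checked by hand, which is the sense in which the specification gap and the testing effort shrink.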
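[Figure: Program execution activities. Translation: source program → translator → machine language program, with errors reported; the machine language program then runs on the data. Interpretation: the interpreter takes the source program and data together and reports errors; PC denotes the program counter.]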
[Figure: Language processor: source program → analysis phase → synthesis phase → target program; both phases report errors]
Example: the analysis phase processes the statement
percent_profit = (profit*100) / cost_price;
in three steps:
◦ Lexical analysis
◦ Syntax analysis
◦ Semantic analysis
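As a rough illustration of the first step, lexical analysis, the sketch below splits the example statement into lexical units and assigns each a lexical class. The class names `ID`, `NUM` and `OP` and the function `tokenize` are assumptions made for this example, not notation from the slides.

```python
import re

# Lexical classes (hypothetical names) and the patterns that recognize them.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),  # identifiers
    ("NUM", r"[0-9]+"),                 # integer constants
    ("OP", r"[=*/()+\-;]"),             # operators and punctuation
    ("SKIP", r"\s+"),                   # white space, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexical class, lexical unit) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"invalid character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize("percent_profit = (profit*100) / cost_price;")
for lexical_class, unit in tokens:
    print(lexical_class, unit)
```

Syntax analysis would then check that this token sequence forms a valid assignment statement, and semantic analysis would check, for instance, that the operand types are compatible.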
Forward references: to reduce the execution gap, language processing can be performed on a statement by statement basis. That is, analysis of a source statement can be immediately followed by synthesis of the equivalent target statements.
But this may not be feasible due to forward references.
“A forward reference of a program entity is a reference to the entity which precedes its definition in the program.”
Language processor pass: “A language processor pass
is the processing of every statement in a source
program, or its equivalent representation, to perform a
language processing function.”
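A minimal sketch of why forward references lead to more than one pass: in the toy instruction format below (an assumption made for this example, including the mnemonics `JUMP`, `ADD` and `HALT`), a jump refers to a label that is defined later. The first pass only collects label addresses; the second pass can then generate the target statements.

```python
def pass_one(lines):
    """Pass 1: record the address of every label definition ('name:')."""
    symbols = {}
    address = 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 1  # assumption: every instruction occupies one word
    return symbols

def pass_two(lines, symbols):
    """Pass 2: emit instructions with label operands replaced by addresses."""
    output = []
    for line in lines:
        if line.endswith(":"):
            continue  # label definitions produce no code
        opcode, *operands = line.split()
        if operands and operands[0] in symbols:
            output.append(f"{opcode} {symbols[operands[0]]}")
        else:
            output.append(line)
    return output

# 'done' is used in the first statement but defined two statements later.
program = ["JUMP done", "ADD 1", "done:", "HALT"]
symbols = pass_one(program)
code = pass_two(program, symbols)
print(symbols)
print(code)
```

A single statement by statement pass could not translate `JUMP done`, because the address of `done` is unknown at that point.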
Intermediate representation (IR):
“An intermediate representation is a representation
of a source program which reflects the effect of some,
but not all, analysis and synthesis tasks performed
during language processing.”
[Figure: Language processor: source program → front end → intermediate representation (IR) → back end → target program]
Semantic actions: “All the actions performed by the front end, except lexical and syntax analysis, are called semantic actions.” These include:
◦ Checking the semantic validity of constructs in the source program (SP)
◦ Determining the meaning of the SP
◦ Constructing an IR
The Front End
◦ The front end performs lexical, syntax and semantic analysis of the source program. Each kind of analysis involves the following functions:
1. Determine the validity of a source statement from the viewpoint of the analysis.
2. Determine the ‘content’ of a source statement:
◦ For lexical analysis, the content is the lexical class to which each lexical unit belongs.
◦ For syntax analysis, it is the syntactic structure of the source statement.
◦ For semantic analysis, it is the meaning of the statement.
3. Construct a suitable representation of the source statement for use by subsequent analysis functions or by the synthesis phase.
[Figure: Front end: source program → scanning (lexical analysis) → parsing (syntax analysis) → semantic analysis → IC. The phases use the symbol table, constants table, etc.; the tables and the IC together form the IR.]
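The output of the front end, the intermediate representation (IR), has two components:
◦ Tables of information: chiefly the symbol table, which contains information concerning all identifiers used in the source program.
◦ Intermediate code (IC): a description of the source program.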
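To make the front end output concrete, here is a small sketch that builds a symbol table and an intermediate code for one assignment statement. The quadruple form `(operator, operand 1, operand 2, result)`, the temporaries `t1`, `t2` and the class name `FrontEnd` are assumptions for this example; the slides do not fix a particular IC format.

```python
import re

def tokenize(text):
    """Split a statement into identifiers, integers and operator symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/();]", text)

class FrontEnd:
    """Toy front end for statements of the form: id = expression ;"""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0
        self.symbols = {}  # symbol table: identifier -> entry number
        self.code = []     # intermediate code as quadruples
        self.temps = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        self.pos += 1
        return token

    def enter(self, name):
        """Add an identifier to the symbol table if it is not there yet."""
        self.symbols.setdefault(name, len(self.symbols) + 1)

    def binary(self, operand, operators):
        """Parse: operand { operator operand }, emitting one quadruple per operator."""
        left = operand()
        while self.peek() in operators:
            op = self.take()
            right = operand()
            self.temps += 1
            result = f"t{self.temps}"
            self.code.append((op, left, right, result))
            left = result
        return left

    def statement(self):  # statement ::= id = expr ;
        target = self.take()
        if not (target[0].isalpha() or target[0] == "_"):
            raise SyntaxError(f"identifier expected, found {target!r}")
        self.enter(target)
        self.take("=")
        value = self.expr()
        self.take(";")
        self.code.append(("=", value, None, target))

    def expr(self):       # expr ::= term { (+|-) term }
        return self.binary(self.term, ("+", "-"))

    def term(self):       # term ::= factor { (*|/) factor }
        return self.binary(self.factor, ("*", "/"))

    def factor(self):     # factor ::= id | number | ( expr )
        token = self.take()
        if token == "(":
            value = self.expr()
            self.take(")")
            return value
        if token[0].isalpha() or token[0] == "_":
            self.enter(token)
        elif not token.isdigit():
            raise SyntaxError(f"operand expected, found {token!r}")
        return token

front_end = FrontEnd("percent_profit = (profit*100) / cost_price;")
front_end.statement()
print(front_end.symbols)
for quadruple in front_end.code:
    print(quadruple)
```

A back end could generate target code from `front_end.code` and `front_end.symbols` alone, without looking at the source text again.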
[Figure: Language processor: source program → front end → intermediate representation (IR) → back end → target program; within the front end, the parsing step is generated by YACC and is followed by semantic analysis]
Lex accepts an input specification which consists of three components:
1. Definitions
2. Rules
3. User Code
• Once you have defined your terms, you can write the rules section. It contains strings and expressions to be matched by the yylex subroutine, and C commands to execute when a match is made.
• This section is required, and it must be preceded by the delimiter %% (double percent signs), whether or not you have a definitions section. The lex command does not recognize your rules without this delimiter.
Defining Patterns in Lex
• x
matches the character 'x'.
• .
matches any character except newline.
• [xyz]
a "character class"; in this case, the pattern matches either an 'x', a 'y', or a 'z'.
• r*
zero or more r's, where r is any regular expression.
• r+
one or more r's.
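These five pattern forms can be tried out quickly. The sketch below uses Python's `re` module as a stand-in, since it happens to share this part of the notation; it is not Lex itself, and the helper `matches` is a name made up for this example.

```python
import re

def matches(pattern, text):
    """True if the whole text is matched by the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches("x", "x"))        # a literal character matches itself
print(matches(".", "q"))        # '.' matches any single character ...
print(matches(".", "\n"))       # ... except newline
print(matches("[xyz]", "y"))    # character class: one of x, y, z
print(matches("[xyz]", "a"))    # 'a' is not in the class
print(matches("a*", ""))        # zero or more a's accepts the empty string
print(matches("a+", ""))        # one or more a's does not
print(matches("[0-9]+", "42"))  # forms combine: one or more digits
```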
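Example Lex specification (definitions, rules and user code sections)
%{
// Definitions section: C++ code copied verbatim into the generated scanner.
#include <iostream>
using namespace std;
%}
%%
[ \t] ;
[0-9]+\.[0-9]+ { cout << "Found a floating-point number: " << yytext << endl; }
[0-9]+ { cout << "Found an integer: " << yytext << endl; }
[a-zA-Z0-9]+ { cout << "Found a string: " << yytext << endl; }
%%
// User code section: copied verbatim to the end of the generated scanner.
int main() {
// lex through the input:
yylex();
return 0;
}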
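The order of the rules in this specification matters, because Lex chooses the longest possible match and, when two rules match the same length, the rule listed first. The sketch below imitates that discipline for the four rules in Python; it is only an emulation for study purposes, and `RULES` and `scan` are hypothetical names.

```python
import re

# The four rules of the specification, in the same order.
# A kind of None means "match and discard" (the white space rule).
RULES = [
    (r"[ \t]", None),
    (r"[0-9]+\.[0-9]+", "floating-point number"),
    (r"[0-9]+", "integer"),
    (r"[a-zA-Z0-9]+", "string"),
]

def scan(text):
    """Return (kind, lexeme) pairs using longest match, first rule on ties."""
    found = []
    pos = 0
    while pos < len(text):
        best_length, best_kind = 0, None
        for pattern, kind in RULES:
            match = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if match and len(match.group()) > best_length:
                best_length, best_kind = len(match.group()), kind
        if best_length == 0:
            pos += 1  # no rule matched; skip the character
            continue
        if best_kind is not None:
            found.append((best_kind, text[pos:pos + best_length]))
        pos += best_length
    return found

result = scan("3.14 42 abc123 7up")
for kind, lexeme in result:
    print(f"Found {kind}: {lexeme}")
```

Here `42` matches both the integer rule and the string rule with the same length, so the earlier integer rule wins, while `7up` is longer under the string rule and is reported as a string.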
Each string specification in the input to yacc resembles a grammar production.
The parser generated by yacc performs reductions according to this grammar.
The actions associated with a string specification are executed when a reduction is made according to that specification.
Finite Automata
Example
<Noun Phrase> ::= <Article> <Noun>
<Article> ::= a | an | the
<Noun> ::= boy | apple
Derivation -- Example
<Noun Phrase> ::= <Article> <Noun>
<Article> ::= a | an | the
<Noun> ::= boy | apple
Suppose we want to derive the string “the boy”.
“⇒” denotes direct derivation.
<Noun Phrase> ⇒ <Article> <Noun>
⇒ the <Noun>
⇒ the boy
This is a leftmost derivation.
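The derivation can also be reproduced mechanically. The following sketch always expands the leftmost nonterminal and backtracks over the alternatives; the function name `leftmost_derivation` and the list representation of sentential forms are assumptions made for this example. The simple backtracking terminates here because this grammar has no recursive rules.

```python
GRAMMAR = {
    "<Noun Phrase>": [["<Article>", "<Noun>"]],
    "<Article>": [["a"], ["an"], ["the"]],
    "<Noun>": [["boy"], ["apple"]],
}

def leftmost_derivation(start, target):
    """Return the sentential forms of a leftmost derivation of target, or None."""
    def derive(form):
        # Locate the leftmost nonterminal in the sentential form.
        for index, symbol in enumerate(form):
            if symbol in GRAMMAR:
                break
        else:
            # Only terminals remain: success if the form equals the target.
            return [form] if form == target else None
        # Terminals to the left of it must already agree with the target.
        if form[:index] != target[:index]:
            return None
        for rhs in GRAMMAR[symbol]:
            rest = derive(form[:index] + rhs + form[index + 1:])
            if rest is not None:
                return [form] + rest
        return None

    return derive([start])

steps = leftmost_derivation("<Noun Phrase>", ["the", "boy"])
print(" ⇒ ".join(" ".join(form) for form in steps))
```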
Recursive Specification
• A grammar has a recursive specification if the nonterminal (NT) being defined in a production itself occurs in an RHS string of the production, e.g. X ::= AXB.
• An RHS alternative employing recursion is called a recursive rule.
Consider the grammar G: [grammar shown on the slide]
[..] denotes an optional specification
• Two types of recursive rules:
• Left recursive rule: the NT appears on the extreme left of the RHS in the recursive rule.
• Right recursive rule: the NT appears on the extreme right of the RHS in the recursive rule.
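The distinction can be checked mechanically for a single rule. In the sketch below, the function `classify_rule` and the sample rules for `E` and `L` are hypothetical; only X ::= AXB comes from the text above. A rule whose NT appears at both ends is reported as left recursive, which is a simplification.

```python
def classify_rule(nt, rhs):
    """Classify one RHS alternative (a list of symbols) of the production for nt."""
    if nt not in rhs:
        return "non-recursive"
    if rhs[0] == nt:
        return "left recursive"
    if rhs[-1] == nt:
        return "right recursive"
    return "recursive"  # the NT occurs only in the interior of the RHS

print(classify_rule("E", ["E", "+", "T"]))     # E ::= E + T
print(classify_rule("L", ["item", ",", "L"]))  # L ::= item , L
print(classify_rule("X", ["A", "X", "B"]))     # X ::= AXB
print(classify_rule("E", ["T"]))               # E ::= T
```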
Indirect recursion
Indirect recursion occurs when two or more NTs are defined in terms of one another.
Such recursion is useful for specifying nested constructs in a language.
Grammars are classified as:
• Type–0 (Phrase structure grammar)
α ::= β (α and β are strings of Ts and NTs)
-Permits arbitrary substitutions of strings.
-No limitation on production rules other than at least one nonterminal on the LHS.
-Not relevant to the specification of PLs.
Example:
Start = <S>
<S> ⇒ <S> <S>
<S> ⇒ <A> <B> <C>
<S> ⇒ ε
<A> ⇒ a
<B> ⇒ b
<C> ⇒ c
<A><B> ⇒ <B><A>
<B><A> ⇒ <A><B>
<A><C> ⇒ <C><A>
<C><A> ⇒ <A><C>
<B><C> ⇒ <C><B>
Strings generated:
ε, abc, aabbcc, cabcab, acacacacacacbbbbbb, ...
• Type–1 (Context sensitive grammar)
α A β ::= α Π β
-The nonterminal A can be replaced by Π only when it occurs in the context of α and β.
-Not relevant to the specification of PLs.
• Type–2 (Context free grammar)
A ::= Π
-Production rules are limited to exactly one nonterminal on the LHS, with any string of Ts and NTs on the RHS.
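The Type–2 restriction on the LHS is easy to test. In this sketch, productions are written as (LHS symbols, RHS symbols) pairs and nonterminals are recognized by their angle brackets; the representation and the function names are assumptions made for this example. The sample productions are taken from the noun phrase grammar and from the Type–0 example above.

```python
def is_nonterminal(symbol):
    """Nonterminals are written in angle brackets, e.g. <Noun>."""
    return symbol.startswith("<") and symbol.endswith(">")

def is_context_free(productions):
    """True if every production has exactly one nonterminal as its LHS."""
    return all(
        len(lhs) == 1 and is_nonterminal(lhs[0])
        for lhs, _rhs in productions
    )

noun_phrase_grammar = [
    (["<Noun Phrase>"], ["<Article>", "<Noun>"]),
    (["<Article>"], ["the"]),
    (["<Noun>"], ["boy"]),
]
type0_fragment = [
    (["<S>"], ["<A>", "<B>", "<C>"]),
    (["<A>", "<B>"], ["<B>", "<A>"]),  # two symbols on the LHS
]

print(is_context_free(noun_phrase_grammar))
print(is_context_free(type0_fragment))
```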
Compilation time of P
The binding of the attributes of variables is performed.
Example: the type int is bound to a variable var.