Lec#1
CHAPTER# 1
ABOUT COURSE AND INSTRUCTOR
INSTRUCTOR: MR. FAIZ RASOOL
DESIGNATION: LECTURER
EMAIL: [email protected]
[Figure: An interpreter takes the source program and the input together and produces the output directly.]
The machine-language target program produced by a compiler is usually much faster than
an interpreter at mapping inputs to outputs. An interpreter, however, can usually give better
error diagnostics than a compiler, because it executes the source program statement by
statement.
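• To make the contrast concrete, below is a minimal sketch (illustrative Java, not any real interpreter) of statement-by-statement execution: because the interpreter processes one statement at a time, it can report exactly which statement failed.

import java.util.HashMap;
import java.util.Map;

public class TinyInterpreter {
    public static void main(String[] args) {
        // Hypothetical three-statement source program of the form "name = integer".
        String[] source = { "x = 1", "y = 2", "z = oops" };
        Map<String, Integer> env = new HashMap<>();
        for (int line = 0; line < source.length; line++) {
            String[] parts = source[line].split("=");
            String name = parts[0].trim();
            String value = parts[1].trim();
            try {
                env.put(name, Integer.parseInt(value)); // execute this statement now
            } catch (NumberFormatException e) {
                // Statement-by-statement execution lets us point at the exact line.
                System.err.println("error on line " + (line + 1) + ": '" + value
                        + "' is not an integer");
                return;
            }
        }
        System.out.println(env); // reached only if every statement succeeded
    }
}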
JAVA PROCESSORS
Example: Java language processors combine compilation and interpretation.
• A Java source program may first be compiled into an intermediate form called bytecodes.
• The bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is that
bytecodes compiled on one machine can be interpreted on another machine, perhaps across a
network.
• In order to achieve faster processing of inputs to outputs, some Java compilers, called just-in-time
compilers, translate the bytecodes into machine language immediately before they run the
intermediate program to process the input.
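• The standard JDK tools follow exactly this arrangement, as the small example below illustrates (the class name is arbitrary).

// javac Hello.java  -> compiles this source into bytecodes (Hello.class)
// java Hello        -> the JVM executes the bytecodes, interpreting them and
//                      JIT-compiling frequently executed code to machine language
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello from bytecode");
    }
}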
A HYBRID JAVA COMPILER
[Figure: A hybrid compiler: a translator converts the source program into an intermediate program, which a virtual machine runs together with the input to produce the output.]
THE STRUCTURE OF A COMPILER
• Up to this point we have treated a compiler as a single box that maps a source program into a
semantically equivalent target program. If we open up this box a little, we see that there are two
parts to this mapping:
• Analysis
• Synthesis
• The analysis part breaks up the source program into constituent pieces and imposes a grammatical
structure on them. It then uses this structure to create an intermediate representation of the source program.
• If the analysis part detects that the source program is either syntactically ill formed or semantically unsound,
then it must provide informative messages, so the user can take corrective action.
• The analysis part also collects information about the source program and stores it in a data structure called a
symbol table, which is passed along with the intermediate representation to the synthesis part.
THE STRUCTURE OF A COMPILER
• The synthesis part constructs the desired target program from the intermediate
representation and the information in the symbol table.
• The analysis part is often called the front end of the compiler; the synthesis part is the back
end.
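• As a rough illustration, the split can be pictured as two interfaces; the names FrontEnd, BackEnd, IR, and SymbolTable below are assumptions for this sketch, not taken from any real compiler.

interface FrontEnd {
    // Analysis: break the source into pieces, impose grammatical structure,
    // and produce an intermediate representation while filling the symbol table.
    IR analyze(String source, SymbolTable symtab);
}

interface BackEnd {
    // Synthesis: construct the target program from the intermediate
    // representation and the information in the symbol table.
    String synthesize(IR ir, SymbolTable symtab);
}

class IR { /* intermediate representation; details omitted */ }
class SymbolTable { /* name -> attributes; details omitted */ }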
PHASES OF COMPILER
• If we examine the compilation process in more detail, we see that it operates as a
sequence of phases, each of which transforms one representation of the source
program to another. A typical decomposition of a compiler into phases is shown in
Fig. 1.6.
PHASES OF COMPILER
• In practice, several phases may be grouped together, and the intermediate
representations between the grouped phases need not be constructed
explicitly. The symbol table, which stores information about the entire source
program, is used by all phases of the compiler.
• Some compilers have a machine-independent optimization phase between
the front end and the back end.
• The purpose of this optimization phase is to perform transformations on the
intermediate representation, so that the back end can produce a better target
program than it would have otherwise produced from an unoptimized
intermediate representation. Since optimization is optional, one or the other
of the two optimization phases shown in Fig. 1.6 may be missing.
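• One classic machine-independent transformation is constant folding: an operator whose operands are all constants is replaced in the intermediate representation by its computed result. The sketch below is a minimal, hypothetical illustration over a tiny tree-shaped IR.

public class Fold {
    // A tiny expression IR: either a constant leaf (value != null) or an
    // operator node with two children. Names here are illustrative.
    record Expr(String op, Expr left, Expr right, Integer value) {
        static Expr num(int v) { return new Expr(null, null, null, v); }
        static Expr bin(String op, Expr l, Expr r) { return new Expr(op, l, r, null); }
    }

    // Fold multiplications of two constants into a single constant leaf.
    static Expr fold(Expr e) {
        if (e.op() == null) return e;                    // already a constant
        Expr l = fold(e.left()), r = fold(e.right());
        if ("*".equals(e.op()) && l.value() != null && r.value() != null)
            return Expr.num(l.value() * r.value());
        return Expr.bin(e.op(), l, r);
    }

    public static void main(String[] args) {
        // 2 * 30 is folded to the single constant 60 before code generation.
        System.out.println(fold(Expr.bin("*", Expr.num(2), Expr.num(30))).value());
    }
}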
LEXICAL ANALYSIS
• The first phase of a compiler is called lexical analysis or scanning. The lexical
analyzer reads the stream of characters making up the source program and
groups the characters into meaningful sequences called lexemes. For each
lexeme, the lexical analyzer produces as output a token of the form
(token-name, attribute-value)
• that it passes on to the subsequent phase, syntax analysis. In the token, the
first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the
symbol table for this token. Information from the symbol-table entry is needed
for semantic analysis and code generation.
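• A minimal sketch of how such a (token-name, attribute-value) pair might be represented; the Java names and the convention of a null attribute are assumptions for illustration.

public class Tokens {
    record Token(String name, Integer attribute) {  // attribute: symbol-table entry, or null
        @Override public String toString() {
            return attribute == null ? "(" + name + ")"
                                     : "(" + name + ", " + attribute + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Token("id", 1));    // prints (id, 1)
        System.out.println(new Token("=", null));  // prints (=) since no attribute is needed
    }
}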
LEXICAL ANALYSIS
For example, suppose a source program contains the assignment statement
position = initial + rate * 60
• The characters in this assignment could be grouped into the following lexemes and mapped
into the following tokens passed on to the syntax analyzer:
• position is a lexeme that would be mapped into a token (id, 1), where id is an abstract symbol
standing for identifier and 1 points to the symbol table entry for position. The symbol-table
entry for an identifier holds information about the identifier, such as its name and type.
• The assignment symbol = is a lexeme that is mapped into the token (=). Since this token
needs no attribute-value, we have omitted the second component. We could have used any
abstract symbol such as assign for the token-name, but for notational convenience we have
chosen to use the lexeme itself as the name of the abstract symbol.
• initial is a lexeme that is mapped into the token (id, 2), where 2 points to the symbol-table
entry for initial.
LEXICAL ANALYSIS
• + is a lexeme that is mapped into the token (+).
• rate is a lexeme that is mapped into the token (id, 3), where 3 points to the symbol-table
entry for rate.
• * is a lexeme that is mapped into the token (*).
• 60 is a lexeme that is mapped into the token (60). Blanks separating the lexemes would
be discarded by the lexical analyzer.
• After lexical analysis, the assignment statement is represented as the sequence of tokens (id, 1) (=) (id, 2) (+) (id, 3) (*) (60).
• In this representation, the token names =, +, and * are abstract symbols for the
assignment, addition, and multiplication operators, respectively.
• Technically speaking, for the lexeme 60 we should make up a token like (number, 4),
where 4 points to the symbol table for the internal representation of integer 60.
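• A toy scanner for this one statement might look as follows; the identifier numbering mimics symbol-table entry indices, and splitting on whitespace works only because the lexemes in this example are blank-separated.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Scan {
    public static void main(String[] args) {
        String input = "position = initial + rate * 60";
        Map<String, Integer> symtab = new LinkedHashMap<>(); // name -> entry index
        List<String> tokens = new ArrayList<>();
        for (String lexeme : input.split("\\s+")) {          // blanks are discarded
            if (lexeme.matches("[A-Za-z]\\w*")) {            // identifier -> (id, n)
                symtab.putIfAbsent(lexeme, symtab.size() + 1);
                tokens.add("(id, " + symtab.get(lexeme) + ")");
            } else {                                         // operators and literals
                tokens.add("(" + lexeme + ")");
            }
        }
        System.out.println(String.join(" ", tokens));
        // prints: (id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
    }
}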
SYNTAX ANALYSIS
• The second phase of the compiler is syntax analysis or parsing. The parser uses the first
components of the tokens produced by the lexical analyzer to create a tree-like
intermediate representation that depicts the grammatical structure of the token
stream.
• A typical representation is a syntax tree in which each interior node represents an
operation and the children of the node represent the arguments of the operation.
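• A minimal sketch of such a tree for the running example position = initial + rate * 60; the Node class is an illustrative assumption, not a real compiler's data structure.

public class Node {
    final String label;
    final Node left, right;                       // both null for leaves
    Node(String label, Node left, Node right) {
        this.label = label; this.left = left; this.right = right;
    }
    static Node leaf(String label) { return new Node(label, null, null); }

    public static void main(String[] args) {
        // The = node's children are position and the + subtree;
        // the * subtree binds tighter, so rate and 60 sit deepest.
        Node tree = new Node("=",
                leaf("position"),
                new Node("+",
                        leaf("initial"),
                        new Node("*", leaf("rate"), leaf("60"))));
        System.out.println(infix(tree));          // position = initial + rate * 60
    }

    static String infix(Node n) {                 // walk the tree back into text
        if (n.left == null) return n.label;
        return infix(n.left) + " " + n.label + " " + infix(n.right);
    }
}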
SEMANTIC ANALYSIS
• The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.
• It also gathers type information and saves it in either the syntax tree or the symbol
table, for subsequent use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the compiler checks
that each operator has matching operands.
• For example, many programming language definitions require an array index to be
an integer; the compiler must report an error if a floating-point number is used to
index an array.
• The language specification may permit some type conversions, called coercions. For example, if
the identifier rate has floating-point type, the integer literal 60 in rate * 60 may be coerced to a
floating-point number before the multiplication.
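• A small sketch of the usual typing rule behind such coercions: if either operand of an arithmetic operator is floating point, the integer operand is coerced and the result is floating point. The enum and method names below are illustrative assumptions.

public class TypeCheck {
    enum Type { INT, FLOAT }

    // Result type of a binary arithmetic operator; an INT operand meeting
    // a FLOAT operand is (conceptually) coerced, as in inttofloat(60).
    static Type arith(Type a, Type b) {
        return (a == Type.FLOAT || b == Type.FLOAT) ? Type.FLOAT : Type.INT;
    }

    public static void main(String[] args) {
        Type rate = Type.FLOAT;  // assumed declared type, from the symbol table
        Type sixty = Type.INT;   // the integer literal 60
        System.out.println(arith(rate, sixty));   // FLOAT: 60 gets coerced
    }
}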
SYMBOL-TABLE MANAGEMENT
• The symbol table is a data structure containing a record for each variable
name, with fields for the attributes of the name.
• The data structure should be designed to allow the compiler to find the record
for each name quickly and to store or retrieve data from that record quickly.
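• A minimal sketch of such a structure, assuming a hash map keyed by name for fast storage and retrieval; the Entry fields are illustrative.

import java.util.HashMap;
import java.util.Map;

public class SymTab {
    // One record per variable name, with fields for its attributes.
    record Entry(String name, String type, int index) {}

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String name, String type) {          // store a record for a name
        entries.putIfAbsent(name, new Entry(name, type, entries.size() + 1));
    }

    Entry get(String name) {                      // expected O(1) lookup by name
        return entries.get(name);
    }

    public static void main(String[] args) {
        SymTab symtab = new SymTab();
        symtab.put("position", "float");
        symtab.put("initial", "float");
        symtab.put("rate", "float");
        System.out.println(symtab.get("rate"));   // Entry[name=rate, type=float, index=3]
    }
}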
THE EVOLUTION OF PROGRAMMING LANGUAGES
• The first electronic computers appeared in the 1940's and were programmed
in machine language by sequences of 0's and 1's that explicitly told the
computer what operations to execute and in what order. The operations
themselves were very low level: move data from one location to another, add
the contents of two registers, compare two values, and so on. And once
written, the programs were hard to understand and modify.
MOVE TO HIGHER-LEVEL LANGUAGES &
CLASSIFICATION
• The first step towards more people-friendly programming languages was the development of
mnemonic assembly languages in the early 1950’s.
• A major step towards higher-level languages was made in the latter half of the 1950's with the
development of Fortran for scientific computation, Cobol for business data processing, and Lisp for
symbolic computation.
• Today, there are thousands of programming languages. They can be classified in a variety of
ways. One classification is by generation.
• First-generation languages are the machine languages.
• Second-generation languages are the assembly languages.
• Third-generation languages are the higher-level languages like Fortran, Cobol, Lisp, C, C++, C#, and Java.
• Fourth-generation languages are languages designed for specific applications like NOMAD for
report generation, SQL for database queries, and Postscript for text formatting. The term fifth-
generation language has been applied to logic- and constraint-based languages like Prolog and
OPS5.
CLASSIFICATION OF LANGUAGES