
COMPILER CONSTRUCTION

CHAPTER# 1
INSTRUCTOR: MR. FAIZ RASOOL
EMAIL: [email protected]
ABOUT COURSE AND INSTRUCTOR

Course: Compiler Construction

Instructor: Faiz Rasool

Designation: Lecturer

Office Location: CS Staff Room, 2nd Floor, Old Building, Room 14

Visiting Hours: Wednesday & Friday (9:00am – 11:00am)


TODAY’S LECTURE
➢In this lesson we will try to understand the overall working of a compiler in terms of its various phases and their interaction.
➢In particular we will cover:
➢ What a compiler is
➢ The two main parts of a compiler
➢ How a compiler works
➢ Programs that help the compiler
➢ The phases of a compiler, through a simple language and its processing based on its grammar and lexical specification
WHAT IS A PROGRAM?

• Programming languages are notations for describing computations to people and to machines. The world as we know it depends on programming languages, because all the software running on all the computers was written in some programming language. But before a program can be run, it first must be translated into a form in which it can be executed by a computer.
WHAT IS COMPILER?

• The software systems that do this translation are called compilers.


LANGUAGE PROCESSORS

• A compiler is a program that can read a program in one language — the source language — and translate it into an equivalent program in another language — the target language.
• An important role of the compiler is to report any errors in the source program
that it detects during the translation process.
A COMPILER

Source Program → Compiler → Target Program

➢If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.

RUNNING THE TARGET PROGRAM

Input → Target Program → Output


INTERPRETER
• An interpreter is another common kind of language processor. Instead of producing a target
program as a translation, an interpreter appears to directly execute the operations specified in
the source program on inputs supplied by the user.

Source Program + Input → Interpreter → Output

The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs. An interpreter, however, can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.
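To make the contrast concrete, the sketch below (in Python, with a hand-built expression tree and illustrative names of our own) shows the essence of interpretation: the operations of the source program are executed directly on the supplied input, and no target program is produced.

def evaluate(node, env):
    # Directly execute an expression tree whose nodes are
    # ('num', value), ('id', name), ('+', l, r), or ('*', l, r).
    kind = node[0]
    if kind == 'num':
        return node[1]
    if kind == 'id':
        return env[node[1]]          # look the variable up at run time
    left = evaluate(node[1], env)
    right = evaluate(node[2], env)
    return left + right if kind == '+' else left * right

# position = initial + rate * 60, executed directly on supplied input:
env = {'initial': 10.0, 'rate': 2.5}
tree = ('+', ('id', 'initial'), ('*', ('id', 'rate'), ('num', 60)))
env['position'] = evaluate(tree, env)    # 160.0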
JAVA PROCESSORS
Example: Java language processors combine compilation and interpretation.
• A Java source program may first be compiled into an intermediate form called bytecodes.
• The bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is that
bytecodes compiled on one machine can be interpreted on another machine, perhaps across a
network.
• In order to achieve faster processing of inputs to outputs, some Java compilers, called just-in-time
compilers, translate the bytecodes into machine language immediately before they run the
intermediate program to process the input.
A HYBRID COMPILER OF JAVA

Source Program → Translator → Intermediate Program
Intermediate Program + Input → Virtual Machine → Output
THE STRUCTURE OF A COMPILER
• Up to this point, we have treated a compiler as a single box that maps a source program into a
semantically equivalent target program. If we open up this box a little, we see that there are two
parts to this mapping:
• Analysis
• Synthesis
• The analysis part breaks up the source program into constituent pieces and imposes a grammatical
structure on them. It then uses this structure to create an intermediate representation of the source program.
• If the analysis part detects that the source program is either syntactically ill formed or semantically unsound,
then it must provide informative messages, so the user can take corrective action.
• The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
THE STRUCTURE OF A COMPILER

• The synthesis part constructs the desired target program from the intermediate
representation and the information in the symbol table.
• The analysis part is often called the front end of the compiler; the synthesis part is the back
end.
PHASES OF COMPILER
• If we examine the compilation process in more detail, we see that it operates as a
sequence of phases, each of which transforms one representation of the source
program to another. A typical decomposition of a compiler into phases is shown in Fig. 1.6.

[Figure 1.6: the phases of a compiler]

PHASES OF COMPILER
• In practice, several phases may be grouped together, and the intermediate
representations between the grouped phases need not be constructed
explicitly. The symbol table, which stores information about the entire source
program, is used by all phases of the compiler.
• Some compilers have a machine-independent optimization phase between
the front end and the back end.
• The purpose of this optimization phase is to perform transformations on the
intermediate representation, so that the back end can produce a better target
program than it would have otherwise produced from an unoptimized
intermediate representation. Since optimization is optional, one or the other
of the two optimization phases shown in Fig. 1.6 may be missing.
LEXICAL ANALYSIS
• The first phase of a compiler is called lexical analysis or scanning. The lexical
analyzer reads the stream of characters making up the source program and
groups the characters into meaningful sequences called lexemes. For each
lexeme, the lexical analyzer produces as output a token of the form
(token-name, attribute-value)
• that it passes on to the subsequent phase, syntax analysis. In the token, the
first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the
symbol table for this token. Information from the symbol-table entry is needed
for semantic analysis and code generation.
LEXICAL ANALYSIS
For example, suppose a source program contains the assignment statement
position = initial + rate * 60
• The characters in this assignment could be grouped into the following lexemes and mapped
into the following tokens passed on to the syntax analyzer:
• position is a lexeme that would be mapped into a token (id, 1), where id is an abstract symbol
standing for identifier and 1 points to the symbol table entry for position. The symbol-table
entry for an identifier holds information about the identifier, such as its name and type.
• The assignment symbol = is a lexeme that is mapped into the token (=). Since this token
needs no attribute-value, we have omitted the second component. We could have used any
abstract symbol such as assign for the token-name, but for notational convenience we have
chosen to use the lexeme itself as the name of the abstract symbol.
• initial is a lexeme that is mapped into the token (id, 2), where 2 points to the symbol-table
entry for initial.
LEXICAL ANALYSIS
• + is a lexeme that is mapped into the token (+).
• rate is a lexeme that is mapped into the token (id, 3), where 3 points to the symbol-table
entry for rate.
• * is a lexeme that is mapped into the token (*).
• 60 is a lexeme that is mapped into the token (60). Blanks separating the lexemes would be discarded by the lexical analyzer.
• After lexical analysis, the statement is passed on as the sequence of tokens (id, 1) (=) (id, 2) (+) (id, 3) (*) (60).
• In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and multiplication operators, respectively.
• Technically speaking, for the lexeme 60 we should make up a token like (number, 4), where 4 points to the symbol table for the internal representation of the integer 60.
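As a concrete illustration, here is a minimal hand-written lexical analyzer sketch in Python (names and structure are our own, not from the textbook) that groups the characters of the example statement into lexemes, discards blanks, enters identifiers into a symbol table, and emits (token-name, attribute-value) pairs:

def tokenize(source, symtab):
    # Scan the characters left to right, grouping them into lexemes.
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                          # blanks are discarded
        elif ch.isalpha():
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            lexeme = source[i:j]
            if lexeme not in symtab:        # enter a new identifier
                symtab[lexeme] = len(symtab) + 1
            tokens.append(('id', symtab[lexeme]))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(('number', int(source[i:j])))
            i = j
        else:                               # =, +, * carry no attribute
            tokens.append((ch,))
            i += 1
    return tokens

symtab = {}
tokenize('position = initial + rate * 60', symtab)
# [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('number', 60)]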
SYNTAX ANALYSIS
• Second phase of the compiler is syntax analysis or parsing. The parser uses the first
components of the tokens produced by the lexical analyzer to create a tree-like
intermediate representation that depicts the grammatical structure of the token
stream.
• A typical representation is a syntax tree in which each interior node represents an
operation and the children of the node represent the arguments of the operation.
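The sketch below (Python; illustrative, and assuming the token stream produced by the lexer sketch above) is a minimal recursive-descent parser that builds such a syntax tree for the example statement, with * binding more tightly than +:

def parse(tokens):
    # Grammar sketch:  stmt   -> id '=' expr
    #                  expr   -> term { '+' term }
    #                  term   -> factor { '*' factor }
    #                  factor -> id | number
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        return advance()                    # an id or number token

    def term():
        node = factor()
        while peek() == '*':
            advance()
            node = ('*', node, factor())
        return node

    def expr():
        node = term()
        while peek() == '+':
            advance()
            node = ('+', node, term())
        return node

    target = advance()                      # the assigned identifier
    advance()                               # consume '='
    return ('=', target, expr())

parse([('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('number', 60)])
# ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('number', 60))))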
SEMANTIC ANALYSIS
• The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.
• It also gathers type information and saves it in either the syntax tree or the symbol
table, for subsequent use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.
• For example, many programming language definitions require an array index to be
an integer; the compiler must report an error if a floating-point number is used to
index an array.
• The language specification may permit some type conversions, called coercions.
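For the running example, a semantic analyzer might record the identifiers as floating point while 60 is an integer, and coerce the integer. Below is a minimal type-checking sketch in Python; the type assignments and the inttofloat node are assumptions modeled on the textbook's running example, not part of these slides:

def check(node, types):
    # Return (possibly rewritten node, its type); insert an
    # 'inttofloat' node when an integer operand meets a float.
    kind = node[0]
    if kind == 'number':
        return node, 'integer'
    if kind == 'id':
        return node, types[node[1]]     # type stored in the symbol table
    op, left, right = node
    left, lt = check(left, types)
    right, rt = check(right, types)
    if lt != rt:                        # coerce the integer operand
        if lt == 'integer':
            left, lt = ('inttofloat', left), 'float'
        else:
            right, rt = ('inttofloat', right), 'float'
    return (op, left, right), lt

# With position, initial, and rate recorded as floats, the (number, 60)
# leaf becomes ('inttofloat', ('number', 60)).
tree = ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('number', 60))))
checked, _ = check(tree, {1: 'float', 2: 'float', 3: 'float'})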
INTERMEDIATE CODE GENERATION

• In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms.
• Syntax trees are a form of intermediate representation; they are commonly
used during syntax and semantic analysis.
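One low-level intermediate form commonly used in textbook examples is three-address code, in which each instruction performs at most one operation and temporaries hold intermediate results. Here is a generation sketch for the type-checked tree of the running example (Python; illustrative naming, with identifiers printed as id1, id2, id3):

def gen_tac(tree):
    # Emit one three-address instruction per operator, using fresh
    # temporaries t1, t2, ... for intermediate results.
    code, counter = [], 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f't{counter}'

    def walk(node):
        kind = node[0]
        if kind == 'number':
            return str(node[1])
        if kind == 'id':
            return f'id{node[1]}'
        if kind == 'inttofloat':
            t = new_temp()
            code.append(f'{t} = inttofloat({walk(node[1])})')
            return t
        op, left, right = node
        if op == '=':
            l = walk(left)
            code.append(f'{l} = {walk(right)}')
            return l
        l, r = walk(left), walk(right)
        t = new_temp()
        code.append(f'{t} = {l} {op} {r}')
        return t

    walk(tree)
    return code

# For the checked tree this yields:
#   t1 = inttofloat(60)
#   t2 = id3 * t1
#   t3 = id2 + t2
#   id1 = t3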
CODE OPTIMIZATION

• The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result.
• Better means faster, but other objectives may be desired, such as shorter code,
or target code that consumes less power.
• For example, a straightforward algorithm generates the intermediate code,
using an instruction for each operator in the tree representation that comes
from the semantic analyzer.
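As a tiny illustration of the idea, the pass below (Python; a deliberately simple sketch of our own, not a production optimizer) improves the three-address code of the running example by performing the inttofloat conversion of the constant 60 once, at compile time:

def optimize(code):
    out, subst = [], {}
    for instr in code:
        dest, _, rhs = instr.partition(' = ')
        for name, value in subst.items():   # use earlier folded results
            rhs = rhs.replace(name, value)
        if rhs.startswith('inttofloat(') and rhs[11:-1].isdigit():
            subst[dest] = rhs[11:-1] + '.0' # fold the conversion now
            continue                        # the instruction disappears
        out.append(f'{dest} = {rhs}')
    return out

# optimize(['t1 = inttofloat(60)', 't2 = id3 * t1',
#           't3 = id2 + t2', 'id1 = t3'])
# -> ['t2 = id3 * 60.0', 't3 = id2 + t2', 'id1 = t3']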
CODE GENERATION
• The code generator takes as input an intermediate representation of the source
program and maps it into the target language.
• If the target language is machine code, registers or memory locations are
selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of machine
instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of registers to
hold variables.
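Here is a naive code-generation sketch for the optimized three-address code above (Python; the LDF/STF/ADDF/MULF mnemonics follow the floating-point pseudo-assembly style of the textbook's example, and the register assignment is the simplest one imaginable, not a judicious allocator):

OPS = {'+': 'ADDF', '*': 'MULF'}

def gen_target(code):
    regs, out, next_reg = {}, [], 1

    def load(operand):
        # Constants become immediates; variables are loaded into a
        # fresh register unless a temporary already lives in one.
        nonlocal next_reg
        if operand in regs:
            return regs[operand]
        if operand.replace('.', '').isdigit():
            return f'#{operand}'
        reg = f'R{next_reg}'
        next_reg += 1
        out.append(f'LDF {reg}, {operand}')
        return reg

    for instr in code:
        dest, _, rhs = instr.partition(' = ')
        parts = rhs.split()
        if len(parts) == 3:                 # dest = a op b
            a, op, b = parts
            ra, rb = load(a), load(b)
            out.append(f'{OPS[op]} {ra}, {ra}, {rb}')
            regs[dest] = ra                 # the result stays in ra
        else:                               # dest = a  (a plain copy)
            out.append(f'STF {dest}, {load(rhs)}')
    return out

# gen_target(['t2 = id3 * 60.0', 't3 = id2 + t2', 'id1 = t3']) ->
#   LDF R1, id3 / MULF R1, R1, #60.0 / LDF R2, id2
#   ADDF R2, R2, R1 / STF id1, R2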
SYMBOL-TABLE MANAGEMENT

• The symbol table is a data structure containing a record for each variable
name, with fields for the attributes of the name.
• The data structure should be designed to allow the compiler to find the record
for each name quickly and to store or retrieve data from that record quickly.
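A minimal symbol-table sketch (Python; a hash table keyed by name, so records are found quickly; the fields shown are illustrative assumptions, not a fixed layout):

class SymbolTable:
    def __init__(self):
        self._records = {}                  # name -> attribute record

    def insert(self, name, **attrs):
        # Create the record on first sight; later phases add attributes.
        self._records.setdefault(name, {}).update(attrs)

    def lookup(self, name):
        return self._records.get(name)      # None if the name is unknown

symtab = SymbolTable()
symtab.insert('position', type='float')
symtab.lookup('position')                    # {'type': 'float'}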
GROUPING OF PHASES INTO PASSES

• In an implementation, activities from several phases may be grouped together into a pass that reads an input file and writes an output file. For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be grouped together into one pass. Code optimization might be an optional pass. Then there could be a back-end pass consisting of code generation for a particular target machine.
COMPILER-CONSTRUCTION TOOLS
• The compiler writer, like any software developer, can profitably use modern
software development environments containing tools such as language editors,
debuggers, version managers, profilers, test harnesses, and so on. In addition to
these general software-development tools, other more specialized tools have
been created to help implement various phases of a compiler. These tools use
specialized languages for specifying and implementing specific components,
and many use quite sophisticated algorithms. The most successful tools are those
that hide the details of the generation algorithm and produce components that
can be easily integrated into the remainder of the compiler. Some commonly
used compiler-construction tools include:
COMPILER-CONSTRUCTION TOOLS
• Parser generators that automatically produce syntax analyzers from a grammatical description of a
programming language.
• Scanner generators that produce lexical analyzers from a regular-expression description of the
tokens of a language.
• Syntax-directed translation engines that produce collections of routines for walking a parse tree
and generating intermediate code.
• Code-generator generators that produce a code generator from a collection of rules for translating
each operation of the intermediate language into the machine language for a target machine.
• Data-flow analysis engines that facilitate the gathering of information about how values are
transmitted from one part of a program to each other part. Data-flow analysis is a key part of
code optimization.
• Compiler-construction toolkits that provide an integrated set of routines for constructing various
phases of a compiler.
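To convey the flavor of such tools, the sketch below (Python; an illustration of our own, far simpler than a real generator such as Lex) acts like a toy scanner generator: given a declarative regular-expression description of the tokens, it produces a working lexical analyzer rather than requiring one to be coded by hand:

import re

def make_scanner(spec):
    # Combine the per-token regular expressions into one master pattern;
    # the generated function classifies each match by its group name.
    master = re.compile('|'.join(f'(?P<{name}>{rx})' for name, rx in spec))
    def scan(text):
        return [(m.lastgroup, m.group())
                for m in master.finditer(text) if m.lastgroup != 'ws']
    return scan

scan = make_scanner([('num', r'\d+'), ('id', r'[A-Za-z_]\w*'),
                     ('op', r'[=+*]'), ('ws', r'\s+')])
scan('position = initial + rate * 60')
# [('id', 'position'), ('op', '='), ('id', 'initial'), ('op', '+'),
#  ('id', 'rate'), ('op', '*'), ('num', '60')]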
THE EVOLUTION OF PROGRAMMING LANGUAGES

• The first electronic computers appeared in the 1940's and were programmed in machine language by sequences of 0's and 1's that explicitly told the computer what operations to execute and in what order. The operations themselves were very low level: move data from one location to another, add the contents of two registers, compare two values, and so on. And once written, the programs were hard to understand and modify.
MOVE TO HIGHER-LEVEL LANGUAGES &
CLASSIFICATION
• The first step towards more people-friendly programming languages was the development of
mnemonic assembly languages in the early 1950’s.
• A major step towards higher-level languages was made in the latter half of the 1950's with the
development of Fortran for scientific computation, Cobol for business data processing, and Lisp for
symbolic computation.
• Today, there are thousands of programming languages. They can be classified in a variety of
ways. One classification is by generation.
• First-generation languages are the machine languages.
• Second-generation languages are the assembly languages.
• Third-generation languages are the higher-level languages like Fortran, Cobol, Lisp, C, C++, C#, and Java.
• Fourth-generation languages are languages designed for specific applications like NOMAD for report generation, SQL for database queries, and Postscript for text formatting. The term fifth-generation language has been applied to logic- and constraint-based languages like Prolog and OPS5.
CLASSIFICATION OF LANGUAGES

• Another classification of languages uses the term imperative for languages in which a program specifies how a computation is to be done. Languages such as C, C++, C#, and Java are imperative languages. In imperative languages there is a notion of program state and statements that change the state.
• The term declarative is used for languages in which a program specifies what computation is to be done. Functional languages such as ML and Haskell and constraint logic languages such as Prolog are often considered to be declarative languages.
VON NEUMANN LANGUAGE

• The term von Neumann language is applied to programming languages whose computational model is based on the von Neumann computer architecture. Many of today's languages, such as Fortran and C, are von Neumann languages.
OBJECT-ORIENTED LANGUAGE

• An object-oriented language is one that supports object-oriented programming, a programming style in which a program consists of a collection of objects that interact with one another. Simula 67 and Smalltalk are the earliest major object-oriented languages. Languages such as C++, C#, Java, and Ruby are more recent object-oriented languages.
SCRIPTING LANGUAGES

• Scripting languages are interpreted languages with high-level operators designed for "gluing together" computations. These computations were originally called "scripts." Awk, JavaScript, Perl, PHP, Python, Ruby, and Tcl are popular examples of scripting languages. Programs written in scripting languages are often much shorter than equivalent programs written in languages like C.
Thank You
