
MODULE # 1: INTRODUCTION TO COMPILER

AND INTERPRETER
INSTRUCTOR: DR. SAKEENA JAVAID
OUTLINE OF THE LECTURE
 Why Compilers?
 Overview of high-level languages and translation
 Levels of programming languages
 Need for high-level languages
 Advantages of high-level languages
 Language translation: Compilation and Interpretation
 Architecture of a compiler
 Phases of the compilation process
 Language definition
 Syntax and semantic specification
 Using BNF notation to define syntax of a language
WHY COMPILERS?

 For the translation of the source code to the machine code


 The source is usually high-level language code, while the machine code is low-level language (object code)

 A compiler is a fairly complex program, which can run from 10,000 to 1,000,000 lines of code
 With the advent of the stored-program computer concept by John von Neumann in the late
1940s, it became necessary to write programs that would perform the desired computations
 Initially, instructions and memory locations were written in machine language, which was a
very tedious task
 Later, assembly language was used, in which instructions and memory locations are
written in symbolic form
 An assembler is used to translate the symbolic codes into numeric machine codes
WHY COMPILERS? CONT’D
 Assembly language improved speed and accuracy; however, it still has a few defects
 Still not easy to read and understand
 Machine-dependent code

 Development of FORTRAN and its compiler by IBM (by John Backus, between 1954 and 1957)
 Although this project was a success, not all of the processes involved in translating
programming languages were yet completely understood
 At the same time, while the first compiler was under development, Noam Chomsky began his study
of the structure of natural languages
 His findings eventually made compiler construction easier and even capable of partial automation
WHY COMPILERS? CONT’D
 Chomsky’s study led to the classification of languages according
 to the complexity of grammars (the rules specifying their structures)
 and the power of the algorithms to recognize them

 Chomsky’s hierarchy comprises four levels of grammars: type 0, type 1, type 2 and
type 3
 Each of which is a specialization of its predecessor
 Type 2 or context free grammars are most useful for programming languages (are considered as the standard
way to represent the structure of the programming language)
 Study of the parsing problem (the determination of efficient algorithms for recognizing context-free
languages) was pursued during the 1960s and 1970s
 This led to a fairly complete solution of the problem, which has become a standard part of compiler theory
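 For illustration (a minimal sketch, not taken from the text), a context-free grammar for simple arithmetic expressions can be written in BNF as:
      <exp>    ::=  <exp> + <term>  |  <exp> - <term>  |  <term>
      <term>   ::=  <term> * <factor>  |  <factor>
      <factor> ::=  ( <exp> )  |  number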
WHY COMPILERS? CONT’D
 Closely related to context-free grammars are type 3 grammars: finite automata and regular
expressions
 Used for expressing the structure of words or tokens of a programming language
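 For example (an illustrative sketch), typical token classes can be described by regular expressions such as:
      identifier  =  letter ( letter | digit )*
      number      =  digit digit*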

 Programs related to the Compilers


 Interpreters
 Assemblers
 Linkers
 Loaders
 Pre-processors
 Editors
 Debuggers
 Profilers, project managers, etc.
OVERVIEW OF HIGH-LEVEL LANGUAGES AND TRANSLATION
 High-level programming languages are more abstract than low-level languages and so are
closer to human spoken language
 Examples of high level languages: C#, Visual Basic, C, C++, JavaScript, Objective C,
BASIC and Pascal
 High-level programming languages are easier for humans to write, read and maintain
 They support a wide range of data types
 They allow the programmer to think about how to solve the problem and not how to
communicate with the computer. This is called abstraction
OVERVIEW OF HIGH-LEVEL LANGUAGES AND TRANSLATION

 Converting to Machine Code


 Translators:
 Just like low-level languages, high-level languages must be converted to machine code before a computer can
understand and run them
 This is done using a ‘translator’
 Different translators convert the same high level code into machine code for different computers
 High level code ready to be translated into machine code is called ‘source code’

 There are two different types of translator: Compilers and Interpreters


OVERVIEW OF HIGH-LEVEL LANGUAGES AND TRANSLATION

 Compilers:
 Compilers convert (or ‘compile’) the source code to machine code all at once
 This is then stored as an executable file which the computer can run (for example,
something ending with the ‘.exe’ file extension)
 Errors in the source code can be spotted as the program is compiling and reported
to the programmer
OVERVIEW OF HIGH-LEVEL LANGUAGES AND TRANSLATION
 Interpreters
 Interpreters convert the code as it is running
 They take a line of source code at a time and convert it to machine code (which the computer
runs straight away)
 This is repeated until the end of the program
 No executable file is created
 If the interpreter comes across an error in the source code, the only thing it can do is report
the error to the person trying to use the program (or it may just refuse to continue running)
OVERVIEW OF HIGH-LEVEL LANGUAGES AND TRANSLATION
 Understanding of the need for both high-level and low-level languages
 Computers don’t understand high level languages because they only understand binary
(‘machine code’).
 Humans struggle to understand exactly what a program does when it is in binary only.
 High-level languages are more accessible to programmers.
 High-level languages will work on different types of computers.
 Low-level programming allows for hardware to be controlled directly
 Low-level programming will only work with the processor it is designed for (machine-
dependent)
OVERVIEW OF HIGH-LEVEL LANGUAGES AND TRANSLATION
 Need for compilers when translating programs written in a high-level language

 Compilers
 Translates the entire program from source (i.e. high-level language) to object code / machine code.
 Produces an executable file (i.e. In binary / machine code)

 Advantages
 Fast code is produced
 Source code remains hidden so cannot be modified by customer
 Compiled once only, so the translator does not need to be present at run time

 Disadvantages
 Compilers use a lot of computer resources: the compiler has to be loaded in the computer’s memory at the same time as the source code,
and there has to be sufficient memory to hold the object code
 Difficult to pin-point the source of errors in the original program
OVERVIEW OF HIGH-LEVEL LANGUAGES AND TRANSLATION
 Understanding of the use of interpreters with high-level language programs
 With an interpreter, each instruction is taken in turn and translated to machine code. The instruction is then
executed before the next instruction is translated
 Advantages
 Error messages are output as soon as an error is encountered so easy to debug
 Useful for prototypes as program will run even when part of it has errors.
 Disadvantages
 Execution of a program is slow compared to that of a compiled program.
 Instructions inside a loop have to be translated each time the loop is entered

 Need for assemblers when translating programs written in assembly language


 Assemblers translate assembly language to machine code / binary / object code
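 As a small illustration (assuming a 16-bit x86 target), an assembler turns a symbolic instruction into numeric machine code:
      mov ax, 5      assembles to the bytes   B8 05 00   (opcode byte followed by the constant as two bytes)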
LEVELS OF PROGRAMMING LANGUAGES
 Programming languages can be divided into two different levels
 High-level Languages – Python, Visual Basic, Java, C, C++, SQL and many more
 Low-level Languages – Hardware/Processor-specific assembly languages and machine code
Differences between Low-level and High-level Languages
ARCHITECTURE OF A COMPILER

 A compiler can broadly be divided into two phases based on the way it compiles

Analysis Phase
Known as the front-end of the compiler, the analysis phase reads the source program,
divides it into core parts and then checks for lexical, grammar and syntax errors.
It generates an intermediate representation of the source program and a symbol table,
which are fed to the synthesis phase as input.

Synthesis Phase
Known as the back-end of the compiler, the synthesis phase generates the
target program with the help of intermediate source code representation
and symbol table.
A compiler can have many phases and passes.
Pass: A pass refers to the traversal of a compiler through the entire
program.
Phase: A phase of a compiler is a distinguishable stage, which takes input
from the previous stage, processes and yields output that can be used as
input for the next stage. A pass can have more than one phase
PHASES OF THE COMPILER

 The first phase, the scanner (lexical analyzer), works as a text scanner. This phase
scans the source code as a stream of characters and converts
it into meaningful lexemes.
 Lexical analyzer represents these lexemes in the form of tokens as:
<token-name, attribute-value>
 The next phase is called the syntax analysis or parsing. It
takes the token produced by lexical analysis as input and
generates a parse tree (or syntax tree).
 In this phase, token arrangements are checked against the source
code grammar, i.e., the parser checks if the expression made by the
tokens is syntactically correct.
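 As a worked example (assuming a C-like assignment statement), the scanner would convert
      position = initial + rate * 60
   into a token stream such as
      <id, position>  <=>  <id, initial>  <+>  <id, rate>  <*>  <num, 60>
   and the parser would then build a syntax tree with = at the root, the identifier position on the left,
   and the expression initial + rate * 60 on the right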
PHASES OF THE COMPILER CONT’D
 Semantic analysis checks whether the parse tree constructed follows the rules of language
 For example: assignment of values is between compatible data types, and adding string to an
integer
 Also, the semantic analyzer keeps track of identifiers, their types and expressions; whether
identifiers are declared before use or not etc.
 The semantic analyzer produces an annotated syntax tree as an output.
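 A minimal C sketch (not from the slides; the names are illustrative) of the kinds of errors the semantic analyzer reports:

      void demo(void) {
          int total;
          double rate = 3.5;
          total = rate * "60";   /* rejected: invalid operands to * (double and char *)        */
          total = count + 1;     /* rejected: identifier 'count' is used before it is declared */
      }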
 After semantic analysis, the compiler generates an intermediate code of the source code for the target
machine.
 It represents a program for some abstract machine.
 It is in between the high-level language and the machine language.
 This intermediate code should be generated in such a way that it is easier to translate into
the target machine code.
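 Continuing the earlier assignment example position = initial + rate * 60 (an illustrative sketch using three-address code, one common intermediate form):
      t1 = inttofloat(60)
      t2 = rate * t1
      t3 = initial + t2
      position = t3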
PHASES OF THE COMPILER CONT’D
 The next phase does code optimization of the intermediate code.
 Optimization can be assumed as something that removes unnecessary code lines, and arranges the
sequence of statements in order to speed up the program execution without wasting resources
(CPU, memory)
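 For the same assignment example, the optimizer might fold the conversion into the constant and remove the redundant temporaries (a sketch):
      t1 = rate * 60.0
      position = initial + t1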
 In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language
 The code generator translates the intermediate code into a sequence of (generally) re-locatable
machine code.
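 A sketch of what the code generator might emit for the optimized code above, assuming a simple two-register target machine (the mnemonics are illustrative, not a real instruction set):
      LDF  R2, rate
      MULF R2, R2, #60.0
      LDF  R1, initial
      ADDF R1, R1, R2
      STF  position, R1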
 The symbol table is a data structure maintained throughout all the phases of a compiler
 All identifier names along with their types are stored here
 The symbol table makes it easier for the compiler to quickly search the identifier record and
retrieve it
 The symbol table is also used for scope management
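 A minimal C sketch of a symbol-table entry and lookup (hypothetical layout; production compilers usually use a hash table per scope):

      #include <string.h>

      struct symbol {
          char name[64];      /* identifier name           */
          char type[16];      /* e.g. "int", "float"       */
          int  scope_level;   /* used for scope management */
      };

      /* Linear search over a small table; returns NULL if the identifier is unknown. */
      struct symbol *lookup(struct symbol *table, int count, const char *name) {
          for (int i = 0; i < count; i++)
              if (strcmp(table[i].name, name) == 0)
                  return &table[i];
          return NULL;
      }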
LANGUAGE DEFINITION

 What are the requirements (Aspects)?


 Lexical and syntactic structures of a programming language are usually specified in
formal terms
 i.e., using regular expressions and context-free grammars

 The semantics of a programming language are commonly specified


 Using English (or other natural language) descriptions
 These descriptions (together with the formal lexical and syntactic structure) are usually
collected into a language reference manual, or language definition
LANGUAGE DEFINITION CONT’D

 Using a new language


 Language definition and a compiler are often developed simultaneously
 Techniques available to the compiler writer can have a major impact on the definition of the
language
 Similarly, the way in which a language is defined will have a major impact on the techniques
that are needed to construct the compiler
 A more common situation for the compiler writer is that the language being implemented is
well known and has an existing language definition
LANGUAGE DEFINITION CONT’D
 Language standards
 Sometimes the language definition has been raised to the level of a language standard
 Approved by one of the official standardization organizations, such as ANSI (American National Standards Institute) or ISO
(International Organization for Standardization)
 For example, FORTRAN, Pascal, and C have ANSI standards
 Ada has a standard approved by the U.S. government

 Applicability for the Compiler


 The compiler writer must interpret the language definition
 Implement a compiler that conforms to the language definition
LANGUAGE DEFINITION CONT’D
 Standard test programs (a test suite)
 Implementing a compiler that conforms to the language definition is often not an easy task
 It is made easier by the existence of a set of Standard test programs (a test suite) against which a
compiler can be tested (such a test suite exists for Ada)
 For example, the TINY example language used in the text has its lexical, syntactic, and semantic
structure specified
LANGUAGE DEFINITION CONT’D

 Denotational semantics
 Occasionally, a language will have its semantics given by a formal
definition in mathematical terms
 Several methods that are currently used do this, and no one method has
achieved the level of a standard
 Denotational semantics has become one of the more common methods,
especially in the functional programming community
 When a formal definition exists for a language, then it is (in theory)
possible to give a mathematical proof that a compiler conforms to the
definition
LANGUAGE DEFINITION CONT’D
 Runtime Environment
 One aspect of compiler construction that is particularly affected by the language definition is
the structure and behavior of the runtime environment
 The structure of the data allowed in a programming language, and the kinds of function calls and returned values
allowed, have a decisive effect on the complexity of the runtime system
 Three basic types of runtime environments in increasing order of complexity are as follows:
 FORTRAN77: With no pointers or dynamic allocation and no recursive function calls
 Allows a completely static runtime environment, where all memory allocation is done prior
to execution
 Makes the job of allocation particularly easy for the compiler writer, as no code needs to be
generated to maintain the environment
LANGUAGE DEFINITION CONT’D
 Pascal, C and other so-called Algol-like languages: allow a limited form of dynamic allocation
and recursive function calls
 Require a “semi-dynamic” or stack-based runtime environment, with an additional dynamic structure called a
heap
 From the heap, the programmer can schedule dynamic allocation (a C sketch appears at the end of this slide)

 Functional and most object-oriented languages: such as LISP and Smalltalk, require a “fully
dynamic” environment
 In which all allocation is performed automatically via code generated by the compiler
 This is complicated because it requires that memory also be freed automatically
 Requires complex “garbage collection” algorithms
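 A small C illustration (not from the slides) of the stack-plus-heap case mentioned above: each recursive call gets its own activation record on the run-time stack, while malloc/free reserve and release heap memory that the programmer schedules explicitly

      #include <stdlib.h>

      /* Each call pushes a new activation record holding its own copy of n. */
      int factorial(int n) {
          return (n <= 1) ? 1 : n * factorial(n - 1);
      }

      int main(void) {
          int *values = malloc(10 * sizeof *values);   /* programmer-scheduled heap allocation */
          if (values == NULL)
              return 1;
          values[0] = factorial(5);                    /* recursion uses the run-time stack    */
          free(values);                                /* programmer-scheduled deallocation    */
          return 0;
      }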
REFERRED TEXT BOOKS

1. Compiler Construction – Principles and Practice by Kenneth C. Louden, Course Technology, 1997,
ISBN 978-0534939724.

2. Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and
Jeffrey D. Ullman, Addison-Wesley, 2nd edition, 2006.

3. Modern Compiler Design by Dick Grune, Henri E. Bal, Ceriel J. H. Jacobs, and Koen G. Langendoen,
John Wiley & Sons, 2003.
Thanks!
