
Compiler Construction

Faryal Shamsi
Lecturer Computer Science

Sukkur IBA University


About Me
• Qualification
• MS (Computer Science) – 2019 from Sukkur IBA University
• BS (Computer Science) – 2005 from IBA-Sukkur (Affiliated to IBA Karachi)
• Field of Interest – Multimedia Data (Video) Mining

• Experience
• 16 years, Alhamdulillah!
• Joined Sukkur IBA University 4 years ago
• Previously –
• Lecturer at Sistech Sukkur - February 2008 to January 2020
• Visiting Faculty Member Sukkur IBA University in STHP 2018
• Visiting Faculty Member DIHE in 2017
About Us
• Today is our day!

● Let’s share –
○ Our experience(s) in previous courses?
○ Any other suggestions/regrets?
○ Which is your favorite programming language? Why?
○ Have you developed some exciting projects?
○ What do you already know about Compiler Construction?
○ What do you expect to learn in this course?
For Online 24/7 Support
• Course Dashboard: https://elearning.iba-suk.edu.pk/course/view.php?id=360
• Self Enrollments: GROUP NAME ENROLLMENT KEY
Section B CCFall2024-B
Section C CCFall2024-C
Section D CCFall2024-D
Section E CCFall2024-E
Section F CCFall2024-F

• Email: [email protected]

• Some online lectures created during COVID lockdown are still available on my
YouTube Channel: https://www.youtube.com/channel/UCQ1D9g88cCWJwEj8VGT6KoA
My Availability & Support
• 1 ¼ Hours Class Meeting:
• Twice a week (as per Time Table)

• One-to-One Counseling:
• CONSULTATION HOURS:
• Wednesdays: 9:00am to 1:00pm
• Consultation Hours fluctuate with Time Table

• Therefore, come with prior appointment ([email protected]) to avoid inconvenience

• Office Location: (AB – 1, Room – 4, Cubicle – 8)


Study Resources

• Text Book

1. Aho, A. V., Lam, M. S., Sethi, R., & Ullman, J. D. (2007). Compilers: Principles, Techniques, and Tools (2nd ed.). Addison-Wesley. – the Dragon Book
2. Lewis, P. M., Stearns, R. E., & Rosenkrantz, D. J. (1976). Compiler Design Theory.
Study Resources

• Reference Book:

1. Cooper, K., & Torczon, L. (2011). Engineering a Compiler. Elsevier.
2. Louden, K. C. (1997). Compiler Construction. Cengage Learning.
3. Wirth, N. (1996). Compiler Construction. Addison-Wesley.
Program Learning Outcomes (PLOs)
with respect to COMPILER CONSTRUCTION

GA1 Computing Knowledge: An ability to apply knowledge of mathematics, science, computing fundamentals and computing specialization to the solution of complex computing problems.

GA2 Problem Analysis: An ability to identify, formulate, research literature, and analyze complex computing problems, reaching substantiated conclusions using first principles of mathematics, natural sciences and computing sciences.

GA3 Design/Development of Solutions: An ability to design solutions for complex computing problems and design systems, components or processes that meet specified needs with appropriate consideration for public health and safety, and cultural, societal, and environmental considerations.
Course Learning Outcomes

1. Understand how compilers translate source code into machine executables.
2. Identify the tokens of a typical high-level programming language.
3. Comprehend how parsing is performed.
4. Apply parsing algorithms for syntax analysis.
5. Understand how compilers generate machine code.
6. Be familiar with techniques for simple code optimizations.
Assessment Strategy – Please Note!!!

Tool Name – Weightage – Criteria
Mid Examination – 30 Points – Correctness of Solutions
Final Examination – 50 Points – Correctness of Solutions
Project Presentation/Demonstration – 7 (4 + 3) Points – Submission Punctuality, Class Participation
Assignments – 13 Points – Submission Punctuality and Ethics


Should we start?
Introduction to
Compilers
Lecture 1

Faryal Shamsi
Lecturer Computer Science

Sukkur IBA University


What are compilers?
• A translator or a language processor

• Compiler

• Interpreter

• Assembler
Compiler vs Interpreter

Compiler:
• A language rewriter program that translates the form of expressions without a change of meaning.
• Examples: C, C++

Interpreter:
• An execution program that directly runs instructions, without requiring them previously to have been compiled into a machine language program.
• Examples: HTML, Lisp, Perl
• How does Java work?

• What about Python?

• What are these –
• Bytecode?
• Virtual Machine?
Hybrid Compiler
Intermediate Code – Bytecode

Java:
• Statically typed, compiled language

Python:
• Dynamically typed (scripting), non-compiled language
Types of a Compiler

• Single-pass compilers
• 2-pass compilers
• Multi-pass compilers
The Cross Compilers

• A cross compiler is a compiler that runs on one platform (operating system or machine) but generates executable code for a different platform than the one it runs on.

• For example, a compiler that runs on a Windows 7 PC but generates code that runs on an Android smartphone is a cross compiler.
Parallelizing Compiler

• A parallelizing compiler is typically a compiler that finds parallelism in a sequential program and generates appropriate code for a parallel computer.

• More recent parallelizing compilers accept explicitly parallel language constructs, such as array assignments or parallel loops.

• Applications: Big Data and Multithreading


Traditional Applications of Compilers
• Implementation of High-Level Programming Languages

• Optimizations for Computer Architectures
• Parallelism – whether parallel (independent) or serial (dependent) instructions are executed
• Memory hierarchy (registers, cache, RAM, HDD)

• Design of New Computer Architectures
• CISC – Complex Instruction Set Computer, used in x86 machines
• Uses complex addressing to store data structures using registers and the stack
• RISC – Reduced Instruction Set Computer
• PowerPC, SPARC, MIPS, Alpha, and PA-RISC are based on the RISC concept
• Specialized Architectures (embedded machines)
Compilers Also…
• Natural Language Processing
• Database Query Interpreters

• Compiled Simulation
• Instead of writing a simulator that interprets the design, it is faster to compile the design to produce machine code that simulates that particular design natively.

• Hardware Synthesis
• Hardware designs are typically described at the register transfer level (RTL), where variables represent registers and expressions represent combinational logic.
• Hardware-synthesis tools translate RTL descriptions automatically into gates, which are then mapped to transistors and eventually to a physical layout.
Programming Language Basics
• Environment and states (Figure 1.8)

• Environment is a mapping from names to locations

• State is a mapping from locations to values
Programming Language Basics
Intermediate Code – Bytecode

Java:
• Statically typed, compiled language

Python:
• Dynamically typed (scripting), non-compiled language
Programming Language Basics
Programming Language Basics
Scopes and Declarations
Call by Value
• In call-by-value, the actual parameter is evaluated (if it is an
expression) or copied (if it is a variable).
• The value is placed in the location belonging to the corresponding
formal parameter of the called procedure.
• This method is used in C and Java, and is a common option in C++, as
well as in most other languages.
• Call-by-value has the effect that all computation involving the formal
parameters done by the called procedure is local to that procedure,
and the actual parameters themselves cannot be changed.
Call by Reference
• In call-by-reference, the address of the actual parameter is passed to
the callee as the value of the corresponding formal parameter.
• Uses of the formal parameter in the code of the callee are
implemented by following this pointer to the location indicated by the
caller. Changes to the formal parameter thus appear as changes to the
actual parameter.
• If the actual parameter is an expression, however, then the expression
is evaluated before the call, and its value stored in a location of its
own. Changes to the formal parameter change this location, but can
have no effect on the data of the caller.
Call by Name
• This was used in the early programming language Algol 60.
• It requires that the callee execute as if the actual parameter were
substituted literally for the formal parameter in the code of the callee,
as if the formal parameter were a macro standing for the actual
parameter (with renaming of local names in the called procedure, to
keep them distinct).
• When the actual parameter is an expression rather than a variable,
some unintuitive behaviors occur, which is one reason this
mechanism is not favored today.
Aliasing
• There is an interesting consequence of call-by-reference parameter
passing or its simulation, as in Java, where references to objects are
passed by value.
• It is possible that two formal parameters can refer to the same
location; such variables are said to be aliases of one another.
• As a result, any two variables, which may appear to take their values
from two distinct formal parameters, can become aliases of each
other, as well.
Aliasing – Example 1.9
• Suppose a is an array belonging to a procedure p, and p calls another
procedure q(x, y) with a call q(a, a).

• Suppose also that parameters are passed by value, but that array
names are really references to the location where the array is stored,
as in C or similar languages.

• Now, x and y have become aliases of each other. The important point is
that if within q there is an assignment x[10] = 2, then the value of
y[10] also becomes 2.
Phases of Compilation
Lecture 1

Faryal Shamsi
Lecturer Computer Science

Sukkur IBA University


Language Processing System
Pre-processor
• A program that processes the source code before it is compiled by the
compiler.

• #define: This statement is used to define a macro, which is essentially


a symbolic name for a value or a piece of code. For example:
• #define PI 3.1415

• #include: This statement is used to include a header file in the source


code. For example:
• #include <stdio.h>
Pre-processor
• The preprocessor permits conditionally compiling portions of code based on certain conditions, using directives like #ifdef, #ifndef, #if, #else, and #endif:
  #ifdef ENABLE_LOGGING
  System.out.println("Logging is enabled");
  #endif
• To enable or disable the preprocessing, set the preprocess.enabled property in your Maven project configuration:
  <properties>
  <preprocess.enabled>true</preprocess.enabled>
  </properties>
Java Pre-processor
• There is no preprocessor directive in Java. Unlike languages like C and
C++, Java does not have a preprocessor that can modify the source
code before it is compiled.

• Java instead relies on the runtime environment to handle things like


platform-specific code inclusion and conditional compilation.
Java Pre-processor
• Add the maven-antrun-plugin plugin to your Maven project's pom.xml file.

• This plugin allows you to execute Ant tasks within your Maven build.
Linker / Loader

• A linker is a computer program that links and merges various object files together in order to make an executable file. All these files might have been compiled by separate assemblers. The major task of a linker is to search for and locate referenced modules/routines in a program and to determine the memory locations where these codes will be loaded, making the program instructions have absolute references.
Linker / Loader

• A loader is a part of the operating system and is responsible for loading executable files into memory and executing them. It calculates the size of a program (instructions and data) and creates memory space for it. It initializes various registers to initiate execution.
Phases of Compiler

• Analysis / Front-end

• Synthesis / Back-end

• Machine Independent Optimization (optional)


Frontend – Analysis
• If the analysis part detects that the source program is either
syntactically ill formed or semantically unsound, then it must provide
informative messages, so the user can take corrective action.

• The analysis part also collects information about the source program
and stores it in a data structure called a symbol table which is passed
along with the intermediate representation to the synthesis part .
Backend – Synthesis

• The synthesis part constructs the desired target program from the
intermediate representation and the information in the symbol table.

• The symbol table, which stores information about the entire source
program, is used by all phases of the compiler.
Mid-end or Optimization

• Some compilers have a machine-independent optimization phase between the front end and the back end.

• The purpose of this optimization phase is to perform transformations on the intermediate representation, so that the back end can produce a better target program than it would from an un-optimized intermediate representation.
The Structure of a
Compiler

Symbol Table
• Modern compilers contain two (large) parts, each of which is often subdivided.

• These two parts are the –
• front-end (analyzer), and the
• back-end (synthesizer)

• The analysis part breaks up the source program into –
• constituent pieces, and
• imposes a grammatical structure on them

• It then uses this structure to create an intermediate representation (IR) of the source program.
Lexical Analysis
• For example,
position = initial + rate * 60

• Lexemes (position, =, initial, +, rate, *, 60) are mapped into the following tokens –

• position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position.
• The assignment symbol = is a lexeme that is mapped into the token <=>. It needs no attribute value to store a name and type.
• Blanks separating the lexemes would be discarded by the lexical analyzer.
Lexical Analysis
• The symbol-table entry for an identifier holds information about the
identifier, such as its name and type.
1. Lexical Analysis/ Scanning
• Reads the stream of characters making up
the source program and groups the characters
into meaningful sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of
the form
<token-name , attribute-value>
• token-name is an abstract symbol that is used during syntax
analysis,
• attribute-value points to an entry in the symbol table for this token.
2. Syntax Analysis or Parsing
• The syntax analyzer (parser) uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation (IR) that depicts the grammatical structure of the token stream.

• A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation.
• Example input: for ( ; expr ; expr )
Parser / Parsing???

• Context-free grammars will be used to specify the grammatical structure of programming languages, and we will discuss algorithms for constructing efficient syntax analyzers/parsers automatically from certain classes of grammars.
Parsing id + id * id
Grammar: E → E + E | E * E | id

The grammar is ambiguous, so the string has two parse trees:

      E                      E
    / | \                  / | \
   E  +  E                E  *  E
   |    / | \            / | \   |
   id  E  *  E          E  +  E  id
       |     |          |     |
       id    id         id    id

(left: id + (id * id); right: (id + id) * id)
3. Semantic Analyzer
• An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands.
• The compiler needs semantic information, e.g., the types (integer,
real, pointer to array of integers, etc.) of the objects involved.
• The language specification may permit some type conversions
called coercions.
• Suppose that position, initial, and rate have been declared to be
floating-point numbers, and that the lexeme 60 by itself forms an
integer. Here integer may be converted into floating-point.
3. Semantic Analysis
4. Intermediate Code Generation
• After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation.

• This intermediate representation (IR) should have two important properties:
• it should be easy to produce, and
• it should be easy to translate into the target machine.
4. Intermediate Code Generation
The 3-address Code
• We consider an intermediate form called three-address code,
• which consists of a sequence of assembly-like instructions with three operands per instruction.
• Each operand can act like a register.
The 3-address Code
• There are several points worth noting about three-address instructions.
• First, each three-address assignment instruction has at most one operator on the right side. Thus, these instructions fix the order in which operations are to be done; the multiplication precedes the addition in the source program.
• Second, the compiler must generate a temporary name to hold the value computed by a three-address instruction.
• Third, some "three-address instructions", like the first and last in the sequence above, have fewer than three operands.
5. Code Optimization

• The machine-independent code optimization phase attempts to improve the intermediate code so that better target code will result.

• Usually, better code means faster, shorter, or target code that consumes less power.
5. Code Optimization
6. Code Generation

• The code generator takes as input an intermediate representation of the source program and maps it into the target language.
• If the target language is machine code, registers or memory
locations are selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of
machine instructions that perform the same task.
• The F in each instruction tells us that it deals with floating-point
numbers
• The # signifies that 60.0 is to be treated as an immediate constant.
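The F and # remarks above refer to a Dragon-Book-style target sequence for the running example position = initial + rate * 60; a typical rendering (the register names R1 and R2 are illustrative) is:

```
LDF  R2, id3          // load rate into floating-point register R2
MULF R2, R2, #60.0    // R2 = R2 * 60.0 (immediate constant)
LDF  R1, id2          // load initial into R1
ADDF R1, R1, R2       // R1 = R1 + R2
STF  id1, R1          // store the result into position
```

Here id1, id2, and id3 stand for the symbol-table entries of position, initial, and rate.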
Types of Compiler
Lecture 2

Faryal Shamsi
Lecturer Computer Science

Sukkur IBA University


Types of a Compiler

• Single-pass compilers
• 2-pass compilers
• Multi-pass compilers
Single Pass Compiler
• If we combine or group all the phases of compiler design into a single module, it is known as a single-pass compiler.
Single Pass Compiler
• A one-pass/single-pass compiler is a type of compiler that passes through each part of a compilation unit exactly once.
• A single-pass compiler is faster and smaller than a multi-pass compiler.
• A disadvantage of a single-pass compiler is that it is less efficient in comparison with a multi-pass compiler.

• Single-pass compilation is almost never done today; the early Pascal compiler is a classic example of the approach.
• We cannot optimize very well because the available context of an expression is limited; since we cannot back up and process the input again, the grammar must be limited or simplified.
2-pass or Multi-pass Compiler

• A two-pass/multi-pass compiler is a type of compiler that processes the source code or abstract syntax tree of a program more than once. In a multi-pass compiler we divide the phases into two passes:

First Pass
• (a) Front end
• (b) Analysis part
• (c) Platform independent

Second Pass
• (a) Back end
• (b) Synthesis part
• (c) Platform dependent
Multi-pass Benefit – 1
Multi-pass Benefit – 2
An ideal Multi-pass Compiler
Assignment 1
