CSC311 - Lecture Note
Faculty of Science
Department of Computer Science
Lecture Materials
RECOMMENDED TEXTBOOKS:
FREE TEXTBOOKS ON THE INTERNET
(1) A Compact Guide to Lex & Yacc
Post date: 24 Oct 2004
Explains how to construct a compiler using lex and yacc, the tools used to generate
lexical analyzers and parsers.
(2) Basics of Compiler Design
Post date: 19 Apr 2007
Conveys the general picture of compiler design without going into extreme detail.
Gives the students an understanding of how compilers work and the ability to make
simple (but not simplistic) compilers for simple languages.
Publisher: University of Copenhagen
Publication date: 01 Jan 2010
(6) Let's Build a Compiler
Post date: 24 Oct 2004
A fifteen-part tutorial series, written from 1988 to 1995, on the theory and practice
of developing language parsers and compilers from scratch.
Publication date: 31 Dec 1988
(7) Languages And Machines
Post date: 19 Oct 2006
Provides a view of the concept of a language as a system of strings of characters
obeying certain rules. Topics covered include logic, meta-languages, proofs,
finite state machines, Turing machines, encryption and coding.
Publication date: 01 Jul 2005
(8) Compiler Construction
Post date: 17 Sep 2006
A concise, practical guide to modern compiler design and construction by the
author of Pascal and Oberon. Readers are taken step-by-step through each stage of
compiler design, using the simple yet powerful method of recursive descent to
create a compiler.
Publisher: Addison-Wesley Pub Co
Publication date: 01 May 2017
Document Type: Book
Reference
https://www.amazon.com/Compiler-Design-Construction-Arthur-Pyster/dp/0442275366
https://www.amazon.com/Compiler-construction-Electrical-computer-engineering/dp/0442243944
https://www.amazon.com/Introduction-Compilers-Language-Design-Second/dp/B08BFWKRJH
https://www3.nd.edu/~dthain/compilerbook/
https://www.oreilly.com/library/view/compiler-construction/9789332524590/
COURSE CONTENT
Introduction to Compiling: Overview of the compilation process, Source code and Target code (Analysis of the source programme), Translators (language processors), Advantages and Disadvantages, Cousins of the compiler and Types of compiler.
The Phases of a Compiler: Compiler structure, the grouping of phases, Compiler-construction tools.
A Simple One-Pass Compiler: Anatomy of a compiler, Syntax definition, Syntax-directed translation, Parsing, A translator for simple expressions, Lexical analysis, Incorporating a symbol table, Abstract stack machines, Putting the techniques together.
Lexical Analysis: The role of the lexical analyzer, Input buffering, Specification of tokens, Recognition of tokens, A language for specifying lexical analyzers, Finite automata, From a regular expression to an NFA, Design of a lexical analyzer generator, Optimization of DFA-based pattern matchers.
Syntax Analysis: The role of the parser, Context-free grammars, Writing a grammar, Top-down parsing, Bottom-up parsing, Operator-precedence parsing, LR parsers, Using ambiguous grammars, Parser generators.
Syntax-Directed Translation: Syntax-directed definitions, Construction of syntax trees, Bottom-up evaluation of S-attributed definitions, L-attributed definitions, Top-down translation, Bottom-up evaluation of inherited attributes, Recursive evaluators, Space for attribute values at compile time, Assigning space at compile time, Analysis of syntax-directed definitions.
Type Checking: Type systems, Specification of a simple type checker, Equivalence of type expressions, Type conversions, Overloading of functions and operators, Polymorphic functions, An algorithm for unification.
Run-Time Environments: Source language issues, Storage organization, Storage-allocation strategies, Access to nonlocal names, Parameter passing, Symbol tables, Language facilities for dynamic storage allocation, Dynamic storage allocation techniques, Storage allocation in Fortran.
Intermediate Code Generation: Intermediate languages, Declarations, Assignment statements, Boolean expressions, Case statements, Back patching, Procedure calls.
Code Generation: Issues in the design of a code generator, The target machine, Run-time storage management, Basic blocks and flow graphs, Next-use information, A simple code generator, Register allocation and assignment, The DAG representation of basic blocks, Peephole optimization, Generating code from DAGs, Dynamic programming code-generation algorithm, Code-generator generators.
Code Optimization: Introduction, The principal sources of optimization, Optimization of basic blocks, Loops in flow graphs, Introduction to global data-flow analysis, Iterative solution of data-flow equations, Code-improving transformations, Dealing with aliases, Data-flow analysis of structured flow graphs, Efficient data-flow algorithms, A tool for data-flow analysis, Estimation of types, Symbolic debugging of optimized code.
Advanced topics include garbage collection; dynamic data structures, pointer analysis, aliasing; code scheduling, pipelining; dependence testing; loop-level optimisation; superscalar optimisation; profile-driven optimisation; debugging support; incremental parsing; type inference; advanced parsing algorithms; practical attribute evaluation; function in-lining and partial evaluation.
CSC311: Compiler Construction 1 (3 Credit Units)
Lecture Note:
Weekly Synopsis
Week 1: Introduction to Compiling: Overview of the compilation process, Source
code and Target code (Analysis of the source programme), Translators
(language processors), Advantages and Disadvantages, Cousins of the
compiler and Types of compilers. The phases of a Compiler: Compiler
structure, the grouping of phases, Compiler-construction tools. A Simple
One-Pass Compiler: Anatomy of a compiler.
Weeks 3 & 4: putting the techniques together. Lexical Analysis: The role of the lexical
analyzer, Input buffering, Specification of tokens, Recognition of tokens,
A language for specifying lexical analyzers, Finite automata, From a
regular expression to an NFA,
Week 8: Test
Week 11: Access to nonlocal names, parameter passing, Symbol tables, Language
facilities for dynamic storage allocation, Dynamic storage allocation
techniques, Storage allocation in Fortran. Intermediate Code Generation:
Intermediate languages, Declarations, Assignment statements, Boolean
expressions, Case statements, Back Patching, Procedure calls. Code
generation: Issues in the design of a code generator, The target machine,
Run-time storage management, Basic blocks and flow graphs, Next-use
information.
Week 12: A Simple code generator, Register allocation and assignment, The dag
representation of basic blocks, Peephole optimization, Generating code
from dags, Dynamic programming code-generation algorithm, Code-
generator generators. Code Optimization: Introduction, The Principal
sources of optimization, Optimization of basic blocks, Loops in flow
graphs, Introduction to global data-flow analysis, Iterative solution of
data-flow equations, Code improving transformations, Dealing with
aliases, Data-flow analysis of structured flow graphs, Efficient data-flow
algorithms, A tool for data-flow analysis, Estimation of types,
Week 13: Symbolic debugging of optimized code. Advanced topics include garbage
collection; dynamic data structures, pointer analysis, aliasing; code
scheduling, pipelining; dependence testing; loop level optimisation;
superscalar optimisation; profile-driven optimisation; debugging support;
incremental parsing; type inference; advanced parsing algorithms;
practical attribute evaluation; function in-lining and partial evaluation.
STUDENT COURSE MANUAL
Description: This course deals with the theory and practice of compiler design. Topics
emphasized are scanning, parsing and semantic analysis.
A compiler translates source code into object code without tampering with the meaning of the source
code. The steps involved in translating a language are six, namely: lexical analysis, syntax analysis,
semantic analysis, intermediate representation, code optimization and code generation. Each of these
phases performs a single task.
In computing, a compiler is a computer program that transforms source code written in a
programming language or computer language, into another computer language. The most
common reason for transforming source code is to create an executable program.
Compiler construction is a complex task. A good compiler combines ideas from formal language
theory, from the study of algorithms, from artificial intelligence, from systems design, from
computer architecture, and from the theory of programming languages and applies them to the
problem of translating a program.
Weekly Synopsis
Compiler Design and Construction (CDC)
Compiler Design
Computers are a balanced mix of software and hardware. Hardware is just a piece of mechanical
equipment whose functions are controlled by compatible software. Hardware understands
instructions in the form of electronic charge, which is the counterpart of binary language in
software programming. Binary language has only two symbols, 0 and 1. To instruct the
hardware, code must be written in binary format, which is simply a series of 1s and 0s. It would
be a difficult and cumbersome task for computer programmers to write such code directly, which is why
we have compilers to generate it.
Phases of a Compiler
There are two major phases of compilation, which in turn have many parts. Each of them takes
input from the output of the previous level and works in a coordinated way.
Phases of a compiler are the processes that source code goes through before being converted to
object code by a compiler. Each phase fulfils a specific, particular function. Information gathered
along the way is stored in a data structure called a symbol table, and an error handler must be
provided to keep track of errors. A compiler's work is divided into six phases, and these phases
can be grouped into the following two (2) parts: analysis and synthesis.
That is, a compiler can broadly be divided into two phases based on the way they compile.
1. Analysis Phase
Known as the front-end of the compiler, the analysis phase of the compiler reads the source
program, divides it into core parts and then checks for lexical, grammar and syntax errors. The
analysis phase generates an intermediate representation of the source program and symbol table,
which should be fed to the Synthesis phase as input.
Analysis: The source code is divided into meaningful units and an intermediate
representation is created. This part is broken into three (3) further sections as follows: (a) Lexical
Analysis, (b) Syntax Analysis, and (c) Semantic Analysis.
An intermediate representation is created from the given source code:
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
The lexical analyzer divides the program into “tokens”, the Syntax analyzer recognizes
“sentences” in the program using the syntax of the language and the Semantic analyzer checks
the static semantics of each construct. Intermediate Code Generator generates “abstract” code.
2. Synthesis Phase
Known as the back-end of the compiler, the synthesis phase generates the target program with
the help of intermediate source code representation and symbol table.
A compiler can have many phases and passes.
Pass: A pass refers to the traversal of a compiler through the entire program.
Phase: A phase of a compiler is a distinguishable stage, which takes input from
the previous stage, processes and yields output that can be used as input for the
next stage. A pass can have more than one phase.
Synthesis: This section is subdivided into three (3): (a) intermediate code generation, (b) code
optimization, and (c) code generation.
A compiler translates source code into object code without tampering with the meaning of the source
code. The steps involved in translating a language are six, namely: lexical analysis, syntax analysis,
semantic analysis, intermediate representation, code optimization and code generation. Each of these
phases performs a single task. The compilation process contains a sequence of phases. Each phase takes
the source program in one representation and produces output in another representation; each phase
takes its input from the previous stage.
An equivalent target program is created from the intermediate representation. It has two parts:
Code Optimizer
Code Generator
Code Optimizer optimizes the abstract code, and the final Code Generator translates abstract
intermediate code into specific machine instructions.
1. Analysis Phase
(i) Lexical Analysis
The lexical analysis phase is the first phase of the compilation process; lexical analysis is also known as
scanning. It takes source code as input, reads the source program one character at a time, and
converts it into meaningful lexemes, that is, a series of tokens such as keywords, identifiers,
and operators. The lexical analyzer represents these lexemes in the form of tokens, which
are then passed on to the next stage of the compilation process.
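To make the idea concrete, here is a minimal sketch in Python of how a scanner might split one assignment statement into tokens. The token names, regular expressions and the tokenize function are invented purely for illustration; this is not the tokenizer described later in this note.

import re

# A minimal token specification: each pair is (token type, regular expression).
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),          # integer literals
    ("ID",      r"[A-Za-z_]\w*"), # identifiers and keywords
    ("ASSIGN",  r":="),           # assignment operator
    ("OP",      r"[+\-*/]"),      # arithmetic operators
    ("SKIP",    r"[ \t]+"),       # whitespace (ignored)
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    # Yield (token type, lexeme) pairs for a single source line.
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":
            yield kind, match.group()

print(list(tokenize("position := initial + rate * 60")))
# [('ID', 'position'), ('ASSIGN', ':='), ('ID', 'initial'),
#  ('OP', '+'), ('ID', 'rate'), ('OP', '*'), ('NUMBER', '60')]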
(ii) Syntax Analysis
Syntax analysis is the second phase of the compilation process; syntax analysis is also known as
parsing. It takes tokens as input and generates a parse tree as output. In the syntax analysis phase, the
parser checks whether the expression made by the tokens is syntactically correct. That is, in
this stage the compiler checks the syntax of the source code to ensure that it conforms to the
rules of the programming language. The compiler builds a parse tree, which is a hierarchical
representation of the program’s structure, and uses it to check for syntax errors.
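As a rough illustration, the sketch below assumes a toy expression grammar (expr -> term {(+|-) term}, term -> factor {(*|/) factor}, factor -> NUMBER | ( expr )) and builds the parse tree as nested Python tuples. The function names and the tree format are illustrative assumptions, not part of any particular compiler.

# Minimal recursive-descent parser for the toy grammar above.
# It consumes a list of token strings and returns a nested-tuple parse tree.
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("missing ')'")
            advance()
            return node
        advance()
        return ("num", tok)

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["1", "+", "2", "*", "3"]))
# ('+', ('num', '1'), ('*', ('num', '2'), ('num', '3')))

Note how operator precedence falls out of the grammar: term handles * and / below expr, so 2 * 3 is grouped before the addition.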
(iii) Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks whether the parse tree
follows the rules of language. Likewise, in this phase, the compiler checks the meaning of the
source code to ensure that it makes sense. Semantic analyzer keeps track of identifiers, their
types and expressions. The compiler performs type checking, which ensures that variables are
used correctly and that operations are performed on compatible data types. The compiler also
checks for other semantic errors, such as undeclared variables and incorrect function calls. The
output of the semantic analysis phase is the annotated syntax tree. Examples of semantic errors are
data type incompatibility, use of undeclared variables, and many more.
(iv) Intermediate Code Generation
The fourth phase of compiler design is intermediate code generation. In this phase, the compiler
translates the parse tree into an intermediate form that can later be translated into code executed by the
computer. The code generated by the compiler must be efficient and optimized for the target
platform. In the intermediate code generation phase, the compiler translates the source code
into intermediate code. Intermediate code sits between the high-level language and
the machine language. The intermediate code should be generated in such a way that one can
easily translate it into the target machine code.
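As a hedged sketch of what such intermediate code can look like, the fragment below walks an expression tree shaped like the one built in the parser sketch above and emits three-address instructions, using invented temporaries t1, t2, and so on.

import itertools

# Sketch: flatten an expression tree into a list of three-address instructions,
# introducing a fresh temporary for every operator node.
def to_three_address(tree):
    code = []
    temp_ids = itertools.count(1)

    def walk(node):
        if node[0] == "num":
            return node[1]                       # a leaf is already a simple operand
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{next(temp_ids)}"
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    walk(tree)
    return code

tree = ("+", ("num", "1"), ("*", ("num", "2"), ("num", "3")))
for line in to_three_address(tree):
    print(line)
# t1 := 2 * 3
# t2 := 1 + t1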
2. Synthesis Phase
The intermediate code generated in the previous stage is optimized in this phase. The
structure of the tree generated by the parser can be rearranged to suit the needs of the
machine architecture, producing object code that runs faster. The optimization is achieved partly by
removing unnecessary lines of code. Code optimization is an optional phase. It is used to
improve the intermediate code so that the output program runs faster and takes less
space. It removes unnecessary lines of code and arranges the sequence of statements in
order to speed up program execution. In this stage, the compiler analyzes the generated code
and makes optimizations to improve its performance. The compiler may perform optimizations
such as constant folding, loop unrolling, and function inlining.
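For instance, a very small constant-folding pass over three-address instructions of the form "t := a op b" might look like the following sketch; the instruction format and function name are assumptions carried over from the earlier sketches. A fuller optimizer would also propagate the folded constants into later instructions.

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def fold_constants(code):
    # Whenever both operands of an instruction are numeric literals, evaluate
    # the operation at compile time and replace it with a constant assignment.
    folded = []
    for line in code:
        target, expr = [part.strip() for part in line.split(":=")]
        parts = expr.split()
        if len(parts) == 3 and parts[0].isdigit() and parts[2].isdigit():
            value = OPS[parts[1]](int(parts[0]), int(parts[2]))
            folded.append(f"{target} := {value}")
        else:
            folded.append(line)
    return folded

print(fold_constants(["t1 := 2 * 3", "t2 := a + t1"]))
# ['t1 := 6', 't2 := a + t1']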
Code generation is the final phase of the compilation process. It takes the optimized intermediate
code as input and maps it to the target machine language. Code generator translates the
intermediate code into the machine code of the specified computer.
Operations of Compiler
These are some operations that are performed by the compiler:
It breaks source programs into smaller parts.
It enables the creation of symbol tables and intermediate representations.
It helps in code compilation and error detection.
It saves all codes and variables.
It analyses the full program and translates it.
It converts source code to machine code.
REASONS/IMPORTANCE OF A COMPILER
(i) A compiler is a translator that converts the high-level language into the machine language.
(ii) High-level language is written by a developer and machine language can be understood by
the processor.
(iii) A compiler is used to report errors to the programmer.
(iv) The main purpose of a compiler is to translate code written in one language into another without
changing the meaning of the program.
(v) When a program written in a HLL programming language is executed, execution happens in two parts.
(vi) In the first part, the source program is compiled and translated into the object program (low-level
language).
(vii) In the second part, the object program is translated into the target program through the assembler.
Overall, compiler design is a complex process that involves multiple stages and requires a deep
understanding of both the programming language and the target platform. A well-designed
compiler can greatly improve the efficiency and performance of software programs, making
them more useful and valuable for users.
Compiler
A cross compiler runs on a machine ‘A’ and produces code for another machine
‘B’. It is capable of creating code for a platform other than the one on which the
compiler is running.
A source-to-source compiler, or transcompiler or transpiler, is a compiler that
translates source code written in one programming language into the source code of
another programming language.
TYPES OF COMPILERS
When all the phases of the compiler are contained inside a single module, it is simply called
a single-pass compiler. It performs the work of converting source code to machine code in one pass.
A two-pass compiler is a compiler in which the program is translated twice: once by the front
end and once by the back end.
Multipass Compiler
When several intermediate codes are created in a program and a syntax tree is processed many
times, it is called Multi pass Compiler. It breaks codes into smaller programs.
Features of a compiler
a. Correctness
b. Speed of compilation
c. Preservation of the correct meaning of the code
d. Compilation time proportional to program size
e. Good diagnostics for syntax errors
f. Good error reporting and handling
g. Works well with debuggers
QUIZ QUESTIONS
There are three main kinds of programming language: Machine language. Assembly
language. High-level language
C++ can perform both low-level and high-level programming, and that's why it is essentially
considered a mid-level language. However, as its programming syntax also includes
comprehensible English, many also view C++ as another high-level language.
Python and C# are examples of high-level languages that are widely used in education and in the
workplace. A high-level language is one that is user-oriented in that it has been designed to make
it straightforward for a programmer to convert an algorithm into program code. A low-level
language is machine oriented.
HTML is not a programming language. It's a markup language. In fact, that is the technology's
name: HyperText Markup Language. That self-identified fact alone should settle the debate.
Answer
JavaScript. JavaScript is one of the world's most popular programming languages on the web.
Using JavaScript, you can build some of the most interactive websites.
Answer
JSON is a lightweight, text-based, language-independent data interchange format. It was derived
from the JavaScript/ECMAScript programming language, but is programming language
independent.
Questions 7: What is the hardest programming language?
Answer
Malbolge is considered the hardest programming language to learn. It is so hard that it has to be
set aside in its own paragraph. It reportedly took two whole years before the first working program
in Malbolge was written.
LANGUAGE PROCESSING SYSTEM
We have learnt that any computer system is made of hardware and software. The hardware
understands a language, which humans cannot understand. So we write programs in high-level
language, which is easier for us to understand and remember. These programs are then fed into a
series of tools and OS components to get the desired code that can be used by the machine. This
is known as Language Processing System.
The high-level language is converted into binary language in various phases. A compiler is a
program that converts high-level language to assembly language. Similarly, an assembler is a
program that converts the assembly language to machine-level language.
Let us first understand how a program, using C compiler, is executed on a host machine.
User writes a program in C language (high-level language).
The C compiler, compiles the program and translates it to assembly program (low-
level language).
An assembler then translates the assembly program into machine code (object).
A linker tool is used to link all the parts of the program together for execution
(executable machine code).
A loader loads all of them into memory and then the program is executed.
Before diving straight into the concepts of compilers, we should understand a few other tools that
work closely with compilers.
Preprocessor
A preprocessor, generally considered as a part of compiler, is a tool that produces input for
compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.
Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine language.
The difference lies in the way they read the source code or input. A compiler reads the whole
source code at once, creates tokens, checks semantics, generates intermediate code, translates the
whole program and may involve many passes. In contrast, an interpreter reads a statement from
the input, converts it to an intermediate code, executes it, then takes the next statement in
sequence. If an error occurs, an interpreter stops execution and reports it, whereas a compiler
reads the whole program even if it encounters several errors.
Assembler
An assembler translates assembly language programs into machine code. The output of an
assembler is called an object file, which contains a combination of machine instructions as well
as the data required to place these instructions in memory.
Linker
A linker is a computer program that links and merges various object files together in order to make
an executable file. All these files might have been compiled by separate assemblers. The major
task of a linker is to search for and locate referenced modules/routines in a program and to determine
the memory location where these codes will be loaded, making the program instructions have
absolute references.
Loader
The loader is a part of the operating system and is responsible for loading executable files into memory
and executing them. It calculates the size of a program (instructions and data) and creates memory
space for it. It initializes various registers to initiate execution.
Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for platform
(B) is called a cross-compiler.
Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it into the
source code of another programming language is called a source-to-source compiler.
Computer languages
Computer languages have progressed from Low-Level Languages to High-Level Languages over
the years. Programs could only be written in Binary Language in the early days of computing.
The following categories apply to computer languages:
Machine language
Machine language, often known as a low-level language, refers to a programming language that
is directly understood and executed by a computer's hardware.
The machine-level language is a language that consists of a set of instructions that are in binary
form, 0 or 1. As we know, computers can understand only machine instructions, which
are in binary digits, i.e., 0 and 1, so the instructions given to the computer can only be in binary
codes. Creating a program in a machine-level language is a very difficult task, as it is not easy for
programmers to write programs directly in machine instructions. It is error-prone because it is not easy
to understand, and its maintenance cost is also very high. A machine-level language is not portable, as
each computer has its own machine instructions, so a program written for one computer is no
longer valid on another computer.
Different processor architectures use different machine codes; for example, a PowerPC
processor uses a RISC architecture, which requires different code than an Intel x86 processor,
which has a CISC architecture.
Low-level language
Low-level language is the sole form of programming language that can be comprehended
by a computer. Low-level language, alternatively referred to as Machine Language, is a
programming language that is closely associated with the hardware architecture of a
computer system. Machine language is composed exclusively of two symbols,
namely 1 and 0. The instructions of machine language are exclusively expressed in binary
notation, consisting solely of the digits 1 and 0. Computers have the inherent ability to
comprehend machine language directly.
The low-level language is a programming language that provides no abstraction from the
hardware, and it is represented in 0 or 1 forms, which are the machine instructions. The
languages that come under this category are the Machine level language and Assembly language.
Assembly language
The assembly language contains some human-readable commands such as mov, add, sub, etc.
The problems we were facing in machine-level language are reduced to some extent by
using an extended form of machine-level language known as assembly language. Since assembly
language instructions are written in English-like words such as mov, add and sub, it is easier to write and
understand.
As we know that computers can only understand the machine-level instructions, so we require a
translator that converts the assembly code into machine code. The translator used for translating
the code is known as an assembler.
The assembly language code is not portable because the data is stored in computer registers, and
the computer has to know the different sets of registers.
The assembly code is not faster than machine code because the assembly language comes above
the machine language in the hierarchy, so it means that assembly language has some abstraction
from the hardware while machine language has zero abstraction.
Assembly language, also referred to as a middle-level language, is a low-level programming
language that is closely related to machine code.
The assembler is a software tool that functions as a translator, accepting assembly code as its
input and generating machine code as its output. This implies that the computer lacks the ability
to comprehend middle-level language, necessitating its translation into a low-level language in
order to render it comprehensible to the computer. The assembler is employed to convert an
intermediate-level language into a low-level language.
The command "g++ -S main.cpp -o main.s" is used to compile the source code file "main.cpp"
using the g++ compiler and generate the assembly code file "main.s" as output.
High-level language
A high-level language (HLL) is a programming language such as C, FORTRAN,
or Pascal that enables a programmer to write programs that are more or less independent of a
particular type of computer. Such languages are considered high-level because they are closer to
human languages and further from machine languages.
A programming language defines a set of instructions that are compiled together to perform a
specific task by the CPU (Central Processing Unit). The programming language mainly refers to
high-level languages such as C, C++, Pascal, Ada, COBOL, etc.
Each programming language contains a unique set of keywords and syntax, which are used to
create a set of instructions. Thousands of programming languages have been developed till now,
but each language has its specific purpose. These languages vary in the level of abstraction they
provide from the hardware. Some programming languages provide less or no abstraction while
some provide higher abstraction. Based on the levels of abstraction, they can be classified into
two categories:
o Low-level language
o High-level language
The image given below describes the abstraction level from the hardware. As we can
observe from the image, machine language provides no abstraction, assembly
language provides less abstraction, whereas high-level language provides a higher level of
abstraction.
The first high-level programming languages were designed in the 1950s. Now there are dozens
of different languages, including Ada, Algol, BASIC, COBOL, C, C++, FORTRAN, LISP,
Pascal, and Prolog.
The following are differences between machine-level language and assembly language:
1. The machine-level language comes at the lowest level in the hierarchy, so it has zero abstraction
from the hardware; the assembly language comes above the machine language, meaning it has less
abstraction from the hardware.
4. The machine-level language does not require any translator, as the machine code is directly executed
by the computer; in assembly language, the assembler is used to convert the assembly code into
machine code.
What is the main difference between high level and low-level language?
The main difference between a high-level language and a low-level language is that programmers
can easily understand, interpret, or compile the high-level language, whereas the machine cannot.
On the other hand, the machine can easily understand the low-level language, whereas human
beings cannot do so easily.
6. A high-level language is portable; a low-level language is non-portable.
Assignments
1. Itemize ten differences between high-level language and low-level language
2. What are the differences between an interpreter and a compiler in language processing?
3. Explain the processes involved in the language processing system, such as pre-processor,
compiler, assembler, linker, loader, and memory
4. Explain the 3 levels of programming languages
Compiler Structure
Any large software is easier to understand and implement if it is divided into well-defined
modules.
Figure: Structure of a compiler
In a compiler,
o linear analysis
is called LEXICAL ANALYSIS or SCANNING and
is performed by the LEXICAL ANALYZER or LEXER,
o hierarchical analysis
is called SYNTAX ANALYSIS or PARSING and
is performed by the SYNTAX ANALYZER or PARSER.
When the identifier x is found by the lexical analyzer, it
o generates the token id,
o enters the lexeme x in the symbol table (if it is not already there), and
o associates with the generated token a pointer to the symbol-table entry for x. This
pointer is called the LEXICAL VALUE of the token.
During the analysis or synthesis, the compiler may DETECT ERRORS and report on
them.
o However, after detecting an error, the compilation should proceed allowing
further errors to be detected.
o The syntax and semantic phases usually handle a large fraction of the errors
detectable by the compiler.
Practicing the following questions will help you test your knowledge on the course. It is highly
recommended that you practice them.
1. GATE CS 2011, Question 1
Link https://www.geeksforgeeks.org/gate-gate-cs-2011-question-1/
Link https://www.geeksforgeeks.org/gate-gate-cs-2008-question-11/
Quiz
1. What are the main principles of compiled code?
Answer
Lexical analysis, Syntax analysis, Intermediate code generation, Code optimisation, Code
generation. Like an assembler, a compiler usually performs the above tasks by making multiple
passes over the input or some intermediate representation of the same.
Answer
The language processor that reads the complete source program written in high-level language as
a whole in one go and translates it into an equivalent program in machine language is called a
Compiler. Example: C, C++, C#.
If X is a terminal, then First(X) is {X}. If X is a non-terminal and X → aα is a production,
then add 'a' to First(X). If X → ε, then add ε (null) to First(X). If X → YZ, then if First(Y)
contains ε, First(X) = (First(Y) − {ε}) ∪ First(Z); otherwise First(X) = First(Y).
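A small sketch of this computation in Python is shown below; the grammar encoding (a dictionary of productions, with ε written as the empty string) and the function name are my own illustrative choices, not a standard library API.

EPS = ""   # epsilon is represented by the empty string

def first_sets(grammar):
    # grammar maps each non-terminal to a list of productions (lists of symbols).
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        return first[symbol] if symbol in grammar else {symbol}  # a terminal is its own FIRST

    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                before = len(first[nt])
                all_nullable = True
                for sym in prod:
                    first[nt] |= first_of(sym) - {EPS}
                    if EPS not in first_of(sym):
                        all_nullable = False
                        break
                if all_nullable:          # every symbol (or an empty production) can vanish
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

# Example grammar:  E -> T E',  E' -> + T E' | epsilon,  T -> id
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["id"]],
}
print(first_sets(grammar))
# {'E': {'id'}, "E'": {'+', ''}, 'T': {'id'}}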
Answer
Programs that use compilers to translate their code can sometimes run faster than interpreted
code. A compiler keeps source code contained and private from end-users, which can be
especially beneficial for programs that use commercial code.
TAKE HOME ASSIGNMENT
Question 1. What are the 6 phases of a compiler?
ANSWER
The 6 phases of a compiler are:
Lexical Analysis.
Syntactic Analysis or Parsing.
Semantic Analysis.
Intermediate Code Generation.
Code Optimization.
Code Generation
Question 2. What are the two parts of compilation?
ANSWER
The two components of compilation are analysis and synthesis. The analysis stage separates the
source code into its constituent elements and produces an intermediate representation of the
source program. The target program is created from the intermediate term by the synthesis
component.
Question 3. What are the steps of the compilation process in C?
ANSWER
The compilation process can be divided into four steps, i.e., Pre-processing, Compiling,
Assembling, and Linking. The preprocessor takes the source code as an input, and it removes all
the comments from the source code. The preprocessor takes the preprocessor directive and
interprets it.
Figure: Compilation Process in C – javatpoint
The compiler writer can use some specialized tools that help in implementing various phases of
a compiler. These tools assist in the creation of an entire compiler or its parts. Some commonly
used compiler construction tools include:
1. Parser Generator – It produces syntax analyzers (parsers) from an input that is
based on a grammatical description of a programming language, that is, a context-free
grammar. It is useful because the syntax analysis phase is highly complex and consumes
a lot of manual effort and compilation time. Examples: Yacc, Bison.
2. Scanner Generator – It generates lexical analyzers from an input that consists of
regular-expression descriptions of the tokens of a language. It generates a finite
automaton to recognize the regular expressions.
Example: Lex (the classic scanner generator mentioned in the recommended texts) is a tool of this kind.
5. Data-flow analysis engines – These are used in code optimization. Data-flow analysis is a
key part of code optimization; it gathers information about the values that
flow from one part of a program to another.
6. Compiler construction toolkits – It provides an integrated set of routines that aids
in building compiler components or in the construction of various phases of
compiler.
Lexical Analyzer Generator: This tool helps in generating the lexical analyzer or scanner of
the compiler. It takes as input a set of regular expressions that define the syntax of the
language being compiled and produces a program that reads the input source code and
tokenizes it based on these regular expressions.
Parser Generator: This tool helps in generating the parser of the compiler. It takes as input a
context-free grammar that defines the syntax of the language being compiled and produces a
program that parses the input tokens and builds an abstract syntax tree.
Code Generation Tools: These tools help in generating the target code for the compiler. They
take as input the abstract syntax tree produced by the parser and produce code that can be
executed on the target machine.
Optimization Tools: These tools help in optimizing the generated code for efficiency and
performance. They can perform various optimizations such as dead code elimination, loop
optimization, and register allocation.
Debugging Tools: These tools help in debugging the compiler itself or the programs that are
being compiled. They can provide debugging information such as symbol tables, call stacks,
and runtime errors.
Profiling Tools: These tools help in profiling the compiler or the compiled code to identify
performance bottlenecks and optimize the code accordingly.
Documentation Tools: These tools help in generating documentation for the compiler and the
programming language being compiled. They can generate documentation for the syntax,
semantics, and usage of the language.
Language Support: Compiler construction tools are designed to support a wide range of
programming languages, including high-level languages such as C++, Java, and Python, as
well as low-level languages such as assembly language.
User Interface: Some compiler construction tools come with a user interface that makes it
easier for developers to work with the compiler and its associated tools.
Anatomy of a Compiler
A compiler is a tool that translates a program from one language to another language.
An interpreter is a tool that takes a program and executes it. In the first case the
program often comes from a file on disk and in the second the program is sometimes
stored in a RAM buffer, so that changes can be made quickly and easily through an
integrated editor. This is often the case in BASIC interpreters and calculator
programs. We will refer to the source of the program, whether it is on disk or in RAM,
as the input stream.
Regardless of where the program comes from it must first pass through a Tokenizer,
or as it is sometimes called, a Lexer. The tokenizer is responsible for dividing the
input stream into individual tokens, identifying the token type, and passing tokens one
at a time to the next stage of the compiler.
The next stage of the compiler is called the Parser. This part of the compiler has an
understanding of the language's grammar. It is responsible for identifying syntax
errors and for translating an error free program into internal data structures that can be
interpreted or written out in another language.
The next step in the process is to send the parse tree to either an interpreter, where it is
executed, or to a code generator preprocessor. Not all compilers have a code generator
preprocessor. The preprocessor has two jobs. The first is to break any expressions into
their simplest components. For example, the assignment a := 1 + 2 * 3 would be
broken into temp := 2 * 3; a := 1 + temp; Such expressions are called Binary
Expressions. Such expressions are necessary for generating assembler language code.
Compilers that translate from one high level language to another often do not contain
this step. Another task of the code generator preprocessor is to perform certain
machine independent optimizations.
After preprocessing, the parse tree is sent to the code generator, which creates a new
file in the target language. Sometimes the newly created file is then post processed to
add machine dependent optimizations.
Interpreters are sometimes called virtual machines. This stems from the idea that a
CPU is actually a low level interpreter - it interprets machine code. An interpreter is a
high level simulation of a CPU.
The Tokenizer
The job of the tokenizer is to read tokens one at a time from the input stream and pass the tokens
to the parser. The heart of the tokenizer is the following type:
token_type_enum = (glob_res_word,
con_res_word,
reserved_sym,
identifier,
string_type,
int_type,
real_type);
Token_Type = record
infile : text;
cur_str : array [1..80] of char;
cur_str_len : integer;
cur_line : integer;
cur_pos : integer;
type_of_token : token_type_enum;
read_string : array [1..30] of char;
cap_string : array [1..30] of char;
int_val : integer;
float_val : real;
glob_res_word : glob_res_word_type;
con_res_word : con_res_word_type;
res_sym : res_sym_type;
end; (* Token *)
A variable of this type is used to hold the current token. The field infile is the input stream the
program being parsed is held in (for those that do not know Pascal, text files have the type text).
The next field, cur_str, holds the current line being parsed. It is more efficient to read files a chunk at a time
rather than a character at a time, so it is standard practice to add a field that holds an entire input line to
the token. Cur_str_len gives the length of the current string.
If the stream is from a RAM buffer then these two fields can be replaced with a pointer to the
correct position in the buffer.
The cur_line and cur_pos fields hold the current line number and current position in that line.
This data is used by the parser to indicate where errors occur.
There is an alternate way to handle context sensitive reserved words. The tokenizer can handle
all identifiers simply as identifiers, but provide additional procedures to determine if an identifier
is a globally reserved word or a context sensitive reserved word. Then when the parser reads an
identifier it queries the tokenizer as to whether the identifier is one or the other. Whichever
method is used, context-sensitive reserved words mean more work for the parser. This is why it
is preferred to make all reserved words global.
Read_string contains the token string as it was read from the input stream, and cap_string contains
the token string after it has been capitalized; that is, these strings contain only the token. When
the token type is reserved word, identifier, or string, the correct value will be in one of these
fields. When the token type is integer or real, a string representation of the value will be found here.
Since Pascal is not case sensitive all strings will be capitalized as they are read. This will
facilitate locating variables and procedures in a case independent way. Sometimes, however, the
uncapitalized string is required, such as when a string constant is encountered in the input
stream.
Int_val and float_val will contain the correct value when either an integer or real are read.
Glob_res_word, con_res_word and res_sym are enumerations that contain all possible globally
reserved words, context sensitive reserved words and reserved symbols, respectively.
The tokenizer next must provide several procedures to manipulate tokens. An initialization
procedure is usually needed to open the input stream and find the first token. The parser will
need a procedure to read the next token on command. This procedure is shown below. The
procedure looks long and scary, but it is very straightforward. Most of the space is taken up with
comments, and there is nothing tricky in the code itself.
var
read_str_idx : integer;
i : integer;
begin
with token do
begin
(* Clear strings *)
(* You may have to provide the following *)
(* procedure. Check your compiler's manuals *)
(* for how to do this *)
clear_string (cur_str);
clear_string (read_string);
clear_string (cap_string);
if (cap_string[1] >= 'A') and
(cap_string[1] <= 'Z') then
begin
(* is token a global reserved word? *)
type_of_token := identifier;
return;
end; { if (cap_string[1] >= 'A') and
(cap_string[1] <= 'Z') }
(* is token a number? *)
if ((cap_string[1] >= '0') and
(cap_string[1] <= '9')) or
(cap_string[1] = '-') then
(* is token a real or integer? *)
for i := 2 to read_str_idx do
if (cap_string[i] = '.') or
(cap_string[i] = 'E') then
begin
(* once again, you may have to provide *)
(* the following function to translate *)
(* a string to a real *)
float_val := string_to_real(cap_string);
type_of_token := real_type;
return;
end; {if (cap_string[i] = '.') or
(cap_string[i] = 'E') }
else
begin
int_val := string_to_int(cap_string);
type_of_token := int_type;
return;
end;
(* is token a string? *)
if (cap_string[1] = '''') then (* this syntax seems
strange, but it seems to
work! *)
begin
type_of_token := string_type;
return;
end;
This procedure is actually only about two and a half pages long, and without comments it would
probably be less than two. Some software engineers stress that a procedure should not be more
than a page long. Such "engineers" are generally college professors that have never ventured
beyond the walls of their ivory towers. In real life a two and a half page procedure is considered
not overly long. As long as the entire procedure is on the same logical level, it will be readable
and easy to understand.
The general logic of the procedure should be easy to see by reading it (or its comments). First we
find the next token. This might involve reading the next line from the input stream. Next we
copy the token into read_string and cap_string. Then we set about determining the type of the
token. If the token starts with a letter, it is an identifier, global reserved word or context sensitive
reserved word. To determine if the identifier is a context sensitive or global reserved word, tables
are queried that contain each type of word. If the identifier is found in one of the tables, the
associated enumeration is returned.
Note that a very flexible tokenizer could be created by using strings instead of enumerations and
keeping the reserved words and symbols in a file. When the tokenizer is initialized the reserved
words and symbols can be read into the tables. This way the language the tokenizer works on can
be changed by simply changing the files. No source code would need to be changed. The drawback
is that the parser needs to perform comparisons. If strings are used instead of enumerations,
less efficient string compares would have to be used instead of more efficient comparisons on
enumerations.
If the first character of the token is a digit the token is a number, or if the first character is a
minus sign the token is a negative number. If the token is a number it might be a real or an
integer. If it contains a decimal point or the letter E (which indicates scientific notation) then it is
a real, otherwise it is an integer. Note that this could be masking a lexical error. If the file
contains a token "9abc" the lexer will turn it into an integer 9. It is likely that any such error will
cause a syntax error which the parser can catch, however the lexer should probably be beefed up
to look for such things. It will make for more readable error messages to the user.
If the token is not a number, it could be a string. Strings in Pascal are identified by single quote
marks. Finally, if the token is not a string it must be a reserved symbol. For convenience, the
reserved symbols are stored in the same type of table as the global and context sensitive reserved
words. If the token is not found in this table, it is a lexical error. The tokenizer does not handle
errors itself, so it simply notifies the parser that an unidentified token type has been found. The
parser will handle the error.
QUESTIONS
1. Draw the anatomy of a compiler.
2. Explain the work of the tokenizer.
3. Discuss the work of the parser.
4. Enumerate the work of the intermediate code generator.
5. Highlight the work of code generators.
References
https://www.geeksforgeeks.org/introduction-of-compiler-design/
https://www.geeksforgeeks.org/introduction-of-lexical-analysis/
https://www.loc.gov/preservation/digital/formats/fdd/fdd000381.shtml
https://www.javatpoint.com/compiler-phases
Lecture note:
Weekly Synopsis
Weeks 3 & 4: Automata theory – finite state automaton, state diagrams, state tables
Automata theory is the study of abstract machines and automata, as well as the computational problems that
can be solved using them.
The major objective of automata theory is to develop methods by which computer scientists can describe
and analyze the dynamic behavior of discrete systems, in which signals are sampled periodically.
It is a theory in theoretical computer science. The word automaton comes from the Greek word αὐτόματος,
which means "self-acting, self-willed, self-moving".
Automata theory is a theoretical branch of computer science. It studies abstract mathematical machines
called automatons. When given a finite set of inputs, these automatons automatically imitate humans
performing tasks by going through a finite sequence of states.
Types of Automata:
Finite Automata
Finite means something that has a limited number of possibilities or possible outcomes.
A finite state automaton or finite state machine is the simplest model used in automata theory.
Finite state automata accept regular languages. Here, the term finite means that there is a limited number
of possible states and that the alphabet of the strings is finite. A finite state automaton is represented by 5
tuples or elements (Q, Σ, q0, F, δ):
The states of an NDFA move from one state to another state in response to some inputs by using the transition
function.
δ: Q × Σ -> 2^Q
Where 2^Q is the power set of Q. In the graph representation of the automaton, the transition function is represented
by arcs between states and the labels on the arcs.
q0: It is used for representing the initial state of the NDFA from where any input is processed.
1. Transition diagram
The transition diagram is also called a transition graph; it is represented by a digraph. A transition graph
consists of three things:
Arrow (->): The initial state in the transition diagram is marked with an arrow.
Note: A finite state automaton is represented by 5 tuples or elements (Q, Σ, q0, F, δ).
Transition Table
It is the tabular representation of the behavior of the transition function that takes two arguments, the first is a
state, and the other is input, and it returns a value, which is the new state of the automata. It represents all the
moves of finite-state function based on the current state and input.
In the transition table, the initial state is represented with an arrow, and the final state is represented by a single
circle.
Formally a transition table is a 2-dimensional array, which consists of rows and columns where:
The rows in the transition table represent the states.
The columns contain the state in which the machine will move on the input alphabet.
Self-assessments:
Quiz:
(iv) Formally, a transition table is a 2-dimensional array, which consists of rows and columns. What do
the rows represent? What do the columns contain?
(v) What is the other name for the transition diagram?
Quiz:
(i) Answer:
The 5 tuples or elements in the representation are (Q, Σ, q0, F, δ).
(ii) Answer:
(iii) Answer:
The two components of a formal transition table are rows and columns.
(iv) Answer:
The rows in the transition table represent the states.
The columns contain the state in which the machine will move on the input alphabet.
(v) Answer:
The transition diagram is also called a transition graph
(vi) Answer:
A transition graph consists of three things:
Arrow (->): The initial state in the transition diagram is marked with an arrow.
Deterministic Finite Automata (DFA)
DFA is a short form of Deterministic Finite Automata. In DFA, there is one and only one move from a given
state to the next state of any input symbol. In DFA, there is a finite set of states, a finite set of input symbols,
and a finite set of transitions from one state to another state that occur on input symbol chosen to form an
alphabet; there is exactly one transition out of each state.
Formal Notation of Deterministic Finite Automata (DFA):
A DFA contains 5 tuples or elements (Q, Σ, δ, q0, F):
Where,
Q: It is a collection of a finite set of states.
Σ: It is a finite set of input symbols called the alphabet of the automaton.
δ: It is used for representing the transition function.
The states of a DFA move from one state to another state in response to some inputs by using the transition
function. The transition function of a DFA is given as follows:
δ: Q × Σ -> Q
q0: It is used for representing the initial state of the DFA from where any input is processed.
F: It is a collection of a finite set of final states.
Example of DFA
Design a DFA with Σ = {0, 1} that accepts those strings ending with '01'.
Solution:
L = {01, 001, 101, 0101, ...} is the language generated.
Q = {q0, q1, q2}; it represents the total set of states.
Σ = {0, 1}
q0 is the initial state
q2 is the final state
Transition diagram
Transition table
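Since the transition diagram and table are given as figures, the following Python sketch encodes one standard transition table for this language and simulates the DFA; the state names follow the example above, but the table itself is an assumed reconstruction, not copied from the figure.

# DFA that accepts binary strings ending in '01'.
# States: q0 (start), q1 (last symbol was 0), q2 (last two symbols were 01, accepting).
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}
START, FINAL = "q0", {"q2"}

def accepts(string):
    state = START
    for symbol in string:
        state = DELTA[(state, symbol)]   # exactly one move for every (state, symbol) pair
    return state in FINAL

for w in ["01", "101", "0101", "110", "0"]:
    print(w, accepts(w))
# 01 True, 101 True, 0101 True, 110 False, 0 False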
NDFA is a short form of Non-Deterministic Finite Automata. In NDFA, there may be more than one move
or no move from a given state to the next state of any input symbol. NDFA is a simple machine that is used to
recognize the pattern by consuming the string of symbols and alphabets for each input symbol. NDFA differs
from DFA in the sense that NDFA can have any number of transitions to the next state from a given state on a
given input symbol.
δ: Q × Σ -> 2^Q
Where 2^Q is the power set of Q. In the graph representation of the automaton, the transition function is represented
by arcs between states and the labels on the arcs.
q0: It is used for representing the initial state of the NDFA from where any input is processed.
F: It is a collection of a finite set of final states.
Example of NDFA
Design an NDFA with Σ = {0, 1} that accepts those strings starting with '01'.
Solution:
L = {01, 010, 011, ...} is the language generated.
Q = {q0, q1, q2}
It represents the total set of states.
Σ = {0, 1}
q0 is the initial state
q2 is the final state
Transition diagram
Transition table
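An NFA can be simulated by tracking the set of states the machine could currently be in. The Python sketch below uses an assumed transition relation for the "starts with 01" example (q2 loops on both symbols once the prefix has been seen); missing entries mean there is no move for that state and symbol.

# NFA that accepts binary strings starting with '01'.
# delta maps (state, symbol) to a SET of next states.
DELTA = {
    ("q0", "0"): {"q1"},
    ("q1", "1"): {"q2"},
    ("q2", "0"): {"q2"},
    ("q2", "1"): {"q2"},
}
START, FINAL = "q0", {"q2"}

def accepts(string):
    current = {START}
    for symbol in string:
        # Take every possible move from every state we might be in.
        current = set().union(*(DELTA.get((s, symbol), set()) for s in current))
        if not current:          # the machine is stuck: no accepting run exists
            return False
    return bool(current & FINAL)

for w in ["01", "010", "011", "10", "0"]:
    print(w, accepts(w))
# 01 True, 010 True, 011 True, 10 False, 0 False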
QUIZ
4. Define the terms in an NDFA that contains 5 tuples or elements (Q, Σ, δ, q0, F):
Answer:
The definition of the terms in an NDFA that contains 5 tuples or elements (Q, Σ, δ, q0, F):
q0: It is used for representing the initial state of the NDFA from where any input is processed.
F: It is a collection of a finite set of final states.
Note:
δ: Q × Σ -> 2^Q
Where 2^Q is the power set of Q. In the graph representation of the automaton, the transition function is represented
by arcs between states and the labels on the arcs.
Unreachable state: Unreachable state is that state in which finite automata never reaches during the transition
from one state to another state.
In the above DFA, we have unreachable state E, because on any input from the initial state, we are unable to
reach to that state. This state is useless in finite automata. So, the best solution is to eliminate these types of
states to minimize the finite automata.
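One simple way to find such states is to walk the transition table from the initial state and keep every state that is ever visited; anything never reached is unreachable and can be dropped. The sketch below does this for a small hypothetical DFA with an unreachable state E (the transition table is invented purely for illustration).

# Sketch: compute the set of reachable states of a DFA from its transition table.
def reachable_states(delta, start):
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for (src, _symbol), dst in delta.items():
            if src == state and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

# Hypothetical DFA in which state 'E' has no incoming path from the start state 'A'.
delta = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "A", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "C",
         ("E", "0"): "A", ("E", "1"): "C"}
print(reachable_states(delta, "A"))   # {'A', 'B', 'C'}  -> E is unreachable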
Dead State: It is a non-accepting state, which goes to itself on every possible input symbol.
In the above DFA, q5 and q6 are dead states because every possible input symbol leads back to the same state.
This step is repeated for every group: find the groups that the inputs lead to; if there are differences,
partition the group into sets containing states which go to the same groups under the inputs.
The resulting final partition contains the equivalent states; now merge them into a single state.
Example:
Minimize the following DFA.
Solution
T = {q0}
Now, for all states in the temporary state set T, find the transition from each state on each input symbol in Σ.
If the resulting state is not in T, add that state to T.
δ(q0, a) = q1
δ(q0, b) = q2
Again
δ(q1, a) = q3
δ(q1, b) = q4
Again
δ(q2, a) = q3
δ(q2, b) = q5
Again
δ(q3, a) = q3
δ(q3, b) = q1
Again
δ(q4, a) = q4
δ(q4, b) = q5
Again
δ(q5, a) = q5
δ(q5, b) = q4
T = {q0, q1, q2, q3, q4, q5}
U = Q – T
U = {q0, q1, q2, q3, q4, q5, q6} – {q0, q1, q2, q3, q4, q5} = {q6}
So q6 is unreachable from the initial state.
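Step 1 of minimization (finding the reachable states) can be phrased as a small worklist loop. The Python sketch below assumes the transition table listed above plus a state q6 with no incoming transitions; it simply automates the hand computation of T and U.

# Step 1 as code: collect every state reachable from q0 (Python sketch).
# q6 is assumed to exist in Q but to have no incoming transitions.
delta = {
    ('q0', 'a'): 'q1', ('q0', 'b'): 'q2',
    ('q1', 'a'): 'q3', ('q1', 'b'): 'q4',
    ('q2', 'a'): 'q3', ('q2', 'b'): 'q5',
    ('q3', 'a'): 'q3', ('q3', 'b'): 'q1',
    ('q4', 'a'): 'q4', ('q4', 'b'): 'q5',
    ('q5', 'a'): 'q5', ('q5', 'b'): 'q4',
    ('q6', 'a'): 'q6', ('q6', 'b'): 'q6',
}
Q = {'q0', 'q1', 'q2', 'q3', 'q4', 'q5', 'q6'}

T = {'q0'}                       # states known to be reachable
worklist = ['q0']
while worklist:
    s = worklist.pop()
    for a in ('a', 'b'):
        t = delta[(s, a)]
        if t not in T:           # new reachable state found
            T.add(t)
            worklist.append(t)

U = Q - T                        # unreachable states
print(U)                         # {'q6'}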
Step 2: In this step, we eliminate the unreachable state found in the first step.
Step 3: Identify the equivalent states and merge them.
Group A1 - q3
Group A2 – q4, q5
Group B – q0, q1, q2
Group B1 – q0
Group B2 – q1, q2
δ(q5, a) = q5
δ(q5, b) = q4
As both belong to the same group, further division is not possible.
δ(q1, a) = q3
δ(q2, a) = q3
δ(q1, b) = q4
δ(q2, b) = q5
q4 and q5 belong to group A2 for input b, so no further partitioning is possible.
Step 4: In this step, we detect dead states. There are no dead states in the above DFA; hence it is already minimized.
Assignment:
Introduction
The most difficult task in designing sequential circuits occurs at the very start of the design; in determining
what characteristics of a given problem require sequential operations, and more particularly, what behaviors
must be represented by a unique state. A poor choice of states coupled with a poor understanding of the
problem can make a design lengthy, difficult, and error-prone. With better understanding and a better choice of
states, the same problem might well be trivial. Whereas it is relatively straight-forward to describe sequential
circuit structure and define applicable engineering design methods, it is relatively challenging to find analytical
methods capable of matching design problem requirements to eventual machine states.
Note:
A sequential circuit refers to a special type of circuit. It consists of a series of various inputs and outputs.
Here, the outputs depend on a combination of both the present inputs as well as the previous outputs. This
previous output gets treated in the form of the present state.
What is sequential circuit with example?
In other words, their output depends on a SEQUENCE of the events occurring at the circuit inputs. Examples
of such circuits include clocks, flip-flops, bi-stables, counters, memories, and registers. The actions of the
sequential circuits depend on the range of basic sub-circuits.
Sequential circuits are the other important digital type, used in counting and for memory actions. The
simplest type is the S-R flip-flop (or latch) whose output(s) can be set by one pair of inputs and reset by
reversing each input.
Sequential circuits are essentially combinational circuits with feedback. A block diagram of a generalized
sequential circuit is shown in Figure 1.
1: Sequential circuits have ‘memory’ because their outputs depend, in part, upon past outputs.
2: Combinational logic plus ‘memory’.
3: For n outputs from ‘memory’ and m external inputs, there are 2^n internal states and 2^(m+n) possible total states.
4: Memory elements in synchronous circuits are flip-flops, which are clocked; asynchronous circuits are unclocked.
5: The internal inputs and outputs must match (as they are connected).
6: Only one input can change at a time (fundamental mode operation).
7: ‘Cutting’ the connection between internal inputs and outputs.
8: (a) Horizontal; (b) vertical.
9: Oscillation.
10: Non-critical races do not affect the final output; critical races do.
A state table is nothing more than a truth table that specifies the requirements for the next-state logic,
with inputs coming from the state register and from outside the circuit. The state table lists all required
states, and all possible next states that might follow a given present state.
State Tables and State Diagrams. The relationship that exists among the inputs, outputs, present states
and next states can be specified by either the state table or the state diagram. The state table
representation of a sequential circuit consists of three sections labelled present state, next state and output.
A state diagram is used to represent the condition of the system or part of the system at finite instances of
time. It's a behavioral diagram and it represents the behavior using finite state transitions. State diagrams are
also referred to as State machines and State-chart Diagrams.
We use it to state the events responsible for change in state (we do not show what processes cause
those events).
We use it to model the dynamic behavior of the system.
To understand the reaction of objects/classes to internal or external stimuli.
Firstly, let us understand what Behavior diagrams are. There are two types of diagrams in UML:
1. Structure Diagrams – Used to model the static structure of a system, for example- class diagram,
package diagram, object diagram, deployment diagram etc.
2. Behavior diagram – Used to model the dynamic change in the system over time. They are used to
model and construct the functionality of a system. So, a behavior diagram simply guides us through
the functionality of the system using Use case diagrams, Interaction diagrams, Activity diagrams
and State diagrams.
Difference between state diagram and flowchart –
The basic purpose of a state diagram is to portray various changes in state of the class and not the
processes or commands causing the changes. A flowchart, on the other hand, portrays the processes or commands that, on execution, change the state of a class or an object of the class.
The state diagram above shows the different states in which the verification sub-system or class exist for a
particular system.
1. Initial state – We use a black filled circle to represent the initial state of a System or a class.
2. Transition – We use a solid arrow to represent the transition or change of control from one
state to another. The arrow is labelled with the event which causes the change in state.
Figure – transition
3. State – We use a rounded rectangle to represent a state. A state represents the conditions or
circumstances of an object of a class at an instant of time.
4. Fork – We use a rounded solid rectangular bar to represent a Fork notation with incoming
arrow from the parent state and outgoing arrows towards the newly created states. We use the
fork notation to represent a state splitting into two or more concurrent states.
Page 57 of 139
Figure – a diagram using the fork notation
5. Join – We use a rounded solid rectangular bar to represent a Join notation with incoming
arrows from the joining states and outgoing arrow towards the common goal state. We use the
join notation when two or more states concurrently converge into one on the occurrence of an
event or events.
6. Self transition – We use a solid arrow pointing back to the state itself to represent a self
transition. There might be scenarios when the state of the object does not change upon the
occurrence of an event. We use self transitions to represent such cases.
7. Composite state – We use a rounded rectangle to represent a composite state as well. We represent a state with internal activities using a composite state.
8. Final state – We use a filled circle within a circle notation to represent the final state in a state
machine diagram.
Page 58 of 139
Figure – final state notation
The UML diagrams we draw depend on the system we aim to represent. Here is an example of how an online ordering system might look:
1. On the event of an order being received, we transit from our initial state to Unprocessed order
state.
2. The unprocessed order is then checked.
3. If the order is rejected, we transit to the Rejected Order state.
4. If the order is accepted and we have the items available, we transit to the fulfilled order state.
5. However, if the items are not available we transit to the Pending Order state.
6. After the order is fulfilled, we transit to the final state. In this example, we merge the two states
i.e. Fulfilled order and Rejected order into one final state.
Note – Here we could have also treated fulfilled order and rejected order as final states separately.
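The ordering flow maps naturally onto a table-driven state machine. The Python sketch below is a hypothetical rendering of the transitions described above; the state and event names are invented for illustration.

# A hypothetical state machine for the online ordering example (Python sketch).
TRANSITIONS = {
    ('Initial', 'order received'):      'Unprocessed',
    ('Unprocessed', 'order rejected'):  'Rejected',
    ('Unprocessed', 'items available'): 'Fulfilled',
    ('Unprocessed', 'items missing'):   'Pending',
    ('Pending', 'items available'):     'Fulfilled',
    ('Fulfilled', 'close'):             'Final',
    ('Rejected', 'close'):              'Final',
}

def run(events):
    state = 'Initial'
    for e in events:
        state = TRANSITIONS[(state, e)]   # the event labels the transition arrow
    return state

print(run(['order received', 'items missing', 'items available', 'close']))  # Final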
QUIZ:
Assignment:
Question 1.
1. What is the difference between state diagram and flowchart?
Answer 1:
The basic purpose of a state diagram is to portray various changes in state of the class and not the
processes or commands causing the changes. However, a flowchart on the other hand portrays the processes
or commands that on execution change the state of class or an object of the class.
Question 2.
2. Illustrate a state diagram for user verification
Answer 2:
Question 3.
3. Illustrate a state diagram to shows the different states in which the verification sub-system or class exist
for a particular system.
Answer 3:
Also, the state diagram below shows the different states in which the verification sub-system or class exist
for a particular system.
Question 4.
4. Explain each basic component of a state-chart diagram given as:
(i) Initial state, (ii) Transition, (iii) State, (iv) Fork, (v) Join, (vi) Self transition, (vii) Composite state,
(viii) Final state.
Question 6.
6. Explain each basic component of a state-chart diagram given as:
(i) Initial state, (ii) Transition, (iii) State, (iv) Fork, (v) Join, (vi) Self transition, (vii) Composite state,
(viii) final state
Question 7.
(i) What is a sequential circuit? Give an example.
1. https://www.geeksforgeeks.org/unified-modeling-language-uml-state-diagrams/
2. https://www.sciencedirect.com/topics/engineering/sequential-circuits
https://www.youtube.com/watch?v=WQbSFe_aab8
https://www.youtube.com/watch?v=2TGfiaCrL2s
To optimize a DFA, one has to follow these steps:
Step 1: Remove all the states that are unreachable from the initial state via any sequence of transitions of the DFA.
Step 2: Draw the transition table for the remaining states.
Step 3: Now split the transition table into two tables T1 and T2.
What is the purpose of optimization in compiler?
Optimization is a program transformation technique which tries to improve the code by making it consume fewer resources (i.e., CPU, memory) and deliver higher speed.
There are three main elements in an optimization problem: an objective, variables, and constraints. Each variable can take different values, and the aim is to find the optimal value for each one. The objective is the desired result or goal of the problem.
In this section we present three algorithms that have been used to implement and optimize
pattern matchers constructed from regular expressions.
The first algorithm is useful in a Lex compiler, because it constructs a DFA directly from a
regular expression, without constructing an intermediate NFA. The resulting DFA also may have
fewer states than the DFA constructed via an NFA.
The second algorithm minimizes the number of states of any DFA, by combining states that have
the same future behavior. The algorithm itself is quite efficient, running in time O(n log n), where
n is the number of states of the DFA.
The third algorithm produces more compact representations of transition tables than the standard,
two-dimensional table.
1. Important States of an NFA
To begin our discussion of how to go directly from a regular expression to a DFA, we must first
dissect the NFA construction of Algorithm 3.23 and consider the roles played by various states.
We call a state of an NFA important if it has a non-ε out-transition. Notice that the subset construction (Algorithm 3.20) uses only the important states in a set T when it computes ε-closure(move(T, a)), the set of states reachable from T on input a. That is, the set of states move(s, a) is nonempty only if state s is important. During the subset construction, two sets of NFA states can be identified (treated as if they were the same set) if they:
Have the same important states, and either both have accepting states or neither does.
When the NFA is constructed from a regular expression by Algorithm 3.23, we can say more
about the important states. The only important states are those introduced as initial states in the
basis part for a particular symbol position in the regular expression. That is, each important state
corresponds to a particular operand in the regular expression.
The constructed NFA has only one accepting state, but this state, having no out-transitions, is not
an important state. By concatenating a unique right endmarker # to a regular expression r, we
give the accepting state for r a transition on #, making it an important state of the NFA for ( r )
# . In other words, by using the augmented regular expression ( r ) # , we can forget about
accepting states as the subset construction proceeds; when the construction is complete, any state
with a transition on # must be an accepting state.
The important states of the NFA correspond directly to the positions in the regular expression
that hold symbols of the alphabet. It is useful, as we shall see, to represent the regular expression
by its syntax tree, where the leaves correspond to operands and the interior nodes correspond to
operators. An interior node is called a cat-node, or-node, or star-node if it is labeled by the
concatenation operator (dot), union operator |, or star operator *, respectively. We can construct a
syntax tree for a regular expression just as we did for arithmetic expressions in Section 2.5.1.
Example 3.31 : Figure 3.56 shows the syntax tree for the regular expression of our running
example. Cat-nodes are represented by circles. •
Leaves in a syntax tree are labeled by ε or by an alphabet symbol. To each leaf not labeled ε, we
attach a unique integer. We refer to this integer as the position of the leaf and also as a position
of its symbol. Note that a symbol can have several positions; for instance, a has positions 1 and 3
in Fig. 3.56. The positions in the syntax tree correspond to the important states of the constructed
NFA.
Example 3.32 : Figure 3.57 shows the NFA for the same regular expression as Fig. 3.56, with the
important states numbered and other states represented by letters. The numbered states in the
NFA and the positions in the syntax tree correspond in a way we shall soon see. •
1. nullable(n) is true for a syntax-tree node n if and only if the subexpression represented by n has ε in its language. That is, the subexpression can be "made null" or the empty string, even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first symbol of at least one string in the language of the subexpression rooted at n.
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last symbol of at least one string in the language of the subexpression rooted at n.
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a1a2···an in L((r)#) such that, for some i, there is a way to explain the membership of x in L((r)#) by matching ai to position p of the syntax tree and ai+1 to position q.
Example 3.33: Consider the cat-node n in Fig. 3.56 that corresponds to the expression (a|b)*a. We claim nullable(n) is false, since this node generates all strings of a's and b's ending in an a; it does not generate ε. On the other hand, the star-node below it is nullable; it generates ε along with all other strings of a's and b's.
firstpos(n) = {1,2,3}. In a typical generated string like aa, the first position of the string corresponds to position 1 of the tree, and in a string like ba, the first position of the string comes from position 2 of the tree. However, when the string generated by the expression of node n is just a, then this a comes from position 3.
lastpos(n) = {3}. That is, no matter what string is generated from the expression of node n, the last position is the a from position 3 of the tree.
followpos is trickier to compute, but we shall see the rules for doing so shortly. Here is an example of the reasoning: followpos(1) = {1,2,3}. Consider a string ···ac···, where c is either a or b, and the a comes from position 1. That is, this a is one of those generated by the a in expression (a|b)*. This a could be followed by another a or b coming from the same subexpression, in which case c comes from position 1 or 2. It is also possible that this a is the last in the string generated by (a|b)*, in which case the symbol c must be the a that comes from position 3. Thus, 1, 2, and 3 are exactly the positions that can follow position 1.
We can compute nullable, firstpos, and lastpos by a straightforward recursion on the height of the tree. The basis and inductive rules for nullable and firstpos are summarized in Fig. 3.58. The rules for lastpos are essentially the same as for firstpos, but the roles of children c1 and c2 must be swapped in the rule for a cat-node.
Example 3.34: Of all the nodes in Fig. 3.56 only the star-node is nullable. We note from the table of Fig. 3.58 that none of the leaves are nullable, because they each correspond to non-ε operands. The or-node is not nullable, because neither of its children is. The star-node is nullable, because every star-node is nullable. Finally, each of the cat-nodes, having at least one non-nullable child, is not nullable.
The computation of firstpos and lastpos for each of the nodes is shown in Fig. 3.59, with firstpos(n) to the left of node n, and lastpos(n) to its right. Each of the leaves has only itself for firstpos and lastpos, as required by the rule for non-ε leaves in Fig. 3.58. For the or-node, we take the union of firstpos at the children and do the same for lastpos. The rule for the star-node says that we take the value of firstpos or lastpos at the one child of that node.
Now, consider the lowest cat-node, which we shall call n. To compute firstpos(n), we first consider whether the left operand is nullable, which it is in this case. Therefore, firstpos for n is the union of firstpos for each of its children, that is {1,2} ∪ {3} = {1,2,3}. The rule for lastpos does not appear explicitly in Fig. 3.58, but as we mentioned, the rules are the same as for firstpos, with the children interchanged. That is, to compute lastpos(n) we must ask whether its right child (the leaf with position 3) is nullable, which it is not. Therefore, lastpos(n) is the same as lastpos of the right child, or {3}.
4. Computing followpos
Finally, we need to see how to compute followpos. There are only two ways that a position of a
regular expression can be made to follow another.
1. If n is a cat-node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are
in followpos(i).
Example 3.35:
Let us continue with our running example; recall that firstpos and lastpos were computed in Fig. 3.59. Rule 1 for followpos requires that we look at each cat-node, and put each position in firstpos of its right child in followpos for each position in lastpos of its left child. For the lowest cat-node in Fig. 3.59, that rule says position 3 is in followpos(1) and followpos(2). The next cat-node above says that 4 is in followpos(3), and the remaining two cat-nodes give us 5 in followpos(4) and 6 in followpos(5).
Figure 3.59: firstpos and lastpos for nodes in the syntax tree for (a|b)*abb#
We must also apply Rule 2 to the star-node. That rule tells us positions 1 and 2 are in both followpos(1) and followpos(2), since both firstpos and lastpos for this node are {1,2}. The complete sets followpos are summarized in Fig. 3.60.
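As a concrete illustration, the four functions can be computed by a single recursive pass over the syntax tree. The Python sketch below assumes a tuple representation of the tree, ('leaf', position), ('star', child), ('or', left, right) and ('cat', left, right); it is an illustration written for these notes, not the book's pseudocode.

# Recursive computation of nullable/firstpos/lastpos and followpos (Python sketch).
# Assumed node shapes: ('leaf', pos), ('star', c), ('or', l, r), ('cat', l, r).
from collections import defaultdict

followpos = defaultdict(set)

def analyse(node):
    """Return (nullable, firstpos, lastpos); fill in followpos as a side effect."""
    kind = node[0]
    if kind == 'leaf':
        return False, {node[1]}, {node[1]}
    if kind == 'star':
        _, first, last = analyse(node[1])
        for i in last:                       # rule 2: the star wraps around
            followpos[i] |= first
        return True, first, last
    n1, f1, l1 = analyse(node[1])
    n2, f2, l2 = analyse(node[2])
    if kind == 'or':
        return n1 or n2, f1 | f2, l1 | l2
    # kind == 'cat'
    for i in l1:                             # rule 1: right child follows left child
        followpos[i] |= f2
    first = f1 | f2 if n1 else f1
    last = l1 | l2 if n2 else l2
    return n1 and n2, first, last

# Syntax tree for (a|b)*abb# with positions 1..6 (a=1, b=2, a=3, b=4, b=5, #=6).
tree = ('cat', ('cat', ('cat', ('cat',
        ('star', ('or', ('leaf', 1), ('leaf', 2))),
        ('leaf', 3)), ('leaf', 4)), ('leaf', 5)), ('leaf', 6))
analyse(tree)
print(dict(followpos))   # e.g. followpos[1] == {1, 2, 3}, matching Fig. 3.60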
We can represent the function followpos by creating a directed graph with a node for each
position and an arc from position i to position j if and only if j is in followpos(i). Figure 3.61
shows this graph for the function of Fig. 3.60.
It should come as no surprise that the graph for followpos is almost an NFA without ε-transitions
for the underlying regular expression, and would become one if we:
1. Make all positions in firstpos of the root be initial states,
2. Label each arc from i to j by the symbol at position i, and
3. Make the position associated with endmarker # be the only accepting state.
5. Converting a Regular Expression Directly to a DFA
Algorithm 3.36 : Construction of a DFA from a regular expression r.
INPUT : A regular expression r.
OUTPUT : A DFA D that recognizes L(r).
METHOD :
Example 3.37 : We can now put together the steps of our running example to construct a DFA
for the regular expression r = (a|b)*abb. The syntax tree for ( r ) # appeared in Fig. 3.56. We
observed that for this tree, nullable is true only for the star-node, and we exhibited firstpos and
lastpos in Fig. 3.59. The values of followpos appear in Fig. 3.60.
The value of firstpos for the root of the tree is {1,2,3}, so this set is the start state of D. Call this
set of states A. We must compute Dtran[A, a] and Dtran[A, b]. Among the positions of A, 1 and
3 correspond to a, while 2 corresponds to b. Thus, Dtran[A, a] = followpos(1) ∪ followpos(3) = {1,2,3,4},
initialize Dstates to contain only the unmarked state firstpos(n0), where n0 is the root of syntax tree T for (r)#;
while ( there is an unmarked state S in Dstates ) {
    mark S;
    for ( each input symbol a ) {
        let U be the union of followpos(p) for all p in S that correspond to a;
        if ( U is not in Dstates )
            add U as an unmarked state to Dstates;
        Dtran[S, a] = U;
    }
}
Figure 3.62: Construction of a DFA directly from a regular expression
and Dtran[A, b] = followpos(2) = {1,2,3}. The latter is state A, and so does not have to be
added to Dstates, but the former, B = {1,2,3,4}, is new, so we add it to Dstates and proceed to
compute its transitions. The complete DFA is shown in Fig. 3.63.
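The loop of Fig. 3.62 is also easy to prototype. The Python sketch below starts from the followpos table of Fig. 3.60 and a position-to-symbol map for (a|b)*abb#, and reproduces the four states of Fig. 3.63; the variable names are illustrative.

# Regular expression to DFA, directly from followpos (Python sketch of Fig. 3.62).
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}  # Fig. 3.60
symbol_at = {1: 'a', 2: 'b', 3: 'a', 4: 'b', 5: 'b', 6: '#'}                # positions in (a|b)*abb#
alphabet = ['a', 'b']

start = frozenset({1, 2, 3})            # firstpos of the root
Dstates, unmarked, Dtran = {start}, [start], {}
while unmarked:
    S = unmarked.pop()                  # mark S
    for a in alphabet:
        U = frozenset().union(*(followpos[p] for p in S if symbol_at[p] == a))
        if U and U not in Dstates:
            Dstates.add(U)
            unmarked.append(U)
        Dtran[(S, a)] = U
accepting = {S for S in Dstates if 6 in S}   # states containing the position of #
print(len(Dstates), len(accepting))          # 4 states, 1 accepting, as in Fig. 3.63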
6. Minimizing the Number of States of a DFA
There can be many DFAs that recognize the same language. For instance, the DFAs of Figs. 3.36 and 3.63 both recognize L((a|b)*abb). Not only do these automata
have states with different names, but they don't even have the same number of states. If we
implement a lexical analyzer as a DFA, we would generally prefer a DFA with as few states as
possible, since each state requires entries in the table that describes the lexical analyzer.
The matter of the names of states is minor. We shall say that two automata are the same up to
state names if one can be transformed into the other by doing nothing more than changing the
names of states. Figures 3.36 and 3.63 are not the same up to state names. However, there is a
close relationship between the states of each. States A and C of Fig. 3.36 are actually equivalent,
in the sense that neither is an accepting state, and on any input they transfer to the same state —
to B on input a and to C on input b. Moreover, both states A and C behave like state 123 of Fig.
3.63. Likewise, state B of Fig. 3.36 behaves like state 1234 of Fig. 3.63, state D behaves like
state 1235, and state E behaves like state 1236.
It turns out that there is always a unique (up to state names) minimum state DFA for any regular
language. Moreover, this minimum-state DFA can be constructed from any DFA for the same
language by grouping sets of equivalent states. In the case of L((a|b)*abb), Fig. 3.63 is
the minimum-state DFA, and it can be constructed by partitioning the states of Fig. 3.36 as {A,
C}{B}{D}{E}.
In order to understand the algorithm for creating the partition of states that converts any DFA
into its minimum-state equivalent DFA, we need to see how input strings distinguish states from
one another. We say that string x distinguishes state s from state t if exactly one of the states
reached from s and t by following the path with label x is an accepting state. State s is
distinguishable from state t if there is some string that distinguishes them.
Example 3.38 : The empty string distinguishes any accepting state from any nonaccepting state.
In Fig. 3.36, the string bb distinguishes state A from state B, since bb takes A to nonaccepting state C, but takes B to accepting state E.
The state-minimization algorithm works by partitioning the states of a DFA into groups of states
that cannot be distinguished. Each group of states is then merged into a single state of the
minimum-state DFA. The algorithm works by maintaining a partition, whose groups are sets of
states that have not yet been distinguished, while any two states from different groups are known
to be distinguishable. When the partition cannot be refined further by breaking any group into
smaller groups, we have the minimum-state DFA.
Initially, the partition consists of two groups: the accepting states and the nonaccepting states.
The fundamental step is to take some group of the current partition, say A = {s1, s2, ..., sk}, and some input symbol a, and see whether a can be used to distinguish between any states in group A. We examine the transitions from each of s1, s2, ..., sk on input a, and if the states reached fall into two or more groups of the current partition, we split A into a collection of groups, so that si and sj are in the same group if and only if they go to the same group on input a. We
repeat this process of splitting groups, until for no group, and for no input symbol, can the group
be split further. The idea is formalized in the next algorithm.
Algorithm 3.39: Minimizing the number of states of a DFA.
INPUT: A DFA D with set of states S, input alphabet Σ, start state s0, and set of accepting states F.
OUTPUT : A DFA D' accepting the same language as D and having as few states as possible.
Why the State-Minimization Algorithm Works
We need to prove two things: that states remaining in the same group in Πfinal are indistinguishable by any string, and that states winding up in different groups are distinguishable. The first is an induction on i: if after the ith iteration of step (2) of Algorithm 3.39, s and t are in the same group, then there is no string of length i or less that distinguishes them. We shall leave the details of the induction to you.
The second is an induction on i: if states s and t are placed in different groups at the ith iteration of step (2), then there is a string that distinguishes them. The basis, when s and t are placed in different groups of the initial partition, is easy: one must be accepting and the other not, so ε distinguishes them. For the induction, there must be an input a and states p and q such that s and t go to states p and q, respectively, on input a. Moreover, p and q must already have been placed in different groups. Then by the inductive hypothesis, there is some string x that distinguishes p from q. Therefore, ax distinguishes s from t.
Method:
1. Start with an initial partition Π with two groups, F and S − F, the accepting and nonaccepting states of D.
2. Apply the procedure of Fig. 3.64 to construct a new partition Πnew:
initially, let Πnew = Π;
for ( each group G of Π ) {
    partition G into subgroups such that two states s and t are in the same subgroup if and only if, for all input symbols a, states s and t have transitions on a to states in the same group of Π;
    /* at worst, a state will be in a subgroup by itself */
    replace G in Πnew by the set of all subgroups formed;
}
Figure 3.64: Construction of Πnew
3. If Πnew = Π, let Πfinal = Π and continue with step (4). Otherwise, repeat step (2) with Πnew in place of Π.
4. Choose one state in each group of Πfinal as the representative for that group. The representatives will be the states of the minimum-state DFA D'. The other components of D' are constructed as follows:
Eliminating the Dead State
The minimization algorithm sometimes produces a DFA with one dead state — one that is not
accepting and transfers to itself on each input symbol. This state is technically needed, because a
DFA must have a transition from every state on every symbol. However, as discussed in
Section 3.8.3, we often want to know when there is no longer any possibility of acceptance, so
we can establish that the proper lexeme has already been seen. Thus, we may wish to eliminate
the dead state and use an automaton that is missing some transitions. This automaton has one
fewer state than the minimum-state DFA, but is strictly speaking not a DFA, because of the
missing transitions to the dead state.
(a) The start state of D' is the representative of the group containing the start state of D.
(b) The accepting states of D' are the representatives of those groups that contain an accepting state of D. Note that each group contains either only accepting states, or only nonaccepting states, because we started by separating those two classes of states, and the procedure of Fig. 3.64 always forms new groups that are subgroups of previously constructed groups.
(c) Let s be the representative of some group G of Πfinal, and let the transition of D from s on input a be to state t. Let r be the representative of t's group H. Then in D', there is a transition from s to r on input a. Note that in D, every state in group G must go to some state of group H on input a, or else group G would have been split according to Fig. 3.64.
Example 3.40: Let us reconsider the DFA of Fig. 3.36. The initial partition consists of the two groups {A, B, C, D}{E}, which are respectively the nonaccepting states and the accepting states.
To construct Πnew, the procedure of Fig. 3.64 considers both groups and inputs a and b. The group {E} cannot be split, because it has only one state, so {E} will remain intact in Πnew.
The other group {A, B, C, D} can be split, so we must consider the effect of each input symbol. On input a, each of these states goes to state B, so there is no way to distinguish these states using strings that begin with a. On input b, states A, B, and C go to members of group {A, B, C, D}, while state D goes to E, a member of another group. Thus, in Πnew, group {A, B, C, D} is split into {A, B, C}{D}, and Πnew for this round is {A, B, C}{D}{E}.
In the next round, we can split {A, B, C} into {A, C}{B}, since A and C each go to a member of {A, B, C} on input b, while B goes to a member of another group, {D}. Thus, after the second round, Πnew = {A, C}{B}{D}{E}. For the third round, we cannot split the one remaining group with more than one state, since A and C each go to the same state (and therefore to the same group) on each input. We conclude that Πfinal = {A, C}{B}{D}{E}.
Now, we shall construct the minimum-state DFA. It has four states, corresponding to the four groups of Πfinal, and let us pick A, B, D, and E as the representatives of these groups. The initial state is A, and the only accepting state is E. Figure 3.65 shows the transition function for the DFA. For instance, the transition from state E on input b is to A, since in the original DFA, E goes to C on input b, and A is the representative of C's group. For the same reason, the transition on b from state A is to A itself, while all other transitions are as in Fig. 3.36. •
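The refinement rounds of Example 3.40 can be reproduced mechanically. The Python sketch below is a plain, unoptimized rendering of the procedure of Fig. 3.64, applied to the DFA of Fig. 3.36 whose transitions are written out from the example's description.

# Partition refinement for the DFA of Fig. 3.36 (Python sketch of Algorithm 3.39, steps 1-3).
delta = {('A', 'a'): 'B', ('A', 'b'): 'C',
         ('B', 'a'): 'B', ('B', 'b'): 'D',
         ('C', 'a'): 'B', ('C', 'b'): 'C',
         ('D', 'a'): 'B', ('D', 'b'): 'E',
         ('E', 'a'): 'B', ('E', 'b'): 'C'}
states, alphabet, accepting = 'ABCDE', 'ab', {'E'}

partition = [set(states) - accepting, set(accepting)]    # initial partition: nonaccepting / accepting

def group_of(s, part):
    return next(i for i, g in enumerate(part) if s in g)

while True:
    new_partition = []
    for G in partition:
        # states stay together iff they agree on the target group for every input symbol
        buckets = {}
        for s in G:
            key = tuple(group_of(delta[(s, a)], partition) for a in alphabet)
            buckets.setdefault(key, set()).add(s)
        new_partition.extend(buckets.values())
    if len(new_partition) == len(partition):              # no group was split: done
        break
    partition = new_partition

print(partition)   # [{'A', 'C'}, {'B'}, {'D'}, {'E'}] in some order, as in the example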
Trading Time for Space in DFA Simulation
The simplest and fastest way to represent the transition function of a DFA is a two-dimensional table indexed by states and characters. Given the current state and next input character, we access the array
to find the next state and any special action we must take, e.g., returning a token to the parser.
Since a typical lexical analyzer has several hundred states in its DFA and involves the ASCII
alphabet of 128 input characters, the array consumes less than a megabyte.
However, compilers are also appearing in very small devices, where even a megabyte of storage
may be too much. For such situations, there are many methods that can be used to compact the
transition table. For instance, we can represent each state by a list of transitions — that is,
character-state pairs — ended by a default state that is to be chosen for any input character not on
the list. If we choose as the default the most frequently occurring next state, we can often reduce
the amount of storage needed by a large factor.
There is a more subtle data structure that allows us to combine the speed of array access with the compression of lists with defaults. We may think of this structure as four arrays, as suggested in Fig. 3.66. (In practice, there would be another array indexed by states to give the action associated with that state, if any.) The base array is used to determine the base location of the entries for state s, which are located in the next and check arrays. The default array is used to determine an alternative base location if the check array tells us the one given by base[s] is invalid.
To compute nextState(s, a), the transition for state s on input a, we examine the next and check entries in location l = base[s] + a, where character a is treated as an integer, presumably in the range 0 to 127. If check[l] = s, then this entry is valid, and the next state for state s on input a is next[l]. If check[l] ≠ s, then we determine another state t = default[s] and repeat the process as if t were the current state. More formally, the function nextState is defined as follows:
int nextState(s, a) {
    if ( check[base[s] + a] == s ) return next[base[s] + a];
    else return nextState(default[s], a);
}
The intended use of the structure of Fig. 3.66 is to make the next-check arrays short by taking advantage of the similarities among states. For instance, state t, the default for state s, might be the state that says "we are working on an identifier," like state 10 in Fig. 3.14. Perhaps state s is entered after seeing the letters th, which are a prefix of keyword then as well as potentially being the prefix of some lexeme for an identifier. On input character e, we must go from state s to a special state that remembers we have seen the, but otherwise, state s behaves as t does. Thus, we set check[base[s] + e] to s (to confirm that this entry is valid for s) and we set next[base[s] + e] to the state that remembers the. Also, default[s] is set to t.
While we may not be able to choose base values so that no next-check entries remain unused,
experience has shown that the simple strategy of assigning base values to states in turn, and
assigning each base[s] value the lowest integer so that the special entries for state s are not
previously occupied utilizes little more space than the minimum possible.
Practice questions
What are the 6 phases of compiler?
The 6 phases of a compiler are:
Lexical Analysis.
Syntactic Analysis or Parsing.
Semantic Analysis.
Intermediate Code Generation.
Code Optimization.
Code Generation.
What are the three types of compiler design?
Types of Compiler
Single Pass Compilers.
Two Pass Compilers.
Multipass Compilers.
Reference
https://www.brainkart.com/article/Optimization-of-DFA-Based-Pattern-Matchers_8143/
Week 7: Construction of syntax trees, Bottom-up evaluation of S-attributed
definitions, L-attributed definitions, Top-down translation, Bottom-up
evaluation of inherited attributes, Recursive evaluators, Space for attribute
values at compile time, Assigning space at compile time.
Note
In this section, we will discuss inherited attributes in compiler design. Along with that, we also
learn some of the basic terms which we will use while explaining the inherited attribute in
compiler design.
Construction of syntax trees
Compiler Design – Variants of Syntax Tree
A syntax tree is a tree in which each leaf node represents an operand, while each interior node represents an operator. The syntax tree is a condensed (abbreviated) form of the parse tree, and it is usually used when representing a program in a tree structure.
Rules of Constructing a Syntax Tree
A syntax tree's nodes can all be implemented as records with several fields. In an operator node, one field identifies the operator, while the remaining fields contain pointers to the operand nodes. The operator is also known as the node's label. The nodes of the syntax tree for expressions with binary operators are created using the following functions. Each function returns a reference to the node that was most recently created.
1. mknode (op, left, right): It creates an operator node with the name op and two fields,
containing left and right pointers.
2. mkleaf (id, entry): It creates an identifier node with the label id and the entry field, which is a
reference to the identifier’s symbol table entry.
3. mkleaf (num, val): It creates a number node with the label num and a field containing the number's value, val.
For example, a syntax tree for the expression a – 4 + c is built by the sequence of calls p1, p2, …, p5 shown in the sketch below, where the leaves for identifiers 'a' and 'c' hold pointers to their symbol-table entries.
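A minimal sketch of these constructor functions, using Python dictionaries to stand in for node records, is shown below; 'entry_a' and 'entry_c' are placeholders for symbol-table entries, and the call sequence corresponds to the a – 4 + c example.

# Syntax-tree construction with mknode/mkleaf (Python sketch).
def mknode(op, left, right):
    return {'label': op, 'left': left, 'right': right}   # operator node with two children

def mkleaf(label, value):
    return {'label': label, 'value': value}               # leaf node: label is 'id' or 'num'

# Tree for a - 4 + c ('entry_a'/'entry_c' stand in for symbol-table entries).
p1 = mkleaf('id', 'entry_a')
p2 = mkleaf('num', 4)
p3 = mknode('-', p1, p2)
p4 = mkleaf('id', 'entry_c')
p5 = mknode('+', p3, p4)      # p5 points to the root of the tree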
Example 1: Syntax Tree for the string a – b ∗ c + d is:
Example 2: Syntax Tree for the string a * (b + c) – d /2 is:
Examples:
T0 = a+b --- Expression 1
T1 = T0 +c --- Expression 2
Expression 1: T0 = a+b
In the given figure above (Nodes of a DAG for i = i + 10 allocated in an array) the interior nodes
contain two more fields denoting the left and right children, while leaves have one additional
field that stores the lexical value (either a symbol-table pointer or a constant in this instance).
The integer index of the record for that node inside the array is used to refer to nodes in this array. This integer is historically called the value number of the node or of the expression represented by the node. The value of the node labeled + is 3, while the values of its left and right
children are 1 and 2, respectively. Instead of integer indexes, we may use pointers to records or
references to objects in practice, but the reference to a node would still be referred to as its
“value number.” Value numbers can assist us in constructing expressions if they are stored in the
right data format.
Algorithm: The value-number method for constructing the nodes of a Directed Acyclic Graph.
INPUT: Label op, node l, and node r.
OUTPUT: The value number of a node in the array with signature (op, l, r).
METHOD: Search the array for a node M with label op, left child l, and right child r. If there is such a node, return the value number of M. If not, create in the array a new node N with label op, left child l, and right child r, and return its value number.
While Algorithm produces the intended result, examining the full array every time one node is
requested is time-consuming, especially if the array contains expressions from an entire program.
A hash table, in which the nodes are divided into “buckets,” each of which generally contains
only a few nodes, is a more efficient method. The hash table is one of numerous data structures that may effectively support dictionaries. A dictionary is a data type that allows us to add and
remove elements from a set, as well as to detect if a particular element is present in the set. A
good dictionary data structure, such as a hash table, executes each of these operations in a
constant or near-constant amount of time, regardless of the size of the set.
To build a hash table for the nodes of a DAG, we require a hash function h that computes the bucket index for a signature (op, l, r) in such a manner that the signatures are distributed across buckets and no one bucket gets more than a fair portion of the nodes. The bucket index h(op, l, r) is deterministically computed from op, l, and r, allowing us to repeat the calculation and always arrive at the same bucket index per node (op, l, r).
The buckets can be implemented as linked lists, as in the given figure. The bucket headers are stored in an array indexed by the hash value, each of which corresponds to the first cell of a list. Each cell in a bucket's linked list contains the value number of one of the nodes that hash to that bucket. That is, node (op, l, r) may be located on the list whose header is at index h(op, l, r) of the array.
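The value-number method is easy to prototype with a dictionary acting as the hash table, keyed by the signature (op, l, r). The Python sketch below is a simplified illustration, not a production implementation.

# Value-number method for building DAG nodes (Python sketch).
nodes = []          # the array of nodes; a node's value number is its index here
table = {}          # hash table: signature (op, l, r) -> value number

def value_number(op, l, r):
    sig = (op, l, r)
    if sig in table:            # node already exists: reuse it
        return table[sig]
    nodes.append(sig)           # otherwise create a new node
    table[sig] = len(nodes) - 1
    return table[sig]

def leaf(name):
    return value_number('leaf', name, None)

# DAG for i = i + 10: the two uses of i share one node.
i = leaf('i')
ten = leaf(10)
plus = value_number('+', i, ten)
assign = value_number('=', i, plus)
print(len(nodes))   # 4 nodes, even though 'i' appears twice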
L-attributed definitions
L-attributed grammars are a special type of attribute grammars. They allow the attributes to be
evaluated in one depth-first left-to-right traversal of the abstract syntax tree. As a result, attribute
evaluation in L-attributed grammars can be incorporated conveniently in top-down parsing.
Let us consider a syntax-directed definition S and a parse tree T for S showing the attributes of the
grammar symbols of T. Figure 1 shows an example of such a tree. Algorithm 1 provides an
order, called depth-first evaluation order, for evaluating attributes shown by T.
Algorithm 1
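The referenced Algorithm 1 is not reproduced here. As a general illustration of a depth-first, left-to-right evaluation order (the usual dfvisit scheme), a sketch might look like the following; the node fields and helper methods are hypothetical.

# Depth-first, left-to-right attribute evaluation (generic Python sketch, not Algorithm 1).
# Each node is assumed to carry `children` plus callables that compute its attributes.
def dfvisit(node):
    for child in node.children:
        child.inherited = node.compute_inherited(child)   # pass values down, left to right
        dfvisit(child)
    node.synthesized = node.compute_synthesized()          # then compute values bottom-up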
Top-down translation
In compiler design, top-down parsing is a parsing technique that involves starting with the
highest-level nonterminal symbol of the grammar and working downward to derive the input
string. An example of top-down parsing is recursive descent parsing.
Top-down parsing in computer science is a parsing strategy where one first looks at the highest
level of the parse tree and works down the parse tree by using the rewriting rules of a formal
grammar. LL parsers are a type of parser that uses a top-down parsing strategy.
Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing
general parse tree structures and then considering whether the known fundamental structures are
compatible with the hypothesis. It occurs in the analysis of both natural languages and computer
languages.
Top-down parsing can be viewed as an attempt to find left-most derivations of an input-stream
by searching for parse-trees using a top-down expansion of the given formal grammar rules.
Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand-sides
of grammar rules.
Simple implementations of top-down parsing do not terminate for left-recursive grammars, and
top-down parsing with backtracking may have exponential time complexity with respect to the
length of the input for ambiguous CFGs. However, more sophisticated top-down parsers have
been created by Frost, Hafiz, and Callaghan, which do accommodate ambiguity and left
recursion in polynomial time and which generate polynomial-sized representations of the
potentially exponential number of parse trees.
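As a concrete illustration of recursive descent, the Python sketch below parses the toy grammar E -> T ('+' T)*, T -> digit; the grammar, class, and method names are invented for illustration.

# Recursive-descent parser for E -> T ('+' T)* ; T -> digit  (Python sketch).
class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def parse_E(self):                  # one procedure per nonterminal
        value = self.parse_T()
        while self.peek() == '+':
            self.expect('+')
            value += self.parse_T()
        return value

    def parse_T(self):
        ch = self.peek()
        if ch is None or not ch.isdigit():
            raise SyntaxError("digit expected")
        self.pos += 1
        return int(ch)

print(Parser("1+2+3").parse_E())   # 6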
The above parse tree also shows the values assigned to B and C from parent and sibling,
respectively, and the value of A is taken from node C, which is the child of A.
Difference between inherited and synthesized attributes
Inherited Attributes:
1. If the value of an attribute at a parse-tree node is determined by the attribute values at that node's parent, siblings, or other nodes, the attribute is said to be inherited.
2. An inherited attribute at node n can only be specified in terms of the attribute values of n's parents and siblings.
3. A single top-down and sideways traversal of the parse tree is used to evaluate inherited attributes.
4. An inherited attribute must have a non-terminal in the body of its production.
Synthesized Attributes:
1. If the value of an attribute at a parse-tree node is determined by the attribute values at its child nodes, the attribute is said to be synthesized.
2. Only the attribute values at n's children are used to define a synthesized attribute at node n.
3. A single bottom-up traversal of the parse tree is used to evaluate synthesized attributes.
4. A synthesized attribute must have a non-terminal as the head of its production.
Parsers Comparison:
L-attributed SDT : If an SDT uses either synthesized attributes or inherited attributes with the restriction that inherited values may come only from the parent and left siblings, it is called an L-attributed SDT. Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right parsing manner.
Activation Record : Information needed by a single execution of a procedure is managed using a
contiguous block of storage called activation record. An activation record is allocated when a
procedure is entered and it is deallocated when that procedure is exited.
Intermediate Code : They are machine independent codes. Syntax trees, postfix notation, 3-
address codes can be used to represent intermediate code.
Three address code:
1. Quadruples (4 fields : operator, operand1, operand2, result)
2. Triples (3 fields : operator, operand1, operand2)
3. Indirect triples
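As an illustration (one plausible rendering, with invented temporaries t1 and t2), the statement a = b + c * d could be translated as:

Three-address code:
t1 = c * d
t2 = b + t1
a = t2
Quadruples (operator, operand1, operand2, result):
(*, c, d, t1)
(+, b, t1, t2)
(=, t2, –, a)
Triples (operator, operand1, operand2), referenced by position:
(0) (*, c, d)
(1) (+, b, (0))
(2) (=, a, (1))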
Code Optimization :
Types of machine independent optimizations –
1. Loop optimizations:
Code motion: move a loop-invariant computation out of the loop to reduce how often the expression is evaluated.
Loop unrolling: replicate the loop body so that fewer iterations (and fewer tests and jumps) are executed.
Loop jamming: combine the bodies of two loops whenever they share the same index range.
2. Constant folding: evaluating expressions whose operands are all constants at compile time.
3. Constant propagation: replacing a variable that has a known constant value with that value at compile time.
4. Strength reduction: replacing costly operators with cheaper ones (see the fragments below).
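A few of these transformations, shown as hypothetical before/after fragments:

Constant folding:      x = 4 * 2   becomes   x = 8
Constant propagation:  y = x + 1   becomes   y = 8 + 1   (which then folds to y = 9)
Strength reduction:    i = j * 2   becomes   i = j + j   (or a shift)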
Recursive evaluators
The evaluator is the core of the interpreter--it's what does all of the interesting work to evaluate
complicated expressions. The reader translates textual expressions into a convenient data
structure, and the evaluator actually interprets it, i.e., figures out the "meaning" of the expression.
Evaluation is done recursively. We write code to evaluate simple expressions, and use recursion
to break down complicated expressions into simple parts.
I'll show a simple evaluator for simple arithmetic expressions, like a four-function calculator,
which you can use like this, given the read-eval-print-loop above:
Scheme> (repl math-eval)   ; start up read-eval-print loop w/arithmetic eval
repl> 1
1
repl> (plus 1 2)
3
repl> (times (plus 1 3) (minus 4 2))
8
As before, the read-eval-print-loop reads what you type at the repl> prompt as an s-expression,
and calls math-eval.
Here's the main dispatch routine of the interpreter, which figures out what kind of expression it's
given, and either evaluates it trivially or calls math-eval-combo to help:
(define (math-eval expr)
  (cond
    ;; self-evaluating object? (we only handle numbers)
    ((number? expr)
     expr)
    ;; compound expression? (we only handle two-arg combinations)
    (else
     (math-eval-combo expr))))
First math-eval checks the expression to see if it's something simple that it can evaluate
straightforwardly, without recursion.
The only simple expressions in our language are numeric literals, so math-eval just uses the
predicate number? to test whether the expression is a number. If so, it just returns that value.
(Voila! We've implemented self-evaluating literals.)
If the expression is not simple, it's supposed to be an arithmetic expression with an operator and
two operands, represented as a three element list. (This is the subset of Scheme's combinations
that this interpreter can handle.) In this case, math-eval calls math-eval-combo.
(define (math-eval-combo expr)
  (let ((operator-name (car expr))
        (arg1 (math-eval (cadr expr)))
        (arg2 (math-eval (caddr expr))))
    (cond ((eq? operator-name 'plus)
           (+ arg1 arg2))
          ((eq? operator-name 'minus)
           (- arg1 arg2))
          ((eq? operator-name 'times)
           (* arg1 arg2))
          ((eq? operator-name 'quotient)
           (/ arg1 arg2))
          (else
           (error "Invalid operation in expr:" expr)))))
math-eval-combo handles a combination (math operation) by calling math-eval recursively to
evaluate the arguments, checking which operator is used in the expression, and calling the
appropriate Scheme procedure to perform the actual operation.
Comments on the Arithmetic Evaluator
The 4-function arithmetic evaluator is very simple, but it demonstrates several important
principles of Scheme programming and programming language implementation.
Recursive style and Nested Lists. Note that an arithmetic expression is represented as an s-expression that may be a 3-element list. If it's a three-element list, that list is made up of three objects (pairs), but we essentially treat it as a single conceptual object--a node in a parse tree of arithmetic expressions. The overall recursive structure of the evaluator is based on this
conceptual tree, not on the details of the lists' internal structure. We don't need recursion to
traverse the lists, because the lists are of fixed length and we can extract the relevant fields
using car, cadr, and caddr. We are essentially treating the lists as three-element structures. This
kind of recursion is extremely common in Scheme--nested lists are far more common than "pair
trees." As in the earlier examples of recursion over lists and pair trees, the main recursive
procedure can accept pointers to either interior nodes (lists representing compound
expressions), or leaves of the tree. Either counts as an expression.
Dynamic typing lets us implement this straightforwardly, so that our recursion doesn't have to
"bottom out" until we actually hit a leaf. Things would be more complicated in C or Pascal,
which don't allow a procedure to accept an argument that may be either a list or a number.\
footnote{In C or Pascal, we could represent all of the nodes in the expression tree as variant
records (in C, "unions") containing an integer or a list. We don't need to do that in Scheme,
because in Scheme every variable's type is really a kind of variant record--it can hold a (pointer
to a) number or a (pointer to a) pair or a (pointer to) anything else. C is particularly problematic
for this style of programming, because even if we bite the bullet and always define a variant
record type, the variant records are untagged. C doesn't automatically keep track of which variant
a particular record represents e.g., a leaf or nonleaf--and you must code this yourself by adding a
Page 91 of 139
tag field, and setting and checking it appropriately. In effect, must implement dynamic typing
yourself, every time.} It is possible to do Scheme-style recursion straightforwardly in some
statically-typed languages, notably ML and Haskell. These polymorphic languages allow you to
declare disjoint union types. A disjoint union is an "any of these" type--you can say that an
argument will be of some type or some other type. In Scheme, the language only supports one
very general kind of disjoint union type: pointer to anything. However, we usually think of data
structure definitions as disjoint unions. As usual, we can characterize an arithmetic expression recursively. It is either a numeric literal (the base case) or a three-element "node"
whose first "field" is an operator symbol and whose second and third "fields" are arithmetic
expressions.
Also as usual, this recursive characterization is what dictates the recursive structure of the
solution--not the details of how nodes are implemented. (The overall structure of recursion over
trees would be the same if the interior nodes were arrays or records, rather than linear lists.) The
conceptual "disjoint union" of leaves and interior nodes is what tells us we need a two-branch
conditional in math-eval. It is important to realize that in Scheme, we usually discriminate
between cases at edges in the graph, i.e., the pointers, rather than focusing on the nodes.
Conceptually, the type of the expr argument is an edge in the expression graph, which may point
to either a leaf node or an interior node. We apply math-eval to each edge, uniformly, and it
discriminates between the cases. We do not examine the object it points to and decide whether to make the recursive call; we always do the recursive call, and sort out the cases in the callee.
Primitive expressions and operations. In looking at any interpreter, it's important to notice which
operations are primitive, and which are compound. Primitive operations are "built into" the
interpreter, but the interpreter allows you to construct more complicated operations in terms of
those. In math-eval, the primitive operations are addition, subtraction, multiplication, and
division. We "snarf" these operations from the underlying Scheme system, in which we're
implementing our little four-function calculator. We don't implement addition, but we do
dispatch to this built-in addition operation. On the other hand, compound expressions are not
built-in.
The interpreter doesn't have a special case for each particular kind of expression; e.g., there's no code to add 4 to 5. We allow users to combine expressions by arbitrarily nesting them, and support an effectively infinite number of possible expressions. Later, I'll show more advanced interpreters that support more kinds of primitive expressions (not just numeric literals) and more kinds of primitive operations (not just four arithmetic functions). I'll also show how a more advanced interpreter can support more different ways of combining the primitive expressions.
Flexibility. One reason for implementing your own interpreter is flexibility. You can change the features of the language by making minor changes to the interpreter. For example, it is trivial to modify math-eval to evaluate infix expressions rather than prefix expressions (that is, with the operator in the middle, e.g., (10 plus (3 times 2))). All we have to do is change the two lines where the operator and the first operand are extracted from a compound expression. We just swap the car and cadr, so that we treat the second element of the list as the operator and the first element as the operand.
Exercise
Read about:
1. Space for attribute values at compile time
2. Assigning space at compile time
Reference
https://www.csd.uwo.ca/~mmorenom/CS447/Lectures/Translation.html/node4.html
https://edurev.in/question/1755392/In-a-bottom-up-evaluation-of-a-syntax-directed-definition--
inherited-attributes-cana-Always-be-evalu
Reference
https://www.geeksforgeeks.org/compiler-design-variants-of-syntax-tree/
Compiler Design
Phases of Compiler:
Symbol Table : It is a data structure being used and maintained by the compiler, consists all the
identifier’s name along with their types. It helps the compiler to function smoothly by finding the
identifiers quickly.
Lexical Analysis : Lexical analyzer reads a source program character by character to produce
tokens. Tokens can be identifiers, keywords, operators, separators etc.
Syntax Analysis : Syntax analyzer is also known as parser. It constructs the parse tree. It takes all
the tokens one by one and uses Context Free Grammar to construct the parse tree.
Semantic Analyzer : It verifies the parse tree, whether it’s meaningful or not. It furthermore
produces a verified parse tree.
Intermediate Code Generator : It generates intermediate code, that is, a form which can be readily executed by a machine. We have many popular intermediate codes.
Code Optimizer : It transforms the code so that it consumes fewer resources and produces more
speed.
Target Code Generator : The main purpose of Target Code generator is to write a code that the
machine can understand. The output is dependent on the type of assembler.
Error handling :
The tasks of the Error Handling process are to detect each error, report it to the user, and then make some recovery strategy and implement it to handle the error. An error may show up as blank entries in the symbol table. There are two types of error :
Run-Time Error : A run-time error is an error which takes place during the execution of a
program, and usually happens because of adverse system parameters or invalid input data.
Compile-Time Error: Compile-time errors arise at compile time, before execution of the program.
1. Lexical: misspellings of identifiers, keywords or operators.
2. Syntactical: a missing semicolon or unbalanced parentheses.
3. Semantical: incompatible value assignment or type mismatches between operator and operand.
4. Logical: unreachable code, infinite loop.
Left Recursion : The grammar A -> Aα | β is left recursive. Top-down parsing techniques cannot handle left-recursive grammars, so the left recursion is eliminated by rewriting the grammar as:
A -> βA'
A' -> αA' | ε
Left Factoring : If a grammar has common prefixes in the right-hand sides of a nonterminal, such as A -> αβ1 | αβ2, then such a grammar is left factored as:
A -> αA'
A' -> β1 | β2
FIRST(A) is the set of terminal symbols which occur as first symbols in strings derived from A.
FOLLOW(A) is the set of terminals which occur immediately after the nonterminal A in strings derived from the start symbol.
Top-down parser
LL(1) Parser : LL(1) grammar is unambiguous, left factored and non-left recursive.
Bottom up parser
LR(0) Parser : Closure() and goto() functions are used to create canonical collection of LR items.
Conflicts in LR(0) parser :
1. Shift Reduce (SR) conflict : when the same state in DFA contains both shift and reduce items.
A -> B . xC (shifting) B -> a. (reduced)
2. Reduce-Reduce (RR) conflict : two reductions in the same state of the DFA: A -> a. (reduced) B -> b. (reduced)
SLR Parser : It is more powerful than LR(0).
Every LR(0) grammar is SLR, but every SLR grammar need not be LR(0).
Conflicts in SLR
1. SR conflict : A -> B . xC (shifting) B -> a. (reduced) if FOLLOW(B) ∩ {x} ≠ φ
2. RR conflict : A -> a. (reduced) B -> b. (reduced) if FOLLOW(A) ∩ FOLLOW(B) ≠ φ
Assignments
Read about: Analysis of syntax-directed definitions. Type Checking:Type systems,
Specification of a simple type checker, Equivalence of type expressions,
Type conversions, Overloading of functions and operators, Polymorphic
functions, An algorithm for unification. Run-Time Environments:Source
language issues, Storage organization, Storage-allocation strategies.
Read about: Access to nonlocal names, parameter passing, Symbol tables, Language
facilities for dynamic storage allocation, Dynamic storage allocation
techniques, Storage allocation in Fortran. Intermediate Code
Generation:Intermediate languages, Declarations, Assignment statements,
Boolean expressions, Case statements, Back Patching, Procedure
calls.Code generation:Issues in the design of a code generator, The target
machine, Run-time storage management, Basic blocks and flow graphs,
Next-use information.
Read about: A Simple code generator, Register allocation and assignment, The dag
representation of basic blocks, Peephole optimization, Generating code
from dags, Dynamic programming code-generation algorithm, Code-
generator generators. Code Optimization: Introduction, The Principal
sources of optimization, Optimization of basic blocks, Loops in flow
graphs, Introduction to global data-flow analysis, Iterative solution of
data-flow equations, Code improving transformations, Dealing with
aliases, Data-flow analysis of structured flow graphs, Efficient data-flow
algorithms, A tool for data-flow analysis, and Estimation of types.
Read about: Symbolic debugging of optimized code. Advanced topics include garbage
collection; dynamic data structures, pointer analysis, aliasing; code
scheduling, pipelining; dependence testing; loop level optimisation;
superscalar optimisation; profile-driven optimisation; debugging support;
incremental parsing; type inference; advanced parsing algorithms;
practical attribute evaluation; function in-lining and partial evaluation.
Weeks 9 & 10: Analysis of syntax-directed definitions. Type Checking: Type systems,
Specification of a simple type checker, Equivalence of type expressions,
Type conversions, Overloading of functions and operators, Polymorphic
functions, An algorithm for unification. Run-Time Environments: Source
language issues, Storage organization, Storage-allocation strategies.
Polymorphic functions
What are polymorphic functions?
Answer
Those functions that can evaluate to or be applied to values of different types are known as polymorphic functions. A data type that can appear to be of a generalized type (e.g. a list with elements of arbitrary type) is designated a polymorphic data type, like the generalized type from which such specializations are made.
In programming language theory and type theory, polymorphism is the provision of a
single interface to entities of different types or the use of a single symbol to represent multiple
different types. The concept is borrowed from a principle in biology where an organism or
species can have many different forms or stages.
The main classes of polymorphism are:
(i) Ad hoc polymorphism: defining a common interface for an arbitrary set of individually specified types (as in function or operator overloading).
(ii) Parametric polymorphism: not specifying concrete types and instead using abstract symbols that can substitute for any type.
(iii) Subtyping (also called subtype polymorphism or inclusion polymorphism): when a name denotes instances of many different classes related by some common superclass.
Ad hoc polymorphism
Christopher Strachey chose the term ad hoc polymorphism to refer to polymorphic functions that
can be applied to arguments of different types, but that behave differently depending on the type
of the argument to which they are applied (also known as function overloading or operator
overloading). The term "ad hoc" in this context is not intended to be pejorative; it refers simply
to the fact that this type of polymorphism is not a fundamental feature of the type system. In
the Pascal / Delphi example below, the Add functions seem to work generically over two types
(integer and string) when looking at the invocations, but are considered to be two entirely distinct
functions by the compiler for all intents and purposes:
program Adhoc;
  (* Overloaded Add functions; their definitions are reconstructed here, since the
     invocations below require one version for Integers and one for Strings. *)
  function Add(x, y: Integer): Integer; begin Add := x + y end;
  function Add(s, t: String): String; begin Add := Concat(s, t) end;
begin
  Writeln(Add(1, 2));                    (* Prints "3" *)
  Writeln(Add('Hello, ', 'Mammals!'));   (* Prints "Hello, Mammals!" *)
end.
In dynamically typed languages the situation can be more complex as the correct function that needs
to be invoked might only be determinable at run time.
Implicit type conversion has also been defined as a form of polymorphism, referred to as "coercion
polymorphism".
Parametric polymorphism
Parametric polymorphism allows a function or a data type to be written generically, so that it can
handle values uniformly without depending on their type. Parametric polymorphism is a way to
make a language more expressive while still maintaining full static type-safety.
The concept of parametric polymorphism applies to both data types and functions. A function
that can evaluate to or be applied to values of different types is known as a polymorphic
function. A data type that can appear to be of a generalized type (e.g. a list with elements of arbitrary type) is designated a polymorphic data type, like the generalized type from which such specializations are made.
Parametric polymorphism is ubiquitous in functional programming, where it is often simply referred to as "polymorphism". The following example shows a parameterized list data type, written here with generics in a Java-like syntax:
class List<T> {
class Node<T> {
T elem;
Node<T> next;
}
Node<T> head;
int length() { ... }
}
John C. Reynolds (and later Jean-Yves Girard) formally developed this notion of polymorphism
as an extension to lambda calculus (called the polymorphic lambda calculus or System F). Any
parametrically polymorphic function is necessarily restricted in what it can do, working on the
shape of the data instead of its value, leading to the concept of parametricity.
Subtyping
Some languages employ the idea of subtyping (also called subtype polymorphism or inclusion
polymorphism) to restrict the range of types that can be used in a particular case of
polymorphism. In these languages, subtyping allows a function to be written to take an object of
a certain type T, but also work correctly, if passed an object that belongs to a type S that is a
subtype of T (according to the Liskov substitution principle). This type relation is sometimes
written S <: T. Conversely, T is said to be a supertype of S—written T :> S. Subtype
polymorphism is usually resolved dynamically (see below).
In the following Java example we make cats and dogs subtypes of pets. The
procedure letsHear() accepts a pet, but will also work correctly if a subtype is passed to it:
abstract class Pet {
abstract String speak();
}
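The Java example above is cut short in these notes. A comparable sketch in C++ (the names Cat, Dog and letsHear are illustrative, following the description in the text) shows the same idea: letsHear is written against the supertype Pet but works correctly when passed any subtype, with the call resolved dynamically:
#include <iostream>
#include <string>

struct Pet {
    virtual std::string speak() const = 0;
    virtual ~Pet() = default;
};

struct Cat : Pet {
    std::string speak() const override { return "Meow!"; }
};

struct Dog : Pet {
    std::string speak() const override { return "Woof!"; }
};

// letsHear is written for the supertype Pet, but accepts any subtype (S <: Pet).
void letsHear(const Pet& pet) {
    std::cout << pet.speak() << '\n';   // the call is resolved dynamically
}

int main() {
    Cat cat;
    Dog dog;
    letsHear(cat);   // prints "Meow!"
    letsHear(dog);   // prints "Woof!"
}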
In today's world, the life of a developer would be difficult without polymorphism. It allows us to treat objects of different classes as though they belong to a shared superclass. To implement polymorphism, we use polymorphic functions.
What is Polymorphism?
Polymorphism is a Greek word. It comprises two words, where Poly means "many" and
morphism means "forms". It is the ability of an object to take on many forms. Polymorphism
allows you to “program in general” rather than “program in specific." It is the capability of a
method to do different things based on the object.
There are two types of Polymorphism:
1. Ad-hoc (Compile Time) Polymorphism:
It is achieved through function overloading: several functions share the same name but differ in the number or type of their arguments, and the compiler selects the appropriate one at compile time. For example (the overloaded add definitions are reconstructed here, since only the calls appear in the original):
#include <iostream>
#include <string>
using namespace std;

int add(int a, int b) { return a + b; }
string add(string a, string b) { return a + b; }

int main()
{
    cout << add(71, 72)
         << " is Integer addition Output\n";
    cout << add("Coding", " Ninjas")
         << " is String Concatenation Output\n";
}
Output:
143 is Integer addition Output
Coding Ninjas is String Concatenation Output
So, we are calling two different functions (which differ in the type of arguments), and both of
them have the same name to execute multiple operations. We have successfully achieved Ad-hoc
Polymorphism.
2. Parametric Polymorphism:
It is also known as "Early Binding Parametric Polymorphism.” It opens a way to use the same
code for different data types. It is implemented by using Templates. For example: To develop an
understanding of this sort of Polymorphism, let us execute a program to find the greater of two
Integers or two Strings.
#include <iostream>
#include <string>
using namespace std;
template <class temp>
temp greater(temp a, temp b)
{
if (a > b)
return a;
else
return b;
}
int main()
{
    cout << ::greater(96, 69) << endl;
    string str1("Coding"), str2("Ninja");
    cout << ::greater(str1, str2) << endl;
}
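Assuming the template definition and the calls above, the program prints 96 (the greater of the two integers) on the first line and Ninja (the lexicographically greater of the two strings) on the second line; the same template code handles both types.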
A unification algorithm was first discovered by Jacques Herbrand, while a first formal
investigation can be attributed to John Alan Robinson, who used first-order syntactical
unification as a basic building block of his resolution procedure for first-order logic, a great step
forward in automated reasoning technology, as it eliminated one source of combinatorial
explosion: searching for instantiation of terms. Today, automated reasoning is still the main
application area of unification. Syntactical first-order unification is used in logic
programming and programming language type system implementation, especially in Hindley–
Milner based type inference algorithms. Semantic unification is used in SMT solvers, term
rewriting algorithms and cryptographic protocol analysis. Higher-order unification is used in
proof assistants, for example Isabelle and Twelf, and restricted forms of higher-order unification
(higher-order pattern unification) are used in some programming language implementations,
such as lambdaProlog, as higher-order patterns are expressive, yet their associated unification
procedure retains theoretical properties closer to first-order unification.
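As a small illustration of syntactic first-order unification (not from the original notes): unifying the terms f(x, g(y)) and f(g(z), g(a)) succeeds with the substitution { x ↦ g(z), y ↦ a }, because matching the arguments position by position forces x to equal g(z) and y to equal a. By contrast, f(x, x) and f(a, b) cannot be unified, since x would have to be bound to the two distinct constants a and b at the same time.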
Weeks 11 & 12: Access to nonlocal names, parameter passing, Symbol tables, Language
facilities for dynamic storage allocation, Dynamic storage allocation
techniques, Storage allocation in Fortran. Intermediate Code Generation:
Intermediate languages, Declarations, Assignment statements, Boolean
expressions, Case statements, Back Patching, Procedure calls. Code
generation: Issues in the design of a code generator, The target machine,
Run-time storage management, Basic blocks and flow graphs, Next-use
information.
Non-local means not found within, coming from, or relating to a small area, especially of a country. For example: "Non-local attendees to the arts festival spend significantly more than local attendees," according to the report; or, "A large number of items made of nonlocal materials were found at the archaeological site."
The non-local means algorithm replaces the value of a pixel by an average of a selection of other
pixels values: small patches centered on the other pixels are compared to the patch centered on
the pixel of interest, and the average is performed only for pixels that have patches close to the
current patch.
In Python, nonlocal variables are variables used inside nested functions whose binding lives in an enclosing (outer) function's scope: they belong neither to the local scope of the nested function nor to the global scope.
Reference
https://estudies4you.blogspot.com/2017/09/access-to-nonlocal-names.html
For FORTRAN and other languages which allow static storage allocation, the amount of storage
required to hold each variable is fixed at translation time. Such languages have no nested
procedures or recursion and thus only one instance of each name (the same identifier may be
used in different contexts, however).
What is array in Fortran?
An array is a named collection of elements of the same type. It is a nonempty sequence of data
and occupies a group of contiguous storage locations. An array has a name, a set of elements,
and a type. An array name is a symbolic name for the whole sequence of data.
subroutine alloc_rvector_sp(n, x)
! Note: the header line above is reconstructed to match the END SUBROUTINE statement below;
! the kind parameters IK and SP and the VALIDATE routine are defined elsewhere in the module
! from which this excerpt is taken.
! Inputs
integer(IK), intent(in) :: n
! Outputs
real(SP), allocatable, intent(out) :: x(:)
! Local variables
integer :: alloc_status
character(len=*), parameter :: srname = 'ALLOC_RVECTOR_SP'
! Preconditions
call validate(n >= 0, 'N >= 0', srname)
! According to the Fortran 2003 standard, when a procedure is invoked, any allocated
! ALLOCATABLE object that is an actual argument associated with an INTENT(OUT) ALLOCATABLE
! dummy argument is deallocated. So it is unnecessary to write the following line since
! F2003, as X is INTENT(OUT):
!!if (allocated(x)) deallocate (x)
! Allocate memory for X
allocate (x(n), stat=alloc_status)
call validate(alloc_status == 0, 'Memory allocation succeeds (ALLOC_STATUS == 0)', srname)
call validate(allocated(x), 'X is allocated', srname)
! Initialize X to a strange value independent of the compiler; it can be costly for a large N.
x = -huge(x)
! Postconditions
call validate(size(x) == n, 'SIZE(X) == N', srname)
end subroutine alloc_rvector_sp
[Update (2022-01-25): I shuffled the lines a bit, moving validate(allocated(x), 'X is allocated',
srname) to the above of x = -huge(x).]
Practice questions:
1. What do you think about this implementation? Any comments, suggestions, and criticism will
be appreciated.
A related and more particular question is the following.
2. What is the best practice of allocating large memory in Fortran?
The question can be further detailed as follows.
3. What does “large” mean under a modern and practical setting?
To be precise, let us consider a PC/computing node with >= 4GB of RAM. In addition, the
hardware (RAM, CPU, hard storage, etc), the compiler, and the system are reasonably
mainstream and modern, e.g., not more than 10 years old.
4. What special caution should be taken when the memory to allocate is large in the sense of question 3?
Boolean expressions
What is a Boolean expression?
Answer
A logical statement that results in a Boolean value, either True or False, is a Boolean expression. Sometimes synonyms are used to express the statement, such as 'Yes' for 'True' and 'No' for 'False'. Also, 1 and 0 are used in digital circuits for True and False, respectively.
Example 1: (a>b && a>c) is a Boolean expression. It evaluates the condition by checking whether 'a' is greater than 'b' and also whether 'a' is greater than 'c'.
Example 2: (2>1 && 10>9) evaluates the condition by checking whether '2' is greater than '1' and also whether '10' is greater than '9'.
Boolean expressions are the statements that use logical operators, i.e., AND, OR, XOR and
NOT. Thus, if we write X AND Y = True, then it is a Boolean expression.
Boolean operators
Most programming languages have the Boolean operators OR, AND and NOT; in C and
some languages inspired by it, these are represented by "||" (double pipe character), "&&"
(double ampersand) and "!" (exclamation point) respectively, while the corresponding bitwise
operations are represented by "|", "&" and "~" (tilde). In the mathematical literature the symbols
used are often "+" (plus), "·" (dot) and overbar, or "∨" (vel), "∧" (et) and "¬" (not) or "′"
(prime).
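A small C++ sketch (illustrative values only) contrasting the logical operators with the corresponding bitwise operators mentioned above:
#include <iostream>

int main() {
    bool p = true, q = false;

    // Logical (Boolean) operators: they operate on truth values
    std::cout << (p || q) << '\n';    // OR  -> 1 (true)
    std::cout << (p && q) << '\n';    // AND -> 0 (false)
    std::cout << (!p)     << '\n';    // NOT -> 0 (false)

    // Bitwise operators: they operate on the individual bits of integers
    unsigned a = 12, b = 10;          // binary 1100 and 1010
    std::cout << (a | b)  << '\n';    // bitwise OR  : 1110 -> 14
    std::cout << (a & b)  << '\n';    // bitwise AND : 1000 -> 8
    std::cout << (~a & 0xFu) << '\n'; // bitwise NOT of the low 4 bits: 0011 -> 3
}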
Boolean Algebra
Boolean algebra is the category of algebra in which the variable’s values are the truth
values, true and false, ordinarily denoted 1 and 0 respectively. It is used to analyze and simplify
digital circuits or digital gates. It is also called Binary Algebra or logical Algebra. It has been
fundamental in the development of digital electronics and is provided for in all modern
programming languages. It is also used in set theory and statistics.
The important operations performed in Boolean algebra are conjunction (∧), disjunction (∨) and negation (¬). Hence, this algebra is quite different from elementary algebra, where the values of the variables are numerical and arithmetic operations such as addition, subtraction, multiplication, and division are performed on them.
The precedence of the Boolean operators, from highest to lowest, is:
Operator   Symbols     Precedence
NOT        ' (or) ¬    Highest
AND        . (or) ∧    Middle
OR         + (or) ∨    Lowest
Suppose A and B are two Boolean variables; then we can define the three operations as follows: the conjunction A ∧ B (also written A . B) is true only when both A and B are true; the disjunction A ∨ B (also written A + B) is true when at least one of A and B is true; and the negation ¬A is true exactly when A is false.
Boolean Algebra: Boolean algebra is the branch of algebra that deals with logical operations and
binary variables.
Boolean Function: A Boolean function consists of binary variables, logical operators, constants
such as 0 and 1, equal to the operator, and the parenthesis symbols.
Truth Table: The truth table is a table that gives all the possible values of logical variables and
the combination of the variables. It is possible to convert the Boolean equation into a truth table.
The number of rows in the truth table should be equal to 2^n, where "n" is the number of variables in the equation. For example, if a Boolean equation consists of 3 variables, then the number of rows in the truth table is 8 (i.e., 2^3 = 8).
A      B      A ∧ B   A ∨ B
True   True   True    True
True   False  False   True
False  True   False   True
False  False  False   False

A      ¬A
True   False
False  True
(a) Variable used can have only two values. Binary 1 for HIGH and Binary 0 for LOW.
(b) The complement of a variable is represented by an overbar.
There are six types of Boolean algebra laws. They are: Commutative law, Associative law,
Distributive law, AND law, OR law and Inversion law.
Commutative Law
Any binary operation which satisfies the following expression is referred to as a commutative operation. Commutative law states that changing the sequence of the variables does not have any effect on the output of a logic circuit.
A . B = B . A
A + B = B + A
Associative Law
It states that the order in which the logic operations are performed is irrelevant as their effect is
the same.
( A. B ). C = A . ( B . C )
( A + B ) + C = A + ( B + C)
Distributive Law
A . (B + C) = (A . B) + (A . C)
A + (B . C) = (A + B) . (A + C)
AND Law
These laws use the AND operation. Therefore, they are called AND laws.
A . 0 = 0
A . 1 = A
A . A = A
A . A' = 0
(4) OR Law
These laws use the OR operation. Therefore, they are called OR laws.
A + 0 = A
A + 1 = 1
A + A = A
A + A' = 1
Inversion Law
In Boolean algebra, the inversion law states that double inversion of a variable results in the original variable itself.
(A')' = A
The two important theorems which are extensively used in Boolean algebra are De Morgan's First law and De Morgan's Second law. These two theorems are used to change the Boolean
expression. This theorem basically helps to reduce the given Boolean expression in the
simplified form. These two De Morgan’s laws are used to change the expression from one form
to another form. Now, let us discuss these two theorems in detail.
The first law states that the complement of the product of two variables is equal to the sum of their individual complements.
The truth table that shows the verification of De Morgan’s First law is given as follows:
A B A’ B’ (A.B)’ A’+B’
0 0 1 1 1 1
0 1 1 0 1 1
1 0 0 1 1 1
1 1 0 0 0 0
The second law states that the complement of the sum of two variables is equal to the product of their individual complements.
The following truth table shows the proof for De Morgan’s second law.
A B A’ B’ (A+B)’ A’. B’
0 0 1 1 1 1
0 1 1 0 0 0
1 0 0 1 0 0
1 1 0 0 0 0
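Both identities can also be checked mechanically. A minimal C++ sketch that runs through the four rows of the truth tables above and confirms that each law holds (printing 1 for true):
#include <iostream>

int main() {
    for (int A = 0; A <= 1; ++A) {
        for (int B = 0; B <= 1; ++B) {
            bool law1 = (!(A && B)) == (!A || !B);   // (A.B)' == A' + B'
            bool law2 = (!(A || B)) == (!A && !B);   // (A+B)' == A'.B'
            std::cout << A << " " << B << "  law1: " << law1
                      << "  law2: " << law2 << '\n'; // prints 1 when the law holds
        }
    }
}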
Solved Examples
Question: Simplify the Boolean expression C + (B . C)'.
Solution:
Given: C + (B . C)'
Applying De Morgan's first law to (B . C)': C + (B' + C')
Regrouping the terms: (C + C') + B'
Since C + C' = 1 and 1 + B' = 1:
Therefore, C + (B . C)' = 1.
The truth table for the expression A(B + D) is:
A B D B+D A(B+D)
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 1 0
1 0 0 0 0
1 0 1 1 1
1 1 0 1 1
1 1 1 1 1
(Q1) What is Boolean algebra?
In Mathematics, Boolean algebra is called logical algebra, consisting of binary variables that hold the values 0 or 1, and logical operations.
(Q2) What are some applications of Boolean algebra?
In electrical and electronic circuits, Boolean algebra is used to simplify and analyze the logical or
digital circuits.
(Q3) What are the three main Boolean operators?
The three main Boolean operators are AND, OR and NOT. In Boolean logic, zero (0) represents false and one (1) represents true; in many applications, zero is interpreted as false and a non-zero value is interpreted as true.
Reference
https://byjus.com/maths/boolean-algebra/
Question:
Why symbolic debugging of optimized code?
Symbolic debuggers
Definition: Symbolic debuggers are program development tools that allow a user to interact with
an executing process at the source level.
Or
Definition: Symbolic debuggers are system development tools that can accelerate the validation
speed of behavioral specifications by allowing a user to interact with an executing code at the
source level.
The use of the -g switch, which is needed for source-level debugging, affects the size of the
program executable on disk, and indeed the debugging information can be quite large. However,
it has no effect on the generated code (and thus does not degrade performance).
Since the compiler generates debugging tables for a compilation unit before it performs
optimizations, the optimizing transformations may invalidate some of the debugging data.
Therefore, one needs to anticipate certain anomalous situations that may arise while debugging
optimized code. These are the most common cases:
1. The 'hopping program counter': Repeated step or next commands show the PC bouncing
back and forth in the code. This may result from any of the following optimizations:
(i) Common subexpression elimination: using a single instance of code for a quantity that the source computes several times. As a result, one may not be able to stop on what looks like a statement.
(ii) Invariant code motion: moving an expression that does not change within a loop, to the beginning of the loop.
(iii) Instruction scheduling: moving instructions so as to overlap loads and stores (typically)
with other code, or in general to move computations of values closer to their uses. Often this
causes you to pass an assignment statement without the assignment happening and then later
bounce back to the statement when the value is actually needed. Placing a breakpoint on a line of
code and then stepping over it may, therefore, not always cause all the expected side-effects.
2. The 'big leap': More commonly known as 'cross-jumping', in which two identical pieces of
code are merged and the program counter suddenly jumps to a statement that is not supposed to
be executed, simply because it (and the code following) translates to the same thing as the code
that ‘was’ supposed to be executed. This effect is typically seen in sequences that end in a jump,
such as a goto, a return, or a break in a C switch statement.
3. The 'roving variable': The symptom is an unexpected value in a variable. There are various
reasons for this effect:
(a) In a subprogram prologue, a parameter may not yet have been moved to its
‘home’.
(b) A variable may be dead, and its register re-used. This is probably the most
common cause.
(c) As mentioned above, the assignment of a value to a variable may have been moved, so the variable does not yet hold the expected value at the point where the debugger stops.
(d) A variable may be eliminated entirely by value propagation or other means. In this case, GCC may incorrectly generate debugging information for the variable.
In general, when an unexpected value appears for a local variable or parameter you
should first ascertain if that value was actually computed by your program, as opposed to
being incorrectly reported by the debugger. Record fields or array elements in an object
designated by an access value are generally less of a problem, once you have ascertained
that the access value is sensible. Typically, this means checking variables in the
preceding code and in the calling subprogram to verify that the value observed is
explainable from other values (one must apply the procedure recursively to those other
values); or re-running the code and stopping a little earlier (perhaps before the call) and
stepping to better see how the variable obtained the value in question; or continuing to
step ‘from’ the point of the strange value to see if code motion had simply moved the
variable’s assignments later.
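As a purely illustrative C++ sketch (not from the material quoted above) of the 'roving variable' effect: at a high optimization level the local below may live only in a register or be folded away entirely, so a debugger stopped inside the function may report it as unavailable or show a stale value:
#include <iostream>

// Illustrative only: with optimization the compiler may fold t into the return
// expression, so a debugger stopped on the return line may report t as
// "optimized out" rather than showing the value the source seems to assign.
int scaled_sum(int a, int b) {
    int t = a + b;   // t may never be written to memory at all
    return t * 2;
}

int main() {
    std::cout << scaled_sum(2, 3) << '\n';   // prints 10
}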
In light of such anomalies, a recommended technique is to use -O0 early in the software
development cycle, when extensive debugging capabilities are most needed, and then move to -
O1 and later -O2 as the debugger becomes less critical. Whether to use the -g switch in the
release version is a release management issue. Note that if you use -g you can then use
the strip program on the resulting executable, which removes both debugging information and
global symbols.
Quiz
What is the main purpose of debugger?
Debugging tools (called debuggers) are used to identify coding errors at various development stages. They are used to reproduce the conditions in which an error has occurred, examine the program state at that time, and locate the cause.
Code scheduling
Code scheduling is an important part of compiler design. It is necessary to know more about
the code-scheduling process because it can help someone to ensure that compiler is producing
optimal machines for programs.
Data dependence analysis is used to determine the constraints on the order of execution of instructions: it identifies which instructions produce values that other instructions use, and therefore how much freedom the scheduler has in placing each instruction. A data-dependent instruction can only be executed after all of the instructions it depends on have completed.
A true (flow) data dependence exists when an instruction reads a value that an earlier instruction writes (read after write); the later instruction cannot be executed before that value is available without changing program behavior (i.e., without introducing a bug). An anti-dependence exists when an instruction writes a location that an earlier instruction reads (write after read); moving the write above the read would destroy the value before it is used. An output dependence exists when two instructions write the same location (write after write); reordering them changes which value is finally left in that location, so later readers would observe a different result.
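A small C++ sketch (variable names are illustrative) of the three kinds of data dependence just described; a code scheduler must preserve these orderings unless it renames the reused variable:
#include <iostream>

int dependence_demo(int x, int y, int z) {
    int a = x + y;   // S1: writes a
    int b = a * 2;   // S2: reads a written by S1, a true (flow) dependence (RAW)
    a = z - 1;       // S3: writes a that S2 reads, an anti-dependence (WAR),
                     //     and both S1 and S3 write a, an output dependence (WAW)
    return a + b;    // with x=1, y=2, z=4: b = 6, a = 3, result 9
}

int main() {
    std::cout << dependence_demo(1, 2, 4) << '\n';   // prints 9
}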
The compiler is responsible for making tradeoffs between register usage and parallelism.
Register usage refers to the number of registers that a particular instruction uses at runtime,
while parallelism refers to how many instructions can be executed in parallel by a single
processor (or multiple processors).
The recommended strategy for producing the best performance is often called “ register-level
profiling,” which means measuring how much time your program spends using each register
during execution and then modifying your code so it runs more efficiently on those scarce
resources. A good example would be if you have an instruction like ADD_EQ which needs
two operands: A and B; this means you need both these registers available at once when
calculating addition or subtraction instead of keeping only one free for other purposes such as
storing data into memory or passing arguments from function callbacks down through loops
where needed. This can be improved by putting the two operands into separate registers, which
allows you to use one temporarily while performing other instructions on it and then later
retrieve the result of your calculation. When the demand for registers at some point in the program exceeds the number of registers the processor provides, this is known as "register pressure" or "register contention".
The registers in a processor are not infinite, and if you try to use more than the available
number of them at any given time then performance will suffer. This means that if you have a
loop that is iterating over an array, for example, then it may be beneficial to move some of the
data into memory before going through each iteration rather than keeping all of it in registers
(which would require a lot of copying back and forth).
Register allocation and code scheduling are two phases of the compiler back end. Register allocation assigns program values to the machine's registers, whereas code scheduling chooses the order in which the generated instructions are issued. The two phases interact strongly, and they can be performed in either order or interleaved with each other.
The number of registers needed at any point in the program (the register pressure) depends on the order in which instructions are scheduled, and the freedom the scheduler has depends in turn on which values are kept in registers. Because of this interaction, the two phases face a phase-ordering problem: scheduling before allocation can lengthen the lifetimes of values and increase register pressure, forcing spills to memory, while allocating registers first can introduce false (anti and output) dependences through register reuse, which restricts the scheduler. Practical compilers therefore either iterate between the two phases or use heuristics that take both concerns into account.
Control Dependence
Control-dependence constraints arise from branches: an instruction is control dependent on a branch when the outcome of that branch decides whether the instruction executes at all. A code scheduler therefore cannot, in general, move an instruction above the branch that controls it, or move it from one branch path onto another, without changing the program's behavior.
Such motion is only safe when the moved instruction has no harmful side effects on the paths where it was not originally executed, or when the compiler or hardware supports speculation, that is, executing the instruction before it is known whether its result is needed and discarding or repairing its effect if it turns out not to be. A C++ sketch of such a constraint is given below.
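A small C++ sketch (illustrative) of a control-dependence constraint: the division is control dependent on the test, so a scheduler cannot simply hoist it above the branch, because executing it when d == 0 would change the program's behavior; it could only be moved speculatively if the effect of a faulting divide were suppressed or repaired:
#include <iostream>

int safe_ratio(int n, int d) {
    int r = 0;
    if (d != 0) {
        r = n / d;   // control dependent on (d != 0): must not execute when d == 0
    }
    return r;
}

int main() {
    std::cout << safe_ratio(10, 2) << ' ' << safe_ratio(10, 0) << '\n';   // prints "5 0"
}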
Speculative execution is when a processor (or a compiler) executes instructions before it is known whether their results will actually be needed, based on assumptions about the future flow of control. When the assumptions turn out to be correct, useful work has been done early and performance improves; when they do not, the speculative results must be discarded or undone, so there is a cost to making these decisions ahead of time. Speculation is attractive for scheduling because it lets instructions be moved above branches that would otherwise constrain them.
The term speculative execution is used to describe a processor's ability to make assumptions about instructions it has not yet been told to execute. A related source of parallelism is thread-level parallelism, which allows instructions from different threads of execution to run at the same time.
The basic machine model consists of a register file, an instruction fetch unit, and an execution unit. The register file holds the values currently being operated on; each register holds one value at a time, and values are transferred between registers and memory by load and store instructions. The instruction fetch unit fetches instructions from memory as the program needs them. After being fetched, instructions are decoded and then executed by the execution unit, which determines how each instruction is carried out in its context and detects any exceptional conditions that arise.
Code scheduling covers dependency detection and resolution and parallel optimization. Code scheduling is generally applied in conjunction with traditional compilation. A code scheduler gets as input a set, or a sequence, of executable instructions and a set of precedence constraints imposed on them, frequently in the form of a DAG. As output, in each scheduling step it delivers an instruction that is free of unresolved dependences and is the best available choice for the execution slot being filled.
Traditional non-optimizing compilers can be treated as including two major parts. The front-
end part of the compiler implements scanning, parsing, and semantic analysis of the source string
and makes an intermediate representation. This intermediate form is generally described by an
attributed abstract tree and a symbol table. The back-end part, in turn, creates the object code.
Traditional optimizing compilers speed up sequential execution and reduce the needed
memory space generally by removing redundant operations. Sequential optimization needs a
program analysis, which includes control flow, data flow, and dependency analysis in the front-
end part.
There are two different approaches to merging traditional compilation and code scheduling. In the first, code scheduling is integrated into the compilation procedure. In this method, the code scheduler uses the results of the program analysis made by the front-end part of the compiler.
The code scheduler generally follows the traditional sequential optimizer in the back-end part,
before register allocation and subsequent code generation. This type of code scheduling is known
as pre-pass scheduling.
The other approach is to take the output of a traditional (sequentially) optimizing compiler and carry out code scheduling afterwards; this is called post-pass scheduling.
Code scheduling can be implemented at three different levels: basic block, loop, and global level.
The associated scheduling category or techniques are known as basic block (or local), loop, and
global techniques. These techniques increase performance in the order listed.
(i) Basic Block Scheduling: scheduling and code optimization are accomplished independently for each basic block, one after another.
(ii) Loop-Level Scheduling: the next level of scheduling is loop-level scheduling. At this level, instructions belonging to successive iterations of a loop can be overlapped, resulting in considerable speed-up.
(iii) Global Code Scheduling
In the fifth phase of compiler design, code optimization is performed. There are various code
optimization techniques. But the order of execution of code in a computer program also
matters in code optimization. Global Code Scheduling in compiler design is the process that is
performed to rearrange the order of execution of code which improves performance. It
comprises the analysis of different code segments and finding out the dependency among them.
Code Hoisting: In this technique, a code segment is moved from inside a loop to outside the loop. It is done when the output of the code segment does not change with the loop's iterations. It reduces loop overhead and redundant computation.
C++
// before code hoisting
#include <iostream>
using namespace std;
int main() {
    int x, y, b, a;
    x = 1, y = 2, a = 0;
    while (a < 10) {
        b = x + y;      // loop invariant: x + y never changes inside the loop
        cout << a;
        a++;
    }
}

// after code hoisting
int main() {
    int x, y, b, a;
    x = 1, y = 2, a = 0;
    b = x + y;          // hoisted out of the loop
    while (a < 10) {
        cout << a;
        a++;
    }
}
The next example combines two adjacent loops that run over the same range into a single loop (a transformation commonly called loop fusion), reducing loop overhead:
// before loop fusion
int main() {
    int a, b;
    a = 0, b = 1;
    for (int i = 0; i < 5; i++)
        cout << a++;
    for (int i = 0; i < 5; i++)
        cout << b++;
}

// after loop fusion
int main() {
    int a, b;
    a = 0, b = 1;
    for (int i = 0; i < 5; i++) {
        cout << a++;    // note: fusing interleaves the two bodies, so the printed
        cout << b++;    // order changes; fusion is only safe when that does not matter
    }
}
Memory Access Optimization: In this technique, the memory’s read or write operation is
moved out from the loops or blocks. This method eliminates redundant memory accesses and
enhances cache utilization.
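A small C++ sketch of the idea (illustrative function names, not from the source): the repeated read through the pointer inside the loop is replaced by a single read into a local before the loop, eliminating redundant memory accesses:
// before: *p is read from memory on every iteration
int sum_before(const int* p, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += *p;
    return sum;
}

// after: one read before the loop; the value is kept in a register-friendly local
// (assuming n > 0, so that a read is not introduced where none occurred)
int sum_after(const int* p, int n) {
    int v = *p;
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v;
    return sum;
}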
Upward Code Motion: a computation is moved to an earlier point in the program so that it is performed ahead of (and, where possible, only once for) its uses. For example:
C++
// before upward code motion
int a = 2, b = 3;       // illustrative values; a and b are not declared in the original fragment
int ans = 0;
ans += a + b;           // a + b is computed here, at the point of use
cout << ans;

// after upward code motion
int a = 2, b = 3;
int z = a + b;          // the computation is moved up, ahead of its use
int ans = 0;
ans += z;
cout << ans;
C++
// before downward code motion
int ans = 0, i = 1;     // illustrative values; not declared in the original fragment
ans += i;
cout << i;              // printing i (rather than ans) does not depend on the update above

// after downward code motion
cout << i;
ans += i;               // moved down, closer to the later point where its result is needed
Steps involved:
Step 1: Analyze the code segment that is being moved and note all the variables that depends on
it.
Step 2: If a code segment is moved from outside of the block to inside, some of the functions
may become unavailable due to the change in scope of the code. We need to introduce new
declarations inside the block.
Step 3: In this step, references of the variables are updated that are being moved. Replace
references to variables that were defined outside the block with references to the new
declarations inside the block.
Step 4: If the code segment includes assignments to variables, make sure they are updated accordingly. For example, replace references to variables that were defined outside the block with references to the new declarations inside the block.
Step 5: At last, verification is done to ensure that the moved code gives the correct result and
doesn’t produce any error.
1. Trace Scheduling:
In trace scheduling algorithm, we rearrange the instructions along traces (frequently executed
paths). This helps in improving the performance of the code.
2. List Scheduling:
In the list scheduling algorithm, the overall execution time of a program can be reduced by rescheduling instructions. Instructions can be rescheduled based on their availability or on resource constraints; a minimal sketch of the algorithm is given after the points below.
It is a flexible algorithm.
It can handle multiple constraints, such as resource constraints and instruction latencies.
It is used in modern compilers.
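A minimal sketch of list scheduling in C++ (an illustration under simplifying assumptions, not code from the source): instructions form a dependence DAG with latencies, ready instructions are repeatedly chosen by a priority (here, the longest latency path to the end of the DAG), and each chosen instruction is assigned the earliest cycle at which its operands are available on a single-issue machine:
#include <algorithm>
#include <iostream>
#include <vector>

// One node of the dependence DAG.
struct Instr {
    int latency;                 // cycles until this instruction's result is ready
    std::vector<int> succs;      // instructions that depend on this one
};

// Greedy list scheduling for a single-issue machine (one instruction per cycle).
// Nodes are assumed to be numbered in topological order.
std::vector<int> listSchedule(const std::vector<Instr>& g) {
    int n = static_cast<int>(g.size());
    std::vector<int> indeg(n, 0), prio(n, 0), readyAt(n, 0), cycle(n, -1);
    for (int u = 0; u < n; ++u)
        for (int v : g[u].succs) ++indeg[v];

    // Priority = longest latency path from the node to the end of the DAG,
    // computed in reverse topological order.
    for (int u = n - 1; u >= 0; --u) {
        prio[u] = g[u].latency;
        for (int v : g[u].succs)
            prio[u] = std::max(prio[u], g[u].latency + prio[v]);
    }

    std::vector<int> ready;
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) ready.push_back(u);

    for (int t = 0, scheduled = 0; scheduled < n; ++t) {
        // Pick the unscheduled ready instruction whose operands are available
        // at cycle t and whose priority (critical-path length) is largest.
        int best = -1;
        for (int u : ready)
            if (cycle[u] < 0 && readyAt[u] <= t && (best < 0 || prio[u] > prio[best]))
                best = u;
        if (best < 0) continue;                     // nothing can issue this cycle
        cycle[best] = t;
        ++scheduled;
        for (int v : g[best].succs) {               // successors may become ready later
            readyAt[v] = std::max(readyAt[v], t + g[best].latency);
            if (--indeg[v] == 0) ready.push_back(v);
        }
    }
    return cycle;                                   // cycle[i] = issue cycle of instruction i
}

int main() {
    // Toy DAG: 0 -> 2, 1 -> 2, 2 -> 3, where instruction 0 is a long-latency load.
    std::vector<Instr> g = { {3, {2}}, {1, {2}}, {1, {3}}, {1, {}} };
    std::vector<int> cycle = listSchedule(g);
    for (int i = 0; i < static_cast<int>(cycle.size()); ++i)
        std::cout << "instr " << i << " -> cycle " << cycle[i] << '\n';
}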
3. Modulo Scheduling:
In modulo scheduling, successive iterations of a loop are started at a fixed interval (the initiation interval), so that a new iteration begins before the previous one has finished; it is a systematic way of software pipelining a loop.
4. Software Pipelining:
In this algorithm, iterations of a loop are overlapped to improve performance: operations from several consecutive iterations execute at the same time. The main aim is to convert the parallelism that exists across loop iterations into instruction-level parallelism within the pipelined loop body.
In the code optimization phase of the compiler, the main aim is to increase the overall performance of the code. Code motion techniques are used to improve the performance of the program.
Loop Invariant Code Motion: a statement whose effect is the same on every iteration is moved out of the loop. For example:
// before loop invariant code motion
int main() {
    int sum = 0;
    for (int i = 0; i < 5; i++) {
        sum = 5;        // written as a plain assignment so the statement really is loop invariant
        cout << i;
    }
}

// after loop invariant code motion
int main() {
    int sum = 0;
    sum = 5;            // moved out of the loop
    for (int i = 0; i < 5; i++)
        cout << i;
}
The same idea applies when an expression such as a * b is computed at its point of use; it can be computed once, earlier, and then reused:
// before
int a = 2, b = 3;       // illustrative values; not declared in the original fragment
int ans = 0;
ans += a * b;
cout << ans;

// after
int z = a * b;          // the product is computed once, ahead of its use
int ans = 0;
ans += z;
cout << ans;
C++
// before dead code elimination
int sum = 0;
if (a > 0)
{
    sum = a + b;
    return sum;
}
else
{
    ...
}

// after dead code elimination
int sum = 0;
if (a > 0)
{
    sum = a + b;
    return sum;
}
In this example, the else part is removed when the compiler determines that it can never be executed (for instance, because it can prove that a > 0 always holds at this point); such unreachable code is dead code.
Loop Carried Code Motion: It is closely related to loop invariant code motion. In this technique, the loop-invariant part of the loop is moved out of the loop to reduce the number of computations.
Interaction with Dynamic Schedulers
Dynamic schedulers are used in processors. They dynamically reorder instructions and help maximize the utilization of resources.
When interacting with dynamic schedulers, the analysis of instruction dependencies must be done correctly. Dynamic schedulers determine the order of execution of instructions correctly only when this information is accurate. Thus, the analysis of instruction dependencies is important when interacting with dynamic schedulers. It involves analyzing data dependencies, control dependencies and resource dependencies. Schedulers work by assigning priorities to instructions so that they can be executed in a certain order, and they make decisions based on the availability of resources. Dynamic schedulers also help in handling conflicts when several instructions compete for the same resource. Code optimization in the compiler, in turn, helps dynamic schedulers improve instruction scheduling and resource utilization.
As is clear from the above, the compiler is a very important part of a programming language implementation. It helps programs run quickly and smoothly. But one must also be aware of the constraints imposed by the hardware on these machines. These constraints may not affect all compilers equally; some might be more powerful than others, while others simply cannot handle them well at all. So, if you want to write programs that are compatible with the machines available today or in future generations, there are certain things that need to be taken care of before writing even a single line of code.
Practice questions
What is code scheduling in compiler design?
Reference
https://www.geeksforgeeks.org/code-scheduling-constraints/