Compiler Construction
▪ Don’ts
◦ Use of cell phones
◦ Discussion with fellows during class (unless told otherwise)
Compilers
▪ A compiler is a program that can read a program in one language (the source
language) and translate it into an equivalent program in another language (the target
language).
▪ An important role of the compiler is to report any errors in the source program that it
detects during the translation process.
▪ If the target program is an executable machine-language program, it can then be called
by the user to process inputs and produce outputs.
Interpreters
▪ An interpreter does not produce a separate target program; instead, it translates and
executes the source program line by line while the program is running.
Compiler vs Interpreter
▪ A compiler takes the entire program as input and spends more time analyzing the source code,
whereas an interpreter takes a single line of code at a time and spends very little time analyzing it.
▪ A compiler generates intermediate object code, whereas an interpreter does not produce
any intermediate object code.
▪ A compiler's memory requirement is higher due to the creation of object code, whereas an
interpreter requires less memory because it does not create intermediate object code.
▪ A compiler displays all errors together, after compilation, whereas an interpreter displays
the error in each line one by one, as it reaches that line.
▪ C, C++, and C# are examples of languages that are typically compiled, whereas Python is an
example of a language that is typically interpreted.
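The error-reporting difference can be sketched with a toy example. The mini-language (each line is either NAME = INTEGER or print NAME) and the functions compile_program and interpret are hypothetical, made up for illustration: the compiler-style function checks every line before producing any target code and reports all errors together, while the interpreter-style function executes each line as it goes and stops at the first bad line.

```python
import re

# Hypothetical toy language: each line is either "NAME = INTEGER" or "print NAME".
ASSIGN = re.compile(r"^([a-z]\w*)\s*=\s*(-?\d+)$")
PRINT = re.compile(r"^print\s+([a-z]\w*)$")


def compile_program(lines):
    """Compiler-style: analyze the WHOLE program first and report all errors together.

    Returns (target_code, errors); target code is produced only if there are no errors.
    """
    code, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if m := ASSIGN.match(line):
            code.append(("STORE", m.group(1), int(m.group(2))))
        elif m := PRINT.match(line):
            code.append(("PRINT", m.group(1)))
        else:
            errors.append(f"line {lineno}: syntax error: {line!r}")
    return (None, errors) if errors else (code, [])


def interpret(lines):
    """Interpreter-style: translate and execute one line at a time, stop at the first error.

    Returns (outputs, error); outputs collects the values printed before stopping.
    """
    env, outputs = {}, []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if m := ASSIGN.match(line):
            env[m.group(1)] = int(m.group(2))
        elif m := PRINT.match(line):
            name = m.group(1)
            if name not in env:
                return outputs, f"line {lineno}: undefined name {name!r}"
            outputs.append(env[name])
        else:
            return outputs, f"line {lineno}: syntax error: {line!r}"
    return outputs, None


program = ["x = 1", "print x", "y = = 2", "print y", "z 3"]
code, errors = compile_program(program)
print(code, errors)    # no target code; errors for lines 3 AND 5 reported together
outputs, error = interpret(program)
print(outputs, error)  # line 2 already ran (printed 1) before line 3 failed
```

Note that the interpreter has already executed lines 1 and 2 when it hits the error on line 3, and it never looks at line 5; the compiler produces nothing until the whole program is clean.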
Example of Java Compilation Process
▪ Java language processors combine compilation and interpretation.
▪ A Java source program may first be compiled into an intermediate form called
bytecodes. The bytecodes are then interpreted by a virtual machine.
▪ A benefit of this arrangement is that bytecodes
compiled on one machine can be interpreted on
another machine.
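A minimal sketch of the compile-then-interpret arrangement, assuming a made-up stack-based instruction set (PUSH, ADD, SUB, MUL) rather than real Java bytecode: compile_expr translates an expression tree into a list of instructions, the list is serialized to text as if saved to a file, and run_vm (a hypothetical virtual machine) interprets it. Any machine that has run_vm could execute the same instruction list.

```python
import json

# Made-up instruction set for a toy stack-based virtual machine
# (only the IDEA of bytecodes; this is NOT real JVM bytecode).
OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def compile_expr(node, code=None):
    """Compile a nested-tuple expression like ('+', 2, ('*', 3, 4)) into bytecode."""
    if code is None:
        code = []
    if isinstance(node, int):
        code.append(["PUSH", node])
    else:
        op, left, right = node
        compile_expr(left, code)   # code for the left operand first
        compile_expr(right, code)  # then the right operand
        code.append([OPS[op]])     # then the operator, which pops both
    return code


def run_vm(code):
    """Interpret bytecode on an operand stack and return the final result."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
            continue
        b, a = stack.pop(), stack.pop()
        if instr[0] == "ADD":
            stack.append(a + b)
        elif instr[0] == "SUB":
            stack.append(a - b)
        elif instr[0] == "MUL":
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction {instr[0]}")
    return stack.pop()


# "Compile on one machine": serialize the bytecode to a portable text form.
bytecode_file = json.dumps(compile_expr(("+", 2, ("*", 3, 4))))
# "Interpret on another machine": load the bytecode and run it on the VM.
print(run_vm(json.loads(bytecode_file)))  # 14
```

The serialized instruction list depends only on the toy VM's instruction set, not on the processor that produced it, which is the portability benefit described above.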
Why Study Compiler Construction?
▪ A machine understands only binary (machine) language, so a compiler is needed to convert
programs written in a high-level language into a low-level language.
▪ Anyone who does any software development needs to use a compiler. It is a good idea
to understand what is going on inside the tools that you use.
Generations of Programming Languages
▪ First Generation of PL (Machine Language)
▪ Second Generation of PL (Assembly Language)
▪ Third Generation of PL (Procedural Language)
▪ Fourth Generation of PL (Very High Level Language)
▪ Fifth Generation of PL
First Generation of PL / Machine Language
▪ First-generation languages (1GLs) are also called machine languages. They are
machine-dependent. Machine language statements are written in binary code (0/1 form)
because the computer can understand only binary language.
▪ The first electronic computers appeared in the 1940s and were programmed in
machine language by sequences of 0's and 1's that explicitly told the computer what
operations to execute and in what order.
▪ The operations themselves were very low level: move data from one location to
another, add the contents of two registers, compare two values, and so on.
▪ The main advantage of programming in 1GL is that the code can run very fast and very
efficiently, precisely because the instructions are executed directly by the central
processing unit (CPU).
▪ One of the main disadvantages of programming in a low-level language is that when an
error occurs, the code is harder to fix.
Second Generation of PL / Assembly Language
▪ Second-generation programming languages also belong to the category of low-level
programming languages. The second generation comprises assembly languages, which use
mnemonics to write programs.
▪ Assembly languages were introduced in the 1950s to mitigate the error-prone and excessively
difficult nature of binary programming.
▪ Mnemonic: The English word "mnemonic" means "a device, such as a pattern of letters,
ideas, or associations, that assists in remembering something." Assembly language
programmers use mnemonics to remember the operations a machine can perform, such as
ADD, MUL, and MOV. The set of mnemonics is specific to each assembler.
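The following sketch shows what an assembler does with mnemonics: it looks each one up and encodes it as bits. The 8-bit instruction format, the opcode values, and the register names R0 to R3 are invented for this example and do not correspond to any real processor.

```python
# Invented 8-bit instruction format for an imaginary machine (illustration only):
#   bits 7-4 = opcode, bits 3-2 = destination register, bits 1-0 = source register
OPCODES = {"MOV": 0b0001, "ADD": 0b0010, "MUL": 0b0011}
REGISTERS = {"R0": 0, "R1": 1, "R2": 2, "R3": 3}


def assemble_line(line):
    """Translate one line such as 'ADD R1, R2' into an 8-bit binary string."""
    mnemonic, operands = line.upper().split(maxsplit=1)
    dst, src = (op.strip() for op in operands.split(","))
    # Pack opcode and register numbers into their bit fields.
    word = (OPCODES[mnemonic] << 4) | (REGISTERS[dst] << 2) | REGISTERS[src]
    return format(word, "08b")


def assemble(program):
    """Assemble every non-blank line of a program."""
    return [assemble_line(line) for line in program if line.strip()]


program = ["MOV R0, R1", "ADD R0, R2", "MUL R0, R3"]
for line, bits in zip(program, assemble(program)):
    print(f"{line:<12}-> {bits}")
```

With this invented format, the three lines assemble to 00010001, 00100010, and 00110011: exactly the kind of 0/1 sequences a first-generation programmer would otherwise have written by hand.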
Third Generation of PL / Procedural Language
▪ The third-generation programming languages were designed to overcome the various
limitations of the first and second-generation programming languages.
▪ Third-generation languages are also called procedural languages. They use a series
of English-like words, which humans can understand easily, to write instructions. They are
also called high-level programming languages.
▪ For execution, a program in such a language needs to be translated into machine
language using a compiler or interpreter.
▪ C, C++, C#, and Java are high-level languages.
Fourth Generation of PL
▪ The fourth-generation programming language is one step ahead of the third-generation
programming language. Programs are much easier to write and debug than in 3GLs.
▪ Built-in GUI (graphical user interface) objects such as buttons, drop-down menus, and
add-ins are provided, so no separate code needs to be written for them. These
languages are developed with the goal of solving a particular class of problems.
▪ Fourth-generation languages are designed for specific applications, such as SQL
for database queries.
Fifth Generation of PL
▪ Fifth-generation languages are also called 5GLs. They are based on the concept of
artificial intelligence.
▪ Rather than solving a problem algorithmically, an application is built to solve it based on
constraints given to the program, i.e., the computer is made to work out how to solve the problem.
▪ The widespread use of 5GLs has not become a reality yet, and they are still largely in the
research phase. 5GLs are mostly used in artificial intelligence research.
High Level Language
▪ These are programmer-friendly languages that are manageable, easy to understand and
debug, and widely used today.
▪ These are very easy to execute.
▪ High-level languages require the use of a compiler or an interpreter for their
translation into machine code.
▪ These languages have lower memory efficiency, meaning that they typically consume more
memory than low-level languages.
▪ High-level languages are human-friendly. They are, thus, very easy for programmers to
understand and learn.
Low Level Language
▪ These are machine-friendly languages that are very difficult for human beings to understand
but easy for machines to process.
▪ These are very difficult to execute.
▪ These languages have a very high memory efficiency, meaning that they consume less
memory than high-level languages.
▪ Low-level languages are machine-friendly. They are, thus, very difficult for humans to
understand and learn.
Advantages of High Level Language
▪ Easy to understand and debug
▪ Easy to execute
▪ Portable from one device to another
▪ High-level languages are human-friendly
Cousins of Compiler / Language Processing System
▪ In addition to a compiler, several other programs may be required to create an
executable target program:
▪ Preprocessor
▪ A preprocessor is a tool that produces input for compilers
▪ A source program may be divided into modules stored in separate files. The task of collecting
the source program is sometimes entrusted to a separate program, called a preprocessor.
▪ File inclusion: A preprocessor may also include header files, such as <iostream>, into the
program text.
▪ Macro processing: The preprocessor may also expand shorthands, called macros, into source
language statements.
▪ The modified source program is then fed to a compiler.
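A toy sketch of the two preprocessor tasks above, file inclusion and macro expansion, for a C-like input. The in-memory HEADERS dictionary stands in for files on disk and the function preprocess is hypothetical; it handles only #include "file" and simple #define NAME value lines, unlike a real C preprocessor (no function-like macros, conditionals, or string handling).

```python
import re

# Hypothetical in-memory "header files" standing in for files on disk.
HEADERS = {"consts.h": "#define PI 3.14159\n#define TWO 2"}


def preprocess(source, headers=HEADERS, macros=None):
    """Tiny C-like preprocessor sketch: #include "file" and object-like #define only."""
    if macros is None:
        macros = {}
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if m := re.match(r'#include\s+"([^"]+)"', stripped):
            # File inclusion: preprocess the included text and splice it in.
            included = preprocess(headers[m.group(1)], headers, macros)
            if included:
                out.append(included)
        elif m := re.match(r"#define\s+(\w+)\s+(.*)", stripped):
            # Macro definition: remember NAME -> replacement; emits no output line.
            macros[m.group(1)] = m.group(2)
        else:
            # Macro expansion: replace whole-word uses of each defined macro name.
            for name, value in macros.items():
                line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
            out.append(line)
    return "\n".join(out)


src = '#include "consts.h"\ndouble area = PI * r * r;\nint n = TWO;'
print(preprocess(src))
```

The output contains no directives, only ordinary source statements with PI and TWO replaced; that modified text is what would be handed to the compiler.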
▪ Compiler
▪ The compiler may produce an assembly-language program as its output, because assembly
language is easier to produce as output and is easier to debug.
▪ Assembler
▪ The assembly language is then processed by a program called an assembler that produces
relocatable machine code as its output
▪ Linker/Loader
▪ A linker links the separately compiled parts of a program together into a single executable
file. A loader then loads this executable file into memory for execution.
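A simplified sketch of linking and loading, using an invented object-module format: each module has relocatable code (addresses start at 0) plus a table of the symbols it defines, and CALL instructions refer to symbols by name. The hypothetical link function assigns each module a base address, resolves names to addresses, and produces one image; load copies the image into a simulated memory and adjusts CALL targets for the chosen load address. Real object-file formats, linkers, and loaders are considerably more involved.

```python
# Invented object-module format (illustration only):
#   "code":    relocatable instructions; ("CALL", "name") refers to a symbol by name
#   "symbols": symbol name -> offset of its definition within this module


def link(modules):
    """Combine relocatable modules into one executable image with resolved addresses."""
    symbol_table, base = {}, 0
    # Pass 1: give each module a base address and record every symbol's final address.
    for mod in modules:
        for name, offset in mod["symbols"].items():
            if name in symbol_table:
                raise ValueError(f"duplicate symbol {name!r}")
            symbol_table[name] = base + offset
        base += len(mod["code"])
    # Pass 2: concatenate the code, replacing symbolic CALL targets by addresses.
    image = []
    for mod in modules:
        for instr in mod["code"]:
            if instr[0] == "CALL":
                if instr[1] not in symbol_table:
                    raise ValueError(f"undefined symbol {instr[1]!r}")
                instr = ("CALL", symbol_table[instr[1]])
            image.append(instr)
    return image, symbol_table


def load(image, memory, load_address):
    """Copy the image into memory, relocating CALL targets; return the entry address."""
    for i, instr in enumerate(image):
        if instr[0] == "CALL":
            instr = ("CALL", instr[1] + load_address)
        memory[load_address + i] = instr
    return load_address


main_mod = {"code": [("CALL", "helper"), ("HALT",)], "symbols": {"main": 0}}
lib_mod = {"code": [("PRINT",), ("RET",)], "symbols": {"helper": 0}}
image, table = link([main_mod, lib_mod])
memory = [None] * 16
entry = load(image, memory, load_address=8)
print(image)
print(memory[entry:entry + len(image)])
```

Here helper, defined at offset 0 of the second module, ends up at address 2 in the linked image and at address 10 once the loader places the image at address 8.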
Two Pass Compiler
▪ There are two parts to compilation:
1. Analysis phase
2. Synthesis phase
Analysis-Synthesis Model of Compilation
▪ The analysis part breaks up the source program into constituent pieces and creates an
intermediate representation of the source program
▪ The synthesis part constructs the desired target program from the intermediate
representation.
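The two parts can be sketched for a single assignment statement. Under the assumptions of this example (a made-up grammar with +, -, *, / and parentheses, and hypothetical function names), the analysis part tokenizes the source and parses it into a syntax tree that serves as the intermediate representation, and the synthesis part walks that tree to emit three-address-style instructions. Emitting real machine code is left out; the three-address output stands in for the target program.

```python
import re
from itertools import count

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/=()])")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


# ---------- Analysis: source text -> tokens -> syntax tree (the IR) ----------
def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"cannot tokenize {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_assignment(tokens):
    """assignment -> NAME '=' expr ; returns the tree ('=', name, expr_tree)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'more input'}, got {tok!r}")
        pos += 1
        return tok

    def expr():  # expr -> term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            node = (take(), node, term())
        return node

    def term():  # term -> factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            node = (take(), node, factor())
        return node

    def factor():  # factor -> NUMBER | NAME | '(' expr ')'
        tok = take()
        if tok == "(":
            node = expr()
            take(")")
            return node
        if tok.isdigit() or IDENT_RE.fullmatch(tok):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    target = take()
    if not IDENT_RE.fullmatch(target):
        raise SyntaxError(f"cannot assign to {target!r}")
    take("=")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


# ---------- Synthesis: syntax tree -> three-address code ----------
def gen_three_address(tree):
    code, temps = [], count(1)

    def gen(node):
        if isinstance(node, str):  # leaf: a name or a number
            return node
        op, left, right = node
        l, r = gen(left), gen(right)
        t = f"t{next(temps)}"      # fresh temporary for this subexpression
        code.append(f"{t} = {l} {op} {r}")
        return t

    _, target, expr_tree = tree
    code.append(f"{target} = {gen(expr_tree)}")
    return code


for stmt in ["a = b + c * 2", "x = (a + b) * c"]:
    tree = parse_assignment(tokenize(stmt))     # analysis
    print(tree)
    print("\n".join(gen_three_address(tree)))  # synthesis
```

Because the grammar gives * higher precedence than +, the first statement becomes t1 = c * 2, t2 = b + t1, a = t2; the synthesis code never looks at the source text, only at the tree.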
Analysis Model of Compilation
▪ The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them.
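The first step of analysis, breaking the source into constituent pieces, can be sketched as a lexer that groups characters into tokens and records each identifier in a symbol table. The token classes, the lex function, and the (kind, value) token format below are assumptions made for illustration.

```python
import re

# Token classes for a tiny assignment language (an illustrative assumption).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/()]"),
    ("skip", r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(text):
    """Split source text into tokens; identifiers become ('id', symbol-table index)."""
    tokens, symbol_table, pos = [], [], 0
    while pos < len(text):
        m = MASTER_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue  # whitespace separates tokens but is not itself a token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme,))  # operators stand for themselves
    return tokens, symbol_table


tokens, symbols = lex("position = initial + rate * 60")
print(tokens)   # [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('number', 60)]
print(symbols)  # ['position', 'initial', 'rate']
```

A parser would then impose grammatical structure on this token stream, for example by grouping rate * 60 into a subexpression before the addition; that step is not shown here.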