
UNIT-1

Introduction to Compilers
TEXT BOOK: Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman
TOPICS IN UNIT-1

1. Introduction to compilers: Language Processor, the structure of a compiler,
2. Evolution of Programming language, Science of Building
Compiler,
3. Applications of Compiler Technology, Programming
Language Basics
4. Lexical Analysis: - The role of lexical analyzer, Input
Buffering, Specification of Tokens, Recognition of
Tokens,
5. Syntax Analysis: Introduction, Context-Free Grammar,
Writing a Grammar
LANGUAGE PROCESSORS: COMPILERS

• A compiler is a program that can read a program in one language (the source language) and translate it into an equivalent program in another language (the target language).
• An important role of the compiler is to report any errors in the
source program that it detects during the translation
process.
LANGUAGE PROCESSORS: INTERPRETERS
• An interpreter is a program that directly executes the source program line by line, without first producing a separate target program.
• Working of interpreters:
 An interpreter reads the source code line by line
 It translates the code into machine-readable instructions
 It executes the instructions immediately
• Compilers translate and execute a program at once, while interpreters convert it one line at a time.
• Compilers are used for programming languages like C, C++, and Java.
• Some programming languages (scripting languages) that use interpreters: Python, Ruby, JavaScript, Perl, and PHP.
JAVA LANGUAGE PROCESSOR
• Example 1.1 : Java language processors
combine compilation and interpretation, as
shown in Fig. 1.4. A Java source program
may first be compiled into an intermediate
form called bytecodes.
• The bytecodes are then interpreted by a
virtual machine. A benefit of this
arrangement is that bytecodes compiled on
one machine can be interpreted on another
machine, perhaps across a network.
• In order to achieve faster processing of inputs to outputs, some Java compilers, called just-in-time compilers, translate the bytecodes into machine language immediately before they run the intermediate program to process the input.
• A Java virtual machine (JVM) is a virtual computer that runs Java programs on different operating systems.
LANGUAGE PROCESSING SYSTEM
• Source Program: The original code written by
the programmer in a high-level language.
• Preprocessor: Handles directives (like
#include in C/C++) and macro expansions,
preparing the code for compilation.
• Compiler: Translates the preprocessed code
into assembly language, a low-level
representation of the program.
• Assembler: Converts assembly language code
into machine code, producing object files.
• Linker: Combines multiple object files and
libraries into a single executable file, resolving
external references between them.
• Loader: Loads the executable file into memory,
setting up the environment for the program to
run.
• Execution: The program is executed by the
system's processor.
LANGUAGE PROCESSING SYSTEM: PREPROCESSOR

• A Language Processing System consists of multiple programs working together to convert a source program into an executable file that a computer can run. The process is as follows:
1. Source Program and Preprocessor
• A source program is written in a high-level language (e.g., C,
Java).
• It may be split into multiple modules (separate files).
• A preprocessor is a special program that:
• Combines multiple source files into one.
• Expands macros (short abbreviations) into full statements.
• Handles directives like #include (in C/C++).
• For example, if a macro defines PI as 3.14, the preprocessor replaces every occurrence of PI with 3.14 before compilation.
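This macro-expansion step can be sketched as a toy Python function (an illustration only, not a real C preprocessor; the name expand_macros and the single-pass substitution scheme are assumptions):

```python
import re

def expand_macros(source):
    """Toy macro expander: record #define lines, substitute names elsewhere."""
    macros, output = {}, []
    for line in source.splitlines():
        m = re.match(r"\s*#define\s+(\w+)\s+(.+)", line)
        if m:
            macros[m.group(1)] = m.group(2)      # record the abbreviation
            continue
        for name, body in macros.items():
            line = re.sub(rf"\b{name}\b", body, line)  # expand whole-word uses
        output.append(line)
    return "\n".join(output)

src = "#define PI 3.14\narea = PI * r * r;"
print(expand_macros(src))   # area = 3.14 * r * r;
```

A real preprocessor additionally handles #include, conditional compilation, and function-like macros.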
LANGUAGE PROCESSING SYSTEM:
Compilation and Relocatable code
2. Compilation and Relocatable Machine Code
• The compiler converts the preprocessed source code into
relocatable machine code (object file).
• Since large programs are compiled in pieces, different object files
are created.

• main.c is converted into the main.o object file.
• math.c is converted into the math.o object file.
Object Files vs. Executable Files
Stage of Creation:
• Object Files: Created during the compilation stage as intermediate outputs.
• Executable Files: Generated during the linking stage as the final output.

Completeness:
• Object Files: Incomplete; may have unresolved external references.
• Executable Files: Complete; all references are resolved, and the program is
ready to run.

Dependency:
• Object Files: Depend on other object files and libraries to form a complete
program.
• Executable Files: Self-contained and can be executed independently.
LANGUAGE PROCESSING SYSTEM: LINKING and LOADING

3. LINKER: A linker takes multiple object files and library files and combines them into a single executable file.
• It resolves external references, meaning it connects
function calls from one file to the actual function
definitions in another file.
• Example:
• main.o calls sqrt() function.
• math.o contains the definition of sqrt().
• The linker connects them into a final executable file.
4. LOADER:
• A loader loads the executable file into memory.
• It assigns actual memory addresses and prepares the
program for execution.
• The a.out or .exe file is loaded into memory and executed by the
CPU.
LANGUAGE PROCESSING SYSTEM
• Preprocessor – Expands macros, includes files.
• Compiler – Converts source code to machine code
(object files).
• Linker – Combines object files into an executable.
• Loader – Loads the program into memory for execution
1.2 STRUCTURE OF A COMPILER
• Compiler is divided into two parts : analysis and
synthesis.
• The analysis part breaks up the source
program into constituent pieces and
imposes a grammatical structure on
them.
• It then uses this structure to create an intermediate representation of the source program.
• If the analysis part detects that the source program is either syntactically ill formed or semantically unsound, then it must provide informative messages, so the user can take corrective action.
• The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
• The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table.
COMPILER
STRUCTURE:
EXAMPLE
1.2.1 Lexical Analysis (Scanning)
• The first phase of a compiler is called lexical analysis or scanning.
• The lexical analyzer reads the stream of characters making up the
source program and groups the characters into meaningful sequences
called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of
the form (token-name, attribute-value) that it passes on to the
subsequent phase, syntax analysis.
• In the token, the first component token-name is an abstract symbol
that is used during syntax analysis, and the second component
attribute-value points to an entry in the symbol table for this token.
• Information from the symbol-table entry is needed for semantic analysis and code generation.
1.2.1 Lexical Analysis for position = initial + rate * 60

• For example, suppose a source program contains the assignment statement position = initial + rate * 60.
• The characters in this assignment could be grouped into
the following lexemes and mapped into the following
tokens passed on to the syntax analyzer:
 “position” is a lexeme that would be mapped into a token
(id, 1), where id is an abstract symbol standing for
identifier and 1 points to the symbol table entry for
“position”. The symbol-table entry for an identifier holds
information about the identifier, such as its name and type.
 initial is a lexeme that is mapped into the token (id, 2),
where 2 points to the symbol-table entry for initial .
 “+” is a lexeme that is mapped into the token (+).
 “rate” is a lexeme that is mapped into the token (id, 3),
where 3 points to the symbol-table entry for rate
 “*” is a lexeme that is mapped into the token (*) .
 “60” is a lexeme that is mapped into the token (60)
• Blanks separating the lexemes would be discarded by the
lexical analyzer.
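The grouping above can be sketched as a small lexical analyzer (illustrative only, not the textbook's code; the token specification, tuple format, and symbol-table layout are assumptions):

```python
import re

# Token specification: identifiers, integer constants, and single-character
# operators; whitespace is matched so it can be discarded.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("op",     r"[=+\-*/]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    symbol_table = []       # identifier names; entry number is index + 1
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue        # blanks separating lexemes are discarded
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)       # enter identifier once
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme,))              # operators name themselves
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
# tokens -> [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('number', 60)]
# table  -> ['position', 'initial', 'rate']
```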
1.2.2 SYNTAX
ANALYSIS
• The second phase of the compiler is
syntax analysis or parsing.
• The parser uses the first components of the
tokens produced by the lexical analyzer to
create a tree-like intermediate
representation that depicts the
grammatical structure of the token
stream.
• A typical representation is a syntax tree in
which each interior node represents an
operation, and the children of the node
represent the arguments of the operation.
1.2.2 SYNTAX ANALYSIS for position = initial +
rate * 60
• This tree shows the order in which the operations in the
assignment position = initial + rate * 60 are to be performed.
• The tree has an interior node labeled * with (id, 3) as its left child and the integer 60 as its right child. The node (id, 3) represents the identifier rate. The node labeled * makes it explicit that we must first multiply the value of rate by 60.
• The node labeled + indicates that we must add the result of this multiplication to the value of initial.
• The root of the tree, labeled =, indicates that we must store the result of this addition into the location for the identifier position.
• This ordering of operations is consistent with the usual
conventions of arithmetic which tell us that multiplication
has higher precedence than addition, and hence that the
multiplication is to be performed before the addition.
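The tree above can be produced by a small recursive-descent parser sketch over the token stream from the lexical analyzer (the grammar, tuple node format, and function names are assumptions for illustration):

```python
# Assumed grammar, with * binding tighter than +:
#   assign -> id "=" expr
#   expr   -> term { "+" term }
#   term   -> factor { "*" factor }

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok
    def factor():                       # factor -> id | number (a leaf)
        return advance()
    def term():                         # term -> factor { "*" factor }
        node = factor()
        while peek() == ("*",):
            advance()
            node = ("*", node, factor())   # multiplication grouped first
        return node
    def expr():                         # expr -> term { "+" term }
        node = term()
        while peek() == ("+",):
            advance()
            node = ("+", node, term())
        return node
    target = advance()                  # the identifier being assigned
    advance()                           # consume "="
    return ("=", target, expr())

tree = parse([("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("number", 60)])
# tree -> ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("number", 60))))
```

Because term() groups * operands before expr() sees them, the multiplication ends up deeper in the tree than the addition, matching the precedence convention described above.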
1.2.3 SEMANTIC ANALYSIS
• The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.
• It also gathers type information and saves it in either the syntax tree or the symbol
table, for subsequent use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the compiler checks
that each operator has matching operands.
• For example, many programming language definitions require an array index to be an
integer; the compiler must report an error if a floating-point number is used to
index an array.
• The language specification may permit some type conversions called coercions.
• For example, a binary arithmetic operator may be applied to either a pair of integers or
to a pair of floating-point numbers. If the operator is applied to a floating-point number
and an integer, the compiler may convert or coerce the integer into a floating-point
number.
1.2.3 SEMANTIC
ANALYSIS EXAMPLE
• Such a coercion appears in the figure. Suppose that position, initial, and rate have been declared to be floating-point numbers, and that the lexeme 60 by itself forms an integer.
• The type checker in the semantic analyzer discovers that the operator * is applied to a floating-point number rate and an integer 60.
• In this case, the integer may be converted into a
floating-point number.
• In Fig. 1.7, notice that the output of the semantic
analyzer has an extra node for the operator
inttofloat, which explicitly converts its integer
argument into a floating-point number
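The insertion of the inttofloat node can be sketched as a walk over the syntax tree (the tuple representation and the typing rules below are assumptions; identifiers are treated as floating point and number leaves as integers, as in the example):

```python
def type_of(node):
    if node[0] == "id":
        return "float"            # declared floating point, as in the example
    if node[0] == "number":
        return "int"              # the lexeme 60 by itself forms an integer
    return node[-1]               # operator nodes carry their computed type

def coerce(node):
    """Wrap integer operands of float-valued operators in an inttofloat node."""
    if node[0] in ("=", "+", "*"):
        op, left, right = node[0], coerce(node[1]), coerce(node[2])
        if type_of(left) == "float" and type_of(right) == "int":
            right = ("inttofloat", right, "float")   # the extra node of Fig. 1.7
        return (op, left, right, "float")
    return node

tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("number", 60))))
typed = coerce(tree)
# The * node's right child is now ("inttofloat", ("number", 60), "float").
```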
1.2.4 INTERMEDIATE CODE
GENERATION
• In the process of translating a source program into target code, a compiler may
construct one or more intermediate representations, which can have a variety of
forms.
• Syntax trees are a form of intermediate representation; they are commonly used
during syntax and semantic analysis.
• After syntax and semantic analysis of the source program, many compilers
generate an explicit low-level or machine-like intermediate representation, which
we can think of as a program for an abstract machine.
• This intermediate representation should have two important properties: it should
be easy to produce and it should be easy to translate into the target machine.
• We consider an intermediate form called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction. Each operand can act like a register.
INTERMEDIATE
CODE
GENERATION
• The output of the intermediate code generator in Fig. 1.7 consists of
the three-address code sequence for position = initial + rate * 60
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
• First, each three-address assignment instruction has at most one
operator on the right side. Thus, these instructions fix the order in
which operations are to be done; the multiplication precedes the
addition in the source program
• Second, the compiler must generate a temporary name to hold the
value computed by a three-address instruction. Third, some "three-
address instructions" like the first and last in the sequence (1.3),
above, have fewer than three operands.
1.2.5 CODE OPTIMIZATION
• The machine-independent code-
optimization phase attempts to improve the
intermediate code so that better target code
will result
• Usually better means faster, but other
objectives may be desired, such as shorter
code, or target code that consumes less
power.
• The optimizer can deduce that the
conversion of 60 from integer to floating
point can be done once and for all at
compile time, so the inttofloat operation
can be eliminated by replacing the integer
60 by the floating-point number 60.0.
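The fold described above can be sketched over the three-address list (a toy illustration; the plain-text substitution is simplistic, and real optimizers work on structured instructions rather than strings):

```python
def fold_inttofloat(code):
    """Replace inttofloat of an integer constant with the constant itself."""
    optimized, folded = [], {}
    for instr in code:
        dest, rhs = instr.split(" = ", 1)
        for name, value in folded.items():
            rhs = rhs.replace(name, value)        # forward-substitute folded temps
        prefix = "inttofloat("
        if rhs.startswith(prefix) and rhs[len(prefix):-1].isdigit():
            # conversion done once and for all at compile time: 60 -> 60.0
            folded[dest] = str(float(rhs[len(prefix):-1]))
            continue                              # no runtime instruction emitted
        optimized.append(f"{dest} = {rhs}")
    return optimized

code = ["t1 = inttofloat(60)", "t2 = id3 * t1", "t3 = id2 + t2", "id1 = t3"]
print(fold_inttofloat(code))
# ['t2 = id3 * 60.0', 't3 = id2 + t2', 'id1 = t3']
```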
1.2.6. CODE GENERATION for position = initial +
rate * 60
• The code generator takes as input an intermediate representation of the source
program and maps it into the target language.
• If the target language is machine code, registers or memory locations are selected
for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of machine
instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of registers to hold
variables.
• For example, the following intermediate code
t1 = id3 * 60.0
id1 = id2 + t1
is converted into the following, where the 1st operand of each instruction is the destination:
LDF R2, id3  R2 = id3
MULF R2, R2, 60.0  R2 = R2 * 60.0
LDF R1, id2  R1 = id2
ADDF R1, R1, R2  R1 = R1 + R2
STF id1, R1  id1 = R1
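A naive sketch of this translation step. It follows the opcode pattern of the example, but a simple allocator that assigns registers in order of first use produces different register numbering than the slide (temporaries are assumed to be named t1, t2, ...):

```python
def emit_target(code):
    asm, reg_of = [], {}
    counter = 0
    def load(operand):
        nonlocal counter
        if operand in reg_of:
            return reg_of[operand]                 # temporaries stay in registers
        if operand.replace(".", "", 1).isdigit():
            return operand                         # constants stay immediate
        counter += 1
        reg = f"R{counter}"
        asm.append(f"LDF {reg}, {operand}")        # load a variable from memory
        return reg
    for instr in code:
        dest, left, op, right = instr.replace(" = ", " ").split(" ")
        r1, r2 = load(left), load(right)
        asm.append(f"{'MULF' if op == '*' else 'ADDF'} {r1}, {r1}, {r2}")
        if dest.startswith("t"):                   # temporary: keep in register
            reg_of[dest] = r1
        else:
            asm.append(f"STF {dest}, {r1}")        # store program variable back
    return asm

for line in emit_target(["t1 = id3 * 60.0", "id1 = id2 + t1"]):
    print(line)
# LDF R1, id3
# MULF R1, R1, 60.0
# LDF R2, id2
# ADDF R2, R2, R1
# STF id1, R2
```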
1.2.7. SYMBOL TABLE MANAGEMENT
• An essential function of a compiler is to record the variable names used in the
source program and collect information about various attributes of each name.
• These attributes may provide information about the storage allocated for a name,
its type, its scope (where in the program its value may be used), and in the case of
procedure names, such things as the number and types of its arguments, the
method of passing each argument (for example, by value or by reference), and the
type returned.
• The symbol table is a data structure containing a record for each variable
name, with fields for the attributes of the name.
• The data structure should be designed to allow the compiler to find the record for
each name quickly and to store or retrieve data from that record quickly
1.2.8. Compiler-Construction Tools
• The compiler writer, like any software developer, can profitably use modern software
development environments containing tools such as language editors, debuggers, version
managers, profilers, test harnesses, and so on.
• In addition to these general software-development tools, other more specialized tools have
been created to help implement various phases of a compiler
• These tools use specialized languages for specifying and implementing specific
components, and many use quite sophisticated algorithms.
• The most successful tools are those that hide the details of the generation algorithm and
produce components that can be easily integrated into the remainder of the compiler.
• Some commonly used compiler-construction tools include:
• Parser generators
• Scanner generators
• Syntax-directed translation
• Code-generator generators
• Data-flow analysis engines
• Compiler-construction toolkits
1.2.8. Compiler-Construction Tools
1. Parser generators that automatically produce syntax analyzers from a
grammatical description of a programming language.
2. Scanner generators that produce lexical analyzers from a regular-expression
description of the tokens of a language.
3. Syntax-directed translation engines that produce collections of routines for
walking a parse tree and generating intermediate code.
4. Code-generator generators that produce a code generator from a collection of
rules for translating each operation of the intermediate language into the machine
language for a target machine.
5. Data-flow analysis engines that facilitate the gathering of information about
how values are transmitted from one part of a program to each other part. Data-
flow analysis is a key part of code optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for
constructing various phases of a compiler.
1.3 THE EVOLUTION OF PROGRAMMING LANGUAGES

• The first electronic computers appeared in the 1940's and were programmed in machine language by sequences of 0's and 1's that explicitly told the computer what operations to execute and in what order.
• The operations themselves were very
low level: move data from one
location to another, add the contents of
two registers, compare two values, and
so on.
• Needless to say, this kind of
programming was slow, tedious, and
error prone. And once written, the
programs were hard to understand and
modify.
1.3.1 The Move to Higher-level Languages
Classification by Generation:
• 1st Generation (Machine Languages): Low-level binary code understood directly by computers.
• 2nd Generation (Assembly Languages): Mnemonic codes representing machine instructions.
• 3rd Generation (Fortran, COBOL, Lisp, C, C++, C#, Java): Higher-level languages improving programmer productivity and program readability.
• 4th Generation (NOMAD, SQL, Postscript): Application-specific languages for report generation, database queries, and text formatting.
• 5th Generation (Prolog, OPS5): Logic- and constraint-based languages focusing on problem solving.
1.3.1 The Move to Higher-level Languages
Classification by Paradigm:
• Imperative Languages (C, C++, C#, Java): Specify how computations are to be done; involve program state and state-changing statements.
• Declarative Languages (ML, Haskell, Prolog): Specify what computation is to be done; focus on functional and logic-based programming paradigms.
Classification by Architecture:
• von Neumann Languages (Fortran, C): Computational model based on the von Neumann architecture (sequential execution, shared memory).
• Object-Oriented Languages (Simula 67, Smalltalk, C++, C#, Java, Ruby): Support object-oriented programming with interacting objects; early examples include Simula 67 and Smalltalk.
• Scripting Languages: Interpreted languages with high-level operators.
1.3.2 IMPACTS ON COMPILERS
• Advances in programming languages placed new demands
on compiler writers.
• Compiler writers had to create algorithms and
representations to support new language features.
• Since the 1940s, computer architecture has evolved:
Compiler writers had to devise translation algorithms to
maximize new hardware capabilities.
• Compilers minimize execution overhead for programs
written in high-level languages and make high-performance
computer architectures effective for user applications.
• Compilers are used as tools to evaluate architectural
concepts before building new computers.
1.3.2 IMPACTS ON COMPILERS
• Writing compilers is a challenging task:
• A compiler is a large program, often consisting of millions of lines of
code.
• Modern language-processing systems may handle multiple source
languages and target machines within the same framework, acting as
collections of compilers.
• Good software engineering techniques are essential
for:
• Creating and evolving modern language processors.
• A compiler must translate correctly the potentially
infinite set of programs written in the source language:
• Generating optimal target code from source programs is undecidable
in general.
• Compiler writers must evaluate tradeoffs and apply heuristics to
generate efficient code.
1.4 THE SCIENCE OF BUILDING A COMPILER
• Compiler design solves complex real-world problems through mathematical
abstraction.
• Abstraction process:
• Identify the problem.
• Formulate a mathematical model capturing key characteristics.
• Apply mathematical techniques to solve it.
• Problem formulation requires a deep understanding of computer program
characteristics.
• Solutions must be validated and refined through empirical testing.
• Compilers must accept all valid source programs, which can be infinitely
varied and extremely large (millions of lines).
• Transformations by the compiler must preserve the original program's
meaning.
• Compiler writers influence all programs compiled by their tools, amplifying their
impact.
• Compiler development is rewarding due to this broad influence but also
challenging because of its complexity and responsibility.
1.4.1 Modeling in Compiler Design and Implementation
• The study of compilers is mainly a study of how we design the right
mathematical models and choose the right algorithms, while balancing
the need for generality and power against simplicity and efficiency.
• Some of the most fundamental models are finite-state machines and regular expressions. These models are useful for describing the lexical units of programs (keywords, identifiers, and such) and for describing the algorithms used by the compiler to recognize those units.
• Also among the most fundamental models are context-free grammars,
used to describe the syntactic structure of programming languages such
as the nesting of parentheses or control constructs.
• Similarly, trees are an important model for representing the structure of
programs and their translation into object code.
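For instance, the nesting of parentheses mentioned above is a context-free property that no regular expression can capture; a minimal recognizer sketch for balanced parentheses (corresponding to the grammar S -> ( S ) S | ε):

```python
def balanced(s):
    """Recognize the context-free language of balanced parentheses."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1                 # one more unmatched '('
        elif ch == ")":
            depth -= 1
            if depth < 0:              # a ')' with no matching '('
                return False
    return depth == 0                  # every '(' was eventually closed

print(balanced("((()))()"))  # True
print(balanced("(()"))       # False
```

The unbounded counter here is exactly what a finite-state machine lacks, which is why context-free grammars (and pushdown recognizers) are needed for this kind of syntactic structure.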
1.4.2 The science of code optimization
• Code optimization refers to a compiler's efforts to
produce more efficient code than the straightforward
version.
• Modern compiler optimization has become more critical
and complex.
• Complexity increases due to advanced processor
architectures offering more optimization opportunities.
• Importance grows with massively parallel computers
needing substantial optimization to avoid
performance degradation.
• Multicore machines require compilers to optimize for
multiprocessor usage.
1.4.2 The science of code optimization
• Robust compilers can't be built on simple "hacks."
• A solid mathematical foundation helps ensure
optimizations are correct for all inputs.
• Tools like graphs, matrices, and linear
programming are essential for producing optimized
code.
• Pure theory is insufficient—many optimization problems
are undecidable.
• A balance of theory and empirical testing is
needed to validate solutions.
Key Objectives of Compiler Optimization
• Correctness: Must preserve the original program's meaning.
• Performance Improvement: Should enhance performance for many
programs, focusing on:
• Speed (primary focus).
• Code size (important for embedded systems).
• Power consumption (critical for mobile devices).
• Usability (error reporting, debugging).
• Reasonable Compilation Time:
• Short compilation supports rapid development and debugging cycles.
• Programs are often debugged without optimizations, as optimizations
complicate debugging.
• Manageable Engineering Effort:
• Compilers are complex; keeping them simple reduces engineering and
maintenance costs.
• Focus on implementing optimizations that offer significant practical benefits.
Challenges in Compiler
Optimization
• Ensuring correctness is paramount, as incorrect
optimizations can break programs.
• Even well-designed optimizing compilers may not
be entirely error-free.
• Optimizations may introduce new issues, requiring
additional testing.
• Performance may not always justify the effort required
for optimization, especially in non-critical applications.
1.5 APPLICATIONS OF COMPILER TECHNOLOGY

1) Software Development
• Programming Language Translation:
Converts high-level source code into machine
code or intermediate code.
• Code Optimization: Enhances execution speed,
reduces memory usage, and improves efficiency.
• Cross-Compilation: Enables code written for
one platform to be compiled and run on another.
• Error Detection and Debugging: Identifies
syntax and semantic errors during compilation.
1.5 APPLICATIONS OF COMPILER
TECHNOLOGY
2) System Software Development
• Operating System Development: Compilers
are used to develop kernels, drivers, and system
utilities.
• Embedded Systems: Generates optimized
code for resource-constrained devices like IoT,
microcontrollers, and firmware.
• Device Drivers: Helps in creating efficient
drivers for hardware components.
1.5 APPLICATIONS OF COMPILER
TECHNOLOGY
3) High-Performance Computing (HPC)
• Parallel Computing: Optimizes code to utilize multi-
core and GPU architectures.
• Vectorization and Pipelining: Enhances processing
speed using hardware acceleration techniques

4) Database and Query Optimization


• SQL Query Compilation: Converts high-level queries
into optimized execution plans.
• Just-in-Time (JIT) Compilation: Used in modern
databases for runtime optimization.
1.5 APPLICATIONS OF COMPILER TECHNOLOGY
5) Artificial Intelligence and Machine Learning
• Deep Learning Frameworks: TensorFlow and PyTorch
use compiler optimizations for hardware acceleration.
• Graph Compilation: AI models are optimized for GPUs,
TPUs, and NPUs using compilers like XLA and TVM.
6) Web and Mobile Development
• JavaScript Just-in-Time Compilation: Optimizes
performance of web applications.
• Mobile App Development: Compilers convert
Java/Kotlin/Swift code into platform-specific executables.
1.5 APPLICATIONS OF COMPILER
TECHNOLOGY
7) Natural Language Processing
(NLP)
• Compiler-Based Text Processing:
Used in language processing engines.
• Parsing and Syntax Analysis: Helps
in constructing abstract syntax trees for
NLP applications
