
Midsem

The document covers the fundamentals of compiler design, detailing the phases of compilation including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation. It explains the roles of high-level languages, assembly languages, and the function of preprocessors, assemblers, loaders, and linkers. Additionally, it discusses error types and handling, symbol tables, and the automation of lexical analysis using tools like LEX.

Uploaded by

A7 Roll No 40
Compiler Design

UNIT 1

LANGUAGE PROCESSING SYSTEM

High Level Language

A program written in a high-level language (HLL), such as C, may contain directives like #define or #include.

HLLs are closer to humans but far from machines.

Lines beginning with # are called pre-processor directives.

They tell the pre-processor what to do.

Pre-Processor:

The pre-processor removes all the #include directives by inserting the contents of the named files; this is called file inclusion.

It removes all the #define directives by macro expansion. It performs file inclusion, macro processing, expansion of shorthand operators, etc.
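
File inclusion and macro expansion can be illustrated with a toy pre-processor. This is a minimal sketch in Python, not how a real C pre-processor is written: the function name `preprocess`, the in-memory `files` dictionary and the sample program are all assumptions made for illustration, and only `#include "name"` and simple object-like `#define` lines are handled.

```python
import re

def preprocess(source, files):
    """Toy pre-processor: handles #include "name" and object-like #define only."""
    macros = {}
    output = []

    def process(text):
        for line in text.splitlines():
            stripped = line.strip()
            if stripped.startswith("#include"):
                # File inclusion: splice in the processed contents of the named file.
                name = stripped.split('"')[1]
                process(files[name])
            elif stripped.startswith("#define"):
                # Record the macro; the directive itself is dropped from the output.
                _, name, body = stripped.split(None, 2)
                macros[name] = body
            else:
                # Macro expansion: replace whole identifiers that name a macro.
                output.append(re.sub(r"[A-Za-z_]\w*",
                                     lambda m: macros.get(m.group(), m.group()),
                                     line))

    process(source)
    return "\n".join(output)

# Hypothetical header and program, held in memory instead of on disk.
files = {"limits.h": "#define MAX 100"}
program = '#include "limits.h"\n#define PI 3.14\narea = PI * r * r + MAX;'
print(preprocess(program, files))  # area = 3.14 * r * r + 100;
```

The output contains no directives at all, which is what the notes call a pure high-level language program.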

Pure High-Level Language

An HLL program in which all pre-processor directives have already been processed, so that it can be directly understood by the compiler.

Assembly Language

It is neither in binary form nor high level.

It is an intermediate state that is a combination of machine instructions and some other useful data needed for execution.

Assembler

For every platform (hardware + OS) there is a separate assembler.

Assemblers are not universal, since each one is tied to its platform.

The assembler translates assembly language to machine code. Its output is called an object file.

Compiler :

It is a computer program that translates computer code written in one programming language (the source language) into another language (the target language).

Compiler Design

The two phases of a compiler are the analysis phase and the synthesis phase.

The analysis phase creates an intermediate representation from the given source code.

The synthesis phase creates an equivalent target program from the intermediate representation.

Relocatable Machine Code

It can be loaded at any point in memory and run.

The addresses within the program are expressed in such a way that they remain valid when the program is moved.

In the given picture, "I" is a modifiable memory location: if the load location changes, I is shifted (relocated) automatically.
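
The idea of relocation can be sketched as follows. This is only an illustration under assumed conventions: the object module is modelled as a list of integer words, `relocation_entries` is a hypothetical list of the positions that hold addresses relative to 0, and `relocate` is a made-up helper name.

```python
def relocate(code, relocation_entries, load_address):
    """Return absolute code: add load_address to every word marked as relocatable."""
    absolute = list(code)
    for index in relocation_entries:
        absolute[index] += load_address
    return absolute

# Hypothetical object module: words 1 and 3 hold addresses relative to 0.
code = [10, 4, 20, 0]
relocation_entries = [1, 3]

print(relocate(code, relocation_entries, 1000))  # [10, 1004, 20, 1000]
print(relocate(code, relocation_entries, 5000))  # [10, 5004, 20, 5000]
```

The same relocatable code yields a valid absolute program for either load address; only the marked words change.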

Loader/Linker:

It converts the relocatable code into absolute code.

It tries to run the program, resulting in a running program or an error message (or sometimes both).

The linker combines a variety of object files into a single file to make it executable. The loader then loads it into memory and executes it.

Phases and Passes:


A compiler can have many phases and passes.

Pass: a traversal by the compiler through the entire program.
Phase: a distinguishable stage of a compiler, which takes input from the previous stage, processes it, and yields output that can be used as input for the next stage. A pass can have more than one phase.

Compiler Passes
A pass is a complete traversal of the source program. A compiler can be organised in two ways with respect to passes: multi-pass and one-pass.

Multi-pass Compiler

A multi-pass compiler processes the source code of a program several times.

In the first pass, the compiler reads the source program, scans it, extracts the tokens and stores the result in an output file.

In the second pass, the compiler reads the output file produced by the first pass, builds the syntax tree and performs syntax analysis. The output of this pass is a file that contains the syntax tree.

In the third pass, the compiler reads the output file produced by the second pass and checks whether the tree follows the rules of the language. The output of the semantic analysis phase is the annotated syntax tree.

Passes continue in this way until the target output is produced.

One-pass Compiler

A one-pass compiler traverses the program only once.

It passes only once through the parts of each compilation unit.

It translates each part into its final machine code.

In a one-pass compiler, when a source line is processed, it is scanned and its tokens are extracted.

Then the syntax of the line is analyzed and the tree structure is built. After semantic analysis, the code is generated.

The same process is repeated for each line of code until the entire program is compiled.
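
The difference between the two organisations can be sketched in a few lines. Everything here is assumed for illustration: the source language has only lines of the form `name = a + b`, the "scanner" simply splits on spaces, and the target is a made-up stack-machine instruction list.

```python
def scan(line):
    # Lexical phase, assuming tokens are separated by spaces.
    return line.split()

def translate(tokens):
    # Remaining phases collapsed into one step for a line "name = a + b".
    name, _, left, _, right = tokens
    return [f"PUSH {left}", f"PUSH {right}", "ADD", f"STORE {name}"]

def multi_pass(lines):
    # Pass 1 traverses the whole program and stores all tokens.
    token_file = [scan(line) for line in lines]
    # Pass 2 traverses the stored output of pass 1.
    return [ins for tokens in token_file for ins in translate(tokens)]

def one_pass(lines):
    code = []
    for line in lines:                      # a single traversal of the program
        code.extend(translate(scan(line)))  # every phase runs on each line in turn
    return code

lines = ["x = a + b", "y = x + c"]
print(one_pass(lines) == multi_pass(lines))  # True
```

Both produce the same target code; they differ in whether each phase sees the whole program before the next phase starts.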

PHASES OF COMPILER :

1. LEXICAL ANALYZER

It reads the program and groups its characters into lexemes.

It then converts the stream of lexemes into a stream of tokens.

Tokens are defined by regular expressions which are understood by the lexical analyser.

It also removes white space and comments.

The lexical analyzer phase is the first phase of the compilation process.

It takes source code as input.

It reads the source program one character at a time and converts it into meaningful lexemes.

The lexical analyzer represents these lexemes in the form of tokens.
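
A small tokenizer shows how regular expressions define tokens. This is a sketch using Python's `re` module; the token names (`NUMBER`, `ID`, `OP`, `SKIP`) and the `//` comment syntax are assumptions chosen for the example, not part of the notes.

```python
import re

# SKIP comes first so that "//" starts a comment instead of matching OP twice.
TOKEN_SPEC = [
    ("SKIP",   r"\s+|//[^\n]*"),   # white space and line comments are discarded
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No pattern matches this character: a lexical error.
            raise SyntaxError(f"lexical error at position {pos}: {source[pos]!r}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))   # (token, lexeme)
        pos = match.end()
    return tokens

print(tokenize("count = count + 42 // increment"))
# [('ID', 'count'), ('OP', '='), ('ID', 'count'), ('OP', '+'), ('NUMBER', '42')]
```

Each pair holds the token class and the lexeme that was matched; the white space and the comment produce no tokens.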

2. SYNTAX ANALYSIS

Second phase of a compiler is syntax analysis, also known as parsing.

This phase takes the stream of tokens generated by the lexical analysis
phase and checks whether they conform to the grammar of the
programming language.

Output of this phase is usually an Abstract Syntax Tree (AST).
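
As an illustration, a recursive-descent parser for arithmetic expressions can check a token stream against a grammar and build an AST. This is a sketch under assumptions: the grammar (expr → term {(+|-) term}, term → factor {(*|/) factor}, factor → name | number | ( expr )), the nested-tuple AST and the function name `parse` are chosen for the example, and tokens are given as plain strings.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple AST, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                       # expr -> term {(+|-) term}
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():                       # term -> factor {(*|/) factor}
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():                     # factor -> name | number | ( expr )
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is not None and tok.isalnum():
            return advance()
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["a", "+", "b", "*", "2"]))  # ('+', 'a', ('*', 'b', '2'))
```

A token sequence that does not conform to the grammar, such as `["a", "+"]`, raises `SyntaxError` instead of producing a tree.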

3. SEMANTIC ANALYSIS

This phase checks whether the code is semantically correct, i.e., whether it
conforms to the language’s type system and other semantic rules.

Compiler checks the meaning of the source code to ensure that it makes
sense.

Compiler performs type checking, which ensures that variables are used
correctly and that operations are performed on compatible data types.

Compiler also checks for other semantic errors, such as undeclared variables and incorrect function calls.
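
A type checker over an expression tree illustrates these checks. This is a sketch with assumed conventions: expressions are nested tuples `(op, left, right)`, identifiers are strings, `symbols` maps each declared name to its type, and the hypothetical language allows no implicit conversion between `int` and `float`.

```python
def check(node, symbols):
    """Return the type of an expression tree, or raise NameError/TypeError."""
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):                       # an identifier
        if node not in symbols:
            raise NameError(f"undeclared variable {node!r}")
        return symbols[node]
    op, left, right = node
    left_type, right_type = check(left, symbols), check(right, symbols)
    if left_type != right_type:                     # assumed rule: no implicit conversion
        raise TypeError(f"operands of {op!r} have incompatible types "
                        f"{left_type} and {right_type}")
    return left_type

symbols = {"x": "int", "rate": "float"}
print(check(("+", "x", 1), symbols))        # int
print(check(("*", "rate", 2.5), symbols))   # float
```

Here `check(("+", "x", "rate"), symbols)` raises `TypeError`, and `check("y", symbols)` raises `NameError`, although both inputs are syntactically valid.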

4. INTERMEDIATE CODE

This phase generates intermediate code, a form that can be readily translated into machine code.

There are many popular intermediate codes, for example three-address code and syntax trees.

Intermediate code is converted to machine language by the last two phases, which are platform dependent.

Up to intermediate code generation, the phases are the same whatever the target platform; after that, they depend on the platform.

To build a compiler for a new platform we don't need to build it from scratch. We can take the phases up to intermediate code generation from an already existing compiler and build only the last two phases.
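
Three-address code can be produced from a syntax tree by introducing one temporary per operator. This is a sketch; the nested-tuple tree format, the temporary names `t1`, `t2`, ... and the function name `to_three_address` are assumptions for illustration.

```python
def to_three_address(ast):
    """Flatten an expression tree into three-address instructions."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if not isinstance(node, tuple):
            return str(node)                 # a name or constant is used directly
        op, left, right = node
        left_name, right_name = emit(left), emit(right)
        counter += 1
        temp = f"t{counter}"                 # fresh temporary for this operator
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(ast)
    return code, result

code, result = to_three_address(("+", "a", ("*", "b", "c")))
print(code)    # ['t1 = b * c', 't2 = a + t1']
print(result)  # t2
```

Each instruction has at most one operator and three addresses (two operands and a result), and nothing in it depends on a particular machine.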

5. CODE OPTIMIZATION

It is an optional phase.

It is used to improve the intermediate code so that the output program runs faster and takes less space.

It removes unnecessary lines of code.

It arranges the sequence of statements in order to speed up program execution.

Optimization transforms the code without altering its meaning.

Optimization can be categorized into two types: machine dependent and machine independent.
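
Constant folding is one machine-independent optimization that shows code being transformed without its meaning changing. This is a sketch under assumptions: instructions are tuples `(dest, op, a, b)`, every destination is assigned only once, and `fold_constants` is a made-up name.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Evaluate operations on known constants at compile time."""
    known, out = {}, []
    for dest, op, a, b in code:
        # Substitute operands whose value was already computed.
        a, b = known.get(a, a), known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)      # no instruction is emitted
        else:
            out.append((dest, op, a, b))
    return out, known

code = [
    ("t1", "*", 60, 60),           # t1 = 60 * 60
    ("t2", "*", "hours", "t1"),    # t2 = hours * t1
]
optimized, known = fold_constants(code)
print(optimized)  # [('t2', '*', 'hours', 3600)]
print(known)      # {'t1': 3600}
```

The two-instruction program becomes one instruction that computes the same value of `t2`, so it runs faster and takes less space.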

6. CODE GENERATION

Final phase of a compiler is code generation.

This phase takes the optimized intermediate code and generates the actual
machine code that can be executed by the target hardware.
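
A very simple code generator can be sketched for an imaginary one-register machine. The instruction set (`LOAD`, `ADD`, `SUB`, `MUL`, `DIV`, `STORE`), the register `R0` and the tuple format `(dest, op, a, b)` are all hypothetical; a real code generator would also perform register allocation and instruction selection for an actual target.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(code):
    """Translate (dest, op, a, b) instructions into hypothetical assembly."""
    asm = []
    for dest, op, a, b in code:
        asm.append(f"LOAD R0, {a}")             # R0 <- a
        asm.append(f"{MNEMONIC[op]} R0, {b}")   # R0 <- R0 op b
        asm.append(f"STORE {dest}, R0")         # dest <- R0
    return asm

for line in generate([("t1", "*", "b", "c"), ("t2", "+", "a", "t1")]):
    print(line)
```

For the two instructions above this prints six lines, starting with `LOAD R0, b` and ending with `STORE t2, R0`.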

SYMBOL TABLE

It is a data structure used and maintained by the compiler, consisting of all the identifiers' names along with their types.

It helps the compiler to function smoothly by finding the identifiers quickly.

The compiler has two modules:

a. The front end consists of the lexical analyser, syntax analyser, semantic analyser and intermediate code generator.
b. The remaining phases are assembled to form the back end.

LA is the first phase to communicate with the symbol table, and the compiler creates the symbol table during the lexical analysis phase.

Symbol tables can be implemented using any one of these data structures: linear table, binary search tree, linked list or hash table.

The operations that can be performed on the symbol table are insert, lookup/search, modify and delete.

Other information stored in the case of arrays, records and procedures: array → size; record → column names; procedure → input parameters; function → input and output parameters, actual and formal parameters.

Information stored in the symbol table about an identifier: name, type, scope, size and offset.

In general, during the first two phases, we store the information in the symbol
table and in the memory and in the later phases, we make use of the
information available in symbol table.

Every phase of the compiler will be interacting with the symbol table.

The compiler is responsible for providing the memory for the symbol table. At every phase, if any new variable occurs, it is stored in the symbol table.
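
The operations and stored fields listed above can be sketched with a hash table. This is one possible layout, not a prescribed one: the class name `SymbolTable`, the `(scope, name)` key and the rule that offsets grow by each entry's size are assumptions for illustration.

```python
class SymbolTable:
    def __init__(self):
        self._entries = {}        # Python dict (a hash table) keyed by (scope, name)
        self._next_offset = 0

    def insert(self, name, type_, scope, size):
        key = (scope, name)
        if key in self._entries:
            raise KeyError(f"{name!r} already declared in scope {scope!r}")
        self._entries[key] = {"name": name, "type": type_, "scope": scope,
                              "size": size, "offset": self._next_offset}
        self._next_offset += size

    def lookup(self, name, scope):
        # Returns the entry, or None if the identifier is not declared.
        return self._entries.get((scope, name))

    def modify(self, name, scope, **fields):
        self._entries[(scope, name)].update(fields)

    def delete(self, name, scope):
        del self._entries[(scope, name)]

table = SymbolTable()
table.insert("x", "int", "global", 4)
table.insert("y", "float", "global", 8)
print(table.lookup("y", "global")["offset"])  # 4
table.modify("x", "global", type="long", size=8)
table.delete("y", "global")
print(table.lookup("y", "global"))            # None
```

A hash table gives fast lookup on average, which matters because every phase of the compiler consults the table.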

TYPES OF ERROR

Lexical Error: Happens when the compiler finds a word it doesn't know.

Syntax Error: Happens when the rules of the language are broken.

Semantic Error: Happens when the meaning of the sentence is wrong, even if
the sentence is written correctly.

Handling Errors:

The compiler finds errors, which are called exceptions. The programmer must
fix these exceptions.

During program execution, fatal errors can occur. These are serious, and the
system administrator must fix them.

Error Handler:

An error handler is a tool that helps the compiler keep going, even if errors
happen at different stages.

If no errors are found after the last stage (phase 3), the program is correct and
can be turned into an executable form.

If errors are still present after phase 3, they are shown to the programmer.

LEXICAL ANALYZER

Lexical analysis (LA), or tokenization, is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens.

It reads the characters of the source program, groups them into lexically meaningful units called lexemes, and produces as output tokens representing these lexemes.

Other tasks of LA-

a. Stripping out comments from the source program.

b. Stripping out white-space characters from the source program.

c. Associating errors with line numbers of the source program.

A lexical analyzer can be implemented in the following steps:

1. The input to the lexical analyzer is a source program.

2. It scans the source program using an input buffering scheme.

3. Regular expressions are used to represent the input patterns.

4. Each input pattern is converted into an NFA (nondeterministic finite automaton).

5. The NFAs are then converted into DFAs, and the DFAs are minimized using one of the minimization methods.

6. The minimized DFAs are used to recognize the patterns and break the input into lexemes.

7. Each minimized DFA is associated with a phrase in a programming language (an action) which will evaluate the lexemes that match the regular expression.

8. The tool then constructs a state table for the appropriate finite state machine and creates program code which contains the table, the evaluation phrases, and a routine which uses them appropriately.
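
Step 5, the conversion of an NFA into a DFA, is usually done by subset construction; the sketch below shows it together with a small table-driven routine in the spirit of step 8. The representation is an assumption: the NFA is a dictionary from `(state, symbol)` to a set of states, the empty string `""` labels an epsilon move, and minimization is left out to keep the example short.

```python
def epsilon_closure(states, nfa):
    """All states reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, ""), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a set of NFA states."""
    start_set = epsilon_closure({start}, nfa)
    dfa, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            moved = {t for s in current for t in nfa.get((s, symbol), ())}
            if not moved:
                continue
            target = epsilon_closure(moved, nfa)
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {state for state in seen if state & accepting}
    return dfa, start_set, dfa_accepting

def accepts(dfa, start, accepting, text):
    """Table-driven recognizer: follow one transition per input character."""
    state = start
    for ch in text:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in accepting

# NFA for the regular expression (a|b)*abb, with one epsilon move from state 0.
nfa = {(0, ""): {1}, (1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3}, (3, "b"): {4}}
dfa, start, final = nfa_to_dfa(nfa, 0, {4}, "ab")
print(accepts(dfa, start, final, "aabb"))  # True
print(accepts(dfa, start, final, "abba"))  # False
```

The resulting `dfa` dictionary plays the role of the state table, and `accepts` is the routine that uses it.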

Lexical Analyzer Generator

For efficient design of compiler, various tools are used to automate the phases
of compiler.

LA phase can be automated using a tool called LEX.

LEX is a Unix utility which generates a lexical analyzer.

Lexical analyzer is generated with the help of regular expressions.

A LEX-generated analyzer is very fast in finding the tokens as compared to a handwritten lexical analyzer program in C.

LEX scans the source program in order to get the stream of tokens, and these tokens can be related together so that various programming structures such as expressions, block statements, control structures and procedures can be recognized.

LEX compiler

Automatic generation of a lexical analyzer is done using the LEX programming language.

LEX specification file can be denoted using the extension .l (often pronounced
as dot L).

Example-

a. Consider specification file as x.l.

b. This x.l file is then given to LEX compiler to produce lex.yy.c

c. lex.yy.c is a C program which is actually a lexical analyzer program.

The LEX specification file stores the regular expressions for the tokens.

The lex.yy.c file consists of the tabular representation of the transition diagrams constructed for the regular expressions.

In specification file, LEX actions are associated with every regular expression.

These actions are simply the pieces of C code that are directly carried over to
the lex.yy.c.

Generation of lexical analyzer using LEX

• Finally, the C compiler compiles this generated lex.yy.c and produces an object
program a.out.

• When some input stream is given to a.out then sequence of tokens gets
generated.

Components of LEX program


LEX program consists of three parts :

1. Declaration section

In the declaration section, variables and constants can be declared.

Some regular definitions can also be written in this section.

The regular definitions are basically components of regular expressions.

2. Rule section

The rule section consists of regular expressions with associated actions.

These translation rules can be given in the form:

R1 {action1}
R2 {action2}
...
Rn {actionn}

where each Ri is a regular expression and each actioni is a program fragment describing what action is to be taken for the corresponding regular expression.

These actions can be specified by a piece of C code.

3. Auxiliary procedure section

All the procedures required by the actions in the rule section are defined here.

This section consists of two functions :

a. main() function

b. yywrap() function
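
The three-part structure can be mimicked in Python to show how rules and actions work together. This is not LEX syntax and not what LEX generates: the names `RULES`, `keyword_or_id` and `lex` are made up for the sketch, and the helper is defined before the rules only because Python needs a name to exist before it is used. The matching policy shown (longest match wins, and on a tie the earlier rule wins) is the one LEX uses.

```python
import re

# 1. Declaration section: regular definitions reused by the rules.
DIGIT = r"[0-9]"
LETTER = r"[A-Za-z_]"

# 3. Auxiliary procedures called by the actions.
def keyword_or_id(text):
    return ("KEYWORD" if text in {"if", "else", "while"} else "ID", text)

# 2. Rule section: pairs of (regular expression Ri, action_i).
RULES = [
    (re.compile(r"\s+"), lambda text: None),                   # white space: no token
    (re.compile(f"{DIGIT}+"), lambda text: ("NUMBER", int(text))),
    (re.compile(f"{LETTER}({LETTER}|{DIGIT})*"), keyword_or_id),
    (re.compile(r"==|="), lambda text: ("OP", text)),
]

def lex(source):
    pos = 0
    while pos < len(source):
        best, best_action = None, None
        for pattern, action in RULES:
            m = pattern.match(source, pos)
            # Strictly longer matches replace the current best, so ties keep the earlier rule.
            if m and (best is None or m.end() > best.end()):
                best, best_action = m, action
        if best is None or best.end() == pos:
            raise SyntaxError(f"no rule matches at position {pos}")
        token = best_action(best.group())    # run the action for the matched lexeme
        if token is not None:
            yield token
        pos = best.end()

print(list(lex("if x1 == 42")))
# [('KEYWORD', 'if'), ('ID', 'x1'), ('OP', '=='), ('NUMBER', 42)]
```

In a real LEX program the rules would be written in the .l file and the actions would be fragments of C code.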
