Compiler Notes

The document discusses several topics related to compilers, including: assembly language, which uses symbols and abbreviations instead of binary and is slower than machine language; lexical analysis, which identifies tokens by scanning the input string and maintains two pointers in a buffered input; and type checking, which verifies the types of values at compile time or run time to catch errors.

Assembly language

sits above machine language but below high-level languages, so it is an intermediate language. Assembly languages use numbers, symbols, and abbreviations (mnemonics) instead of 0s and 1s. For example, addition, subtraction, and multiplication use mnemonics such as ADD, SUB, and MUL. Execution is slow compared to machine language.

Pattern is the set of rules that determines whether a given lexeme is a valid token or not.
Token is a sequence of characters that can be treated as a single logical entity.
Lexeme is a sequence of characters in the source program that is matched for a token.
Parse tree: a graphical representation of how the start symbol of a grammar derives the string.

A top-down parser cannot accept a left-recursive grammar, because it would fall into an infinite loop, so left recursion must be removed first. It also cannot handle ambiguous or non-deterministic grammars. It uses leftmost derivation.
A bottom-up parser works on left-recursive and non-deterministic grammars but not on ambiguous grammars, with the exception of operator-precedence parsing, which can handle certain ambiguous operator grammars. It uses rightmost derivation in reverse.
To convert an ambiguous grammar to an unambiguous one, we have to:
1. Ensure that higher-precedence operators appear at lower levels of the grammar.
2. If an operator is left-associative, make the grammar left-recursive; otherwise make it right-recursive.
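As an illustrative sketch (the grammar and function names are assumptions, not from the notes), the usual unambiguous expression grammar can be parsed top-down by replacing left recursion with iteration:

```python
# Unambiguous expression grammar with precedence and associativity:
#   E -> E + T | T       (+ is left-associative, lower precedence)
#   T -> T * F | F       (* is left-associative, higher precedence)
#   F -> ( E ) | id
# Left recursion is replaced by iteration so a top-down parser can handle it.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok}, got {peek()}"
        pos += 1

    def factor():                    # F -> ( E ) | id
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        node = peek()
        eat(node)
        return node

    def term():                      # T -> F (* F)*   left-associative
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def expr():                      # E -> T (+ T)*   left-associative
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node

    return expr()

# * binds tighter than +, so b * c groups first:
print(parse(["a", "+", "b", "*", "c"]))  # ('+', 'a', ('*', 'b', 'c'))
```

Because each `while` loop folds new operands into the node built so far, operators of equal precedence group to the left, matching the left-recursive grammar.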

If the RHS of more than one production starts with the same symbol, such a grammar is called a grammar with common prefixes, or a non-deterministic grammar.
 This kind of grammar creates a problematic situation for top-down parsers.
 Top-down parsers cannot decide which production must be chosen to parse the string in hand.
To remove this confusion, we use left factoring.
Left factoring is the process of removing the common prefix, i.e., converting a non-deterministic grammar into a deterministic one.
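A minimal sketch of left factoring as a procedure (the representation of productions and the fresh-nonterminal naming are assumptions):

```python
# Left-factor the alternatives of one nonterminal: pull out the longest
# common prefix and defer the differing suffixes to a fresh nonterminal.
def left_factor(nonterminal, productions):
    # Longest common prefix across all alternatives.
    prefix = []
    for symbols in zip(*productions):
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    if not prefix:
        return {nonterminal: productions}   # nothing to factor

    new_nt = nonterminal + "'"              # fresh nonterminal, e.g. S'
    suffixes = [p[len(prefix):] or ("ε",) for p in productions]
    return {
        nonterminal: [tuple(prefix) + (new_nt,)],
        new_nt: suffixes,
    }

# S -> a A b | a A c   becomes   S -> a A S',   S' -> b | c
print(left_factor("S", [("a", "A", "b"), ("a", "A", "c")]))
```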

A predictive parser is a recursive-descent parser that can predict which production is to be used to replace the input string. The predictive parser does not suffer from backtracking.
To accomplish its task, the predictive parser uses a look-ahead pointer, which points to the next input symbol. To make the parser backtracking-free, the predictive parser puts some constraints on the grammar and accepts only the class of grammars known as LL(k) grammars.

Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both the stack and the input contain an end symbol $ to denote that the stack is empty and the input is consumed. The parser consults the parsing table to decide what to do for each combination of input symbol and stack top.
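The table-driven loop described above can be sketched as follows; the grammar and its parsing table here are simplified assumptions for illustration:

```python
# Parsing table for the toy LL(1) grammar:
#   E  -> a E'        E' -> + a E' | ε
# Keys are (stack nonterminal, lookahead); values are the production body.
table = {
    ("E",  "a"): ["a", "E'"],
    ("E'", "+"): ["+", "a", "E'"],
    ("E'", "$"): [],                 # ε-production on end of input
}

def ll1_parse(tokens):
    tokens = tokens + ["$"]          # end marker on the input
    stack = ["$", "E"]               # end marker below the start symbol
    i = 0
    while stack:
        top = stack.pop()
        if top == "$" and tokens[i] == "$":
            return True              # stack empty and input consumed
        if top == tokens[i]:         # terminal on top: match and advance
            i += 1
        elif (top, tokens[i]) in table:
            # Nonterminal: replace it by the production body (reversed so
            # the leftmost symbol ends up on top of the stack).
            stack.extend(reversed(table[(top, tokens[i])]))
        else:
            return False             # no table entry: syntax error

print(ll1_parse(["a", "+", "a"]))    # True
print(ll1_parse(["a", "+"]))         # False
```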

LR PARSER
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It can handle a very wide class of context-free grammars, which makes it one of the most efficient syntax-analysis techniques. LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of a rightmost derivation in reverse, and k denotes the number of lookahead symbols used to make decisions.
There are three widely used algorithms available for constructing an LR parser:

 SLR(1) – Simple LR parser:
o Works on the smallest class of grammars
o Few states, hence a very small table
o Simple and fast construction
 LR(1) – Canonical LR parser:
o Works on the complete set of LR(1) grammars
o Generates a large table and a large number of states
o Slow construction
 LALR(1) – Look-Ahead LR parser:
o Works on an intermediate-sized class of grammars
o Number of states is the same as in SLR(1)

A three-address code instruction has at most three addresses (operands). Three-address code can be represented in two forms: quadruples and triples.

Quadruples
Each instruction in the quadruple representation is divided into four fields: op, arg1, arg2, and result. For example, the code for a = b + c * d + c * d is represented below in quadruple format:

Op   arg1   arg2   result
*    c      d      r1
+    b      r1     r2
+    r2     r1     r3
=    r3            a

Triples
Each instruction in the triple representation has three fields: op, arg1, and arg2. The result of a sub-expression is referred to by the position (index) of the instruction that computes it. Triples are similar to a DAG and to a syntax tree; they are equivalent to a DAG when representing expressions.

#     Op   arg1   arg2
(0)   *    c      d
(1)   +    b      (0)
(2)   +    (1)    (0)
(3)   =    a      (2)

Triples face the problem of code immovability during optimization: because results are referred to by position, changing the order or position of an instruction invalidates those references.

Indirect Triples
This representation is an enhancement over the triple representation. It uses pointers (a separate list of references) instead of positions to refer to results. This enables the optimizer to freely reposition sub-expressions to produce optimized code.
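The three representations can be sketched as plain data structures (an illustrative encoding, assuming integer indices stand for the parenthesized positions):

```python
# The statement a = b + c * d + c * d in the three forms.

# Quadruples: explicit result names, so instructions can be moved freely.
quadruples = [
    ("*", "c",  "d",  "r1"),   # r1 = c * d
    ("+", "b",  "r1", "r2"),   # r2 = b + r1
    ("+", "r2", "r1", "r3"),   # r3 = r2 + r1
    ("=", "r3", None, "a"),    # a = r3
]

# Triples: a result is named by the index of the instruction that computes
# it; integer arguments stand for the positions (0), (1), ...
triples = [
    ("*", "c", "d"),   # (0) = c * d
    ("+", "b", 0),     # (1) = b + (0)
    ("+", 1, 0),       # (2) = (1) + (0)
    ("=", "a", 2),     # a = (2)
]

# Indirect triples: a separate statement list points into the triples; an
# optimizer reorders this list without renumbering the triples themselves.
statements = [0, 1, 2, 3]
```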

A symbol table is a data structure created and maintained by compilers to store information about the occurrence of various entities such as variable names, function names, objects, classes, and interfaces. The symbol table is used by both the analysis and the synthesis parts of a compiler.
A symbol table may serve the following purposes, depending upon the language in hand:
 To store the names of all entities in a structured form at one place.
 To verify if a variable has been declared.
 To implement type checking, by verifying that assignments and expressions in the source code are semantically correct.
 To determine the scope of a name (scope resolution).

Implementation
If a compiler has to handle only a small amount of data, the symbol table can be implemented as an unordered list, which is easy to code but suitable only for small tables. A symbol table can be implemented in one of the following ways:

 Linear (sorted or unsorted) list
 Binary search tree
 Hash table
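A minimal sketch of a hash-table symbol table with nested scopes (the class and method names are assumptions):

```python
# Each scope is a hash table (a Python dict) with a link to its enclosing
# scope; lookup walks outward, which implements scope resolution.
class SymbolTable:
    def __init__(self, parent=None):
        self.symbols = {}        # name -> attributes
        self.parent = parent     # enclosing scope, or None for globals

    def declare(self, name, attrs):
        if name in self.symbols:
            raise NameError(f"redeclaration of {name}")
        self.symbols[name] = attrs

    def lookup(self, name):
        scope = self
        while scope is not None:             # walk outward through scopes
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None                          # undeclared name

globals_ = SymbolTable()
globals_.declare("x", {"type": "int"})
inner = SymbolTable(parent=globals_)
inner.declare("y", {"type": "float"})
print(inner.lookup("x"))   # {'type': 'int'}: found via the enclosing scope
print(inner.lookup("z"))   # None: undeclared
```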
INPUT BUFFERING: The lexical analyzer would otherwise have to access secondary memory each time it identifies a token, which is time-consuming and costly. So the input string is stored in a buffer and then scanned by the lexical analyzer.
The lexical analyzer scans the input string from left to right one character at a time to identify tokens. It uses two pointers to scan tokens −
 Begin Pointer (bptr) − It points to the beginning of the string to be read.
 Look Ahead Pointer (lptr) − It moves ahead to search for the end of the token.
Example − For the statement int a, b;
 Both pointers start at the beginning of the string, which is stored in the buffer.

 The look-ahead pointer scans the buffer until the token is found.

 The character ("blank space") beyond the token ("int") has to be examined before the token ("int") can be determined.

 After processing the token ("int"), both pointers are set to the next token ('a'), and this process is repeated for the whole program.
A buffer can be divided into two halves. If the look-ahead pointer moves past the end of the first half, the second half is filled with new characters to be read. If the look-ahead pointer moves toward the right end of the second half, the first half is refilled with new characters, and so on.

Sentinels − A sentinel (a special end-of-buffer character, such as eof) is placed at the end of each buffer half so that, each time the forward pointer is advanced, a single check suffices to determine whether the end of a half has been reached. If it has, the other half is reloaded.
Buffer Pairs − A specialized buffering technique can decrease the overhead needed to process an input character. It uses two buffers, each of N-character size, which are reloaded alternately.
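The two-pointer scan can be sketched as follows, assuming a simplified token rule (maximal runs of letters/digits, or a single punctuation character):

```python
# bptr marks where the current token begins; lptr runs ahead to find where
# it ends. After each token both pointers jump past it, as described above.
def scan(buffer):
    tokens = []
    bptr = 0                             # begin pointer
    while bptr < len(buffer):
        if buffer[bptr].isspace():       # skip blanks between tokens
            bptr += 1
            continue
        lptr = bptr                      # look-ahead pointer
        if buffer[lptr].isalnum():
            # advance until the character beyond the token is seen
            while lptr < len(buffer) and buffer[lptr].isalnum():
                lptr += 1
        else:
            lptr += 1                    # single-character token like , or ;
        tokens.append(buffer[bptr:lptr])
        bptr = lptr                      # both pointers move to the next token
    return tokens

print(scan("int a, b;"))  # ['int', 'a', ',', 'b', ';']
```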

Type Checking: Type checking is the process of verifying and enforcing the constraints of types on values. It checks the types of objects and reports a type error in the case of a violation; in some languages, certain type mismatches are corrected automatically by implicit conversion (coercion).
There are two kinds of type checking:
1. Static type checking (done at compile time).
2. Dynamic type checking (done at run time).
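As an illustrative sketch of static checking (the AST shape and type names are assumptions), a checker can reject an ill-typed expression before it is ever evaluated:

```python
# A tiny checker over expression trees: literals carry their own type, and
# an operator requires both operands to have the same type.
def check(node):
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    op, left, right = node               # e.g. ("+", 1, 2.5)
    lt, rt = check(left), check(right)
    if lt != rt:
        raise TypeError(f"type error: {lt} {op} {rt}")
    return lt

print(check(("+", 1, ("*", 2, 3))))      # int
try:
    check(("+", 1, 2.5))                 # rejected before "running"
except TypeError as e:
    print(e)                             # type error: int + float
```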
In C, C++, C#, and other programming languages, an identifier is a name assigned by the user to a program element such as a variable, type, template, class, function, or namespace. It is usually limited to letters, digits, and underscores. Certain words, such as "new," "int," and "break," are reserved keywords and cannot be used as identifiers.

Storage allocation techniques:


Static Allocation
It is the simplest allocation scheme, in which the allocation of data objects is done at compile time, because the size of every data item can be determined by the compiler.

 In static allocation, the compiler can decide the amount of storage needed by each data object. Thus it becomes easy for the compiler to identify the address of this data in the activation record. It is not possible to use variables whose size has to be determined at run time.
 FORTRAN uses this strategy.

Stack Allocation: Stack allocation is a run-time storage-management technique.

Activation records are pushed and popped as activations begin and end, respectively.
The sizes of variables can be determined at run time, so local variables can have different storage locations and different values during different activations.
It allows recursive subprograms.
If procedure A calls B, and then B calls C, their activation records are pushed onto the stack in that order and popped in the reverse order as each procedure returns.
The ALGOL language uses this strategy.
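A sketch simulating this push/pop behavior (the record representation and names are assumptions):

```python
# If A calls B and B calls C, activation records are pushed in call order
# and popped in reverse as each procedure returns.
stack = []                      # the run-time stack of activation records
trace = []                      # snapshots of the stack over time

def call(name, body=lambda: None):
    stack.append(name)          # push the activation record on entry
    trace.append(list(stack))
    body()                      # run the procedure body (may call others)
    stack.pop()                 # pop the activation record on return
    trace.append(list(stack))

call("A", lambda: call("B", lambda: call("C")))
print(trace)
# [['A'], ['A', 'B'], ['A', 'B', 'C'], ['A', 'B'], ['A'], []]
```

Because each activation gets its own record, a recursive procedure has one record per active call, each with its own local variables.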

Heap Storage Allocation


It enables the allocation of memory in a non-nested fashion. Storage can be allocated and freed arbitrarily from an area known as the heap.
Heap allocation is useful for data whose size varies while the program is running.
The heap is maintained as a list of free blocks called the free-space list.
It creates the problem of fragmentation.
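A minimal sketch of a first-fit free-space list, showing how fragmentation arises (block sizes and function names are assumptions):

```python
# The heap is a list of free (start, size) blocks. Allocation carves from
# the first block big enough; freeing appends the block back without
# coalescing, which is one source of fragmentation.
free_list = [(0, 100)]                   # one free block covering the heap

def allocate(size):
    for i, (start, blk) in enumerate(free_list):
        if blk >= size:                  # first fit
            if blk == size:
                free_list.pop(i)
            else:
                free_list[i] = (start + size, blk - size)
            return start
    return None                          # no block large enough

def free(start, size):
    free_list.append((start, size))      # return block to the free list

a = allocate(40)        # block at 0
b = allocate(40)        # block at 40
free(a, 40)
# 60 units are free in total, but split into non-adjacent blocks of
# 20 and 40 — so a 60-unit request fails: fragmentation.
print(allocate(60))     # None
```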

Backpatching: While generating three-address code for a given expression, the addresses of the labels in goto statements may not yet be known. It is very difficult to assign the locations of these label statements in one pass, so two passes are used: in the first pass the addresses are left unspecified, and in the next pass they are filled in. This filling-in of incomplete jumps is called backpatching.
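The makelist/merge/backpatch operations commonly used for backpatching can be sketched as follows (the instruction strings and the `_` placeholder are assumptions):

```python
# Emit jumps with an unknown target "_", remember their indices in lists,
# and fill the targets in later once the label's address is known.
code = []

def emit(instr):
    code.append(instr)
    return len(code) - 1                 # index of the emitted instruction

def makelist(i):
    return [i]                           # new list holding one jump index

def merge(l1, l2):
    return l1 + l2                       # combine two pending-jump lists

def backpatch(lst, target):
    for i in lst:                        # fill in the now-known target
        code[i] = code[i].replace("_", str(target))

# Two jumps whose target is not yet known while emitting:
pending = makelist(emit("if a goto _"))
emit("t1 = x + y")
pending = merge(pending, makelist(emit("goto _")))
emit("t2 = x - y")
backpatch(pending, 4)                    # both jumps go to instruction 4
print(code)
# ['if a goto 4', 't1 = x + y', 'goto 4', 't2 = x - y']
```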
