Programming and Data Structures
Programming and Data Structures
Algorithms
Compiler Design
Linked list: A linked list is also known as a self-referential structure.
Trees: Trees are used to represent data containing a hierarchical relationship between elements, e.g., records, family trees and table contents.
Binary search trees: A binary tree T is called a binary search tree (or binary sorted tree) if each node N of T has the following property: the value at N is greater than every value in the left subtree of N and is less than or equal to every value in the right subtree of N.
Binary heaps: The binary heap data structure is an array that can be viewed as a complete binary tree. Each node of the binary tree corresponds to an element of the array. The array is completely filled on all levels except possibly the lowest (the lowest level is filled in left-to-right order and need not be complete).
Graphs: A graph is a collection of nodes, called vertices, and the connections between them, called edges.
Programming in C
One of the important topics of the GATE 2016 Computer Science Exam is Programming & Data Structures. The programming language asked in the GATE 2016 Exam is the C programming language.
Programming in C: GATE 2016 Exam
Character Set: The characters that can be used to form words, numbers and expressions
depend upon the computer on which the program runs. The characters in C are grouped into
the following categories: Letters, Digits, Special characters and White spaces.
C Tokens: The smallest individual units are known as C tokens. C has six types of tokens.
They are: Keywords, Identifiers, Constants, Operators, String, and Special symbols.
Keywords: Keywords are sequences of characters that have one or more fixed meanings. All C keywords must be written in lowercase letters, e.g., break, char, int, continue, default, do etc.
Identifiers: A C identifier is a name used to identify a variable, function, or any other user-defined item. An identifier starts with a letter A to Z, a to z, or an underscore _, followed by zero or more letters, underscores, and digits (0 to 9).
Constants: Fixed values that do not change during the execution of a C program.
Backslash character constants are used in output functions, e.g., \b is used for backspace and \n for a new line.
Operator: It is a symbol that tells the computer to perform certain mathematical or logical manipulations, e.g., arithmetic operators (+, -, *, /) etc.
String: A string is nothing but an array of characters (printable ASCII characters).
Delimiters / Separators: These are used to separate constants, variables and statements e.g.,
comma, semicolon, apostrophes, double quotes and blank space etc.
Variable:
A variable is nothing but a name given to a storage area that our programs can manipulate.
Each variable in C has a specific type, which determines the size and layout of the variable's memory, the range of values that can be stored within that memory, and the set of operations that can be applied to the variable.
Data Types
Different Types of Modifier with their Range:
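The modifier table itself is missing from this copy of the notes. As a rough substitute, the following sketch prints the storage sizes of the basic types and modifiers on the machine it runs on (the sizes in the comments are typical, not guaranteed by the standard):

#include <stdio.h>

int main(void) {
    /* Sizes are implementation-defined; typical values shown in comments. */
    printf("char:     %zu\n", sizeof(char));          /* 1 byte */
    printf("short:    %zu\n", sizeof(short));         /* 2 bytes */
    printf("int:      %zu\n", sizeof(int));           /* 4 bytes */
    printf("long:     %zu\n", sizeof(long));          /* 4 or 8 bytes */
    printf("unsigned: %zu\n", sizeof(unsigned int));  /* same size as int */
    printf("float:    %zu\n", sizeof(float));         /* 4 bytes */
    printf("double:   %zu\n", sizeof(double));        /* 8 bytes */
    return 0;
}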
Type Conversions
Implicit Type Conversion: There are certain cases in which data will get automatically converted from one type to another:
o When data is being stored in a variable, if the data being stored does not match the type of the variable, the data being stored will be converted to match the type of the storage variable.
o When an operation is being performed on data of two different types, the smaller data type will be converted to match the larger type. The following example converts the value of x to a double precision value before performing the division. Note that if the 3.0 were changed to a simple 3, then integer division would be performed, losing any fractional values in the result.
average = x / 3.0;
o When data is passed to or returned from functions.
Explicit Type Conversion: Data may also be expressly converted, using the typecast operator.
o The following example converts the value of x to a double precision value before performing the division. (y will then be implicitly promoted, following the guidelines listed above.)
average = (double) x / y;
Note that x itself is unaffected by this conversion.
Expression:
lvalue:
o Expressions that refer to a memory location are called lvalue expressions.
o An lvalue may appear as either the left-hand or right-hand side of an assignment.
o Variables are lvalues, and so they may appear on the left-hand side of an assignment.
rvalue:
o The term rvalue refers to a data value that is stored at some address in memory.
o An rvalue is an expression that cannot have a value assigned to it, which means an rvalue may appear on the right-hand side but not on the left-hand side of an assignment.
o Numeric literals are rvalues, so they may not be assigned and cannot appear on the left-hand side.
for Loop:
for (initialize counter; test counter; increment/decrement counter)
{
<Statement1>
<Statement2>
}
do while Loop:
initialize loop counter;
do
{
<Statement1>
<Statement2>
}
while (this condition is true);
The break Statement: The break statement is used to jump out of a loop instantly, without
waiting to get back to the conditional test.
The continue Statement: The continue statement is used to take the control to the beginning of the loop, bypassing the statements inside the loop which have not yet been executed.
goto Statement: C supports an unconditional control statement, goto, to transfer the control
from one point to another in a C program.
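As a small illustrative example (not from the original notes), the following program combines a for loop with continue and break:

#include <stdio.h>

int main(void) {
    int i;
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0)
            continue;   /* skip even numbers: jump to the next iteration */
        if (i > 7)
            break;      /* leave the loop entirely once i exceeds 7 */
        printf("%d ", i);
    }
    printf("\n");       /* prints: 1 3 5 7 */
    return 0;
}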
C Variable Types
A variable is just a named area of storage that can hold a single value. There are two main
variable types: Local variable and Global variable.
Local Variable: Scope of a local variable is confined within the block or function, where it is
defined.
Global Variable: A global variable is defined at the top of the program file, and it can be seen and modified by any function that may reference it. Global variables are initialized automatically by the system when we define them. If the same variable name is used for a global and a local variable, then the local variable takes preference in its scope.
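A minimal sketch of this scoping rule (variable names are illustrative):

#include <stdio.h>

int count = 10;             /* global: would be auto-initialized to 0 if no value given */

void show(void) {
    int count = 5;          /* local variable shadows the global inside this function */
    printf("local count = %d\n", count);    /* prints 5 */
}

int main(void) {
    show();
    printf("global count = %d\n", count);   /* prints 10 */
    return 0;
}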
Storage Classes in C
A variable name identifies some physical location within the computer, where the string of
bits representing the variables value, is stored.
There are basically two kinds of locations in a computer, where such a value maybe kept:
Memory and CPU registers.
It is the variables storage class that determines in which of the above two types of locations,
the value should be stored.
We have four types of storage classes in C: Auto, Register, Static and Extern.
Auto Storage Class: Auto is the default storage class for local variables; such a variable is created when its block is entered and destroyed when the block is left.
Static Storage Class: Static is the default storage class for global variables. A static local variable is initialized only once and retains its value between function calls.
Extern Storage Class: Extern is used to give a reference to a global variable that is visible to all the program files. When we use extern, the variable can't be initialized, as all it does is point the variable name at a storage location that has been previously defined.
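A short sketch contrasting auto and static local variables (assumed example):

#include <stdio.h>

void counter(void) {
    auto int a = 0;      /* auto: re-created and re-initialized on every call */
    static int s = 0;    /* static: initialized once, value retained across calls */
    a++;
    s++;
    printf("a = %d, s = %d\n", a, s);
}

int main(void) {
    counter();   /* a = 1, s = 1 */
    counter();   /* a = 1, s = 2 */
    counter();   /* a = 1, s = 3 */
    return 0;
}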
Operator Precedence Relations: Operator precedence in C, from highest to lowest, is: () [] -> . ; unary operators (! ~ ++ -- + - * & sizeof); multiplicative (* / %); additive (+ -); shift (<< >>); relational (< <= > >=); equality (== !=); bitwise AND (&); bitwise XOR (^); bitwise OR (|); logical AND (&&); logical OR (||); conditional (?:); assignment (= += -= *= /= %= etc.); comma (,).
Functions
Call by Value: If we pass values of variables to the function as parameters, such kind of
function calling is known as call by value.
Call by Reference: Variables are stored somewhere in memory. So, instead of passing the
value of a variable, if we pass the location number / address of the variable to the function,
then it would become a call by reference.
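The classic swap function illustrates the difference between the two calling styles; this is an illustrative sketch, not code from the original notes:

#include <stdio.h>

/* Call by value: copies of a and b are modified; the caller's variables stay unchanged. */
void swap_by_value(int a, int b) {
    int t = a; a = b; b = t;
}

/* Call by reference: addresses are passed, so the caller's variables are swapped. */
void swap_by_reference(int *a, int *b) {
    int t = *a; *a = *b; *b = t;
}

int main(void) {
    int x = 1, y = 2;
    swap_by_value(x, y);
    printf("%d %d\n", x, y);   /* 1 2 : unchanged */
    swap_by_reference(&x, &y);
    printf("%d %d\n", x, y);   /* 2 1 : swapped */
    return 0;
}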
Pointers
A pointer is a variable that stores memory address. Like all other variables, it also has a
name, has to be declared and occupies some spaces in memory. It is called pointer because it
points to a particular location.
NULL Pointers
Uninitialized pointers start out with random unknown values, just like any other variable type; assigning NULL to a pointer explicitly marks it as pointing to nothing.
Accidentally using a pointer containing a random address is one of the most common errors encountered when using pointers, and potentially one of the hardest to diagnose, since the errors encountered are generally not repeatable.
Combinations of * and ++
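The original table of combinations is not reproduced here; the following sketch shows what each combination does (array contents are assumed for illustration):

#include <stdio.h>

int main(void) {
    int a[] = {10, 20, 30};
    int *p = a;
    int x = *p++;   /* *p++ : fetch *p, then advance p (postfix ++ binds to p); x = 10 */
    int y = *++p;   /* *++p : advance p first, then fetch; y = 30 */
    (*p)++;         /* (*p)++ : increment the pointed-to value; a[2] becomes 31 */
    ++*p;           /* ++*p  : same effect; a[2] becomes 32 */
    printf("%d %d %d\n", x, y, a[2]);   /* 10 30 32 */
    return 0;
}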
Pointer Operations:
Assignment: You can assign an address to a pointer. Typically, you do this by using
an array name or by using the address operator (&).
Value finding (dereferencing): The * operator gives the value stored in the pointed-to location.
Taking a pointer address: Like all variables, pointer variables have an address and a
value. The & operator tells you where the pointer itself is stored.
Adding an integer to a pointer: You can use the + operator to add an integer to a
pointer or a pointer to an integer. In either case, the integer is multiplied by the
number of bytes in the pointed-to type, and the result is added to the original address.
Incrementing a pointer: Incrementing a pointer to an array element makes it move
to the next element of the array.
Subtracting an integer from a pointer: You can use the - operator to subtract an integer from a pointer; the pointer has to be the first operand and the integer the second. The integer is multiplied by the number of bytes in the pointed-to type, and the result is subtracted from the original address.
Note that there are two forms of subtraction. You can subtract one pointer from
another to get an integer, and you can subtract an integer from a pointer and get a
pointer.
Decrementing a pointer: You can also decrement a pointer. For example, if a pointer ptr2 points to the third array element, decrementing ptr2 makes it point to the second array element instead.
Note that you can use both the prefix and postfix forms of the increment and
decrement operators.
Differencing: You can find the difference between two pointers. Normally, you do
this for two pointers to elements that are in the same array to find out how far apart
the elements are. The result is in the same units as the type size.
Comparisons: You can use the relational operators to compare the values of two
pointers, provided the pointers are of the same type.
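A compact sketch exercising these operations (the array and pointer names are assumptions of this example):

#include <stdio.h>

int main(void) {
    int a[5] = {1, 2, 3, 4, 5};
    int *p = a;          /* assignment: an array name decays to &a[0] */
    int *q = &a[3];      /* assignment using the address operator */

    printf("%d\n", *p);             /* dereferencing: prints 1 */
    printf("%p\n", (void *)&p);     /* address of the pointer itself */
    printf("%d\n", *(p + 2));       /* adding an integer: prints a[2] = 3 */
    p++;                            /* incrementing: p now points to a[1] */
    q--;                            /* decrementing: q now points to a[2] */
    printf("%ld\n", (long)(q - p)); /* differencing: 1 element apart */
    printf("%d\n", p < q);          /* comparison: 1, since p points earlier */
    return 0;
}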
Recursion
A function that calls itself directly or indirectly is called a recursive function. The recursive factorial
function uses more memory than its non-recursive counter part. Recursive function requires stack
support to save the recursive function calls.
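The factorial function mentioned above, sketched in both styles:

#include <stdio.h>

/* Recursive version: each pending call is saved on the stack until the base case. */
long fact_recursive(int n) {
    if (n <= 1)
        return 1;                     /* base case */
    return n * fact_recursive(n - 1); /* recursive case */
}

/* Iterative version: constant extra memory. */
long fact_iterative(int n) {
    long f = 1;
    for (int i = 2; i <= n; i++)
        f *= i;
    return f;
}

int main(void) {
    printf("%ld %ld\n", fact_recursive(5), fact_iterative(5));   /* 120 120 */
    return 0;
}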
Arrays
Example: a[i]: The name a of the array is a constant expression, whose value is the address of the 0th location. Hence:
a + 0 = &a[0]
a + 1 = &a[1]
a + i = &a[i]
Multi-Dimensional Array
In the C language, one can have arrays of any dimension. Let us consider a 3 × 3 matrix, declared as int m[3][3].
Strings
In C language, strings are stored in an array of character (char) type along with the null
terminating character \0 at the end.
Stacks
A stack is an ordered collection of items into which new items may be inserted and from
which items may be deleted at one end, called the TOP of the stack. It is a LIFO (Last In First
Out) kind of data structure.
Operations on Stack
Push: Adds an item onto the stack. PUSH (s, i); Adds the item i to the top of stack.
Pop: Removes the most-recently-pushed item from the stack. POP (s); Removes the
top element and returns it as a function value.
Implementation of Stack
A stack can be implemented in two ways: array and linked list.
But since an array's size is defined at compile time, it can't grow dynamically. Therefore, an attempt to insert/push an element into a stack (which is implemented through an array) can cause a stack overflow situation, if it is already full.
So, to avoid the above-mentioned problem, we need to use a linked list to implement a stack, because a linked list can grow and shrink dynamically at runtime.
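A minimal array-based sketch of PUSH and POP (the fixed capacity MAX is an assumption of this example):

#include <stdio.h>
#define MAX 100

int stack[MAX];
int top = -1;            /* -1 means the stack is empty */

void push(int i) {
    if (top == MAX - 1) { printf("stack overflow\n"); return; }
    stack[++top] = i;    /* add item i on top of the stack */
}

int pop(void) {
    if (top == -1) { printf("stack underflow\n"); return -1; }
    return stack[top--]; /* remove and return the top element */
}

int main(void) {
    push(10); push(20); push(30);
    printf("%d\n", pop());   /* 30 */
    printf("%d\n", pop());   /* 20 */
    return 0;
}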
Applications of Stack
There are many applications of stack; some of the important applications are given below.
Backtracking: This is a process where you need to access the most recent data element in a series of elements.
Function Calls: The return addresses and local data of function calls are kept on a stack.
Simulation of Recursive calls: The compiler uses the stack data structure for implementing normal as well as recursive function calls.
Infix expression: It is the one where the binary operator comes between the operands,
e.g., A + B * C.
Postfix expression: Here, the binary operator comes after the operands,
e.g., A B C * +.
Reversing a List: First push all the elements of string in stack and then pop elements.
Expression conversion: Infix to Postfix, Infix to Prefix, Postfix to Infix, and Prefix
to Infix
Queues
It is a non-primitive, linear data structure in which elements are added/inserted at one end
(called the REAR) and elements are removed/deleted from the other end (called the FRONT).
A queue is logically a FIFO (First in First Out) type of list.
Operations on Queue
Enqueue: Adds an item onto the end of the queue ENQUEUE(Q, i); Adds the item i
onto the end of queue.
Dequeue: Removes the item from the front of the queue. DEQUEUE (Q); Removes
the first element and returns it as a function value.
Circular Queue: In a circular queue, the first element comes just after the last element, i.e., a circular queue is one in which the insertion of a new element is done at the very first location of the queue if the last location of the queue is full and the first location is empty.
Note: A circular queue overcomes the problem of unutilised space in linear queues implemented as arrays.
We can make the following assumptions for a circular queue of size n (a sketch follows after this list).
Front will always be pointing to the first element (as in a linear queue).
Each time a new element is inserted into the queue, the Rear is incremented by one, wrapping around the end of the array:
Rear = (Rear + 1) mod n
Each time an element is deleted from the queue, the value of Front is incremented by one, also wrapping around:
Front = (Front + 1) mod n
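A sketch of a circular queue on an array, using the modular arithmetic above (size and names assumed):

#include <stdio.h>
#define N 5

int q[N];
int front = 0, rear = -1, count = 0;   /* count tracks how many items are stored */

void enqueue(int x) {
    if (count == N) { printf("queue is full\n"); return; }
    rear = (rear + 1) % N;             /* wrap around the end of the array */
    q[rear] = x;
    count++;
}

int dequeue(void) {
    if (count == 0) { printf("queue is empty\n"); return -1; }
    int x = q[front];
    front = (front + 1) % N;           /* wrap around the end of the array */
    count--;
    return x;
}

int main(void) {
    enqueue(1); enqueue(2); enqueue(3);
    printf("%d\n", dequeue());   /* 1 */
    printf("%d\n", dequeue());   /* 2 */
    return 0;
}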
Double Ended Queue (DEQUE): It is a list of elements in which insertion and deletion
operations are performed from both the ends. That is why it is called double-ended queue or
DEQUE.
Priority Queues: This type of queue enables us to retrieve data items on the basis of priority
associated with them. Below are the two basic priority queue choices.
Sorted Array or List
It is very efficient to find and delete the smallest element. Maintaining sortedness makes the insertion of new elements slow.
Applications of Queue:
CPU Scheduling
Routing Algorithms
Linked Lists
A linked list is a special data structure in which data elements are linked to one another. Here, each element is called a node, which has two parts:
Info or data part, which holds the actual data of the element.
Address or pointer part, which holds the address of the next element of the same type. A linked list is also known as a self-referential structure.
The syntax of declaring a node, which contains two fields in it (one for storing information and another for storing the address of the next node, so that one can traverse the list), is shown below.
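A hedged sketch of such a node declaration:

/* A self-referential structure: the node contains a pointer to its own type. */
struct node {
    int info;            /* information (data) field */
    struct node *next;   /* address field: points to the next node */
};

struct node *head = NULL;   /* an empty list */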
Advantages of Linked List: Linked lists are a dynamic data structure, as they can grow and shrink during execution time.
Insertions and deletions can be done very easily at the desired position.
Disadvantages of Linked List: More memory is required, since each node must store one or more pointer fields in addition to its data.
Singly Linked List: In this type of linked list, each node has only one address field, which points to the next node. So, the main disadvantage of this type of list is that we can't access the predecessor of a node from the current node.
Doubly Linked List: Each node of linked list is having two address fields (or links) which
help in accessing both the successor node (next node) and predecessor node (previous node).
Circular Linked List: It has address of first node in the link (or address) field of last node.
Circular Doubly Linked List: It has both the previous and next pointer in circular manner.
Operations on Linked Lists: The following operations are involved in a linked list:
Creation: Used to create a linked list.
Insertion: Used to insert a new node in the linked list at the specified position. A new node may be inserted at the beginning, at the end, or at any given position.
Deletion: This operation is basically used to delete an item (a node). A node may be deleted from the beginning, from the end, or from any given position.
Traversing: It is a process of going through (accessing) all the nodes of a linked list from one end to the other end.
Trees
Tree (Non-linear Data Structure): Trees are used to represent data containing a hierarchical relationship between elements, e.g., records, family trees and table contents. A tree is a data structure based on a hierarchical structure over a set of nodes.
Depth or Height: Maximum level number of a node + 1(i.e., level number of farthest
leaf node of a tree + 1).
Non-terminal Node: Any node whose degree is not zero, i.e., a node having at least one child.
Path: Sequence of consecutive edges from the source node to the destination node.
Internal nodes: All nodes that have children are called internal nodes.
Leaf nodes: Those nodes which have no child are called leaf nodes.
The depth of a node is the number of edges from the root to the node.
The height of a node is the number of edges from the node to the deepest leaf.
Binary Tree: A binary tree is a tree-like structure that is rooted and in which each node has at most two children, and each child of a node is designated as its left or right child. In this kind of tree, the maximum degree of any node is at most 2.
T contains a distinguished node R, called the root of T, and the remaining nodes of T form an ordered pair of disjoint binary trees T1 and T2.
Any node N in a binary tree T has either 0, 1 or 2 successors. Level l of a binary tree T can have at most 2^l nodes.
Complete Binary Tree: A complete binary tree is a tree in which every level, except
possibly the last, is completely filled.
Within a level, we first fill the left node, then the right node.
We can start putting data items in the next level only when the previous level is completely filled.
Preorder
o Process the root R.
o Traverse the left subtree of R in preorder.
o Traverse the right subtree of R in preorder.
Inorder
o Traverse the left subtree of R in inorder.
o Process the root R.
o Traverse the right subtree of R in inorder.
Postorder
o Traverse the left subtree of R in postorder.
o Traverse the right subtree of R in postorder.
o Process the root R.
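A recursive C sketch of the three traversals (the node type is an assumption of this example):

#include <stdio.h>

struct tnode {
    int key;
    struct tnode *left, *right;
};

void preorder(struct tnode *r) {
    if (r == NULL) return;
    printf("%d ", r->key);   /* process the root first */
    preorder(r->left);
    preorder(r->right);
}

void inorder(struct tnode *r) {
    if (r == NULL) return;
    inorder(r->left);
    printf("%d ", r->key);   /* process the root in between */
    inorder(r->right);
}

void postorder(struct tnode *r) {
    if (r == NULL) return;
    postorder(r->left);
    postorder(r->right);
    printf("%d ", r->key);   /* process the root last */
}

int main(void) {
    struct tnode b = {1, NULL, NULL}, c = {3, NULL, NULL};
    struct tnode a = {2, &b, &c};
    preorder(&a);  printf("\n");   /* 2 1 3 */
    inorder(&a);   printf("\n");   /* 1 2 3 */
    postorder(&a); printf("\n");   /* 1 3 2 */
    return 0;
}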
Breadth First Traversal (BFT): The breadth first traversal of a tree visits the nodes in the
order of their depth in the tree.
BFT first visits all the nodes at depth zero (i.e., root), then all the nodes at depth 1 and so on.
At each depth, the nodes are visited from left to right.
Depth First Traversal (DFT): In DFT, one starts from root and explores as far as possible
along each branch before backtracking.
Perfect Binary Tree or Full Binary Tree: A binary tree in which all leaves are at the same
level or at the same depth and in which every parent has 2 children.
For instance, in such a tree whose leaves D, E, F and G all lie at depth 3 (level 2), every parent has exactly 2 children.
Maximum number of nodes in a binary tree: Let a binary tree contain MAX, the maximum number of nodes possible for its height h. Then MAX = 2^(h+1) − 1, i.e., h = log2(MAX + 1) − 1.
The height of the Binary Search Tree equals the number of links from the root node to
the deepest node.
The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
The left and right subtree each must also be a binary search tree.
Preorder Tree Walk: In which we visit the root node before the nodes in either subtree.
Preorder(x): if x ≠ NIL then { PRINT key[x]; Preorder(left[x]); Preorder(right[x]) }
Postorder Tree Walk: In which we visit the root node after the nodes in its subtrees.
Postorder(x): if x ≠ NIL then { Postorder(left[x]); Postorder(right[x]); PRINT key[x] }
Search an element in BST: The most basic operation is search, which can be a recursive or an iterative function. A search can start from any node. If the node is NULL (i.e., the tree is empty), then return NULL, which means the key does not exist in the tree. Otherwise, if the key equals that of the node, the search is successful and we return the node. If the key is less than that of the node, we search its left subtree. Similarly, if the key is greater than that of the node, we search its right subtree. This process is repeated until the key is found or the remaining subtree is null. To search the key in the BST, just call the method from the root node.
Insertion of an element: Insertion begins as a search would begin; we examine the root and recursively insert the new node into the left subtree if its key is less than that of the root, or into the right subtree if its key is greater than that of the root. If the key already exists, we can either replace the value by the new value or just return without doing anything.
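A C sketch of search and insert as just described (function and type names assumed):

#include <stdio.h>
#include <stdlib.h>

struct tnode {
    int key;
    struct tnode *left, *right;
};

/* Recursive search: returns the node holding key, or NULL if absent. */
struct tnode *search(struct tnode *root, int key) {
    if (root == NULL || root->key == key)
        return root;
    if (key < root->key)
        return search(root->left, key);
    return search(root->right, key);
}

/* Recursive insert: returns the (possibly new) subtree root. */
struct tnode *insert(struct tnode *root, int key) {
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else if (key > root->key)
        root->right = insert(root->right, key);
    /* equal key: just return without doing anything, as described above */
    return root;
}

int main(void) {
    struct tnode *root = NULL;
    root = insert(root, 50);
    root = insert(root, 30);
    root = insert(root, 70);
    printf("%s\n", search(root, 30) ? "found" : "not found");   /* found */
    return 0;
}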
Deletion of an element: Deletion is a little more complex. Basically, to delete a node with a given key, we need to find the node with the key and remove it from the tree. There are three possible cases to consider:
Deleting a leaf node (no children): simply remove the node from the tree.
Deleting a node with one child: remove the node and replace it with its child.
Deleting a node with two children: find its in-order successor (left-most node in its right subtree), let's say R. Then copy R's key and value to the node, and remove R from its right subtree.
It takes Θ(n) time to walk (inorder, preorder or postorder) a tree of n nodes.
The disadvantage of a BST is that if every item inserted next is greater than the previous item, then we will get a right-skewed BST, or if every item inserted is less than the previous item, then we will get a left-skewed BST.
So, to overcome the skewness problem in BSTs, the concept of the AVL tree or height-balanced tree came into existence.
Balanced Binary Trees: Balancing ensures that the internal path lengths are close to the optimal n log n. A balanced tree will have the lowest possible overall height. AVL trees are an example of balanced binary trees (B-trees are balanced search trees too, but they are not binary).
AVL Trees: An AVL tree (Adelson-Velskii and Landis) is a binary tree with the following properties.
For any node in the tree, the heights of the left and right subtrees can differ by at most 1.
The objective is to keep the structure of the binary tree always balanced with the n given nodes, so that the height never exceeds O(log n).
After every insert or delete, we must ensure that the tree is balanced.
A search of a balanced binary tree is equivalent to a binary search of an ordered list. In both cases, each check eliminates half of the remaining items. Hence, searching is O(log n).
Rotations: A tree rotation is required when an insertion or deletion leaves the tree in an unbalanced form.
h(TL) − h(TR) is also known as the Balance Factor (BF). For an AVL (or height-balanced) tree, the balance factor of any node can only be −1, 0 or 1. An AVL search tree is a binary search tree which is an AVL tree.
Binary Heaps
The binary heap data structure is an array that can be viewed as a complete binary tree. Each
node of the binary tree corresponds to an element of the array. The array is completely filled
on all levels except possibly lowest (lowest level is filled in left to right order and need not be
complete).
There are two types of heap trees: Max heap tree and Min heap tree.
Max heap: In a max heap, for every node i other than the root, the value of the node is at most the value of its parent: A[PARENT(i)] ≥ A[i]. Thus, the largest element in a max heap is stored at the root.
Min heap: In a min heap, for every node i other than the root, the value of the node is at least the value of its parent: A[PARENT(i)] ≤ A[i]. Thus, the smallest element in a min heap is stored at the root.
The root of the tree is A[1], and given the index i of a node, the indices of its parent, left child and right child can be computed as follows:
PARENT(i): Parent of node i is at floor(i/2)
LEFT(i): Left child of node i is at 2i
RIGHT(i): Right child of node i is at (2i + 1)
Heapify: Heapify is a procedure for manipulating heap data structures. It is given an array A and an index i into the array. The subtrees rooted at the children of A[i] are heaps, but node A[i] itself may possibly violate the (max-)heap property:
A[i] < A[2i] or A[i] < A[2i + 1].
The procedure Heapify manipulates the tree rooted at A[i] so that it becomes a heap.
Heapify (A, i)
1. l ← LEFT[i]
2. r ← RIGHT[i]
3. if l ≤ heap-size[A] and A[l] > A[i]
4. then largest ← l
5. else largest ← i
6. if r ≤ heap-size[A] and A[r] > A[largest]
7. then largest ← r
8. if largest ≠ i
9. then exchange A[i] ↔ A[largest]
10. Heapify (A, largest)
Graphs
A graph is a collection of nodes called vertices, and the connections between them, called
edges.
Directed Graph: When the edges in a graph have a direction, the graph is called a directed
graph or digraph and the edges are called directed edges or arcs.
Adjacency: If (u, v) is in the edge set, we say u is adjacent to v.
Path: Sequence of edges where every edge is connected by two vertices.
Loop: A path with the same start and end node.
Connected Graph: There exists a path between every pair of nodes, no node is disconnected.
Acyclic Graph: A graph with no cycles.
There are many ways of representing a graph:
Adjacency List
Adjacency Matrix
Incidence Matrix
Incidence List
Graph Traversals: A traversal visits all the vertices that it can reach starting at some vertex. It visits all vertices of the graph if and only if the graph is connected (effectively computing connected components). A traversal never visits a vertex more than once.
The breadth first search (BFS) and the depth first search (DFS) are the two algorithms used
for traversing and searching a node in a graph.
Depth first search (DFS) Algorithm
Step 1: Visit the first vertex; you can choose any vertex as the first vertex (if not explicitly mentioned), and push it onto the stack.
Step 2: Look at the undiscovered adjacent vertices of the top element of the stack and visit one of them (in any particular order).
Step 3: Repeat Step 2 till there is no undiscovered vertex left.
Step 4: Pop the element from the top of the stack and repeat Steps 2, 3 and 4 till the stack is not empty.
Applications of DFS
Topological sorting
Analysis of the DFS: The running time of the algorithm would be O(|V|+|E|).
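A sketch of DFS on an adjacency matrix; here the explicit stack of the steps above is replaced by the recursion call stack (the example graph is an assumption of this sketch):

#include <stdio.h>
#define V 4

int adj[V][V] = {
    {0, 1, 1, 0},
    {1, 0, 0, 1},
    {1, 0, 0, 1},
    {0, 1, 1, 0}
};
int visited[V];

void dfs(int u) {
    visited[u] = 1;
    printf("%d ", u);
    for (int v = 0; v < V; v++)
        if (adj[u][v] && !visited[v])
            dfs(v);            /* go as deep as possible before backtracking */
}

int main(void) {
    dfs(0);                    /* visits 0 1 3 2 for this graph */
    printf("\n");
    return 0;
}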
Breadth First Search Algorithm
Step 1: Visit the first vertex; you can choose any node as the first node, and add it to a queue.
Step 2: Repeat the below steps till the queue is not empty.
Step 3: Remove the head of the queue and, while staying at that vertex, visit all its unvisited connected vertices and add them to the queue one by one (you can choose any order to visit all the connected vertices).
Step 4: When all the connected vertices are visited, repeat Step 3.
Applications of BFS
Finding shortest paths in unweighted graphs; testing whether a graph is connected.
Algorithms
The following topics are covered under Algorithms:
Introduction of Algorithms: Algorithms can be classified by the amount of time they need to complete compared to their input size. The analysis of an algorithm focuses on the complexity of the algorithm, which depends on time or space.
Searching: Searching is majorly of two types: sequential search and binary search.
Sorting: Sorting can be of two types, namely in-place sorting and stable sorting. An in-place sorting algorithm does not use extra storage to sort the elements of a list. A stable sorting algorithm maintains the relative order of records with equal values during sorting.
Hashing: Hashing is a common method of accessing data records. A hash system stores records in an array, called a hash table. The hash function is primarily responsible for mapping between the original data items and the smaller table.
Space and time complexities: Space and time complexities cover asymptotic notations and the analysis of algorithms.
Algorithm design techniques: The idea of the divide-and-conquer technique is to divide the problem into smaller but similar sub-problems (divide), solve them (conquer), and combine these solutions to create a solution to the original problem.
Minimum Spanning Trees: Minimum spanning trees can be computed using Prim's algorithm and Kruskal's algorithm.
Shortest Paths: The shortest path problem is to determine one or more shortest paths between a source vertex and a target vertex, where a set of edges is given.
Introduction of Algorithm
An algorithm is a well-defined computational procedure that transforms inputs into outputs, achieving the desired input-output relationship. In other words, an algorithm is a sequence of computational steps that transform the input into the output.
Searching
Sequential Search (Linear Search)
Pseudo code of Sequential search:
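The pseudocode itself is missing from this copy of the notes; a minimal C sketch of sequential search (names assumed):

#include <stdio.h>

/* Returns the index of key in a[0..n-1], or -1 if it is not present. */
int sequential_search(int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}

int main(void) {
    int a[] = {7, 3, 9, 1};
    printf("%d\n", sequential_search(a, 4, 9));   /* 2 */
    return 0;
}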
Analysis of Sequential Search: The time complexity of sequential search in all three cases is given below.
Best case: When we find the key at the first location of the array, the complexity is O(1).
Worst case: When the key is not found in the array and we have to scan the complete array, the complexity is O(n).
Average case: When the key could appear anywhere in the array with equal probability, a successful search takes (1 + 2 + 3 + … + n)/n = (n + 1)/2 comparisons on average, which is O(n).
Binary Search: Binary search works on a sorted array. It compares the key with the middle element; if they differ, the search continues among the elements on the left side of the middle element (if the key is smaller) or the elements on the right side of the middle element (if the key is larger).
Analysis of Binary Search: The time complexity of binary search is O(1) in the best case (the key is the middle element) and O(log n) in the worst and average cases.
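A sketch of iterative binary search (assumes a sorted array):

#include <stdio.h>

int binary_search(int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* avoids overflow of (low + high) */
        if (a[mid] == key)
            return mid;
        else if (key < a[mid])
            high = mid - 1;   /* continue in the left half */
        else
            low = mid + 1;    /* continue in the right half */
    }
    return -1;                /* not found */
}

int main(void) {
    int a[] = {1, 3, 7, 9, 12};
    printf("%d\n", binary_search(a, 5, 7));   /* 2 */
    return 0;
}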
Sorting
Sorting can be of two types: In-place sorting and Stable Sorting.
In-place Sorting: The in-place sorting algorithm does not use extra storage to sort the
elements of a list.
Stable Sorting: A stable sorting algorithm maintains the relative order of records with equal values during sorting.
Bubble Sorting
Analysis of Bubble Sort: Bubble sort takes O(n²) comparisons in the worst and average cases; with an early-exit check for a pass with no swaps, the best case (an already sorted input) is O(n).
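A sketch of bubble sort with the early-exit check (names assumed):

#include <stdio.h>

void bubble_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int swapped = 0;
        /* After pass i, the largest i+1 elements are in their final places. */
        for (int j = 0; j < n - 1 - i; j++) {
            if (a[j] > a[j + 1]) {
                int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                swapped = 1;
            }
        }
        if (!swapped) break;   /* already sorted: best case O(n) */
    }
}

int main(void) {
    int a[] = {5, 1, 4, 2, 8};
    bubble_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);   /* 1 2 4 5 8 */
    printf("\n");
    return 0;
}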
Insertion Sorting
Insertion sort builds the sorted array one element at a time; on a nearly sorted array, it does close to a single pass of work. Therefore, it is a very fast and efficient sorting algorithm for small arrays. Insertion sort is useful only for small files or very nearly sorted files.
Pseudo Code of Insertion Sort:
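The pseudocode is missing from this copy of the notes; a minimal C sketch:

#include <stdio.h>

void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        /* Shift elements greater than key one place to the right. */
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;   /* insert key into its correct position */
    }
}

int main(void) {
    int a[] = {12, 11, 13, 5, 6};
    insertion_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);   /* 5 6 11 12 13 */
    printf("\n");
    return 0;
}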
Selection Sorting
for i = 1 to n - 1
{
    k = i;
    for j = i + 1 to n
    {
        if (A[j] < A[k]) then k = j;
    }
    if (k ≠ i) then swap(A[i], A[k]);
}
Analysis of Selection Sort: Selection sort makes Θ(n²) comparisons in the best, worst and average cases, but only O(n) swaps.
Heap Sort
Heap sort is simple to implement and is a comparison-based sort. It is an in-place sort but not a stable sort.
Max heap: A heap in which every parent has a larger key than its children is called a max heap.
Min heap: A heap in which every parent has a smaller key than its children is called a min heap.
Heapify:
Heapify (A, i)
1. l ← LEFT[i]
2. r ← RIGHT[i]
3. if l ≤ heap-size[A] and A[l] > A[i]
4. then largest ← l
5. else largest ← i
6. if r ≤ heap-size[A] and A[r] > A[largest]
7. then largest ← r
8. if largest ≠ i
9. then exchange A[i] ↔ A[largest]
10. Heapify (A, largest)
Build Heap
BUILD_HEAP (A)
1. heap-size[A] ← length[A]
2. for i ← floor(length[A]/2) down to 1 do
3. Heapify (A, i)
Heap Sort
HEAPSORT (A)
1. BUILD_HEAP (A)
2. for i ← length[A] down to 2 do
exchange A[1] ↔ A[i]
heap-size[A] ← heap-size[A] − 1
Heapify (A, 1)
Analysis of Heap Sort: The total time for heap sort is O (n log n) in all three cases (best,
worst and average).
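A runnable C sketch of the whole pipeline; note it uses 0-indexed arrays, so the child formulas shift to 2i + 1 and 2i + 2:

#include <stdio.h>

/* 0-indexed heapify: children of i are 2i+1 and 2i+2. */
void heapify(int a[], int n, int i) {
    int l = 2 * i + 1, r = 2 * i + 2, largest = i;
    if (l < n && a[l] > a[largest]) largest = l;
    if (r < n && a[r] > a[largest]) largest = r;
    if (largest != i) {
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        heapify(a, n, largest);
    }
}

void heap_sort(int a[], int n) {
    /* Build a max heap from the bottom up. */
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
    /* Repeatedly move the maximum to the end and shrink the heap. */
    for (int i = n - 1; i >= 1; i--) {
        int t = a[0]; a[0] = a[i]; a[i] = t;
        heapify(a, i, 0);
    }
}

int main(void) {
    int a[] = {4, 10, 3, 5, 1};
    heap_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);   /* 1 3 4 5 10 */
    printf("\n");
    return 0;
}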
Note:
Merge sort is a fast sorting algorithm whose best, worst, and average case complexities are all O(n log n), but unfortunately it uses O(n) extra space to do its work.
Quicksort has best and average case complexity O(n log n), but unfortunately its worst case complexity is O(n²).
Hashing
Hashing is a common method of accessing data records.
Hash Table: A hash system stores records in an array, called a hash table.
Hash Function: The hash function is primarily responsible for mapping between the original data items and the smaller table. There are many hash function approaches, as follows:
Division Method: Map a key K into one of m slots by taking the remainder of K divided by m:
h(K) = K mod m
Mid-Square Method: Map a key K into one of m slots by taking some middle digits of the value K²:
h(K) = the middle log10(m) digits of K²
Folding Method: Divide the key K into sections which, except possibly the last, have the same length; then add these sections together. Two variants are:
Shift folding
Folding at the boundaries
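A sketch of the division method with chaining for collisions (table size and names are assumptions of this example):

#include <stdio.h>
#include <stdlib.h>
#define M 7                   /* number of slots in the hash table */

struct entry {
    int key;
    struct entry *next;       /* chain of keys that hash to the same slot */
};

struct entry *table[M];       /* globals are zero-initialized to NULL */

int h(int key) { return key % M; }   /* division method */

void insert_key(int key) {
    struct entry *e = malloc(sizeof *e);
    e->key = key;
    e->next = table[h(key)];  /* prepend to the slot's chain */
    table[h(key)] = e;
}

int search_key(int key) {
    for (struct entry *e = table[h(key)]; e != NULL; e = e->next)
        if (e->key == key) return 1;
    return 0;
}

int main(void) {
    insert_key(10); insert_key(17);   /* both hash to slot 3: a collision chain */
    printf("%d %d\n", search_key(17), search_key(4));   /* 1 0 */
    return 0;
}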
Asymptotic Notations
Big Oh (O): If we write f(n) = O(g(n)), then there exists a constant c > 0 such that f(n) ≤ c·g(n) for all sufficiently large n. We say g(n) is an asymptotic upper bound for f(n).
o Example: 2n = O(n), 2n = O(n²), 2n = O(n³)
Big Omega (Ω): If we write f(n) = Ω(g(n)), then there exists a constant c > 0 such that f(n) ≥ c·g(n) for all sufficiently large n. We say g(n) is an asymptotic lower bound for f(n).
o Example: n = Ω(log2 n) with constant c = 1
Big Theta (Θ): If we write f(n) = Θ(g(n)), then there exist positive constants c1 and c2 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all sufficiently large n. We say g(n) is an asymptotically tight bound for f(n).
o f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).
Small Oh (o): If we write f(n) = o(g(n)), then f(n) < c·g(n) for every positive constant c (for all sufficiently large n). Here g(n) is an upper bound of f(n) that is not asymptotically tight.
o Example: n^1.99 = o(n²)
Small Omega (ω): If we write f(n) = ω(g(n)), then f(n) > c·g(n) for every positive constant c (for all sufficiently large n). Here g(n) is a lower bound of f(n) that is not asymptotically tight.
o Example: n^2.00001 = ω(n²) and n² ≠ ω(n²)
Analysis of Algorithms
Algorithms can be classified by the amount of time they need to complete compared to their input size. The analysis of an algorithm focuses on the complexity of the algorithm, which depends on time or space.
Time Complexity: The time complexity is a function that gives the amount of time required
by an algorithm to run to completion.
o Worst case time complexity: It is the function defined by the maximum amount of
time needed by an algorithm for an input of size n.
o Average case time complexity: It is the execution of an algorithm having typical
input data of size n.
o Best case time complexity: It is the minimum amount of time that an algorithm
requires for an input of size n.
Space Complexity: The space complexity is a function that gives the amount of space
required by an algorithm to run to completion.
Recurrence Relations
A recurrence is a function defined in terms of one or more base cases and itself with smaller arguments.
Example: T(n) = 2T(n/2) + n, with base case T(1) = 1 (the merge sort recurrence).
There are two methods to solve recurrence relations: the substitution method and the master method.
1. Substitution Method: There are two steps in this method: guess the form of the solution, and then use mathematical induction to find the constants and show that the solution works.
2. Master Method: The master method gives us a quick way to find solutions to recurrence relations of the form T(n) = aT(n/b) + f(n), where a ≥ 1 and b > 1 are constants.
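As a worked illustration (standard master-theorem reasoning, not taken verbatim from the notes): for the merge sort recurrence T(n) = 2T(n/2) + n, we have a = 2, b = 2 and n^(log_b a) = n^(log2 2) = n. Since f(n) = n = Θ(n^(log_b a)), the second case of the master theorem applies, and T(n) = Θ(n log n). Similarly, binary search's recurrence T(n) = T(n/2) + 1 has a = 1, b = 2 and n^(log2 1) = 1 = Θ(f(n)), giving T(n) = Θ(log n).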
Divide and Conquer: A divide-and-conquer algorithm works by:
Breaking the problem into several sub-problems that are similar to the original problem but smaller in size,
Solving the sub-problems recursively (successively and independently), and then
Combining these solutions to the sub-problems to create a solution to the original problem.
Divide-and-Conquer Examples:
Mergesort
Quicksort
Binary tree traversals
Binary search
Multiplication of large integers
Matrix multiplication: Strassen's algorithm
Closest-pair and convex-hull algorithms
Merge Sort
Merge sort is a comparison based sorting algorithm. Merge sort is a stable sort.
Algorithm MERGE
Input: An array A[1..m] of elements and three indices p, q and r, with 1 ≤ p ≤ q < r ≤ m, such that both the subarrays A[p..q] and A[q+1..r] are sorted individually in non-decreasing order.
Output: A[p..r] contains the result of merging the two subarrays A[p..q] and A[q+1..r].
// B[] is an auxiliary array.
s = p;
t = q + 1;
k = p;
while (s <= q && t <= r)
{
    if (A[s] <= A[t])
    {
        B[k] = A[s];
        s = s + 1;
    }
    else
    {
        B[k] = A[t];
        t = t + 1;
    }
    k = k + 1;
}
if (s == q + 1)   // left subarray exhausted: copy the rest of the right one
    { B[k..r] = A[t..r]; }
else              // right subarray exhausted: copy the rest of the left one
    { B[k..r] = A[s..q]; }
A[p..r] = B[p..r]
Analysis of Merge Sort
The merge sort algorithm always divides the array into two balanced lists, so the recurrence relation for merge sort is:
T(n) = 1, if n = 1
T(n) = 2T(n/2) + 4n, otherwise
which solves to T(n) = O(n log n).
Quick Sort
It is an in-place sort. It is also known as partition exchange sort.
The elements A[low..high] to be sorted are rearranged using Algorithm SPLIT so that the pivot element, which is always A[low], occupies its correct position A[w], and all elements that are less than or equal to A[w] occupy the positions A[low..w − 1], while all elements that are greater than A[w] occupy the positions A[w + 1..high]. The subarrays A[low..w − 1] and A[w + 1..high] are then recursively sorted to produce the entire sorted array. The formal algorithm is shown as Algorithm quicksort.
Algorithm: Quicksort
Input: An array A[1..n] of n elements.
Output: The elements in A sorted in non-decreasing order.
1. quicksort(A, 1, n)
Procedure quicksort(A, low, high)
{
    if (low < high)
    {
        SPLIT(A[low..high], w);   // w is the new position of A[low]
        quicksort(A, low, w - 1);
        quicksort(A, w + 1, high);
    }
}
Analysis of Quick Sort
Worst case O(n²): This happens when the pivot is the smallest (or the largest) element.
Best case O(n log n): The pivot is in the middle, and the subarrays divide into balanced partitions every time.
Average case O(n log n).
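A C sketch using the first element as pivot, matching the SPLIT description above (helper names assumed):

#include <stdio.h>

/* Partition A[low..high] around the pivot A[low]; returns the pivot's final index. */
int split(int a[], int low, int high) {
    int pivot = a[low];
    int w = low;
    for (int i = low + 1; i <= high; i++) {
        if (a[i] <= pivot) {
            w++;
            int t = a[w]; a[w] = a[i]; a[i] = t;
        }
    }
    int t = a[low]; a[low] = a[w]; a[w] = t;   /* place the pivot at A[w] */
    return w;
}

void quicksort(int a[], int low, int high) {
    if (low < high) {
        int w = split(a, low, high);
        quicksort(a, low, w - 1);
        quicksort(a, w + 1, high);
    }
}

int main(void) {
    int a[] = {24, 33, 17, 8, 56, 17};
    quicksort(a, 0, 5);
    for (int i = 0; i < 6; i++) printf("%d ", a[i]);   /* 8 17 17 24 33 56 */
    printf("\n");
    return 0;
}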
Greedy Algorithms
A greedy algorithm is an algorithm that uses the heuristic of making the locally optimal choice at each stage of problem solving, with the hope of finding a globally optimal solution.
Prim's algorithm
The basic idea is to start at some arbitrary node and grow the tree one edge at a time, always adding the smallest edge that does not create a cycle. What makes Prim's algorithm distinct from Kruskal's is that the spanning tree grows connected from the start node. We need to do this n − 1 times to make sure that every node in the graph is spanned. The algorithm is implemented with a priority queue. The output will be a tree represented by a parent array whose indices are nodes.
We keep a priority queue filled with all the nodes that have not yet been spanned. The value of each of these nodes is equal to the smallest weight of the edges that connect it to the partial spanning tree.
1. Initialize the Pqueue with all the nodes and set their values to a number larger than any edge weight; set the value of the root to 0, and the parent of the root to nil.
2. While Pqueue is not empty do { Let x be the minimum value node in the Pqueue; For every node y in x's adjacency list do { if y is in Pqueue and the weight on the edge (x, y) is less than value(y) { Set value(y) to the weight of (x, y); Set parent of y to x; } } }
During the algorithm, there may be a need to change a value in the heap, which takes O(log n). Hence, the total time complexity is O(n) + O(n log n) + O(e log n), and the last term dominates.
Kruskal's Algorithm
Kruskal's algorithm also works by growing the tree one edge at a time, adding the smallest edge that does not create a cycle.
We start with n distinct single-node trees, and the spanning tree is empty. At each step, we add the smallest edge that connects two nodes in different trees.
In order to do this, we sort the edges and add edges in ascending order unless an edge is already in a tree.
For each edge (u, v) in the sorted list in ascending order do {
If u and v are in different trees, then add (u, v) to the spanning tree, and union the trees that contain u and v. }
Hence, we need some data structure to store sets of edges, where each set represents a tree and the collection of sets represents the current spanning forest. The data structure must support the following operations: Union(s, t), which merges two trees into a new tree, and Find-Set(x), which returns the tree containing node x.
A naive Find-Set is an O(n) operation, because the tree can get long and thin, depending on the order of the parameters in the calls to Union. In particular, it is bad to point the taller tree to the root of the shorter tree.
We can fix this by changing Union. Union(x, y) will not just set the parent of x to y. Instead, it will first calculate which tree, x or y, has the greater number of nodes. Then it points the parent of the tree with the fewer nodes to the root of the tree with the greater number of nodes. This simple idea guarantees that the height of a tree is at most log n, which means that the Find operation has become O(log n).
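A sketch of union by size as just described (array representation and names assumed):

#include <stdio.h>
#define N 8

int parent[N];   /* parent[i] == i means i is a root */
int size[N];     /* number of nodes in the tree rooted at i */

void make_sets(void) {
    for (int i = 0; i < N; i++) { parent[i] = i; size[i] = 1; }
}

int find_set(int x) {
    while (parent[x] != x)
        x = parent[x];       /* walk up to the root */
    return x;
}

void union_sets(int x, int y) {
    int rx = find_set(x), ry = find_set(y);
    if (rx == ry) return;
    if (size[rx] < size[ry]) { int t = rx; rx = ry; ry = t; }
    parent[ry] = rx;         /* smaller tree is pointed at the larger one */
    size[rx] += size[ry];
}

int main(void) {
    make_sets();
    union_sets(0, 1);
    union_sets(1, 2);
    printf("%d\n", find_set(2) == find_set(0));   /* 1: same tree */
    printf("%d\n", find_set(3) == find_set(0));   /* 0: different trees */
    return 0;
}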
Compiler Design
Here is the list of topics covered under GATE 2016 Compiler Design Chapter:
Lexical Analysis: The lexical analyzer reads the source program character by character and returns the tokens of the source program. It puts information about identifiers into the symbol table.
Parsing: The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is mostly a parse tree. The syntax of a programming language is described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.
Syntax Directed Translation: Grammar symbols are associated with attributes to associate information with the programming language constructs that they represent. Values of these attributes are evaluated by the semantic rules associated with the production rules.
Runtime Environments: This topic concerns how we allocate space for the generated target code and the data objects of our source programs. The places of the data objects that can be determined at compile time will be allocated statically, but the places for some of the data objects will be allocated at run-time.
Intermediate Code Generation: Intermediate codes are machine-independent codes, but they are close to machine instructions. The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator.
Lexical Analysis
Lexical analyzer reads the source program character by character and returns the tokens of
the source program. It puts information about identifiers into the symbol table.
The Role of Lexical Analyzer:
It reads the input characters and produces an output sequence of tokens that the parser uses for syntax analysis.
The lexical analyzer is also responsible for eliminating comments and white spaces from the source program.
It also reports lexical errors.
A token describes a pattern of characters having the same meaning in the source program, such as identifiers, operators, keywords, numbers, delimiters and so on. A token may have a single attribute which holds the required information for that token. For identifiers, this attribute is a pointer to the symbol table, and the symbol table holds the actual attributes for that token.
Example:
Lexeme: A1, Sum, Total
Pattern: Starting with a letter and followed by letter or digit but not a keyword.
Token: ID
Lexeme: 123.45
Pattern: Starting with a digit, followed by digits, an optional fraction and/or an optional exponent.
Token: NUM
Parsing
Syntax Analyzer (Parser): The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is mostly a parse tree. The syntax of a programming language is described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not. If it satisfies, the parser creates the parse tree of that program; otherwise, the parser gives error messages.
Efficient top-down and bottom-up parsers can be implemented only for subclasses of
context-free grammars.
Both top-down and bottom-up parsers scan the input from left-to-right (one symbol at
a time).
1. LL for top-down parsing
2. LR for bottom-up parsing
Analysis of the Top-down parsing:
E => E + T
=> E + T * F
=> T + T * F
=> T + F * F
=> T + num * F
=> F + num * F
=> id + num * F
=> id + num * id
Top-down parsing uses leftmost derivation to derive the string and uses substitutions during the derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F
=> id(x) + F * F
=> id(x) + T * F
=> id(x) + T
=> F + T
=> T + T
=> E + T
=> E
Bottom-up parsing uses the reverse of rightmost derivation to verify the string and uses reductions during the process.
Context-Free Grammars: Inherently recursive structures of a programming language are defined by a CFG. In a CFG, we have: a start symbol (one of the non-terminals); a finite set of terminals (in our case, this will be the set of tokens); a set of non-terminals (syntactic variables); and a finite set of production rules of the following form:
A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string).
Parse Trees: A parse tree is a graphical representation for a derivation that filters out the order of choosing non-terminals to avoid rewriting. The root node represents the start symbol, and the inner nodes of a parse tree are non-terminal symbols.
Ambiguity: A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. An unambiguous grammar gives a unique selection of the parse tree for a sentence.
Ambiguity elimination:
Ambiguity is problematic because the meaning of the programs can be incorrect.
Ambiguity can be handled in several ways:
Enforce associativity and precedence
Rewrite the grammar (cleanest way)
There are no general techniques for handling ambiguity; it is impossible to convert an ambiguous grammar to an unambiguous one automatically.
Left Recursion: A grammar is left recursive if it has a non-terminal A such that there is a derivation
A ⇒+ Aα for some string α.
The left recursion may appear in a single step of the derivation (immediate left recursion) or may appear in more than one step of the derivation.
A top-down parser with a production A → Aα may loop forever. In the grammar A → Aα | b, left recursion may be eliminated by transforming the grammar to
A → bR
R → αR | ε
Left recursion is an issue of concern in top-down parsers. A grammar is left recursive if we can find some non-terminal A which will eventually derive a sentential form with itself as the leftmost symbol; such derivations may lead to an infinite loop.
Top-down parsing techniques can't handle left-recursive grammars, so we have to convert our left-recursive grammar into an equivalent grammar which is not left recursive.
Removal of left recursion
In general,
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
transforms to
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
Left Factoring: A predictive parser (a top-down parser without backtracking) insists that the grammar must be left factored. Consider
A → αβ1 | αβ2
where α is not empty and the first symbols of β1 and β2 (if they have one) are different. When processing α, we can't know whether to expand A to αβ1 or to αβ2. But if we rewrite the grammar as follows:
A → αA'
A' → β1 | β2
then we can immediately expand A to αA'.
The dangling-else problem can be handled by left factoring:
stmt → if expr then stmt else stmt | if expr then stmt
can be transformed to
stmt → if expr then stmt S'
S' → else stmt | ε
Top-down Parsing: There are two main techniques to build a top-down parse tree:
1. Recursive descent parsing
2. Predictive parsing
Predictive Parser
A non-recursive top-down parsing method.
The parser predicts which production to use.
It removes backtracking by fixing one production for every non-terminal and input token(s).
Predictive parsers accept LL(k) languages:
The first L stands for left-to-right scan of the input.
The second L stands for leftmost derivation.
k stands for the number of lookahead tokens.
In practice, LL(1) is used.
Functions used in Constructing LL(1) Parsing Tables
Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.
FIRST(α) is the set of the terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols. If α derives ε, then ε is also in FIRST(α).
FOLLOW(A) is the set of the terminals which occur immediately after (follow) the non-terminal A in strings derived from the starting symbol.
The FIRST set is computed for all non-terminals, but the FOLLOW set needs to be computed only for those non-terminals whose FIRST set contains ε.
For every terminal y in FOLLOW(Y), there is an entry (a null production) in the table.
To Compute FIRST of any String X
If X → Y1 Y2 … Yn:
If a terminal a is in FIRST(Yi) and ε is in FIRST(Yj) for all j = 1, …, i − 1, then a is in FIRST(X).
If ε is in FIRST(Yj) for all j = 1, …, n, then ε is in FIRST(X).
Example:
For the expression grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
To compute FOLLOW (for non-terminals):
If S is the start symbol, $ is in FOLLOW(S).
If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B); if β is empty or FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Apply these rules until nothing more can be added to any FOLLOW set.
The parser considers X, the symbol on top of the stack, and a, the current input symbol. Assume that $ is a special token that is at the bottom of the stack and terminates the input string.
For the above grammar and parsing table, we can verify the string id + id * id with the help of the parsing algorithm.
Bottom-up Parsing Techniques: A bottom-up parser creates the parse tree of the given input string from the leaves towards the root. A bottom-up parser tries to find the rightmost derivation of the given input in reverse order.
Bottom-up parsing is also known as shift-reduce parsing.
Shift Reduce Parsing: A shift-reduce parser tries to reduce the given input string to the starting symbol. At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal at the left side of that production rule.
Handle: A handle of a string is a substring that matches the right side of a production rule.
Handles always appear at the top of the stack and never inside it.
Shift: The next input symbol is shifted onto the top of the stack.
Reduce: Replace the handle on the top of the stack by the non-terminal.
Error: Parser discovers a syntax error and calls an error recovery routine.
What action to take in case both shift and reduce are valid? This situation is called a shift-reduce conflict.
Types of LR Parsers:
SLR, CLR and LALR parsers work in the same way, but their parsing tables may differ.
Relative power of the various classes:
SLR(1) < LALR(1) < LR(1)
SLR(k) < LALR(k) < LR(k)
LL(k) < LR(k)
LR parsing: LR parsing is the most general non-backtracking shift-reduce parsing. The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers:
LL(1) grammars ⊂ LR(1) grammars
An LR parser can detect a syntactic error as soon as it is possible to do so.
A configuration of an LR parser is
(S0 X1 S1 … Xm Sm, ai ai+1 … an $)
where the first component is the stack and the second is the rest of the input. Sm and ai decide the parser action by consulting the parsing action table (the initial stack contains just S0).
LR Parser Actions
Shift S: Shift the next input symbol and the state S onto the stack:
(S0 X1 S1 … Xm Sm, ai ai+1 … an $) → (S0 X1 S1 … Xm Sm ai S, ai+1 … an $)
Reduce A → β: Pop 2r items from the stack, where r = |β|; then push A and S, where S = goto[Sm−r, A]:
(S0 X1 S1 … Xm Sm, ai ai+1 … an $) → (S0 X1 S1 … Xm−r Sm−r A S, ai ai+1 … an $)
Accept: Parsing successfully completed.
Error: The parser detected an error (an empty entry in the action table).
Example:
Consider the grammar and its parse table:
E → E + T | T
T → T * F | F
F → ( E ) | id
Syntax Directed Translation
A syntax directed definition is a generalization of a CFG where each grammar symbol has an associated set of attributes; translation schemes additionally embed semantic actions within the production bodies.
An attribute may hold a string, a number, a memory location, a complex record, etc.
Evaluation of a semantic rule defines the value of an attribute, but a semantic rule may also have some side effects, such as printing a value.
Attributes
The value of a synthesized attribute is computed from the values of its children nodes.
The value of an inherited attribute is computed from the sibling and parent nodes.
S-Attributed grammar: A syntax directed definition that uses only synthesized attributes is said to be an S-attributed definition. A parse tree for an S-attributed definition can be annotated by evaluating the semantic rules for the attributes.
Translations are appended only at the end. It uses bottom-up parsing for evaluation.
L-attributed grammar: When translation takes place during parsing, the order of evaluation is linked to the order in which nodes are created. An L-attributed definition is one where the attributes can be evaluated in depth-first order.
This definition can use synthesized attributes and also restricted inherited attributes (the value can be taken from the parent and left siblings only).
Translations can appear anywhere in the right-hand side of the production. The natural evaluation order, in both top-down and bottom-up parsing, is depth-first order.
Evaluation order of SDTs
Dependency Graph: A directed graph indicating interdependencies among the synthesized and inherited attributes of various nodes in a parse tree.
Syntax directed translation table: Symbols E, T and F are associated with an attribute value.
Runtime Environments
This topic concerns how we allocate space for the generated target code and the data objects of our source programs. The places of the data objects that can be determined at compile time will be allocated statically, but the places for some of the data objects will be allocated at run-time.
The allocation and deallocation of the data objects is managed by the run-time support package. The run-time support package is loaded together with the generated target code. The structure of the run-time support package depends on the semantics of the programming language (especially the semantics of procedures in that language).
Symbol Table
The compiler uses a symbol table to keep track of scope and binding information about names.
The symbol table is changed every time a name is encountered in the source; changes to the table occur when a new name is discovered or when new information about an existing name is found.
Procedure Activation
Each execution of a procedure is called an activation of that procedure. An execution of a procedure starts at the beginning of the procedure body. When the procedure is completed, it returns control to the point immediately after the place where that procedure was called.
If a and b are procedure activations, then their lifetimes are either non-overlapping or nested.
Activation Tree
We can create a tree (known as an activation tree) to show the way control enters and leaves activations. In an activation tree:
The node a is a parent of the node b if and only if the control flows from a to b.
The node a is to the left of the node b if the lifetime of a occurs before the lifetime of b.
Example:
Program main;
  Procedure s;
  Begin ... end;
  Procedure p;
    Procedure q;
    Begin ... end;
  Begin q; s; end;
Begin p; s; end;
The corresponding trace of activations is:
enter main
enter p
enter q
exit q
enter s
exit s
exit p
enter s
exit s
exit main
Control Stack
The flow of control in a program corresponds to a depth-first traversal of the activation tree that:
1. Starts at the root.
2. Visits a node before its children.
3. Recursively visits the children of each node in left-to-right order.
A stack called control stack can be used to keep track of live procedure activations.
1. An activation record is pushed onto the control stack as the activation starts.
2. That activation record is popped when that activation ends.
When node n is at the top of the control stack, the stack contains the nodes along the
path from n to the root.
Variable Scope
The scope rules of the language determine which declaration of a name applies when the name appears in the program.
An occurrence of a variable is local if that occurrence is in the same procedure in which that name is declared, and the variable is non-local if it is declared outside of that procedure.
Example:
Procedure q;
Var a: real;
  Procedure r;
  Var b: integer;
  Begin b = 1; a = 2; end;
Begin ... end;
Variable b is local to procedure r, and variable a is non-local to procedure r.
Storage Organisation
Static allocation: lays out storage at compile time for all data objects.
Constraints: the size of each data object must be known at compile time; recursive procedures and dynamically growing data structures are not supported.
Stack Allocation: manages run-time storage as a stack of activation records; recursion is supported.
Heap Allocation: allocates and de-allocates storage as needed at runtime from a heap. It is used when the values of local variables must be retained after an activation ends. Heap allocation gives out pieces of contiguous storage for activation records. Languages like Algol have dynamic data structures, and some part of memory is reserved for them.
Activation Record
Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record. When a procedure is entered, an activation record is allocated, and it is deallocated when that procedure exits. The size of each field can be determined at compile time, although the actual location of the activation record is determined at run-time.
Key Points
If a procedure has a local variable and its size depends on a parameter, its size is
determined at run-time.
Return Value: The returned value of the called procedure is returned in this field to the
calling procedure. We can use a machine register for the return value.
Actual parameters: The field for actual parameters is used by the calling procedure to
supply parameter to the called procedure.
Optional control link: The optional control link points to the activation record of the caller.
Optional access link: It is used to refer to the non-local data held in the other activation
record.
Saved Machine Status: The field for saved machine status holds information about the state
of the machine before the procedure is called
Local data: Local data field holds data that is local to an execution of a procedure.
Temporaries: Temporary variables are stored in field of temporaries.
Syntax Tree
A syntax tree is a variant of the parse tree, where each leaf represents an operand and each interior node represents an operator.
Three-Address Code
In three-address code, each statement contains at most three addresses (two for operands and one for the result). The most general kind of three-address code is
x = y op z
where x, y and z are names, constants or compiler-generated temporaries, and op is any operator.
But we can also use the following notation for quadruples (a much better notation, because it looks like a machine code instruction):
op y, z, x
Apply operator op to y and z, and store the result in x.
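As an illustration (an assumed source statement, not from the original notes), the statement a = b + c * d might be translated into the following three-address code, using compiler-generated temporaries t1 and t2:
t1 = c * d
t2 = b + t1
a = t2
In quadruple notation, the first instruction would be written as: * c, d, t1.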
Triples: A representation with only three fields (op, arg1, arg2); the result of an operation is referred to by the position (number) of the triple that computes it, so no temporary names are needed.
Indirect triples: A listing of pointers to triples is kept, and the pointers, rather than the triples themselves, are reordered; this makes it easier for an optimizer to move instructions around.