Data Structures in C
By: Anand B
E-mail :
[email protected]
Index
• Searching/Sorting
• Linked Lists
– Singly
– Doubly
– Circular
• Queue
• Stacks
• Trees
• Graphs
• Symbol Tables
• Garbage Collection
Anand B [email protected]
Array Limitations
• Arrays
– Simple,
– Fast
but
– Must specify size at construction time
– Murphy’s law
• Construct an array with space for n
– n = twice your estimate of the largest collection
• Tomorrow you’ll need n+1
– Can we build a more flexible system?
Linked Lists
• Flexible space use
– Dynamically allocate space for each element as needed
– Include a pointer to the next item
Linked list
– Each node of the list contains
• the data item (an object pointer in our ADT)
• a pointer to the next node
(diagram: node with Data and Next fields; Data points to the object)
Linked Lists
• Collection structure has a pointer to the list head
– Initially NULL
(diagram: Collection with Head = NULL)
Linked Lists
• Collection structure has a pointer to the list head
– Initially NULL
• Add first item
– Allocate space for node
– Set its data pointer to object
– Set Next to NULL
– Set Head to point to new node
(diagram: Head → node; the node's Data points to the object, Next = NULL)
Linked Lists
• Add second item
– Allocate space for node
– Set its data pointer to object
– Set Next to current Head
– Set Head to point to new node
(diagram: Head → node(object2) → node(object))
Linked Lists - Add implementation

struct t_node {
    void *item;
    struct t_node *next;    /* recursive type definition - C allows it! */
} node;
typedef struct t_node *Node;

struct collection {
    Node head;
    ……
};
typedef struct collection *Collection;

int AddToCollection( Collection c, void *item ) {
    Node new = malloc( sizeof( struct t_node ) );
    new->item = item;
    new->next = c->head;
    c->head = new;
    return TRUE;    /* error checking, asserts omitted for clarity! */
}
Linked Lists
• Add time
– Constant - independent of n
• Search time
– Worst case - n
(diagram: Head → node(object2) → node(object))
Linked Lists - Find implementation
• Implementation
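The find routine itself was lost from this slide. A minimal sketch, reusing the `t_node` layout from the Add slide; the names `FindInList` and `int_eq` are illustrative, not from the original:

```c
#include <stddef.h>

/* Node layout from the Add implementation slide */
struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

/* Example predicate: do two int items hold the same value? */
int int_eq(void *item, void *key) {
    return *(int *)item == *(int *)key;
}

/* Walk the chain from head - O(n) worst case.
   Returns the first matching item, or NULL if none matches. */
void *FindInList(Node head, int (*match)(void *, void *), void *key) {
    for (Node n = head; n != NULL; n = n->next)
        if (match(n->item, key))
            return n->item;
    return NULL;
}
```

Passing the comparison as a function pointer keeps the list generic over `void *` items, matching the ADT style used throughout these slides.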
Linked Lists - Delete implementation
• Implementation
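The delete routine was also lost. A minimal sketch with an assumed name `DeleteFromList`; using a pointer-to-pointer means the head node needs no special case:

```c
#include <stdlib.h>

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

/* Unlink the first node whose item pointer equals `item`.
   Returns the removed item, or NULL if it was not found. */
void *DeleteFromList(Node *head, void *item) {
    for (Node *pp = head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->item == item) {
            Node dead = *pp;
            *pp = dead->next;          /* bypass the node */
            void *it = dead->item;
            free(dead);                /* node was malloc'd by Add */
            return it;
        }
    }
    return NULL;
}
```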
Linked Lists - LIFO and FIFO
• Simplest implementation
– Add to head
– Gives Last-In-First-Out (LIFO) semantics
• Modification for First-In-First-Out (FIFO)
– Keep a tail pointer and add new items at the tail
(diagram: head → nodes → tail)
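The tail-pointer modification can be sketched as follows; the names `AddToTail` and `RemoveFromHead` are assumptions, not from the original slides:

```c
#include <stdlib.h>

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

struct queue { Node head, tail; };   /* head: remove end, tail: add end */

/* O(1) FIFO insertion: append at the tail instead of pushing on the head */
int AddToTail(struct queue *q, void *item) {
    Node n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->item = item;
    n->next = NULL;
    if (q->tail == NULL)        /* empty queue: node is both head and tail */
        q->head = n;
    else
        q->tail->next = n;
    q->tail = n;
    return 1;
}

/* O(1) removal from the head preserves FIFO order */
void *RemoveFromHead(struct queue *q) {
    if (q->head == NULL) return NULL;
    Node n = q->head;
    q->head = n->next;
    if (q->head == NULL) q->tail = NULL;   /* queue became empty */
    void *it = n->item;
    free(n);
    return it;
}
```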
Stacks
• A stack is a data structure used to store and retrieve data.
• The stack supports two operations, push and pop.
• The push operation places data on the stack and the pop operation
retrieves data from the stack.
• The order in which data is retrieved gives the stack its
Last In First Out (LIFO) behaviour: a pop returns the data placed
on the stack most recently.
– A structure that instead retrieves the data placed first
(First In First Out, FIFO) is a queue, not a stack.
Stacks
• Stacks are a special form of collection with LIFO semantics
• Two methods
– int push( Stack s, void *item );
- add item to the top of the stack
– void *pop( Stack s );
- remove an item from the top of the stack
• Like a plate stacker
• Other methods
Stacks - Implementation
• Arrays
– Provide a stack capacity to the constructor
– Flexibility limited but matches many real uses
• Capacity limited by some constraint
– Memory in your computer
– Size of the plate stacker, etc
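An array-backed stack along these lines; `push` and `pop` follow the method signatures on the previous slide, while `StackNew` and the struct layout are assumptions:

```c
#include <stdlib.h>

/* Fixed-capacity array stack: capacity is supplied to the constructor */
struct stack {
    void **items;
    int top;        /* number of items currently on the stack */
    int capacity;
};
typedef struct stack *Stack;

Stack StackNew(int capacity) {
    Stack s = malloc(sizeof *s);
    s->items = malloc(capacity * sizeof *s->items);
    s->top = 0;
    s->capacity = capacity;
    return s;
}

/* add item to the top of the stack; fails when full */
int push(Stack s, void *item) {
    if (s->top == s->capacity) return 0;
    s->items[s->top++] = item;
    return 1;
}

/* remove an item from the top of the stack; NULL when empty */
void *pop(Stack s) {
    if (s->top == 0) return NULL;
    return s->items[--s->top];
}
```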
Stacks - Implementation
• Arrays common
– Provide a stack capacity to the constructor
– Flexibility limited but matches many real uses
• Stack created with limited capacity
struct t_node
{
    void *item;
    struct t_node *prev,    /* prev is optional! */
                  *next;
} node;
typedef struct t_node *Node;
struct collection
{
    Node head, tail;
};
(diagram: doubly linked list chained between head and tail via prev/next)
Stack Frames - Functions in HLL
• Program
function f( int x, int y) {
int a;
if ( term_cond ) return …;
a = ….;
return g( a );
}
function g( int z ) {
int p, q;
p = …. ; q = …. ;
return f(p,q);
}
Context
for execution of f
Stacks
• Application of Stacks
– Stacks can be used to evaluate mathematical
expressions
– Stacks can be used to rewrite recursive programs
in a non-recursive form
• Expression Evaluation
– Based on the presence of mathematical operator in the
expression, Expressions are classified into
• Infix
• Postfix
• Prefix
Stacks
• Infix
– The operator is preceded & succeeded by its operands
– Ex: A+B
• Postfix
– The operands are succeeded by the operator
– Ex: AB+
• Prefix
– The operands are preceded by the operator
– Ex: +AB
• Note:
– Postfix & Prefix expressions are also called Polish expressions.
– Postfix & Prefix expressions need no parentheses.
Stacks
• Converting Infix to Postfix (single-character operands)
• The infix expression must be entered as a string
• Extract characters one by one until the end of the string; output
operands immediately, and use a stack to hold each operator until all
operators of equal or higher precedence have been emitted
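One concrete way to perform this conversion is the classic shunting-yard method (not named on the slide); this sketch assumes single-character operands and the operators + - * / with the usual precedence:

```c
#include <string.h>

static int prec(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* not an operator */
    }
}

/* Shunting-yard conversion for single-character operands.
   Writes the postfix form into out (must be large enough). */
void infix_to_postfix(const char *in, char *out) {
    char stack[128];
    int sp = 0, o = 0;
    for (const char *p = in; *p; p++) {
        char c = *p;
        if (c == ' ') continue;
        if (c == '(') {
            stack[sp++] = c;
        } else if (c == ')') {
            while (sp > 0 && stack[sp - 1] != '(')
                out[o++] = stack[--sp];
            if (sp > 0) sp--;                  /* discard '(' */
        } else if (prec(c)) {
            /* emit operators of equal or higher precedence first */
            while (sp > 0 && prec(stack[sp - 1]) >= prec(c))
                out[o++] = stack[--sp];
            stack[sp++] = c;
        } else {
            out[o++] = c;                      /* operand goes straight out */
        }
    }
    while (sp > 0)                             /* drain remaining operators */
        out[o++] = stack[--sp];
    out[o] = '\0';
}
```

The traces on the next two slides (A+B*C → ABC*+ and A+B*C+D → ABC*+D+) match this routine's output.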
Stacks
Infix Expression: A + B * C
Postfix Expression: ABC*+
Stacks
Infix Expression: A + B * C + D
Postfix Expression: ABC*+D+
Queue
• The queue is another data structure.
• A physical analogy for a queue is a line at a bank. When you go to the bank,
customers go to the rear (end) of the line and customers come off of the line
(i.e., are serviced) from the front of the line.
• Like a stack, a queue usually holds things of the same type.
• The main property of a queue is that objects go on the rear and come off of the
front of the queue.
[A B C]   --add D-->    [A B C D]
[A B C D] --delete-->   [B C D]
Implementing queue
Implementing a queue using an array

qsize = 10, array Q[qsize]
Initially: front = rear = -1

Push (enqueue):
    if (rear >= qsize - 1) -> Overflow
    Q[++rear] = item
    if (rear == 0) front = 0

Pop (dequeue):
    if (front == -1 || front > rear) -> Empty
    item = Q[front++]

Trace with items 10, 20, 90:
    push 10:  front = rear = 0        [10]
    push 20:  front = 0, rear = 1     [10 20]
    push 90:  front = 0, rear = 2     [10 20 90]
    pop:      returns 10; front = 1, rear = 2    [20 90]
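The pseudocode above translates to C roughly as follows (a non-circular variant; the names `q_push` and `q_pop` are assumptions):

```c
#define QSIZE 10

/* Straight-array queue using the slide's front/rear convention;
   front == -1 means the queue is empty. Slots are not reused. */
struct aqueue {
    int q[QSIZE];
    int front, rear;
};

int q_push(struct aqueue *a, int item) {
    if (a->rear >= QSIZE - 1) return 0;    /* overflow */
    a->q[++a->rear] = item;
    if (a->front == -1) a->front = 0;      /* first element */
    return 1;
}

int q_pop(struct aqueue *a, int *item) {
    if (a->front == -1 || a->front > a->rear) return 0;   /* empty */
    *item = a->q[a->front++];
    return 1;
}
```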
Queue
• In a normal queue, the insertion operation is performed at one end (the rear)
• and the deletion operation is performed at the other end (the front)
• Push & pop operations can also be performed in different ways;
based on these methods, queues are further classified into
– Dequeue
– Priority Queue
• Dequeue (Double-ended Queue)
– It allows insertion & deletion at both ends
• Input restricted
• Output restricted
– In the I/P restricted dequeue, insertion is done only at the rear end &
deletion can be done at both ends.
– In the O/P restricted dequeue, deletion is done only at the front end &
insertion can be done at both ends.
Implementing Dequeue
• Implementing the I/P restricted dequeue
– Display options for push & pop
– For the Push operation
• Increase rear & place the item
– For the Pop operation
• Display the option to pop (Rear / Front)
• Implementing the O/P restricted dequeue
– Display options for push & pop
– For the Push operation
• Display the option to push (Rear / Front)
• Rear: push the item by increasing rear
• Front: push the item by decreasing front
– For the Pop operation
• The front value must be greater than “0”, otherwise underflow
• Delete the item by increasing the front value
Implementing Dequeue
Implementing an I/P restricted dequeue using an array

qsize = 10, array Q[qsize]
Initially: front = rear = -1

Push (rear only):
    if (rear >= qsize - 1) -> Overflow
    Q[++rear] = item
    if (rear == 0) front = 0

Pop at rear:
    if (front == -1 || front > rear) -> Empty
    item = Q[rear--]

Pop at front:
    if (front == -1 || front > rear) -> Empty
    item = Q[front++]

Trace with items 10, 20:
    push 10:      front = rear = 0      [10]
    push 20:      front = 0, rear = 1   [10 20]
    pop at rear:  returns 20; front = rear = 0   [10]
Implementing Dequeue
Implementing an O/P restricted dequeue using an array

qsize = 10, array Q[qsize]
Initially: front = rear = -1

Push at rear:
    if (rear >= qsize - 1) -> Overflow
    Q[++rear] = item
    if (rear == 0) front = 0

Push at front:
    if (front == -1) { Q[++front] = item; rear = 0; }   /* empty case */
    else if (front > 0) Q[--front] = item
    else -> Overflow    /* front == 0: no room before the front */

Pop (front only):
    if (front == -1 || front > rear) -> Empty
    item = Q[front++]

Trace with items 10, 20:
    push 10 at rear:  front = rear = 0      [10]
    push 20 at rear:  front = 0, rear = 1   [10 20]
    pop:              returns 10; front = rear = 1   [20]
Dequeue Implementation
• Implementing Dequeue using a linked list
– The linked list must be a circular linked list
• Josephus problem
– Let us consider a problem that can be solved using a circular list.
– A group of soldiers is surrounded by an enemy force. There is no
hope of survival without reinforcement, but there is a single horse
available for escape. The soldiers agree on a pact to determine
which of them will escape. They form a circle and a number “n” is
picked. Beginning with the soldier whose name is picked, they
count clockwise around the circle; when the count reaches “n” that
soldier is removed & the count begins again. Any soldier removed
from the circle is no longer counted. The last soldier remaining
takes the horse & escapes.
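A sketch of the Josephus count using a circular singly linked list; the function name `josephus` and the 1-based soldier numbering are assumptions:

```c
#include <stdlib.h>

struct jnode {
    int id;
    struct jnode *next;
};

/* n soldiers numbered 1..n stand in a circle; every k-th one is
   removed; returns the number of the last soldier remaining. */
int josephus(int n, int k) {
    /* build the circle 1 -> 2 -> ... -> n -> 1 */
    struct jnode *head = NULL, *tail = NULL;
    for (int i = 1; i <= n; i++) {
        struct jnode *p = malloc(sizeof *p);
        p->id = i;
        p->next = NULL;
        if (head == NULL) head = p; else tail->next = p;
        tail = p;
    }
    tail->next = head;                 /* close the circle */

    struct jnode *prev = tail, *cur = head;
    while (cur->next != cur) {         /* more than one soldier left */
        for (int c = 1; c < k; c++) {  /* count k-1 forward */
            prev = cur;
            cur = cur->next;
        }
        prev->next = cur->next;        /* remove the k-th */
        free(cur);
        cur = prev->next;
    }
    int survivor = cur->id;
    free(cur);
    return survivor;
}
```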
Dequeue Implementation
• Using a Doubly Linked List
• If a structure contains two self-referential members then it can be
used to construct a DLL.
• In an SLL the last node contains NULL in its next reference field.
• In a DLL the last node contains NULL in its next reference & the
first node contains NULL in its previous reference field.
• An SLL is a one-way traversal list: starting from any node you can
only reach the last node.
• A DLL allows two-way traversal: starting from any node we can
reach either the beginning or the end of the list.
• If we can reach the same node again by traversing all nodes of the
list, then the list has a circular reference.
Dequeue Implementation
(diagrams lost: step-by-step traces of push and pop of items 10, 90 and 20
on a doubly linked dequeue, showing the front (F), rear (R) and temporary (T)
pointers and the NULL links at each end)
Priority Queue
Simple Queues
• Linked lists provide
– LIFO or
– FIFO
semantics
– Constant ( O(1) ) addition and deletion
• What if items in the queue have an order?
– Usually termed a priority
– We must sort the items so that
the highest ( lowest ) priority item is removed first
Priority Queues
• Items have some ordering relation
– It doesn’t matter much what it is
– As long as there’s some way to define order
• Maintaining order
– Items are added and deleted continuously
– Tree structure
• Mostly O(log n) behaviour
– but can become unbalanced,
giving O(n) behaviour
Not acceptable in a life-critical system!!
Disastrous if your safety estimate assumed O(log n)!!
Symbol Tables
• A symbol table is a set of name-value pairs which
contains symbols & their values or addresses
• In any language or package it supports
– Processing of data
– Maintenance of identifier tables, message tables & special
tables
• Operations on symbol tables
– Constructing symbol tables
– Searching in symbol tables
– Insertion/deletion of symbols in or from symbol tables
Symbol Tables
• Symbol tables can be represented by
– Tree structures
– Arrays
• The tree structures used to represent symbol tables are
Binary Search Trees (BST) & Fibonacci search trees
with perfect height balancing.
• Classification of Symbol Tables
– Static Symbol Tables
– Dynamic Symbol tables
Symbol Tables
• Static Symbol Tables
– These tables do not allow insertion and deletion of
symbols once the table has been constructed
– The scope of the symbols in a static table is
throughout the program
– Ex: the COBOL, C & PASCAL language environments
• Dynamic Symbol Tables
– These tables allow insertion and deletion of symbols
during execution
– Ex: BASIC, C++ & FoxPro
Symbol Tables
• Hashtable
– An array representation of a symbol table is known as a
hash table.
– Hash tables are used to provide random access to key
elements or records which are on external storage
media.
– They are also used for internal storage.
– All symbol tables are memory-based tables.
– A hash table contains a number of buckets (rows), which
determines the number of items it can hold.
Symbol Tables
• Hashtable
– The hash number of an item is calculated by a
user-defined routine.
– This hash number is used as an index to the
item.
– Depending on the size of the table, the type of the table &
the method of calculating the hash number, hash tables are
classified into
• Closed hash tables (open addressing)
• Open hash tables (separate chaining)
Symbol Tables
• Closed hash table (Open addressing)
– A closed hash table is a linear array which contains either
values or addresses.
– On insertion, the hash number is calculated
from the key value by some user-defined hash
function.
– The value (or its address) is placed in the table, using the
generated hash number as the subscript.
– In general the hash number must be unique.
Symbol Tables
• Closed hash table (Open addressing)
– Hash collision: in some cases two keys may produce the
same hash number, which is known as a
hash collision.
– A hash collision occurs when the
cell referred to by the hash number is not an
empty cell in the hash table.
– When a hash collision occurs we have to place the value
or the address of the identifier in the next available
cell.
Symbol Tables
• Closed hash table (Open addressing)
– The following probing methods are used to
resolve hash collisions:
• Linear probing
• Quadratic probing
• Double hashing
• Rehashing
Symbol Tables
• Linear Probing
– The search for the next available cell proceeds one cell after
another & the table must be treated as circular.
– The formal function is f(i) = i + 1
– It is an advantageous method for finding a cell quickly
– But disadvantageous because it can require many comparisons.
• Quadratic Probing
– The cell to be checked for availability is based on the formula
f(i) = i²
– The main disadvantage is that in some cases we may not find an
empty cell even though cells are empty at other positions.
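A linear-probing sketch for a closed hash table of non-negative integer keys; the table size, the names `ct_insert`/`ct_search`, and the hash h(k) = k mod M are all assumptions:

```c
#define M 11          /* table size; a prime keeps probing well-behaved */
#define EMPTY (-1)    /* sentinel: keys must be non-negative */

/* Closed hash table with linear probing f(i) = i + 1 (mod M) */
struct ctable { int slot[M]; };

void ct_init(struct ctable *t) {
    for (int i = 0; i < M; i++) t->slot[i] = EMPTY;
}

/* Returns the slot used, or -1 if the table is full */
int ct_insert(struct ctable *t, int key) {
    int i = key % M;
    for (int probes = 0; probes < M; probes++) {
        if (t->slot[i] == EMPTY) { t->slot[i] = key; return i; }
        i = (i + 1) % M;          /* linear probe: next cell, circularly */
    }
    return -1;
}

/* Returns the slot holding key, or -1 if absent */
int ct_search(const struct ctable *t, int key) {
    int i = key % M;
    for (int probes = 0; probes < M; probes++) {
        if (t->slot[i] == EMPTY) return -1;   /* hit a hole: not present */
        if (t->slot[i] == key) return i;
        i = (i + 1) % M;
    }
    return -1;
}
```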
Symbol Tables
• Double hashing
– A second hash function determines the step to the next cell to
try; in the scheme here the step doubles the hash value, with
formal function f(i) = 2i.
– Efficiency improves when the table size is a prime number.
• Rehashing
– A series of hash functions is applied in turn to find the next
available cell.
– The main disadvantage is that we may not access a key value
directly, because it may not be in the originally calculated cell.
Symbol Tables
• Open hash Table
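The open hash table's content was lost from this slide. A minimal separate-chaining sketch, where each bucket holds a linked list of all keys that hash to it; the names and the use of integer keys are assumptions:

```c
#include <stdlib.h>

#define M 8   /* number of buckets; an assumption for the sketch */

struct entry {
    int key;
    struct entry *next;
};

/* Open hash table: one chain per bucket, so collisions never overflow */
struct otable { struct entry *bucket[M]; };

void ot_insert(struct otable *t, int key) {
    struct entry *e = malloc(sizeof *e);
    e->key = key;
    e->next = t->bucket[key % M];   /* push onto the bucket's chain */
    t->bucket[key % M] = e;
}

int ot_contains(const struct otable *t, int key) {
    for (struct entry *e = t->bucket[key % M]; e; e = e->next)
        if (e->key == key) return 1;
    return 0;
}
```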
Storage Allocation & Garbage Collection
• Every language environment should provide a facility for
reserving memory to hold the program data; the space
reserved depends on the language environment &
the scope of the variables.
• Some language environments provide a facility to define
intermediate variables & to allocate memory at
runtime (dynamic memory allocation).
• There are two methods of allocating memory
– Sequential allocation (fixed-block allocation)
– Dynamic allocation (varying-length block allocation)
Storage Allocation & Garbage Collection
• Sequential memory allocation:
– The system automatically allocates memory to variables
sequentially (contiguous allocation).
– It does not allow allocation of memory at runtime
– Ex: the COBOL language
• Dynamic memory allocation:
– Memory is allocated through system
routines or through user-defined functions by
specifying the size of the memory to be allocated.
– Ex: allocating memory to pointers at runtime
Storage Allocation & Garbage Collection
– The dynamic memory allocation technique can be used to
allocate memory to a pointer which holds the starting
address of a list.
– This pointer is known as the external pointer, and a pointer
which points to the next node is known as an internal pointer.
– To allocate memory to nodes, treating the whole available
memory as a single block, we need
• A pointer which holds the starting address of the free memory
• A variable which represents the total size of memory that can be used
for data.
Storage Allocation & Garbage Collection
– Let us consider
• pointer p refers to the starting address of free memory
• m is the maximum size of the block
• n is the size of the requested block
– Allocation can be done by the following routine
if ( p + n < m )
{
    var = p;
    p = p + n;
}
else
    var = NULL;
Storage Allocation & Garbage Collection
• P: pointer to free memory          Free space (1024)
• Request B1 = 150                   [150 | Free]
• Request B2 = 200                   [150 | 200 | Free]
• Request B3 = 100                   [150 | 200 | 100 | Free]
• Request B4 = 175                   [150 | 200 | 100 | 175 | Free]
• Request B5 = 275                   [150 | 200 | 100 | 175 | 275 | Free(124)]
• Total allocated: 900
• Request B6 = 150 will return NULL
• Free blocks B1 & B3               [Free | 200 | Free | 175 | 275 | Free]
• Request B6 = 150 will still return NULL: the total free memory (374)
is greater than the request (150), but it is fragmented
• This can be solved using memory compaction
Storage Allocation & Garbage Collection
• Memory Compaction:
– It is the process of de-fragmenting memory: the allocated
blocks are moved together so that the free space forms one
contiguous block.
Hash Tables
• All search structures so far
– Relied on a comparison operation
– Performance O(n) or O(log n)
• Assume I have a function
– f( key ) → integer
ie one that maps a key to an integer
• What performance might I expect now?
Hash Tables - Structure
• Simplest case:
– Assume items have integer keys in the range 1 .. m
– Use the value of the key itself
to select a slot in a
direct access table
in which to store the item
– To search for an item with key, k,
just look in slot k
• If there’s an item there,
you’ve found it
• If the tag is 0, it’s missing.
– Constant time, O(1)
Hash Tables - Constraints
• Constraints
– Keys must be unique
– Keys must lie in a small range
– For storage efficiency,
keys must be dense in the range
– If they’re sparse (lots of gaps between values),
a lot of space is used to obtain speed
• Space for speed trade-off
Hash Tables - Relaxing the constraints
• Keys must be unique
– Construct a linked list of duplicates
“attached” to each slot
– If a search can be satisfied
by any item with key, k,
performance is still O(1)
but
– If the item has some
other distinguishing feature
which must be matched,
we get O(nmax)
where nmax is the largest number
of duplicates - or length of the longest chain
Hash Tables - Relaxing the constraints
• Keys are integers
– Need a hash function
h( key ) → integer
ie one that maps a key to
an integer
– Applying this function to the
key produces an address
– If h maps each key to a unique
integer in the range 0 .. m-1
then search is O(1)
Hash Tables - Hash functions
• Form of the hash function
– Example - using an n-character key
int hash( char *s, int n ) {
int sum = 0;
while( n-- ) sum = sum + *s++;
return sum % 256;
}
returns a value in 0 .. 255
– xor function is also commonly used
sum = sum ^ *s++;
– But any function that generates integers in 0..m-1 for some suitable (not
too large) m will do
– As long as the hash function itself is O(1) !
Hash Tables - Collisions
• Hash function
– With this hash function
int hash( char *s, int n ) {
int sum = 0;
while( n-- ) sum = sum + *s++;
return sum % 256;
}
– hash( “AB”, 2 ) and
hash( “BA”, 2 )
return the same value!
– This is called a collision
– A variety of techniques are used for resolving collisions
Hash Tables - Collision handling
• Collisions
– Occur when the hash function maps
two different keys to the same address
– The table must be able to recognise and resolve this
– Recognise
• Store the actual key with the item in the hash table
• Compute the address
– k = h( key )
• Check for a hit
– if ( table[k].key == key ) then hit
else try next entry
– Resolution
• Variety of techniques We’ll look at various
“try next entry” schemes
Hash Tables - Linked lists
• Collisions - Resolution
Linked list attached
to each primary table slot
• h(i) == h(i1)
• h(k) == h(k1) == h(k2)
– Searching for i1
• Calculate h(i1)
• Item in table, i,
doesn’t match
• Follow linked list to i1
– If NULL found,
key isn’t in table
Hash Tables - Overflow area
Overflow area
• Linked list constructed
in special area of table
called overflow area
– h(k) == h(j)
– k stored first
– Adding j
• Calculate h(j)
• Find k
• Get first slot in overflow area
• Put j in it
• k’s pointer points to this slot
– Searching - same as linked list
Hash Tables - Re-hashing
Use a second hash function
• Many variations
• General term: re-hashing
– h(k) == h(j)
– k stored first
– Adding j
• Calculate h(j)
• Find k
• Repeat until we find an empty slot
– Calculate h’(j)   ( h’(x) - the second hash function )
• Put j in it
– Searching - use h(x), then h’(x)
Hash Tables - Re-hash functions
The re-hash function
• Many variations
– Linear probing
• h’(x) is +1
• Go to the next slot
until you find one empty
Hash Tables - Re-hash functions
The re-hash function
• Many variations
– Quadratic probing
• h’(x) is c·i² on the ith probe
• Avoids primary clustering
• Secondary clustering occurs
– All keys which collide on h(x) follow the same sequence
– First
» a = h(j) = h(k)
– Then a + c, a + 4c, a + 9c, ....
– Secondary clustering generally less of a problem
Hash Tables - Collision Resolution Summary
• Chaining
+ Unlimited number of elements
+ Unlimited number of collisions
- Overhead of multiple linked lists
• Re-hashing
+ Fast re-hashing
+ Fast access through use of main table space
- Maximum number of elements must be known
- Multiple collisions become probable
• Overflow area
+ Fast access
+ Collisions don't use primary table space
- Two parameters which govern performance need to be estimated
Hash Tables - Summary so far ...
• Potential O(1) search time
– If a suitable function h(key) → integer can be found
• Space for speed trade-off
– “Full” hash tables don’t work (more later!)
• Collisions
– Inevitable
• Hash function reduces amount of information in key
– Various resolution strategies
• Linked lists
• Overflow areas
• Re-hash functions
– Linear probing ( h’ is +1 )
– Quadratic probing ( h’ is +c·i² )
– Any other hash function!
» or even sequence of functions!
Hash Tables - Choosing the Hash Function
• “Almost any function will do”
– But some functions are definitely better than others!
• Key criterion
– Minimum number of collisions
• Keeps chains short
• Maintains O(1) average
Hash Tables - Choosing the Hash Function
• Uniform hashing
– Ideal hash function
• P(k) = probability that a key, k, occurs
• If there are m slots in our hash table,
• a uniform hashing function, h(k), would ensure:
Σ_{k | h(k) = 0} P(k)  =  Σ_{k | h(k) = 1} P(k)  =  ....  =  Σ_{k | h(k) = m-1} P(k)  =  1/m

( read “Σ_{k | h(k) = 0}” as the sum over all k such that h(k) = 0 )
Hash Tables - A Uniform Hash Function
• If the keys are integers
randomly distributed in [ 0, r )   (ie 0 ≤ k < r),
then
h(k) = ⌊ m·k / r ⌋
is a uniform hash function
Hash Tables - Reducing the range to [ 0, m )
• We’ve mapped the keys to a range of integers
0 ≤ k < r
• Now we must reduce this range to [ 0, m ),
where m is a reasonable size for the hash table
• Strategies
Division - use a mod function
Multiplication
Universal hashing
Hash Tables - Reducing the range to [ 0, m )
Division
• Use a mod function
h(k) = k mod m
– Choice of m?
• Powers of 2 are generally not good!
h(k) = k mod 2ⁿ
simply selects the last n bits of k
(e.g. k mod 2⁸ selects the last 8 bits)
Hash Tables - Reducing the range to [ 0, m )
Multiplication method
• Multiply the key by a constant A, 0 < A < 1
• Extract the fractional part of the product
( kA - ⌊kA⌋ )
• Multiply this by m
h(k) = ⌊ m · ( kA - ⌊kA⌋ ) ⌋
– Now m is not critical and a power of 2 can be chosen
– So this procedure is fast on a typical digital computer
• Set m = 2^p
• Multiply k (w bits) by ⌊A·2^w⌋, giving a 2w-bit product
• Extract the p most significant bits of the lower half
• A = ½(√5 - 1) seems to be a good choice (see Knuth)
Hash Tables - Reducing the range to [ 0, m )
Universal Hashing
• A determined “adversary” can always find a set of data that will defeat any
hash function
• Hashing all keys to the same slot gives O(n) search
– Select the hash function randomly (at run time)
from a set of hash functions
⇒ Reduced probability of poor performance
– Set of functions, H, which map keys to [ 0, m )
– H is universal if, for each pair of keys, x and y,
the number of functions, h ∈ H,
for which h(x) = h(y) is |H|/m
Hash Tables - Reducing the range to [ 0, m )
Universal Hashing
• A determined “adversary” can always find a set of data that
will defeat any hash function
• Hash all keys to same slot O(n) search
– Select the hash function randomly (at run time)
from a set of hash functions
– Functions are selected at run time
• Each run can give different results
• Even with the same data
• Good average performance obtainable
Hash Tables - Reducing the range to [ 0, m )
Universal Hashing
• Can we design a set of universal hash functions?
• Quite easily
• View the key, x, as a sequence of n-bit “bytes”:
x = <x0, x1, x2, ...., xr>
• Choose a = <a0, a1, a2, ...., ar>,
a sequence of elements
chosen randomly from { 0, 1, ..., m-1 }
• ha(x) = Σ ai·xi mod m
• There are m^(r+1) sequences a,
so there are m^(r+1) functions, ha(x)
• Theorem
• The ha form a set of universal hash functions
( proof: see Cormen )
Collision Frequency
• Birthdays or the von Mises paradox
– There are 365 days in a normal year
Birthdays on the same day unlikely?
– How many people do I need
before “it’s an even bet”
(ie the probability is > 50%)
that two have the same birthday?
– View
• the days of the year as the slots in a hash table
• the “birthday function” as mapping people to slots
– Answering von Mises’ question answers the question about the
probability of collisions in a hash table
Distinct Birthdays
• Let Q(n) = probability that n people have distinct
birthdays
• Q(1) = 1
• With two people, the 2nd has only 364 “free” birthdays:
Q(2) = Q(1) × 364/365
• In general, the nth person has 365 - n + 1 “free” days:
Q(n) = Q(n-1) × (365 - n + 1)/365
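The recurrence is easy to evaluate directly; `distinct_prob` is an assumed name:

```c
/* Q(n): probability that n people all have distinct birthdays,
   via the recurrence Q(n) = Q(n-1) * (365 - n + 1) / 365 */
double distinct_prob(int n) {
    double q = 1.0;
    for (int i = 1; i < n; i++)
        q *= (365.0 - i) / 365.0;
    return q;
}
```

Evaluating P(n) = 1 - Q(n) reproduces the figure on the next slide: P(23) ≈ 0.507, so 23 people already make a shared birthday more likely than not.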
Coincident Birthdays
• Probability of two identical birthdays among n people:
P(n) = 1 - Q(n)
• P(23) = 0.507
– so with only 23 entries, when the table is just
23/365 = 6.3% full, a collision is more likely than not!
Hash Tables - Load factor
• Collisions are very probable!
• Table load factor: α = n / m
where n = number of items,
m = number of slots
Hash Tables - General Design
Choose the table size
• Large tables reduce the probability of collisions!
• Table size, m
• n items
• Collision probability = n / m
Choose a table organisation
• Does the collection keep growing?
• Linked lists (....... but consider a tree!)
• Size relatively static?
• Overflow area or
• Re-hash
Choose a hash function ....
Hash Tables - General Design
Choose a hash function
• A simple (and fast) one may well be fine ...
• Read your text for some ideas!
Check the hash function against your data
Fixed data
• Try various h, m
until the maximum collision chain is acceptable
Known performance
Changing data
• Choose some representative data
• Try various h, m until collision chain is OK
Usually predictable performance
Hash Tables - Review
• If you can meet the constraints
+ Hash Tables will generally give good performance
+ O(1) search
• Like radix sort,
they rely on calculating an address from a key
• But, unlike radix sort,
relatively easy to get good performance
– with a little experimentation
– not advisable for unknown data
• Best when the collection size is relatively static
• Memory management is actually simpler
– All memory is pre-allocated!
Trees
Trees
• A tree represents a list of items in a hierarchical (parent-child) fashion.
• Every item is represented as a NODE in the tree.
• The node at the top is called the root node.
• The nodes connected below the root node are called sub-root
nodes (subtrees) or leaf nodes.
• A node which does not contain any sub-node is called a leaf node.
• A node which contains sub-nodes is called a non-leaf node.
• In the father-child relation the root node is referred to as the father
(parent) and the sub-nodes which are directly connected to the
father are called its children.
• The children of the same father are called siblings.
Trees
• Example tree: A is the root; B, C, D are its children; E, F are children of B; G, H, I are children of D; J, K are children of F.
• Root node: A
• Leaf nodes: C, E, G, H, I, J, K
• Non-leaf nodes: A, B, D, F
• Siblings: {B, C, D}, {E, F}, {G, H, I}, {J, K}
• Children of A: B, C, D
• Children of B: E, F
• Children of D: G, H, I
• Children of F: J, K
• Ancestors of J and K: F, B, A
• Ancestors of G, H, I: D, A
• Order of the tree: 3
Trees
• The order of a tree is the maximum number of children that may be connected to any node of the tree (3 in the tree above).
• The degree of a node is the number of children actually connected to it.
• There is no restriction on the order of a general tree; any restrictions are defined by the implementation.
– The degree of node A is 3, and of D is 3.
– The degree of nodes B and F is 2.
• Depth of the tree: when the tree is described with a level structure, the level number starts at 0 at the root and increments by 1 towards the descendants (downwards).
Binary Tree
• If the order of a tree is two, then that tree is referred to as a binary tree (BT).
• In a BT any non-leaf node can have at most 2 sub-nodes.
• The node at the top is the root node.
• The first sub-node is known as the left son (left subtree).
• The second sub-node is known as the right son (right subtree).
(Figure: root A; children B, C; then D, E, F, G; then H, I, J.)
Complete Trees
• A binary tree is completely full if
– it has height h, and
– it has 2^(h+1) − 1 nodes
• A binary tree of height h is complete iff
– it is empty, or
– its left subtree is complete of height h−1 and its right subtree is completely full of height h−2, or
– its left subtree is completely full of height h−1 and its right subtree is complete of height h−1
Complete Trees
• If we examine the examples, we see that a complete tree is “filled
in” from the left
Binary Tree
• Methods of traversal
– Level-order traversal (LOT)
– Pre-order traversal
– In-order traversal
– Post-order traversal
(The same example tree as above: A; B, C; D, E, F, G; H, I, J.)
Binary Tree
• Level-order traversal
– Nodes are traversed level by level, starting from the root node.
– Before the nodes at level n are traversed, all nodes at level n−1 must already have been traversed.
– For the example tree: A, B, C, D, E, F, G, H, I, J
Binary Tree
• Pre-order traversal
– Nodes are traversed root–left–right.
– Ex: the tree + with children A, B gives +AB
– For the example tree: A, B, D, E, H, I, C, F, J, G
• In-order traversal
– Nodes are traversed left–root–right.
– Ex: the same tree gives A+B
– D, B, H, E, I, A, J, F, C, G
– In a BST, in-order traversal visits the data in ascending order.
• Post-order traversal
– Nodes are traversed left–right–root.
– Ex: the same tree gives AB+
– D, H, I, E, B, J, F, G, C, A
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
Pre-order
– Root
– Left sub-tree
– Right sub-tree
(Pre-order of the expression tree for A × (((B + C) × (D × E)) + F): × A + × + B C × D E F)
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
In-order
– Left sub-tree
– Root
– Right sub-tree
(In-order of the same expression tree: A × B + C × D × E + F — the original infix form, without parentheses)
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
Post-order
– Left sub-tree
– Right sub-tree
– Root
(Post-order of the same expression tree: A B C + D E × × F + ×)
Tree Traversal
Post-order
– Left sub-tree
– Right sub-tree
– Root
Reverse-Polish
(A (((B C +)(D E ×) ×) F +) ×)
• Normal algebraic form
(A × (((B + C) × (D × E)) + F))
= which traversal?
Binary Tree
• Constructing a BT
– To construct a binary tree we require a self-referential structure with two pointers:
• one to refer to the left subtree,
• the other to refer to the right subtree.
– A node containing NULL in both references is a leaf node.
– To insert a node at level n, level (n−1) must first be filled with nodes.
– This method of constructing a BT is level-order construction, and it requires an output-restricted dequeue.
Binary Tree
• Constructing a BT — steps
– Make the first node the root node and push its address into the output-restricted dequeue.
– From the second node onwards, for each new node:
• Pop an address from the dequeue.
• If the left son is empty, connect the new node as the left son, push the popped address back at the front, and push the new node's address at the rear.
• If the left son is not empty, connect the new node as the right son and push only the new node's address at the rear.
Binary Tree
(Worked figures, diagrams not recoverable: nodes 10, 20 and 30 are inserted one by one by the level-order construction above, with the head/tail pointers and the output-restricted dequeue *DEQ[10] — its Front and Rear indices — shown after each step.)
Binary Tree - LOT
(Figure, table not fully recoverable: a tree with nodes A–L is stored in a node-pool array with Info, Left and Right index fields and an Avail free-list pointer; a Queue[20] drives the level-order traversal.)
Binary Tree
• A BT is a finite set of elements that is either empty or partitioned into 3 disjoint subsets.
• The first subset contains a single element called the root of the tree.
• The other two sets are themselves binary trees, called the left and right subtrees of the original tree.
• In contrast, in a multi-way tree a node contains more than one key value (element), and the number of key values per node depends on the order of the tree.
• If "A" is the root of a BT and "B" is the root of its left or right subtree, then "A" is said to be the father of "B" and "B" is said to be the left or right son of "A".
Binary Tree
• Node N1 is an ancestor of N2 if N1 is either the father of N2 or the father of some ancestor of N2.
• A father is an ancestor of its left or right son, but an ancestor need not be the father.
• Non-leaf nodes are called internal nodes, and leaf nodes are called external nodes.
Binary Tree
• If every non-leaf node of a BT has non-empty left and right subtrees, the tree is termed a strictly binary tree (SBT).
Binary Tree
• The depth of a BT is the maximum level of any leaf in the tree, i.e. the longest path from the root to any leaf node.
• An SBT whose leaves are all at level "d" is a complete binary tree (CBT).
• An SBT may not be a CBT, but a CBT is always an SBT.
• If a BT contains "n" nodes at level "l", then it contains at most "2n" nodes at level "l+1".
• Max number of nodes at level l = 2^l
• If "d" is the depth of the tree and the tree is a CBT, then the total number of nodes in the tree is 2^(d+1) − 1:
– Total nodes in a CBT = 2^(d+1) − 1
– Total leaf nodes in a CBT = 2^d
– Total non-leaf nodes = 2^d − 1
(Figure: a CBT of depth 3 with levels A; B, C; D–G; H–O.)
Binary Tree
• For a complete binary tree (CBT) with "n" total nodes, the depth of the tree is:
– d = log₂(n+1) − 1
– but the general formula is d = ⌊log₂ n⌋
Binary Tree
• With 2 nodes we can construct 2 different binary trees.
• With 3 nodes we can construct 5 different binary trees.
• (In general, the count for n nodes is the Catalan number C(2n, n)/(n+1).)
Binary Tree
• Height balancing of an SBT which contains duplicate values may not be possible in some instances.
• Level-order traversal is also called breadth-first search (BFS).
• Pre-order traversal is also called depth-first search (DFS).
• In-order traversal is also called symmetric order.
• Non-recursive traversal functions that do not use stacks require either a father field or a thread field.
Binary Tree
• Constructing an array which represents a BST
– Initially all values of the array are zeros, which represents the availability of cells.
– Start with q = 0 at the root.
– If a node's index is q, then its
• left son is at 2q+1
• right son is at 2q+2
– E.g. node number 10:
• 2*10+1 = 21
• 2*10+2 = 22
(Figure: the BST 75; 65, 85; 55, 70, 80, 95; …; 105 stored level by level as the array 75 65 85 55 70 80 95 . . . 105)
Binary Tree
• Deleting a node from a BST
– The node that replaces the deleted node's position must be the in-order successor of the deleted node.
– First locate the node in the tree.
– If it is a leaf node, free it by setting its parent's reference to NULL; otherwise:
• If the node has no right subtree, move the left son into its position and free the node.
• If the node has only a single right son (a right subtree with a single node), move the right son into the deleted position and free the node.
• If the right son contains a left subtree, place the leftmost node of the right son at the deleted position and free the node.
• Note: deleting a node from a BT/SBT must not change the in-order sequence of the remaining nodes.
Binary Tree
• Delete node I (30)
– It is a leaf node
– D->right = NULL & free(I)
• Delete node H (5)
– It is a leaf node
– D->left = NULL & free(H)
• Delete node P (350)
– It is a non-leaf node with no right son
– M->left = R & free(P)
• Delete node R (325)
– It is a non-leaf node with a right subtree but no left subtree
– P->left = S & free(R)
• Delete node G (300)
– It is a non-leaf node whose right subtree has a left subtree attached
– C->right = R; P->left = S
– R->left = G->left; R->right = G->right
– free(G)
(Figure: BST with root 100; children 50, 200; then 25, 60, 150, 300; then 5, 30, 125, 175, 250, 400; then 160, 190, 350, 500; then 325; then 340.)
Binary Tree
• Finding the number of nodes in a BT (recursive)

int nc = 0;
void nodeCount( TREE *head )
{
    if ( !head )
        return;
    nc++;
    nodeCount( head->left );
    nodeCount( head->right );
}
Binary Tree
• Finding the depth of the tree

int depth = 0;
void TreeDepth( TREE *head, int level )
{
    if ( !head )
        return;
    if ( level > depth )
        depth = level;
    TreeDepth( head->left, level + 1 );
    TreeDepth( head->right, level + 1 );
}
AVL Trees
• These trees are also known as height-balanced trees.
• The concept of the AVL tree is to improve the efficiency of the BST by minimizing the number of comparisons required for searching.
• A BST constructed purely in input order (e.g. 70, 80, 60, 50, 90, 100, 40, 30, 20, 110, 120) may not satisfy the balance property that guarantees at most log₂N + 1 comparisons in the worst case.
(Figure: the resulting lopsided BST — 70; 60, 80; 50, 90; 40; 100; ...)
AVL Trees
• Each node can be evaluated through the following formula:
– Height diff = height of left subtree − height of right subtree
• If the height diff is −1, 0 or +1, then no height balancing is required at that node; otherwise the tree must be rebalanced, which is performed by rotating nodes.
• While performing a rotation, the in-order property must not change.
• Based on the height diff, rotations are classified into:
– Left rotation
– Right rotation
• If the height diff is < −1, then the rotation should be a left rotation.
• If the height diff is > +1, then the rotation must be a right rotation.
AVL Trees
• Based on the node values, rotations are further classified into
– Single Right Rotation
– Single Left Rotation
– Double Left Right Rotation
– Double Right Left Rotation
AVL Trees
• Single right rotation
– The height diff must be > +1.
– The node values must satisfy A > B > C.
– A: the node where the rotation is required.
– B, C: descendants of A.
(Figure: the chain 70 (A) – 60 (B) – 50 (C) rotates so that 60 becomes the sub-root with 50 and 70 as its children.)
AVL Trees
• Before rotation          • After rotation
– X->left = A              – X->left = B
– A->left = B              – A->left = Y
– B->left = C              – B->left = C
                           – B->right = A
(Figure: A = 70, B = 60, C = 50; Y = 65 is B's old right subtree, which becomes A's new left subtree.)
AVL Trees
• Single left rotation
– The height diff must be < −1.
– The node values must satisfy A < B < C.
– A: the node where the rotation is required.
– B should become the sub-root.
– A should become the left son, and C remains the right son.
(Figure: the chain 70 (A) – 80 (B) – 90 (C) rotates so that 80 becomes the sub-root with 70 and 90 as its children.)
AVL Trees
• Before rotation          • After rotation
– X->right = A             – X->right = B
– A->right = B             – A->right = Y
– B->right = C             – B->right = C
                           – B->left = A
(Figure: A = 80, B = 100, C = 110; Y = 85 is B's old left subtree, which becomes A's new right subtree.)
AVL and other balanced trees
• AVL Trees
– First balanced tree algorithm
– Discoverers: Adelson-Velskii and Landis
• Properties
– Binary tree
– Height of left and right-subtrees differ by at most 1
– Subtrees are AVL trees
AVL trees - Height
• Theorem
– An AVL tree of height h has at least F(h+3) − 1 nodes, where F is the Fibonacci sequence
• Proof
– Let S(h) be the size of the smallest AVL tree of height h
– Clearly, S(0) = 1 and S(1) = 2
– The smallest tree of height h has smallest subtrees of heights h−1 and h−2, so S(h) = S(h−1) + S(h−2) + 1, which solves to S(h) = F(h+3) − 1
AVL Trees - Rebalancing
• Insertion leads to non-AVL tree
– 4 cases
(Figures: the four cases, 1–4.)
AVL Trees - Rebalancing
• Case 1 solved by rotation
AVL Trees - Rebalancing
• Case 2 needs a double rotation
AVL Trees - Data Structures
• AVL trees can be implemented with a flag to indicate the balance state
Dynamic Trees - Red-Black or AVL
• Insertion
– AVL : two passes through the tree
• Down to insert the node
• Up to re-balance
– Red-Black : two passes through the tree
• Down to insert the node
• Up to re-balance
but Red-Black is more popular??
Forest
• An ordered set of trees forms a forest.
• The ordered-tree form must satisfy the following criteria:
– The pre-order traversal must be the same as that of the binary tree.
– The post-order traversal of the binary tree must be the same as the in-order traversal of the ordered tree.
• After constructing the ordered trees, connecting them in the proper way gives the forest representation.
Forest
• Converting a binary tree to an ordered tree
– The right son of the parent should become the left descendant (i.e. the right son is re-connected as the right son of the left son).
– In this process the pre-order property must not change.
– Similarly, for a general tree the sons that are in a brother (sibling) relation are represented as right descendants of the first son.
(Figure: A with children B, C becomes A–B, with C attached under B.)
Searching - Re-visited
• Binary tree O(log n) if it stays balanced
– Simple binary tree good for static collections
– Low (preferably zero) frequency of
insertions/deletions
but my collection keeps changing!
– It’s dynamic
– Need to keep the tree balanced
• First, examine some basic tree operations
– Useful in several ways!
Trees - Searching
• Binary search tree
– Produces a sorted list by in-order traversal
• In order: A D E G H K L M N O P T V
Trees - Searching
• Binary search tree
– Preserving the order
– Observe that this transformation preserves the
search tree
AVL Trees - Rotations
• Binary search tree
– Rotations can be either left- or right-rotations
Trees - Red-Black Trees
• A Red-Black Tree
– Binary search tree
– Each node is “coloured” red or black
Trees - Red-Black Trees
• A Red-Black Tree
– Every node is RED or BLACK
– Every leaf is BLACK
– If a node is RED,
then both children
are BLACK
Trees - Red-Black Trees
• A Red-Black Tree
– Every node is RED or BLACK
– Every leaf is BLACK
– If a node is RED,
then both children
are BLACK
– Every path
from a node to a leaf
contains the same number
of BLACK nodes
From the root,
there are 3 BLACK nodes
on every path
Trees - Red-Black Trees
• Lemma
A RB-tree with n nodes has height ≤ 2 log₂(n+1)
– Proof .. see Cormen
• Essentially,
height ≤ 2 × black-height
• Search time
O( log n )
Trees - Red-Black Trees
• Data structure
– As we'll see, nodes in red-black trees need to know their parents,
– so we need this data structure

struct t_red_black_node {
    enum { red, black } colour;
    void *item;
    struct t_red_black_node *left,    /* same as a binary tree ...           */
                            *right,
                            *parent;  /* ... with these two attributes added */
};
Trees - Insertion
• Insertion of a new node
– Requires a re-balance of the tree
– (Figure: insert node 4 and mark it red.)
Trees - Insertion
(Figure sequence, diagrams not recoverable: the fix-up after insertion examines x->parent, x->parent->parent and the right "uncle". When the uncle is red, colours are flipped and x moves up the tree; when the uncle is black, the colours are changed and a rotation — about x or its grandparent — restores the red-black properties.)
Red-black trees - Analysis
• Addition
– Insertion Comparisons O(log n)
– Fix-up
• At every stage,
x moves up the tree
at least one level O(log n)
– Overall O(log n)
• Deletion
– Also O(log n)
• More complex
• ... but gives O(log n) behaviour in dynamic cases
Red Black Trees - What you need to know?
• Code?
– This is not a course for masochists!
• You can find it in a text-book
• You need to know
– The algorithm exists
– What it’s called
– When to use it
• ie what problem does it solve?
– Its complexity
– Basically how it works
– Where to find an implementation
• How to transform it to your application
Dynamic Trees - A cautionary tale
• Insertion
– If you read Cormen et al,
• There’s no reason to prefer a red-black tree
– However, in Weiss’ text
M A Weiss, Algorithms, Data Structures and Problem Solving with
C++, Addison-Wesley, 1996
– you find that you can balance a red-black tree
in one pass!
– Making red-black more efficient than AVL
if coded properly!!!
Moral: You need to read the literature!
Dynamic Trees - A cautionary tale
• Insertion in one pass
– As you proceed down the tree,
if you find a node with two red children,
make it red and the children black
– This doesn’t alter the number of black nodes in any path
– If the parent of this node was red,
a rotation is needed ...
– May need to be a single or a double rotation
Trees - Insertion
• Adding 4 ...
Discover two red
children here
Trees - Insertion
• Adding 4 ...
Red sequence,
violates
red-black property
Rotate
Trees - Insertion
• Adding 4 ...
Rotate
Add the 4
Balanced Trees - Yet more variants
• Basically the same ideas
– 2-3 Trees
– 2-3-4 Trees
• Special cases of m-way trees ... coming!
• Variable number of children per node
A more complex implementation
• 2-3-4 trees
– Map to red-black trees
Possibly useful to understand red-black trees
Lecture 12 - Key Points
• AVL Trees
– First dynamically balanced tree
– Height within 44% of optimum
– Rebalanced with rotations
– O(log n)
• Less efficient than properly coded red-black trees
• 2-3, 2-3-4 trees
– m-way trees - Yet more variations
– 2-3-4 trees map to red-black trees
m-way trees
• Only two children per node?
• Reduce the depth of the tree to O(log_m n) with m-way trees
B-trees
• All leaves are on the same level
• All nodes except for the root and the leaves have
– at least ⌈m/2⌉ children
– at most m children
(i.e. each node is at least half full of keys)
• B+ trees
– All the keys in the nodes are dummies
– Only the keys in the leaves point to “real” data
– Linking the leaves
• Ability to scan the collection in order
without passing through the higher nodes
B+-trees
• B+ trees
– All the keys in the nodes are dummies
– Only the keys in the leaves point to “real” data
– Data records kept in a separate area
B+-trees - Scanning in order
• B+ trees
– Linking the leaves
• Ability to scan the collection in order
without passing through the higher nodes
B+-trees - Use
• Use - Large Databases
– Reading a disc block is much slower than reading memory ( ~ms vs ~ns )
– Put each block of keys in one disc block
Physical disc
blocks
B-trees - Insertion
• Insertion
– B-tree property : block is at least half-full of keys
– Insertion into block with m keys
• block overflows
• split block
• promote one key
• split parent if necessary
• if root is split, tree becomes one level deeper
B-trees - Insertion
• Insertion
– Insert 9
– Leaf node overflows,
split it
– Promote middle (8)
– Root overflows,
split it
– Promote middle (6)
– New root node formed
– Height increased by 1
B-trees on disc
• Disc blocks
– 512 - 8k bytes
100s of keys
Use binary search within the block
• Overall
– O( log n )
– Matched to hardware!
• Deletion similar
– But merge blocks to maintain B-tree property
(at least half full)
Graphs
Graphs
• A graph consists of a set of nodes (vertices) and a set of arcs (edges) which connect the nodes.
• Not all nodes need be connected.
• Arcs can be either ordered pairs or unordered pairs, represented by the pair of nodes they connect.
• In an undirected graph an arc is represented as (n1,n2).
• In a directed graph an arc is represented as <n1,n2>, which is known as an ordered pair.
• Digraph: if the arcs are drawn with arrowheads, the graph is known as a directed graph (digraph).
Graphs
• An undirected graph (figure: vertices A, B, C, D, E, F, with H as a pendent vertex)
• Its edges are
– (A,B) or (B,A)
– (A,C) or (C,A)
– (C,D) or (D,C)
– (B,E) or (E,B)
– (D,E) or (E,D)
– (D,F) or (F,D)
Graphs
• Directed graph
– Nodes at the arrowheads are known as head nodes
– Nodes at the tails are known as tail nodes
– The head node is adjacent to the tail node
• Cyclic graph
– Contains a cycle, i.e. a node is reachable from itself (e.g. <A,C>, <C,D>, <D,A> in the figure)
• Acyclic graph
– No node is reachable from itself
• Directed acyclic graph (DAG)
– A directed graph without any cycle
• Arcs in the figure: <A,B>, <A,C>, <C,D>, <B,D>, <D,C>, <D,A>, <E,B>, <E,D>, <F,D>
Graphs
• If node n is incident to arc x, the arc is incident to both nodes of the ordered pair it connects.
• Degree of a node: the total number of arcs incident on it.
• In-degree: the number of incident arcs which contain that node as head node.
• Out-degree: the number of incident arcs which contain that node as tail node.
• E.g. for node D:
– Degree is 6
– In-degree is 4
– Out-degree is 2
Graphs
• Applications of Graph
– Operations Research
• PERT charts
• CPM charts
– Flow problem
– Network problems
• If an arc carries a value, that value is known as the weight of the arc, and the graph is referred to as a weighted graph.
(Figure: an arc from A to B with weight 50.)
Graphs
• A graph can be represented through
– Arrays
– Tree structures
– Sparse matrices
• Adjacency matrix
– When the graph is represented with a 2-dimensional array which shows the relations, that array is called the adjacency matrix.
– The node data can be stored in a separate hash table, numbering the nodes from zero.
– The elements of the matrix can be either weights or Boolean values.
– The matrix with Boolean values is known as the adjacency matrix; with weights it is a weighted matrix.
– The order of the matrix depends on the number of nodes in the graph: for n nodes the matrix is n×n.
– For a digraph it represents only ordered pairs.
Graphs - Data Structures
• Vertices
– Map to consecutive integers
– Store vertices in an array
• Edges
– Adjacency Matrix
• Booleans -
TRUE - edge exists
FALSE - no edge
• O(|V|²) space
• Can be compacted
– 1 bit/entry
– If undirected,
top half only
Graphs - Data Structures
• Edges
– Adjacency Lists
• For each vertex
– List of vertices “attached” to it
• For each edge
– 2 entries
– One in the list for each end
• O(|E|) space
Better for sparse graphs
Undirected representation
Graphs
• Graph Operations are
– Establishing relations in the adjacency Matrix or in
the weighted matrix.
– Removing the relations
Graphs
• Finding the path matrix using the adjacency matrix
– The adjacency matrix is known as the "path of length 1" matrix.
– If nodes A and B are directly related, the number of paths between A and B is 1, and it is referred to as a path of length 1.
– Here the number of nodes is 2, the number of intermediate nodes is 0, and the number of paths is 1.
– In general, a path of length k joins 2 nodes indirectly through k−1 intermediate nodes; total nodes on the path = k+1.
– Taking the adjacency matrix as the path-1 matrix (P1), the Boolean product of P1 and the adjacency matrix gives the path-2 matrix (P2); the Boolean product of P2 and the adjacency matrix gives the path-3 matrix, and so on.
– The path-k matrix (Pk) is the Boolean product of Pk−1 and the adjacency matrix.
– The matrix which shows whether any path (of any length up to k) exists between two nodes is known as the transitive closure.
Graphs
• Representing a graph through a multi-way linked list
– When representing a graph with a linked list, the main LL must contain all the graph nodes, and the sub-lists connected to the LL nodes represent the ordered pairs (arcs).
(Figure: main list A B C D E F; each node's sub-list holds the nodes it is connected to — not fully recoverable.)
Graphs
• Nodes in the main LL are known as graph nodes.
• Nodes in the sub-lists are known as arc nodes.
• Hence the graph node structure definition:
– Data members
– Two self-referential pointers for the DLL
– A pointer to the first arc node
• The arc node structure definition:
– A self-referential pointer to indicate the next arc
– A pointer to the destination graph node
Graphs
typedef struct gnode
{
    int n1, n2, n3;          /* data members */
    struct gnode *prev;      /* self-referential pointers for the DLL */
    struct gnode *head;
    struct arc   *arcptr;    /* pointer to the first arc node */
} GNODE;
Graphs
• Finding the transitive closure matrix for a weighted graph
– When a weighted graph is represented through a matrix, the elements which represent relations contain the weights.
– All other elements must be initialized with a sentinel value before applying WARSHALL'S algorithm (the weighted, shortest-distance variant is usually called Floyd–Warshall) to find the shortest distance between two nodes.
– To find the transitive closure matrix, construct an adjacency matrix from the weighted matrix and apply WARSHALL'S algorithm to that adjacency matrix.
Graphs
• DIJKSTRA'S algorithm
• This algorithm is used to find the shortest route from a source node to a target node.
– Consider the source node and make it permanent, with a label containing its distance and its predecessor node.
– For the source node the distance is zero and the predecessor is NULL.
– Identify all nodes reachable from the current node and construct tentative labels: the distance so far plus the arc weight, together with the predecessor.
– If a node's existing label is already permanent, leave it; otherwise make the least tentative label permanent and continue from that node.
Graphs
• DIJKSTRA'S algorithm — worked example (figure: weighted graph with nodes A, B, C, D, F; diagram not recoverable)
Graphs - Traversing
• Choices
– Depth-First / Breadth-first
• Depth First
– Use an array of flags to mark
“visited” nodes
Graphs - Depth-First
struct t_graph {          /* graph data structure */
    int n_nodes;
    graph_node *nodes;
    int *visited;
    AdjMatrix am;         /* Adjacency Matrix ADT */
};

static int search_index = 0;
Graphs - Depth-First
void visit( graph g, int k ) {
    int j;
    /* Mark the order in which this node was visited */
    g->visited[k] = ++search_index;
    /* Visit all the nodes adjacent to this one */
    for( j = 0; j < g->n_nodes; j++ ) {
        if ( adjacent( g->am, k, j ) ) {
            if ( !g->visited[j] ) visit( g, j );
        }
    }
}
Graphs - Depth-First
• The same visit function again, noting a C hack:
– the test `!g->visited[j]` relies on 0 being false in C
– search_index == 0 means "not visited yet"!
Graphs - Depth-First
Adjacency List version of visit
void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    g->visited[k] = ++search_index;
    al_node = ListHead( g->adj_list[k] );
    while( al_node != NULL ) {
        j = ANodeIndex( ListItem( al_node ) );
        if ( !g->visited[j] ) visit( g, j );
        al_node = ListNext( al_node );
    }
}
Graphs - Depth-First
Adjacency List version of visit
void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    g->visited[k] = ++search_index;
    al_node = ListHead( g->adj_list[k] );
    while( al_node != NULL ) {
        j = ANodeIndex( ListItem( al_node ) );
        if ( !g->visited[j] ) visit( g, j );
        al_node = ListNext( al_node );
    }
}
Assumes a List ADT with methods
• ListHead
• ListItem
• ListNext
• ANodeIndex
Graph - Breadth-first Traversal
• Adjacency List
– Time complexity
• Visited set for each node
• Each edge visited twice
– Once in each adjacency list
• O(|V| + |E|)
• O(|V|²) for dense (|E| ~ |V|²) graphs
• but O(|V|) for sparse (|E| ~ |V|) graphs
• Adjacency Lists perform better for sparse graphs
Graph - Breadth-first Traversal
• Breadth-first requires a FIFO queue
static queue q;
void search( graph g ) {
    int k;
    q = ConsQueue( g->n_nodes );
    for( k = 0; k < g->n_nodes; k++ ) g->visited[k] = 0;
    search_index = 0;
    for( k = 0; k < g->n_nodes; k++ ) {
        if ( !g->visited[k] ) visit( g, k );
    }
}
Graph - Breadth-first Traversal
• Breadth-first requires a FIFO queue
void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    AddIntToQueue( q, k );                /* put this node on the queue */
    while( !Empty( q ) ) {
        k = QueueHead( q );               /* take the next node off the queue */
        g->visited[k] = ++search_index;
        al_node = ListHead( g->adj_list[k] );
        while( al_node != NULL ) {
            j = ANodeIndex( ListItem( al_node ) );
            if ( !g->visited[j] ) {
                AddIntToQueue( q, j );
                g->visited[j] = -1;       /* C hack, 0 = false!
                                             mark it "queued" */
            }
            al_node = ListNext( al_node );
        }
    }
}
Key Points - Lecture 19
• Dynamic Algorithms
• Optimal Binary Search Tree
– Used when
• some items are requested more often than others
• frequency for each item is known
– Minimises cost of all searches
– Build the search tree by
• Considering all trees of size 2, then 3, 4, ....
• Larger tree costs computed from smaller tree costs
– Sub-trees of optimal trees are optimal trees!
• Construct optimal search tree by saving root of each optimal sub-tree
and tracing back
• O(n³) time / O(n²) space
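The bottom-up construction summarised above can be sketched compactly (an illustration assuming integer access frequencies; the names optimal_bst, MAXN and the prefix-sum array are not from the lecture):

```c
#include <limits.h>

#define MAXN 8

/* freq[i] is the (known) access frequency of key i. cost[i][j] is the
   least total search cost of a tree over keys i..j, computed from the
   costs of smaller trees; root[i][j] records the root of each optimal
   sub-tree so the full tree can be traced back afterwards. */
int optimal_bst(int freq[], int n, int root[MAXN][MAXN]) {
    int cost[MAXN][MAXN] = {{0}};
    int sum[MAXN + 1] = {0};               /* prefix sums of frequencies */
    int i, j, r, len;
    for (i = 0; i < n; i++) sum[i + 1] = sum[i] + freq[i];
    for (i = 0; i < n; i++) { cost[i][i] = freq[i]; root[i][i] = i; }
    for (len = 2; len <= n; len++)         /* trees of size 2, then 3, 4, ... */
        for (i = 0; i + len - 1 < n; i++) {
            j = i + len - 1;
            cost[i][j] = INT_MAX;
            for (r = i; r <= j; r++) {     /* try each key as the root */
                int left  = (r > i) ? cost[i][r - 1] : 0;
                int right = (r < j) ? cost[r + 1][j] : 0;
                int c = left + right + (sum[j + 1] - sum[i]);
                if (c < cost[i][j]) { cost[i][j] = c; root[i][j] = r; }
            }
        }
    return cost[0][n - 1];
}
```

The three nested loops give the O(n³) time, and the cost/root tables the O(n²) space, quoted above.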
Key Points - Lecture 19
• Other Problems using Dynamic Algorithms
• Matrix chain multiplication
– Find optimal parenthesisation of a matrix product
• Expressions within parentheses
– optimal parenthesisations themselves
• Optimal sub-structure characteristic of dynamic algorithms
• Similar to optimal binary search tree
• Longest common subsequence
– Longest string of symbols found in each of two sequences
• Optimal triangulation
– Least cost division of a polygon into triangles
– Maps to matrix chain multiplication
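For example, the longest-common-subsequence length mentioned above can be computed with a small DP table (a sketch; lcs_length and MAXL are assumed names, not the lecture's code):

```c
#include <string.h>

#define MAXL 64

/* L[i][j] is the LCS length of the first i symbols of a
   and the first j symbols of b. */
int lcs_length(const char *a, const char *b) {
    int la = (int)strlen(a), lb = (int)strlen(b);
    int L[MAXL + 1][MAXL + 1];
    int i, j;
    for (i = 0; i <= la; i++)
        for (j = 0; j <= lb; j++) {
            if (i == 0 || j == 0)
                L[i][j] = 0;                    /* empty prefix: length 0 */
            else if (a[i - 1] == b[j - 1])
                L[i][j] = L[i - 1][j - 1] + 1;  /* symbols match: extend */
            else                                /* else best of dropping one */
                L[i][j] = L[i - 1][j] > L[i][j - 1]
                        ? L[i - 1][j] : L[i][j - 1];
        }
    return L[la][lb];
}
```

As with the optimal BST, the optimal sub-structure is what makes the dynamic approach work: each table entry is built only from smaller, already-optimal entries.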
Graphs - Definitions
• Graph
– Set of vertices (nodes) and edges connecting them
– Write
G = ( V, E )
where
• V is a set of vertices: V = { vi }
• An edge connects two vertices: e = ( vi , vj )
• E is a set of edges: E = { ( vi, vj ) }
[Figure: example graph with its vertices and edges labelled]
Graphs - Definitions
• Path
– A path, p, of length k, is a sequence of connected
vertices
– p = < v0, v1, ..., vk > where ( vi, vi+1 ) ∈ E
< i, c, f, g, h >
Path of length 5
< a, b >
Path of length 2
Graphs - Definitions
• Cycle
– A cycle is a path p = < v0, v1, ..., vk > such that v0 = vk
– A graph contains no cycles if there is no such path
< i, c, f, g, i >
is a cycle
Graphs - Definitions
• Spanning Tree
– A spanning tree is a set of |V|-1 edges that connect
all the vertices of a graph
Graphs - Definitions
• Minimum Spanning Tree
– Generally there is more than one spanning tree
– If a cost cij is associated with edge eij = (vi,vj)
then the minimum spanning tree is the set of edges Espan such
that
C = Σ ( cij | ∀ eij ∈ Espan )
is a minimum
Other STs can be formed:
• Replace 2 with 7
• Replace 4 with 11
Graphs - Kruskal’s Algorithm
• Calculate the minimum spanning tree
– Put all the vertices into single node trees by themselves
– Put all the edges in a priority queue
– Repeat until we’ve constructed a spanning tree
• Extract cheapest edge
• If it forms a cycle, ignore it
else add it to the forest of trees
(it will join two trees into a larger tree)
– Return the spanning tree
Graphs - Kruskal’s Algorithm
• Calculate the minimum spanning tree
– Put all the vertices into single node trees by themselves
– Put all the edges in a priority queue
– Repeat until we’ve constructed a spanning tree
• Extract cheapest edge
• If it forms a cycle, ignore it
else add it to the forest of trees
(it will join two trees into a larger tree)
– Return the spanning tree
Note that this algorithm makes no attempt to be clever
• to make any sophisticated choice of the next edge
• it just tries the cheapest one!
Graphs - Kruskal’s Algorithm in C
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );               /* initial forest: single vertex trees */
    q = ConsEdgeQueue( g, costs );     /* priority queue of edges */
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );
        } while ( Cycle( e, T ) );     /* discard edges that form a cycle */
        AddEdge( T, e );
    }
    return T;
}
Graphs - Kruskal’s Algorithm in C
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );
    q = ConsEdgeQueue( g, costs );
    for( i = 0; i < (n-1); i++ ) {     /* we need n-1 edges to fully
                                          connect (span) n vertices */
        do {
            e = ExtractCheapestEdge( q );
        } while ( Cycle( e, T ) );
        AddEdge( T, e );
    }
    return T;
}
Graphs - Kruskal’s Algorithm in C
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );
    q = ConsEdgeQueue( g, costs );
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );  /* try the cheapest edge */
        } while ( Cycle( e, T ) );         /* until we find one that
                                              doesn't form a cycle */
        AddEdge( T, e );                   /* ... and add it to the forest */
    }
    return T;
}
Kruskal’s Algorithm
• Priority Queue
– We already know about this!!
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );
    q = ConsEdgeQueue( g, costs );         /* add to a heap here */
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );  /* extract from a heap here */
        } while ( Cycle( e, T ) );
        AddEdge( T, e );
    }
    return T;
}
Kruskal’s Algorithm
• Cycle detection
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );
    q = ConsEdgeQueue( g, costs );
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );
        } while ( Cycle( e, T ) );     /* but how do we detect a cycle? */
        AddEdge( T, e );
    }
    return T;
}
Kruskal’s Algorithm
• Cycle detection
– Uses a Union-find structure
– For which we need to understand a partition of a set
• Partition
– A set of sets of elements of a set
• Every element belongs to one of the sub-sets
• No element belongs to more than one sub-set
– Formally:
• Set, S = { si }
• Partition(S) = { Pi }, where Pi ⊆ S    (the Pi are subsets of S)
• ∀ si ∈ S, si ∈ Pj for some j    (all si belong to one of the Pj)
• ∀ j ≠ k, Pj ∩ Pk = ∅    (no two Pj have common elements)
• ∪ Pj = S
Kruskal’s Algorithm
• Partitions
– In the MST algorithm,
the connected vertices form equivalence classes
• “Being connected” is the equivalence relation
– Initially, each vertex is in a class by itself
– As edges are added,
more vertices become related
and the equivalence classes grow
– Until finally all the vertices are in a single equivalence class
Kruskal’s Algorithm
• Representatives
– One vertex in each class may be chosen as the representative of
that class
– We arrange the vertices in lists that lead to the representative
• This is the union-find structure
• Cycle determination
Kruskal’s Algorithm
• Cycle determination
– If two vertices have the same representative,
they’re already connected and adding a further
connection between them is pointless
– Procedure:
• For each end-point of the edge that you’re going to add
• follow the lists and find its representative
• if the two representatives are equal,
then the edge will form a cycle
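The representative-following procedure above can be sketched as a tiny union-find structure (an illustration; the names parent, uf_find and uf_add_edge are assumptions, not the lecture's code):

```c
#define MAXV 16

/* parent[] holds the lists that lead to each class representative:
   a vertex whose parent is itself is the representative. */
int parent[MAXV];

void uf_init(int n) {                  /* each vertex in a class by itself */
    int i;
    for (i = 0; i < n; i++) parent[i] = i;
}

int uf_find(int v) {                   /* follow the list to the representative */
    while (parent[v] != v) v = parent[v];
    return v;
}

/* Adding edge (u,v): if the two representatives are equal, the edge
   would form a cycle; otherwise merge the two classes. Returns 1 on cycle. */
int uf_add_edge(int u, int v) {
    int ru = uf_find(u), rv = uf_find(v);
    if (ru == rv) return 1;            /* cycle: u and v already connected */
    parent[ru] = rv;                   /* union: ru's class joins rv's */
    return 0;
}
```

A production version would also use union-by-rank and path compression, but this minimal form already gives the cycle test Kruskal's algorithm needs.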
Kruskal’s Algorithm in operation
All the vertices are in
single element trees
Kruskal’s Algorithm in operation
The cheapest edge
is h-g
Kruskal’s Algorithm in operation
The next cheapest edge
is c-i
Add it to the forest,
joining c and i into a
2-element tree
Choose c as its
representative
Kruskal’s Algorithm in operation
The next cheapest edge
is a-b
Add it to the forest,
joining a and b into a
2-element tree
Choose b as its
representative
Kruskal’s Algorithm in operation
The next cheapest edge
is c-f
Add it to the forest,
merging two
2-element trees
Kruskal’s Algorithm in operation
The next cheapest edge
is g-i
The rep of g is c
Kruskal’s Algorithm in operation
The next cheapest edge
is c-d
The rep of c is c
The rep of d is d
Kruskal’s Algorithm in operation
The next cheapest edge
is h-i
The rep of h is c
The rep of i is c
Kruskal’s Algorithm in operation
The next cheapest edge
is a-h
The rep of a is b
The rep of h is c
Kruskal’s Algorithm in operation
The next cheapest edge is b-c
But b-c forms a cycle
Greedy Algorithms
• At no stage did we attempt to “look ahead”
• We simply made the naïve choice
– Choose the cheapest edge!
• MST is an example of a greedy algorithm
• Greedy algorithms
– Take the “best” choice at each step
– Don’t look ahead and try alternatives
– Don’t work in many situations
• Try playing chess with a greedy approach!
– Are often difficult to prove
• because of their naive approach
• what if we made this other (more expensive) choice now and later on ..... ???
Proving Greedy Algorithms
• MST Proof
– “Proof by contradiction” is usually the best approach!
– Note that
• any edge creating a cycle is not needed
• each edge added must join two sub-trees
– Suppose that the next cheapest edge, ex, would join trees Ta and Tb
– Suppose that instead of ex we choose ez - a more expensive edge, which
joins Ta and Tc
– But we still need to join Tb to Ta or some other tree to which Ta is
connected
– The cheapest way to do this is to add ex
– So we should have added ex instead of ez
– This contradiction proves that the greedy approach is correct for MST
MST - Time complexity
• Steps
– Initialise forest O( |V| )
– Sort edges O( |E| log|E| )
– Check each edge for cycles O( |V| )
× number of edges added O( |V| ) → O( |V|² )
– Total O( |V| + |E| log|E| + |V|² )
– Since |E| = O( |V|² ), this is O( |V|² log|V| )
MST - Time complexity
• Steps
– Initialise forest O( |V| )
– Sort edges O( |E| log|E| )
– Check each edge for cycles O( |V| )
× number of edges added O( |V| ) → O( |V|² )
– Total O( |V| + |E| log|E| + |V|² )
– Since |E| = O( |V|² ), this is O( |V|² log|V| )
Here’s the “professionals read textbooks” theme recurring again!
Thank you
Good Luck