Data Structures in C
By: Anand B
E-mail :
[email protected]
Index
• Searching/Sorting
• Linked Lists
– Singly
– Doubly
– Circular
• Queue
• Stacks
• Trees
• Graphs
• Symbol Tables
• Garbage Collection
Anand B [email protected]
Array Limitations
• Arrays
– Simple,
– Fast
but
– Must specify size at construction time
– Murphy’s law
• Construct an array with space for n
– n = twice your estimate of the largest collection
• Tomorrow you’ll need n+1
– Can we build a more flexible system?
Linked Lists
• Flexible space use
– Dynamically allocate space for each element as needed
– Include a pointer to the next item
Linked list
– Each node of the list contains
• the data item (an object pointer in our ADT)
• a pointer to the next node
(diagram: node with Data and Next fields; Data points to the object)
Linked Lists
• Collection structure has a pointer to the list head
– Initially NULL
(diagram: Collection with Head = NULL)
Linked Lists
• Collection structure has a pointer to the list head
– Initially NULL
• Add first item
– Allocate space for node
– Set its data pointer to object
– Set Next to NULL
– Set Head to point to new node
(diagram: Head → node; the node's Data points to the object, Next = NULL)
Linked Lists
• Add second item
– Allocate space for node
– Set its data pointer to object
– Set Next to current Head
– Set Head to point to new node
(diagram: Head → node(object2) → node(object))
Linked Lists - Add implementation

struct t_node {
    void *item;
    struct t_node *next;    /* recursive type definition - C allows it! */
} node;
typedef struct t_node *Node;

struct collection {
    Node head;
    ……
};
typedef struct collection *Collection;

int AddToCollection( Collection c, void *item ) {
    Node new = malloc( sizeof( struct t_node ) );
    new->item = item;
    new->next = c->head;
    c->head = new;
    return TRUE;    /* error checking, asserts omitted for clarity! */
}
Linked Lists
• Add time
– Constant - independent of n
• Search time
– Worst case - n
(diagram: Head → node(object2) → node(object))
Linked Lists - Find implementation
• Implementation
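The find routine itself was lost from this slide. A minimal sketch, reusing the `t_node` layout from the Add slide; the names `FindInList` and `int_eq` are illustrative, not from the original:

```c
#include <stddef.h>

/* Node layout from the Add implementation slide */
struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

/* Example predicate: do two int items hold the same value? */
int int_eq(void *item, void *key) {
    return *(int *)item == *(int *)key;
}

/* Walk the chain from head - O(n) worst case.
   Returns the first matching item, or NULL if none matches. */
void *FindInList(Node head, int (*match)(void *, void *), void *key) {
    for (Node n = head; n != NULL; n = n->next)
        if (match(n->item, key))
            return n->item;
    return NULL;
}
```

Passing the comparison as a function pointer keeps the list generic over `void *` items, matching the ADT style used throughout these slides.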
Linked Lists - Delete implementation
• Implementation
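The delete routine was also lost. A minimal sketch with an assumed name `DeleteFromList`; using a pointer-to-pointer means the head node needs no special case:

```c
#include <stdlib.h>

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

/* Unlink the first node whose item pointer equals `item`.
   Returns the removed item, or NULL if it was not found. */
void *DeleteFromList(Node *head, void *item) {
    for (Node *pp = head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->item == item) {
            Node dead = *pp;
            *pp = dead->next;          /* bypass the node */
            void *it = dead->item;
            free(dead);                /* node was malloc'd by Add */
            return it;
        }
    }
    return NULL;
}
```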
Linked Lists - LIFO and FIFO
• Simplest implementation
– Add to head
– Gives Last-In-First-Out (LIFO) semantics
• Modification for First-In-First-Out (FIFO)
– Keep a tail pointer and add new items at the tail
(diagram: head → nodes → tail)
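The tail-pointer modification can be sketched as follows; the names `AddToTail` and `RemoveFromHead` are assumptions, not from the original slides:

```c
#include <stdlib.h>

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

struct queue { Node head, tail; };   /* head: remove end, tail: add end */

/* O(1) FIFO insertion: append at the tail instead of pushing on the head */
int AddToTail(struct queue *q, void *item) {
    Node n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->item = item;
    n->next = NULL;
    if (q->tail == NULL)        /* empty queue: node is both head and tail */
        q->head = n;
    else
        q->tail->next = n;
    q->tail = n;
    return 1;
}

/* O(1) removal from the head preserves FIFO order */
void *RemoveFromHead(struct queue *q) {
    if (q->head == NULL) return NULL;
    Node n = q->head;
    q->head = n->next;
    if (q->head == NULL) q->tail = NULL;   /* queue became empty */
    void *it = n->item;
    free(n);
    return it;
}
```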
Stacks
• A stack is a data structure used to store and retrieve data.
• The stack supports two operations, push and pop.
• The push operation places data on the stack and the pop operation
retrieves data from the stack.
• The order in which data is retrieved gives the stack its
Last In First Out (LIFO) behaviour: a pop returns the data placed
on the stack most recently.
– A structure that instead retrieves the data placed first
(First In First Out, FIFO) is a queue, not a stack.
Stacks
• Stacks are a special form of collection with LIFO semantics
• Two methods
– int push( Stack s, void *item );
- add item to the top of the stack
– void *pop( Stack s );
- remove an item from the top of the stack
• Like a plate stacker
• Other methods
Stacks - Implementation
• Arrays
– Provide a stack capacity to the constructor
– Flexibility limited but matches many real uses
• Capacity limited by some constraint
– Memory in your computer
– Size of the plate stacker, etc
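An array-backed stack along these lines; `push` and `pop` follow the method signatures on the previous slide, while `StackNew` and the struct layout are assumptions:

```c
#include <stdlib.h>

/* Fixed-capacity array stack: capacity is supplied to the constructor */
struct stack {
    void **items;
    int top;        /* number of items currently on the stack */
    int capacity;
};
typedef struct stack *Stack;

Stack StackNew(int capacity) {
    Stack s = malloc(sizeof *s);
    s->items = malloc(capacity * sizeof *s->items);
    s->top = 0;
    s->capacity = capacity;
    return s;
}

/* add item to the top of the stack; fails when full */
int push(Stack s, void *item) {
    if (s->top == s->capacity) return 0;
    s->items[s->top++] = item;
    return 1;
}

/* remove an item from the top of the stack; NULL when empty */
void *pop(Stack s) {
    if (s->top == 0) return NULL;
    return s->items[--s->top];
}
```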
Stacks - Implementation
• Arrays common
– Provide a stack capacity to the constructor
– Flexibility limited but matches many real uses
• Stack created with limited capacity
struct t_node
{
    void *item;
    struct t_node *prev,    /* prev is optional! */
                  *next;
} node;
typedef struct t_node *Node;
struct collection
{
    Node head, tail;
};
(diagram: doubly linked list chained between head and tail via prev/next)
Stack Frames - Functions in HLL
• Program
function f( int x, int y) {
int a;
if ( term_cond ) return …;
a = ….;
return g( a );
}
function g( int z ) {
int p, q;
p = …. ; q = …. ;
return f(p,q);
}
Context
for execution of f
Stacks
• Application of Stacks
– Stacks can be used to evaluate mathematical
expressions
– Stacks can be used to rewrite recursive programs
in a non-recursive form
• Expression Evaluation
– Based on the presence of mathematical operator in the
expression, Expressions are classified into
• Infix
• Postfix
• Prefix
Stacks
• Infix
– The operator is preceded & succeeded by its operands
– Ex: A+B
• Postfix
– The operands are succeeded by the operator
– Ex: AB+
• Prefix
– The operands are preceded by the operator
– Ex: +AB
• Note:
– Postfix & Prefix expressions are also called Polish expressions.
– Postfix & Prefix expressions need no parentheses.
Stacks
• Converting Infix to Postfix (single-character operands)
• The infix expression must be entered as a string
• Extract characters one by one until the end of the string; output
operands immediately, and use a stack to hold each operator until all
operators of equal or higher precedence have been emitted
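One concrete way to perform this conversion is the classic shunting-yard method (not named on the slide); this sketch assumes single-character operands and the operators + - * / with the usual precedence:

```c
#include <string.h>

static int prec(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* not an operator */
    }
}

/* Shunting-yard conversion for single-character operands.
   Writes the postfix form into out (must be large enough). */
void infix_to_postfix(const char *in, char *out) {
    char stack[128];
    int sp = 0, o = 0;
    for (const char *p = in; *p; p++) {
        char c = *p;
        if (c == ' ') continue;
        if (c == '(') {
            stack[sp++] = c;
        } else if (c == ')') {
            while (sp > 0 && stack[sp - 1] != '(')
                out[o++] = stack[--sp];
            if (sp > 0) sp--;                  /* discard '(' */
        } else if (prec(c)) {
            /* emit operators of equal or higher precedence first */
            while (sp > 0 && prec(stack[sp - 1]) >= prec(c))
                out[o++] = stack[--sp];
            stack[sp++] = c;
        } else {
            out[o++] = c;                      /* operand goes straight out */
        }
    }
    while (sp > 0)                             /* drain remaining operators */
        out[o++] = stack[--sp];
    out[o] = '\0';
}
```

The traces on the next two slides (A+B*C → ABC*+ and A+B*C+D → ABC*+D+) match this routine's output.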
Stacks
Infix Expression: A + B * C
Postfix Expression: ABC*+
Stacks
Infix Expression: A + B * C + D
Postfix Expression: ABC*+D+
Queue
• The queue is another data structure.
• A physical analogy for a queue is a line at a bank. When you go to the bank,
customers go to the rear (end) of the line and customers come off of the line
(i.e., are serviced) from the front of the line.
• Like a stack, a queue usually holds things of the same type.
• The main property of a queue is that objects go on the rear and come off of the
front of the queue.
[A B C]   --add D-->    [A B C D]
[A B C D] --delete-->   [B C D]
Implementing queue
Implementing a queue using an array

qsize = 10, array Q[qsize]
Initially: front = rear = -1

Push (enqueue):
    if (rear >= qsize - 1) -> Overflow
    Q[++rear] = item
    if (rear == 0) front = 0

Pop (dequeue):
    if (front == -1 || front > rear) -> Empty
    item = Q[front++]

Trace with items 10, 20, 90:
    push 10:  front = rear = 0        [10]
    push 20:  front = 0, rear = 1     [10 20]
    push 90:  front = 0, rear = 2     [10 20 90]
    pop:      returns 10; front = 1, rear = 2    [20 90]
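The pseudocode above translates to C roughly as follows (a non-circular variant; the names `q_push` and `q_pop` are assumptions):

```c
#define QSIZE 10

/* Straight-array queue using the slide's front/rear convention;
   front == -1 means the queue is empty. Slots are not reused. */
struct aqueue {
    int q[QSIZE];
    int front, rear;
};

int q_push(struct aqueue *a, int item) {
    if (a->rear >= QSIZE - 1) return 0;    /* overflow */
    a->q[++a->rear] = item;
    if (a->front == -1) a->front = 0;      /* first element */
    return 1;
}

int q_pop(struct aqueue *a, int *item) {
    if (a->front == -1 || a->front > a->rear) return 0;   /* empty */
    *item = a->q[a->front++];
    return 1;
}
```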
Queue
• In a normal queue, the insertion operation is performed at one end (the rear)
• and the deletion operation is performed at the other end (the front)
• Push & pop operations can also be performed in different ways;
based on these methods, queues are further classified into
– Dequeue
– Priority Queue
• Dequeue (Double-ended Queue)
– It allows insertion & deletion at both ends
• Input restricted
• Output restricted
– In the I/P restricted dequeue, insertion is done only at the rear end &
deletion can be done at both ends.
– In the O/P restricted dequeue, deletion is done only at the front end &
insertion can be done at both ends.
Implementing Dequeue
• Implementing the I/P restricted dequeue
– Display options for push & pop
– For the Push operation
• Increase rear & place the item
– For the Pop operation
• Display the option to pop (Rear / Front)
• Implementing the O/P restricted dequeue
– Display options for push & pop
– For the Push operation
• Display the option to push (Rear / Front)
• Rear: push the item by increasing rear
• Front: push the item by decreasing front
– For the Pop operation
• The front value must be greater than “0”, otherwise underflow
• Delete the item by increasing the front value
Implementing Dequeue
Implementing an I/P restricted dequeue using an array

qsize = 10, array Q[qsize]
Initially: front = rear = -1

Push (rear only):
    if (rear >= qsize - 1) -> Overflow
    Q[++rear] = item
    if (rear == 0) front = 0

Pop at rear:
    if (front == -1 || front > rear) -> Empty
    item = Q[rear--]

Pop at front:
    if (front == -1 || front > rear) -> Empty
    item = Q[front++]

Trace with items 10, 20:
    push 10:      front = rear = 0      [10]
    push 20:      front = 0, rear = 1   [10 20]
    pop at rear:  returns 20; front = rear = 0   [10]
Implementing Dequeue
Implementing an O/P restricted dequeue using an array

qsize = 10, array Q[qsize]
Initially: front = rear = -1

Push at rear:
    if (rear >= qsize - 1) -> Overflow
    Q[++rear] = item
    if (rear == 0) front = 0

Push at front:
    if (front == -1) { Q[++front] = item; rear = 0; }   /* empty case */
    else if (front > 0) Q[--front] = item
    else -> Overflow    /* front == 0: no room before the front */

Pop (front only):
    if (front == -1 || front > rear) -> Empty
    item = Q[front++]

Trace with items 10, 20:
    push 10 at rear:  front = rear = 0      [10]
    push 20 at rear:  front = 0, rear = 1   [10 20]
    pop:              returns 10; front = rear = 1   [20]
Dequeue Implementation
• Implementing Dequeue using a linked list
– The linked list must be a circular linked list
• Josephus problem
– Let us consider a problem that can be solved using a circular list.
– A group of soldiers is surrounded by an enemy force. There is no
hope of survival without reinforcement, but there is a single horse
available for escape. The soldiers agree on a pact to determine
which of them will escape. They form a circle and a number “n” is
picked. Beginning with the soldier whose name is picked, they
count clockwise around the circle; when the count reaches “n” that
soldier is removed & the count begins again. Any soldier removed
from the circle is no longer counted. The last soldier remaining
takes the horse & escapes.
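A sketch of the Josephus count using a circular singly linked list; the function name `josephus` and the 1-based soldier numbering are assumptions:

```c
#include <stdlib.h>

struct jnode {
    int id;
    struct jnode *next;
};

/* n soldiers numbered 1..n stand in a circle; every k-th one is
   removed; returns the number of the last soldier remaining. */
int josephus(int n, int k) {
    /* build the circle 1 -> 2 -> ... -> n -> 1 */
    struct jnode *head = NULL, *tail = NULL;
    for (int i = 1; i <= n; i++) {
        struct jnode *p = malloc(sizeof *p);
        p->id = i;
        p->next = NULL;
        if (head == NULL) head = p; else tail->next = p;
        tail = p;
    }
    tail->next = head;                 /* close the circle */

    struct jnode *prev = tail, *cur = head;
    while (cur->next != cur) {         /* more than one soldier left */
        for (int c = 1; c < k; c++) {  /* count k-1 forward */
            prev = cur;
            cur = cur->next;
        }
        prev->next = cur->next;        /* remove the k-th */
        free(cur);
        cur = prev->next;
    }
    int survivor = cur->id;
    free(cur);
    return survivor;
}
```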
Dequeue Implementation
• Using a Doubly Linked List
• If a structure contains two self-referential members then it can be
used to construct a DLL.
• In an SLL the last node contains NULL in its next reference field.
• In a DLL the last node contains NULL in its next reference & the
first node contains NULL in its previous reference field.
• An SLL is a one-way traversal list: starting from any node you can
only reach the last node.
• A DLL allows two-way traversal: starting from any node we can
reach either the beginning or the end of the list.
• If we can reach the same node again by traversing all nodes of the
list, then the list has a circular reference.
Dequeue Implementation
(diagrams lost: step-by-step traces of push and pop of items 10, 90 and 20
on a doubly linked dequeue, showing the front (F), rear (R) and temporary (T)
pointers and the NULL links at each end)
Priority Queue
Simple Queues
• Linked lists provide
– LIFO or
– FIFO
semantics
– Constant ( O(1) ) addition and deletion
• What if items in the queue have an order?
– Usually termed a priority
– We must sort the items so that
the highest ( lowest ) priority item is removed first
Priority Queues
• Items have some ordering relation
– It doesn’t matter much what it is
– As long as there’s some way to define order
• Maintaining order
– Items are added and deleted continuously
– Tree structure
• Mostly O(log n) behaviour
– but can become unbalanced,
giving O(n) behaviour
Not acceptable in a life-critical system!!
Disastrous if your safety estimate assumed O(log n)!!
Symbol Tables
• A symbol table is a set of name-value pairs which
contains symbols & their values or addresses
• In any language or package it supports
– Processing of data
– Maintenance of identifier tables, message tables & special
tables
• Operations on symbol tables
– Constructing symbol tables
– Searching in symbol tables
– Insertion/deletion of symbols in or from symbol tables
Symbol Tables
• Symbol tables can be represented by
– Tree structures
– Arrays
• The tree structures used to represent symbol tables are
Binary Search Trees (BST) & Fibonacci search trees
with perfect height balancing.
• Classification of Symbol Tables
– Static Symbol Tables
– Dynamic Symbol tables
Symbol Tables
• Static Symbol Tables
– These tables do not allow insertion and deletion of
symbols once the table has been constructed
– The scope of the symbols in a static table is
throughout the program
– Ex: the COBOL, C & PASCAL language environments
• Dynamic Symbol Tables
– These tables allow insertion and deletion of symbols
during execution
– Ex: BASIC, C++ & FoxPro
Symbol Tables
• Hashtable
– An array representation of a symbol table is known as a
hash table.
– Hash tables are used to provide random access to key
elements or records which are on external storage
media.
– They are also used for internal storage.
– All symbol tables are memory-based tables.
– A hash table contains a number of buckets (rows), which
determines the number of items it can hold.
Symbol Tables
• Hashtable
– The hash number of an item is calculated by a
user-defined routine.
– This hash number is used as an index to the
item.
– Depending on the size of the table, the type of the table &
the method of calculating the hash number, hash tables are
classified into
• Closed hash tables (open addressing)
• Open hash tables (separate chaining)
Symbol Tables
• Closed hash table (Open addressing)
– A closed hash table is a linear array which contains either
values or addresses.
– On insertion, the hash number is calculated
from the key value by some user-defined hash
function.
– The value (or its address) is placed in the table, using the
generated hash number as the subscript.
– In general the hash number must be unique.
Symbol Tables
• Closed hash table (Open addressing)
– Hash collision: in some cases two keys may produce the
same hash number, which is known as a
hash collision.
– A hash collision occurs when the
cell referred to by the hash number is not an
empty cell in the hash table.
– When a hash collision occurs we have to place the value
or the address of the identifier in the next available
cell.
Symbol Tables
• Closed hash table (Open addressing)
– The following probing methods are used to
resolve hash collisions:
• Linear probing
• Quadratic probing
• Double hashing
• Rehashing
Symbol Tables
• Linear Probing
– The search for the next available cell proceeds one cell after
another & the table must be treated as circular.
– The formal function is f(i) = i + 1
– It is an advantageous method for finding a cell quickly
– But disadvantageous because it can require many comparisons.
• Quadratic Probing
– The cell to be checked for availability is based on the formula
f(i) = i²
– The main disadvantage is that in some cases we may not find an
empty cell even though cells are empty at other positions.
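A linear-probing sketch for a closed hash table of non-negative integer keys; the table size, the names `ct_insert`/`ct_search`, and the hash h(k) = k mod M are all assumptions:

```c
#define M 11          /* table size; a prime keeps probing well-behaved */
#define EMPTY (-1)    /* sentinel: keys must be non-negative */

/* Closed hash table with linear probing f(i) = i + 1 (mod M) */
struct ctable { int slot[M]; };

void ct_init(struct ctable *t) {
    for (int i = 0; i < M; i++) t->slot[i] = EMPTY;
}

/* Returns the slot used, or -1 if the table is full */
int ct_insert(struct ctable *t, int key) {
    int i = key % M;
    for (int probes = 0; probes < M; probes++) {
        if (t->slot[i] == EMPTY) { t->slot[i] = key; return i; }
        i = (i + 1) % M;          /* linear probe: next cell, circularly */
    }
    return -1;
}

/* Returns the slot holding key, or -1 if absent */
int ct_search(const struct ctable *t, int key) {
    int i = key % M;
    for (int probes = 0; probes < M; probes++) {
        if (t->slot[i] == EMPTY) return -1;   /* hit a hole: not present */
        if (t->slot[i] == key) return i;
        i = (i + 1) % M;
    }
    return -1;
}
```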
Symbol Tables
• Double hashing
– A second hash function determines the step to the next cell to
try; in the scheme here the step doubles the hash value, with
formal function f(i) = 2i.
– Efficiency improves when the table size is a prime number.
• Rehashing
– A series of hash functions is applied in turn to find the next
available cell.
– The main disadvantage is that we may not access a key value
directly, because it may not be in the originally calculated cell.
Symbol Tables
• Open hash Table
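The open hash table's content was lost from this slide. A minimal separate-chaining sketch, where each bucket holds a linked list of all keys that hash to it; the names and the use of integer keys are assumptions:

```c
#include <stdlib.h>

#define M 8   /* number of buckets; an assumption for the sketch */

struct entry {
    int key;
    struct entry *next;
};

/* Open hash table: one chain per bucket, so collisions never overflow */
struct otable { struct entry *bucket[M]; };

void ot_insert(struct otable *t, int key) {
    struct entry *e = malloc(sizeof *e);
    e->key = key;
    e->next = t->bucket[key % M];   /* push onto the bucket's chain */
    t->bucket[key % M] = e;
}

int ot_contains(const struct otable *t, int key) {
    for (struct entry *e = t->bucket[key % M]; e; e = e->next)
        if (e->key == key) return 1;
    return 0;
}
```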
Storage Allocation & Garbage Collection
• Every language environment should provide a facility for
reserving memory to hold the program data; the space
reserved depends on the language environment &
the scope of the variables.
• Some language environments provide a facility to define
intermediate variables & to allocate memory at
runtime (dynamic memory allocation).
• There are two methods of allocating memory
– Sequential allocation (fixed-block allocation)
– Dynamic allocation (varying-length block allocation)
Storage Allocation & Garbage Collection
• Sequential memory allocation:
– The system automatically allocates memory to variables
sequentially (contiguous allocation).
– It does not allow allocation of memory at runtime
– Ex: the COBOL language
• Dynamic memory allocation:
– Memory is allocated through system
routines or through user-defined functions by
specifying the size of the memory to be allocated.
– Ex: allocating memory to pointers at runtime
Storage Allocation & Garbage Collection
– The dynamic memory allocation technique can be used to
allocate memory to a pointer which holds the starting
address of a list.
– This pointer is known as the external pointer, and a pointer
which points to the next node is known as an internal pointer.
– To allocate memory to nodes, treating the whole available
memory as a single block, we need
• A pointer which holds the starting address of the free memory
• A variable which represents the total size of memory that can be used
for data.
Storage Allocation & Garbage Collection
– Let us consider
• pointer p refers to the starting address of free memory
• m is the maximum size of the block
• n is the size of the requested block
– Allocation can be done by the following routine
if ( p + n < m )
{
    var = p;
    p = p + n;
}
else
    var = NULL;
Storage Allocation & Garbage Collection
• P: pointer to free memory          Free space (1024)
• Request B1 = 150                   [150 | Free]
• Request B2 = 200                   [150 | 200 | Free]
• Request B3 = 100                   [150 | 200 | 100 | Free]
• Request B4 = 175                   [150 | 200 | 100 | 175 | Free]
• Request B5 = 275                   [150 | 200 | 100 | 175 | 275 | Free(124)]
• Total allocated: 900
• Request B6 = 150 will return NULL
• Free blocks B1 & B3               [Free | 200 | Free | 175 | 275 | Free]
• Request B6 = 150 will still return NULL: the total free memory (374)
is greater than the request (150), but it is fragmented
• This can be solved using memory compaction
Storage Allocation & Garbage Collection
• Memory Compaction:
– It is the process of de-fragmenting memory: the allocated
blocks are moved together so that the free space forms one
contiguous block.
Hash Tables
• All search structures so far
– Relied on a comparison operation
– Performance O(n) or O(log n)
• Assume I have a function
– f( key ) → integer
ie one that maps a key to an integer
• What performance might I expect now?
Hash Tables - Structure
• Simplest case:
– Assume items have integer keys in the range 1 .. m
– Use the value of the key itself
to select a slot in a
direct access table
in which to store the item
– To search for an item with key, k,
just look in slot k
• If there’s an item there,
you’ve found it
• If the tag is 0, it’s missing.
– Constant time, O(1)
Hash Tables - Constraints
• Constraints
– Keys must be unique
– Keys must lie in a small range
– For storage efficiency,
keys must be dense in the range
– If they’re sparse (lots of gaps between values),
a lot of space is used to obtain speed
• Space for speed trade-off
Hash Tables - Relaxing the constraints
• Keys must be unique
– Construct a linked list of duplicates
“attached” to each slot
– If a search can be satisfied
by any item with key, k,
performance is still O(1)
but
– If the item has some
other distinguishing feature
which must be matched,
we get O(nmax)
where nmax is the largest number
of duplicates - or length of the longest chain
Hash Tables - Relaxing the constraints
• Keys are integers
– Need a hash function
h( key ) → integer
ie one that maps a key to
an integer
– Applying this function to the
key produces an address
– If h maps each key to a unique
integer in the range 0 .. m-1
then search is O(1)
Hash Tables - Hash functions
• Form of the hash function
– Example - using an n-character key
int hash( char *s, int n ) {
int sum = 0;
while( n-- ) sum = sum + *s++;
return sum % 256;
}
returns a value in 0 .. 255
– xor function is also commonly used
sum = sum ^ *s++;
– But any function that generates integers in 0..m-1 for some suitable (not
too large) m will do
– As long as the hash function itself is O(1) !
Hash Tables - Collisions
• Hash function
– With this hash function
int hash( char *s, int n ) {
int sum = 0;
while( n-- ) sum = sum + *s++;
return sum % 256;
}
– hash( “AB”, 2 ) and
hash( “BA”, 2 )
return the same value!
– This is called a collision
– A variety of techniques are used for resolving collisions
Hash Tables - Collision handling
• Collisions
– Occur when the hash function maps
two different keys to the same address
– The table must be able to recognise and resolve this
– Recognise
• Store the actual key with the item in the hash table
• Compute the address
– k = h( key )
• Check for a hit
– if ( table[k].key == key ) then hit
else try next entry
– Resolution
• Variety of techniques We’ll look at various
“try next entry” schemes
Hash Tables - Linked lists
• Collisions - Resolution
Linked list attached
to each primary table slot
• h(i) == h(i1)
• h(k) == h(k1) == h(k2)
– Searching for i1
• Calculate h(i1)
• Item in table, i,
doesn’t match
• Follow linked list to i1
– If NULL found,
key isn’t in table
Hash Tables - Overflow area
Overflow area
• Linked list constructed
in special area of table
called overflow area
– h(k) == h(j)
– k stored first
– Adding j
• Calculate h(j)
• Find k
• Get first slot in overflow area
• Put j in it
• k’s pointer points to this slot
– Searching - same as linked list
Hash Tables - Re-hashing
Use a second hash function
• Many variations
• General term: re-hashing
– h(k) == h(j)
– k stored first
– Adding j
• Calculate h(j)
• Find k
• Repeat until we find an empty slot
– Calculate h’(j)   ( h’(x) - the second hash function )
• Put j in it
– Searching - use h(x), then h’(x)
Hash Tables - Re-hash functions
The re-hash function
• Many variations
– Linear probing
• h’(x) is +1
• Go to the next slot
until you find one empty
Hash Tables - Re-hash functions
The re-hash function
• Many variations
– Quadratic probing
• h’(x) is c·i² on the ith probe
• Avoids primary clustering
• Secondary clustering occurs
– All keys which collide on h(x) follow the same sequence
– First
» a = h(j) = h(k)
– Then a + c, a + 4c, a + 9c, ....
– Secondary clustering generally less of a problem
Hash Tables - Collision Resolution Summary
• Chaining
+ Unlimited number of elements
+ Unlimited number of collisions
- Overhead of multiple linked lists
• Re-hashing
+ Fast re-hashing
+ Fast access through use of main table space
- Maximum number of elements must be known
- Multiple collisions become probable
• Overflow area
+ Fast access
+ Collisions don't use primary table space
- Two parameters which govern performance need to be estimated
Hash Tables - Summary so far ...
• Potential O(1) search time
– If a suitable function h(key) → integer can be found
• Space for speed trade-off
– “Full” hash tables don’t work (more later!)
• Collisions
– Inevitable
• Hash function reduces amount of information in key
– Various resolution strategies
• Linked lists
• Overflow areas
• Re-hash functions
– Linear probing ( h’ is +1 )
– Quadratic probing ( h’ is +c·i² )
– Any other hash function!
» or even sequence of functions!
Hash Tables - Choosing the Hash Function
• “Almost any function will do”
– But some functions are definitely better than others!
• Key criterion
– Minimum number of collisions
• Keeps chains short
• Maintains O(1) average
Hash Tables - Choosing the Hash Function
• Uniform hashing
– Ideal hash function
• P(k) = probability that a key, k, occurs
• If there are m slots in our hash table,
• a uniform hashing function, h(k), would ensure:
Σ_{k | h(k) = 0} P(k)  =  Σ_{k | h(k) = 1} P(k)  =  ....  =  Σ_{k | h(k) = m-1} P(k)  =  1/m

( read “Σ_{k | h(k) = 0}” as the sum over all k such that h(k) = 0 )
Hash Tables - A Uniform Hash Function
• If the keys are integers
randomly distributed in [ 0, r )   (ie 0 ≤ k < r),
then
h(k) = ⌊ m·k / r ⌋
is a uniform hash function
Hash Tables - Reducing the range to [ 0, m )
• We’ve mapped the keys to a range of integers
0 ≤ k < r
• Now we must reduce this range to [ 0, m ),
where m is a reasonable size for the hash table
• Strategies
Division - use a mod function
Multiplication
Universal hashing
Hash Tables - Reducing the range to [ 0, m )
Division
• Use a mod function
h(k) = k mod m
– Choice of m?
• Powers of 2 are generally not good!
h(k) = k mod 2ⁿ
simply selects the last n bits of k
(e.g. k mod 2⁸ selects the last 8 bits)
Hash Tables - Reducing the range to [ 0, m )
Multiplication method
• Multiply the key by a constant A, 0 < A < 1
• Extract the fractional part of the product
( kA - ⌊kA⌋ )
• Multiply this by m
h(k) = ⌊ m · ( kA - ⌊kA⌋ ) ⌋
– Now m is not critical and a power of 2 can be chosen
– So this procedure is fast on a typical digital computer
• Set m = 2^p
• Multiply k (w bits) by ⌊A·2^w⌋, giving a 2w-bit product
• Extract the p most significant bits of the lower half
• A = ½(√5 - 1) seems to be a good choice (see Knuth)
Hash Tables - Reducing the range to [ 0, m )
Universal Hashing
• A determined “adversary” can always find a set of data that will defeat any
hash function
• Hashing all keys to the same slot gives O(n) search
– Select the hash function randomly (at run time)
from a set of hash functions
⇒ Reduced probability of poor performance
– Set of functions, H, which map keys to [ 0, m )
– H is universal if, for each pair of keys, x and y,
the number of functions, h ∈ H,
for which h(x) = h(y) is |H|/m
Hash Tables - Reducing the range to [ 0, m )
Universal Hashing
• A determined “adversary” can always find a set of data that
will defeat any hash function
• Hash all keys to same slot O(n) search
– Select the hash function randomly (at run time)
from a set of hash functions
– Functions are selected at run time
• Each run can give different results
• Even with the same data
• Good average performance obtainable
Hash Tables - Reducing the range to [ 0, m )
Universal Hashing
• Can we design a set of universal hash functions?
• Quite easily
• View the key, x, as a sequence of n-bit “bytes”:
x = <x0, x1, x2, ...., xr>
• Choose a = <a0, a1, a2, ...., ar>,
a sequence of elements
chosen randomly from { 0, 1, ..., m-1 }
• ha(x) = Σ ai·xi mod m
• There are m^(r+1) sequences a,
so there are m^(r+1) functions, ha(x)
• Theorem
• The ha form a set of universal hash functions
( proof: see Cormen )
Collision Frequency
• Birthdays or the von Mises paradox
– There are 365 days in a normal year
Birthdays on the same day unlikely?
– How many people do I need
before “it’s an even bet”
(ie the probability is > 50%)
that two have the same birthday?
– View
• the days of the year as the slots in a hash table
• the “birthday function” as mapping people to slots
– Answering von Mises’ question answers the question about the
probability of collisions in a hash table
Distinct Birthdays
• Let Q(n) = probability that n people have distinct
birthdays
• Q(1) = 1
• With two people, the 2nd has only 364 “free” birthdays:
Q(2) = Q(1) × 364/365
• In general, the nth person has 365 - n + 1 “free” days:
Q(n) = Q(n-1) × (365 - n + 1)/365
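The recurrence is easy to evaluate directly; `distinct_prob` is an assumed name:

```c
/* Q(n): probability that n people all have distinct birthdays,
   via the recurrence Q(n) = Q(n-1) * (365 - n + 1) / 365 */
double distinct_prob(int n) {
    double q = 1.0;
    for (int i = 1; i < n; i++)
        q *= (365.0 - i) / 365.0;
    return q;
}
```

Evaluating P(n) = 1 - Q(n) reproduces the figure on the next slide: P(23) ≈ 0.507, so 23 people already make a shared birthday more likely than not.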
Coincident Birthdays
• Probability of two identical birthdays among n people:
P(n) = 1 - Q(n)
• P(23) = 0.507
– so with only 23 entries, when the table is just
23/365 = 6.3% full, a collision is more likely than not!
Hash Tables - Load factor
• Collisions are very probable!
• Table load factor: α = n / m
where n = number of items,
m = number of slots
Hash Tables - General Design
Choose the table size
• Large tables reduce the probability of collisions!
• Table size, m
• n items
• Collision probability = n / m
Choose a table organisation
• Does the collection keep growing?
• Linked lists (....... but consider a tree!)
• Size relatively static?
• Overflow area or
• Re-hash
Choose a hash function ....
Hash Tables - General Design
Choose a hash function
• A simple (and fast) one may well be fine ...
• Read your text for some ideas!
Check the hash function against your data
Fixed data
• Try various h, m
until the maximum collision chain is acceptable
Known performance
Changing data
• Choose some representative data
• Try various h, m until collision chain is OK
Usually predictable performance
Hash Tables - Review
• If you can meet the constraints
+ Hash Tables will generally give good performance
+ O(1) search
• Like radix sort,
they rely on calculating an address from a key
• But, unlike radix sort,
relatively easy to get good performance
– with a little experimentation
– not advisable for unknown data
• Best when the collection size is relatively static
• Memory management is actually simpler
– All memory is pre-allocated!
Trees
Trees
• A tree represents a list of items in a hierarchical (parent-child) fashion.
• Every item is represented as a NODE in the tree.
• The node at the top is called the root node.
• The nodes connected below the root node are called sub-root
nodes (subtrees) or leaf nodes.
• A node which does not contain any sub-node is called a leaf node.
• A node which contains sub-nodes is called a non-leaf node.
• In the father-child relation the root node is referred to as the father
(parent) and the sub-nodes which are directly connected to the
father are called its children.
• The children of the same father are called siblings.
Trees
• Example tree: A is the root; B, C, D are its children; E, F are children of B; G, H, I are children of D; J, K are children of F.
• Root node: A
• Leaf nodes: C, E, G, H, I, J, K
• Non-leaf nodes: A, B, D, F
• Siblings: {B, C, D}, {E, F}, {G, H, I}, {J, K}
• Children of A: B, C, D
• Children of B: E, F
• Children of D: G, H, I
• Children of F: J, K
• Ancestors of J and K: F, B, A
• Ancestors of G, H, I: D, A
• Order of the tree: 3
Trees
• The order of a tree is the maximum number of children that may be connected to any node of the tree (3 in the tree above).
• The degree of a node is the number of children actually connected to it.
• There is no restriction on the order of a general tree; any restrictions are defined by the implementation.
– The degree of node A is 3, and of D is 3.
– The degree of nodes B and F is 2.
• Depth of the tree: when the tree is described with a level structure, the level number starts at 0 at the root and increments by 1 towards the descendants (downwards).
Binary Tree
• If the order of a tree is two, then that tree is referred to as a binary tree (BT).
• In a BT any non-leaf node can have at most 2 sub-nodes.
• The node at the top is the root node.
• The first sub-node is known as the left son (left subtree).
• The second sub-node is known as the right son (right subtree).
(Figure: root A; children B, C; then D, E, F, G; then H, I, J.)
Complete Trees
• A binary tree is completely full if
– it has height h, and
– it has 2^(h+1) − 1 nodes
• A binary tree of height h is complete iff
– it is empty, or
– its left subtree is complete of height h−1 and its right subtree is completely full of height h−2, or
– its left subtree is completely full of height h−1 and its right subtree is complete of height h−1
Complete Trees
• If we examine the examples, we see that a complete tree is “filled
in” from the left
Binary Tree
• Methods of traversal
– Level-order traversal (LOT)
– Pre-order traversal
– In-order traversal
– Post-order traversal
(The same example tree as above: A; B, C; D, E, F, G; H, I, J.)
Binary Tree
• Level-order traversal
– Nodes are traversed level by level, starting from the root node.
– Before the nodes at level n are traversed, all nodes at level n−1 must already have been traversed.
– For the example tree: A, B, C, D, E, F, G, H, I, J
Binary Tree
• Pre-order traversal
– Nodes are traversed root–left–right.
– Ex: the tree + with children A, B gives +AB
– For the example tree: A, B, D, E, H, I, C, F, J, G
• In-order traversal
– Nodes are traversed left–root–right.
– Ex: the same tree gives A+B
– D, B, H, E, I, A, J, F, C, G
– In a BST, in-order traversal visits the data in ascending order.
• Post-order traversal
– Nodes are traversed left–right–root.
– Ex: the same tree gives AB+
– D, H, I, E, B, J, F, G, C, A
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
Pre-order
– Root
– Left sub-tree
– Right sub-tree
(Pre-order of the expression tree for A × (((B + C) × (D × E)) + F): × A + × + B C × D E F)
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
In-order
– Left sub-tree
– Root
– Right sub-tree
(In-order of the same expression tree: A × B + C × D × E + F — the original infix form, without parentheses)
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
Post-order
– Left sub-tree
– Right sub-tree
– Root
(Post-order of the same expression tree: A B C + D E × × F + ×)
Tree Traversal
Post-order
– Left sub-tree
– Right sub-tree
– Root
Reverse-Polish
(A (((B C +)(D E ×) ×) F +) ×)
• Normal algebraic form
(A × (((B + C) × (D × E)) + F))
= which traversal?
Binary Tree
• Constructing a BT
– To construct a binary tree we require a self-referential structure with two pointers:
• one to refer to the left subtree,
• the other to refer to the right subtree.
– A node containing NULL in both references is a leaf node.
– To insert a node at level n, level (n−1) must first be filled with nodes.
– This method of constructing a BT is level-order construction, and it requires an output-restricted dequeue.
Binary Tree
• Constructing a BT — steps
– Make the first node the root node and push its address into the output-restricted dequeue.
– From the second node onwards, for each new node:
• Pop an address from the dequeue.
• If the left son is empty, connect the new node as the left son, push the popped address back at the front, and push the new node's address at the rear.
• If the left son is not empty, connect the new node as the right son and push only the new node's address at the rear.
Binary Tree
(Worked figures, diagrams not recoverable: nodes 10, 20 and 30 are inserted one by one by the level-order construction above, with the head/tail pointers and the output-restricted dequeue *DEQ[10] — its Front and Rear indices — shown after each step.)
Binary Tree - LOT
(Figure, table not fully recoverable: a tree with nodes A–L is stored in a node-pool array with Info, Left and Right index fields and an Avail free-list pointer; a Queue[20] drives the level-order traversal.)
Binary Tree
• A BT is a finite set of elements that is either empty or partitioned into 3 disjoint subsets.
• The first subset contains a single element called the root of the tree.
• The other two sets are themselves binary trees, called the left and right subtrees of the original tree.
• In contrast, in a multi-way tree a node contains more than one key value (element), and the number of key values per node depends on the order of the tree.
• If "A" is the root of a BT and "B" is the root of its left or right subtree, then "A" is said to be the father of "B" and "B" is said to be the left or right son of "A".
Binary Tree
• Node N1 is an ancestor of N2 if N1 is either the father of N2 or the father of some ancestor of N2.
• A father is an ancestor of its left or right son, but an ancestor need not be the father.
• Non-leaf nodes are called internal nodes, and leaf nodes are called external nodes.
Binary Tree
• If every non-leaf node of a BT has non-empty left and right subtrees, the tree is termed a strictly binary tree (SBT).
Binary Tree
• The depth of a BT is the maximum level of any leaf in the tree, i.e. the longest path from the root to any leaf node.
• An SBT whose leaves are all at level "d" is a complete binary tree (CBT).
• An SBT may not be a CBT, but a CBT is always an SBT.
• If a BT contains "n" nodes at level "l", then it contains at most "2n" nodes at level "l+1".
• Max number of nodes at level l = 2^l
• If "d" is the depth of the tree and the tree is a CBT, then the total number of nodes in the tree is 2^(d+1) − 1:
– Total nodes in a CBT = 2^(d+1) − 1
– Total leaf nodes in a CBT = 2^d
– Total non-leaf nodes = 2^d − 1
(Figure: a CBT of depth 3 with levels A; B, C; D–G; H–O.)
Binary Tree
• For a complete binary tree (CBT) with "n" total nodes, the depth of the tree is:
– d = log₂(n+1) − 1
– but the general formula is d = ⌊log₂ n⌋
Binary Tree
• With 2 nodes we can construct 2 different binary trees.
• With 3 nodes we can construct 5 different binary trees.
• (In general, the count for n nodes is the Catalan number C(2n, n)/(n+1).)
Binary Tree
• Height balancing of an SBT which contains duplicate values may not be possible in some instances.
• Level-order traversal is also called breadth-first search (BFS).
• Pre-order traversal is also called depth-first search (DFS).
• In-order traversal is also called symmetric order.
• Non-recursive traversal functions that do not use stacks require either a father field or a thread field.
Binary Tree
• Constructing an array which represents a BST
– Initially all values of the array are zeros, which represents the availability of cells.
– Start with q = 0 at the root.
– If a node's index is q, then its
• left son is at 2q+1
• right son is at 2q+2
– E.g. node number 10:
• 2*10+1 = 21
• 2*10+2 = 22
(Figure: the BST 75; 65, 85; 55, 70, 80, 95; …; 105 stored level by level as the array 75 65 85 55 70 80 95 . . . 105)
Binary Tree
• Deleting a node from a BST
– The node that replaces the deleted node's position must be the in-order successor of the deleted node.
– First locate the node in the tree.
– If it is a leaf node, free it by setting its parent's reference to NULL; otherwise:
• If the node has no right subtree, move the left son into its position and free the node.
• If the node has only a single right son (a right subtree with a single node), move the right son into the deleted position and free the node.
• If the right son contains a left subtree, place the leftmost node of the right son at the deleted position and free the node.
• Note: deleting a node from a BT/SBT must not change the in-order sequence of the remaining nodes.
Binary Tree
• Delete node I (30)
– It is a leaf node
– D->right = NULL & free(I)
• Delete node H (5)
– It is a leaf node
– D->left = NULL & free(H)
• Delete node P (350)
– It is a non-leaf node with no right son
– M->left = R & free(P)
• Delete node R (325)
– It is a non-leaf node with a right subtree but no left subtree
– P->left = S & free(R)
• Delete node G (300)
– It is a non-leaf node whose right subtree has a left subtree attached
– C->right = R; P->left = S
– R->left = G->left; R->right = G->right
– free(G)
(Figure: BST with root 100; children 50, 200; then 25, 60, 150, 300; then 5, 30, 125, 175, 250, 400; then 160, 190, 350, 500; then 325; then 340.)
Binary Tree
• Finding the number of nodes in a BT (recursive)

int nc = 0;
void nodeCount( TREE *head )
{
    if ( !head )
        return;
    nc++;
    nodeCount( head->left );
    nodeCount( head->right );
}
Binary Tree
• Finding the depth of the tree

int depth = 0;
void TreeDepth( TREE *head, int level )
{
    if ( !head )
        return;
    if ( level > depth )
        depth = level;
    TreeDepth( head->left, level + 1 );
    TreeDepth( head->right, level + 1 );
}
AVL Trees
• These trees are also known as height-balanced trees.
• The concept of the AVL tree is to improve the efficiency of the BST by minimizing the number of comparisons required for searching.
• A BST constructed purely in input order (e.g. 70, 80, 60, 50, 90, 100, 40, 30, 20, 110, 120) may not satisfy the balance property that guarantees at most log₂N + 1 comparisons in the worst case.
(Figure: the resulting lopsided BST — 70; 60, 80; 50, 90; 40; 100; ...)
AVL Trees
• Each node can be evaluated through the following formula:
– Height diff = height of left subtree − height of right subtree
• If the height diff is −1, 0 or +1, then no height balancing is required at that node; otherwise the tree must be rebalanced, which is performed by rotating nodes.
• While performing a rotation, the in-order property must not change.
• Based on the height diff, rotations are classified into:
– Left rotation
– Right rotation
• If the height diff is < −1, then the rotation should be a left rotation.
• If the height diff is > +1, then the rotation must be a right rotation.
AVL Trees
• Based on the node values, rotations are further classified into
– Single Right Rotation
– Single Left Rotation
– Double Left Right Rotation
– Double Right Left Rotation
AVL Trees
• Single right rotation
– The height diff must be > +1.
– The node values must satisfy A > B > C.
– A: the node where the rotation is required.
– B, C: descendants of A.
(Figure: the chain 70 (A) – 60 (B) – 50 (C) rotates so that 60 becomes the sub-root with 50 and 70 as its children.)
AVL Trees
• Before rotation          • After rotation
– X->left = A              – X->left = B
– A->left = B              – A->left = Y
– B->left = C              – B->left = C
                           – B->right = A
(Figure: A = 70, B = 60, C = 50; Y = 65 is B's old right subtree, which becomes A's new left subtree.)
AVL Trees
• Single left rotation
– The height diff must be < −1.
– The node values must satisfy A < B < C.
– A: the node where the rotation is required.
– B should become the sub-root.
– A should become the left son, and C remains the right son.
(Figure: the chain 70 (A) – 80 (B) – 90 (C) rotates so that 80 becomes the sub-root with 70 and 90 as its children.)
AVL Trees
• Before rotation          • After rotation
– X->right = A             – X->right = B
– A->right = B             – A->right = Y
– B->right = C             – B->right = C
                           – B->left = A
(Figure: A = 80, B = 100, C = 110; Y = 85 is B's old left subtree, which becomes A's new right subtree.)
AVL and other balanced trees
• AVL Trees
– First balanced tree algorithm
– Discoverers: Adelson-Velskii and Landis
• Properties
– Binary tree
– Height of left and right-subtrees differ by at most 1
– Subtrees are AVL trees
AVL trees - Height
• Theorem
– An AVL tree of height h has at least F(h+3) − 1 nodes, where F is the Fibonacci sequence
• Proof
– Let S(h) be the size of the smallest AVL tree of height h
– Clearly, S(0) = 1 and S(1) = 2
– The smallest tree of height h has smallest subtrees of heights h−1 and h−2, so S(h) = S(h−1) + S(h−2) + 1, which solves to S(h) = F(h+3) − 1
AVL Trees - Rebalancing
• Insertion leads to non-AVL tree
– 4 cases
(Figures: the four cases, 1–4.)
AVL Trees - Rebalancing
• Case 1 solved by rotation
AVL Trees - Rebalancing
• Case 2 needs a double rotation
AVL Trees - Data Structures
• AVL trees can be implemented with a flag to indicate the balance state
Dynamic Trees - Red-Black or AVL
• Insertion
– AVL : two passes through the tree
• Down to insert the node
• Up to re-balance
– Red-Black : two passes through the tree
• Down to insert the node
• Up to re-balance
but Red-Black is more popular??
Forest
• An ordered set of trees forms a forest.
• The ordered-tree form must satisfy the following criteria:
– The pre-order traversal must be the same as that of the binary tree.
– The post-order traversal of the binary tree must be the same as the in-order traversal of the ordered tree.
• After constructing the ordered trees, connecting them in the proper way gives the forest representation.
Forest
• Converting a binary tree to an ordered tree
– The right son of the parent should become the left descendant (i.e. the right son is re-connected as the right son of the left son).
– In this process the pre-order property must not change.
– Similarly, for a general tree the sons that are in a brother (sibling) relation are represented as right descendants of the first son.
(Figure: A with children B, C becomes A–B, with C attached under B.)
Searching - Re-visited
• Binary tree O(log n) if it stays balanced
– Simple binary tree good for static collections
– Low (preferably zero) frequency of
insertions/deletions
but my collection keeps changing!
– It’s dynamic
– Need to keep the tree balanced
• First, examine some basic tree operations
– Useful in several ways!
Trees - Searching
• Binary search tree
– Produces a sorted list by in-order traversal
• In order: A D E G H K L M N O P T V
Trees - Searching
• Binary search tree
– Preserving the order
– Observe that this transformation preserves the
search tree
AVL Trees - Rotations
• Binary search tree
– Rotations can be either left- or right-rotations
Trees - Red-Black Trees
• A Red-Black Tree
– Binary search tree
– Each node is “coloured” red or black
Trees - Red-Black Trees
• A Red-Black Tree
– Every node is RED or BLACK
– Every leaf is BLACK
– If a node is RED,
then both children
are BLACK
Trees - Red-Black Trees
• A Red-Black Tree
– Every node is RED or BLACK
– Every leaf is BLACK
– If a node is RED,
then both children
are BLACK
– Every path
from a node to a leaf
contains the same number
of BLACK nodes
From the root,
there are 3 BLACK nodes
on every path
Trees - Red-Black Trees
• Lemma
A RB-tree with n nodes has height ≤ 2 log₂(n+1)
– Proof .. see Cormen
• Essentially,
height ≤ 2 × black-height
• Search time
O( log n )
Trees - Red-Black Trees
• Data structure
– As we'll see, nodes in red-black trees need to know their parents,
– so we need this data structure

struct t_red_black_node {
    enum { red, black } colour;
    void *item;
    struct t_red_black_node *left,    /* same as a binary tree ...           */
                            *right,
                            *parent;  /* ... with these two attributes added */
};
Trees - Insertion
• Insertion of a new node
– Requires a re-balance of the tree
– (Figure: insert node 4 and mark it red.)
Trees - Insertion
(Figure sequence, diagrams not recoverable: the fix-up after insertion examines x->parent, x->parent->parent and the right "uncle". When the uncle is red, colours are flipped and x moves up the tree; when the uncle is black, the colours are changed and a rotation — about x or its grandparent — restores the red-black properties.)
Red-black trees - Analysis
• Addition
– Insertion Comparisons O(log n)
– Fix-up
• At every stage,
x moves up the tree
at least one level O(log n)
– Overall O(log n)
• Deletion
– Also O(log n)
• More complex
• ... but gives O(log n) behaviour in dynamic cases
Red Black Trees - What you need to know?
• Code?
– This is not a course for masochists!
• You can find it in a text-book
• You need to know
– The algorithm exists
– What it’s called
– When to use it
• ie what problem does it solve?
– Its complexity
– Basically how it works
– Where to find an implementation
• How to transform it to your application
Dynamic Trees - A cautionary tale
• Insertion
– If you read Cormen et al,
• There’s no reason to prefer a red-black tree
– However, in Weiss’ text
M A Weiss, Algorithms, Data Structures and Problem Solving with
C++, Addison-Wesley, 1996
– you find that you can balance a red-black tree
in one pass!
– Making red-black more efficient than AVL
if coded properly!!!
Moral: You need to read the literature!
Dynamic Trees - A cautionary tale
• Insertion in one pass
– As you proceed down the tree,
if you find a node with two red children,
make it red and the children black
– This doesn’t alter the number of black nodes in any path
– If the parent of this node was red,
a rotation is needed ...
– May need to be a single or a double rotation
Trees - Insertion
• Adding 4 ...
Discover two red
children here
Trees - Insertion
• Adding 4 ...
Red sequence,
violates
red-black property
Rotate
Trees - Insertion
• Adding 4 ...
Rotate
Add the 4
Balanced Trees - Yet more variants
• Basically the same ideas
– 2-3 Trees
– 2-3-4 Trees
• Special cases of m-way trees ... coming!
• Variable number of children per node
A more complex implementation
• 2-3-4 trees
– Map to red-black trees
Possibly useful to understand red-black trees
Lecture 12 - Key Points
• AVL Trees
– First dynamically balanced tree
– Height within 44% of optimum
– Rebalanced with rotations
– O(log n)
• Less efficient than properly coded red-black trees
• 2-3, 2-3-4 trees
– m-way trees - Yet more variations
– 2-3-4 trees map to red-black trees
m-way trees
• Only two children per node?
• Reduce the depth of the tree to O(log_m n) with m-way trees
B-trees
• All leaves are on the same level
• All nodes except for the root and the leaves have
– at least ⌈m/2⌉ children
– at most m children
(i.e. each node is at least half full of keys)
• B+ trees
– All the keys in the nodes are dummies
– Only the keys in the leaves point to “real” data
– Linking the leaves
• Ability to scan the collection in order
without passing through the higher nodes
B+-trees
• B+ trees
– All the keys in the nodes are dummies
– Only the keys in the leaves point to “real” data
– Data records kept in a separate area
B+-trees - Scanning in order
• B+ trees
– Linking the leaves
• Ability to scan the collection in order
without passing through the higher nodes
B+-trees - Use
• Use - Large Databases
– Reading a disc block is much slower than reading memory ( ~ms vs ~ns )
– Put each block of keys in one disc block
Physical disc
blocks
B-trees - Insertion
• Insertion
– B-tree property : block is at least half-full of keys
– Insertion into block with m keys
• block overflows
• split block
• promote one key
• split parent if necessary
• if root is split, tree becomes one level deeper
B-trees - Insertion
• Insertion
– Insert 9
– Leaf node overflows,
split it
– Promote middle (8)
– Root overflows,
split it
– Promote middle (6)
– New root node formed
– Height increased by 1
B-trees on disc
• Disc blocks
– 512 - 8k bytes
100s of keys
Use binary search within the block
• Overall
– O( log n )
– Matched to hardware!
• Deletion similar
– But merge blocks to maintain B-tree property
(at least half full)
Graphs
Graphs
• A graph consists of a set of nodes (vertices) and a set of arcs (edges) which connect the nodes.
• Not all nodes need be connected.
• Arcs can be either ordered pairs or unordered pairs, represented by the pair of nodes they connect.
• In an undirected graph an arc is represented as (n1,n2).
• In a directed graph an arc is represented as <n1,n2>, which is known as an ordered pair.
• Digraph: if the arcs are drawn with arrowheads, the graph is known as a directed graph (digraph).
Graphs
• An undirected graph (figure: vertices A, B, C, D, E, F, with H as a pendent vertex)
• Its edges are
– (A,B) or (B,A)
– (A,C) or (C,A)
– (C,D) or (D,C)
– (B,E) or (E,B)
– (D,E) or (E,D)
– (D,F) or (F,D)
Graphs
• Directed graph
– Nodes at the arrowheads are known as head nodes
– Nodes at the tails are known as tail nodes
– The head node is adjacent to the tail node
• Cyclic graph
– Contains a cycle, i.e. a node is reachable from itself (e.g. <A,C>, <C,D>, <D,A> in the figure)
• Acyclic graph
– No node is reachable from itself
• Directed acyclic graph (DAG)
– A directed graph without any cycle
• Arcs in the figure: <A,B>, <A,C>, <C,D>, <B,D>, <D,C>, <D,A>, <E,B>, <E,D>, <F,D>
Graphs
• If node n is incident to arc x, the arc is incident to both nodes of the ordered pair it connects.
• Degree of a node: the total number of arcs incident on it.
• In-degree: the number of incident arcs which contain that node as head node.
• Out-degree: the number of incident arcs which contain that node as tail node.
• E.g. for node D:
– Degree is 6
– In-degree is 4
– Out-degree is 2
Graphs
• Applications of Graph
– Operations Research
• PERT charts
• CPM charts
– Flow problem
– Network problems
• If an arc carries a value, that value is known as the weight of the arc, and the graph is referred to as a weighted graph.
(Figure: an arc from A to B with weight 50.)
Graphs
• A graph can be represented through
– Arrays
– Tree structures
– Sparse matrices
• Adjacency matrix
– When the graph is represented with a 2-dimensional array which shows the relations, that array is called the adjacency matrix.
– The node data can be stored in a separate hash table, numbering the nodes from zero.
– The elements of the matrix can be either weights or Boolean values.
– The matrix with Boolean values is known as the adjacency matrix; with weights it is a weighted matrix.
– The order of the matrix depends on the number of nodes in the graph: for n nodes the matrix is n×n.
– For a digraph it represents only ordered pairs.
Graphs - Data Structures
• Vertices
– Map to consecutive integers
– Store vertices in an array
• Edges
– Adjacency Matrix
• Booleans -
TRUE - edge exists
FALSE - no edge
• O(|V|²) space
• Can be compacted
– 1 bit/entry
– If undirected,
top half only
Graphs - Data Structures
• Edges
– Adjacency Lists
• For each vertex
– List of vertices “attached” to it
• For each edge
– 2 entries
– One in the list for each end
• O(|E|) space
Better for sparse graphs
Undirected representation
Graphs
• Graph Operations are
– Establishing relations in the adjacency Matrix or in
the weighted matrix.
– Removing the relations
Graphs
• Finding the path matrix using the adjacency matrix
– The adjacency matrix is known as the "path of length 1" matrix.
– If nodes A and B are directly related, the number of paths between A and B is 1, and it is referred to as a path of length 1.
– Here the number of nodes is 2, the number of intermediate nodes is 0, and the number of paths is 1.
– In general, a path of length k joins 2 nodes indirectly through k−1 intermediate nodes; total nodes on the path = k+1.
– Taking the adjacency matrix as the path-1 matrix (P1), the Boolean product of P1 and the adjacency matrix gives the path-2 matrix (P2); the Boolean product of P2 and the adjacency matrix gives the path-3 matrix, and so on.
– The path-k matrix (Pk) is the Boolean product of Pk−1 and the adjacency matrix.
– The matrix which shows whether any path (of any length up to k) exists between two nodes is known as the transitive closure.
Graphs
• Representing a graph through a multi-way linked list
– When representing a graph with a linked list, the main LL must contain all the graph nodes, and the sub-lists connected to the LL nodes represent the ordered pairs (arcs).
(Figure: main list A B C D E F; each node's sub-list holds the nodes it is connected to — not fully recoverable.)
Graphs
• Nodes in the main LL are known as graph nodes.
• Nodes in the sub-lists are known as arc nodes.
• Hence the graph node structure definition:
– Data members
– Two self-referential pointers for the DLL
– A pointer to the first arc node
• The arc node structure definition:
– A self-referential pointer to indicate the next arc
– A pointer to the destination graph node
Graphs
typedef struct gnode
{
    int n1, n2, n3;          /* data members */
    struct gnode *prev;      /* self-referential pointers for the DLL */
    struct gnode *head;
    struct arc   *arcptr;    /* pointer to the first arc node */
} GNODE;
Graphs
• Finding the transitive closure matrix for a weighted graph
– When a weighted graph is represented through a matrix, the elements which represent relations contain the weights.
– All other elements must be initialized with a sentinel value before applying WARSHALL'S algorithm (the weighted, shortest-distance variant is usually called Floyd–Warshall) to find the shortest distance between two nodes.
– To find the transitive closure matrix, construct an adjacency matrix from the weighted matrix and apply WARSHALL'S algorithm to that adjacency matrix.
Graphs
• DIJKSTRA'S algorithm
• This algorithm is used to find the shortest route from a source node to a target node.
– Consider the source node and make it permanent, with a label containing its distance and its predecessor node.
– For the source node the distance is zero and the predecessor is NULL.
– Identify all nodes reachable from the current node and construct tentative labels: the distance so far plus the arc weight, together with the predecessor.
– If a node's existing label is already permanent, leave it; otherwise make the least tentative label permanent and continue from that node.
Graphs
• DIJKSTRA'S algorithm — worked example (figure: weighted graph with nodes A, B, C, D, F; diagram not recoverable)
Graphs - Traversing
• Choices
– Depth-First / Breadth-first
• Depth First
– Use an array of flags to mark
“visited” nodes
Graphs - Depth-First
struct t_graph {          /* graph data structure */
    int n_nodes;
    graph_node *nodes;
    int *visited;
    AdjMatrix am;         /* Adjacency Matrix ADT */
};

static int search_index = 0;
Graphs - Depth-First
void visit( graph g, int k ) {
    int j;
    /* Mark the order in which this node was visited */
    g->visited[k] = ++search_index;
    /* Visit all the nodes adjacent to this one */
    for( j = 0; j < g->n_nodes; j++ ) {
        if ( adjacent( g->am, k, j ) ) {
            if ( !g->visited[j] ) visit( g, j );
        }
    }
}
Graphs - Depth-First
• The same visit function again, noting a C hack:
– the test `!g->visited[j]` relies on 0 being false in C
– search_index == 0 means "not visited yet"!
Graphs - Depth-First
Adjacency List version of visit
void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    g->visited[k] = ++search_index;
    al_node = ListHead( g->adj_list[k] );
    while( al_node != NULL ) {
        j = ANodeIndex( ListItem( al_node ) );
        if ( !g->visited[j] ) visit( g, j );
        al_node = ListNext( al_node );
    }
}
Graphs - Depth-First
Adjacency List version of visit
void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    g->visited[k] = ++search_index;
    al_node = ListHead( g->adj_list[k] );
    while( al_node != NULL ) {
        j = ANodeIndex( ListItem( al_node ) );
        if ( !g->visited[j] ) visit( g, j );
        al_node = ListNext( al_node );
    }
}
Assumes a List ADT with methods
• ListHead
• ListItem
• ListNext
• ANodeIndex
Graph - Breadth-first Traversal
• Adjacency List
– Time complexity
• Visited set for each node
• Each edge visited twice
– Once in each adjacency list
• O(|V| + |E|)
• O(|V|²) for dense (|E| ~ |V|²) graphs
• but O(|V|) for sparse (|E| ~ |V|) graphs
• Adjacency Lists perform better for sparse graphs
Graph - Breadth-first Traversal
• Breadth-first requires a FIFO queue
static queue q;
void search( graph g ) {
    int k;
    q = ConsQueue( g->n_nodes );
    for( k = 0; k < g->n_nodes; k++ ) g->visited[k] = 0;
    search_index = 0;
    for( k = 0; k < g->n_nodes; k++ ) {
        if ( !g->visited[k] ) visit( g, k );
    }
}
Graph - Breadth-first Traversal
• Breadth-first requires a FIFO queue
void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    AddIntToQueue( q, k );                /* put this node on the queue */
    while( !Empty( q ) ) {
        k = QueueHead( q );               /* take the next node off the queue */
        g->visited[k] = ++search_index;
        al_node = ListHead( g->adj_list[k] );
        while( al_node != NULL ) {
            j = ANodeIndex( ListItem( al_node ) );
            if ( !g->visited[j] ) {
                AddIntToQueue( q, j );
                g->visited[j] = -1;       /* C hack, 0 = false!
                                             mark it "queued" */
            }
            al_node = ListNext( al_node );
        }
    }
}
Key Points - Lecture 19
• Dynamic Algorithms
• Optimal Binary Search Tree
– Used when
• some items are requested more often than others
• frequency for each item is known
– Minimises cost of all searches
– Build the search tree by
• Considering all trees of size 2, then 3, 4, ....
• Larger tree costs computed from smaller tree costs
– Sub-trees of optimal trees are optimal trees!
• Construct optimal search tree by saving root of each optimal sub-tree
and tracing back
• O(n³) time / O(n²) space
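The bottom-up construction summarised above can be sketched compactly (an illustration assuming integer access frequencies; the names optimal_bst, MAXN and the prefix-sum array are not from the lecture):

```c
#include <limits.h>

#define MAXN 8

/* freq[i] is the (known) access frequency of key i. cost[i][j] is the
   least total search cost of a tree over keys i..j, computed from the
   costs of smaller trees; root[i][j] records the root of each optimal
   sub-tree so the full tree can be traced back afterwards. */
int optimal_bst(int freq[], int n, int root[MAXN][MAXN]) {
    int cost[MAXN][MAXN] = {{0}};
    int sum[MAXN + 1] = {0};               /* prefix sums of frequencies */
    int i, j, r, len;
    for (i = 0; i < n; i++) sum[i + 1] = sum[i] + freq[i];
    for (i = 0; i < n; i++) { cost[i][i] = freq[i]; root[i][i] = i; }
    for (len = 2; len <= n; len++)         /* trees of size 2, then 3, 4, ... */
        for (i = 0; i + len - 1 < n; i++) {
            j = i + len - 1;
            cost[i][j] = INT_MAX;
            for (r = i; r <= j; r++) {     /* try each key as the root */
                int left  = (r > i) ? cost[i][r - 1] : 0;
                int right = (r < j) ? cost[r + 1][j] : 0;
                int c = left + right + (sum[j + 1] - sum[i]);
                if (c < cost[i][j]) { cost[i][j] = c; root[i][j] = r; }
            }
        }
    return cost[0][n - 1];
}
```

The three nested loops give the O(n³) time, and the cost/root tables the O(n²) space, quoted above.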
Key Points - Lecture 19
• Other Problems using Dynamic Algorithms
• Matrix chain multiplication
– Find optimal parenthesisation of a matrix product
• Expressions within parentheses
– optimal parenthesisations themselves
• Optimal sub-structure characteristic of dynamic algorithms
• Similar to optimal binary search tree
• Longest common subsequence
– Longest string of symbols found in each of two sequences
• Optimal triangulation
– Least cost division of a polygon into triangles
– Maps to matrix chain multiplication
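For example, the longest-common-subsequence length mentioned above can be computed with a small DP table (a sketch; lcs_length and MAXL are assumed names, not the lecture's code):

```c
#include <string.h>

#define MAXL 64

/* L[i][j] is the LCS length of the first i symbols of a
   and the first j symbols of b. */
int lcs_length(const char *a, const char *b) {
    int la = (int)strlen(a), lb = (int)strlen(b);
    int L[MAXL + 1][MAXL + 1];
    int i, j;
    for (i = 0; i <= la; i++)
        for (j = 0; j <= lb; j++) {
            if (i == 0 || j == 0)
                L[i][j] = 0;                    /* empty prefix: length 0 */
            else if (a[i - 1] == b[j - 1])
                L[i][j] = L[i - 1][j - 1] + 1;  /* symbols match: extend */
            else                                /* else best of dropping one */
                L[i][j] = L[i - 1][j] > L[i][j - 1]
                        ? L[i - 1][j] : L[i][j - 1];
        }
    return L[la][lb];
}
```

As with the optimal BST, the optimal sub-structure is what makes the dynamic approach work: each table entry is built only from smaller, already-optimal entries.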
Graphs - Definitions
• Graph
– Set of vertices (nodes) and edges connecting them
– Write
G = ( V, E )
where
• V is a set of vertices: V = { vi }
• An edge connects two vertices: e = ( vi , vj )
• E is a set of edges: E = { ( vi, vj ) }
[Figure: example graph with its vertices and edges labelled]
Graphs - Definitions
• Path
– A path, p, of length k, is a sequence of connected
vertices
– p = < v0, v1, ..., vk > where ( vi, vi+1 ) ∈ E
< i, c, f, g, h >
Path of length 5
< a, b >
Path of length 2
Graphs - Definitions
• Cycle
– A cycle is a path p = < v0, v1, ..., vk > such that v0 = vk
– A graph contains no cycles if there is no such path
< i, c, f, g, i >
is a cycle
Graphs - Definitions
• Spanning Tree
– A spanning tree is a set of |V|-1 edges that connect
all the vertices of a graph
Graphs - Definitions
• Minimum Spanning Tree
– Generally there is more than one spanning tree
– If a cost cij is associated with edge eij = (vi,vj)
then the minimum spanning tree is the set of edges Espan such
that
C = Σ ( cij | ∀ eij ∈ Espan )
is a minimum
Other STs can be formed:
• Replace 2 with 7
• Replace 4 with 11
Graphs - Kruskal’s Algorithm
• Calculate the minimum spanning tree
– Put all the vertices into single node trees by themselves
– Put all the edges in a priority queue
– Repeat until we’ve constructed a spanning tree
• Extract cheapest edge
• If it forms a cycle, ignore it
else add it to the forest of trees
(it will join two trees into a larger tree)
– Return the spanning tree
Graphs - Kruskal’s Algorithm
• Calculate the minimum spanning tree
– Put all the vertices into single node trees by themselves
– Put all the edges in a priority queue
– Repeat until we’ve constructed a spanning tree
• Extract cheapest edge
• If it forms a cycle, ignore it
else add it to the forest of trees
(it will join two trees into a larger tree)
– Return the spanning tree
Note that this algorithm makes no attempt to be clever
• to make any sophisticated choice of the next edge
• it just tries the cheapest one!
Graphs - Kruskal’s Algorithm in C
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );               /* initial forest: single vertex trees */
    q = ConsEdgeQueue( g, costs );     /* priority queue of edges */
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );
        } while ( Cycle( e, T ) );     /* discard edges that form a cycle */
        AddEdge( T, e );
    }
    return T;
}
Graphs - Kruskal’s Algorithm in C
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );
    q = ConsEdgeQueue( g, costs );
    for( i = 0; i < (n-1); i++ ) {     /* we need n-1 edges to fully
                                          connect (span) n vertices */
        do {
            e = ExtractCheapestEdge( q );
        } while ( Cycle( e, T ) );
        AddEdge( T, e );
    }
    return T;
}
Graphs - Kruskal’s Algorithm in C
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );
    q = ConsEdgeQueue( g, costs );
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );  /* try the cheapest edge */
        } while ( Cycle( e, T ) );         /* until we find one that
                                              doesn't form a cycle */
        AddEdge( T, e );                   /* ... and add it to the forest */
    }
    return T;
}
Kruskal’s Algorithm
• Priority Queue
– We already know about this!!
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );
    q = ConsEdgeQueue( g, costs );         /* add to a heap here */
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );  /* extract from a heap here */
        } while ( Cycle( e, T ) );
        AddEdge( T, e );
    }
    return T;
}
Kruskal’s Algorithm
• Cycle detection
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );
    q = ConsEdgeQueue( g, costs );
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );
        } while ( Cycle( e, T ) );     /* but how do we detect a cycle? */
        AddEdge( T, e );
    }
    return T;
}
Kruskal’s Algorithm
• Cycle detection
– Uses a Union-find structure
– For which we need to understand a partition of a set
• Partition
– A set of sets of elements of a set
• Every element belongs to one of the sub-sets
• No element belongs to more than one sub-set
– Formally:
• Set, S = { si }
• Partition(S) = { Pi }, where Pi ⊆ S    (the Pi are subsets of S)
• ∀ si ∈ S, si ∈ Pj for some j    (all si belong to one of the Pj)
• ∀ j ≠ k, Pj ∩ Pk = ∅    (no two Pj have common elements)
• ∪ Pj = S
Kruskal’s Algorithm
• Partitions
– In the MST algorithm,
the connected vertices form equivalence classes
• “Being connected” is the equivalence relation
– Initially, each vertex is in a class by itself
– As edges are added,
more vertices become related
and the equivalence classes grow
– Until finally all the vertices are in a single equivalence class
Kruskal’s Algorithm
• Representatives
– One vertex in each class may be chosen as the representative of
that class
– We arrange the vertices in lists that lead to the representative
• This is the union-find structure
• Cycle determination
Kruskal’s Algorithm
• Cycle determination
– If two vertices have the same representative,
they’re already connected and adding a further
connection between them is pointless
– Procedure:
• For each end-point of the edge that you’re going to add
• follow the lists and find its representative
• if the two representatives are equal,
then the edge will form a cycle
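The representative-following procedure above can be sketched as a tiny union-find structure (an illustration; the names parent, uf_find and uf_add_edge are assumptions, not the lecture's code):

```c
#define MAXV 16

/* parent[] holds the lists that lead to each class representative:
   a vertex whose parent is itself is the representative. */
int parent[MAXV];

void uf_init(int n) {                  /* each vertex in a class by itself */
    int i;
    for (i = 0; i < n; i++) parent[i] = i;
}

int uf_find(int v) {                   /* follow the list to the representative */
    while (parent[v] != v) v = parent[v];
    return v;
}

/* Adding edge (u,v): if the two representatives are equal, the edge
   would form a cycle; otherwise merge the two classes. Returns 1 on cycle. */
int uf_add_edge(int u, int v) {
    int ru = uf_find(u), rv = uf_find(v);
    if (ru == rv) return 1;            /* cycle: u and v already connected */
    parent[ru] = rv;                   /* union: ru's class joins rv's */
    return 0;
}
```

A production version would also use union-by-rank and path compression, but this minimal form already gives the cycle test Kruskal's algorithm needs.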
Kruskal’s Algorithm in operation
All the vertices are in
single element trees
Kruskal’s Algorithm in operation
The cheapest edge
is h-g
Kruskal’s Algorithm in operation
The next cheapest edge
is c-i
Add it to the forest,
joining c and i into a
2-element tree
Choose c as its
representative
Kruskal’s Algorithm in operation
The next cheapest edge
is a-b
Add it to the forest,
joining a and b into a
2-element tree
Choose b as its
representative
Kruskal’s Algorithm in operation
The next cheapest edge
is c-f
Add it to the forest,
merging two
2-element trees
Kruskal’s Algorithm in operation
The next cheapest edge
is g-i
The rep of g is c
Kruskal’s Algorithm in operation
The next cheapest edge
is c-d
The rep of c is c
The rep of d is d
Kruskal’s Algorithm in operation
The next cheapest edge
is h-i
The rep of h is c
The rep of i is c
Kruskal’s Algorithm in operation
The next cheapest edge
is a-h
The rep of a is b
The rep of h is c
Kruskal’s Algorithm in operation
The next cheapest edge is b-c
But b-c forms a cycle
Greedy Algorithms
• At no stage did we attempt to “look ahead”
• We simply made the naïve choice
– Choose the cheapest edge!
• MST is an example of a greedy algorithm
• Greedy algorithms
– Take the “best” choice at each step
– Don’t look ahead and try alternatives
– Don’t work in many situations
• Try playing chess with a greedy approach!
– Are often difficult to prove
• because of their naive approach
• what if we made this other (more expensive) choice now and later on ..... ???
Proving Greedy Algorithms
• MST Proof
– “Proof by contradiction” is usually the best approach!
– Note that
• any edge creating a cycle is not needed
• each edge added must join two sub-trees
– Suppose that the next cheapest edge, ex, would join trees Ta and Tb
– Suppose that instead of ex we choose ez - a more expensive edge, which
joins Ta and Tc
– But we still need to join Tb to Ta or some other tree to which Ta is
connected
– The cheapest way to do this is to add ex
– So we should have added ex instead of ez
– This contradiction proves that the greedy approach is correct for MST
MST - Time complexity
• Steps
– Initialise forest O( |V| )
– Sort edges O( |E| log|E| )
– Check each edge for cycles O( |V| )
× number of edges added O( |V| ) → O( |V|² )
– Total O( |V| + |E| log|E| + |V|² )
– Since |E| = O( |V|² ), this is O( |V|² log|V| )
MST - Time complexity
• Steps
– Initialise forest O( |V| )
– Sort edges O( |E| log|E| )
– Check each edge for cycles O( |V| )
× number of edges added O( |V| ) → O( |V|² )
– Total O( |V| + |E| log|E| + |V|² )
– Since |E| = O( |V|² ), this is O( |V|² log|V| )
Here’s the “professionals read textbooks” theme recurring again!
Thank you
Good Luck