DS Ebook
Data Structures
Version 2.0
June 2006
Data Structures
J.E.D.I
Author
Joyce Avestro
Team
Joyce Avestro
Florence Balagtas
Rommel Feria
Reginald Hutcherson
Rebecca Ong
John Paul Petines
Sang Shin
Raghavan Srinivas
Matthew Thompson
Solaris OS (SPARC):
Processor: PowerPC G4
Memory: 512 MB
Disk space: 450 MB of free disk space
Solaris OS (SPARC):
Processor: PowerPC G5
Memory: 1 GB
Disk space: 850 MB of free disk space
Required Software
NetBeans Enterprise Pack 5.5 Early Access runs on the Java 2 Platform
Standard Edition Development Kit 5.0 Update 1 or higher (JDK 5.0, version
1.5.0_01 or higher), which consists of the Java Runtime Environment plus
developer tools for compiling, debugging, and running applications written in
the Java language. Sun Java System Application Server Platform Edition 9
has been tested with JDK 5.0 update 6.
For Solaris, Windows, and Linux, you can download the JDK for
your platform from http://java.sun.com/j2se/1.5.0/download.html
Table of Contents
1 Basic Concepts and Notations............................................................................. 8
1.1 Objectives................................................................................................. 8
1.2 Introduction.............................................................................................. 8
1.3 Problem Solving Process ............................................................................. 8
1.4 Data Type, Abstract Data Type and Data Structure..........................................9
1.5 Algorithm................................................................................................ 10
1.6 Addressing Methods.................................................................................. 10
1.6.1 Computed Addressing Method ............................................................. 10
1.6.2 Link Addressing Method...................................................................... 11
1.6.2.1 Linked Allocation: The Memory Pool............................................... 11
1.6.2.2 Two Basic Procedures................................................................... 12
1.7 Mathematical Functions............................................................................. 13
1.8 Complexity of Algorithms........................................................................... 14
1.8.1 Algorithm Efficiency........................................................................... 14
1.8.2 Operations on the O-Notation.............................................................. 15
1.8.3 Analysis of Algorithms........................................................................ 17
1.9 Summary ............................................................................................... 19
1.10 Lecture Exercises.................................................................................... 19
2 Stacks........................................................................................................... 21
2.1 Objectives............................................................................................... 21
2.2 Introduction............................................................................................. 21
2.3 Operations.............................................................................................. 22
2.4 Sequential Representation......................................................................... 23
2.5 Linked Representation .............................................................................. 24
2.6 Sample Application: Pattern Recognition Problem.......................................... 25
2.7 Advanced Topics on Stacks........................................................................ 30
2.7.1 Multiple Stacks using One-Dimensional Array......................................... 30
2.7.1.1 Three or More Stacks in a Vector S................................................ 30
2.7.1.2 Three Possible States of a Stack.................................................... 31
2.7.2 Reallocating Memory at Stack Overflow................................................. 31
2.7.2.1 Memory Reallocation using Garwick's Algorithm............................... 32
2.8 Summary................................................................................................ 36
2.9 Lecture Exercises...................................................................................... 37
2.10 Programming Exercises........................................................................... 37
3 Queues.......................................................................................................... 39
3.1 Objectives............................................................................................... 39
3.2 Introduction............................................................................................. 39
3.3 Representation of Queues.......................................................................... 39
3.3.1 Sequential Representation................................................................... 40
3.3.2 Linked Representation........................................................................ 41
3.4 Circular Queue......................................................................................... 42
3.5 Application: Topological Sorting.................................................................. 44
3.5.1 The Algorithm................................................................................... 45
3.6 Summary................................................................................................ 47
3.7 Lecture Exercise....................................................................................... 48
3.8 Programming Exercises............................................................................. 48
4 Binary Trees .................................................................................................. 49
4.1 Objectives............................................................................................... 49
4.2 Introduction............................................................................................. 49
1.2 Introduction
In creating a solution during the problem-solving process, there is a need to represent
higher-level data using the basic information and structures available at the machine level.
There is also a need to synthesize algorithms from the basic operations available at the
machine level in order to manipulate these higher-level representations. These two tasks play
an important role in obtaining the desired result: data structures are needed for data
representation, while algorithms are needed to operate on the data and produce correct output.
In this lesson we will discuss the basic concepts behind the problem-solving process,
data types, abstract data types, algorithm and its properties, the addressing methods,
useful mathematical functions and complexity of algorithms.
In the machine domain, the storage medium (bits, bytes, words, etc.) consists of serially
arranged bits that are addressable as a unit. The processing units allow us to perform basic
operations that include arithmetic, comparison and so on.
Solution domain, on the other hand, links the problem and machine domains. It is at
the solution domain where structuring of higher level data structures and synthesis of
algorithms are of concern.
Java provides the following primitive data types:

Data Type    Description
byte         Byte-length integer
short        Short integer
int          Integer
long         Long integer
float        Single-precision floating point
double       Double-precision floating point
char         A single character
boolean      A boolean value (true or false)
Abstract Data Type (ADT), on the other hand, is a mathematical model with a
collection of operations defined on the model. It specifies the type of data stored. It
specifies what its operations do but not how it is done. In Java, ADT can be expressed
with an interface, which contains just a list of methods. For example, the following is an
interface of the ADT stack, which we will cover in detail in Chapter 2:
public interface Stack{
public int size(); /* returns the size of the stack */
public boolean isEmpty(); /* checks if empty */
public Object top() throws StackException;
public Object pop() throws StackException;
public void push(Object item) throws StackException;
}
Data structure is the implementation of ADT in terms of the data types or other data
structures. A data structure is modeled in Java by a class. Classes specify how
operations are performed. In Java, to implement an ADT as a data structure, an interface
is implemented by a class.
Abstraction and representation help us understand the principles behind large software
systems. Information-hiding can be used along with abstraction to partition a large
system into smaller subsystems with simple interfaces that make them easier to
understand and maintain.
1.5 Algorithm
Algorithm is a finite set of instructions which, if followed, will accomplish a task. It has
five important properties: finiteness, definiteness, input, output and effectiveness.
Finiteness means an algorithm must terminate after a finite number of steps.
Definiteness is ensured if every step of an algorithm is precisely defined. For example,
"divide by a number x" is not sufficient. The number x must be define precisely, say a
positive integer. Input is the domain of the algorithm which could be zero or more
quantities. Output is the set of one or more resulting quantities which is also called the
range of the algorithm. Effectiveness is ensured if all the operations in the algorithm are
sufficiently basic that they can, in principle, be done exactly and in finite time by a
person using paper and pen.
Consider the following example, a small class named Minimum.
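A minimal sketch of such a class, consistent with the description that follows (the array
values used here are only illustrative), is:

public class Minimum {
    public static void main(String[] args){
        int a[] = {43, 12, 7, 56, 30, 25};   /* the data is built into the program */
        int min = a[0];
        for (int i = 1; i < a.length; i++)   /* scan the remaining elements */
            if (a[i] < min) min = a[i];
        System.out.println("The minimum is " + min);
    }
}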
The Java code above returns the minimum value from an array of integers. There is no
user input, since the data from which to get the minimum is already in the program; this
covers the input and output properties. Each step in the program is precisely defined.
Hence, it is definite. The declaration, the for loop and the statement to output will all
take a finite time to execute. Thus, the finiteness property is satisfied. And when run, it
returns the minimum among the values in the array so it is said to be effective.
All the properties of an algorithm must be ensured in writing an algorithm.
The following constructor and method of the class Avail obtain a node from the avail list:

Avail(Node n){
    head = n;
}

/* Returns a node from the avail list, or null if the list is empty */
Node getNode(){
    Node a;
    if (head.link == null) {
        return null;            /* avail list is empty */
    }
    else {
        a = head.link;          /* assign the node to return to a */
        head.link = a.link;     /* advance the head pointer to
                                   the next node in the avail list */
        return a;
    }
}
while the following method in the class Avail returns a node to the avail list:
void retNode(Node n){
    n.link = head.link;
    head.link = n;
}
1.7 Mathematical Functions

Floor of x - the greatest integer less than or equal to x, where x is any real number.
Notation: ⌊x⌋
e.g.  ⌊3.14⌋ = 3     ⌊1/2⌋ = 0     ⌊-1/2⌋ = -1

Ceiling of x - the smallest integer greater than or equal to x, where x is any real number.
Notation: ⌈x⌉
e.g.  ⌈3.14⌉ = 4     ⌈1/2⌉ = 1     ⌈-1/2⌉ = 0

Modulo - denoted by x mod y, where x and y are real numbers:
    x mod y = x                 if y = 0
    x mod y = x - y * ⌊x/y⌋     if y ≠ 0
e.g.  10 mod 3 = 1     24 mod 8 = 0     -5 mod 7 = 2
Identities
The following are the identities related to the mathematical functions defined above:

⌈x⌉ = ⌊x⌋        if and only if x is an integer
⌈x⌉ = ⌊x⌋ + 1    if and only if x is not an integer
⌊-x⌋ = -⌈x⌉
⌊x⌋ + ⌊y⌋ <= ⌊x + y⌋
x = ⌊x⌋ + x mod 1
z ( x mod y ) = zx mod zy
O-Notation     Description     Example Algorithm
O(1)           Constant
O(log2 n)      Logarithmic     Binary search
O(n)           Linear          Sequential search
O(n log2 n)                    Heapsort
O(n^2)         Quadratic       Insertion sort
O(n^3)         Cubic           Floyd's algorithm
O(2^n)         Exponential
To make the difference clearer, let's compare the running times when n = 1,000,000 and the
time unit is 1 microsecond:

F(n)          Running Time
log2 n        19.93 microseconds
n             1.00 second
n log2 n      19.93 seconds
n^2           11.57 days
n^3           317.10 centuries
2^n           eternity
For example,

    n(n+1)/2 = n^2/2 + n/2 = O(n^2)
STATEMENT                                  # Times Executed (worst case)
found = false;                             1
loc = 1;                                   1
while ((loc <= n) && (!found))             n + 1
    if (item == a[loc]) found = true;      n
    else loc++;                            n
T(n) = 3n + 3, so that T(n) = O(n).
Since g(n) <= c f(n) for n >= n0, then
    3n + 3 <= c n
    (3n + 3)/n <= c
    3 + 3/n <= c
Thus c = 4 and n0 = 3.
The following are the general rules on determining the running time of an algorithm:
FOR loops
At most the running time of the statement inside the for loop times the
number of iterations.
CONSECUTIVE STATEMENTS
The statement with the maximum running time.
IF/ELSE
Never more than the running time of the test plus the larger of the running
times of the conditional block of statements.
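As a quick illustration of these rules, consider the following method (an example written
for this discussion, not taken from the text), which counts equal pairs in an array:

/* The nested loops dominate: the innermost statement executes about n*(n-1)/2
   times, so T(n) = O(n^2); the surrounding consecutive statements are O(1). */
int countEqualPairs(int a[]){
    int n = a.length;
    int count = 0;
    for (int i = 0; i < n; i++)             /* outer loop: n iterations          */
        for (int j = i + 1; j < n; j++)     /* inner loop: at most n iterations  */
            if (a[i] == a[j]) count++;      /* O(1) work per iteration           */
    return count;
}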
1.9 Summary
1.10 Lecture Exercises
1. Evaluate the following expressions:
   -5.3
   6.14
   8 mod 7
   3 mod 4
   5 mod 2
   10 mod 11
   (15 mod 9) + 4.3
2. What is the time complexity of the algorithm with the following running times?
a) 3n^5 + 2n^3 + 3n + 1
b) n^3/2 + n^2/5 + n + 1
c) n^5 + n^2 + n
d) n^3 + lg n + 34
3. Suppose we have two parts in an algorithm. The first part takes T(n1) = n^3 + n + 1 time to
execute and the second part takes T(n2) = n^5 + n^2 + n. What is the time complexity of
the algorithm if part 1 and part 2 are executed one after the other?
4. Sort the following time complexities in ascending order.
O(n log2 n)
O(n^2)
O(n)
O(log2 n)
O(n^2 log2 n)
O(1)
O(n^3)
O(n^n)
O(2^n)
O(log2 log2 n)
5. What is the execution time and time complexity of the algorithm below?
2 Stacks
2.1 Objectives
At the end of the lesson, the student should be able to:
2.2 Introduction
A stack is a linearly ordered set of elements having the discipline of last-in, first-out,
hence it is also known as a LIFO list. It is similar to a stack of boxes in a warehouse,
where only the top box can be retrieved and there is no access to the other boxes.
Also, adding a box means putting it at the top of the stack.
Stacks are used in pattern recognition, lists and tree traversals, evaluation of
expressions, resolving recursions and a lot more. The two basic operations for data
manipulation are push and pop, which are insertion into and deletion from the top of
stack respectively.
As was mentioned in Chapter 1, an interface (Application Programming Interface or API)
is used to implement an ADT in Java. The following is the Java interface for the stack:
public interface Stack{
public int size(); /* returns the size of the stack */
public boolean isEmpty(); /* checks if empty */
public Object top() throws StackException;
public Object pop() throws StackException;
public void push(Object item) throws StackException;
}
StackException is an extension of RuntimeException:
class StackException extends RuntimeException{
public StackException(String err){
super(err);
}
}
Stacks have two possible implementations: a sequentially allocated one-dimensional
array (vector) or a linked linear list. However, regardless of implementation, the interface
Stack will be used.
2.3 Operations
The following are the operations on a stack:
Getting the size
Checking if empty
Getting the top element without deleting it from the stack
Insertion of new element onto the stack (push)
Deletion of the top element from the stack (pop)
The following Java code implements the ADT stack using sequential representation (the
class name, fields and constructor shown here are representative):

public class ArrayStack implements Stack{
    private Object S[];       /* vector holding the elements */
    private int top = -1;     /* index of the top element    */
    private int capacity;     /* size of the vector S        */

    public ArrayStack(int n){
        S = new Object[n];
        capacity = n;
    }

    /* Implementation of size() */
    public int size(){
        return (top + 1);
    }

    /* Implementation of isEmpty() */
    public boolean isEmpty(){
        return (top < 0);
    }

    /* Implementation of top() */
    public Object top(){
        if (isEmpty()) throw new
            StackException("Stack empty.");
        return S[top];
    }

    /* Implementation of pop() */
    public Object pop(){
        Object item;
        if (isEmpty())
            throw new StackException("Stack underflow.");
        item = S[top];
        S[top--] = null;
        return item;
    }

    /* Implementation of push() */
    public void push(Object item){
        if (size()==capacity)
            throw new StackException("Stack overflow.");
        S[++top]=item;
    }
}
The following Java code implements the ADT stack using linked representation:
public class LinkedStack implements Stack{
private Node top;
/* Implementation of push() */
public void push(Object item){
Node newNode = new Node();
newNode.info = item;
newNode.link = top;
top = newNode;
}
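Only push() is shown above. Sketches of the remaining operations for the linked
representation are given below (they assume, as push() does, that each Node has an Object
field info and a Node field link); size() simply walks the chain so that no extra counter
field is needed:

    /* Implementation of size(): count the nodes in the chain */
    public int size(){
        int count = 0;
        for (Node alpha = top; alpha != null; alpha = alpha.link) count++;
        return count;
    }

    /* Implementation of isEmpty() */
    public boolean isEmpty(){
        return (top == null);
    }

    /* Implementation of top() */
    public Object top(){
        if (isEmpty()) throw new StackException("Stack empty.");
        return top.info;
    }

    /* Implementation of pop() */
    public Object pop(){
        if (isEmpty()) throw new StackException("Stack underflow.");
        Object item = top.info;
        top = top.link;        /* unlink the topmost node */
        return item;
    }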
The following trace shows the recognition of the string abbabcbabba:

Input            Action                       Stack (bottom --> top)
abbabcbabba      Push a                       a
bbabcbabba       Push b                       ab
babcbabba        Push b                       abb
abcbabba         Push a                       abba
bcbabba          Push b                       abbab
cbabba           Discard c                    abbab
babba            Pop b, match with input b    abba
abba             Pop a, match with input a    abb
bba              Pop b, match with input b    ab
ba               Pop b, match with input b    a
a                Pop a, match with input a    (empty)
(empty)          Stack and input both empty   Success
The next trace shows the string abacbab, which is rejected:

Input       Action                                  Stack (bottom --> top)
abacbab     Push a                                  a
bacbab      Push b                                  ab
acbab       Push a                                  aba
cbab        Discard c                               aba
bab         Pop a, compare with input b: mismatch   ab
            The string is not in the language (Failure)
else {
boolean inL = pr.recognize(args[0]);
if (inL) System.out.println(args[0] +
" is in the language.");
else System.out.println(args[0] +
" is not in the language.");
}
Priority    Property             Example
^           right associative    a^b^c = a^(b^c)
* /         left associative     a*b*c = (a*b)*c
+ -         left associative     a+b+c = (a+b)+c
Examples:

Infix Expression            Postfix Expression
a*b+c/d                     ab*cd/+
a^b^c-d                     abc^^d-
a*(b+(c+d)/e)-f             abcd+e/+*f-
a*b/c+f*(g+d)/(fh)^i        ab*c/fgd+*fhi^/+
Symbol      Rank
operand     +1
+ -         -1
* /         -1
^           -1
The following shows the conversion of the infix expression a + ( b * c + d ) - f / g ^ h
into its postfix form:

Symbol   Stack    Output           Remarks
a                 a                Output a
+        +        a                Push +
(        +(       a                Push (
b        +(       ab               Output b
*        +(*      ab               Push *
c        +(*      abc              Output c
+        +(+      abc*             Pop * to output, push +
d        +(+      abc*d            Output d
)        +        abc*d+           Pop + to output, pop (
-        -        abc*d++          Pop + to output, push -
f        -        abc*d++f         Output f
/        -/       abc*d++f         icp(/)>isp(-), push /
g        -/       abc*d++fg        Output g
^        -/^      abc*d++fg        icp(^)>isp(/), push ^
h        -/^      abc*d++fgh       Output h
(end)             abc*d++fgh^/-    Pop remaining ^, /, - to output
B[i] = ⌊n/m⌋ * i - 1 ,    0 ≤ i < m
B[m] = n - 1

B[i] points to the space one cell below the stack's first actual cell. To initialize the
stacks, the tops are set to point to the base addresses, i.e.,

T[i] = B[i] ,    0 ≤ i < m
For example:
Figure 1.12 Three States of Stack (empty, non-empty but not full, full)
The following Java code snippets show the implementation of the operations push and
pop for multiple stacks:
/* Pushes element on top of stack i */
public void push(int i, Object item) {
if (T[i]==B[i+1]) MStackFull(i);
S[++T[i]]=item;
}
/* Pops the top of stack i */
public Object pop(int i) throws StackException{
Object item;
if (isEmpty(i))
throw new StackException("Stack underflow.");
item = S[T[i]];
S[T[i]--] = null;
return item;
}
The method MStackFull handles the overflow condition.
Garwick's algorithm computes the reallocation using the following quantities:

    m     = the number of stacks
    alpha = the number of cells that each stack gets from the 10% of available space allotted
    beta  = the number of cells that a stack gets, per unit increase in stack usage, from the
            remaining 90% of free space

3. Compute the new base addresses (sigma starts at 0):
       for j = 1 to m-1:
           tau  = sigma + alpha + diff[j-1]*beta
           B[j] = B[j-1] + size[j-1] + floor(tau) - floor(sigma)
           sigma = tau
4. Shift stacks to their new boundaries
5. Set oldT = T
Consider the following example. Five stacks coexist in a vector of size 500. The state of
the stacks are shown in the figure below:
Factor          Value
stack sizes
differences
freecells
incr            0 + 15 + 1 + 16 + 0 + 8 = 40
tau  = 3.62
B[1] = B[0] + size[0] + floor(tau) - floor(sigma) = -1 + 80 + 3 - 0 = 82    [OK]
/* Reallocates memory using Garwick's algorithm when stack i overflows */
void MStackFull(int i){
    int totalSize = 0;
double freecells, incr = 0;
double alpha, beta, sigma=0, tau=0;
/* Compute for the allocation factors */
for (int j=0; j<m; j++){
size[j] = T[j]-B[j];
if ( (T[j]-oldT[j]) > 0 )
diff[j] = T[j]-oldT[j];
else diff[j] = 0;
totalSize += size[j];
incr += diff[j];
}
diff[i]++;
size[i]++;
totalSize++;
incr++;
freecells = n - totalSize;
alpha = 0.10 * freecells / m;
beta = 0.90 * freecells / incr;
/* If every stack is full */
if (freecells < 1)
throw new StackException("Stack overflow.");
/* Compute for the new bases */
for (int j=1; j<m; j++){
tau = sigma + alpha + diff[j-1] * beta;
B[j] = B[j-1] + size[j-1] + (int) Math.floor(tau)
- (int) Math.floor(sigma);
sigma = tau;
}
/* Restore size of the overflowed stack to its old value */
size[i]--;
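    /* A sketch of the remaining steps (an assumption, not the original listing):
       shift each stack to its new boundary and remember the new tops (step 5).
       Copying through a fresh vector avoids overlapping moves; the old base of
       stack j can be recovered as T[j] - size[j]. */
    Object newS[] = new Object[S.length];
    for (int j = 0; j < m; j++){
        int oldBase = T[j] - size[j];
        for (int k = 1; k <= size[j]; k++)
            newS[B[j] + k] = S[oldBase + k];
        T[j] = B[j] + size[j];           /* new top of stack j */
        oldT[j] = T[j];                  /* remember the tops for the next reallocation */
    }
    S = newS;
}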
2.8 Summary
A stack is a linearly ordered set of elements obeying the last-in, first-out (LIFO)
principle
Two basic stack operations are push and pop
Stacks have two possible implementations: a sequentially allocated one-dimensional array (vector) or a linked linear list
Stacks are used in various applications such as pattern recognition, lists and tree
traversals, and evaluation of expressions
Having two or more stacks coexist in a common vector results in better memory
utilization
Memory reallocation techniques include the unit-shift method and Garwick's
algorithm
4. Implement the basic operations on stack (push, pop, etc.) to make them applicable to
multiple stacks. Name the class MStack.
5. A book shop has bookshelves with adjustable dividers. When one compartment becomes full,
the dividers can be adjusted to make space. Create a Java program that will
reallocate bookshelf space using Garwick's algorithm.
3 Queues
3.1 Objectives
At the end of the lesson, the student should be able to:
3.2 Introduction
A queue is a linearly ordered set of elements that has the discipline of First-In, First-Out.
Hence, it is also known as a FIFO list.
There are two basic operations in queues: (1) insertion at the rear, and (2) deletion at
the front.
To define the ADT queue in Java, we have the following interface:
interface Queue{
/* Insert an item */
void enqueue(Object item) throws QueueException;
/* Delete an item */
Object dequeue() throws QueueException;
}
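A QueueException defined along the same lines as the StackException of the previous
chapter can be sketched as:

class QueueException extends RuntimeException{
    public QueueException(String err){
        super(err);
    }
}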
Q = new Object[n];
Whenever deletion is made, there is space vacated at the front-side of the queue.
Hence, there is a need to move the items to make room at the rear-side for future
insertion. The method moveQueue implements this procedure. This could be invoked
when
void moveQueue() throws QueueException{
if (front==0) throw new
QueueException("Inserting into a full queue");
A relation ≼ on a set is a partial ordering if it satisfies:

Transitivity  : if x ≼ y and y ≼ z, then x ≼ z.
Antisymmetry  : if x ≼ y and y ≼ x, then x = y.
Reflexivity   : x ≼ x.

For the corresponding strict relation ≺ (read "precedes"), the properties are:

Transitivity  : if x ≺ y and y ≺ z, then x ≺ z.
Asymmetry     : if x ≺ y, then y ⊀ x.
Irreflexivity : x ⊀ x.
(0,1)  (0,3)  (0,5)  (1,2)  (1,5)  (2,4)  (3,2)  (3,4)  (5,4)  (6,5)  (6,7)  (7,1)  (7,5)

Resulting linear order: 0 6 3 7 1 2 5 4
Input. A set of number pairs of the form (i, j), one for each relation i ≺ j, could represent
the partial ordering of elements. The input pairs could be in any order.
Output. The algorithm will come up with a linear sequence of items such that no item
appears in the sequence before its direct predecessor.
Algorithm Proper. A requirement in topological sorting is not to output the items with
which the predecessors are not yet in the output. To do this, there is a need to keep
track of the number of predecessors for every item. A vector could be used for this
purpose. Let's call this vector COUNT. When an item is placed in the output, the
count of every successor of the item is decremented. If the count of an item is zero, or
it becomes zero as a result of putting in the output all of its predecessors, that would
be the time it is considered ready for output. To keep track of the successors, a linked
list named SUC, with structure (INFO, LINK), will be used. INFO contains the label of
the direct successor while LINK points to the next successor, if any.
The following shows the definition of the Node:
class Node{
int info;
Node link;
}
The COUNT vector is initially set to 0 and the SUC vector to NULL. For every input pair
(i, j),
COUNT[j]++;
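The corresponding update of the list of direct successors of i can be sketched as follows
(assuming SUC is an array of Node references and the new successor is simply inserted at
the head of the list):

Node p = new Node();    /* node for successor j */
p.info = j;
p.link = SUC[i];        /* insert at the head of i's successor list */
SUC[i] = p;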
To generate the output, which is a linear ordering of the objects such that no object
appears in the sequence before its direct predecessors, we proceed as follows:
1. Look for an item, say k, with count of direct predecessors equal to zero, i.e.,
COUNT[k] == 0. Put k in the output.
2. Scan list of direct successors of k, and decrement the count of each such
successor by 1.
3. Repeat steps 1 and 2 until all items are in the output.
To avoid having to go through the COUNT vector repeatedly as we look for objects with a
count of zero, we will constitute all such objects into a linked queue. Initially, the
queue will consist of items with no direct predecessors (there will always be at least one
such item). Subsequently, each time that the count of direct predecessors of an item
drops to zero, it is inserted into the queue, ready for output. Since for each item, say j,
in the queue, the count is zero, we can now reuse COUNT[j] as a link field such that
COUNT[j] = k if k is the next item in the queue
= 0 if j is the rear element in the queue
Hence, we have an embedded linked queue in a sequential vector.
If the input to the algorithm is correct, i.e., if the input relations satisfy partial ordering,
then the algorithm terminates when the queue is empty with all n objects placed in the
output. If, on the other hand, partial ordering is violated such that there are objects
which constitute one or more loops (for instance, 1≺2; 2≺3; 3≺4; 4≺1), then the
algorithm still terminates, but objects comprising a loop will not be placed in the output.
This approach of topological sorting uses both sequential and linked allocation
techniques, and the use of a linked queue that is embedded in a sequential vector.
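A Java sketch of the output phase described above is given below. It assumes the items are
labeled 1 to n (so that 0 can serve as the end-of-queue marker) and that COUNT and SUC have
already been built from the input pairs; the method name and parameters are illustrative.

/* Outputs a topological ordering; items whose predecessors form a loop are never output */
void topologicalOrder(int n, int COUNT[], Node SUC[]){
    int front = 0, rear = 0;
    /* constitute the initial queue of items with no direct predecessors */
    for (int k = 1; k <= n; k++)
        if (COUNT[k] == 0){
            if (front == 0) front = k;
            else COUNT[rear] = k;          /* reuse COUNT as the link field */
            rear = k;
        }
    /* output the front item and decrement the counts of its successors */
    while (front != 0){
        System.out.print(front + " ");
        for (Node p = SUC[front]; p != null; p = p.link)
            if (--COUNT[p.info] == 0){     /* successor becomes ready for output */
                COUNT[rear] = p.info;
                rear = p.info;
            }
        front = COUNT[front];              /* advance to the next item in the queue */
    }
}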
3.6 Summary
A queue is a linearly ordered set of elements obeying the first-in, first-out (FIFO)
principle
Two basic queue operations are insertion at the rear and deletion at the front
In circular queues, there is no need to move the elements to make room for
insertion
The topological sorting approach discussed uses both sequential and linked
allocation techniques, as well as a linked queue embedded in a sequential vector
4 Binary Trees
4.1 Objectives
At the end of the lesson, the student should be able to:
Traverse binary trees using the three traversal algorithms: preorder, inorder,
postorder
Discuss binary tree traversal applications
4.2 Introduction
A binary tree is an abstract data type that is hierarchical in structure. It is a collection of
nodes which is either empty or consists of a root and two disjoint binary trees called
the left and the right subtrees. It is similar to a tree in the sense that there is the
concept of a root, leaves and branches. However, they differ in orientation since the root
of a binary tree is the topmost element, in contrast to the bottommost element in a real
tree.
Binary tree is most commonly used in searching, sorting, efficient encoding of strings,
priority queues, decision tables and symbol tables.
A binary tree could be empty. If every node in a binary tree has either zero or two children,
it is classified as a proper binary tree. Thus, every internal node of a proper binary tree
has two children.
The following figure shows the different types of binary trees: (a) shows an
empty binary tree; (b) shows a binary tree with only one node, the root; (c) and (d)
show trees with no right and left sons respectively; (e) shows a left-skewed binary tree
while (f) shows a complete binary tree.
A right (left) skewed binary tree is a tree in which every node has no left
(right) subtree. For a given number of nodes, a left- or right-skewed binary tree has the
greatest depth.
A strictly binary tree is a tree in which every node has either two subtrees or none at
all.
Figure 1.31 Left and Right-Skewed Binary Trees
The following Java class definition implements the above representation:
class BTNode {
    Object info;
    BTNode left, right;

    public BTNode(){
    }

    public BTNode(Object i) {
        info = i;
    }

    public BTNode(Object i, BTNode l, BTNode r) {
        info = i;
        left = l;
        right = r;
    }
}
class BinaryTree {
    BTNode root;          /* root node of this binary tree */

    public BinaryTree(){
    }

    public BinaryTree(BTNode n){
        root = n;
    }
}
/* Returns a copy of this binary tree */
BTNode copy(){
    BTNode newLeft, newRight, newRoot;
    if (root != null){
        newLeft = new BinaryTree(root.left).copy();
        newRight = new BinaryTree(root.right).copy();
        newRoot = new BTNode(root.info, newLeft, newRight);
        return newRoot;
    }
    return null;
}
/* Returns true if this binary tree and t2 are equivalent
   (same structure and same contents) */
boolean equivalent(BinaryTree t2){
    boolean answer = false;
    if ((root == null) && (t2.root == null)) answer = true;
    else if ((root != null) && (t2.root != null)){
        answer = (root.info.equals(t2.root.info));
        if (answer) answer =
            new BinaryTree(root.left).equivalent(
                new BinaryTree(t2.root.left));
        if (answer) answer =
            new BinaryTree(root.right).equivalent(
                new BinaryTree(t2.root.right));
    }
    return answer;
}
Trichotomy: for any two objects x and y in S, exactly one of these relations
holds: x > y, x = y or x < y.
4.7.1 Sift-Up
A complete binary tree may be converted into a heap by applying a process called sift-up.
In the process, larger keys sift up the tree to make it satisfy the heap-order property.
This is a bottom-up, right-to-left process in which the smallest subtrees of a
complete binary tree are converted into heaps, then the subtrees which contain them,
and so on, until the entire binary tree is converted into a heap.
Note that when the subtree rooted at any node is converted into a heap, its left
and right subtrees are already heaps. We call such a subtree an almost-heap. When an
almost-heap is converted into a heap, one of its subtrees may cease to be a heap (i.e., it
may become an almost-heap). This, however, may be converted into a heap, and the
process continues as smaller and yet smaller subtrees lose and regain the heap property,
with larger keys migrating upwards.
i = child ;
child = 2 * i; /*Consider the left child again*/
}
else break;
}
key[i] = k ; /* this is where the root belongs */
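The lines above are only the tail of the sifting procedure. A complete sketch consistent
with them, assuming the keys are stored in key[1..n] (index 0 unused) and that i is the root
of the subtree being fixed, is:

/* Sifts the key at position i down to its proper place in key[1..n], assuming
   the subtrees below node i already satisfy the heap-order property. */
void sift(int i, int n){
    int k = key[i];                   /* the key being sifted */
    int child = 2 * i;                /* start with the left child */
    while (child <= n){
        /* pick the larger of the two children */
        if (child < n && key[child + 1] > key[child]) child++;
        if (key[child] > k){
            key[i] = key[child];      /* move the larger child up */
            i = child;
            child = 2 * i;            /* consider the left child again */
        }
        else break;
    }
    key[i] = k;                       /* this is where the root belongs */
}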
The following example shows the execution of heapSort with input keys
4.8 Summary
2. Heapsort. Arrange the following elements in ascending order. Show the state of the
tree at every step.
a) C G A H F E D J B I
b) 1 6 3 4 9 7 5 8 2 12 10 14
Operation    Description
^            Exponentiation (highest precedence)
* /          Multiplication and division
+ -          Addition and subtraction (lowest precedence)
5 Trees
5.1 Objectives
At the end of the lesson, the student should be able to:
Convert a forest into its binary tree representation and vice versa using
natural correspondence
5.4 Forests
When zero or more disjoint trees are taken together, they are known as a forest. The
following forest is an example:
If n > 0, then the root of B(F) is the root of T 1; the left subtree of B(F) is
B( T11, T12, ... T1m ) where T11, T12, ... T1m are the subtrees of the root of
Preorder Traversal
1. Visit the root of the first tree.
2. Traverse the subtrees of the first tree in preorder.
3. Traverse the remaining trees in preorder.
Postorder Traversal
1. Traverse the subtrees of the first tree in postorder.
2. Visit the root of the first tree.
3. Traverse the remaining trees in postorder.
Traversing the forest gives the following listings:

Forest preorder  : A B C D E F G H I K J L M N
Forest postorder : C D E F B A H K I L J G N M

The binary tree equivalent of the forest will result in the following listings for preorder,
inorder and postorder:

B(F) preorder  : A B C D E F G H I K J L M N
B(F) inorder   : C D E F B A H K I L J G N M
B(F) postorder : F E D C B K L J I H N M G A

Notice that forest postorder yields the same result as B(F) inorder. It is not a
coincidence.
RLINK contains the node pointed to by the current node and LTAG has a value of 1 for
every ')' in the representation.
Since a terminal node always immediately precedes a node pointed to by an arrow
except the last node in the sequence, the use of data can be lessened by
(1)Eliminating LTAG; or
(2)Replacing RLINK with RTAG that simply identifies the nodes where an arrow emanates
Using the second option will need a stack to establish the relation between nodes since
the arrows have the last in, first out structure, and using it leads to the following
representation:
Having bit values for RTAG and LTAG, the last option clearly shows the least space
required for storage. However, it entails more computations in retrieving the forest.
5.4.3.2 Family-Order Sequential Representation
In this sequential representation, the family-order listing of elements is used in the
representation. In family-order traversal, the first family to be listed consists of the root
nodes of all trees in the forest and subsequently, the families are listed on a last-in
first-out basis. This representation makes use of LLINK and RTAG. LLINK is a pointer to
the leftmost son of a node or the left son in the tree's binary representation. RTAG
identifies the youngest brother in a brood or the last member of the family. The following
is the family-order sequential representation of the forest F:
Just like in preorder sequential representation, since an RTAG value always immediately
precedes an arrow except for the last node in the sequence, an alternative structure is to
replace LLINK with LTAG, which is set if an arrow emanates from it:
Notice that unlike preorder and family-order, arrows cross in this representation.
However, it can also be noticed that the first arrow to begin is also the first arrow to end.
Having the FIFO (first-in, first-out) structure, queue could be used to establish a relation
between nodes. Therefore, just like the previous methods, it could be represented as:
The following are the different ways to represent the tree above.
5.5.1.1 Preorder Sequence with Degrees
INFO   :  1  2  5 11  6 12 13  7  3  8  4  9 14 15 16 10
DEGREE :  3  3  1  0  2  0  0  0  1  0  2  1  2  0  0  0
The same tree can also be represented using weights instead of degrees (the weight of a node
being the number of nodes in its subtrees, i.e. its descendants), and using a postorder or
level-order listing instead of preorder:

Preorder sequence with weights
INFO   :  1  2  5 11  6 12 13  7  3  8  4  9 14 15 16 10
WEIGHT : 15  6  1  0  2  0  0  0  1  0  5  3  2  0  0  0

Postorder sequence with degrees
INFO   : 11  5 12 13  6  7  2  8  3 15 16 14  9 10  4  1
DEGREE :  0  1  0  0  2  0  3  0  1  0  0  2  1  0  2  3

Postorder sequence with weights
INFO   : 11  5 12 13  6  7  2  8  3 15 16 14  9 10  4  1
WEIGHT :  0  1  0  0  2  0  6  0  1  0  0  2  3  0  5 15

Level-order sequence with degrees
INFO   :  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
DEGREE :  3  3  1  2  1  2  0  0  1  0  0  0  0  2  0  0

Level-order sequence with weights
INFO   :  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
WEIGHT : 15  6  1  5  1  2  0  0  3  0  0  0  0  2  0  0
For example, consider the set S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}. Suppose the
equivalence relations defined on the set are: 1 ≡ 10, 9 ≡ 12, 9 ≡ 3, 6 ≡ 10, 10 ≡ 12,
2 ≡ 5, 7 ≡ 8, 4 ≡ 11, 6 ≡ 13 and 1 ≡ 9. Now the questions: is 10 ≡ 2? Is 6 ≡ 12? To answer
these questions, we need to create the equivalence classes.
Input      Equivalence Classes                Remarks
1 ≡ 10     C1 = {1, 10}                       Create C1
9 ≡ 12     C2 = {9, 12}                       Create C2
9 ≡ 3      C2 = {9, 12, 3}                    Add 3 to C2
6 ≡ 10     C1 = {1, 10, 6}                    Add 6 to C1
10 ≡ 12    C2 = {9, 12, 3, 1, 10, 6}          Merge C1 into C2
2 ≡ 5      C3 = {2, 5}                        Create C3
7 ≡ 8      C4 = {7, 8}                        Create C4
4 ≡ 11     C5 = {4, 11}                       Create C5
6 ≡ 13     C2 = {9, 12, 3, 1, 10, 6, 13}      Add 13 to C2
1 ≡ 9      No change                          1 and 9 already in C2

The resulting equivalence classes are:
    C2 = {1, 3, 6, 9, 10, 12, 13}
    C3 = {2, 5}
    C4 = {7, 8}
    C5 = {4, 11}
/* Accepts two elements j and k.
Returns true if equivalent, otherwise returns false */
boolean test(int j, int k){
/* Get the roots of j and k */
while (FATHER[j] > 0) j = FATHER[j];
while (FATHER[k] > 0) k = FATHER[k];
/* If they have the same root, they are equivalent */
if (j == k) return true;
    else return false;
}
In terms of equivalence classes, the two trees represent the same class. However the
second tree requires traversing only one branch from every node to reach the root while
it takes n-1 branches for the first tree.
To solve this problem, we will use a technique known as the weighting rule for union.
It is defined as follows:
Let node i and node j be roots. If the number of nodes in the tree rooted at node
i is greater than the number of nodes in the tree rooted at node j, then make
node i the father of node j; else, make node j the father of node i.
In the algorithm, a COUNT vector could be used to count the number of nodes in each
tree in the forest. However, if a node is the root of an equivalent class, its entry in the
FATHER vector does not matter anymore since a root node has no father. Taking
advantage of this, we may use that slot in the FATHER vector instead of using another
vector. To distinguish between counts and labels in FATHER, a minus sign is appended to
counts. The following method implements the weighting rule for union:
/* Implements the weighting rule for union */
void union(int i, int j){
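    /* A sketch of the body (an assumption, not the original listing). The node
       count of each tree is kept as a negative value in the FATHER entry of its
       root, as described above. */
    int total = FATHER[i] + FATHER[j];     /* combined negative count */
    if (FATHER[i] > FATHER[j]){            /* tree j has more nodes */
        FATHER[i] = j;                     /* make j the father of i */
        FATHER[j] = total;
    }
    else {                                 /* tree i has at least as many nodes */
        FATHER[j] = i;                     /* make i the father of j */
        FATHER[i] = total;
    }
}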
The UNION operation has a time complexity of O(1). If the weighting rule for union is not
applied, a sequence of O(n) union-find operations takes, in the worst case, O(n^2) time.
If it is applied, the time complexity is O(n log2 n).
Worst-Case Trees
Another observation in using trees for the equivalence problem is related to the worst-case
trees generated even when the weighting rule for union is applied. Consider the
following illustration:
/* Find root */
while (FATHER[k] > 0) k = FATHER[k];
The FIND operation is proportional to the length of the path from node i to its root.
Final Solution to the Equivalence Problem
The following code implements the final solution to the equivalence problem:
/* Generates equivalent classes based
on the equivalence pairs j,k */
void setEquivalence(int a[], int b[]){
int j, k;
for (int i=0; i<FATHER.length; i++) FATHER[i] = -1;
for (int i=0; i<a.length; i++){
/* Get the equivalence pair j,k */
j = a[i];
k = b[i];
/* Get the roots of j and k */
j = find(j);
        k = find(k);
        /* Merge the two trees if they have different roots */
        if (j != k) union(j, k);
    }
}
The following is the state of the equivalence classes after this final solution is applied to the earlier example:
Figure 1.76 An Example Using Weighted Rule for Union and Collapsing Rule for Find
5.6 Summary
2. Show the forest represented below using preorder sequential with weights:
a
3. Equivalence Classes.
a) Given the set S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and the equivalence pairs (1,4), (5,8),
(1,9), (5,6), (4,10), (6,9), (3,7) and (3,10), construct the equivalence classes using a
forest. Is 7 ≡ 6? Is 9 ≡ 10?
b) Draw the corresponding forest and the resulting FATHER vector of the equivalence classes
generated from the elements of S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} on the basis of
the equivalence relations 1 ≡ 2, 3 ≡ 5, 5 ≡ 7, 9 ≡ 10, 11 ≡ 12, 2 ≡ 5, 8 ≡ 4, and 4 ≡ 6. Use
the weighting rule for union.
6 Graphs
6.1 Objectives
At the end of the lesson, the student should be able to:
Get the minimum cost spanning tree for undirected graphs using Prim's
algorithm and Kruskal's algorithms
6.2 Introduction
This lesson covers the nomenclature for the ADT graph. It also discusses different ways
to represent a graph. The two graph traversal algorithms are also covered, as well as the
minimum cost spanning tree problem and shortest path problems.
A complete directed graph is a graph in which every pair of vertices i and j are
connected by two edges <i,j> and <j,i>. There are n(n-1) edges in it.
Figure 1.82 Complete Directed Graph
this lesson, we will cover algorithms that use the greedy approach. In this approach, a
sequence of opportunistic choices succeeds in finding a global optimum. To solve the
MST problem, we shall use Prim's and Kruskal's algorithms, which are both greedy
algorithms.
6.6.1.1 MST Theorem
Let G = (V,E) be a connected, weighted, undirected graph. Let U be some proper subset
of V and (u, v) be an edge of least cost such that u U and v (V U). There exists a
minimum cost spanning tree T such that (u, v) is an edge in T.
Having a as the start vertex, the following shows the execution of Prim's algorithm to
solve the MST problem:
The following shows the execution of Kruskal's algorithm in solving the MST problem of
the graph above:
The edges in non-decreasing order of weight are: (c,e) 1, (c,d) 4, (a,e) 5, (a,c) 6,
(d,e) 8, (a,d) 10, (a,b) 11, (d,f) 12, (b,c) 13, (e,f) 20.

Edge examined   Action   MST so far                      U                 V - U         Remarks
(c,e) 1         accept   (c,e)                           c, e              a, b, d, f    c and e not in U
(c,d) 4         accept   (c,e) (c,d)                     c, d, e           a, b, f       d not in U
(a,e) 5         accept   (c,e) (c,d) (a,e)               a, c, d, e        b, f          a not in U
(a,c) 6         reject   (c,e) (c,d) (a,e)               a, c, d, e        b, f          a and c are in U
(d,e) 8         reject   (c,e) (c,d) (a,e)               a, c, d, e        b, f          d and e are in U
(a,d) 10        reject   (c,e) (c,d) (a,e)               a, c, d, e        b, f          a and d are in U
(a,b) 11        accept   (c,e) (c,d) (a,e) (a,b)         a, b, c, d, e     f             b not in U
(d,f) 12        accept   (c,e) (c,d) (a,e) (a,b) (d,f)   a, b, c, d, e, f                f not in U

All vertices are now in U, so the algorithm stops. COST = 1 + 4 + 5 + 11 + 12 = 33.
Single Source Shortest Paths (SSSP) Problem - determines the cost of the shortest path
from a source vertex u to a destination vertex v, where u and v are elements of V.
All-Pairs Shortest Paths (APSP) Problem - determines the cost of the shortest path from
each vertex to every other vertex in V.
We will discuss the algorithm created by Dijkstra to solve the SSSP problem, and for the
APSP, we will use the algorithm developed by Floyd.
Let vertex u be the source vertex and vertex v be the destination vertex. Let pivot be
the vertex that is most recently considered to be a part of the path. Let path of a vertex
be its direct source in the shortest path. Now the algorithm:
1. Place vertex u in class 1 and all other vertices in class 2.
2. Set the value of vertex u to zero and the value of all other vertices to
infinity.
3. Do the following until vertex v is placed in class 1:
a. Define the pivot vertex as the vertex most recently placed in
class 1.
b. Adjust all class 2 nodes in the following way:
i. If a vertex is not connected to the pivot vertex, its value
remains the same.
ii. If a vertex is connected to the pivot vertex, replace its value by
the minimum of its current value or the value of the pivot
vertex plus the distance from the pivot vertex to the vertex in
class 2. Set its path to pivot.
c. Choose a class 2 vertex with minimal value and place it in class 1.
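A compact Java sketch of this procedure follows. It assumes the vertices are numbered 0 to
n-1, the graph is given as an adjacency matrix W in which a missing edge is represented by a
large constant INF, and cls, value and path correspond to the class, value and path columns
of the trace below; none of these names are taken from the text.

static final int INF = Integer.MAX_VALUE / 2;    /* stands for infinity */

/* Returns the path[] vector, from which the shortest path can be read backwards from v */
int[] dijkstra(int W[][], int u, int v){
    int n = W.length;
    int cls[] = new int[n], value[] = new int[n], path[] = new int[n];
    for (int i = 0; i < n; i++){ cls[i] = 2; value[i] = INF; }
    cls[u] = 1; value[u] = 0;                    /* steps 1 and 2 */
    int pivot = u;
    while (cls[v] != 1){                         /* step 3 */
        for (int i = 0; i < n; i++)              /* step 3b: adjust class 2 vertices */
            if (cls[i] == 2 && W[pivot][i] < INF
                    && value[pivot] + W[pivot][i] < value[i]){
                value[i] = value[pivot] + W[pivot][i];
                path[i] = pivot;
            }
        int best = -1;                           /* step 3c: minimal class 2 vertex */
        for (int i = 0; i < n; i++)
            if (cls[i] == 2 && (best == -1 || value[i] < value[best])) best = i;
        cls[best] = 1;                           /* move it to class 1 */
        pivot = best;
    }
    return path;
}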
For example, given the following weighted, directed graph, find the shortest path from
vertex 1 to vertex 7.
Tracing the algorithm (each iteration records the current pivot and, for every vertex, its
class, value and path) ends with all seven vertices in class 1:

Vertex : 1  2  3  4  5  6  7
class  : 1  1  1  1  1  1  1
value  : 0  4  3  7  5  7  8
path   : 0  1  1  3  2  5  6
The path from the source vertex 1 to destination vertex 7 can be obtained by retrieving
the value of path(7) in reverse order, that is,
path(7) = 6
path(6) = 5
path(5) = 2
path(2) = 1
Hence, the shortest path is 1 --> 2 --> 5 --> 6 --> 7, and the cost is value(7) = 8.
PATH(i, j) = 0    initially; this indicates that the shortest path between i and j, if it
                  exists, is the edge (i, j) itself
PATH(i, j) = k    if passing through vertex k gives a shorter path from i to j
Also, for the kth iteration, there will be no changes for the kth row and column in A and
PATH since it will add only 0 to the current value. For example, if k = 2:
A(2, 1) = minimum( A(2, 1), A(2,2) + A(2,1) )
since A(2, 2) = 0, there will always be no change to kth row and column.
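A Java sketch of the complete procedure (the array names, 0-based indexing and the use of a
large constant for infinity are assumptions, not taken from the text):

/* Floyd's algorithm: A[i][j] starts as the edge weight (0 on the diagonal, a large
   value INF when there is no edge, chosen so that INF + INF does not overflow), and
   PATH[i][j] starts at 0. Afterwards A holds the minimum costs and PATH the
   intermediate vertex, if any, through which the shortest path passes. */
void floyd(int A[][], int PATH[][]){
    int n = A.length;
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (A[i][k] + A[k][j] < A[i][j]){
                    A[i][j] = A[i][k] + A[k][j];
                    PATH[i][j] = k;
                }
}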
The following shows the execution of Floyd's algorithm:
For each k = 1, 2, 3 and 4 in turn, every entry A(i, j) with i ≠ k and j ≠ k is recomputed as

    A(i, j) = minimum( A(i, j), A(i, k) + A(k, j) )

and, whenever the second argument is the smaller one, PATH(i, j) is set to k. (For k = 1,
for example, the entries A(2,3), A(2,4), A(3,2), A(3,4), A(4,2) and A(4,3) are recomputed in
this way; the kth row and column, and the diagonal, never change.)
After the nth iteration, A contains the minimum cost while PATH contains the path of the
minimum cost. To illustrate how to use the resulting matrices, let us find the shortest
path from vertex 1 to vertex 4:
A(1, 4) = 9
PATH(1, 4) = 2 --> Since not 0, we have to get PATH(2, 4):
PATH(2, 4) = 0
Therefore, the shortest path from vertex 1 to vertex 4 is 1 --> 2 --> 4 with cost 9. Even
if there is a direct edge from 1 to 4 (with cost 12), the algorithm returned another path.
This example shows that it is not always the direct connection that is returned in getting the shortest path.
6.8 Summary
b)
2. Find the minimum cost spanning tree of the following graphs using Kruskal's algorithm:
b)
3. Solve for the SSSP problem of the following graphs using Dijkstra's algorithm. Show
the value, class and path of the vertices for each iteration:
a)
b)
Show the output on the screen. The output consists of the path and its cost. The path
must be of the form:
source --> location 2 --> ... --> location n-1 --> destination
Sample Input File
(1,
(2,
(3,
.
.
(0,
(1,
(1,
(3,
.
.
(0,
Math Building)
Science Building)
Engineering)
0)
2, 10)
3, 5)
2, 2)
0, 0)
7 Lists
7.1 Objectives
At the end of the lesson, the student should be able to:
Differentiate singly-linked list, doubly-linked list, circular list and list with
header nodes
7.2 Introduction
List is a data structure that is based on sequences of items. In this lesson, we will cover
the two types of lists - linear and generalized lists - and their different representations.
Singly-linked, circular and doubly-linked lists will also be explored. Moreover, two
applications shall be covered: polynomial arithmetic and dynamic memory allocation.
Polynomial arithmetic includes representation of polynomials and arithmetic operations
defined on them. Dynamic memory allocation covers pointers and allocation strategies.
There will also be a short discussion on the concepts of fragmentation.
position. Similarly, any element may be deleted from any position. The following are the
operations that can be done on linear lists:
Duplicating a list
Erasing a list
The time complexities of the basic operations differ between the two types of allocation:
with sequential representation, accessing the kth element takes O(1) time while insertion
and deletion (which require moving elements) take O(n); with linked representation,
accessing the kth element takes O(n) time while insertion and deletion at a known position
take O(1).
Sequential representation is appropriate for lists that are static in nature. If the size is
unknown beforehand, the use of link allocation is recommended.
In addition to the singly-linked linear list, there are other varieties of linked
representation of lists. Singly-linked circular lists, doubly-linked lists and lists with
header nodes are the most common of these varieties.
Take note that the pointer to the list in this representation points at the last element in
the list. With a circular list, there is the advantage of being able to access a node from
any other node.
Circular list could be used to implement a stack. In such a case, insertion (push) could
be done at the left end of the list, and deletion (pop) at the same end. Similarly, queue
could also be implemented by allowing insertion at the right end of the list and deletion
at the left end. The following code snippets show the three procedures:
/* Inserts element x at the left end of circular list L */
void insertLeft(Object x){
Node alpha = new Node(x);
if (L == null){
alpha.link = alpha;
L = alpha;
}
else{
alpha.link = L.link;
L.link = alpha;
}
}
In insertLeft, if the list is initially empty, the insertion will result in the following circular list:
A way to represent the terms of a polynomial such that the entities comprising each
term can be accessed and processed easily.
To address these issues, singly-linked circular list with list head could be used, with a
node structure as illustrated below:
In this application, there is a rule that nodes must be arranged in decreasing value of the
triple (ex, ey, ez). A polynomial satisfying this property is said to be in canonical form.
This rule makes the polynomial arithmetic operations faster to execute than if the terms
were arranged in no particular order.
Since the list structure has a list head, to represent the zero polynomial, we have the
following:
/* Creates a term with the given exponent triple and coefficient */
PolyTerm(int expo, int coef){
    this.expo = expo;
    this.coef = coef;
    this.link = null;
}
class Polynomial{
PolyTerm head = new PolyTerm(); /* list head */
Polynomial(){
head.link = head;
}
To insert terms in canonical form, the following is a method of the class Polynomial:
/* Inserts a term to [this] polynomial by inserting
in its proper location to maintain canonical form */
void insertTerm(PolyTerm p){
PolyTerm alpha = head.link; /* roving pointer */
PolyTerm beta = head;
if (alpha == head){
head.link = p;
p.link = head;
return;
}
else{
while (true){
/* If the current term is less than alpha or
is the least in the polynomial, then insert */
if ((alpha.expo < p.expo) || (alpha == head)){
p.link = alpha;
beta.link = p;
return;
}
/* Advance alpha and beta */
alpha = alpha.link;
beta = beta.link;
        }
    }
}
Case: EXPO(alpha) = EXPO(beta)
    If EXPO(alpha) < 0, both pointers alpha and beta have come full circle and are now
    pointing to the list heads.
    Action: Terminate the procedure.
Since both P and Q are in canonical form, one pass is sufficient. If the operands are not
in canonical form, the procedure will not yield the correct result. If P has m terms and Q
has n terms, the time complexity of the algorithm is O(m+n).
With this algorithm, there is no need for special handling of the zero polynomial: it works
even when P and/or Q is the zero polynomial. However, since the sum is retained in Q, Q has
to be duplicated beforehand if it is still needed after the addition. This can be done by
calling the method add(Q, P) of class Polynomial, where P is initially the zero polynomial
and will contain the duplicate of Q.
The following is the Java implementation of this procedure:
/* Restore P */
while (alpha.expo != -1) {
alpha = alpha.link;
alpha.coef = - alpha.coef;
}
R.add(T);
beta = beta.link;
}
alpha = alpha.link;
}
this.head = R.head; /* Make [this] polynomial be R */
}
/* Performs addition of exponents of the triple(x,y,z)
Auxillary method used by multiply */
int expoAdd(int expo1, int expo2){
int ex = expo1/100 + expo2/100;
int ey = expo1%100/10 + expo2%100/10;
int ez = expo1%10 + expo2%10;
return (ex * 100 + ey * 10 + ez);
}
To satisfy a need for n words, the avail list is scanned for blocks that meet a fitting
criterion:
first fit - the first block with m >= n words
best fit - the best-fitting block, i.e. the smallest block with m >= n words
worst fit - the largest block
After finding a block, n words of it are reserved and the remaining m-n words are kept in
the avail list. However, if the remaining m-n words are too small to satisfy any request,
we may opt to allocate the entire block.
The above approach is simple but suffers from two problems. First, it returns to the avail
list whatever is left on the block after reservation. It leads to long searches and a lot of
unusable free space scattered in the avail list. Second, the search always begins at the
head of the avail list. Hence, small blocks left at the leading portion of the list result in
long searches.
To solve the first problem, we could allocate the entire block if what will remain is too
small to satisfy any request. We could define the minimum value as minsize. Using this
solution, there is a need to store the size of the block since the reserved size may not
tally with the actual size of the allocated block. This could be done by adding a size field
to a reserved block, which will be used during liberation.
To solve the second problem, we could keep track of the end of the previous search, and
start the next search at the said block, i.e. if we ended at block A, we could start the
next search at LINK(A). A roving pointer, say rover, is needed to keep track of this
block. The following method implements these solutions.
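A sketch of such a first-fit reservation method is given below. It assumes that the memory
pool is an int array M, that a free block starting at index p stores its size in M[p] and the
index of the next free block in M[p+1], that the avail list is circular with a list-head
block at index head, and that rover and minsize are as described above; all of these names
are assumptions.

/* Reserves n words using first fit with a roving pointer; returns the starting
   index of the reserved block, or -1 if no sufficiently large block exists. */
int reserve(int n){
    int beta = rover;
    do {
        int alpha = M[beta + 1];                 /* candidate block */
        if (alpha != head && M[alpha] >= n){
            if (M[alpha] - n < minsize){
                M[beta + 1] = M[alpha + 1];      /* remainder too small: unlink and  */
                rover = beta;                    /* allocate the entire block        */
                return alpha;
            } else {
                M[alpha] = M[alpha] - n;         /* keep the front part in the list  */
                rover = alpha;
                int q = alpha + M[alpha];        /* reserve the tail of the block    */
                M[q] = n;                        /* remember its size for liberation */
                return q;
            }
        }
        beta = alpha;
    } while (beta != rover);
    return -1;
}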
Shorter searches on the avail list makes the second approach perform better than the
first. The latter approach is what we will use for the first fit method of sequential fit
reservation.
For example, given the state of the memory pool with a size of 64K as illustrated below,
reserve space for requests of 2K, 8K, 9K, 5K and 7K, made in that order.
To know if a freed block is adjacent to any of the free blocks in the avail list, we use the
block size. To collapse two blocks, the SIZE field of the lower block, address-wise, is
simply updated to contain the sum of the sizes of the combined blocks.
For example,
7.6.4.2 Reservation
Given a block with size 2k, the following is the algorithm for reserving a block for a
request for n words:
1. If the current block's size < n:
     If the current block is of the largest size, return: no sufficiently large block is
     available. Else, go to 2.
2. If the block's size is the smallest power of 2 that is greater than or equal to n, then
     reserve the block for the requesting task and return. Else go to 3.
3. Divide the block into two parts. These two are called buddies. Go to 2, having the
     upper half of the newly cut buddies as the current block.
For example, reserve space for the requests A (7K), B (3K), C (16K), D (8K), and E
7.6.4.3 Liberation
When a block is freed by a task and the buddy of the block being freed is also free, the buddies have to be collapsed into one block. If the newly collapsed block's buddy is also free, collapsing is performed again. This is done repeatedly until no more buddies can be collapsed.
Locating the buddy is a crucial step in the liberation operation and is done by computation. Let buddy(k, α) be the address of the buddy of the block of size 2^k at address α:
buddy(k, α) = α + 2^k   if α mod 2^(k+1) = 0
buddy(k, α) = α - 2^k   otherwise
If the located buddy happens to be free, it can be collapsed with the newly-freed block.
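In Java, the buddy address can be computed directly from the formula above; since block addresses are multiples of 2^k, this amounts to complementing bit k of the address (the parameter names below are illustrative).

/* Returns the address of the buddy of the block of size 2^k at address alpha */
int buddy(int k, int alpha){
    if (alpha % (1 << (k + 1)) == 0)
        return alpha + (1 << k);      /* buddy is the block just above */
    else
        return alpha - (1 << k);      /* buddy is the block just below */
}

For example, buddy(3, 16) = 24 and buddy(3, 24) = 16, so the two blocks of size 8 at addresses 16 and 24 are buddies.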
For the buddy-system method to be efficient, there is a need to maintain one avail list for each allowable size. The following is the algorithm for reservation using the binary buddy-system method:
1. If a request for n words is made and the avail list for blocks of size 2^k, where k = ⌈log2 n⌉ (the smallest integer with 2^k ≥ n), is not empty, then get a block from that avail list. Otherwise, go to 2.
2. Get a block from the avail list of size 2^p, where p is the smallest integer greater than k for which the list is not empty.
3. Split the block p - k times, inserting the unused halves in their respective avail lists.
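A compact sketch of this reservation procedure is shown below. It keeps the avail lists as an array of deques of block addresses (avail[k] holding the free blocks of size 2^k) rather than the doubly-linked node structure described later in this lesson, so it illustrates the algorithm rather than the book's implementation.

import java.util.ArrayDeque;
import java.util.Deque;

class BuddyAllocator {
    Deque<Integer>[] avail;      /* avail[k] holds addresses of free blocks of size 2^k */
    int m;                       /* the memory pool has size 2^m */

    @SuppressWarnings("unchecked")
    BuddyAllocator(int m){
        this.m = m;
        avail = new Deque[m + 1];
        for (int i = 0; i <= m; i++) avail[i] = new ArrayDeque<>();
        avail[m].push(0);        /* initially, one free block of size 2^m at address 0 */
    }

    /* Reserves a block for a request of n words; returns its address, or -1 */
    int reserve(int n){
        int k = 0;
        while ((1 << k) < n) k++;                  /* smallest k with 2^k >= n */
        int p = k;
        while (p <= m && avail[p].isEmpty()) p++;  /* smallest non-empty list of blocks >= 2^k */
        if (p > m) return -1;                      /* no sufficiently large block */
        int addr = avail[p].pop();
        while (p > k){                             /* split p - k times */
            p--;
            avail[p].push(addr);                   /* the lower buddy goes to its avail list */
            addr = addr + (1 << p);                /* continue with the upper half, as in step 3 */
        }
        return addr;                               /* reserved block of size 2^k */
    }
}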
Using the previous allocation as our example, liberate the blocks with reservation B (3K),
D (8K), and A (7K) in the order given.
In the implementation, the following node structure, a doubly-linked list node, will be used for the avail lists:

LLINK   TAG   KVAL   RLINK

where
TAG = 0 if the block is free, 1 if the block is reserved
KVAL = k if the block has size 2^k
To initialize the memory pool for the buddy-system method, assume that it is of size 2^m. There is a need to maintain m+1 avail lists. The pointers avail(0:m) to the lists are stored in an array of size m+1.
7.7 Summary
A list is a finite set of zero or more elements that may be atoms or lists.
Both DMA techniques, sequential fit and buddy system, may suffer from internal or external fragmentation.
7.8 Lecture Exercises
4. Using binary buddy-system method and given an empty memory pool of size 128K,
reserve space for the following requests:
Task    Request
A       30K
B       21K
C       13K
D       7K
E       14K
Show the state of the memory pool after each allocation. There is no need to show the
avail lists. Liberate C, E and D in the order given. Perform merging when necessary.
Show how the buddies are obtained.
finding the
a) a singly-linked list
b) a circular-list
c) a doubly-linked list
8 Tables
8.1 Objectives
At the end of the lesson, the student should be able to:
Discuss the basic concepts and definitions on tables: keys, operations and
implementation
8.2 Introduction
One of the most common operations in the problem-solving process is searching. It refers to the problem of finding data that is stored somewhere in the memory of a computer. Some information identifying the desired data is supplied to get the desired result. The table is the most common structure used to store data to be searched.
KEY    K0   K1   ...   Ki   ...   Kn-1
DATA   X0   X1   ...   Xi   ...   Xn-1
In the table above, n records are stored. Ki is the key at position i, while Xi is the associated data. The notation used for a record is (Ki, Xi).
The class definition for table in Java is
class Table{
    int key[];
    int data[];
    int size;
}
8.3.2 Operations
Aside from searching, several other operations can be done on a table. The following is
a list of the possible operations:
Searching for the record whose key Ki equals a given key K
Insertion
Deletion
Searching for the record with the smallest (largest) key
Given a key Ki, finding the record with the next larger (smaller) key
And so on
8.3.3 Implementation
A table could be implemented using sequential allocation, linked allocation, or a combination of both. In implementing the table ADT, there are several factors to consider:
Size of the key space Uk, i.e., the number of possible keys
Nature of the table: dynamic or static
Type and mix of operations performed on the table
If the key space is fixed, say m, where m is not too large, then the table can simply be implemented as an array of m cells. With this, every key in the set is assigned a slot in the table. If the key is the same as the index in the array, the table is known as a direct-address table.
8.3.3.1 Implementation Factors
In implementing a direct-addressing table, the following things must be considered:
Since the indexes identify records uniquely, there is no need to store the key Ki explicitly.
The data could be stored somewhere else if there is not enough room for the data Xi
with key Ki, using a structure external to the table. A pointer to the actual data is then
stored as Xi. In this case, the table serves as an index to the actual data.
There is a need to indicate unused cells, corresponding to unused keys.
8.3.3.2 Advantages
With direct-address tables, searching is eliminated since cell Xi contains the data or a
pointer to the data corresponding to key Ki. Also, insertion and deletion operations are
relatively straightforward.
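A direct-address table can be sketched as follows; the class and method names are illustrative only.

class DirectAddressTable {
    Object[] data;                    /* data[i] holds the record with key i; null means unused */

    DirectAddressTable(int m){ data = new Object[m]; }

    void insert(int key, Object x){ data[key] = x; }     /* O(1) */
    void delete(int key){ data[key] = null; }            /* O(1) */
    Object search(int key){ return data[key]; }          /* returns null for an unused key */
}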
KEY    K0   K1   K2   ...   Ki   ...   Kn-1
DATA   X0   X1   X2   ...   Xi   ...   Xn-1
The algorithm: given a table of records R0, R1, ..., Rn-1 with keys K0, K1, ..., Kn-1 respectively, where n is the number of records, the search key K is compared against each key in the table in turn. The search returns the index of the matching record if one is found, and -1 for an unsuccessful search.
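A minimal sketch of sequential search on the Table class defined earlier:

/* Returns the index of the record whose key equals k, otherwise -1 */
int sequentialSearch(int k, Table t){
    for (int i = 0; i < t.size; i++)
        if (t.key[i] == k) return i;    /* successful search */
    return -1;                          /* unsuccessful search */
}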
Sequential search takes n comparisons in the worst case, hence a time complexity of
O(n). This algorithm works well when the table is relatively small or is barely searched.
The good thing about this algorithm is that it works even if the table is unordered.
Pointer   ID No   Name   Birthdate   Course
          12345          01/23/87    BSCS
          23456          10/14/85    BSIE
          34567          12/07/86    BSCS
          45678          09/18/85    BSCE
          56789          06/17/86    BS Biology
          67890          04/21/84    BSBA
          78901          08/22/87    BSME
With this algorithm, the search time for a particular item is reduced. Also, an index could
be used to point to a sorted table implemented as an array or as a linked list. The latter
implementation implies a larger space overhead for pointers but insertions and deletions
can be performed immediately.
8.4.3.2 Binary Search
Binary search begins with an interval covering the whole table and compares the search key with the item in the middle of the interval. If the search value is less than the item in the middle of the interval, the interval is narrowed down to the lower half; otherwise, it is narrowed down to the upper half. This process of reducing the search interval by half is repeated until the value is found or the interval is empty. The algorithm for binary search makes use of the following relations in searching for the key K, where Kmiddle is the key in the middle of the current interval:
K = Kmiddle : the search is successful
K < Kmiddle : continue the search in the lower half
K > Kmiddle : continue the search in the upper half
The Algorithm
/* Returns the index of key k if found, otherwise -1 */
int binarySearch(int k, Table t){
    int lower = 0;
    int upper = t.size - 1;
    int middle;
    while (lower <= upper){
        middle = (lower + upper) / 2;                    /* get middle */
        if (k == t.key[middle]) return middle;           /* successful search */
        else if (k > t.key[middle]) lower = middle + 1;  /* discard lower half */
        else upper = middle - 1;                         /* discard upper half */
    }
    return -1;                                           /* unsuccessful search */
}
The sorted table with keys 12345, 23456, 34567, 45678, 56789, 67890 and 78901 is used in the examples that follow.
The Fibonaccian search algorithm finds the element from index 1 to n, and since indexing in Java starts at 0, there is a need to handle the case where k = key[0].
For example, search for key k = 34567:
Index:  0      1      2      3      4      5      6
Key:    12345  23456  34567  45678  56789  67890  78901

Fibonacci numbers:  i  = 0  1  1  2  3  4  5  6  7
                    Fi = 0  1  1  2  3  5  8  13

i = 5; Fi = 5; (Assumption) table size = Fi+1 - 1 = 7
j = 5, p = 3, q = 2: k < key[j]
j = 3, p = 2, q = 1: k < key[j]
j = 2, p = 1, q = 1: k = key[j], successful
Another example, search for key = 15:
Index:  0   1   2   3   4   5   6   7   8   9
Key:    10  11  12  13  14  15  16  17  18  19
The following fragment shows the adjustment performed when k > key[j]:

if (p == 1)
    return -1;        /* unsuccessful search */
else {
    /* adjust j, p and q */
    j = j + q;
    p = p - q;
    q = q - p;
}
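Only the adjustment step for k > key[j] appears above; a complete sketch of Fibonaccian search along these lines is shown below. It assumes the keys are stored in key[1..n] with index 0 unused (which sidesteps the key[0] case mentioned earlier) and that n = Fi+1 - 1 for some i, as in the first example.

/* Fibonaccian search on keys sorted in key[1..n]; returns the index of k, or -1 */
int fibonacciSearch(int k, int[] key, int n){
    if (n < 1) return -1;
    /* Set j = F(i), p = F(i-1), q = F(i-2) such that F(i+1) - 1 = n */
    int q = 0, p = 1, j = 1;
    while (j + p - 1 < n){
        int next = j + p;       /* next Fibonacci number */
        q = p;
        p = j;
        j = next;
    }
    while (true){
        if (k == key[j]) return j;            /* successful search */
        else if (k < key[j]){                 /* move left */
            if (q == 0) return -1;            /* unsuccessful search */
            j = j - q;
            int t = q; q = p - q; p = t;      /* (p, q) <- (q, p - q) */
        } else {                              /* move right, as in the fragment above */
            if (p == 1) return -1;            /* unsuccessful search */
            j = j + q;
            p = p - q;
            q = q - p;
        }
    }
}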
8.5 Summary
Two types of keys are the internal (embedded) key and the external key.
9.2 Introduction
Another application of the ADT binary tree is searching and sorting. By enforcing certain
rules on the values of the elements stored in a binary tree, it could be used to search
and sort. Consider the sample binary tree shown below:
In a binary search tree, for every node, all keys in its left subtree are less than the node's key and all keys in its right subtree are greater; it does not store multiple (duplicate) values. Also, the node structure BSTNode(left, info, right), as illustrated below, is used:
The binary search tree used in this lesson has a list head as illustrated in the following
figure:
9.3.1 Searching
In searching for a value, say k, three conditions are possible at the node currently being examined: k is equal to the node's value, in which case the search is successful; k is less than the node's value, in which case the search continues in the left subtree; or k is greater than the node's value, in which case the search continues in the right subtree. The same process is repeated until a match is found or the farthest leaf is reached without finding a match, in which case the search is unsuccessful.
The following is the Java implementation of the above algorithm:
/* Searches for k, returns the node containing k if found */
BSTNode search(int k){
    BSTNode p = bstHead.right;      /* the root node */
    /* If the tree is empty, return null */
    if (p == bstHead) return null;
    /* Compare */
    while (true){
        if (k == p.info) return p;            /* successful search */
        else if (k < p.info){                 /* go left */
            if (p.left != null) p = p.left;
            else return null;                 /* not found */
        }
        else{                                 /* go right */
            if (p.right != null) p = p.right;
            else return null;                 /* not found */
        }
    }
}
9.3.2 Insertion
In inserting a value, searching is performed in finding its proper location in the tree.
However, if during the search process the key is found, the insertion will not be
performed. The following is the algorithm:
1. Start the search at the root node. Declare a node p and make it point to the root
2. Do the comparison, and repeat it until the proper place for the new node is found:
if (k == p.info) return false      // if key found, insertion not allowed
else if (k < p.info) p = p.left    // go left
else p = p.right                   // if (k > p.info) go right
3. Insert the node (p now points to the new parent of node to insert):
newNode.info = k
newNode.left = null
newNode.right = null
if (k < p.info) p.left = newNode
else p.right = newNode
In Java,
/* Inserts k into the binary search tree */
boolean insert(int k){
    BSTNode p = bstHead.right;      /* the root node */
    BSTNode newNode = new BSTNode();
    newNode.info = k;
    /* If the tree is empty, make the new node the root */
    if (p == bstHead){
        bstHead.right = newNode;
        return true;
    }
    /* Find the right place to insert k */
    while (true){
        if (k == p.info) return false;        /* key already exists */
        else if (k < p.info){                 /* go left */
            if (p.left != null) p = p.left;
            else break;
        }
        else{                                 /* go right */
            if (p.right != null) p = p.right;
            else break;
        }
    }
    /* p now points to the parent of the node to insert */
    if (k < p.info) p.left = newNode;
    else p.right = newNode;
    return true;
}
9.3.3 Deletion
Deleting a key from the binary search tree is a bit more complex than the other two
operations just discussed. The operation starts by finding the key to delete. If it is not found, the algorithm simply returns, indicating that the deletion failed. If the search returns a node, then that node, which contains the searched key, has to be deleted.
However, deletion is not as simple as removing the node found, since its parent points to it. It may also be the parent of other nodes in the binary search tree. In this case, its children have to be adopted by some other node, and the pointers pointing to it have to be adjusted. In the process of
reassigning pointers, the BST property of the order of the key values has to be
maintained.
There are two general cases to consider in deleting a node d:
1. Node d is an external node (leaf).
Action: update the child pointer of the parent p: if d is a left child, set p.left = null; otherwise, set p.right = null.
2. Node d is an internal node.
Action: replace d with its inorder predecessor (inPre) or inorder successor (inSuc), and adjust the pointers of d's parent and of d's children so that the BST property is maintained.
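A compact sketch of deletion is given below. It is written recursively for brevity and uses the inorder successor for the two-children case; it assumes an empty subtree is represented by null, so the list-head checks used in the listings above would wrap a call such as bstHead.right = delete(bstHead.right, k).

/* Deletes k from the subtree rooted at [root] and returns the new subtree root */
BSTNode delete(BSTNode root, int k){
    if (root == null) return null;                  /* key not found */
    if (k < root.info) root.left = delete(root.left, k);
    else if (k > root.info) root.right = delete(root.right, k);
    else {
        /* root is the node to delete */
        if (root.left == null) return root.right;   /* leaf or one child */
        if (root.right == null) return root.left;
        /* two children: copy the inorder successor's value, then delete
           that successor from the right subtree */
        BSTNode inSuc = root.right;
        while (inSuc.left != null) inSuc = inSuc.left;
        root.info = inSuc.info;
        root.right = delete(root.right, inSuc.info);
    }
    return root;
}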
One of the most commonly used balanced BSTs is the AVL tree. It was created by G. Adelson-Velskii and E. Landis, hence the name AVL. An AVL tree is a height-balanced tree wherein the height difference between the subtrees of a node, for every node in the tree, is at most 1. Consider the binary search trees below, where the nodes are labeled with their balance factors:
Simple right rotation (RR): used when the new item C is in the left subtree of the left child B of the nearest ancestor A with balance factor +2
Simple left rotation (LR): used when the new item C is in the right subtree of the right child B of the nearest ancestor A with balance factor -2
Figure 1.154 Simple Left Rotation
Figure 1.155
Left-right rotation (LRR): used when the new item C is in the right subtree of the left child B of the nearest ancestor A with balance factor +2
Right-left rotation (RLR): used when the new item C is in the left subtree of the right child B of the nearest ancestor A with balance factor -2
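The two simple rotations can be sketched on the BSTNode structure as follows; a is the nearest unbalanced ancestor and each method returns the new root of the rotated subtree (balance-factor bookkeeping is omitted).

/* Simple right rotation (RR): a's left child b becomes the new subtree root */
BSTNode rotateRight(BSTNode a){
    BSTNode b = a.left;
    a.left = b.right;        /* b's right subtree becomes a's left subtree */
    b.right = a;             /* a becomes b's right child */
    return b;
}

/* Simple left rotation (LR): a's right child b becomes the new subtree root */
BSTNode rotateLeft(BSTNode a){
    BSTNode b = a.right;
    a.right = b.left;        /* b's left subtree becomes a's right subtree */
    b.left = a;              /* a becomes b's left child */
    return b;
}

The left-right and right-left rotations can be expressed in terms of these two: for LRR, rotate left at B and then rotate right at A; for RLR, rotate right at B and then rotate left at A.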
Insertion in an AVL tree is done in the same way as in a BST. However, the resulting balance factors have to be checked after the insertion, and if any node becomes unbalanced, the appropriate rotation is applied to the nearest unbalanced ancestor to restore the height balance.
9.5 Summary
A Binary Search Tree is a binary tree that satisfies the BST property
The BST search algorithm runs in O(log2 n) time on the average and O(n) time in the worst case
10.2 Introduction
Hashing is the application of a mathematical function (called a hash function) to the
key values that results in mapping the possible range of key values into a smaller range
of relative addresses. The hash function is something like a locked box that needs a key
to get the output which, in this case, is the address where the key is stored:
key ====> Hash Function H(k) ====> address
With hashing, there is no obvious connection between the key and the address generated, since the function "randomly selects" an address for a specific key value, without regard to the physical sequence of the records in the file. Hence, it is also known as a randomizing scheme.
Two or more input keys, say k1 and k2, when applied to a hash function, may hash to the same address, an accident known as a collision. Collisions can be reduced by allocating more file space than the minimum required to store the number of keys. However, this approach leads to wasted space. There are several ways to handle collisions, which will be discussed later.
In hashing, there is a need to choose a good hash function and, consequently, to select a method to resolve, if not eliminate, collisions. A good hash function performs fast computation, in time complexity of O(1), and produces few (or no) collisions.
10.3.1 Prime Number Division Method
In the prime number division method, the hash address is the remainder when the key is divided by a prime number n equal to the number of available addresses, i.e., h(k) = k mod n. For this lesson's running example, n = 13 (addresses 0 to 12) and the keys hash as follows:

Key Value k   h(k)        Key Value k   h(k)
125           8           745           4
234           0           459           4
845           0           902           5
431           2           725           10
444           2           569           10
947           11          652           2
256           9           254           7
981           6           421           5
345           7           382           5
792           12          458           3
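A division-method hash like the one used in this example can be written as follows (the table size 13 is taken from the addresses used in the rest of the lesson).

/* Prime number division method: h(k) = k mod n, with n = 13 addresses (0..12) */
int hash(int k){
    final int n = 13;      /* a prime number of addresses */
    return k % n;
}

For example, hash(947) = 947 mod 13 = 11, as in the table above.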
10.3.2 Folding
Another simple hashing technique is folding. In this technique, the key value is split into
two or more parts and then added, ANDed, or XORed to get a hash address. If the resulting address has more digits than the highest address in the file, the excess high-order digits are truncated.
There are different ways of folding. A key value can be folded in half; this is ideal for relatively small key values since they would easily fit in the available addresses. If the key would be split unevenly, the left fold must be larger than the right fold. A key value can also be folded in thirds; this is ideal for somewhat large key values. We can also fold alternate digits: the digits in the odd positions form one part, and the digits in the even positions form another. Folding in half and in thirds can be done in two further ways. One is boundary folding, where some parts of the folded key are reversed (imitating the way we fold paper) and then summed. The other is shift folding, where no parts of the folded key are reversed.
The following are some examples of shift folding:
1. Even digits, folding in half
125758 => 125+758 => 883
2. Folding in thirds
125758 => 12+57+58 => 127
3. Odd number of digits, folding in half
7453212 => 7453+212 => 7665
4. Different digits, folding in thirds
74532123 => 745+32+123 => 900
5. Using XOR, folding in half
100101110 => 10010+1110 => 11100
6. Alternate digits
125758 => 155+278 => 433
The following are some examples of boundary folding:
1. Even digits, folding in half
125758 => 125+857 => 982
2. Folding in thirds
125758 => 21+57+85 => 163
3. Odd number of digits, folding in half
7453212 => 7453+212 => 7665
4. Different digits, folding in thirds
74532123 => 547+32+321 => 900
5. Using XOR, folding in half
100100110 => 10010+0110 => 10100
6. Alternate digits
125758 => 155+872 => 1027
This method is useful for converting keys with a large number of digits to a smaller number of digits, so that the address fits into a memory word. The folded keys are also easier to store since they do not require much space.
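Shift folding in half can be sketched as follows for decimal keys; the method name is illustrative.

/* Shift folding in half: 125758 -> 125 + 758 = 883, 7453212 -> 7453 + 212 = 7665.
   For an uneven split the left fold gets the extra digit, as described above. */
int shiftFoldHalf(int key){
    String s = Integer.toString(key);
    int mid = (s.length() + 1) / 2;                    /* left fold is the larger part */
    int left = Integer.parseInt(s.substring(0, mid));
    int right = Integer.parseInt(s.substring(mid));
    return left + right;
}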
10.4 Collision Resolution Techniques
The chance of collision can be reduced in several ways:
Spread out the records, i.e., find a hashing algorithm that distributes the records fairly randomly among the available addresses. However, it is hard to find a hashing algorithm that distributes the records evenly.
Use extra memory: if there are more memory addresses to distribute the records into, it is easier to find a hashing algorithm than if the number of addresses is almost equal to the number of records. One advantage is that the records are spread out evenly, eventually reducing collisions. However, this method wastes space.
Use buckets, i.e., put more than one record at a single address.
There are several collision resolution techniques, and in this section we will cover chaining, the use of buckets, and open addressing.
10.4.1 Chaining
In chaining, m linked lists are maintained, one for each possible address in the hash
table. Using chaining to resolve collisions when storing the keys from the Prime Number Division Method example of hashing:
Key Value k   h(k)        Key Value k   h(k)
125           8           745           4
234           0           459           4
845           0           902           5
431           2           725           10
444           2           569           10
947           11          652           2
256           9           254           7
981           6           421           5
345           7           382           5
792           12          458           3
Starting from an initially empty table, after all the keys have been inserted each address heads a chain (linked list) of the keys that hash to it:

Address   Keys in chain
0         845, 234
1         (empty)
2         444, 431, 652
3         458
4         745, 459
5         902, 421, 382
6         981
7         345, 254
8         125
9         256
10        569, 725
11        947
12        792
Keys 845 and 234 both hash to address 0 so they are linked together in the address. The
case is the same for addresses 2, 4, 5, 7 and 10, while the rest of the addresses have no
collision. The chaining method resolved the collision by providing additional link nodes to
each of the values.
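Chaining can be sketched as follows, with h(k) = k mod 13 as in the example; the class, node and field names are illustrative.

class ChainedHashTable {
    class Node { int key; Node link; Node(int k, Node n){ key = k; link = n; } }

    Node[] table = new Node[13];       /* one chain head per address */

    void insert(int k){
        int i = k % 13;
        table[i] = new Node(k, table[i]);    /* link the new node into the chain at address i */
    }

    boolean search(int k){
        for (Node p = table[k % 13]; p != null; p = p.link)
            if (p.key == k) return true;
        return false;
    }
}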
10.4.2 Use of Buckets
Another way to resolve collisions is to use buckets: each address in the hash table holds a fixed number of slots, three (KEY1 to KEY3) in this example, so that several colliding keys can share an address. Storing the same set of keys gives:

Address   KEY1   KEY2   KEY3
0         845    234
1
2         444    431    652
3         458
4         745    459
5         902    382    421
6         981
7         345    254
8         125
9         256
10        569    725
11        947
12        792
Collision is redefined in this approach: it happens when a bucket overflows, that is, when an insertion is attempted on a full bucket. Hence, there is a significant reduction in the number of collisions. However, this method wastes some space and is not free from overflowing; when a bucket does overflow, an overflow policy must be invoked. In the above example, there are three slots in each address. Since the buckets are static in size, a problem arises when more than three values hash to a single address.
10.4.3 Open Addressing
10.4.3.1 Linear Probing
In open addressing, a colliding key is stored in the hash table itself, in another address found by probing from its home address. With linear probing, the addresses are examined in sequence until a free slot is found. The same keys are stored, this time with two slots per address:

Key Value k   h(k)        Key Value k   h(k)
125           8           745           4
234           0           459           4
845           0           902           5
431           2           725           10
444           2           569           10
947           11          652           2
256           9           254           7
981           6           421           5
345           7           382           5
792           12          458           3
Address   KEY1   KEY2
0         845    234
1
2         444    431
3         652    458
4         745    459
5         902    382
6         981    421
7         345    254
8         125
9         256
10        569    725
11        947
12        792
In this technique, key 652 hashed to address 2, but that address is already full. Probing for the next available space led to address 3, where the key is safely stored. Later in the insertion process, key 458 hashed to address 3 and was stored in the second slot of that address. Key 421 hashed to the full address 5; the next available space is at address 6, where the key is stored.
This approach resolves the problem of overflowing in bucket addressing. Also, probing for available space keeps overflowed keys near their home address in most cases. However, this method suffers from the displacement problem, where keys that rightfully own an address may be displaced by other keys that merely probed to the said address. Also, probing a full hash table entails a time complexity of O(n).
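Linear probing can be sketched as follows with one key per address (the figures above use two slots per address, but the probing rule is the same); -1 marks an empty slot.

/* Inserts k using linear probing; returns false if the table is full */
boolean insertLinearProbing(int[] table, int k){
    int m = table.length;                  /* 13 in the example */
    int i = k % m;                         /* home address */
    for (int probes = 0; probes < m; probes++){
        if (table[i] == -1){ table[i] = k; return true; }
        i = (i + 1) % m;                   /* probe the next address */
    }
    return false;                          /* no free slot found */
}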
10.4.3.2 Double Hashing
Double hashing makes use of a second hash function, say h2(k), whenever there's a
collision. The record is initially hashed to an address using the primary function. If the
hashed address is not available, a second hash function is applied and added to the first
hashed value, and the colliding key is hashed to the new address if there is available
space. If there is none, the process is repeated. The following is the algorithm:
1. Use the primary hash function h1(k) to determine the position i at which to
place the value.
2. If there is a collision, apply the rehash function rh(i, k) successively until an empty slot is found:
rh(i, k) = (i + h2(k)) mod m
where m is the number of addresses.
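Double hashing can be sketched as follows, with h1(k) = k mod 13 and h2(k) = k mod 11 as in the example that follows; one key per slot is used here for simplicity, and -1 marks an empty slot.

/* Inserts k using double hashing; returns false if no empty slot is reached */
boolean insertDoubleHashing(int[] table, int k){
    int m = table.length;                  /* 13 addresses in the example */
    int i = k % m;                         /* primary hash h1(k) */
    int step = k % 11;                     /* second hash h2(k); a real implementation
                                              must ensure this is never 0 */
    for (int probes = 0; probes < m; probes++){
        if (table[i] == -1){ table[i] = k; return true; }
        i = (i + step) % m;                /* rh(i, k) = (i + h2(k)) mod m */
    }
    return false;
}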
Using the second hash function h2(k)= k mod 11 in storing the following keys:
Key Value k   h1(k)       Key Value k   h1(k)
125           8           745           4
234           0           459           4
845           0           902           5
431           2           725           10
444           2           569           10
947           11          652           2
256           9           254           7
981           6           421           5
345           7           382           5
792           12          458           3
For the keys 125, 845, 444, 256, 345, 745, 902, 569, 254, 382, 234, 431, 947, 981, 792, 459 and 725, storage is straightforward; no overflow happens, and the table (two slots, KEY1 and KEY2, per address) looks as follows:
Address   KEY1   KEY2
0         845    234
1
2         444    431
3
4         745    459
5         902    382
6         981
7         345    254
8         125
9         256
10        569    725
11        947
12        792
Inserting 652 into the hash table results in an overflow at address 2, so we rehash:
h2(652) = 652 mod 11 = 3
rh(2, 652) = (2 + 3) mod 13 = 5, but address 5 is already full, so we rehash again:
rh(5, 652) = (5 + 3) mod 13 = 8, which has space, so 652 is stored there.
Address   KEY1   KEY2
0         845    234
1
2         444    431
3
4         745    459
5         902    382
6         981
7         345    254
8         125    652
9         256
10        569    725
11        947
12        792
Key 458 hashes to address 3, which is empty, so it is stored there without rehashing. Key 421 hashes to the full address 5, so it is rehashed: h2(421) = 421 mod 11 = 3, rh(5, 421) = (5 + 3) mod 13 = 8, which is also full, so rh(8, 421) = (8 + 3) mod 13 = 11, where there is space. The final table is:

Address   KEY1   KEY2
0         845    234
1
2         444    431
3         458
4         745    459
5         902    382
6         981
7         345    254
8         125    652
9         256
10        569    725
11        947    421
12        792
10.5 Dynamic Hash Tables
[Figure: an extendible hashing example in which overflowing addresses are expanded to use additional digits.]
The example shows an original address space consisting of four addresses. An overflow at address 4 results in its expansion into the two-digit addresses 40 and 41. There is also an overflow at address 2, so it is expanded into 20 and 21. An overflow at address 41 results in the use of the three-digit addresses 410 and 411.
10.6 Summary
Hashing maps the possible range of key values into a smaller range of relative
addresses.
Simple hash techniques include the prime number division method and folding.
Chaining makes use of m linked lists, one for each possible address in the hash
table.
In the use of buckets method, collision happens when a bucket overflows.
Linear probing and double hashing are two open addressing techniques.
Dynamic hash tables are useful if data to be stored is dynamic in nature. Two
dynamic hashing techniques are extendible hashing and dynamic hashing.
10.7 Lecture Exercises
1. Given the following keys:
21453
22414
25411
45324
13541
21534
54231
41254
25411
a) What should be the value of n if the Prime Number Division method of hashing is used?
b) With n from (a), hash the keys into a hash table of size n with addresses 0 to n-1. In case of collision, use linear probing.
c) Using boundary folding in half that results in 3-digit addresses, what are the hash values?
2. Using extendible hashing, store the keys below in a hash table in the order given. Use the leftmost digits first, and use more digits when necessary. Start with a hash table of size 2. Show the table after every extension.
Key         Hash Value   Binary Equivalent
Banana      2            010
Melon       5            101
Raspberry   1            001
Kiwi        6            110
Orange      7            111
Apple       0            000