Data Structures and Alg2 - 2021
Sciences, GCUC
Data Structures and Algorithms II
Aps. Nixon Adu-Boahen 2023
Course outline
Introduction
Graph
• Definitions
• Examples of Graph
• Graph ADT
• Graph Search
Trees
• Binary Trees
• B-Tree
Searching algorithms
• Linear Search
• Binary Search
Sorting Algorithms
• Bubble Sort
• Insertion Sort
• Selection Sort
• Quick Sort
Algorithms
Generally, an algorithm can be thought of as a sequence of steps that solves a given problem.
In computation, an algorithm is any well-defined computational procedure that takes some
value, or set of values, as input and produces some value, or set of values, as output. Thus
it is a sequence of computational steps that transforms the input into the output. The steps are
precise instructions that tell whoever is solving the problem at hand exactly what to do.
Example:
Problem: How do I save a new document I have created in Word 2007?
Solution (Algorithm) (to the problem of saving a newly created document in Word 2007)
Step1: Click on the Office button
Step2: Click on the Save or Save As menu item
Step3: Type the name of your file in the textbox labelled Name on the Save As dialog box
Step4: Choose where to save your document by dropping down the combo box labelled
Location, or by using the address bar (optional)
Step5: Click on the Save button
Step6: Done
Most of the time, algorithms are used as tools for solving computational problems (well-
Output: A permutation (reordering) (a’1, a’2, …, a’n) of the input sequence such that a’1 ≤ a’2
≤ …≤ a’n.
For example, given the input sequence (31, 45, 68, 25, 45, 58), a sorting algorithm returns as
output the sequence (25, 31, 45, 45, 58, 68). Such an input sequence is called an instance of
the sorting problem. Generally, an instance of a problem consists of the input (satisfying
whatever constraints are imposed in the problem statement) needed to compute a solution to
the problem.
An algorithm is said to be correct if, for every input instance, it halts with the correct output. An
incorrect algorithm might not halt at all on some input instances, or it might halt with an
answer other than the desired one. Incorrect algorithms can sometimes be useful, so do not
be discouraged when an algorithm you write in this course turns out to be incorrect; instead,
take time to resolve the inconsistency in your solution.
❖ Medicine: Determining the sequence of the 3 billion chemical base pairs that make up
human DNA, storing this information in databases, and developing tools for data
analysis.
❖ Internet: Clever algorithms are employed to manage and manipulate large volumes of
data. Such problems include routing (finding good routes on which the data travel) and
using search engines to quickly find pages on which particular information resides.
❖ E-commerce: Encryption of data may be required to maintain privacy and prevent fraud.
❖ Manufacturing and production settings: Algorithms are required to allocate scarce
resources in the most beneficial way, for example placing orders to maximize expected profit.
The prerequisite of this course covered an introduction to algorithms and data structures. We
defined algorithms for performing several operations on those data structures and analysed the
algorithms for their efficiency. This part of the course introduces graphs and other
non-linear data structures.
1.1 GRAPHS
Graphs are useful models for reasoning about relations among objects and combinatorial
problems. Many real-life problems can be solved by converting them to graphs. Proper
application of graph theory ideas can drastically reduce the solution time for some important
problems.
1.2 Terminology
A graph consists of a non-empty set of vertices V and a set E of associated edges, where each
edge is a pair (u, v) of vertices. A graph is written as:
G = (V, E)
V = {v1,v2,v3}
E = {(v1,v2),(v2,v3)}
Fig1. A simple Graph (vertices v1, v2, v3; edges (v1,v2) and (v2,v3))
CSC 312: Data Structures and Algorithm II Aps. Nixon Adu-Boahen
Graph A (vertices V1, V2, V3, V4, V5)
Any edge between vertices can be drawn as any kind of curve (not necessarily a straight
line). What matters is the incidence between the edges and vertices: an intersection of
edges is not a vertex unless specified. The same graph shown pictorially in three different ways
is illustrated below.
Fig. 2 Drawing the same graph in different shapes (vertices 1 to 5)
NB: For a simple undirected graph, the minimum possible order is 0 (empty graph) and the maximum
possible order is n(n−1)/2 (complete graph), where n is the size (number of vertices) of the graph.
Density
The density of G is the ratio of the number of edges in G to the maximum possible number of edges:
$D = \dfrac{2L}{n(n-1)}$
where L is the order of G (its number of edges) and n is the size of G (its number of vertices).
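The density formula translates directly into a small function. This is a minimal C++ sketch that assumes a simple undirected graph; the function name `density` is hypothetical. It follows the notes' convention that L is the order (edges) and n is the size (vertices).

```cpp
#include <cassert>
#include <stdexcept>

// Density of a simple undirected graph in this course's terminology:
// L = order (number of edges), n = size (number of vertices).
// D = 2L / (n(n-1)) lies between 0 (no edges) and 1 (complete graph).
double density(int L, int n) {
    if (n < 2) {
        throw std::invalid_argument("density needs at least two vertices");
    }
    return (2.0 * L) / (static_cast<double>(n) * (n - 1));
}
```

For example, a complete graph on 5 vertices has 5 × 4 / 2 = 10 edges, so `density(10, 5)` returns 1.0.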
Neighbour vertices
Vertices are said to be neighbours if they share an edge. In a directed graph, neighbourhood
depends on the direction of the edge: if there is an edge from V1 to V2 (that is, V2 can be
reached from V1 by an edge), then V2 is a neighbour of V1, and V1 is a neighbour to V2.
[Figure: an edge from V1 to V2]
V1 is neighbour to V2, or V2 is neighbour of V1
Degree of Vertex
The degree of a vertex (or node) in a graph is the number of edges incident to the
vertex (a loop counts twice). E.g. in Graph A above, we have the following degrees:
degree of v1 = 3
degree of v3 = 3
degree of v2 = 2
degree of v4 = 3
degree of v5 = 1
In a graph, G = (V, E), two vertices, v1 and v2, are neighbours if (v1,v2) is an edge.
The in-degree of a vertex v is equal to the number of vertices that have v as a neighbour. It
can also be defined as the number of edges whose final vertex is v.
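Degree and in-degree can both be counted in one pass over a list of edges. The C++ sketch below assumes vertices numbered 1 to n and edges stored as pairs; the names `Edge`, `degrees` and `inDegrees` are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// An edge (u, v); in a directed graph it runs from u to v. Vertices are numbered 1..n.
using Edge = std::pair<int, int>;

// Degree: every edge adds 1 to each endpoint, so a loop (v, v) adds 2 to v.
std::vector<int> degrees(int n, const std::vector<Edge>& edges) {
    std::vector<int> deg(n + 1, 0);     // index 0 is unused
    for (const Edge& e : edges) {
        ++deg[e.first];
        ++deg[e.second];
    }
    return deg;
}

// In-degree: the number of edges whose final vertex is v.
std::vector<int> inDegrees(int n, const std::vector<Edge>& edges) {
    std::vector<int> in(n + 1, 0);
    for (const Edge& e : edges) {
        ++in[e.second];
    }
    return in;
}
```

Because each edge contributes exactly 2 to the total, the sum of `degrees` is always twice the number of edges.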
Graph B (vertices V1 to V5)
Pendant
A vertex is pendant if and only if it has a degree of 1. Consequently, a pendant vertex is
adjacent to exactly one other vertex. Vertex V5 in Graph B is a pendant.
Null Graph
A null graph is a graph consisting only of isolated vertices, an isolated vertex being a vertex
having no edges incident to it.
Loop
A loop is an edge with both endpoints being the same. A loop is not necessarily a
path; it is an edge, and it involves only one vertex.
Fig3. A loop (at vertex V1)
Path
A path is a sequence of vertices V1, V2, ..., Vn such that each pair (Vi, Vi+1) is an edge.
Consecutive vertices in a path are therefore adjacent, and in a path (as used here) every vertex
appears only once. For example, in Graph B above we can identify 16 paths, which are as follows:
Connected Graph
A graph is connected if for each vertex pair (vi, vj) there is a path from vi to vj. Hence Graph
B is not a connected graph, because there is no path between the vertex V5 and any other
node. But Graph C below is a connected graph.
Graph C (vertices V1 to V5)
A directed graph is a graph with vertices and edges where each edge has a specific direction
relative to each of the vertices. We can convert an undirected graph to a directed one by
duplicating each edge and orienting the two copies in opposite directions.
For example, we can convert the undirected graph A into the directed graph below.
Graph D (the directed version of Graph A; vertices V1 to V5)
Connected Graph
An undirected graph is connected if there is a path from every vertex to every other vertex. A
directed graph with this property is called strongly connected. If a directed graph is not
strongly connected, but the underlying graph (without direction to the arcs) is connected, then
the graph is said to be weakly connected. Fig 4 below shows examples of strongly connected
graphs
[Fig 4: two strongly connected graphs on vertices V1 to V5]
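Strong and weak connectivity can be tested with a breadth-first reachability count. The C++ sketch below rests on several assumptions. Vertices are numbered 0 to n−1, unlike the V1 to V5 labels in the figures, and n ≥ 1. The names `reachableCount`, `stronglyConnected` and `weaklyConnected` are hypothetical. A directed graph is strongly connected exactly when every vertex is reachable from vertex 0 both in the graph and in its reverse. For an undirected graph, storing each edge once and calling `weaklyConnected` tests ordinary connectivity.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;   // (from, to); vertices numbered 0..n-1

// Count the vertices reachable from `start` by breadth-first search.
static int reachableCount(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> seen(adj.size(), false);
    std::queue<int> q;
    q.push(start);
    seen[start] = true;
    int count = 0;
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        ++count;
        for (int w : adj[v]) {
            if (!seen[w]) {
                seen[w] = true;
                q.push(w);
            }
        }
    }
    return count;
}

// Strongly connected: vertex 0 reaches everyone, and everyone reaches vertex 0
// (checked by searching the reversed graph from vertex 0).
bool stronglyConnected(int n, const std::vector<Edge>& edges) {
    std::vector<std::vector<int>> fwd(n), rev(n);
    for (const Edge& e : edges) {
        fwd[e.first].push_back(e.second);
        rev[e.second].push_back(e.first);
    }
    return reachableCount(fwd, 0) == n && reachableCount(rev, 0) == n;
}

// Weakly connected: connected once edge directions are ignored.
bool weaklyConnected(int n, const std::vector<Edge>& edges) {
    std::vector<std::vector<int>> und(n);
    for (const Edge& e : edges) {
        und[e.first].push_back(e.second);
        und[e.second].push_back(e.first);
    }
    return reachableCount(und, 0) == n;
}
```

A directed cycle 0 → 1 → 2 → 0 is strongly connected. A directed path 0 → 1 → 2 is only weakly connected.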
Complete Graph
A complete graph is a graph in which there is an edge between every pair of vertices.
Examples are shown below.
[Figure: examples of complete graphs on vertices V1 to V5]
Exercise
1. Construct a complete directed graph with 6 vertices
2. List all the paths in the graph above. How many paths can you get?
3. Find the order of the graph in question 1
4. Find the size of graph in question 1
5. Find the density of a graph G with 15 nodes and 14 edges
Subgraph
A graph H = (V1, E1) is a subgraph of G = (V, E) if V1 ⊆ V and E1 ⊆ E, and each edge in H has the
same end vertices in H as in G.
[Figure: a graph G on vertices a, b, c, d, e, and two of its subgraphs, H1 and H2]
Tree
A tree is an acyclic connected graph. Trees are vital structures in computing that store
data in a non-linear manner. They are normally used to handle hierarchical information in
memory. Fig6 below shows some trees.
Fig7 A forest
Rooted Trees
Since a tree contains no cycles, the length of a path in it is bounded. Therefore there exist
maximal paths which are proper subpaths of no longer paths. The initial and final nodes of
a maximal path are called the root and leaf (terminal node) of the tree respectively. A tree is
said to be rooted if it has a particular node specially designated as the root. Because a
maximal path in a tree starts from the root and ends at a leaf, it is convenient to think of a
rooted tree as a directed graph.
Level
The root is said to lie on the first level of the tree. The level of any other node is the number
of nodes on the path from the root to that node. In general, a node which lies on the jth level
is said to lie at the end of a path (from the root) with length j−1.
Branch node
It is any node which is not a terminal node.
Subtree
Any node defines a subtree of which it is the root, consisting of itself and all other nodes
reachable from it.
Example 2: A graph G consists of 2 vertices of degree 2 each, 3 vertices of degree 3 each, and
the remaining vertices each of degree 1. The number of edges in G is 8.
i. What is the number of vertices in G?
ii. What is the density of G?
Solution: Let x be the unknown number of vertices of degree 1.
Sum of degrees in G: 2 × 2 + 3 × 3 + x × 1
Using Theorem 1 (sum of degrees = 2e, twice the number of edges):
➔ 4 + 9 + x = 2 × 8 ➔ 13 + x = 16 ➔ x = 16 − 13 ➔ x = 3
i. The number of vertices: n = 2 + 3 + 3 = 8
ii. The density: L = e = 8, so D = 2L / (n(n−1)) = (2 × 8) / (8 × 7) = 16/56 ≈ 0.2857
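The step in Example 2 that uses Theorem 1 can be written as a small C++ helper. It is a sketch that assumes every remaining vertex has the same known degree. The name `remainingVertices` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Theorem 1: the sum of all vertex degrees equals 2 x (number of edges).
// Given the degrees already known and the edge count, return how many more vertices
// are needed when each of them has degree `remainingDegree`.
int remainingVertices(const std::vector<int>& knownDegrees, int edges, int remainingDegree) {
    int sum = 0;
    for (int d : knownDegrees) {
        sum += d;
    }
    return (2 * edges - sum) / remainingDegree;   // assumes the division is exact
}
```

For Example 2, the known degrees are {2, 2, 3, 3, 3}, there are 8 edges and the remaining degree is 1, so the helper returns 3, as in the worked solution.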
[Figure: a graph with vertices V1 to V5]
From the graph above, deg(v1)=3, deg(v2)=2, deg(v3)=3, deg(v4)=3 and deg(v5)=1.
From the above, we have 4 vertices with odd degree, and 4 is even.
1. airport system:
nodes = airports; edges = pairs of airports with non-stop flights.
(weight/cost = airfare; distance; capacity)
2. Internet:
3. social graphs:
4. academic graphs:
• Finding routes between cities: the objects could be towns, and the connections could
be road/rail links.
• Deciding what first year courses to take: the objects are courses, and the relationships
are prerequisite and co-requisite relations. Similarly, planning a course: the objects
are topics, and the relations are prerequisites between topics (you have to understand
topic X before topic Y will make sense).
• Planning a project: the objects are tasks, and relations are relationships between tasks.
• Finding out whether two points in an electrical circuit are connected: the objects are
electrical components and the connections are wires.
• Deciding on a move in a game: the objects are board states, and the connections
Graph G (vertices V1 to V5)
6. What is the term given to two edges with the same initial and final nodes?
7. What is the level of a root in a tree?
8. There are six nodes between node p and q in a tree. What is the length of the path from p
to q?
9. Assignment: What are the various ways of representing a tree?
10. Assignment: Give four practical uses of trees in computation.
11. A graph H consists of 5 vertices each with degree 3, a vertex with degree 1 and the
remaining vertices each with degree 2. If the number of edges is 10, find the number of
vertices and the density of H.
graph, then the weights of the edges can be used where an edge exists and zero where there is
no edge.
Fig[2.1]: a graph (nodes N1 to N5)
The graph in the figure above would be represented as follows:
     1  2  3  4  5
 1   F  T  F  T  F
 2   F  F  T  F  F
 3   F  F  F  F  F
 4   F  F  T  F  T
 5   F  F  F  F  F
NB: In a directed graph, there is an edge between two nodes (a, b) if there is an edge that
begins at a and ends at b; the entry in row a, column b is then T.
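The matrix above can be built from a list of directed edges. This is a minimal C++ sketch that assumes nodes numbered 1 to n; the function name `adjacencyMatrix` is hypothetical. For a weighted graph, the `bool` entries would become weights, with zero meaning no edge.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Boolean adjacency matrix of a directed graph with nodes 1..n (row and column 0 unused).
// matrix[a][b] is true when there is an edge that begins at a and ends at b.
std::vector<std::vector<bool>> adjacencyMatrix(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<bool>> matrix(n + 1, std::vector<bool>(n + 1, false));
    for (const auto& e : edges) {
        matrix[e.first][e.second] = true;
    }
    return matrix;
}
```

The T entries of the table correspond to the edges (1,2), (1,4), (2,3), (4,3) and (4,5). Passing those edges reproduces the table; for example, row 4 is true only in columns 3 and 5.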
[Figure: exercise graphs with nodes labelled N1 to N7]
1.5.2 Edge Lists
The adjacency matrix representation is very simple, but it is inefficient (in terms of space) for
sparse graphs, i.e. those without many edges compared with nodes (vertices). Such a graph
would still need an N×N matrix, but it would be almost full of 'False's.
So an alternative is to have associated with each node a list (or set) of all the nodes it is linked
to via an edge. So, in this representation we have a one dimensional array, where each
element in the array is a list of nodes.
Fig [2.2] (a graph with nodes 1 to 7)
Now comes the question of how to represent the list of nodes. One way is to use linked lists.
The graph would involve an array, where each element of that array is a linked list.
[Figure: the array of linked lists representing the graph in Fig [2.2]]
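An array whose elements are linked lists maps naturally onto `std::vector` of `std::list` in C++. This is a sketch under assumptions: nodes are numbered 1 to n, edges are directed, and the `Graph` struct with its `addEdge` and `hasEdge` members is a hypothetical illustration, not a standard API.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Adjacency lists: adj[u] is a linked list of the nodes that u is linked to by an edge.
struct Graph {
    std::vector<std::list<int>> adj;   // nodes 1..n; index 0 unused

    explicit Graph(int n) : adj(n + 1) {}

    void addEdge(int from, int to) {   // directed edge from -> to
        adj[from].push_back(to);
    }

    bool hasEdge(int from, int to) const {
        for (int v : adj[from]) {
            if (v == to) return true;
        }
        return false;
    }
};
```

For the graph of Fig[2.1], the lists hold only 5 entries in total, one per edge, whereas the adjacency matrix needs 25 cells. This saving is why the list form suits sparse graphs. The trade-off is that `hasEdge` must walk a list instead of reading one matrix cell.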
Trials 2
1. Represent the graphs drawn in Fig [2.2] as Edge Lists.
2. What is a graph?
3. Distinguish between digraph and undirected graph.
4. Graphically represent the following graph G,
G=(V,E)
V={n1, n2, n3, n4, n5}
E={(n2,n3), (n1,n5),(n2,n3),(n4,n5),(n1,n4),(n1,n3)}
5. What is a connected graph?
6. Distinguish between complete and connected graph.
7. A graph consists of 5 vertices with equal degrees. If there are 15 edges in the graph, what
is the degree of each vertex in the graph?
8. Use an adjacency matrix to represent the graph G=(V,E).
V={v1, v2, v3, v4, v5}
E={(v1,v3),(v1,v4),(v2,v3),(v2,v5),(v4,v5),(v3,v4),(v1,v2)}
9. Represent the graph in question 8 using linked lists.
10. Implement the ADT Graph using C++.
11. List three properties of a tree and binary tree.
12. Use Edge List to represent the graph G=(V,E).
V={v1, v2, v3, v4, v5}
E={(v1,v3),(v1,v4),(v2,v3),(v2,v5),(v4,v5),(v3,v4),(v1,v2)}
We can also describe it in terms of family tree terminology: in depth first the node's
descendants are searched before its (unvisited) siblings; in breadth first the siblings are
searched before its descendants.
[Figure: a graph with nodes N1 to N7]
Fig[2.3]: An example of a tree (nodes 1 to 9).
Pseudocode has been used to describe putting all neighbours of the node on the stack; you'd
actually need to either use list traversal of neighbours or check through all possible nodes and
add them if they are a neighbour.
We can see how this will work for the tree in the figure above. (The trace below shows 'stack'
on entering the body of the loop each time, and 'current node' after it has been popped off
stack).
Stack        Current node   Neighbours pushed
(1)          1              2, 3, 4
(2 3 4)      2              5, 6
(5 6 3 4)    5              none
(6 3 4)      6              none
(3 4)        3              7, 8
(7 8 4)      7              none
(8 4)        8              none
(4)          4              9
(9)          9              none
()
The algorithm for breadth first is exactly the same, BUT we use a queue rather than a stack of
nodes: put the neighbours on the BACK of the queue, but remove the current node from the
front.
Note
Advantage of Breadth-first search
• It can often avoid getting lost in fruitless scanning of deep parts of the tree.
Disadvantage of Breadth-first search
• The queue used in breadth-first search often requires much more memory than
depth-first search's stack.
Disadvantage of Depth-first search
• It tends to search deep parts of the tree fruitlessly.
Avoiding revisiting previously visited nodes leads to the following modified algorithm,
which keeps track of nodes visited (using an array visited[], which would be initialised
appropriately).
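stack.push(startnode);
do
{
    currentnode = stack.top();   // read the node on top of the stack...
    stack.pop();                 // ...then remove it (std::stack::pop returns nothing)
    if(! visited[currentnode])
    {
        visited[currentnode] = 1;             // mark it so it is never expanded twice
        for (each neighbour n of currentnode) // pseudocode: list traversal or matrix scan
            if( !visited[n])
                stack.push(n);
    }
}while(! stack.empty() && currentnode != target);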
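The stack-based search above, and the queue-based breadth-first variant described earlier, can be sketched as runnable C++. Assumptions: nodes are integers indexing adjacency lists (`AdjList` is a hypothetical alias), and the search visits everything reachable instead of stopping at a target. Neighbours are pushed in reverse so that the depth-first order matches the trace for Fig[2.3]. The names `depthFirst` and `breadthFirst` are hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <stack>
#include <vector>

// adj[u] lists the neighbours of node u in the order they should be considered.
using AdjList = std::vector<std::vector<int>>;

// Depth-first search with an explicit stack and a visited[] array.
std::vector<int> depthFirst(const AdjList& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::stack<int> s;
    s.push(start);
    while (!s.empty()) {
        int current = s.top();
        s.pop();
        if (visited[current]) continue;
        visited[current] = true;
        order.push_back(current);
        // Push in reverse so the first-listed neighbour ends up on top of the stack.
        for (auto it = adj[current].rbegin(); it != adj[current].rend(); ++it) {
            if (!visited[*it]) s.push(*it);
        }
    }
    return order;
}

// Breadth-first search: the same loop, but neighbours go on the back of a queue
// and the current node is taken from the front.
std::vector<int> breadthFirst(const AdjList& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    q.push(start);
    while (!q.empty()) {
        int current = q.front();
        q.pop();
        if (visited[current]) continue;
        visited[current] = true;
        order.push_back(current);
        for (int n : adj[current]) {
            if (!visited[n]) q.push(n);
        }
    }
    return order;
}
```

For the tree in the trace (1 has children 2, 3, 4; 2 has 5, 6; 3 has 7, 8; 4 has 9), depth-first visits 1 2 5 6 3 7 8 4 9. Breadth-first visits 1 2 3 4 5 6 7 8 9.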
Binary Trees
A tree is a widely-used data structure that emulates a hierarchical tree structure with a set of
linked nodes.
A node is a structure which may contain a value, a condition, or represent a separate data
structure (which could be a tree of its own). A node has at most one parent.
An internal node (also known as an inner node or branch node) is any node of a tree that has
child nodes. Similarly, an external node (also known as an outer node, leaf node, or terminal
node) is any node that does not have child nodes.
In binary trees, each node has at most 2 children (i.e. each node has 0, 1 or 2 children).
[Figure: a binary tree with nodes N1 to N4]
Binary trees have one key and two pointers in each node. The leaves of the tree
are indicated by null pointers.
E.g. the binary tree below can be represented by the C++ code that follows.
[Figure: a binary tree with root 5; the children of 5 are 3 and 8, and the children of 3 are 1 and 4]
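#include <iostream>
using namespace std;

struct Tree {                 // a node: one key and two child pointers
    int key;
    Tree *left = nullptr;     // a null pointer marks a missing child
    Tree *right = nullptr;
};

int main(){
    Tree node[5];             // five nodes, indices 0 to 4
    node[0].key = 5;
    node[1].key = 3;
    node[2].key = 8;
    node[3].key = 1;
    node[4].key = 4;
    node[0].left = &node[1];
    node[0].right = &node[2];
    node[1].left = &node[3];
    node[1].right = &node[4];
    cout << node[1].key << endl;   // prints 3
    return 0;
}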
else
InsertNode(root, value)
end if
The insertion algorithm is split for a good reason. The first algorithm (non-recursive) checks a
very core base case: whether or not the tree is empty. If the tree is empty then we simply create
our root node and finish. In all other cases we invoke the recursive InsertNode algorithm, which
simply guides us to the first appropriate place in the tree to put value. Note that at each stage
we perform a binary chop: we either choose to recurse into the left subtree or the right by
comparing the new value with that of the current node. For any totally ordered type, no value
can simultaneously satisfy the conditions to place it in both subtrees.
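The two-part insertion (an empty-tree check, then a recursive "binary chop") can be sketched in C++ as below. Assumptions: keys are `int`; smaller keys go left and equal or greater keys go right, as in the insertion rule given later in these notes; nodes are never freed, to keep the sketch short. The names `Node`, `insert`, `insertNode` and `inorderKeys` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// A binary search tree node: one key and two child pointers (nullptr = no child).
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Recursive part (InsertNode): chop left or right until a free place is found.
void insertNode(Node* current, int value) {
    if (value < current->key) {
        if (current->left == nullptr) current->left = new Node(value);
        else insertNode(current->left, value);
    } else {
        if (current->right == nullptr) current->right = new Node(value);
        else insertNode(current->right, value);
    }
}

// Non-recursive entry point: handles the empty-tree base case, then delegates.
void insert(Node*& root, int value) {
    if (root == nullptr) root = new Node(value);
    else insertNode(root, value);
}

// In-order traversal collects the keys of a binary search tree in sorted order.
void inorderKeys(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorderKeys(t->left, out);
    out.push_back(t->key);
    inorderKeys(t->right, out);
}
```

Inserting 9, 3, 6, 1, 10, 14, 7, 13, 5 in that order gives root 9 with children 3 and 10. An in-order traversal then yields the keys in sorted order.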
Visiting a node simply means accessing it to do something with it. It could be displaying the
name of the node, changing the label etc.
To traverse, always start from the root of the tree and perform the visitation in the order of
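[Figure: an expression tree with root =; the left child of = is / (with children 1 and 4) and the right child is + (with children c and d)]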
Logically attach an arrow to every node, either at the left (for preorder), below (for inorder)
or at the right (for postorder), and then move from the root leftward, rightward and then upward
back to the root around all the nodes. When an arrow is encountered, the node is visited (or
its name is written).
[Figure: tree traversal of the expression tree. Tree A: preorder; Tree B: inorder; Tree C: postorder]
After traversal, the following are the visited node values:
Preorder traversal (Tree A): =/14+cd
Inorder traversal (Tree B): 1/4=c+d
Postorder traversal (Tree C): 14/cd+=
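NB: To convert an expression in preorder (prefix) form to an infix (algebraic) expression, scan it from
right to left, collecting operands until an operator is encountered; combine that operator with the two
most recently collected operands (the most recent one is its left operand) and continue scanning.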
C++ implementation
void preorder(node t)
{
if(t == NULL)
return;
cout<<" "<<t->val;//Visiting the root
preorder(t->left);
preorder(t->right);
}
2. Inorder Traversal
Algorithm
inorder(node)
if node = null then return
inorder(node.left)
print node.value
inorder(node.right)
C++ implementation
void inorder(tree t)
{
if(t == NULL)
return;
inorder(t->left);
cout<<" "<<t->val;//Visiting the root
inorder(t->right);
}
3. Postorder traversal
Algorithm
postorder(node)
if node = null then return
postorder(node.left)
postorder(node.right)
print node.value
C++ implementation
void postorder(tree t)
{
if(t == NULL)
return;
postorder(t->left);
postorder(t->right);
cout<<" "<<t->val;//Visiting root
}
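To check the three traversals against the expression-tree example, here is a self-contained C++ sketch. Assumptions: the tree shape (= over / and +, with leaves 1, 4, c and d) is the one implied by the preorder string =/14+cd. Each traversal appends labels to a string instead of printing them, so the output can be compared. `ExprNode` is a hypothetical type.

```cpp
#include <cassert>
#include <string>

// A node of an expression tree: operators are internal nodes, operands are leaves.
struct ExprNode {
    char val;
    ExprNode* left;
    ExprNode* right;
};

void preorder(const ExprNode* t, std::string& out) {
    if (t == nullptr) return;
    out += t->val;              // visit the root first
    preorder(t->left, out);
    preorder(t->right, out);
}

void inorder(const ExprNode* t, std::string& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out += t->val;              // visit the root between the two subtrees
    inorder(t->right, out);
}

void postorder(const ExprNode* t, std::string& out) {
    if (t == nullptr) return;
    postorder(t->left, out);
    postorder(t->right, out);
    out += t->val;              // visit the root last
}
```

Building the example tree and running the three functions gives "=/14+cd", "1/4=c+d" and "14/cd+=" respectively.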
• The left subtree of a node contains only nodes with keys less than the node's key.
• The right subtree of a node contains only nodes with keys greater than the node's key.
• Both the left and right subtrees must also be binary search trees.
Generally, the information represented by each node is a record rather than a single data
element. However, for sequencing purposes, nodes are compared according to their keys
rather than any part of their associated records.
The major advantage of binary search trees over other data structures is that the related
sorting algorithms and search algorithms such as in-order traversal can be very efficient.
Binary search trees are a fundamental data structure used to construct more abstract data
structures such as sets, multisets, and associative arrays.
Trials 3
1. Give two advantages and disadvantages of breadth first search and depth first search
2. Write down algorithm for Preorder, Inorder and Postorder traversal of trees
3. Define a node in C++ and write down the C++ implementation for each of the traversals in
question 2.
4. Give the breadth first search and depth first search traversal of the following graph
[Figure: a graph with vertices V1 to V7]
5. List the properties of a binary tree.
3.0 RECURSION
A procedure may contain a repetitive task, which can be achieved by iteration or by recursion. With
iteration, the procedure uses a looping control structure such as while, do…while, or for, but it
does not call itself in its definition. A procedure is recursive, on the other hand, when it directly
or indirectly calls itself. Recursion is a powerful technique for defining an algorithm: when a
function calls itself it is known as recursion, an important aspect of Computer Science. Recursion
can sometimes lead to very simple and elegant programs, but if it is not well defined (for example,
if it has no base case) it can recurse without end, much like an infinite loop.
Generally, a function is defined recursively when the function itself is referenced on the
right-hand side of its definition, e.g. n! = n × (n−1)!.
1. Factorial
return (x * factr(x-1));
}
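A complete recursive factorial in C++, consistent with the `return (x * factr(x-1));` line above, might look like the sketch below. The base case and the choice of `unsigned long long` (which holds results up to 20! without overflow) are assumptions, not the notes' own code.

```cpp
#include <cassert>
#include <stdexcept>

// Recursive factorial: factr(0) = factr(1) = 1 (base case), factr(x) = x * factr(x - 1).
unsigned long long factr(int x) {
    if (x < 0) throw std::invalid_argument("factorial of a negative number");
    if (x <= 1) return 1;          // base case stops the recursion
    return x * factr(x - 1);       // recursive case
}
```

Without the base case, each call would make another call and the recursion would never end.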
2. Fibonacci Numbers
A Fibonacci sequence is a series in which each term is the sum of the previous two terms, and
the first two terms are always given, e.g. 1 1 2 3 5 8 13 …. Here, 1 and 1 are the first two
terms. Each term after the first two is obtained by adding the previous two
terms: 2 = 1 + 1, 3 = 1 + 2, 5 = 2 + 3, 8 = 3 + 5 and so on.
The Fibonacci sequence, F(n), is defined recursively by the recurrence relation
F(0) = 1, F(1) = 1 (the first two terms)
F(n) = F(n−1) + F(n−2) for n ≥ 2
When listing the sequence, one can stop at some point, say after t terms. Since the first two terms
are given, the function generates the remaining t − 2 terms. The iterative and recursive definitions
are given algorithmically below:
algorithm fibo(w, u, t)//using iteration
Input: w and u are the first two terms; t is the required number of terms in the sequence
Output: a list of Fibonacci numbers
a ← w
b ← u
count ← 0
print (a, ‘ ’, b)
while count < t - 2
begin
z ← a + b
print(‘ ’, z)
a ← b
b ← z
count ← count + 1
end
return
algorithm fibor(w, u, t)//using recursion
Input: w and u are the two most recent terms; t is the number of terms still to be generated
Output: a list of Fibonacci numbers
if (t = 0) then return
print (‘ ’, w + u)
fibor(u, w + u, t - 1)
//Assumption: the first two terms are already printed, so only the other terms are
printed; to produce the same output as fibo(w, u, t), call fibor(w, u, t - 2)
Here, fibo(w,u,t) implements its task by iteration, whereas fibor(w,u,t)
implements its task using recursion. It can again be seen that fibor is more concise than
its iterative counterpart fibo, but both produce the same output.
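Both algorithms can be translated into C++ as a sketch. Instead of printing, each version collects the terms in a vector so the two outputs can be compared. The sketch assumes t ≥ 2. `long long` overflows after roughly the 90th term, so printing 200 terms (as in the assignment below) would need a big-integer type.

```cpp
#include <cassert>
#include <vector>

// Iterative version (fibo): w and u are the first two terms, t the total number of terms.
std::vector<long long> fibo(long long w, long long u, int t) {
    std::vector<long long> terms{w, u};
    long long a = w, b = u;
    for (int count = 0; count < t - 2; ++count) {
        long long z = a + b;       // next term = sum of the previous two
        terms.push_back(z);
        a = b;
        b = z;
    }
    return terms;
}

// Recursive version (fibor): appends the next t terms after the most recent terms w and u.
void fibor(long long w, long long u, int t, std::vector<long long>& terms) {
    if (t <= 0) return;            // base case: nothing left to generate
    terms.push_back(w + u);
    fibor(u, w + u, t - 1, terms);
}
```

To reproduce `fibo(1, 1, 7)`, start a vector with the two given terms {1, 1} and call `fibor(1, 1, 5, terms)`. Both give 1 1 2 3 5 8 13.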
Assignment:
1. Write a simple program (Java or C++) which implements the Fibonacci sequence
recursively. Your program should print the first 200 terms of the sequence.
2. Using C++, define a factorial function recursively.
3. Implement the recursive algorithm called getLCM that accepts two numbers f and s and
returns the Least Common Multiple (LCM) of the two numbers in C++. The LCM of two
numbers is the least positive integer which is divisible by both numbers, e.g. the LCM of 6 and 8 is
24.
algorithm getLCM( f , s, m, n)
Input: f and s are the numbers whose LCM is required, m is the maximum of the two and n is a natural number
Output: returns the LCM of the two numbers f and s
if m mod f=0 and m mod s=0 then return m
if f * n > s * n then
return getLCM(f,s, s * n, n+1)
else
return getLCM(f,s, f * n, n+1)
end if
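A direct C++ translation of getLCM is shown below as a sketch. It assumes positive inputs and an initial call with m = max(f, s) and n = 1. Those starting values are an assumption, since the notes only say n is a natural number. The wrapper name `lcmOf` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Direct translation of getLCM: m is the current candidate, n counts the multiples tried.
long long getLCM(long long f, long long s, long long m, long long n) {
    if (m % f == 0 && m % s == 0) return m;    // m is a common multiple: done
    if (f * n > s * n)
        return getLCM(f, s, s * n, n + 1);     // step through multiples of s
    else
        return getLCM(f, s, f * n, n + 1);     // step through multiples of f
}

// Convenience wrapper with the assumed starting values m = max(f, s), n = 1.
long long lcmOf(long long f, long long s) {
    return getLCM(f, s, std::max(f, s), 1);
}
```

For f = 6 and s = 8, the candidates are 8, 6, 12, 18 and then 24. The value 24 is the first candidate divisible by both numbers.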
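Example: Represent the following sequence of numbers as a binary search tree: 9, 3, 6, 1, 10,
14, 7, 13, 5
[Figure: the resulting tree has root 9; 9 has children 3 and 10; 3 has children 1 and 6; 6 has children 5 and 7; 10 has right child 14; 14 has left child 13]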
What will be the binary search tree for each of the following sequences of keys?
• 80, 100, 50, 30, 23, 200, 95, 7, 9, 8, 90, 4, 1, 87, 150, 17, 43
• 8, 5, 10, 15, 4, 6, 9, 20, 2, 17, 32, 1, 0,18
We begin by examining the root node. If the tree is null, the value we are searching for does
not exist in the tree. Otherwise, if the value equals the root, the search is successful. If the
value is less than the root, search the left subtree. Similarly, if it is greater than the root,
search the right subtree.
The recursive algorithm for searching a binary search tree is shown below:
The algorithm enters a loop, and decides whether to branch left or right depending on the
value of the node at each parent node.
//not found
return false;
}
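Both the recursive search and the loop-based search can be written in C++ as follows. This is a sketch with a hypothetical `Node` type and function names; it returns only whether the key is present, as the `return false;` fragment above suggests.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Recursive search: an empty subtree means the key is absent.
bool searchRecursive(const Node* t, int key) {
    if (t == nullptr) return false;
    if (key == t->key) return true;
    if (key < t->key) return searchRecursive(t->left, key);
    return searchRecursive(t->right, key);
}

// Iterative search: a loop that branches left or right at each node.
bool searchIterative(const Node* t, int key) {
    while (t != nullptr) {
        if (key == t->key) return true;            // found
        t = (key < t->key) ? t->left : t->right;
    }
    return false;                                  // not found
}
```

Each comparison discards one subtree, so the work is proportional to the height of the tree.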
In other words, we examine the root and recursively insert the new node to the left subtree if
the new value is less than the root, or the right subtree if the new value is greater than or
equal to the root.
Algorithm:
/* Inserts the node pointed to by "newNode" into the subtree rooted at "treeNode" */
• Deleting a leaf (node with no children): Deleting a leaf is easy, as we can simply
remove it from the tree.
• Deleting a node with one child: Delete it and replace it with its child.
• Deleting a node with two children: Call the node to be deleted "N". Do not delete N.
Instead, choose either its in-order successor node or its in-order predecessor node,
"R". Replace the value of N with the value of R, then delete R. (Note: R itself has up
to one child.)
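The three deletion cases can be sketched in C++ as below. Assumptions: equal keys go right on insertion, and the in-order successor (the leftmost node of the right subtree) is used as R. The names `insert`, `removeKey` and `inorderKeys` are hypothetical. The function returns the possibly new root of each subtree, so a parent's pointer is updated automatically.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

Node* insert(Node* t, int key) {
    if (t == nullptr) return new Node(key);
    if (key < t->key) t->left = insert(t->left, key);
    else t->right = insert(t->right, key);
    return t;
}

// Delete `key` from the subtree rooted at t; return the (possibly new) subtree root.
Node* removeKey(Node* t, int key) {
    if (t == nullptr) return nullptr;                       // key not present
    if (key < t->key) { t->left = removeKey(t->left, key); return t; }
    if (key > t->key) { t->right = removeKey(t->right, key); return t; }
    // t is the node N to delete.
    if (t->left == nullptr || t->right == nullptr) {        // leaf or one child
        Node* child = (t->left != nullptr) ? t->left : t->right;
        delete t;
        return child;                                       // nullptr for a leaf
    }
    // Two children: copy the in-order successor R into N, then delete R
    // from the right subtree (R has no left child).
    Node* r = t->right;
    while (r->left != nullptr) r = r->left;
    t->key = r->key;
    t->right = removeKey(t->right, r->key);
    return t;
}

void inorderKeys(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorderKeys(t->left, out);
    out.push_back(t->key);
    inorderKeys(t->right, out);
}
```

Take a tree built from 10, 3, 12, 1, 6, 11, 14, 5, 7. Deleting 3 (two children) replaces it with its successor 5. Deleting 14 (a leaf) simply removes it. Deleting 12 (one child) replaces it with 11.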
[Figure: examples 1 and 2 of deleting nodes from a binary search tree]
Exercises: Using the tree in the figure below as reference, what will be the new trees generated
when the nodes with the following keys are deleted?
(a) 3
(b) 14
(c) 6
(d) 12
[Figure: reference binary search tree]
3.5 B-Trees
When working with large sets of data, it is often not possible or desirable to maintain the
entire structure in primary storage (RAM). Instead, a relatively small portion of the data
structure is maintained in primary storage, and additional data is read from secondary storage
as needed. Unfortunately, a magnetic disk, the most common form of secondary storage, is
significantly slower than random access memory (RAM). In fact, the system often spends
The B-tree is a generalization of a binary search tree in that more than two paths diverge
from a single node.
Definition
A B-tree of order m (the maximum number of children for each node) is a tree which satisfies
the following properties:
Unlike a binary tree, each node of a B-tree may have a variable number of keys and children.
The keys are stored in non-decreasing order. Each key has an associated child that is the root
of a subtree containing all nodes with keys less than or equal to the key but greater than the
preceding key. A node also has an additional rightmost child that is the root for a subtree
containing all keys greater than any keys in the node.
The number of branches (or child nodes) from a node will be one more than the number of
keys stored in the node.
[Figure: a B-tree whose root holds the keys 4, 10 and 16, with four leaf children holding 1 2 3 | 6 7 9 | 11 12 15 | 17 22 30]
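Searching a B-tree generalizes binary search tree search: scan the keys of a node, and if the key is not there, descend into the child whose key range could contain it. This is a C++ sketch with a hypothetical `BNode` type in which an empty `children` vector marks a leaf. It rebuilds the example B-tree in the figure by hand instead of by insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A B-tree node: keys in non-decreasing order and, for an internal node,
// keys.size() + 1 children. An empty children vector marks a leaf.
struct BNode {
    std::vector<int> keys;
    std::vector<BNode*> children;
};

bool btreeSearch(const BNode* node, int key) {
    while (node != nullptr) {
        std::size_t i = 0;
        while (i < node->keys.size() && key > node->keys[i]) ++i;  // first key >= target
        if (i < node->keys.size() && key == node->keys[i]) return true;
        if (node->children.empty()) return false;                  // reached a leaf
        node = node->children[i];     // child i holds keys between keys[i-1] and keys[i]
    }
    return false;
}
```

Searching the example tree for 12 passes 4 and 10 in the root, then descends into the third child {11, 12, 15}.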
Search the tree to find the leaf node where the new element should be added. Insert the new
element into that node with the following steps:
1. If the node contains fewer than the maximum legal number of elements, then there is
room for the new element. Insert the new element in the node, keeping the node's
elements ordered.
2. Otherwise the node is full, so evenly split it into two nodes.
1. A single median is chosen from among the leaf's elements and the new
element.
2. Values less than the median are put in the new left node and values greater
than the median are put in the new right node, with the median acting as a
separation value.
3. Insert the separation value in the node's parent, which may cause it to be split,
and so on. If the node has no parent (i.e., the node was the root), create a new
root above this node (increasing the height of the tree).
E.g: The figure below displays the steps used to insert the numbers 1, 2, 3, 4, 5, 6, 7.
Assignment:
Using diagrams, show how the following numbers are inserted into a B-tree of order 4:
5,12, 18,7, 15, 12, 4, 3, 1, 16, 19, 20, 32, 50, 43, 28, 76, 100,85,96
The simplest method for searching is called the sequential search. Simply move through the
array from beginning to end, stopping when you have found the value you require.
#include<iostream>
using namespace std;
int main(){
result = SearchAges(ages,10,5); //we are searching for 10 and the array size is 5
if(result>-1)
system("PAUSE");
}
/* A function to return an age from an array if it exists
* "ages[]" is the array containing the ages, "age" is the age
* we are looking for, and n is the size of the age array.
*/
int SearchAges(int ages[], int age, int n)
{
int j;
for(j=0; j<n; j++){
if(ages[j] == age){
return ages[j];
}
}
return -1;
}
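SearchAges returns the matching value itself. A common variant returns the position of the match instead, which also tells the caller where the value is. This is a C++ sketch of that variant; the name `sequentialSearch` and the sample ages in the usage note are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Sequential search: return the index of the first element equal to key, or -1 if absent.
int sequentialSearch(const std::vector<int>& a, int key) {
    for (int j = 0; j < static_cast<int>(a.size()); ++j) {
        if (a[j] == key) return j;     // stop as soon as the value is found
    }
    return -1;                         // reached the end without finding it
}
```

With ages {21, 34, 10, 18, 52}, searching for 10 returns index 2, and searching for 99 returns -1. In the worst case, all n elements are examined.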
#include <iostream>
using namespace std;
int main(void)
{
int a[] = {4,7,19,25,36,37,50,100,105,205,220,271,301,321};
system("PAUSE");
}
return m; //found
else{
m = (l+r)/2;
if(k == a[m])
return m;
else{
if (k > a[m])
return RecBin(k, a, m+1, r);
else
return RecBin(k, a, l, m-1);
}
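Here is a complete, runnable version of binary search on the sorted array from the example above, in iterative and recursive forms. The recursive function keeps the shape of the RecBin fragment but takes a `std::vector` (an assumption); `binarySearch` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Iterative binary search on a sorted array; returns the index of k or -1.
int binarySearch(const std::vector<int>& a, int k) {
    int l = 0, r = static_cast<int>(a.size()) - 1;
    while (l <= r) {
        int m = l + (r - l) / 2;       // middle index, written to avoid overflow of l + r
        if (k == a[m]) return m;       // found
        if (k > a[m]) l = m + 1;       // continue in the right half
        else r = m - 1;                // continue in the left half
    }
    return -1;                         // not found
}

// Recursive version in the style of RecBin(k, a, l, r).
int RecBin(int k, const std::vector<int>& a, int l, int r) {
    if (l > r) return -1;              // empty range: not found
    int m = (l + r) / 2;
    if (k == a[m]) return m;
    if (k > a[m]) return RecBin(k, a, m + 1, r);
    return RecBin(k, a, l, m - 1);
}
```

In the 14-element array, 100 is found at index 7. Each step halves the remaining range, so no more than 4 probes are needed.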
In binary searching, we simply used the middle of an ordered list as a best guess as to where
to begin the search. Now we use an interpolation involving the key, the start of the list and
the end.
In each search step it calculates where in the remaining search space the sought item might be
based on the key values at the bounds of the search space and the value of the sought key,
through linear interpolation. The key value actually found at this estimated position is then
compared to the key value being sought. If it is not equal, then depending on the comparison,
the remaining search space is reduced to the part before or after the estimated position.
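If l = 0, r = n − 1 and k is the value we are looking for, then the formula for the probe
position in interpolation search is
$m = l + \dfrac{(r - l)(k - a[l])}{a[r] - a[l]}$
For example, with l = 0, r = 15, a[0] = 0, a[15] = 30 and k = 20:
$m = 0 + \left\lceil \dfrac{(15 - 0)(20 - 0)}{30 - 0} \right\rceil = 10$
a[m] = a[10] = 20, which is equal to the key we are looking for.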
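Here is a C++ sketch of interpolation search using the probe formula above. Assumptions: the array is sorted; integer division (rounding down) replaces the ceiling in the worked example, which makes no difference there because 15 × 20 / 30 is exactly 10. The test array a[i] = 2i is a made-up array that is consistent with that example. `interpolationSearch` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Interpolation search on a sorted array; returns the index of k or -1.
int interpolationSearch(const std::vector<int>& a, int k) {
    int l = 0, r = static_cast<int>(a.size()) - 1;
    while (l <= r && k >= a[l] && k <= a[r]) {
        if (a[r] == a[l]) {                       // all keys in range equal: avoid dividing by 0
            return (a[l] == k) ? l : -1;
        }
        // Estimate the position by linear interpolation between a[l] and a[r].
        int m = l + static_cast<int>((static_cast<long long>(r - l) * (k - a[l])) / (a[r] - a[l]));
        if (a[m] == k) return m;
        if (a[m] < k) l = m + 1;                  // search after the estimate
        else r = m - 1;                           // search before the estimate
    }
    return -1;
}
```

With a[i] = 2i for i = 0 to 15, a search for 20 probes index 10 first and finds the key immediately.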
Searching and sorting algorithms have a time complexity associated with them, usually expressed
in big-O notation:
• Sequential Search : O(n)
• Binary Search : O(log n)
• Interpolation Search : O(log log n) on average for uniformly distributed keys (O(n) in the worst case)
Ideally, the hash function will assign each key to a unique bucket, but most hash table designs
employ an imperfect hash function, which might cause hash collisions where the hash function
generates the same index for more than one key. Such collisions are typically accommodated
in some way.
In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is
independent of the number of elements stored in the table. Many hash table designs also allow
arbitrary insertions and deletions of key–value pairs, at (amortized) constant average cost per
operation – O(1). The following figure 5.1 shows a sample hash table
Hash functions are mostly used to speed up table lookup or data comparison tasks such as
finding items in a database. The following is an example of a hash function for inserting integer
keys into a hash table: h(K) = K mod n, where n is the size of the array. Therefore, a hash
Example 1
As an example, let us use an array of size 11 to store some airport codes such as GHA, PHL,
DCA, FRA, ORY, GCM, etc.
From above, each airport code can be seen as a three letter string X2X1X0 and we assume
the letter ’A’ has an integer value 0, ’B’ has the value 1, ‘C’ has the value 2, etc.
If our hash function is as below:
h(K) = (X2 * 26² + X1 * 26 + X0) mod 11
where mod (modulus) gives the remainder after division.
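Applying this to K = "DCA", we can hash DCA as follows: X2 = D = 3, X1 = C = 2 and X0 = A = 0
h("DCA") = (3 * 676 + 2 * 26 + 0) mod 11 = 2080 mod 11 = 1
Applying this to K = "PHL": X2 = P = 15, X1 = H = 7 and X0 = L = 11
h("PHL") = (15 * 676 + 7 * 26 + 11) mod 11 = 10333 mod 11 = 4
Applying this to K = "HKG", we can hash HKG as follows: X2 = H = 7, X1 = K = 10 and X0 = G = 6
h("HKG") = (7 * 676 + 10 * 26 + 6) mod 11 = 4998 mod 11 = 4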
It can be seen that the resulting hash value for HKG is the same as that of PHL. Hence the two
different keys have the same destination index, and a collision occurs.
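The airport-code hash can be checked with a few lines of C++. This is a sketch that assumes codes of uppercase letters only. The name `airportHash` is hypothetical. The loop uses Horner's rule, ((X2 × 26) + X1) × 26 + X0, which equals X2 × 26² + X1 × 26 + X0.

```cpp
#include <cassert>
#include <string>

// h(K) = (X2 * 26^2 + X1 * 26 + X0) mod tableSize, with 'A' = 0, 'B' = 1, ..., 'Z' = 25.
int airportHash(const std::string& code, int tableSize = 11) {
    int value = 0;
    for (char ch : code) {
        value = value * 26 + (ch - 'A');   // Horner's rule over the letters
    }
    return value % tableSize;
}
```

The function reproduces the hand calculations: DCA maps to 1, and both PHL and HKG map to 4, which is the collision described above.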
Inserting “PHL”, “ORY” and “GCM”:
In open addressing with linear probing, when a collision occurs a free slot is sought by simply
incrementing the array index and checking whether the current index is available.
[Figure: a table with slots 0 to 10; after a collision at index I, linear probing tries I, I+1, I+2, ..., wrapping around to 0]
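Here is a C++ sketch of linear probing with the airport-code hash from the example. Assumptions: the hypothetical class `LinearProbingTable` stores uppercase codes; an empty string marks a free slot; the index wraps around to 0 at the end of the table, as in the figure.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Open-addressing table of airport codes using linear probing.
class LinearProbingTable {
public:
    explicit LinearProbingTable(int size) : slots_(size) {}

    // Returns the index where `code` was stored, or -1 if the table is full.
    int insert(const std::string& code) {
        int m = static_cast<int>(slots_.size());
        int home = letterValue(code) % m;
        for (int i = 0; i < m; ++i) {
            int idx = (home + i) % m;          // try I, I+1, I+2, ... wrapping to 0
            if (slots_[idx].empty()) {
                slots_[idx] = code;
                return idx;
            }
        }
        return -1;
    }

private:
    // X2 * 26^2 + X1 * 26 + X0 with 'A' = 0, computed by Horner's rule.
    static int letterValue(const std::string& code) {
        int value = 0;
        for (char ch : code) value = value * 26 + (ch - 'A');
        return value;
    }

    std::vector<std::string> slots_;           // empty string marks a free slot
};
```

Inserting PHL, then HKG, then DCA into a table of size 11 places them at indices 4, 5 (HKG collides at 4 and moves to the next free slot) and 1.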
Quadratic probing can be a more efficient algorithm in an open addressing table, since it better
avoids the clustering problem that can occur with linear probing, although it is not immune. It
also provides good memory caching because it preserves some locality of reference; however,
linear probing has greater locality and, thus, better cache performance.
Quadratic function
Let h(k) be a hash function that maps an element k to an integer in [0, m−1], where m is the
size of the table. Let the ith probe position for a value k be given by the function
$$h(k, i) = (h(k) + c_1 i + c_2 i^2) \bmod m, \quad c_2 \neq 0$$
• If $h(k, i) = (h(k) + i + i^2) \bmod m$, then the probe sequence will be h(k), h(k)+2, h(k)+6, ...
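• For $m = n^p$, where m, n and p are integers greater than or equal to 2 (this degrades to linear
probing when p = 1), $h(k, i) = (h(k) + i + n i^2) \bmod m$ gives a cycle of all distinct probes. It can
be computed in a loop as $h(k, 0) = h(k)$ and $h(k, i+1) = (h(k, i) + 2in + n + 1) \bmod m$.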
• For any m, a full cycle with quadratic probing can be achieved by rounding m up to the closest
power of 2, computing the probe index as $h(k, i) = (h(k) + (i^2 + i)/2) \bmod \mathrm{roundUp2}(m)$, and
skipping the iteration when $h(k, i) \ge m$. There are at most $\mathrm{roundUp2}(m) - m < m/2$ skipped
iterations, and these iterations do not access memory, so skipping them is fast on most modern
processors.
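The "for any m" rule in the last bullet can be sketched in C++ as below. The function names are ours. The sketch relies on the fact that, for a power of two M, the triangular numbers i(i+1)/2 taken mod M for i = 0 .. M−1 are all distinct, so every slot below m appears exactly once.

```cpp
#include <vector>

// Smallest power of two that is >= m (m >= 1).
unsigned roundUp2(unsigned m) {
    unsigned p = 1;
    while (p < m) p <<= 1;
    return p;
}

// Probe order for home slot h (0 <= h < m): offsets are the triangular numbers
// i(i+1)/2 taken mod roundUp2(m); indices >= m are skipped without touching the table.
std::vector<unsigned> quadraticProbeOrder(unsigned h, unsigned m) {
    unsigned M = roundUp2(m);
    std::vector<unsigned> order;
    for (unsigned i = 0; i < M; ++i) {
        unsigned idx = (h + i * (i + 1) / 2) % M;
        if (idx < m) order.push_back(idx);
    }
    return order;
}
```

For m = 11 (rounded up to 16) and home slot 4, the order is 4, 5, 7, 10, 3, 9, 0, 8, 1, 6, 2: all 11 slots, each visited once, with 5 skipped iterations.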
With quadratic probing (except for the triangular-number case on a hash table of size $2^n$), there
is no guarantee of finding an empty cell once the table becomes more than half full, or even before
this if the table size is composite, because collisions must be resolved using at most half of the table.
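The opposite guarantee holds when the table size is prime, which can be proven as follows. Suppose a
hash table has size p (a prime greater than 3), with an initial location h(k) and two alternative
locations $(h(k) + x^2) \bmod p$ and $(h(k) + y^2) \bmod p$, where $0 \le x, y \le \lfloor p/2 \rfloor$. If these two
locations point to the same slot but $x \neq y$, then
• $h(k) + x^2 \equiv h(k) + y^2 \pmod p$
• $x^2 \equiv y^2 \pmod p$
• $x^2 - y^2 \equiv 0 \pmod p$
• $(x - y)(x + y) \equiv 0 \pmod p$.
Since p is prime, p must divide either (x − y) or (x + y). But x ≠ y and both lie in [0, ⌊p/2⌋], so
0 < |x − y| < p and 0 < x + y < p, and neither factor can be a multiple of p. This contradiction shows
that the first ⌊p/2⌋ + 1 probe positions are all distinct, so an empty slot is always found as long
as the table is at most half full.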
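Because of this guarantee, an insert into a table of prime size p only needs to try i = 0, 1, ..., ⌊p/2⌋. The sketch below uses the simplest quadratic sequence (h(k) + i²) mod p (c1 = 0, c2 = 1) and the integer hash h(k) = k mod p. The function name and the use of −1 as an empty-slot marker are our assumptions, so keys must be non-negative.

```cpp
#include <vector>

// Inserts a non-negative key into a table of prime size p using
// h(k, i) = (h(k) + i*i) mod p with h(k) = k mod p.
// Only i = 0 .. p/2 (integer division) are tried: these positions are distinct,
// so insertion succeeds while at most p/2 slots are occupied.
// -1 marks an empty slot. Returns the slot used, or -1 on failure.
int insertQuadratic(std::vector<int>& table, int key) {
    int p = static_cast<int>(table.size());
    int home = key % p;
    for (int i = 0; i <= p / 2; ++i) {
        int slot = (home + i * i) % p;
        if (table[slot] == -1) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

In a table of size 11, the keys 4, 15, 26 and 37 all hash to 4. They land in slots 4, 5 (4 + 1), 8 (4 + 4) and 2 ((4 + 9) mod 11).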
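Double hashing is a technique used in conjunction with open addressing in hash tables to resolve
hash collisions, by using a secondary hash of the key as an offset when a collision occurs. It
operates on a table T.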
The double hashing technique uses one hash value as an index into the table and then repeatedly
steps forward an interval until the desired value is located, an empty location is reached, or the
entire table has been searched; but this interval is set by a second, independent hash function.
Unlike the alternative collision-resolution methods of linear probing and quadratic probing, the
interval depends on the data, so that values mapping to the same location have different bucket
sequences; this minimizes repeated collisions and the effects of clustering.
Given two random, uniform and independent hash functions h1 and h2, the ith location in the
bucket sequence for value k in a hash table of |T| buckets is
$$h(i, k) = (h_1(k) + i \cdot h_2(k)) \bmod |T|$$
Generally, h1 and h2 are selected from a set of universal hash functions; h1 is selected to
have a range of {0, ..., |T| − 1} and h2 to have a range of {1, ..., |T| − 1}. Double hashing approximates
a random distribution; more precisely, pair-wise independent hash functions yield a probability
of $(n/|T|)^2$ that any pair of keys will follow the same bucket sequence.
For instance, Figure 5.3 shows how the collision in the airport-code example is handled with
double hashing.
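The text does not give the second hash function used for Figure 5.3, so the sketch below uses a hypothetical h2(K) = 1 + (v mod (m − 1)), where v = X2·26² + X1·26 + X0. Its range is 1 .. m − 1, as required above. Because m = 11 is prime, every step size is coprime with m, so the probe sequence can reach every slot. All function names are illustrative.

```cpp
#include <string>
#include <vector>

// Numeric value X2*26^2 + X1*26 + X0 of a three-letter code ('A' = 0).
int codeValue(const std::string& code) {
    return (code[0] - 'A') * 676 + (code[1] - 'A') * 26 + (code[2] - 'A');
}

// Primary hash from the example: range 0 .. m-1.
int h1(const std::string& code, int m) {
    return codeValue(code) % m;
}

// HYPOTHETICAL secondary hash: range 1 .. m-1, so the step is never 0.
int h2(const std::string& code, int m) {
    return 1 + codeValue(code) % (m - 1);
}

// Double hashing insert: probe (h1 + i * h2) mod m for i = 0, 1, 2, ...
// An empty string marks a free slot. Returns the slot used, or -1 if none is free.
int insertDouble(std::vector<std::string>& table, const std::string& code) {
    int m = static_cast<int>(table.size());
    int start = h1(code, m);
    int step = h2(code, m);
    for (int i = 0; i < m; ++i) {
        int slot = (start + i * step) % m;
        if (table[slot].empty()) {
            table[slot] = code;
            return slot;
        }
    }
    return -1;
}
```

PHL is placed in slot 4. HKG also has h1 = 4, and under this assumed h2 its step is 9, so it lands in (4 + 9) mod 11 = 2 rather than in the neighbouring slot that linear probing would use.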
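Because an element's position is computed directly from its key, accessing an element in a hash
table takes O(1) time on average, assuming a good hash function and a table that is not too full.
In the worst case, when many keys collide, a lookup degrades to O(n).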
6.0 SORTING
In computer science, a sorting algorithm is an algorithm that puts the elements of a list in a
certain order. The most-used orders are numerical order and lexicographical order. Efficient
sorting is important for optimizing the use of other algorithms (such as search algorithms) that
require sorted lists to work correctly.
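The sketch below makes the difference between the two orders concrete. It uses std::sort from <algorithm>, which compares elements with operator<. For int that is a numeric comparison, and for std::string it is a character-by-character (lexicographical) comparison. The function names are illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Numerical order: integers are compared by value.
std::vector<int> sortNumerically(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Lexicographical order: strings are compared character by character.
std::vector<std::string> sortLexicographically(std::vector<std::string> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

Sorting the numbers 10, 9, 2 gives 2 9 10. Sorting the strings "10", "9", "2" gives "10" "2" "9", because the first characters compare as '1' < '2' < '9'.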
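Bubble sort is a straightforward and simplistic method of sorting data. The algorithm starts at
the beginning of the data set. It compares the first two elements, and if the first is greater than
the second, it swaps them. It continues doing this for each pair of adjacent elements to the end of
the data set, then starts again from the first two elements, repeating until a pass makes no swaps.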
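Algorithm
int x, tmp, swapped;          /* n is the number of elements in array */
do
{
    swapped = 0;
    for (x = 0; x < n - 1; x++)
    {
        if (array[x] > array[x + 1])
        {
            swapped = 1;      /* this pass made a swap, so another pass is needed */
            tmp = array[x];   /* swap the adjacent pair */
            array[x] = array[x + 1];
            array[x + 1] = tmp;
        }
    }
} while (swapped);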
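An example (the array after each pass):
3 1 4 1 5 9 2 6 5 4   (initial array)
1 3 1 4 5 2 6 5 4 9   (after pass 1)
1 1 3 4 2 5 5 4 6 9   (after pass 2)
1 1 3 2 4 5 4 5 6 9   (after pass 3)
1 1 2 3 4 4 5 5 6 9   (after pass 4; pass 5 makes no swaps, so the loop ends)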
Complexity: Bubble sort average case and worst case are both O(n²)
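Here is a self-contained C++ sketch of the loop above. The name bubbleSortTrace is ours. Besides sorting, it records the array after each pass so that the output can be compared with the example. It also adds a common optimisation that the notes' version does not have: each pass stops one position earlier, because the largest remaining element has already bubbled to the end.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v in ascending order with bubble sort and returns a copy of v after each pass.
std::vector<std::vector<int>> bubbleSortTrace(std::vector<int>& v) {
    std::vector<std::vector<int>> passes;
    bool swapped = true;
    std::size_t end = v.size();           // v[end..] already holds its final values
    while (swapped && end > 1) {
        swapped = false;
        for (std::size_t x = 0; x + 1 < end; ++x) {
            if (v[x] > v[x + 1]) {        // adjacent pair out of order: swap it
                std::swap(v[x], v[x + 1]);
                swapped = true;
            }
        }
        --end;                            // largest remaining element is now in place
        passes.push_back(v);
    }
    return passes;
}
```

On 3 1 4 1 5 9 2 6 5 4 it makes five passes, the last one with no swaps. The first two recorded rows match the example.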
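Selection sort algorithm:
int x, y, min, tmp;           /* n is the number of elements in array */
for (x = 0; x < n - 1; x++)
{
    min = x;                  /* index of the smallest element found so far */
    for (y = x + 1; y < n; y++)
    {
        if (array[y] < array[min])
        {
            min = y;
        }
    }
    /* swap the smallest remaining element into position x */
    tmp = array[x];
    array[x] = array[min];
    array[min] = tmp;
}
An example (the initial array, then the array after each pass):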
3 1 4 7 5 9 10 6 8 2
1 3 4 7 5 9 10 6 8 2
1 2 4 7 5 9 10 6 8 3
1 2 3 7 5 9 10 6 8 4
1 2 3 4 5 9 10 6 8 7
1 2 3 4 5 9 10 6 8 7
1 2 3 4 5 6 10 9 8 7
1 2 3 4 5 6 7 9 8 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
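Here is a self-contained C++ sketch of the selection sort above. The name selectionSortTrace is ours. It records the array after each of the n − 1 passes, so its output can be compared row by row with the example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort: on pass x, find the smallest element in v[x..n-1] and swap it into v[x].
// Returns a copy of v after each pass (n - 1 passes for n elements).
std::vector<std::vector<int>> selectionSortTrace(std::vector<int>& v) {
    std::vector<std::vector<int>> passes;
    for (std::size_t x = 0; x + 1 < v.size(); ++x) {
        std::size_t smallest = x;
        for (std::size_t y = x + 1; y < v.size(); ++y) {
            if (v[y] < v[smallest]) smallest = y;
        }
        std::swap(v[x], v[smallest]);     // no change if v[x] was already the smallest
        passes.push_back(v);
    }
    return passes;
}
```

For 3 1 4 7 5 9 10 6 8 2 this produces the nine rows of the example. Passes 5 and 9 leave the array unchanged because the smallest remaining element is already in place.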
6.3 Quicksort
Quicksort is a divide-and-conquer algorithm that relies on a partition operation: to partition
an array, we choose an element, called a pivot, move all smaller elements before the pivot,
and move all greater elements after it. We then recursively sort the lesser and greater sublists.
If the array contains only one element or zero elements, then the array is already sorted.
• Select an element from the array. This element is called the "pivot element". For
example, select the element in the middle of the array.
• All elements which are smaller than the pivot element are placed in one array and all
elements which are larger are placed in another array.
• Sort both arrays by recursively applying Quicksort to them.
• Combine the sorted arrays.
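The list-based steps above can be sketched in C++ as an out-of-place quicksort. The function name is ours. The steps only mention smaller and larger elements, so this sketch adds one assumption: elements equal to the pivot go into a third list. That keeps duplicates such as the two 7s from being lost or recursed on forever.

```cpp
#include <vector>

// Out-of-place quicksort: pick the middle element as pivot, split into
// smaller / equal / larger lists, sort the outer lists recursively, then combine.
std::vector<int> quickSortCopy(const std::vector<int>& a) {
    if (a.size() <= 1) return a;            // zero or one element: already sorted
    int pivot = a[a.size() / 2];
    std::vector<int> smaller, equal, larger;
    for (int x : a) {
        if (x < pivot) smaller.push_back(x);
        else if (x > pivot) larger.push_back(x);
        else equal.push_back(x);
    }
    std::vector<int> result = quickSortCopy(smaller);
    result.insert(result.end(), equal.begin(), equal.end());
    std::vector<int> right = quickSortCopy(larger);
    result.insert(result.end(), right.begin(), right.end());
    return result;
}
```

This version is easy to follow but allocates new arrays at every level. The in-place version avoids that cost.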
Quicksort can be implemented to sort "in-place". This means that the sorting takes place within
the array itself and no additional array needs to be created.
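void quickSort(int arr[], int left, int right)
{
    int i = left, j = right;
    int tmp;
    int pivot = arr[(left + right) / 2];   /* middle element as the pivot */

    /* partition */
    while (i <= j) {
        while (arr[i] < pivot)             /* skip elements already on the correct side */
            i++;
        while (arr[j] > pivot)
            j--;
        if (i <= j) {                      /* swap the out-of-place pair */
            tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
            i++;
            j--;
        }
    }

    /* recursion */
    if (left < j)
        quickSort(arr, left, j);
    if (i < right)
        quickSort(arr, i, right);
}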
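Example (pivot = 7, the middle element; i scans from the left, j from the right):
1 12 5 26 7 14 3 7 2    i stops at 12 and j stops at 2 (12 >= 7 >= 2): swap 12 and 2
1 2 5 26 7 14 3 7 12    i stops at 26 and j stops at 7: swap 26 and 7
1 2 5 7 7 14 3 26 12    i stops at 7 and j stops at 3: swap 7 and 3
1 2 5 7 3 14 7 26 12    i stops at 14 and j stops at 3; now j < i, so partitioning ends
We partition the list using j as the right index of the first sublist and i as the left index of
the second sublist. The sublists 1 2 5 7 3 and 14 7 26 12 are then sorted recursively, giving the
sorted list:
1 2 3 5 7 7 12 14 26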
Heapsort works like selection sort: it determines the largest (or smallest) element of the list,
places it at the end (or beginning) of the list, then continues with the rest of the list, but it
accomplishes this task efficiently by using a data structure called a heap.
Once the data list has been made into a heap, the root node is guaranteed to be the largest (or
smallest) element. When it is removed and placed at the end of the list, the heap is rearranged
so the largest element remaining moves to the root. Using the heap, finding the next largest
element takes O(log n) time, instead of O(n) for a linear scan as in simple selection sort. This
allows Heapsort to run in O(n log n) time.
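Heapsort can be sketched in C++ using a binary max-heap stored directly in the array, with the children of index i at 2i + 1 and 2i + 2. This array layout and the function names are our assumptions; the notes do not specify them. Building the heap takes O(n) time, and each of the n − 1 removals costs O(log n), which gives the O(n log n) total.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restores the max-heap property for the subtree rooted at i, using only v[0..n-1].
void siftDown(std::vector<int>& v, std::size_t i, std::size_t n) {
    while (true) {
        std::size_t largest = i;
        std::size_t left = 2 * i + 1, right = 2 * i + 2;
        if (left < n && v[left] > v[largest]) largest = left;
        if (right < n && v[right] > v[largest]) largest = right;
        if (largest == i) return;         // both children are smaller: done
        std::swap(v[i], v[largest]);
        i = largest;                      // continue sifting down
    }
}

void heapSort(std::vector<int>& v) {
    std::size_t n = v.size();
    // Build a max-heap: sift down every non-leaf node, starting from the last one.
    for (std::size_t i = n / 2; i-- > 0;) siftDown(v, i, n);
    // Move the root (largest) to the end, shrink the heap, and restore it.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(v[0], v[end - 1]);
        siftDown(v, 0, end - 1);
    }
}
```

Applied to the quicksort example 1 12 5 26 7 14 3 7 2, heapSort leaves the vector as 1 2 3 5 7 7 12 14 26.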
Assignment:
1. Write C++ implementation for Bubble Sort
2. Write C++ implementation for Binary Search
3. Write a C++ program that accepts a list of integers from the user, sorts it and prints out
the sorted list.
4. Write a C++ program that accepts a list of real numbers and prints the list to the screen in
reverse order.
5. Write down all the steps for performing binary sort on the following set of data:
5 45 3 6 23 7 9 2 15 3 1 39 50 30
given number
* Department 2
***************
1. xxxxxx xxxxx xxxx X xx.xx
2. xxxxxx xxxxx xxxx X xx.xx
...
n. xxxxxx xxxxx xxxx X xx.xx
------------------------
No. of Staff : n
------------------------
...
* Department n
***************
1. xxxxxx xxxxx xxxx X xx.xx
2. xxxxxx xxxxx xxxx X xx.xx
...
n. xxxxxx xxxxx xxxx X xx.xx
------------------------
No. of Staff : n
------------------------
7. Ask the user for a staff number, then search for and display that staff member's record on
the screen.
8. Ask the user for a staff number and print to the screen only the name and department of
that staff member.