CS312 - ATBU Lecture Note 7 3-03-2025 Data Structure
Faculty of Computing
Abubakar Tafawa Balewa University, Bauchi
INTRODUCTION
OVERVIEW
Data Structures are the programmatic way of storing data so that it can be used efficiently.
Different kinds of data structures are suited to different kinds of applications, and some are
highly specialized to specific tasks. Almost every enterprise application uses various types of
data structures in one way or another. This course provides students with an understanding of
fundamental data structures and helps them develop algorithms for managing the complexity
of enterprise-level applications. In a nutshell, the primary concern of this course is to improve
efficiency in computing.
EFFICIENCY
A solution is said to be efficient if it solves the problem within its resource constraints. Space
and time are typical constraints of a program. To write efficient programs, one needs to
organize the data in such a way that it can be accessed and manipulated efficiently.
solutions.
3. Develop proficiency in the specification, representation, and implementation of Data
Types and Data Structures.
To solve these problems, data need to be organized in such a way that not all items have to be
examined and the required data can be found almost instantly. The need for data structures,
therefore, is to manage the complexity of an application by creating models that allow us to
describe the data that our algorithms will manipulate in a way that is consistent with the
problem.
DATA TYPES
All data items in the computer are represented as strings of binary digits (0’s and 1’s). In order
to give these strings meaning, there is a need for data types. Data types provide an
interpretation of binary data, so that the strings of binary digits represent meaningful values
with respect to the problem being solved.
ABSTRACTION
Computer scientists use abstraction to allow them to focus on the “bigger problem” without
getting lost in the details. Abstraction allows us to view problems and solutions by separating
the so-called logical and physical perspectives.
Abstract Data Type
The user interacts with the interface, using the operations that have been specified by the
abstract data type. The abstract data type is the shell that the user interacts with; the
implementation is hidden one level deeper. The user is not concerned with the details of the
implementation.
Example: The abstract view of a television:
1. Ability to change channels, or adjust volume.
2. TV displays the show to watch
3. Don't care: who made the TV, or how the circuitry inside was constructed.
DATA STRUCTURE
The physical implementation of an Abstract Data Type (ADT) is often referred to as a data
structure. Each operation associated with the ADT is implemented by one or more subroutines in
the implementation. Since there are many different ways to implement an ADT, this
implementation independence allows the programmer to switch the details of the
implementation without changing the way the user of the data interacts with it. The user can
remain focused on the problem-solving process.
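For illustration, a minimal Python sketch of this idea is shown below: two different implementations provide the same abstract interface (add and average), and the user's code works with either one unchanged. The class and method names are illustrative only.

class ListBag:
    """Stores the numbers in a Python list."""
    def __init__(self):
        self._items = []
    def add(self, item):
        self._items.append(item)
    def average(self):
        return sum(self._items) / len(self._items)

class RunningBag:
    """Stores only a running total and a count."""
    def __init__(self):
        self._total = 0
        self._count = 0
    def add(self, item):
        self._total += item
        self._count += 1
    def average(self):
        return self._total / self._count

# The user of the ADT does not care which implementation is used.
for bag in (ListBag(), RunningBag()):
    bag.add(10)
    bag.add(20)
    print(bag.average())      # 15.0 from both implementations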
CLASSIFICATION OF DATA STRUCTURES
Data structures can be classified as either linear or non-linear, based on how the data is
conceptually organized or aggregated.
1. Linear structures. The array, list, linked list, queue, dequeue and stack belong to
this category. Each of them is a collection that stores its entries in a linear sequence,
and in which entries may be added or removed at will. They differ in the restrictions
they place on how these entries may be added, removed, or accessed. The common
restrictions include First-In First-Out (FIFO) and Last-In First-Out (LIFO).
2. Non-linear structures. Trees and graphs are classical non-linear structures. Data
entries are not arranged in a sequence, but with different rules.
Example: Suppose you are hired to create a database of names for all of a company's management
and employees. You can organize it in the form of a list or a tree, as illustrated below.
ALGORITHM
An algorithm is a step-by-step procedure which defines a set of instructions to be executed in a
certain order to get the desired output. Algorithms are generally created independently of the
underlying languages (i.e. an algorithm can be implemented in more than one programming
language). Below are some important categories of algorithms:
1. Search: Algorithm to search an item in a data structure.
2. Sort: Algorithm to sort items in a certain order.
3. Insert: Algorithm to insert item in a data structure.
4. Update: Algorithm to update an existing item in a data structure.
5. Delete: Algorithm to delete an existing item from a data structure.
CHARACTERISTICS OF AN ALGORITHM
An algorithm should have the following characteristics, meaning not all procedures can be
called an algorithm.
1. It must be feasible with the available resources.
2. It must be correct.
3. It must be composed of a series of concrete steps.
4. It must give step-by-step directions, with no ambiguity as to which step will
be performed next.
5. It must be composed of a finite number of steps.
6. It must terminate.
Example: an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b and c
Step 3 − define values of a and b
Step 4 − add values of a and b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
In the design and analysis of algorithms, usually the second method is used to describe an
algorithm. It makes it easy for the analyst to analyze the algorithm while ignoring all unwanted
definitions. There is more than one way to design an algorithm for a particular problem;
hence, many solution algorithms can be derived for a given problem. This is illustrated in the
figure below.
The next step is to analyze the proposed solution algorithms and implement the most suitable
one.
ALGORITHM ANALYSIS
Efficiency of an algorithm can be analyzed at two different stages, before implementation and
after implementation. These are described below.
A Priori Analysis: This is a theoretical analysis of an algorithm before it is implemented
on a computer system. Efficiency of the algorithm is measured by assuming that all
other factors, such as processor speed, are constant and have no effect on the
implementation.
A Posteriori Analysis: This is an empirical analysis of an algorithm. The selected
algorithm is implemented using a programming language and then executed on a target
computer. In this analysis, actual statistics of the running time and space required are collected.
Algorithm analysis deals with the running time of various operations involved. The running
time of an operation can be defined as the number of computer instructions executed per
operation.
ALGORITHM COMPLEXITY
Suppose X is an algorithm and n is the size of the input data; the time and space used by the
algorithm X are the two main factors which decide the efficiency of X.
Time Factor: Time is measured by counting the number of key operations, such as
comparisons in a sorting algorithm.
Space Factor: Space is measured by counting the maximum memory space
required by the algorithm.
Therefore, complexity of an algorithm f(n) gives the running time and/or the storage space
required by the algorithm in terms of n as the size of input data.
TIME COMPLEXITY
Time complexity of an algorithm represents the amount of time required by the algorithm to
run to completion. Time requirements can be defined as a numerical function T(n), where
T(n) can be measured as the number of steps, provided each step consumes constant time.
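For example, in the small Python sketch below summing the n items of a list performs one addition per item, so T(n) is n plus a constant:

def sum_items(values):
    total = 0                 # 1 step, independent of n
    for v in values:          # the loop body runs n times
        total = total + v     # 1 addition per item -> n steps
    return total              # T(n) = n + constant

print(sum_items([3, 1, 4, 1, 5]))   # 14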
SPACE COMPLEXITY
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the
following two components:
A fixed part, which is the space required to store certain data and variables that are
independent of the size of the problem.
A variable part, which is the space required by variables whose size depends on the size of
the problem.
Space complexity S(P) of any algorithm P is S(P) = C + S(I), where C is the fixed part and
S(I) is the variable part of the algorithm, which depends on instance characteristics of (I).
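As a small worked illustration (a sketch, not from the original note): in the routine below, the scalar variables form the fixed part C, while the list of partial sums grows with the input and forms the variable part S(I), so S(P) is a constant plus a term proportional to n.

def partial_sums(values):
    total = 0                  # fixed part: a few scalar variables
    sums = []                  # variable part: grows with len(values)
    for v in values:
        total += v
        sums.append(total)     # stores n partial sums -> S(I) proportional to n
    return sums

print(partial_sums([1, 2, 3, 4]))   # [1, 3, 6, 10]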
ASYMPTOTIC ANALYSIS
Asymptotic analysis of an algorithm refers to computing running time of any operation in
mathematical units of computation. Using asymptotic analysis, one can conclude the best
case, average case, and worst case scenario of an algorithm. These are described below:
Best Case: Minimum time required for program execution.
Average Case: Average time required for program execution.
Worst Case: Maximum time required for program execution.
For example, the running time of one operation may be computed as f(n), while for another
operation it may be computed as g(n²). This means the running time of the first operation will
increase linearly with the increase in n, while the running time of the second operation will
increase quadratically as n increases. Conversely, the running times of both operations will be
nearly the same if n is very small.
ASYMPTOTIC NOTATIONS
Below are the commonly used asymptotic notations to calculate the running time
complexity of an algorithm.
Ο Notation
Ω Notation
θ Notation
BIG OH NOTATION, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running
time. It measures the worst case time complexity or the longest amount of time an algorithm
can possibly take to complete.
OMEGA NOTATION, Ω
The notation Ω(n) is the formal way to express the lower bound of an algorithm's running
time. It measures the best case time complexity or the best amount of time an algorithm can
possibly take to complete.
THETA NOTATION, Θ
The notation θ(n) is the formal way to express both the lower bound and the upper bound of
an algorithm's running time. Formally, a function g(n) is θ(f(n)) if there exist positive constants
c1, c2 and n0 such that c1·f(n) ≤ g(n) ≤ c2·f(n) for all n > n0.
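As a worked illustration (not from the original note): for g(n) = 3n + 2 we have 3n ≤ 3n + 2 ≤ 4n for every n ≥ 2, so taking c1 = 3, c2 = 4 and n0 = 2 shows that 3n + 2 is θ(n); the first inequality alone gives Ω(n) and the second alone gives Ο(n).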
GREEDY ALGORITHM
A greedy algorithm is a problem-solving technique that makes the best local choice at each
step in the hope of finding the global optimum solution. It prioritizes immediate benefits over
long-term consequences, making decisions based on the current situation without
considering future implications. While this approach can be efficient and straightforward, it
doesn’t guarantee the best overall outcome for all problems.
However, it is important to note that not all problems are suitable for greedy algorithms.
They work best when the problem exhibits the following properties:
1. Greedy choice property: a globally optimal solution can be reached by making locally optimal choices.
2. Optimal substructure: an optimal solution to the problem contains optimal solutions to its subproblems.
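For illustration, a classic sketch of a greedy strategy is making change with the largest coin that still fits at each step. The coin system 25, 10, 5, 1 assumed below is canonical, so the greedy choice happens to produce an optimal answer; for arbitrary coin systems it may not.

def greedy_change(amount, coins=(25, 10, 5, 1)):
    """Pick the largest coin that fits at each step (the greedy choice)."""
    result = []
    for coin in coins:                 # coins assumed sorted in decreasing order
        while amount >= coin:
            result.append(coin)
            amount -= coin
    return result

print(greedy_change(63))   # [25, 25, 10, 1, 1, 1]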
In a nutshell, a divide and conquer algorithm breaks a problem into smaller subproblems of the
same kind, solves the subproblems recursively, and then combines their solutions to solve the
original problem.
LINEAR DATA STRUCTURES
This section presents the study of data structures by considering five powerful linear data
structures: the stack, queue, dequeue, array and list. These are methods of data collection whose
items are ordered depending on how they are added or removed. Once an item is added, it stays
in that position relative to the other elements that came before and after it. Collections of this
kind are often referred to as linear data structures. Linear structures can be thought of as having
two ends, referred to as the “left” and “right”, the “front” and “rear”, or in some cases the “top”
and the “bottom.” What distinguishes one linear structure from another is the way in which items
are added and removed, in particular the location where these additions and removals occur.
STACK
A stack is an ordered collection of items where the addition of new items and the removal of the
existing items always take place at the same end. This end is commonly referred to as the “top.”
The other end, opposite the top, is known as the “base.” The base of the stack is significant
since items stored in the stack that are closer to the base represent those that have been in the
stack the longest. The most recently added item is the one that is in position to be removed first.
This ordering principle of the stack is called LIFO, last-in first-out.
Many examples of stacks occur in everyday situations. Almost any cafeteria has a stack of trays
or plates where the one at the top is taken, uncovering a new tray or plate for the next
customer in line. Imagine a stack of books on a desk, as shown in Figure 2.1 below. The only
book whose cover is visible is the one on top. To access the others in the stack, one needs to
remove the ones that are sitting on top of them. Figure 2.2 shows another example of a stack.
Figure 2.2: A Stack of Primitive Python Objects
Considering this reversal property, one can perhaps think of examples of stacks that occur while
operating a computer. For example, every web browser has a Back button. As you navigate from
one web page to another, those pages are placed on a stack (actually it is the URLs that go on
the stack). The current page that you are viewing is on the top and the first page you
looked at is at the base. If you click on the Back button, you begin to move back through the
pages in reverse order.
STACK ABSTRACT DATA TYPE
The stack abstract data type is defined as the logical description of how an ordered collection of
items is added to and removed from a stack. The descriptions of the stack operations
and their meanings are presented in Table 2.1 below, while Table 2.2 shows the results of a
sequence of stack operations.
APPLICATION OF STACK
Stacks are used extensively at every level of a modern computer system. Below are a few
applications of stacks:
1. Modern PCs use stacks at the architecture level; they are used in the basic design of an
operating system for interrupt handling and function calls.
2. Stacks are used to run a Java Virtual Machine, and the Java language itself has a class
called "Stack", which can be used by the programmer.
3. Another common use of stacks at the architecture level is as a means of allocating and
accessing memory.
QUEUE
A queue is an ordered collection of items where the addition of new items happens at one end,
called the “rear,” and the removal of existing items occurs at the other end, commonly called the
“front.” As an element enters the queue it starts at the rear and makes its way toward the front,
waiting until that time when it is the next element to be removed. The most recently added item
in the queue must wait at the end of the collection. This ordering principle is called FIFO,
first-in first-out. It is also known as “first-come first-served.” This is illustrated in Figure 2.4 below.
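A minimal list-based Python sketch of the FIFO behaviour described above (items join at the rear and leave from the front):

class Queue:
    def __init__(self):
        self.items = []
    def isEmpty(self):
        return self.items == []
    def enqueue(self, item):
        self.items.append(item)      # new items join at the rear
    def dequeue(self):
        return self.items.pop(0)     # items leave from the front (FIFO)
    def size(self):
        return len(self.items)

q = Queue()
q.enqueue('first')
q.enqueue('second')
print(q.dequeue())   # first  (first-come, first-served)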
DEQUEUE
A dequeue, also known as a double-ended queue, is an ordered collection of items
similar to the queue. It has two ends, a front and a rear, and the items remain positioned
in the collection. What makes a dequeue different is the unrestrictive nature of adding
and removing items. New items can be added at either the front or the rear. Likewise,
existing items can be removed from either end. In essence, this hybrid linear structure
provides all the capabilities of stacks and queues in a single data structure. Figure 2.5
below shows a dequeue of Python data objects. It is important to note that even though
the dequeue can assume many of the characteristics of stacks and queues; it does not
require the LIFO and FIFO orderings that are enforced by those data structures
(example, inserting a song into a playlist).
Dequeue Operation        Dequeue Contents           Return Value
d.addRear('dog')         ['dog',4]
d.addFront('cat')        ['dog',4,'cat']
d.addFront(True)         ['dog',4,'cat',True]
d.size()                 ['dog',4,'cat',True]       4
d.isEmpty()              ['dog',4,'cat',True]       False
d.addRear(8.4)           [8.4,'dog',4,'cat',True]
d.removeRear()           ['dog',4,'cat',True]       8.4
d.removeFront()          ['dog',4,'cat']            True
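A minimal Python sketch whose method names follow the operations in the table above; the front of the dequeue is taken to be the right end of the printed list and the rear the left end.

class Deque:
    def __init__(self):
        self.items = []
    def isEmpty(self):
        return self.items == []
    def addFront(self, item):
        self.items.append(item)       # front is the right end of the list
    def addRear(self, item):
        self.items.insert(0, item)    # rear is the left end of the list
    def removeFront(self):
        return self.items.pop()
    def removeRear(self):
        return self.items.pop(0)
    def size(self):
        return len(self.items)

d = Deque()
d.addRear(4)
d.addRear('dog')
d.addFront('cat')
d.addFront(True)
print(d.items)            # ['dog', 4, 'cat', True]
print(d.removeRear())     # dog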
ARRAY
An array is a collection of items of the same data type, stored in a common variable. The
collection forms a data structure in which objects are stored linearly, one after another in
memory. Sometimes arrays are even replicated in the memory hardware. Most other data
structures make use of arrays to implement their algorithms. The following are important terms
for understanding the concept of an array.
In a nutshell, an array is a linear data structure consisting of a group of elements that are
accessed by indexing.
ARRAY REPRESENTATION
Let us consider the array below for illustration.
As per the above illustration, the following are important points to be considered:
The index starts with 0.
The array length is 10, which means it can store 10 elements.
Each element can be accessed via its index. For example, we can fetch the element at
index 5, which is 19.
BASIC OPERATIONS OF ARRAY
The following are the basic operations supported by an array.
Traverse − Print all the array elements one by one.
Insertion − Adds an element at the given index.
Deletion − Remove an element at the given index.
Search − Find an element using the given index or by the value.
Update − Updates an element at the given index.
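For illustration, a Python list can stand in for the array; the element values below are illustrative, chosen only so that the element at index 5 is 19, as in the description above.

arr = [35, 33, 42, 10, 14, 19, 27, 44, 26, 31]   # 10 elements, indices 0..9

for i, value in enumerate(arr):   # traverse: print all elements one by one
    print(i, value)

print(arr[5])          # access by index -> 19
arr.insert(3, 99)      # insertion at index 3 (later elements shift right)
arr.pop(3)             # deletion at index 3 (later elements shift left)
print(arr.index(27))   # search by value -> 6
arr[0] = 40            # update the element at index 0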
CLASSIFICATION OF ARRAYS
Arrays can be classified as static and dynamic.
1. Arrays whose size cannot change once their storage has been allocated are called static.
2. Arrays whose size can be resized even if storage has been allocated are called dynamic.
APPLICATIONS OF ARRAYS
Arrays are employed in many computer applications in which data items need to be stored in the
computer memory for subsequent processing. Due to their performance characteristics, arrays
are used to implement other data structures, such as heaps, hash tables, dequeues, queues, stacks
and strings.
LIST
A list is a linear data structure which contains a sequence of elements. It has a known
length and its elements are arranged consecutively. The items in the collection are accessible one
after the other, beginning at the head and ending at the tail. It is a widely used data structure for
applications which do not need random access. Unlike an array, stack or queue, a list allows
insertion and deletion of elements at any position in the list.
For example, the collection of integers 54, 26, 93, 17, 77, and 31 might represent a simple
unordered list of exam scores.
LIST IMPLEMENTATION
Lists can be implemented in many ways, depending on how the programmer will use lists in their
program. Common implementations include:
1. Array List
2. Linked List
ARRAY LISTS
This implementation stores the list in an array. The Array List has the following properties:
1. The position of each element is given by an index from 0 to n-1, where n is the number of
elements.
2. Given any index, the element with that index can be accessed in constant time. i.e. the
time to access does not depend on the size of the list.
3. To add an element at the end of the list, the time taken does not depend on the size of the
list. However, the time taken to add an element at any other point in the list does depend
on the size of the list, as all subsequent elements must be shifted up. Additions near the
start of the list take longer than additions near the middle or end.
4. When an element is removed, subsequent elements must be shifted down, so removals
near the start of the list take longer than removals near the middle or end.
LINKED LIST
A linked list is a linear data structure where each element is a separate object, connected to the
others via links. Each element is a node consisting of two items: the data and a reference to
the next node.
NODE
A node is the basic building block for a linked list implementation; it contains the item and a
reference to the next node. Each node object must hold at least two pieces of information, called
the data field and the next reference. This is illustrated below.
Each link contains a connection to another link. The linked list is the second most-used data
structure after the array. The following are important terms for understanding the concept of a
linked list:
Link: Each link of a linked list can store a data item called an element.
Next: Each link of a linked list contains a link to the next link, called Next.
LinkedList: A linked list contains the connection link to the first link, called First.
LINKED LIST REPRESENTATION
A linked list can be visualized as a chain of nodes, where every node points to the next node.
As per the above illustration, the following are the important points to be considered:
The linked list contains a link element called the head or first.
Each link carries a data field(s) and a link field called next.
Each link is linked with its next link using its next link field.
The last link carries a null link to mark the end of the list.
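For illustration, a minimal Python sketch of such a node and of following the next links from the head (the class and variable names are illustrative):

class Node:
    def __init__(self, data):
        self.data = data      # the item stored in this node
        self.next = None      # reference to the next node (None marks the end)

# Build the chain head -> 54 -> 26 -> 93
head = Node(54)
head.next = Node(26)
head.next.next = Node(93)

current = head
while current is not None:    # follow the next links until the null reference
    print(current.data)
    current = current.next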
INSERTION OPERATION
Adding a new node to a linked list involves more than one step. Let us consider the diagrams
below. First, create a node using the same structure and find the location where it has to be
inserted.
Imagine that we are inserting a node B (New Node), between A (Left Node) and C (Right Node).
Then point B, next to C −
Command 1: NewNode.next −> RightNode;
Now, the next reference of the node on the left should point to the new node −
Command 2: LeftNode.next −> NewNode;
This will put the new node in the middle of the two. The new list should look like this.
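A minimal Python sketch of these two commands, using the same illustrative Node class as before:

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def insert_after(left_node, data):
    new_node = Node(data)
    new_node.next = left_node.next   # Command 1: NewNode.next -> RightNode
    left_node.next = new_node        # Command 2: LeftNode.next -> NewNode

a = Node('A'); c = Node('C')
a.next = c
insert_after(a, 'B')                 # list is now A -> B -> C
print(a.data, a.next.data, a.next.next.data)   # A B C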
DELETION OPERATION
Deletion is also a process involving more than one step, which we shall learn with a pictorial
representation. First, locate the target node to be removed by using a searching algorithm.
The left (previous) node of the target node should now point to the next node of the target node −
Command 1: LeftNode.next −> TargetNode.next;
This will remove the link that was pointing to the target node. Now, using the following code, we
will remove what the target node is pointing at.
Command 2: TargetNode.next −> NULL;
If we need to use the deleted node, we can keep it in memory; otherwise, we can simply
deallocate the memory and wipe off the target node completely.
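A minimal Python sketch of the deletion commands, under the same assumptions:

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def delete_after(left_node):
    target = left_node.next
    left_node.next = target.next     # Command 1: LeftNode.next -> TargetNode.next
    target.next = None               # Command 2: TargetNode.next -> NULL

a = Node('A'); b = Node('B'); c = Node('C')
a.next = b; b.next = c
delete_after(a)                      # removes B; list is now A -> C
print(a.data, a.next.data)           # A C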
REVERSE OPERATION
This operation is a thorough one. We need to make the head node point to the last node and
reverse the whole linked list.
First, we traverse to the end of the linked list, which points to NULL, and make it point to its
previous node.
We have to make sure that the last node is not lost, so we keep a temporary reference, like the
head node, pointing to the last node. Now, we make all the nodes on the left side point to their
previous nodes, one by one.
Except for the first node (the node pointed to by the head node), all nodes should point to their
predecessor, making it their new successor. The first node will point to NULL.
We'll make the head node point to the new first node by using the temp node.
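A minimal iterative Python sketch of the reversal described above:

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def reverse(head):
    previous = None
    current = head
    while current is not None:
        nxt = current.next        # remember the rest of the list
        current.next = previous   # point this node at its predecessor
        previous = current
        current = nxt
    return previous               # the old last node becomes the new head

head = Node(1); head.next = Node(2); head.next.next = Node(3)
head = reverse(head)
print(head.data, head.next.data, head.next.next.data)   # 3 2 1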
DOUBLY LINKED LIST
A doubly linked list is a variation of the linked list in which each node carries a reference to the
next node as well as to the previous node. The following are the important terms for
understanding the concept of a doubly linked list.
Link: Each link of a linked list can store a data item called an element.
Next: Each link of a linked list contains a link to the next link, called Next.
Prev: Each link of a linked list contains a link to the previous link called Prev.
LinkedList: A Linked List contains the connection link to the first link called First and
to the last link called Last.
As per the above illustration, the following are the important points to be considered:
A doubly linked list contains link elements called first and last.
Each link carries a data field(s) and two link fields called next and prev.
Each link is linked with its next link using its next link field.
Each link is linked with its previous link using its prev link field.
The last link carries a null link to mark the end of the list.
BASIC OPERATIONS
Following are the basic operations supported by a double linked list.
Insertion − Adds an element at the beginning of the list.
Deletion − Remove an element at the beginning of the list.
Insert Last − Adds an element at the end of the list.
Delete Last − Remove an element from the end of the list.
Insert After − Adds an element after an item of the list.
Delete − Removes an element from the list using the key.
Display forward − Displays the complete list in a forward manner.
Display backward − Displays the complete list in a backward manner.
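For illustration, a minimal Python sketch of a doubly linked node and of the first basic operation (insertion at the beginning); the remaining operations follow the same pattern of updating both the next and prev links.

class DNode:
    def __init__(self, data):
        self.data = data
        self.next = None
        self.prev = None

def insert_first(head, data):
    """Insert a new node at the beginning and return the new head."""
    node = DNode(data)
    node.next = head
    if head is not None:
        head.prev = node
    return node

head = None
for value in [3, 2, 1]:
    head = insert_first(head, value)

current = head
while current is not None:     # display forward: 1 2 3
    print(current.data)
    current = current.next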
DOUBLE LINKED LIST AS CIRCULAR LINKED LIST
In a doubly linked list, the next pointer of the last node points to the first node and the previous
pointer of the first node points to the last node, making it circular in both directions.
BASIC OPERATIONS
Following are the important operations supported by a circular linked list.
Insert − Add an element at the start of the list.
Delete − Remove an element from the start of the list.
Display − Show the list.
UNORDERED LIST
An unordered list is a collection of related items that have no particular order or sequence. It can
be built from a collection of nodes, each linked to the next by explicit references. As long as the
location of the first node is known, the others can be found by successively following the next links.
Figure: Relative Positions Maintained by Explicit Links.
It is important to note that the location of the first item of the list must be explicitly specified.
Once the location of first item is known, the first item can tell us where the second is, and so on.
The external reference is often referred to as the head of the list. Similarly, the last item needs to
know that there is no next item.
List() Creates a new list that is empty. It needs no parameters and returns an empty
list.
add(item) Inserts a new item to the list. It needs the item and returns nothing. Assume
the item is not already in the list.
remove(item) Deletes the item from the list. It needs the item and modifies the list. Assume
the item is present in the list.
search(item) Looks for the item in the list. It needs the item and returns a Boolean value.
isEmpty() Tests to see whether the list is empty. It needs no parameters and returns a
Boolean value.
size() Returns the number of items in the list. It needs no parameters and returns an
integer.
append(item) Adds a new item to the end of the list making it the last item in the collection.
It needs the item and returns nothing. Assume the item is not already in the
list.
index(item) Returns the position of item in the list. It needs the item and returns the index.
Assume the item is in the list.
insert(pos,item) Adds a new item to the list at position pos. It needs the item and returns
nothing. Assume the item is not already in the list and there are enough
existing items to have position pos.
pop() Removes and returns the last item in the list. It needs nothing and returns an
item. Assume the list has at least one item.
pop(pos) Removes and returns the item at position pos. It needs the position and returns
the item. Assume the item is in the list.
NON-LINEAR DATA STRUCTURE
TREE
A tree is often used to represent a hierarchy. The relationships between the items in the hierarchy
suggest the branches of a botanical tree. A tree is a collection of nodes storing elements such that
the nodes have a parent-child relationship. A tree has the following properties:
1. If the tree is not empty, it has a special node called the root that has no parent.
2. Each node of the tree that is different from the root has a unique parent node.
3. Each node has zero or more children.
4. A unique path traverses from the root to each node.
In summary,
A tree stores elements in hierarchical order.
The top element is called the root.
Except for the root, each element has a parent.
BASIC DEFINITIONS
Node: A node is an element of the tree that stores data; it can be either an internal or an
external node.
Internal nodes: Nodes that have children.
External nodes or leaves: Nodes that do not have children.
Edge: An edge connects two nodes to show that there is a relationship between
them. Edges are usually drawn as simple lines, but they are really directed from parent to
child; in tree drawings this direction is top-to-bottom.
Root: The root of the tree is the only node in the tree that has no incoming edges.
Path: A path is an ordered list of nodes that are connected by edges.
Children: Set of nodes that have incoming edges from the same node are said to be the
children of that node.
Parent: A node is the parent of all the nodes it connects to with outgoing edges.
Sibling: Two nodes that have the same parent are called siblings.
Descendants: The descendants of a node are all the nodes that lie on a path from the
node to any leaf.
Ancestors: The ancestors of a node are all the nodes that are on the path from the node to the
root.
Level: The level of a node n is the number of edges on the path from the root node to n.
Height: The height of a node is the length of the longest path from the node to a leaf.
Figure 3.1 below illustrates a tree definition above. The arrowheads on the edges indicate the
direction of the connection.
Figure 3.1: A Tree Consisting of a Set of Nodes and Edges
APPLICATION OF TREE
1. Class hierarchy in Java.
2. File system.
3. Storing hierarchies in organizations
PARSE TREE
Parse trees can be used to represent real-world constructions such as sentences or mathematical
expressions. Figure 3.2 below is an example of how tree can be used to solve some real life
problems.
One can also represent a mathematical expression such as ((7+3)*(5−2)) as a parse tree, as shown
in Figure 3.3 below.
The parentheses in the expression above show that, even though multiplication has a higher
precedence than addition or subtraction, in this case the addition and subtraction sub-expressions
must be evaluated before the multiplication; the parse tree captures this ordering, since the lower
sub-trees are evaluated before their parents.
Figures below illustrate the structure and contents of the parse tree, as each new token is
processed.
Step 1: Create an empty tree.
Step 2: Read ( as the first token. By rule 1, create a new node as the left child of the root.
Make the current node this new child.
Step 3: Read 3 as the next token. By rule 3, set the root value of the current node to 3 and go
back up the tree to the parent.
Step 4: Read + as the next token. By rule 2, set the root value of the current node to +
and add a new node as the right child. The new right child becomes the current node.
Step 5: Read a ( as the next token. By rule 1, create a new node as the left child of the current
node. The new left child becomes the current node.
Step 6: Read a 4 as the next token. By rule 3, set the value of the current node to 4. Make the
parent of 4 the current node.
Step 7: Read * as the next token. By rule 2, set the root value of the current node to * and create
a new right child. The new right child becomes the current node.
Step 8: Read 5 as the next token. By rule 3, set the root value of the current node to 5. Make the
parent of 5 the current node.
Step 9: Read ) as the next token. By rule 4, make the parent of * the current node.
Step 10: Read ) as the next token. By rule 4, make the parent of + the current node. At this
point there is no parent for +, so we are done.
From the example above, it is clear that one needs to keep track of the current node as well as the
parent of the current node. The tree interface provides us with a way to get the children of a node,
through the getLeftChild and getRightChild methods.
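For illustration, a minimal Python sketch of the four rules above, assuming a fully parenthesized expression whose tokens are separated by spaces (the class and function names are illustrative, not taken from any particular library):

class BinaryTree:
    def __init__(self, value):
        self.key = value
        self.left = None
        self.right = None

def build_parse_tree(expression):
    tokens = expression.split()
    stack = []                      # keeps track of the parents of the current node
    root = BinaryTree('')
    stack.append(root)
    current = root
    for token in tokens:
        if token == '(':            # rule 1: descend into a new left child
            current.left = BinaryTree('')
            stack.append(current)
            current = current.left
        elif token in '+-*/':       # rule 2: store the operator, descend into a new right child
            current.key = token
            current.right = BinaryTree('')
            stack.append(current)
            current = current.right
        elif token == ')':          # rule 4: go back up to the parent
            current = stack.pop()
        else:                       # rule 3: store the operand, go back up to the parent
            current.key = int(token)
            current = stack.pop()
    return root

tree = build_parse_tree("( 3 + ( 4 * 5 ) )")
print(tree.key, tree.left.key, tree.right.key)   # + 3 *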
Example: Hierarchically represent the mathematical expression below using tree terminology
1. (5a + 10b) / (14a⁴ - 28b)
2. (6b - 20c) / (2a + 3d) / (5a + 10b)⁷
Answer 1:
BINARY TREE
A binary tree is a finite set of data items that is either empty or partitioned into three disjoint
subsets. The first subset contains a single data item referred to as the root of the binary tree; the
other two subsets are themselves binary trees, called the left and right sub-trees. These data items
are referred to as the nodes of the binary tree. This is illustrated in Figure 3.4 below.
Full Binary Tree (FBT): In a full binary tree all the internal nodes have equal degree, which
means there is one node at the root level, two nodes at level 2, four nodes at level 3, eight nodes
at level 4, and so on, as shown in Figure 3.5 below:
Complete Binary Tree (CBT): A complete binary tree is a FBT except, that the deepest level
may not be completely filled. If not completely filled, it is filled from left-to-right as depicted in
Figure 3.6.
a. Pre-order: In a pre-order traversal, the root comes first, followed by the left child and
then the right child.
b. Post-order: In a post-order traversal, the left child comes first, followed by the right child
and then the root.
c. In-order: In an in-order traversal, the left child comes first, followed by the root and then
the right child.
Let’s look at some examples that illustrate each of these three kinds of traversals. First let’s look
at the preorder traversal by considering a book as a tree. The book is the root of the tree, and
each chapter is a child of the root. Each section within a chapter is a child of the chapter, and
each subsection is a child of its section, and so on. Figure 3.7 shows a limited version of a book
with only two chapters.
For example, consider a binary tree with root A whose children are B and C, where B has
children D and E, and C has children F and G. The three traversals visit the nodes in the
following orders:
Preorder: A B D E C F G
Post order: D E B F G C A
In-order: D B E A F C G
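A minimal recursive Python sketch that builds this tree and reproduces the three traversal orders (class and function names are illustrative):

class TreeNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def preorder(node):
    if node is None:
        return ''
    return node.key + preorder(node.left) + preorder(node.right)

def postorder(node):
    if node is None:
        return ''
    return postorder(node.left) + postorder(node.right) + node.key

def inorder(node):
    if node is None:
        return ''
    return inorder(node.left) + node.key + inorder(node.right)

root = TreeNode('A',
                TreeNode('B', TreeNode('D'), TreeNode('E')),
                TreeNode('C', TreeNode('F'), TreeNode('G')))
print(preorder(root))    # ABDECFG
print(postorder(root))   # DEBFGCA
print(inorder(root))     # DBEAFCG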
Searching
Searching in a data structure refers to the process of finding a desired item among the set of items
stored as elements in the computer memory. Searching algorithms are an essential part of a
programmer's toolkit: they help programmers efficiently locate specific elements within a
collection of data. These sets of items may be stored in various forms, such as an array, tree,
graph, or linked list.
The following searching techniques are discussed below:
1. Linear Search
2. Binary Search
3. Interpolation Search
1. Linear Search
Linear search is a very simple search algorithm. In this type of search, a sequential search is
made over all items one by one. Every item is checked and if a match is found then that
particular item is returned, otherwise the search continues till the end of the data collection. This
is illustrated in the figure below:
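A minimal Python sketch of linear search:

def linear_search(items, target):
    for index, value in enumerate(items):   # check every item, one by one
        if value == target:
            return index                     # match found: return its position
    return -1                                # reached the end without a match

print(linear_search([10, 14, 19, 26, 27, 31, 33, 35, 42, 44], 33))   # 6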
2. Binary Search
Binary search looks for a particular item by comparing the middle most item of the collection. If
a match occurs, then the index of item is returned. If the middle item is greater than the item
under search, then the item is searched in the sub-array to the left of the middle item. Otherwise,
the item is searched for in the sub-array to the right of the middle item. This process continues
on the sub-array as well until the size of the sub-array reduces to zero.
Above is a sorted array; let us assume that we need to search for the location of the value 31
using binary search. First, we determine half of the array using the formula
mid = low + (high − low) / 2, which gives 0 + (9 − 0) / 2 = 4 (the integer value of 4.5), so mid is 4.
Now we compare the value stored at location 4 with the value being searched, i.e. 31. We find
that the value at location 4 is 27, which is not a match. As the value is greater than 27 and we
have a sorted array, so we also know that the target value must be in the upper portion (right
hand side) of the array.
We change our low to mid + 1 and find the new mid value again.
Low = mid + 1
Mid = low + (high - low) / 2
Mid = 5 + (9 − 5) / 2 = 5 + 2 = 7
New mid is now 7. We compare the value stored at location 7 with the target value 31.
The value stored at location 7 is not a match; rather, it is greater than what we are looking for
(35). So the target value must be in the lower part (left hand side) of this location. The sub-array
is illustrated below.
We set high to mid − 1 = 6 and compute the new mid as 5 + (6 − 5) / 2 = 5. We compare the
value stored at location 5 with our target value and find that it is a match.
We conclude that the target value 31 is stored at location 5.
Binary search halves the searchable items at each step and thus reduces the number of
comparisons to be made to a very small number.
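A minimal Python sketch matching the walkthrough above; with the same array it probes locations 4, 7 and then 5 before reporting index 5:

def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = low + (high - low) // 2        # middle of the current sub-array
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1                    # search the upper (right-hand) half
        else:
            high = mid - 1                   # search the lower (left-hand) half
    return -1                                # sub-array shrank to nothing

data = [10, 14, 19, 26, 27, 31, 33, 35, 42, 44]
print(binary_search(data, 31))               # 5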
3. Interpolation search
Interpolation search is a searching algorithm that differs from other traditional methods, such as
linear or binary search. While linear search works sequentially from the beginning to the end of
a data set, and binary search divides the array in half at each step, interpolation search estimates
the position of the desired element based on its value.
This algorithm assumes that the elements within the collection are uniformly distributed,
enabling a more informed guess of where the target item might exist. Instead of relying on
constant intervals like binary search, interpolation search adapts its approach based on the range
of values within the data set.
The algorithm proceeds by repeatedly computing a position estimate and narrowing down the
search range until we locate the target value or exhaust all possibilities.
Example:
Let's illustrate the interpolation search algorithm with a simple example. Search for the target
value of 12 in the following sorted array of integers:
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Using the probe formula pos = low + ((target − arr[low]) × (high − low)) / (arr[high] − arr[low]),
the first estimate is 0 + ((12 − 2) × (9 − 0)) / (20 − 2) = 5, and the element at index 5 is 12, so the
target is found in a single probe.
However, there are considerations to keep in mind. Interpolation search requires the data to be
sorted beforehand, which adds an initial time cost. Additionally, if the data set is unevenly
distributed or has repetitive elements, interpolation search may not yield significant performance
benefits.
Interpolation Algorithm
Step 1 − Start searching data from middle of the list.
Step 2 − If it is a match, return the index of the item, and exit.
Step 3 − If it is not a match, probe position.
Step 4 − Divide the list using probing formula and find the new middle.
Step 5 − If data is greater than middle, search in higher sub-list.
Step 6 − If data is smaller than middle, search in lower sub-list.
Step 7 − Repeat until match.
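A minimal Python sketch of interpolation search, using the probe formula shown above; with the example array, the very first probe lands on index 5:

def interpolation_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    while low <= high and sorted_items[low] <= target <= sorted_items[high]:
        if sorted_items[low] == sorted_items[high]:
            return low if sorted_items[low] == target else -1
        # estimate the position from the value, assuming a uniform distribution
        numerator = (target - sorted_items[low]) * (high - low)
        denominator = sorted_items[high] - sorted_items[low]
        pos = low + numerator // denominator
        if sorted_items[pos] == target:
            return pos
        elif sorted_items[pos] < target:
            low = pos + 1                    # search the higher sub-list
        else:
            high = pos - 1                   # search the lower sub-list
    return -1

data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(interpolation_search(data, 12))        # 5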
SORTING TECHNIQUES
Sorting refers to arranging data in a particular format. A sorting algorithm specifies the way to
arrange data in a particular order. The most common orders are numerical or lexicographical
order. The following are some examples of sorting in real-life scenarios:
Telephone Directory: The telephone directory stores the telephone numbers of people
sorted by their names, so that the names can be searched easily.
Dictionary: The dictionary stores words in an alphabetical order so that searching of any
word becomes easy.
Important Terms
Some terms are generally used while discussing sorting techniques; here is a brief introduction
to them:
1. Increasing Order: A sequence of values is said to be in increasing order if each
successive element is greater than the previous one. E.g. 1, 3, 4, 6, 8, 9.
2. Decreasing Order: A sequence of values is said to be in decreasing order if each
successive element is less than the previous one. E.g. 9, 8, 6, 4, 3, 1.
3. Non-Increasing Order: A sequence of values is said to be in non-increasing order if each
successive element is less than or equal to its previous element in the sequence. This
order occurs when the sequence contains duplicate values. For example, 9, 8, 6, 3, 3, 1.
4. Non-Decreasing Order: A sequence of values is said to be in non-decreasing order if
each successive element is greater than or equal to its previous element in the sequence.
This order occurs when the sequence contains duplicate values. For example, 1, 3, 3, 6, 8, 9.
BUBBLE SORT
Bubble sort is a simple comparison-based sorting algorithm in which each pair of adjacent
elements is compared and the elements are swapped if they are not in order. This algorithm is not
suitable for large data sets as its average and worst case complexity are of Ο(n²), where n is the
number of items. Consider the unsorted array 14, 33, 27, 35, 10. Bubble sort starts with the very
first two elements, comparing them to check which one is greater.
In this case, 33 is greater than 14, so these two are already in sorted positions. Next, we compare 33
with 27.
We find that 27 is smaller than 33 and these two values must be swapped.
Next we compare 33 and 35, and find that both are already in sorted positions. Then we move to
the next two values, 35 and 10. We know that 10 is smaller than 35, so they are not in sorted order.
We swap these values and find that we have reached the end of the array. After one iteration,
the array should look like this
To be precise, we are now showing how the array should look after each iteration. After the
second iteration, it should look like this −
Notice that after each iteration, at least one value moves to its final position at the end.
When no swap is required in a full pass, bubble sort learns that the array is completely sorted.
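A minimal Python sketch of bubble sort that also stops early when a full pass makes no swap, as described above:

def bubble_sort(items):
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):               # compare adjacent pairs
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]   # swap if out of order
                swapped = True
        if not swapped:                          # no swap in a full pass: already sorted
            break
    return items

print(bubble_sort([14, 33, 27, 35, 10]))   # [10, 14, 27, 33, 35]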
Insertion Sort
This is an in-place comparison-based sorting algorithm. Here, a sub-list is maintained which is
always sorted; for example, the lower part of the array is maintained in sorted order. An element
which is to be 'inserted' into this sorted sub-list has to find its appropriate place and then be
inserted there. Hence the name, insertion sort.
The array is searched sequentially and unsorted items are moved and inserted into the sorted
sub-list (in the same array). This algorithm is not suitable for large data sets as its average and
worst case complexity are of Ο(n²), where n is the number of items.
Consider the unsorted array 14, 33, 27, 10, 35, 19, 42, 44. Insertion sort compares the first two
elements and finds that both 14 and 33 are already in ascending order; for now, 14 is in the sorted
sub-list. Insertion sort moves ahead and compares 33 with 27, and finds that 33 is not in the
correct position.
It swaps 33 with 27. It also checks with all the elements of sorted sub-list. Here we see that the
sorted sub-list has only one element 14, and 27 is greater than 14. Hence, the sorted sub-list
remains sorted after swapping.
By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10. These values are
not in sorted order, so we swap them.
However, swapping makes 27 and 10 unsorted, so we swap them too. Again we find 14 and 10 in
an unsorted order, and we swap them once more. By the end of the third iteration, we have a
sorted sub-list of 4 items.
This process goes on until all the unsorted values are covered by the sorted sub-list. Now we shall
see the algorithmic aspects of insertion sort.
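A minimal Python sketch of insertion sort, using the same example array:

def insertion_sort(items):
    for i in range(1, len(items)):
        value = items[i]                  # next unsorted element
        j = i - 1
        while j >= 0 and items[j] > value:
            items[j + 1] = items[j]       # shift larger sorted elements to the right
            j -= 1
        items[j + 1] = value              # insert into its place in the sorted sub-list
    return items

print(insertion_sort([14, 33, 27, 10, 35, 19, 42, 44]))
# [10, 14, 19, 27, 33, 35, 42, 44]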
SELECTION SORT
Selection sort is a simple sorting algorithm. It is an in-place comparison-based algorithm in
which the list is divided into two parts: the sorted part at the left end and the unsorted part at the
right end. Initially, the sorted part is empty and the unsorted part is the entire list. The smallest
element is selected from the unsorted part and swapped with the leftmost unsorted element, and
that element becomes part of the sorted part. This process continues, moving the boundary of the
unsorted part one element to the right. This algorithm is not suitable for large data sets as its
average and worst case complexities are of Ο(n²), where n is the number of items.
For the first position in the sorted list, the whole list is scanned sequentially. The first position is
where 14 is stored presently; we search the whole list and find that 10 is the lowest value.
So we replace 14 with 10. After one iteration 10, which happens to be the minimum value in the
list, appears in the first position of the sorted list.
For the second position, where 33 is residing, we start scanning the rest of the list in a linear
manner.
We find that 14 is the second lowest value in the list and it should appear at the second place.
We swap these values.
After two iterations, two least values are positioned at the beginning in a sorted manner.
The same process is applied to the rest of the items in the array.
Following is a pictorial depiction of the entire sorting process −
Now, let us see the algorithm aspects of selection sort.
Algorithm
Step 1 − Set MIN to location 0
Step 2 − Search the minimum element in the list
Step 3 − Swap with value at location MIN
Step 4 − Increment MIN to point to next element
Step 5 − Repeat until list is sorted
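A minimal Python sketch of selection sort following the steps above (the comments mirror the MIN terminology):

def selection_sort(items):
    n = len(items)
    for i in range(n - 1):
        minimum = i                          # Step 1: set MIN to the first unsorted position
        for j in range(i + 1, n):            # Step 2: search the unsorted part for the minimum
            if items[j] < items[minimum]:
                minimum = j
        items[i], items[minimum] = items[minimum], items[i]   # Step 3: swap into place
    return items                             # Steps 4-5: move the boundary and repeat

print(selection_sort([14, 33, 27, 10, 35, 19, 42, 44]))
# [10, 14, 19, 27, 33, 35, 42, 44]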
MEMORY ALLOCATION
Introduction
Memory allocation is the process of assigning blocks of memory on request. This chapter describes
the process of memory allocation as well as its classifications and their advantages and
disadvantages. One should note that memory allocation is a critical task in modern operating
systems, and one of the most commonly used techniques is contiguous memory allocation.
Therefore, this module concentrates on contiguous memory allocation and its techniques.
Memory Allocation
A program or process requires memory space in order to run. As a result, there must be a
procedure that grants a specific amount of memory space corresponding to the needs of the
software or process. In a nutshell, the procedure of assigning memory space to software
applications is referred to as the memory allocation process.
Typically, the allocator receives memory from the operating system in a small number of large
blocks that it must divide up to satisfy requests for smaller blocks. It must also make any
returned blocks available for reuse.
As illustrated in Figure 4.1 above, contiguous memory allocation is also divided into two
types:
1. Fixed (or static) Partition: In the fixed partition scheme, memory is divided into a fixed
number of partitions, and each partition accommodates exactly one process. The
maximum size of a process is restricted by the maximum size of the partition. Every
partition is associated with limit registers.
2. Variable (or dynamic) Partition: In the variable partition scheme, memory is initially a
single continuous free block. When a request from a process arrives, a partition is made
in the memory in accordance with the size of the process.
Contiguous
Contiguous memory allocation is one of the memory allocation strategies. As the name
implies, it is a strategy that allocates contiguous blocks of memory to each process. A contiguous
segment is therefore allotted to a process from the available free area, based on its size, whenever
the process's request reaches the main memory.
There are many contiguous allocation techniques, which are often used in combination with one
another in a particular case. This note presents a few of them, as follows.
1. First fit
2. Best fit
3. Worst fit
4. Next fit
5. Buddy system
First Fit
The first-fit algorithm searches for the first free partition that is large enough to accommodate the
process. The operating system starts searching from the beginning of the memory and allocates
the first free partition that is large enough to fit the process.
Best Fit
The best-fit algorithm searches for the smallest free partition that is large enough to accommodate
the process. The operating system searches the entire memory and selects the free partition that is
slightly greater or equal in size to the process space in the block.
Worst Fit
The worst-fit algorithm searches for the largest free partition and allocates the process to it. This
algorithm is designed to leave the largest possible free partition for future use.
Next Fit:
Next fit is similar to first fit, but it starts searching for the first sufficiently large partition from
the point where the last allocation was made, rather than from the beginning of memory.
2. Waste of Memory
If a process requests a large block of memory but only uses a portion of it, the remaining
memory is wasted. This is known as internal fragmentation.
First Fit:
The 300K request is allocated from the 350K block, leaving 50K.
25K is allocated from the 150K block, leaving 125K.
Then the 125K and 50K requests are allocated to the remaining left-over partitions.
So, first fit can handle all the requests.
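For illustration, a simplified Python sketch of three of the placement strategies over a list of free partition sizes; it only chooses a partition index for a single request (a real allocator would also split the chosen partition and record the leftover space), next fit would simply remember where the previous search stopped, and the block sizes in the demonstration are illustrative.

def first_fit(partitions, request):
    for i, size in enumerate(partitions):        # first partition large enough
        if size >= request:
            return i
    return None

def best_fit(partitions, request):
    candidates = [(size, i) for i, size in enumerate(partitions) if size >= request]
    return min(candidates)[1] if candidates else None   # smallest partition that fits

def worst_fit(partitions, request):
    candidates = [(size, i) for i, size in enumerate(partitions) if size >= request]
    return max(candidates)[1] if candidates else None   # largest free partition

free = [150, 350, 50, 300]          # free partition sizes in KB (illustrative)
print(first_fit(free, 300))         # 1  (350K is the first block that fits)
print(best_fit(free, 300))          # 3  (300K is the tightest fit)
print(worst_fit(free, 300))         # 1  (350K is the largest block)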
Buddy System
The two smaller parts of a block are of equal size and called buddies. Buddy system is a memory
allocation technique used in computer OS to allocate and manage memory efficiently. It is
technique which divides the memory into fixed-size blocks, and when there is a memory
49
requests, the system finds the smallest available block that can accommodate the requested
memory size.
In the buddy system, memory is split into fixed-length blocks, usually in powers of 2 (e.g.,
1KB, 2KB, 4KB, and so on). When a request for memory allocation is made, the system searches
for a block of the correct size. If an appropriate block is found, the space is allocated.
However, if the requested size does not exactly match an existing free block, the system takes a
bigger block and splits it into smaller blocks until a block of the right size is obtained.
Steps Involved in the Buddy System
Below are the steps involved in the buddy system memory allocation technique:
1. The first step includes the division of memory into fixed-sized blocks that have a power
of 2 in size (such as 2, 4, 8, 16, 32, 64, 128, etc. ).
2. Each block is labeled with its size and unique identification.
3. Initially, all the memory blocks are free and are linked together in a binary tree
structure, with each node representing a block and the tree’s leaves representing the
smallest available blocks.
4. When a process is requesting a memory space, the system finds the smallest available
block that can accommodate the requested size. If the block is larger than the requested
size, the system splits the block into two equal-sized “buddy” blocks.
5. The system marks one of the buddy blocks as allocated and adds it to the process’s
memory allocation table, while the other buddy block is returned to the free memory
pool and linked back into the binary tree structure.
6. When a process releases memory, the system marks the corresponding block as free and
looks for its buddy block. If the buddy block is also free, the system merges the two
blocks into a larger block and links it back into the binary tree structure.
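For illustration, a simplified Python sketch of the splitting step only: it takes the smallest free power-of-two block that can hold the request and halves it until the block is the smallest power of two that fits (coalescing of freed buddies is omitted, and the sizes are illustrative).

def smallest_power_of_two_at_least(n):
    size = 1
    while size < n:
        size *= 2
    return size

def buddy_allocate(free_blocks, request):
    """free_blocks: list of free block sizes (powers of two). Returns the allocated size."""
    needed = smallest_power_of_two_at_least(request)
    candidates = [b for b in free_blocks if b >= needed]
    if not candidates:
        return None                      # no block large enough
    block = min(candidates)              # smallest free block that can hold the request
    free_blocks.remove(block)
    while block > needed:                # split into two equal buddies until it fits
        block //= 2
        free_blocks.append(block)        # one buddy stays free
    return block                         # the other buddy is allocated

free = [512]                             # one free 512 KB block (illustrative)
print(buddy_allocate(free, 100))         # 128  (512 -> 256 + 256 -> 128 + 128)
print(sorted(free))                      # [128, 256] remain free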
Fibonacci Buddy System: In a Fibonacci buddy system, block sizes follow the Fibonacci
sequence rather than powers of two, so a block splits into two buddies whose sizes are the two
preceding Fibonacci numbers.
Formula: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1
Fibonacci Series
Here are the first 15 numbers in the Fibonacci series:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377
This is the basic idea behind the Fibonacci buddy system. By using Fibonacci numbers to
partition the memory, we can create a hierarchical structure that allows for efficient
allocation and deallocation of memory blocks.
Binary Buddy System: The binary buddy system keeps track of the free blocks of each size
(known as a free list), so that a block of the necessary size can easily be found if one is
available. If no blocks of the requested size are available, the allocator examines the first non-
empty list of blocks of at least the requested size. In both cases, a block is removed from the
free list. For example, a 512 KB memory area is initially partitioned into two blocks
of 256 KB each, with further subdivisions by powers of 2 to handle smaller memory
requests.
Weighted Buddy System: In a weighted buddy system, each memory block is associated
with a weight, which represents its size relative to other blocks. When a memory allocation
request occurs, the system searches for an appropriate block, considering the size of the
requested memory and the weights of the available blocks.
Tertiary Buddy System: In a traditional buddy system, memory is divided into blocks of
fixed size, usually a power of 2, and allocated from these blocks; the tertiary buddy system
introduces a third memory structure, which allows greater flexibility in memory allocation.
Summary
Contiguous memory allocation is an essential technique used in modern operating systems to
allocate memory space to processes. First Fit, Best Fit, Worst Fit and buddy system are popular
algorithms used for contiguous memory allocation. Each algorithm has its advantages and
disadvantages, but all are designed to optimize memory allocation and reduce fragmentation.
With these techniques, operating systems can efficiently manage memory, making them a critical
component of computer systems.