NAKURU CAMPUS
ICS 2105 DATA STRUCTURES AND ALGORITHMS
COURSE OUTLINE
PURPOSE
To expose learners to different types of data structures and algorithms together with their application
areas.
COURSE OBJECTIVES
At the end of the course the students should:
1. Differentiate between different types of data structures
2. Explain application areas of data structures
3. Illustrate the working of different searching and sorting algorithms
Content/ Topics
Week 1: Introduction to data structures and algorithms: Definitions, selecting data structures,
representation of algorithms
Week 2: Arrays: Properties, operations and application areas
Week 3: Lists: Structure, kinds of lists, operations, application areas
Week 4: Queues: Structure, Types, operations, application areas
Week 5: CAT I
Week 6: Stacks: Structure, operations and application areas
Week 7: Trees: Definitions, structure, types, operations and application areas
Week 8: Trees: Comparison with graphs data structure
Week 9: Algorithms: Searching (Linear, binary search) and Sorting (Insertion, selection, bubble, merge
sort)
Week 10: CAT II
Week 11: Implementation of Data structures and algorithms
Week 12: Implementation of Data structures and algorithms
Week 13: End of semester examinations
TEACHING METHODOLOGY
Lectures, Discussions, Assignments, Presentations
COURSE EVALUATION
Continuous Assessment tests (CATs) 30%
University Examination 70%
REFERENCES
1. Mehlhorn, K. (2013). Data structures and algorithms 1: Sorting and searching (Vol. 1). Springer Science &
Business Media.
2. Storer, J. A. (2012). An introduction to data structures and algorithms. Springer Science & Business Media.
3. Weiss, M. A. (2012). Data structures and algorithm analysis in Java. Pearson Education, Inc.
4. Malik, D. S. (2014). C++ programming: Program design including data structures. Cengage Learning.
ICS 2105 DATA STRUCTURES AND ALGORITHMS
INTRODUCTION
Representing information is fundamental to computer science. The primary purpose of most computer
programs is not to perform calculations, but to store and retrieve information — usually as fast as
possible. For this reason, the study of data structures and the algorithms that manipulate them is at the
heart of computer science. Each data structure and each algorithm have costs and benefits. Practitioners
need a thorough understanding of how to assess costs and benefits to be able to adapt to new design
challenges.
NB:
1. Creating efficient programs has little to do with “programming tricks” but rather is based on good
organization of information and good algorithms.
2. Most computer curricula recognize that good programming skills begin with a strong emphasis on
fundamental software engineering principles.
3. Once a programmer has learned the principles of clear program design and implementation, the next
step is to study the effects of data organization and algorithms on program efficiency.
Guiding principles
• Each data structure and each algorithm has costs and benefits. Practitioners need a thorough
understanding of how to assess costs and benefits to be able to adapt to new design challenges.
• Related to costs and benefits is the notion of tradeoffs. For example, it is quite common to reduce
time requirements at the expense of an increase in space requirements, or vice versa.
• Programmers should know enough about common practice to avoid reinventing the wheel. Thus,
programmers need to learn the commonly used data structures, their related algorithms, and the most
frequently encountered design patterns found in programming.
• Data structures follow needs. Programmers must learn to assess application needs first, then find a
data structure with matching capabilities.
There are often many approaches to solving a problem. How do we choose between them? At the heart of
computer program design are two (sometimes conflicting) goals:
1. To design an algorithm that is easy to understand, code, and debug.
2. To design an algorithm that makes efficient use of the computer’s resources.
Given enough space to store a collection of data items, it is always possible to
• Search for specified items within the collection
• Print or otherwise process the data items in any desired order
• Modify the value of any data item
Thus, it is possible to perform all necessary operations on any data structure. However, using the proper
data structure can make the difference between a program running in a few seconds and one requiring
many days.
A solution is said to be efficient if it solves the problem within the required resource constraints.
Examples of resource constraints include:
• Total space available to store the data (main memory and disk)
• Time allowed to perform each subtask.
The cost of a solution is the amount of resources that the solution consumes. Most often, cost is measured
in terms of one key resource such as time, with the implied assumption that the solution meets the other
resource constraints. Only by first analyzing the problem to determine the performance goals that must be
achieved can there be any hope of selecting the right data structure for the job. Poor program designers
ignore this analysis step and apply a data structure that they are familiar with, but which is inappropriate
to the problem. The result is typically a slow program.
Terminologies
A type is a collection of values. For example, the Boolean type consists of the values true and false.
An aggregate type or composite type contains several pieces of information. For example, a bank account
record will typically contain several pieces of information such as name, address, account number, and
account balance.
A data item is a piece of information or a record whose value is drawn from a type. A data item is said to
be a member of a type.
A data type is a type together with a collection of operations to manipulate the type. For example, an
integer variable is a member
of the integer data type.
An abstract data type (ADT) is the realization of a data type as a software component. The interface of
the ADT is defined in terms of a type and a set of operations on that type. The behavior of each operation
is determined by its inputs and outputs. An ADT does not specify how the data type is implemented.
These implementation details are hidden from the user of the ADT and protected from outside access, a
concept referred to as encapsulation.
A data structure is the implementation for an ADT.
Example
An ADT for a list of integers might specify the following operations:
• Insert a new integer at a position in the list.
• Return true if the list is empty.
• Reinitialize the list.
• Return the number of integers currently in the list.
• Delete the integer at a position in the list.
From this description, the input and output of each operation should be clear, but the implementation for
lists has not been specified.
NB: The data structures shown in Table, except the arrays, can be thought of as Abstract Data Types, or ADTs
Problems, Algorithms, and Programs
Problem: A problem is a task to be performed. It is best thought of in terms of inputs and matching
outputs. A problem definition
should not include any constraints on how the problem is to be solved. Problems can be viewed as
functions in the mathematical
sense. A function is a matching between inputs (the domain) and outputs (the range).
Algorithms: An algorithm is a method or a process followed to solve a problem.
Better definition: A precise sequence of a limited number of unambiguous, executable steps that
terminates in the form of a solution.
If the problem is viewed as a function, then an algorithm is an implementation for the function that
transforms an input to the corresponding output. A problem can be solved by many different algorithms.
By definition, something can only be called an algorithm if it has all the following properties.
1. It must be correct - it must compute the desired function, converting each input to the correct output.
2. It is composed of a series of concrete steps. Concrete means that the action described by that step is
completely
understood — and doable — by the person or machine that must perform the algorithm. Each step
must also be doable in a finite amount of time.
3. There can be no ambiguity as to which step will be performed next
4. It must be composed of a finite number of steps. If the description for the algorithm were made up
of an infinite number of steps, we could never hope to write it down, nor implement it as a computer
program.
5. It must terminate. In other words, it may not go into an infinite loop.
Characteristics of an algorithm
To summarize:
A problem is a function or a mapping of inputs to outputs.
An algorithm is a recipe for solving a problem whose steps are concrete and unambiguous. Algorithms
must be correct, of finite
length, and must terminate for all inputs.
A program is an instantiation of an algorithm in a programming language.
Solving Problems
When faced with a problem:
1. We first clearly define the problem
2. Think of possible solutions
3. Select the one that we think is the best under the prevailing circumstances
4. And then apply that solution
5. If the solution works as desired, fine; else we go back to step 2
Examples of algorithms
Greedy Algorithm
• An algorithm that always takes the best immediate, or local solution while finding an answer
• Greedy algorithms may find the overall or globally optimal solution for some optimization problems,
but may find less-than-optimal solutions for some instances of other problems
• key advantage: Greedy algorithms are usually faster, since they don't consider the details of possible
alternatives
Example: A skier skiing downhill on a mountain wants to get to the bottom as quickly as possible
• The greedy-algorithm approach will be to always have the skis pointed towards the largest downhill
slope (dy/dx), at all times
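The greedy idea can also be sketched in code. The example below is a hypothetical coin-counting routine (the function name and the coin sets are assumptions, not part of the course material): it always takes the largest coin that still fits. It assumes the coins are positive and listed in descending order.

```c
#include <stddef.h>

/* Greedy coin counting: repeatedly take the largest coin that still fits.
 * Assumes coins[] holds positive values sorted in descending order.
 * Returns the number of coins used, or -1 if the amount cannot be reached. */
int greedy_coin_count(const int coins[], size_t n, int amount)
{
    int used = 0;
    for (size_t i = 0; i < n; i++) {
        while (amount >= coins[i]) {   /* best immediate (local) choice */
            amount -= coins[i];
            used++;
        }
    }
    return amount == 0 ? used : -1;
}
```

With coins 25, 10, 5, 1 and an amount of 63 the routine uses 6 coins, which is also the fewest possible. With coins 4, 3, 1 and an amount of 6 it uses 3 coins (4 + 1 + 1) although 2 coins (3 + 3) would do, which illustrates a less-than-optimal greedy result.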
Deterministic Algorithm
• An algorithm whose behavior can be completely predicted from the inputs
• That is, each time a certain set of input is presented, the algorithm gives the same results as any other
time the set of input is presented
Randomized Algorithm
Any algorithm whose behavior is determined not only by the input, but also by values produced by a random
number generator. These algorithms are often simpler and more efficient than deterministic algorithms for
the same problem. Simpler algorithms have the advantages of being easier to analyze and implement.
These algorithms work for all practical purposes but have a theoretical chance of failing, either by
producing incorrect results or by running for an impractically long time.
REPRESENTING ALGORITHMS
Generally, software developers represent them in one of three forms:
– Pseudo code
– Flowcharts
– Actual code
a) Pseudo Code
Language that is typically used for writing algorithms
Similar to a programming language, but not as rigid
The method of expression most suitable for a given situation is used:
– At times, plain English
– At others, a programming language like syntax
b) Program Flowchart
A graphical representation of a process (e.g. an algorithm), in which graphic objects are used to indicate
the steps & decisions that are taken as the process moves along from start to finish. Individual steps are
represented by boxes and other shapes on the flowchart, with arrows between those shapes indicating the
order in which the steps are taken.
[Figure: Symbols of a program flowchart]
QUESTIONS
Write pseudo code and draw flowcharts
Task 1
Input the beginning and ending electric meter reading for a month in kilowatt hours (KWH). Compute
charge based on rate of 6
cents per KWH for the first 1000 KWH used and 8 cents per KWH for excess over 1000 KWH. If less
than 500 KWH was used,
output a message congratulating customer on conserving energy.
Get beginning
Get end
Set total to end - beginning
If total <= 1000 then
    Set cost to total * 0.06
Else
    Set cost to 1000 * 0.06 + (total - 1000) * 0.08
Output cost
If total < 500 then
    Output "Thanks for conserving energy"
Stop
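The pseudo code for Task 1 could be translated into C roughly as follows. This is a sketch, not a model answer: the function names are hypothetical, and the charge is computed in whole cents (an assumption made here to avoid floating-point rounding) rather than as a decimal amount.

```c
/* Charge in cents for one month's usage.
 * The first 1000 KWH cost 6 cents each; any excess costs 8 cents each. */
long electricity_charge_cents(long beginning, long end)
{
    long total = end - beginning;
    if (total <= 1000)
        return total * 6;
    return 1000 * 6 + (total - 1000) * 8;
}

/* Returns 1 when fewer than 500 KWH were used, 0 otherwise. */
int conserved_energy(long beginning, long end)
{
    return (end - beginning) < 500;
}
```

For readings of 0 and 1200 the charge is 7600 cents (1000 KWH at 6 cents plus 200 KWH at 8 cents), and no congratulation message would be printed.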
Task 2
Alice is thinking of a number between 1 and 10. Bob has three chances to guess it correctly. If he does,
Alice gives him Kshs. 500
and the game is over. If he does not, he has to give his sister Kate Kshs. 600. Write an algorithm to get
Bob's guess(es), determine
the outcome for each guess and output appropriate messages. Alice's number is fixed at the beginning of
the program.
1. Set Target to 7, Guesses to 0, and BobCorrect to false.
2. While Guesses < 3 and BobCorrect is false, Do step 3 through step 9
3. Get BobGuess.
4. Add 1 to Guesses.
5. If BobGuess equals Target then
6. Output "Correct! Bob wins Kshs. 500."
7. Set BobCorrect to true.
8. Else
9. Output "Sorry Bob."
10. If BobCorrect is false, then
11. Output "Three strikes and you're out. Give your sister Kshs. 600."
12. Stop
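One possible C version of the Task 2 algorithm is sketched below. To keep it easy to test, the guesses are passed in as an array instead of being read from the keyboard; that array, the function name and the return convention are assumptions of this sketch.

```c
#include <stddef.h>

#define MAX_GUESSES 3

/* Plays Bob's game against a fixed target.
 * guesses[] holds the guesses Bob would enter, in order.
 * Returns the attempt number (1 to 3) on which Bob was correct,
 * or 0 if all his chances were used without a correct guess. */
int play_guessing_game(int target, const int guesses[], size_t n)
{
    for (size_t i = 0; i < n && i < MAX_GUESSES; i++) {
        if (guesses[i] == target)
            return (int)i + 1;   /* Bob wins; the game is over */
    }
    return 0;                    /* three strikes: Bob pays Kate */
}
```

With a target of 7 and the guesses 3, 7, 1 the function returns 2, because the second guess is correct and the third is never examined.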
Task 3
Kimani is thinking of a number between 1 and 10. Two brothers, Ben and Sam, take turns guessing what
it is until one guesses correctly. First correct guesser gets a prize and then the game is over. Write an
algorithm to get Ben and Sam's guesses, determine the outcome for each guess and output appropriate
messages. Kimani's number is fixed at the beginning of the program.
1. Set target to 7 and guessed to false.
2. While guessed is false, do step 3 through step 14
3.     Get BenGuess.
4.     If BenGuess equals target then
5.         Output "Ben gets the prize!"
6.         Set guessed to true.
7.     Else
8.         Output "Sorry Ben."
9.         Get SamGuess.
10.        If SamGuess equals target then
11.            Output "Sam gets the prize!"
12.            Set guessed to true.
13.        Else
14.            Output "Sorry Sam."
15. Stop
ARRAYS
The array is the most commonly used data storage structure; it’s built into most programming languages.
Because arrays are so
well known, they offer a convenient jumping off place for introducing data structures and for seeing how
programming and data
structures relate to one another.
Properties of arrays
Is physically sequential in that the data objects are stored in consecutive memory locations.
Has a fixed size in that its size can neither be increased nor reduced though the number of items it
contains can vary
Is homogeneous in that it is made up of objects that are all the same type.
Its elements have the random-access property i.e. time it takes to access one object in the structure does
not depend on what object in the structure had been accessed previously.
An array type is appropriate for representing an abstract data type when the following conditions are
satisfied:
The data objects in the abstract data type are composed of homogeneous objects
The solution requires the representation of a fixed, predetermined number of objects
Multidimensional arrays
Multidimensional arrays can be described as arrays of arrays. For example, a two-dimensional array
consists of a certain number of rows and columns:
A one-dimensional array is usually processed via a ‘for’ loop. Similarly, a two-dimensional array may be
processed with a nested for loop.
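As an illustration, the sketch below sums every element of a small two-dimensional array with a nested for loop. The dimensions ROWS and COLS and the function name are chosen only for this example.

```c
#define ROWS 2
#define COLS 3

/* Sums every element of a ROWS x COLS table.
 * The outer loop walks the rows, the inner loop walks the columns. */
int sum_table(int table[ROWS][COLS])
{
    int sum = 0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            sum += table[r][c];
        }
    }
    return sum;
}
```

For a table whose rows are {1, 2, 3} and {4, 5, 6} the function returns 21.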
Arrays: Disadvantages
1. The size of the array is fixed —
Most often this size is specified at compile time with a simple declaration such as in the example above.
With a little extra effort, the size of the array can be deferred until the array is created at runtime, but after
that it remains fixed.
2. Because of (1), the most convenient thing for programmers to do is to allocate arrays which seem
"large enough" (e.g. the 100 items). Although convenient, this strategy has two disadvantages:
- Most of the time there are just 20 or 30 elements in the array and 70% of the space in the array is
wasted.
- If the program ever needs to process more than 100 scores, the code breaks.
3. Inserting new elements at the front or somewhere at the middle is potentially expensive because
existing elements need to be shifted over to make room.
Sample Code (C programming)
Getting the greatest value in the array
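#include <stdio.h>
#define SIZE 5
int main(void)
{
    int numbers[SIZE] = {3, 12, 5, 2, 10};
    int i;
    int greatest;

    /* Start by assuming the first element is the greatest */
    greatest = numbers[0];
    /* Compare every element with the greatest value seen so far */
    for (i = 0; i < SIZE; i++)
    {
        if (numbers[i] > greatest)
        {
            greatest = numbers[i];
        }
    }
    printf("\nThe greatest number in the array is %d\n", greatest);
    return 0;
}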
Application areas of arrays
Arrays are used to maintain multiple variables with the same name.
Used in mathematical problems like matrices, vectors etc.
They are used in the implementation of other data structures like linked lists, queues, stacks, trees etc.
Database records are usually implemented as arrays.
Used in the implementation of different algorithms e.g. sorting, searching
LISTS
Introduction
The most important concept related to lists is that of position. In other words, we perceive that there is a
first element in the list, a second element, and so on. We define a list to be a finite, ordered sequence of
data items known as elements. “Ordered” in this definition means that each element has a position in the
list.
Terminologies
A list is said to be empty when it contains no elements. The number of elements currently stored is called
the length of the list.
The beginning of the list is called the head, the end of the list is called the tail. There might or might not
be some relationship between the value of an element and its position in the list. For example, sorted lists
have their elements positioned in ascending order of value, while unsorted lists have no relationship
between element values and positions.
Why lists
Avoid the drawbacks of fixed size arrays with
1. Growable arrays
2. Linked lists
Growth Strategy
Double the size of the array every time more space is needed (i.e. when the capacity is exceeded)
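A growable array that follows the doubling strategy might look like the sketch below. The type name IntVec and the function intvec_push are hypothetical; the standard library function realloc is used to obtain the larger block.

```c
#include <stdlib.h>

typedef struct {
    int *items;       /* heap block holding the elements */
    size_t length;    /* number of elements in use */
    size_t capacity;  /* number of elements allocated */
} IntVec;

/* Appends value to the vector, doubling the capacity when it is full.
 * Returns 0 on success, or -1 if memory could not be allocated. */
int intvec_push(IntVec *v, int value)
{
    if (v->length == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 1;
        int *p = realloc(v->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;            /* the old block is still valid */
        v->items = p;
        v->capacity = new_cap;
    }
    v->items[v->length++] = value;
    return 0;
}
```

A vector is started empty with `IntVec v = {NULL, 0, 0};` and released with `free(v.items)`. After five pushes the capacity has grown 1, 2, 4, 8, so only a few of the pushes pay for a reallocation.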
Operations of a list
Access the element in position k.
Insert new element into list at position k
Delete a specified k-th element (Deletion)
Combine two or more lists into a single list (Merging)
Divide the list into two or more lists
Sort the List according to the values of some field in the elements (Sorting)
Search for an element in which a certain field has a given value (Searching)
List insertion
• The insertion operation requires that we insert a new element x at position i of the list.
• To achieve this, elements a_i through a_(n-1) are shifted one position up the list, so that element a_i
moves to the position of a_(i+1) and a_(n-1) moves to the position of a_n.
• A total of (n - i) elements are shifted.
• An algorithm for the procedure can be represented in a flow chart.
Insert Element x into position i in the list (V), n is the number of items in the list
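The insertion procedure can be sketched in C as below, using the same names V, n, i and x. The capacity parameter and the return convention are assumptions added so that the function can refuse an insert into a full array.

```c
#include <stddef.h>

/* Inserts x at position i of list V, which currently holds n items.
 * Valid positions are 0..n; capacity is the size of the underlying array.
 * Returns the new length, or n unchanged if the insert is not possible. */
size_t list_insert(int V[], size_t n, size_t capacity, size_t i, int x)
{
    if (n >= capacity || i > n)
        return n;
    for (size_t j = n; j > i; j--)   /* shift V[i..n-1] one place up */
        V[j] = V[j - 1];
    V[i] = x;
    return n + 1;
}
```

Inserting 15 at position 1 of the list 10, 20, 30, 40 shifts three elements (n - i = 4 - 1) and gives 10, 15, 20, 30, 40.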
Flow Chart: Delete an element from a list
Algorithm Steps
1. Compare i and n.
2. If i >= n, flag an error (the last element is at position n - 1).
3. Otherwise, assign i to j (the initial value of index j).
4. Compare j and n - 1. If j >= n - 1, stop: all elements have been shifted.
5. Otherwise, copy V[j+1] to V[j], increment j by 1, and go back to step 4.
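The deletion steps can be written in C along these lines. The function name and the choice to return the new length are assumptions of this sketch; the error case is reported by returning the length unchanged.

```c
#include <stddef.h>

/* Deletes the element at position i of list V, which holds n items.
 * Returns the new length, or n unchanged if i is not a valid position. */
size_t list_delete(int V[], size_t n, size_t i)
{
    if (i >= n)
        return n;                        /* error: the last element is at n - 1 */
    for (size_t j = i; j + 1 < n; j++)   /* shift V[i+1..n-1] one place down */
        V[j] = V[j + 1];
    return n - 1;
}
```

Deleting position 1 from the list 10, 20, 30, 40 leaves 10, 30, 40 and a length of 3.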
Linked lists
The linked list is a very flexible dynamic data structure: items may be added to it or deleted from it at
will.
An array (linear list) allocates memory for all its elements lumped together as one block of memory.
In contrast, a linked list allocates space for each element separately in its own block of memory called
a "linked list element" or "node".
The list gets its overall structure by using pointers to connect all its nodes together like the links in a
chain.
Each node contains two fields: a "data" field to store whatever element type the list holds for its
client, and a "next" field, which is a pointer used to link one node to the next node.
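A node with a "data" field and a "next" field can be declared in C as shown in the sketch below. The helper functions and their names are illustrative assumptions; the list here stores integers.

```c
#include <stdlib.h>

struct node {
    int data;            /* the element stored for the client */
    struct node *next;   /* link to the next node; NULL at the tail */
};

/* Adds a new node holding value in front of head.
 * Returns the new head, or the old head unchanged if allocation fails. */
struct node *list_push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->data = value;
    n->next = head;
    return n;
}

/* Counts the nodes by following the next pointers. */
size_t list_length(const struct node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}

/* Releases every node in the list. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

Each node is allocated separately, so adding an item at the head needs no shifting of existing elements, unlike insertion at the front of an array.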
Application areas of List data structure
1. Implementing other data structures e.g. stacks and queues
2. Representing the graph data structure: the adjacency list representation of graphs, which uses linked
lists to store adjacent vertices, is the most popular.
3. Implementing dynamic memory management functions of operating system
4. Circular linked list is used to implement functions that require round robin execution e.g. in operating
systems.
5. Doubly linked list is used in the implementation of forward and backward buttons in a browser
6. Allocation of space to files in the hard disk (Non-contiguous file allocation)
STACKS
Definition
A stack is a homogeneous collection of items of any one type, arranged linearly with access at one end
only, called the top. This means that data can be added or removed from only the top. It is a LIFO (Last In
First Out) list also known as FILO (First In Last Out).
Stacks are used when elements are not to be accessed directly by index, as in an array, but only in
LIFO (Last In First Out) order through a stack-top pointer (SP). Each processor has at least one stack
pointer, which points to the top of the stack in memory and so facilitates the calling of routines.
Stack Operations
The table below lists operations performed on a stack
Operation   Description                                               Requirement
Push        Adds (pushes) another item onto the stack.                The number of items on the stack is less than n.
Pop         Removes an item from the stack.                           The number of items on the stack must be greater than 0.
Top         Returns the value of the item at the top of the stack.
Is Empty    Returns true if the stack is empty and false if it is not.
Is Full     Returns true if the stack is full and false if it is not.
MakeNull    Removes all elements from the stack.
The Stack Implemented as an Array
One of two ways to implement a stack is by using a one-dimensional array (also
known as a vector). When implemented this way, the data is simply stored in the
array. Top is an integer value, which contains the array index for the top of the
stack. Each time data is added or removed, top is incremented or decremented
accordingly, to keep track of the current top of the stack. By convention, an empty
stack is indicated by setting top to be equal to -1.
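A minimal sketch of an array-based stack is given below, with top set to -1 for an empty stack as described. The capacity STACK_MAX and the function names are assumptions; push and pop return false when their requirement is not met.

```c
#include <stdbool.h>

#define STACK_MAX 100    /* n: assumed fixed capacity */

typedef struct {
    int items[STACK_MAX];
    int top;             /* index of the top item; -1 when the stack is empty */
} Stack;

void stack_make_null(Stack *s) { s->top = -1; }
bool stack_is_empty(const Stack *s) { return s->top == -1; }
bool stack_is_full(const Stack *s) { return s->top == STACK_MAX - 1; }

/* Pushes x; fails if the stack already holds STACK_MAX items. */
bool stack_push(Stack *s, int x)
{
    if (stack_is_full(s))
        return false;
    s->items[++s->top] = x;      /* increment top, then store */
    return true;
}

/* Pops the top item into *out; fails if the stack is empty. */
bool stack_pop(Stack *s, int *out)
{
    if (stack_is_empty(s))
        return false;
    *out = s->items[s->top--];   /* read the item, then decrement top */
    return true;
}
```

Pushing 1 and then 2 and popping twice yields 2 and then 1, the LIFO order.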
Stacks implemented as arrays are useful if a fixed amount of data is to be used. However, if the amount of
data is not fixed, or fluctuates widely during the stack's lifetime, then an array is a poor choice for
implementing a stack. For example, consider a call stack for a recursive procedure.
First, it can be difficult to know how many times a recursive procedure will be called, making it
difficult to decide upon array bounds.
Second, the recursive procedure may be called a small number of times on some occasions and a large
number of times on others.
An array would be a poor choice, as you would have to declare it to be large enough that there is no
danger of it running out of storage space when the procedure recurses many times. This can waste a
significant amount of memory if the procedure normally only recurses a few times.
The deleted element Y may be printed out
Stack Applications
Very useful for finding palindromes (words that read the same backward or forward), e.g. MADAM, CIVIC,
RADAR, ROTOR, LEVEL.
Reversing Data: We can use stacks to reverse data (example: files, strings)
Stacks are used for the undo buttons in various software. The most recent changes are pushed onto the
stack. Even the back button in a browser works with the help of a stack onto which all the recently
visited web pages are pushed.
Number system conversion e.g. Decimal to Binary
Expression Evaluation: Stack is used to evaluate prefix, postfix and infix expressions.
Stack data structure is used for evaluating the given expression. For example, consider the following
expression
5 * (6 + 2) - 12 / 4
Since parentheses have the highest precedence among the arithmetic operators, (6 + 2) = 8 will be
evaluated first. Now, the expression becomes
5 * 8 - 12 / 4
* and / have equal precedence and their associativity is from left-to-right. So, start evaluating the
expression from left-to-right.
5 * 8 = 40 and 12 / 4 = 3
Now, the expression becomes
40 - 3
And the value returned after the subtraction operation is 37.
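The same expression can be evaluated with a stack once it is written in postfix form, 5 6 2 + * 12 4 / - (converted by hand for this example). The sketch below is one possible evaluator for space-separated postfix expressions of non-negative integers; the function name, the stack limit of 64 values and the error convention are assumptions.

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluates a space-separated postfix expression containing non-negative
 * integers and the operators + - * /.
 * Returns 0 and stores the value in *result on success; returns -1 for a
 * malformed expression, division by zero, or more than 64 pending values. */
int eval_postfix(const char *expr, long *result)
{
    long stack[64];
    int top = -1;

    while (*expr != '\0') {
        if (isspace((unsigned char)*expr)) {
            expr++;
        } else if (isdigit((unsigned char)*expr)) {
            char *end;
            if (top == 63)
                return -1;
            stack[++top] = strtol(expr, &end, 10);   /* push the operand */
            expr = end;
        } else {
            long left, right;
            if (top < 1)
                return -1;                           /* needs two operands */
            right = stack[top--];
            left = stack[top--];
            switch (*expr) {
            case '+': stack[++top] = left + right; break;
            case '-': stack[++top] = left - right; break;
            case '*': stack[++top] = left * right; break;
            case '/':
                if (right == 0)
                    return -1;
                stack[++top] = left / right;
                break;
            default:
                return -1;                           /* unknown symbol */
            }
            expr++;
        }
    }
    if (top != 0)
        return -1;
    *result = stack[0];
    return 0;
}
```

Operands are pushed as they are read, and each operator pops two values and pushes their result, so for the expression above the stack ends up holding the single value 37.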
Expression Conversion: An expression can be represented in prefix, postfix or infix notation. Stack
can be used to convert one form of expression to another.
Memory Management
The assignment of memory takes place in contiguous memory blocks. We call this stack memory
allocation because the assignment takes place in the function call stack. The size of the memory to be
allocated is known to the compiler. When a function is called, its variables get memory allocated on
the stack. When the function call is completed, the memory for the variables is released. All this
happens with the help of some predefined routines in the compiler. The user does not have to worry
about memory allocation and release of stack variables.
int main(void)
{
    // All these variables get memory
    // allocated on the stack
    int f;
    int a[10];
    int c = 20;
    int e[c];   // array whose size is taken from c
    return 0;
}
The memory to be allocated to these variables is known to the compiler and when the function is
called, the allocation is done and when the function terminates, the memory is released.
Backtracking: Suppose we are finding a path through a maze. We choose a path and, after
following it, we realize that it is wrong. Now we need to go back to the point where that path began and
try a new path. This can be done with the help of a stack, which records the positions visited so that the
most recent choice can be undone first.
Parenthesis Checking: A stack is used to check the proper opening and closing of parentheses.
Given an expression, you have to find whether the parentheses are correctly matched or not. For
example, consider the expression (a + b) * (c + d).
In the above expression, the parentheses are opened and closed properly and hence it is
said to be a correctly matched expression. The expression (a + b * [c + d), on the other hand, is not
a valid expression because its brackets are not correctly matched.
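A bracket checker based on a stack might be sketched as follows. The function name and the nesting limit of 256 are assumptions; characters other than brackets are simply skipped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every (, [ and { in expr is closed by the matching
 * bracket in the correct order. */
bool brackets_match(const char *expr)
{
    char stack[256];
    size_t top = 0;   /* number of open brackets currently on the stack */

    for (; *expr != '\0'; expr++) {
        char c = *expr;
        if (c == '(' || c == '[' || c == '{') {
            if (top == sizeof stack)
                return false;            /* deeper than this sketch supports */
            stack[top++] = c;            /* push the opening bracket */
        } else if (c == ')' || c == ']' || c == '}') {
            char open;
            if (top == 0)
                return false;            /* closing bracket with nothing open */
            open = stack[--top];         /* pop the most recent opening bracket */
            if ((c == ')' && open != '(') ||
                (c == ']' && open != '[') ||
                (c == '}' && open != '{'))
                return false;
        }
    }
    return top == 0;                     /* nothing may be left open */
}
```

For (a + b) * (c + d) the function returns true. For (a + b * [c + d) the closing parenthesis meets an open square bracket on top of the stack, so it returns false.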
Function Call: Stack is used to keep information about the active functions or subroutines.
Syntax Parsing: Many compilers use a stack for parsing the syntax of expressions, program blocks
etc. before translating into low level code.
QUEUES
Definition
It is a linear list in which all insertions and deletions are restricted: All insertions into the queue take place
at one end called the rear while all deletions take place at the other end called the front. A queue stores a
set of elements in a particular order.
Operations
makeNull(q) – makes a queue empty and returns an empty queue
front(q) – returns the first element on a queue
enqueue(x,q) – inserts element x at the end (rear) of queue q
dequeue(q) – deletes the first element of the queue
Empty(q) – returns true if the queue is empty.
Queues deletion (dequeue); Example
- Deletion of an element
- For this, it’s imperative to check whether queue is empty or not.
R= Rear points to rear Element
F= Front will point to position after first element (after the deletion).
M= size of the array
N= No of elements
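An array-based queue with the empty check described above can be sketched as follows. This version lets the indices wrap around the end of the array, which is one common design and an assumption of the sketch; QUEUE_MAX plays the role of M, front the role of F and count the role of N.

```c
#include <stdbool.h>

#define QUEUE_MAX 4   /* M: size of the array, kept small for illustration */

typedef struct {
    int items[QUEUE_MAX];
    int front;        /* F: index of the first element */
    int count;        /* N: number of elements in the queue */
} Queue;

void queue_make_null(Queue *q) { q->front = 0; q->count = 0; }
bool queue_is_empty(const Queue *q) { return q->count == 0; }

/* Inserts x at the rear; fails if the queue is full. */
bool enqueue(Queue *q, int x)
{
    int rear;
    if (q->count == QUEUE_MAX)
        return false;
    rear = (q->front + q->count) % QUEUE_MAX;   /* slot after the last element */
    q->items[rear] = x;
    q->count++;
    return true;
}

/* Removes the first element into *out; fails if the queue is empty. */
bool dequeue(Queue *q, int *out)
{
    if (queue_is_empty(q))
        return false;
    *out = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_MAX;      /* front moves to the next element */
    q->count--;
    return true;
}
```

Elements leave the queue in the order in which they arrived: after enqueueing 1, 2, 3 the first dequeue returns 1.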
Queues Applications
1. When a resource is shared among multiple consumers. Examples include CPU scheduling,
Disk Scheduling, Customer support
2. When data is transferred asynchronously (data not necessarily received at same rate as sent)
between two processes. Examples include IO Buffers, pipes, file IO, etc.
3. Operating systems:
Semaphores
FCFS (First come first serve) scheduling, example: FIFO queue
Spooling in printers
Buffer for devices like keyboard
4. In Networks:
Queues in routers/ switches
Mail Queues
5. Queues are used to maintain the play list in media players in order to add and remove the songs from
the play-list.
Exercise
Explain the variations of a queue given below
Circular queue
Double-ended queue (deque)
Priority queue
TREE DATA STRUCTURE
Definition
A tree is a nonlinear hierarchical data structure that consists of nodes connected by edges.
A relationship exists between the number of nodes and edges in a tree: a tree with n nodes has exactly
n - 1 edges.
A non-linear structure in which components may have more than one predecessor and more than one
successor is called a graph. A tree thus is a special case of a graph.
Significance of trees
Other data structures such as arrays, linked lists, stacks, and queues are linear data structures that store
data sequentially. In a linear data structure, the time needed to perform an operation such as a search
grows with the size of the data, which becomes too slow for large data sets.
Tree data structures allow quicker and easier access to the data because they are non-linear.
Root: It is the top most node of a tree. There is only one root per tree and one path from the root node
to any node.
Parent: Any node except the root node has one edge upward to a node called parent.
Child: The node below a given node connected by its edge downward is called its child node.
Siblings: Nodes which belong to same Parent
Subtree: represents the descendants of a node. In a tree data structure, each child from a node forms a
subtree recursively. Every child node will form a subtree on its parent node.
Path: the sequence of Nodes and Edges from one node to another node
Degree of a node: the total number of children of a node
Degree of the Tree: The highest degree of a node among all the nodes.
Depth of a node: It is the number of edges from the root to the node.
Depth of the tree: the total number of edges from root node to a leaf node in the longest path
Height of a Node: It is the number of edges from the node to the deepest leaf (i.e. the longest path
from the node to a leaf node).
Height of a Tree: It is the height of the root node or the depth of the deepest node.
Visiting: refers to checking the value of a node when control is on the node.
Traversing: means passing through nodes in a specific order.
Types of trees
1. General tree: It is a tree data structure where there are no constraints on the hierarchical structure. A
node can have any number of children.
2. Binary tree
It is a tree data structure in which a node can have at most two child nodes (children).
Usage
Used by compilers to build syntax trees.
Used to implement expression parsers and expression solvers.
Used to store router-tables in routers.
The search operation in a binary tree is faster as compared to other trees
Only two traversals are enough to provide the elements in sorted order
It is easy to pick up the maximum and minimum elements
Graph traversal also uses binary trees
Converting between postfix and prefix expressions is possible using binary trees
Example: Balanced binary tree
Usage
Used to implement simple sorting algorithms.
Can be used as priority queues.
Used in many search applications where data are constantly entering and leaving.
4. AVL tree
It is a self-balancing binary search tree. This is the first tree introduced which automatically balances its
height. It is named after inventors Adelson-Velsky and Landis.
Properties
Self-balancing.
Each node stores a value called a balance factor which is the difference in height between its left
subtree and right subtree.
All the nodes must have a balance factor of -1, 0 or 1.
After performing insertions or deletions, if there is at least one node that does not have a balance factor of
-1, 0 or 1 then rotations should be performed to balance the tree (self-balancing).
Usage
Used in situations where frequent insertions are involved.
Used in Memory management subsystem of the Linux kernel to search memory regions of processes
during preemption.
5. Red-black tree
A red-black tree is a self-balancing binary search tree, where each node has a colour; red or black. The
colours of the nodes are used to make sure that the tree remains approximately balanced during insertions
and deletions.
Properties
Self-balancing.
Each node is either red or black.
The root is black (sometimes omitted).
All leaves (denoted as NIL) are black.
If a node is red, then both its children are black.
Every path from a given node to any of its leaf nodes must go through the same number of black
nodes.
Usage
As a base for data structures used in computational geometry.
Used in the Completely Fair Scheduler used in current Linux kernels.
Used in the epoll system call implementation of Linux kernel.
BINARY TREES TRAVERSAL
Traversal is the process of visiting every node once. Visiting a node entails doing some processing at that
node, but when describing a traversal strategy, we need not concern ourselves with what that processing
is. Often, we want to determine the nodes of a tree and their relationship. We can do this by walking
through the tree in a prescribed order and visiting the nodes as they are encountered. This process is
called tree traversal.
Exercise
Show how the tree given can be traversed using
a) Preorder
b) Inorder
c) Postorder
d) Level order
Solutions
Preorder:
1 3 5 4 6 7 8 9 10 11 12
Inorder:
4 5 6 3 1 8 7 9 11 10 12
Postorder:
4 6 5 3 8 11 12 10 9 7 1
Level order:
1 3 7 5 8 9 4 6 10 11 12
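The four traversal orders can be sketched in Python as follows. The tree figure is not reproduced in these notes, so the shape built below is an assumption inferred from the preorder and inorder sequences in the solutions; the `Node` class and the function names are likewise chosen only for this illustration.

```python
from collections import deque


class Node:
    """A binary tree node (hypothetical helper for this sketch)."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    """Visit the node, then its left subtree, then its right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node):
    """Visit the left subtree, then the right subtree, then the node."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


def level_order(root):
    """Visit nodes level by level, left to right, using a FIFO queue."""
    result = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        result.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return result


# Assumed tree shape, inferred from the exercise solutions.
tree = Node(1,
            Node(3, Node(5, Node(4), Node(6))),
            Node(7, Node(8), Node(9, None, Node(10, Node(11), Node(12)))))

for traversal in (preorder, inorder, postorder, level_order):
    print(traversal.__name__, traversal(tree))
```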
Exercises
For each of the trees given below, show how it can be traversed using
a) Preorder
b) Inorder
c) Postorder
d) Level order
Tree 1
Tree 2
Tree 3
GRAPHS
A graph consists of vertices (nodes) and edges connecting the vertices (G=(V,E)). Graphs usually have a
shape dictated by physical problems. For instance, nodes may represent cities, while edges may represent
airline flight routes between cities. Another more abstract example is a graph representing individual
tasks necessary to complete a project.
In the graph, nodes may represent tasks, while directed edges (with an arrow at one end) indicate which
task must be completed before another. Two vertices are said to be adjacent to one another if they are
connected by a single edge. In Figure 1, vertices C and E are adjacent.
Adjacent vertices are also said to be neighbors.
Figure 1: Graph
A path is a sequence of edges. In Figure 1, ABCEF is a path. There can be more than one path between
two vertices.
Types of graphs
Connected vs Non-connected
A graph is said to be connected if there is at least one path from every vertex to every other vertex. A
non-connected graph consists of several connected components.
Directed vs Non-directed
Non-directed graphs: The edges of these graphs do not have a direction. One can go either way along
them.
Directed graphs: These are used to model situations in which one can go in only one direction along an
edge, such as a one-way street. The allowed direction is typically shown with an arrowhead at the end
of the edge. In Figure 2, vertex A is adjacent to C but not vice versa.
Representing a Graph in a Program
We need to represent both the vertices and the edges. Nodes/vertices can be represented using an array. In
this scheme, nodes are numbered from node 0 to node n-1 (where n is the number of nodes). Node 0
is stored at position 0 in the array, node 1 at position 1, and so on until node n-1 at position n-1.
Each node is represented with a structure that is appropriate for storing the information associated with
a node, e.g. a record/structure (in C) or an object (in object-oriented languages).
Adjacency Matrix
The adjacency matrix is a two-dimensional array in which the elements indicate whether an edge is
present between two vertices. If a graph has n vertices, then the adjacency matrix is n×n array.
Entry (i, j), for i = 0…n-1 and j = 0…n-1, is 1 if there is an edge going from node i to node j. The
entry is 0 otherwise.
For a non-directed graph, entry (i, j) = entry (j, i).
Entry (i, i) can be set to 0 or 1, as is convenient.
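As a minimal sketch of the adjacency matrix described above, the Python function below builds the n×n array from a list of edges. The function name `build_adjacency_matrix`, its `directed` parameter and the four-node example graph are assumptions made for illustration; entry (i, i) is left at 0.

```python
def build_adjacency_matrix(n, edges, directed=False):
    """Return an n x n matrix with 1 wherever an edge goes from i to j."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        if not directed:
            # Non-directed graph: entry (i, j) equals entry (j, i).
            matrix[j][i] = 1
    return matrix


# Made-up example graph with nodes numbered 0 to 3.
example_edges = [(0, 1), (0, 2), (1, 3)]

for row in build_adjacency_matrix(4, example_edges):
    print(row)
```

Checking whether two nodes are joined is a single array lookup, at the cost of storing n×n entries even when the graph has few edges.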
The adjacency list can easily be adapted to represent weighted graphs. The weight for an edge is stored
together with the corresponding vertex entry on the adjacency list associated with a node.
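One possible way to store a weighted graph as an adjacency list is sketched below in Python. The layout (one list of (neighbour, weight) pairs per node) and the names `build_weighted_adjacency_list` and `weighted_edges` are assumptions for this example, and the graph is treated as non-directed.

```python
def build_weighted_adjacency_list(n, weighted_edges):
    """Return a list where entry u holds (neighbour, weight) pairs for node u."""
    adjacency = [[] for _ in range(n)]
    for u, v, weight in weighted_edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))  # non-directed: store both directions
    return adjacency


# Made-up example: three nodes, two weighted edges.
weighted_edges = [(0, 1, 5), (1, 2, 2)]
adjacency = build_weighted_adjacency_list(3, weighted_edges)

for node, neighbours in enumerate(adjacency):
    print(node, neighbours)
```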
Examples
TRAVERSING A GRAPH
One of the most fundamental graph problems is to traverse every edge and vertex in a graph. Applications
include:
Printing out the contents of each edge and vertex
Counting the number of edges
Identifying connected components of a graph
For correctness, we must do the traversal in a systematic way so that we don’t miss anything.
For efficiency, we must make sure we visit each edge at most twice.
Traversal techniques are determined by the data structure used and they include:
1. Depth First Search (DFS)
Uses the stack data structure
By storing the vertices in a Last In, First Out (LIFO) stack, we explore the vertices by moving along a path,
visiting a new neighbour whenever one is available and backing up only when we are surrounded by
previously discovered vertices.
Example (Starting from node A)
Monitor
A -> B, D, G
B -> E, F
E -> B, G
G -> A, E [pop G]
E -> B, G [pop E]
F -> B, C, D
C -> F, H
H -> C [pop H]
C -> F, H [pop C]
F -> B, C, D
D -> A, F [pop D]
F -> B, C, D [pop F]
B -> E, F [pop B]
A -> B, D, G [pop A]
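ACTION    STACK (bottom to top)    RESULT
push A    A                        A
push B    A B                      AB
push E    A B E                    ABE
push G    A B E G                  ABEG
pop G     A B E                    ABEG
pop E     A B                      ABEG
push F    A B F                    ABEGF
push C    A B F C                  ABEGFC
push H    A B F C H                ABEGFCH
pop H     A B F C                  ABEGFCH
pop C     A B F                    ABEGFCH
push D    A B F D                  ABEGFCHD
pop D     A B F                    ABEGFCHD
pop F     A B                      ABEGFCHD
pop B     A                        ABEGFCHD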
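The stack-based procedure traced above can be sketched in Python as follows. The graph figure is not reproduced in these notes, so the adjacency lists in `GRAPH` are an assumption reconstructed from the neighbour lists in the example; the function name `dfs` is also chosen only for this illustration. Neighbours are tried in alphabetical order.

```python
# Assumed adjacency lists, reconstructed from the example above.
GRAPH = {
    "A": ["B", "D", "G"],
    "B": ["A", "E", "F"],
    "C": ["F", "H"],
    "D": ["A", "F"],
    "E": ["B", "G"],
    "F": ["B", "C", "D"],
    "G": ["A", "E"],
    "H": ["C"],
}


def dfs(graph, start):
    """Return the vertices in the order depth first search discovers them."""
    order = [start]
    discovered = {start}
    stack = [start]  # LIFO: the top of the stack is the current vertex
    while stack:
        current = stack[-1]
        for neighbour in graph[current]:
            if neighbour not in discovered:
                # Move forward along the path to a new neighbour.
                discovered.add(neighbour)
                order.append(neighbour)
                stack.append(neighbour)
                break
        else:
            # Every neighbour is already discovered: back up.
            stack.pop()
    return order


print("".join(dfs(GRAPH, "A")))
```

With this assumed graph the discovery order is ABEGFCHD, the same as the final RESULT in the trace.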
2. Breadth First Search (BFS)
Uses a queue data structure
Stores the vertices in a First In, First Out (FIFO) queue and explores the oldest unexplored vertices first.
Thus, exploration radiates out slowly from the starting vertex.
Consider node A
QUEUE B BD BDG
RESULT AB ABD ABDG
Dequeue B and consider it next
QUEUE DGE DGEF
RESULT ABDGE ABDGEF
Dequeue D and consider it next
QUEUE GEF
RESULT ABDGEF
Dequeue G and consider it next
QUEUE EF
RESULT ABDGEF
Dequeue E and consider it next
QUEUE F
RESULT ABDGEF
Dequeue F and consider it next
QUEUE C
RESULT ABDGEFC
Dequeue C and consider it next
QUEUE H
RESULT ABDGEFCH
Dequeue H and consider it next
QUEUE (empty)
RESULT ABDGEFCH
BFS => ABDGEFCH
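The queue-based procedure can be sketched in Python in the same way. As with the trace, the adjacency lists in `GRAPH` are an assumption reconstructed from the worked example (the figure itself is not reproduced here), and the function name `bfs` is chosen for this illustration.

```python
from collections import deque

# Assumed adjacency lists, reconstructed from the worked example.
GRAPH = {
    "A": ["B", "D", "G"],
    "B": ["A", "E", "F"],
    "C": ["F", "H"],
    "D": ["A", "F"],
    "E": ["B", "G"],
    "F": ["B", "C", "D"],
    "G": ["A", "E"],
    "H": ["C"],
}


def bfs(graph, start):
    """Return the vertices in the order breadth first search discovers them."""
    order = [start]
    discovered = {start}
    queue = deque([start])  # FIFO: the oldest discovered vertex is explored first
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in discovered:
                discovered.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print("".join(bfs(GRAPH, "A")))
```

With this assumed graph the result is ABDGEFCH, matching the trace.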
EXERCISES
Use DFS to traverse the graph given below.
Use BFS to traverse the graph given below.
SOLUTION
SOLUTION: s,a,b,c,e,
MINIMUM SPANNING TREE
Given a connected, undirected graph, a spanning tree of that graph is a subgraph that is a tree and
connects all the vertices together. A single graph can have many different spanning trees. We can also
assign a weight to each edge, which is a number representing how unfavorable it is, and use this to assign
a weight to a spanning tree by computing the sum of the weights of the edges in that spanning tree. A
minimum spanning tree (MST) or minimum weight spanning tree is then a spanning tree with weight less
than or equal to the weight of every other spanning tree.
One example would be a cable TV company laying cable to a new neighborhood. If it is constrained to
bury the cable only along certain paths, then there would be a graph representing which points are
connected by those paths. Some of those paths might be more expensive, because they are longer, or
require the cable to be buried deeper; these paths would be represented by edges with larger weights. A
spanning tree for that graph would be a subset of those paths that has no cycles but still connects to every
house. There might be several spanning trees possible. A minimum spanning tree would be one with the
lowest total cost.
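These notes do not name an algorithm for finding a minimum spanning tree. As one well-known option, the Python sketch below uses Kruskal's algorithm: consider the edges from cheapest to most expensive and keep an edge only if it joins two parts that are not yet connected. The function name `kruskal`, the edge format (weight, u, v) and the small example graph are assumptions made for illustration.

```python
def kruskal(num_vertices, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning tree.

    edges is a list of (weight, u, v) tuples for a connected,
    undirected graph whose vertices are numbered 0..num_vertices-1.
    """
    parent = list(range(num_vertices))  # each vertex starts in its own group

    def find(x):
        """Return the representative of the group containing x."""
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we go
            x = parent[x]
        return x

    total = 0
    chosen = []
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two groups, so no cycle is formed
            parent[root_u] = root_v
            total += weight
            chosen.append((u, v, weight))
    return total, chosen


# Made-up example graph with 4 vertices.
example_edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
total, chosen = kruskal(4, example_edges)
print(total, chosen)
```

A spanning tree of a graph with n vertices always has n-1 edges, so the loop keeps exactly three edges here.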
A graph with its two minimum spanning trees
Example: A graph
The graph has two minimum-cost spanning trees, each with a cost of 6:
ALGORITHMS
Common algorithms include:
1. Searching Algorithms
Computer systems are often used to store large amounts of data from which individual records must
be retrieved according to some search criterion. Thus, the efficient storage of data to facilitate fast
searching is an important issue. Some search algorithms:
Sequential search
A simple search algorithm: compare the target key with the key of each record, one by one, starting from
the first record.
We keep comparing each element with the target until the desired element is found or the list ends.
It is easy to implement for both arrays and linked lists.
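A minimal Python sketch of sequential search is given below. The function name `sequential_search` and the convention of returning -1 when the target is absent are assumptions made for this example.

```python
def sequential_search(items, target):
    """Return the index of the first element equal to target, or -1."""
    for index, item in enumerate(items):
        if item == target:
            return index  # found: stop comparing
    return -1  # reached the end of the list without a match


numbers = [36, 24, 10, 6, 12]
print(sequential_search(numbers, 6))
print(sequential_search(numbers, 99))
```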
Binary search
Binary search requires the array to be sorted. We first compare the key with the item in the middle
position of the array.
If there is a match, we can return immediately. If the key is less than the middle key, then the item
sought must lie in the lower half of the array; if it is greater, then the item sought must lie in the
upper half of the array.
So, we repeat the procedure on the lower (or upper) half of the array until the item is found or the half is empty.
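The halving procedure can be sketched in Python as follows. The function name `binary_search`, the variable names and the -1 return value for a missing key are assumptions made for this example; the input must already be sorted in ascending order.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid  # match: return immediately
        if target < sorted_items[mid]:
            high = mid - 1  # continue in the lower half
        else:
            low = mid + 1  # continue in the upper half
    return -1  # the remaining range is empty


numbers = [6, 10, 12, 24, 36]
print(binary_search(numbers, 24))
print(binary_search(numbers, 7))
```

Each comparison halves the range still to be searched, so far fewer comparisons are needed than in sequential search on a large array.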
2. Sorting algorithms
Sorting rearranges the elements into either ascending or descending order within the array. Examples
of sort algorithms include: Selection Sort, Bubble Sort, Insertion Sort, Quick Sort, Merge Sort and
Heap Sort.
a) Selection Sort
Divides the array into two parts: already sorted, and not yet sorted. On each pass, finds the smallest of
the unsorted elements, and swaps it into its correct place, thereby increasing the number of sorted
elements by one.
36 24 10 6 12
6 24 10 36 12
6 10 24 36 12
6 10 12 36 24
6 10 12 24 36
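A short Python sketch of selection sort follows. The function name `selection_sort` and the choice to return the array state after each pass are assumptions made so that the output can be compared with the rows shown above.

```python
def selection_sort(items):
    """Return (sorted copy of items, list of array states after each pass)."""
    data = list(items)
    passes = []
    for i in range(len(data) - 1):
        # Find the smallest element in the not-yet-sorted part data[i:].
        smallest = i
        for j in range(i + 1, len(data)):
            if data[j] < data[smallest]:
                smallest = j
        # Swap it into its correct place; the sorted part grows by one.
        data[i], data[smallest] = data[smallest], data[i]
        passes.append(list(data))
    return data, passes


sorted_data, passes = selection_sort([36, 24, 10, 6, 12])
for state in passes:
    print(state)
```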
b) Bubble Sort
Compares neighboring pairs of array elements, starting with the last array element, and swaps
neighbors whenever they are not in correct order. On each pass, this causes the smallest element to
“bubble up” to its correct place in the array.
Pass 1
36 24 10 6 12
36 24 10 6 12
36 24 6 10 12
36 6 24 10 12
6 36 24 10 12
Pass 2
6 36 24 10 12
6 36 24 10 12
6 36 10 24 12
6 10 36 24 12
6 10 36 24 12
Pass 3
6 10 36 24 12
6 10 36 12 24
6 10 12 36 24
6 10 12 36 24
6 10 12 36 24
Process continues until the array is sorted
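The passes described above can be sketched in Python as follows. The function name `bubble_sort` is chosen for this example, and two small refinements are assumptions that go beyond the description: each pass stops before the part that is already in place, and the loop ends early if a pass makes no swaps.

```python
def bubble_sort(items):
    """Return a sorted copy, bubbling the smallest element to the front each pass."""
    data = list(items)
    n = len(data)
    for done in range(n - 1):
        swapped = False
        # Compare neighbouring pairs, starting with the last array element.
        for j in range(n - 1, done, -1):
            if data[j] < data[j - 1]:
                data[j], data[j - 1] = data[j - 1], data[j]
                swapped = True
        if not swapped:
            break  # no swaps in this pass: the array is already sorted
    return data


print(bubble_sort([36, 24, 10, 6, 12]))
```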
c) Insertion Sort
If the first few objects are already sorted, an unsorted object can be inserted into the sorted set in its
proper place. This is called insertion sort. The algorithm considers the elements one at a time, inserting
each in its suitable place among those already considered (keeping them sorted). Insertion sort is an
example of an incremental algorithm; it builds the sorted sequence one element at a time.
36 24 10 6 12
24 36 10 6 12
10 24 36 6 12
6 10 24 36 12
6 10 12 24 36
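A minimal Python sketch of insertion sort is shown below. The function name `insertion_sort` and the variable names are assumptions made for this example.

```python
def insertion_sort(items):
    """Return a sorted copy of items, built up one element at a time."""
    data = list(items)
    for i in range(1, len(data)):
        current = data[i]  # next unsorted element
        j = i - 1
        # Shift larger elements of the sorted part one place to the right.
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = current  # insert into its proper place
    return data


print(insertion_sort([36, 24, 10, 6, 12]))
```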
d) Quick Sort
Quick sort is a divide-and-conquer method for sorting.
It works as follows:
Selection: A pivot element is selected.
Partition: Place all of the smaller values to the left of the pivot, and all of the larger values to the
right of the pivot. The pivot is now in the proper place.
Recur: Recursively sort the values to the left and to the right of the pivot.
Example
Select the pivot.
Continue recursively
Finish
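The three steps (select, partition, recur) can be sketched in Python as follows. The notes do not say how the pivot is chosen, so using the last element of the range as the pivot is an assumption of this sketch, as are the function name `quick_sort` and its parameters. The list is sorted in place.

```python
def quick_sort(data, low=0, high=None):
    """Sort data[low..high] in place and return data."""
    if high is None:
        high = len(data) - 1
    if low >= high:
        return data  # zero or one element: nothing to sort

    # Selection: take the last element of the range as the pivot (assumption).
    pivot = data[high]

    # Partition: move values smaller than the pivot to the left side.
    boundary = low
    for i in range(low, high):
        if data[i] < pivot:
            data[i], data[boundary] = data[boundary], data[i]
            boundary += 1
    # The pivot is now placed in its proper position.
    data[boundary], data[high] = data[high], data[boundary]

    # Recur: sort the values to the left and to the right of the pivot.
    quick_sort(data, low, boundary - 1)
    quick_sort(data, boundary + 1, high)
    return data


print(quick_sort([36, 24, 10, 6, 12]))
```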