Data Structures and Algorithms
Course code: HeIn3003, CrHr: 3, Lecture + Practical
Program: Health Informatics
Instructor:
Degif Teka (PhD), Assistant Professor,
Department of Computer Science
Email: [email protected]
Course objectives
• Evaluate algorithms and data structures in terms of time and memory
complexity of basic operations.
• Analyze the complexity of algorithms.
• Explain the basic techniques for the design and analysis of efficient and
effective algorithms.
• Understand linear and non-linear data structures and relevant standard
algorithms for them.
• Implement linked lists, stacks, queues, trees, and graphs
• Know the Big O, Omega, and Theta notations and their usage to give
asymptotic upper, lower, and tight bounds on time and space complexity of
algorithms.
• Describe and use simple and advanced sorting and searching algorithms
Outline
• Chapter 1: Introduction to Data Structures and Algorithms
• The concept of data type
• Data objects
• Structured data objects
• Constructs for structured data objects
• Abstraction
• Data abstraction
• Abstract data types
• Algorithm basics
• Characteristics of algorithms
• Algorithm complexity
• Time and Space complexity
• Chapter 2: Algorithm Analysis
• Introduction
• Algorithm Efficiency
• Quantitative Analysis
• Asymptotic Analysis
• Best, Worst, and Average-Case Complexity
• Chapter 3: Linear Data structures
• Linked lists
• Stacks, Queues,
• Chapter 4: Non-linear data structures
• Trees
• Binary Tree
• Operations on Binary Tree
• Graphs
• Graph data structure
• Graph Representation
• Graph Traversals
• DFS , BFS
• Chapter 5: Recursive Algorithms
• Recursive definitions
• Function calls and recursive implementation
• Tail recursion, Non-tail recursion
• Indirect recursion, Nested recursion
• Chapter 6: Simple Sorting and Searching Algorithms
• Simple Sorting Algorithms
• Insertion sort
• Selection Sort
• Bubble sort
• Simple Searching algorithm
• Sequential Search (Linear search)
• Binary Search
• Chapter 7: Advanced Sorting and Searching Algorithms
• Shell sort
• Quick sort
• Heap sort
• Merge sort
• Hashing
• Assessment
• Test1 15%
• Course Project 20%
• Practical/ Laboratory exam 15%
• Final exam 50%
• References:
• Algorithms and Data Structures: The Science of Computing by
Baldwin/Scragg. Charles River Media. 2004.
• Jean-Paul Tremblay, Paul G. Sorenson, “An Introduction to Data Structures with
Applications”, McGraw-Hill Computer Science Series (textbook)
Chapter 1
Introduction to Data Structures and Algorithms
Introduction
• Data structures are the fundamental building blocks of computer
programming.
• They define how data is organized, stored, and manipulated within a
program.
• Understanding data structures is very important for developing
efficient and effective algorithms.
• A data structure is a way of storing and organizing data on a
computer so that it can be accessed and updated efficiently.
• It is also used for processing, retrieving, and storing data.
• There are different basic and advanced types of data structures
• Linear Data Structure: Data structure in which data elements are
arranged sequentially or linearly, where each element is attached to
its previous and next adjacent elements, is called a linear data
structure.
• Example: Array, Stack, Queue, Linked List, etc.
• Static Data Structure: A static data structure has a fixed memory size. It
is easier to access the elements in a static data structure.
• Example: array.
• Dynamic Data Structure: In a dynamic data structure, the size is not
fixed. It can grow or shrink at runtime, which can make more efficient
use of memory (the space complexity of the code).
• Example: Queue, Stack, etc.
• Non-Linear Data Structure: Data structures where data elements are
not placed sequentially or linearly are called non-linear data
structures. In a non-linear data structure, we cannot traverse all the
elements in a single pass.
• Examples: Trees and Graphs.
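The static/dynamic distinction can be sketched in C++ as follows. This is a minimal illustration, not course code: the function names `staticCapacity` and `dynamicSizeAfter` are hypothetical, and `std::array` and `std::vector` are used as stand-ins for a fixed-size and a growable structure.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Static structure: the capacity (5) is fixed at compile time.
std::size_t staticCapacity() {
    std::array<int, 5> fixed{};   // always exactly 5 elements
    return fixed.size();
}

// Dynamic structure: the size follows the number of items added at runtime.
std::size_t dynamicSizeAfter(std::size_t itemsToAdd) {
    std::vector<int> growable;
    for (std::size_t i = 0; i < itemsToAdd; ++i) {
        growable.push_back(static_cast<int>(i));
    }
    return growable.size();
}
```

For instance, `dynamicSizeAfter(8)` ends with 8 elements, while `staticCapacity()` always reports 5.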
Data Structures
• A computer is a programmable electronic device.
• To use a computer, we need to write programs.
• A program is written in order to solve a problem.
• A solution to a problem actually consists of two things:
• A way to organize the data
• A sequence of steps to solve the problem (an algorithm)
• The way data is organized in a computer’s memory is called a data
structure, and the sequence of computational steps to solve a
problem is called an algorithm.
• Therefore, a program is nothing but data structures plus algorithms.
• Program = Data structure + Algorithm
Abstraction
• Given a problem, the first step in solving it is obtaining one’s
own abstract view, or model, of the problem.
• This process of modeling is called abstraction.
• The model defines an abstract view of the problem.
• This implies that the model focuses only on problem-related details and
that a programmer tries to define the properties of the problem.
• These properties include:
• The data which are affected and
• The operations that are involved in the problem.
• With abstraction you create a well-defined entity that can be properly
handled.
• These entities define the data structure of the program.
• An entity with the properties just described is called an abstract data
type (ADT).
ADT
• An ADT consists of an abstract data structure and operations.
• Put in other terms, an ADT is an abstraction of a data structure.
• The ADT specifies:
• What can be stored in the Abstract Data Type
• What operations can be done on/by the Abstract Data Type.
• For example, if we are going to model the employees of an organization:
• This ADT stores employees with their relevant attributes and discards
irrelevant attributes.
• This ADT supports hiring, firing, and retiring operations.
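The employee ADT described above could be sketched as a C++ class. This is one possible illustration: the names `Employee` and `EmployeeRegistry`, the two attributes kept, and the choice to model both firing and retiring as removal from the active staff are all assumptions, not part of the course text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Only the attributes relevant to this (hypothetical) model are kept.
struct Employee {
    int id;
    std::string name;
};

class EmployeeRegistry {
public:
    // hire: add an employee to the organization.
    void hire(const Employee& e) { staff_.push_back(e); }

    // fire: remove the employee with the given id; true if one was removed.
    bool fire(int id) { return removeById(id); }

    // retire: modeled here the same way as fire (removal from active staff).
    bool retire(int id) { return removeById(id); }

    std::size_t headcount() const { return staff_.size(); }

private:
    bool removeById(int id) {
        auto it = std::find_if(staff_.begin(), staff_.end(),
                               [id](const Employee& e) { return e.id == id; });
        if (it == staff_.end()) return false;
        staff_.erase(it);
        return true;
    }

    std::vector<Employee> staff_;  // hidden representation of the stored data
};
```

Callers see only what can be stored and which operations exist. The `std::vector` behind the interface could be swapped for a linked list without changing any calling code.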
Data structures
• A data structure is a language construct that the programmer has
defined in order to implement an abstract data type.
• There are lots of formalized and standard Abstract data types such as
Stacks, Queues, Trees, etc.
• Do all characteristics need to be modeled?
• Not at all
• It depends on the scope of the model
• It depends on the reason for developing the model
• Abstraction is a process of classifying characteristics as relevant and
irrelevant for the particular purpose at hand and ignoring the
irrelevant ones.
• Applying abstraction correctly is the essence of successful
programming
• How do data structures model the world or some part of the world?
• The value held by a data structure represents some specific characteristic of
the world
• The characteristic being modeled restricts the possible values held by a data
structure
• The characteristic being modeled restricts the possible operations to be
performed on the data structure.
• Exercise
• Arrays and linked lists are basic data structures that are used as a building
block for other complex data structures like stack and queue. List the
advantages and disadvantages of these two basic data structures.
Chapter 2: Algorithm Analysis
Introduction
• Algorithm analysis refers to the process of determining how much
computing time and storage an algorithm will require.
• In other words, it is the process of predicting the resource requirements
of algorithms in a given environment.
• In order to solve a problem, there are many possible algorithms.
• One has to be able to choose the best algorithm for the problem at
hand using some scientific method.
• To classify some data structures and algorithms as good, we need
precise ways of analyzing them in terms of resource requirement.
• The main resources are:
• Running Time
• Memory Usage
• Communication Bandwidth
• The most important resource is running time in most problem domains.
Algorithms
• An algorithm is a well-defined computational procedure that takes
some value or a set of values as input and produces some value or a
set of values as output.
• Data structures model the static part of the world. They are
unchanging while the world is changing.
• In order to model the dynamic part of the world we need to work
with algorithms.
• Algorithms are the dynamic part of a program’s world model.
• An algorithm transforms data structures from one state to another
state in two ways:
• An algorithm may change the value held by a data structure
• An algorithm may change the data structure itself
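The two kinds of transformation can be illustrated with a short C++ sketch. The function names `updateValue` and `appendElement` are hypothetical, and `std::vector` is just one convenient structure to demonstrate on.

```cpp
#include <cstddef>
#include <vector>

// Changes a value held by the structure; the number of elements is unchanged.
void updateValue(std::vector<int>& data, std::size_t index, int newValue) {
    data.at(index) = newValue;   // at() throws std::out_of_range on a bad index
}

// Changes the structure itself: the sequence gains one element.
void appendElement(std::vector<int>& data, int value) {
    data.push_back(value);
}
```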
• The quality of a data structure is related to its ability to successfully
model the characteristics of the world.
• Similarly, the quality of an algorithm is related to its ability to
successfully simulate the changes in the world.
• However, independent of any particular world model, the quality of
data structure and algorithms is determined by their ability to work
together well.
• Generally speaking, correct data structures lead to simple and
efficient algorithms and correct algorithms lead to accurate and
efficient data structures.
Properties of an algorithm
• Finiteness: An algorithm must complete after a finite number of steps.
• Definiteness: Each step must be clearly defined, having one and only
one interpretation. At each point in the computation, one should be able to
tell exactly what happens next.
• Sequence: Each step must have a uniquely defined preceding and
succeeding step. The first step (start step) and last step (halt step) must
be clearly noted.
• Feasibility: It must be possible to perform each instruction.
• Correctness: It must compute the correct answer for all possible legal
inputs.
• Language Independence: It must not depend on any one programming
language.
• Completeness: It must solve the problem completely.
• Effectiveness: It must be possible to perform each step exactly and in
a finite amount of time.
• Efficiency: It must solve the problem with the least amount of
computational resources such as time and space.
• Generality: An algorithm should be valid on all possible inputs.
• Input/Output: There must be a specified number of input values, and
one or more result values.
Algorithm Efficiency
• At the design stage of solving a particular problem, there are two
conflicting goals. These are:
1. To design an algorithm that is easy to understand, code, and debug.
(In terms of software engineering, or the qualitative aspects of an
algorithm.) This goal is the concern of software engineers.
2. To design an algorithm that makes efficient use of computer
resources such as CPU and memory. (In terms of hardware.) This is
a matter of time and space and results in a quantitative analysis of
the algorithm. This goal is the concern of data structure and algorithm
analysis.
Techniques of Analysis
• There are two approaches to measure the efficiency of algorithms:
• Empirical Analysis:
• Programming competing algorithms and trying them on different
instances.
• However, it is difficult to use actual clock-time as a consistent measure of
an algorithm’s efficiency, because clock-time can vary based on many
things. For example,
• Specific processor speed
• Current processor load
• Specific data for a particular run of the program
• Input size,
• Input properties
• Operating environment
• Accordingly, we can analyze an algorithm according to the number of
operations required, rather than according to an absolute amount of
time involved.
• This can show how an algorithm’s efficiency changes according to the
size of the input.
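A minimal sketch of an empirical measurement, assuming `std::chrono::steady_clock` as the timer. The workload `sumUpTo` and the helper `timeSumUpToMicros` are hypothetical names chosen for illustration.

```cpp
#include <chrono>
#include <cstdint>

// The work being measured: n loop iterations of one addition each.
std::int64_t sumUpTo(std::int64_t n) {
    std::int64_t total = 0;
    for (std::int64_t i = 1; i <= n; ++i) total += i;
    return total;
}

// Returns the elapsed wall-clock time, in microseconds, of one call to sumUpTo(n).
// The value differs between runs and machines, so it is not a consistent measure.
std::int64_t timeSumUpToMicros(std::int64_t n) {
    auto start = std::chrono::steady_clock::now();
    volatile std::int64_t result = sumUpTo(n);  // volatile discourages optimizing the call away
    (void)result;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling `timeSumUpToMicros` twice with the same `n` will usually give two different numbers, whereas the operation count of `sumUpTo(n)` is the same on every run.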
• Theoretical Analysis:
• Involves mathematically determining the quantity of
resources (such as execution time, memory space, etc.) needed by
each algorithm.
Qualitative Versus Quantitative Analysis of Algorithms
• Qualitative Analysis
• A good algorithm should have the following qualities:
1) Simple but powerful.
2) Easily understandable.
3) Easily modifiable.
4) Correct in all cases.
5) Well documented
a. Internal documentation (Comments)
b. External documentation (User Manual)
6) Modular.
Quantitative Analysis
• Complexity Analysis is the systematic study of the cost of
computation, measured either in time units or in operations
performed, or in the amount of storage space required.
• The goal is to have a meaningful measure that permits comparison of
algorithms independent of operating platform.
• There are two things to consider:
• Time Complexity: Determine the approximate number of operations required
to solve a problem of size n.
• Space Complexity: Determine the approximate memory required to solve a
problem of size n.
• Complexity analysis involves two distinct phases:
• Algorithm Analysis
• Analysis of the algorithm or data structure to produce a function T (n) that
describes the algorithm in terms of the operations performed in order to
measure the complexity of the algorithm.
• There is no generally accepted set of rules for algorithm analysis.
• However, an exact count of operations is commonly used.
Heuristics for analyzing algorithm code
• Assume an arbitrary time unit.
• Execution of one of the following operations takes time 1:
• Assignment Operation
• Single Input/Output Operation
• Single Boolean Operations
• Single Arithmetic Operations
• Function Return
• Running time of a selection statement (if, switch) is the time for the
condition evaluation + the maximum of the running times for the
individual clauses in the selection.
• Running time for a loop is equal to the running time for the
statements inside the loop multiplied by number of iterations.
• The total running time of a statement inside a group of nested
loops is the running time of the statements multiplied by the
product of the sizes of all the loops.
• For nested loops, analyze inside out.
• Always assume that the loop executes the maximum number of
iterations possible.
• Running time of a function call is 1 for setup + the time for any
parameter calculations + the time required for the execution of the
function body.
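The nested-loop rule can be checked with a small counting sketch. The function name `innerExecutions` and the loop bounds `n` and `m` are illustrative assumptions.

```cpp
// Counts how many times the innermost statement of two nested loops runs.
long long innerExecutions(int n, int m) {
    long long executions = 0;
    for (int i = 0; i < n; ++i) {        // outer loop: n iterations
        for (int j = 0; j < m; ++j) {    // inner loop: m iterations per outer iteration
            ++executions;                // stands in for the loop body
        }
    }
    return executions;                   // equals n * m for non-negative n and m
}
```

Analyzing inside out, the body runs m times per outer iteration and the outer loop runs n times, so the body runs n × m times in total.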
• Example 1
int count() {
    int n;
    int k = 0;
    cout << "Enter an integer";
    cin >> n;
    for (int i = 0; i < n; i++)
        k = k + 1;
    return 0;
}
Time Units to Compute
• 1 for the assignment statement: int k=0;
• 1 for the output statement.
• 1 for the input statement.
• In the for loop: 1 assignment, n+1 tests, and n increments.
• n loops of 2 units for an assignment and an addition.
• 1 for the return statement.
T(n) = 1+1+1+(1+n+1+n)+2n+1 = 4n+6
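The count for Example 1 can be double-checked by instrumenting the code. The sketch below is an illustration only: the name `unitsExample1` is hypothetical, and n is passed as a parameter instead of being read with `cin`.

```cpp
// Tallies the time units of Example 1 using the unit-cost heuristics.
// Assumption: n is passed as a parameter instead of being read with cin.
long long unitsExample1(int n) {
    long long units = 0;
    int k = 0;
    units += 1;              // assignment k = 0
    units += 1;              // output statement
    units += 1;              // input statement
    int i = 0;
    units += 1;              // loop initialization i = 0
    while (true) {
        units += 1;          // loop test i < n (runs n+1 times)
        if (!(i < n)) break;
        k = k + 1;
        units += 2;          // one addition and one assignment
        i++;
        units += 1;          // loop increment
    }
    units += 1;              // return statement
    (void)k;                 // k is only needed to mirror the original code
    return units;
}
```

For any n ≥ 0 the tally equals 4n+6; for example, `unitsExample1(10)` gives 46.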
• Example 2
int total(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum = sum + 1;
    return sum;
}
T(n) = 1+(1+n+1+n)+2n+1 = 4n+4 = O(n)
• Example 3
int sum(int n) {
    int partial_sum = 0;
    for (int i = 1; i <= n; i++)
        partial_sum = partial_sum + (i * i * i);
    return partial_sum;
}
T(n) = 1+(1+n+1+n)+4n+1 = 6n+4 = O(n)
• Example 4
void func() {
    int n;
    int x = 0;
    int i = 0;
    int j = 1;
    cout << "Enter an Integer value";
    cin >> n;
    while (i < n) {
        x++;
        i++;
    }
    while (j < n) {
        j++;
    }
}
T(n) = 1+1+1+1+1+(n+1)+2n+n+(n-1) = 5n+5 = O(n), for n ≥ 1
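The count for Example 4 charges the second loop n tests and n-1 increments, which holds only for n ≥ 1, because j starts at 1. The instrumented sketch below makes that boundary visible. The name `unitsExample4` is hypothetical, and n is passed as a parameter instead of being read with `cin`.

```cpp
// Tallies the time units of Example 4 using the unit-cost heuristics.
// Assumption: n is passed as a parameter instead of being read with cin.
long long unitsExample4(int n) {
    long long units = 0;
    int x = 0, i = 0, j = 1;
    units += 3;              // three assignments
    units += 2;              // one output and one input statement
    while (true) {
        units += 1;          // test i < n
        if (!(i < n)) break;
        x++;
        i++;
        units += 2;          // x++ and i++
    }
    while (true) {
        units += 1;          // test j < n
        if (!(j < n)) break;
        j++;
        units += 1;          // j++
    }
    (void)x;                 // x is only needed to mirror the original code
    return units;
}
```

For n ≥ 1 the tally matches 5n+5. For n = 0 each loop performs a single failed test, so the tally is 7 rather than 5.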
Asymptotic Analysis
• The performance of an algorithm is related to a function T(n) of the
information that must be processed.
• The smaller the value of T(n) for a given n, the better the algorithm
performance.
• Complexity Analysis: the rate at which storage or time grows as a
function of the problem size. It describes the inherent complexity of a
program, independent of machine and compiler, and
can be described as a simple proportionality to some known function.
• Asymptotic analysis is a method of describing the limiting behavior of
an algorithm.
• It treats running-time functions as equivalent when they differ only by
constant factors or lower-order terms, and solves the problem
approximately up to that equivalence.
• It is concerned with how the running time of an algorithm increases
with the size of the input in the limit, as the size of the input increases
without bound.
• There are five notations used to describe a running time function:
• Big-Oh Notation (O)
• Big-Omega Notation (Ω)
• Theta Notation (Θ)
• Little-o Notation (o)
• Little-Omega Notation (ω)
The Big-Oh Notation
• Big-Oh notation is a way of comparing algorithms and is used for
computing the complexity of algorithms; i.e., the amount of time that
it takes for computer program to run.
• It is only concerned with what happens for very large values of n.
• Therefore, only the largest term in the expression (function) is
needed.
• Big-O notation: a function f(n) is of order (or has complexity) O(g(n))
if and only if there exist constants n0 > 0 and c > 0 such that
f(n) ≤ c · g(n) for all n > n0
• For example, if the number of operations
in an algorithm is n^2 - n, n is insignificant
compared to n^2 for large values of n.
Hence the n term is ignored.
• Example:
• 1 <= n for all n >= 1
• n <= n^2 for all n >= 1
• 2^n <= n! for all n >= 4
• log_2 n <= n for all n >= 2
• n <= n log_2 n for all n >= 2
• Examples
1. f(n) = 10n+5 and g(n) = n. Show that f(n) is O(g(n)).
To show that f(n) is O(g(n)), we must find constants c and
k such that f(n) <= c.g(n) for all n >= k,
that is, 10n+5 <= c.n for all n >= k.
Try c = 15. Then we need to show that 10n+5 <= 15n.
Solving for n we get: 5 <= 5n, or 1 <= n.
So f(n) = 10n+5 <= 15.g(n) for all n >= 1
(c = 15, k = 1).
• 2. f(n) = 3n^2+4n+1. Show that f(n) = O(n^2).
• 4n <= 4n^2 for all n >= 1 and 1 <= n^2 for all n >= 1, so
3n^2+4n+1 <= 3n^2+4n^2+n^2 for all n >= 1
= 8n^2 for all n >= 1
So we have shown that f(n) <= 8n^2 for all n >= 1.
Therefore, f(n) is O(n^2) (c = 8, k = 1).
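The constants found in the two Big-Oh examples can be spot-checked numerically. The helper below is a hypothetical sketch (`boundHolds` is not a standard function), and a finite check like this can only support a choice of c and k; it does not replace the algebraic argument.

```cpp
#include <functional>

// Returns true if f(n) <= c * g(n) for every integer n in [k, upTo].
// A finite check can only support a choice of c and k; it is not a proof.
bool boundHolds(const std::function<double(double)>& f,
                const std::function<double(double)>& g,
                double c, int k, int upTo) {
    for (int n = k; n <= upTo; ++n) {
        if (f(n) > c * g(n)) return false;
    }
    return true;
}
```

With f(n) = 10n+5, g(n) = n, c = 15 and k = 1 the check passes, while c = 10 fails at once, since 10n+5 > 10n for every n.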
Typical Orders
• Comparing functions
Properties of Big Oh
Theta Notation ( Θ-Notation)
• A function f(n) belongs to the set Θ(g(n)) if there exist positive
constants c1 and c2 such that it can be sandwiched between c1.g(n)
and c2.g(n), for sufficiently large values of n.
• Formal Definition: A function f(n) is Θ(g(n)) if it is both O(g(n)) and
Ω(g(n)).
• In other words, there exist positive constants c1, c2, and k such that
c1.g(n) <= f(n) <= c2.g(n) for all n >= k.
• If f(n)= Θ (g(n)), then g(n) is an asymptotically tight bound for f(n).
• In simple terms, f(n)= Θ (g(n)) means that f(n) and g(n) have the same
rate of growth.
• Example
• If f(n) = 2n+1, then f(n) = Θ(n)
• If f(n) = 2n^2, then
f(n) = O(n^4)
f(n) = O(n^3)
f(n) = O(n^2)
• All these are technically correct, but the last expression is the best
and tightest one. Since 2n^2 and n^2 have the same growth rate, it can be
written as f(n) = Θ(n^2).
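Both Θ claims can be checked directly against the sandwich definition. The constants below are one possible choice, picked for illustration; any other constants satisfying the inequalities would do equally well.

```latex
% f(n) = 2n + 1: choose c_1 = 2, c_2 = 3, k = 1
2n \;\le\; 2n + 1 \;\le\; 3n \quad \text{for all } n \ge 1
\;\;\Longrightarrow\;\; 2n + 1 = \Theta(n)

% f(n) = 2n^2: choose c_1 = c_2 = 2, k = 1
2n^2 \;\le\; 2n^2 \;\le\; 2n^2 \quad \text{for all } n \ge 1
\;\;\Longrightarrow\;\; 2n^2 = \Theta(n^2)
```

The right-hand inequality in the first line holds because 2n+1 ≤ 3n is equivalent to 1 ≤ n.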
Best, Worst, and Average-Case Complexity
• We can say that we are looking for the most suitable algorithm for a
specific purpose.
• For this, we need to analyze the algorithm under specific constraints.
• An algorithm can be analyzed under three specific cases:
• Best case analysis
• Average case analysis
• Worst case analysis
Best case analysis
• We analyze the performance of the algorithm under the
circumstances in which it works best.
• In that way, we can determine the upper bound of its performance,
that is, the smallest running time it can achieve.
• However, you should note that we may obtain these results under
very unusual or special circumstances and it may be difficult to find
the optimum input data for such an analysis.
• The best case complexity of the algorithm is the function defined by
the minimum number of steps taken on any instance of size n.
Average case analysis
• This gives an indication of how the algorithm performs with an
average data set.
• It is possible that this analysis is made by taking all possible
combinations of data, experimenting with them, and finally averaging
the results.
• However, such an analysis may not reflect the exact behavior of the
algorithm you expect from a real-life data set.
• Nevertheless, this analysis gives you a better idea of how this algorithm
works for your problem.
• The average-case complexity of the algorithm is the function defined
by the average number of steps taken over all instances of size n.
Worst case analysis
• In contrast to the best-case analysis, this gives you an indication of
how badly this algorithm can perform, or in other words, gives a lower bound
for its performance (an upper bound on its running time).
• Sometimes, this could be useful in determining the applicability of an
algorithm to a mission-critical application.
• However, this analysis may be too pessimistic for a general
application, and it may even be difficult to find a test data set that
produces the worst case.
• The worst case complexity of the algorithm is the function defined by
the maximum number of steps taken on any instance of size n.
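Sequential (linear) search, which appears in the course outline, gives a concrete picture of the three cases. The sketch below is illustrative: the `linearSearch` signature and the comparison counter are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Sequential search that reports how many element comparisons it made.
// Returns the index of target, or -1 if it is absent.
int linearSearch(const std::vector<int>& data, int target, std::size_t& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        ++comparisons;
        if (data[i] == target) return static_cast<int>(i);
    }
    return -1;
}
```

On a list of n elements, the best case is 1 comparison (target in the first position) and the worst case is n comparisons (target last or absent). If the target is present and each position is equally likely, the average is (n+1)/2 comparisons.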
Exercises
• Determine the run time equation and complexity of each of the
following code segments.
1. for (i=0;i<n;i++)
for (j=0;j<n; j++)
sum=sum+i+j;
2. int k=0;
for (int i=0; i<n; i++)
for (int j=i; j<n; j++)
k++;